Shape-Based Coarse Graining
Shape-Based Coarse Graining: an Approach to Model Large Macromolecular Assemblies
The shape-based CG method is designed to model the large-scale motions of macromolecular assemblies, representing unit molecules that the studied systems consist of by as few point-like particles as possible. Biomolecules, and proteins in particular, have a variety of shapes, often a single protein is comprised of both compact domains and disordered and elongated linkers or tails, the compact regions and tails often being equally important. To our knowledge, all existing CG methods assign the CG beads to represent a fixed group of atoms or a fixed number of atoms, but this is not efficient for the coarse-graining of molecules with complex shapes, because with such an approach either the tails are misrepresented, or too many CG beads are used for the compact domains. Efficient topology conserving algorithms have been developed for neural computations, and we take advantage of one of them, the topology representing network (Martinetz and Schulten, 1994).
A unit molecule (for example, a protein chain) is represented by a number N of CG beads that are considered to be point masses. A self-organizing neural networking algorithm is used to distribute the beads so that they optimally represent the shape of the all-atom protein. The CG beads are considered as nodes of the neural network and S adaptation steps are performed. At each step the program carries out the following procedures. First, the n-th atom of the full structure is chosen randomly; coordinates of the atom, r, are used as an input pattern to adapt the neural network. Second, for each CG bead i (i = 1,2,...,N), one determines the number Ki of CG beads that obey the condition |r - Rj| < |r - Ri|, where Rj is the position vector of the j-th CG bead. Third, position of each CG bead is updated according to the rule
Ri(new) = Ri(old) + eps exp{-Ki/lambda}(v - Ri(old)),
with eps and lambda being adapted at each step, according to the functional form f(s) = f(0)(f(S)/f(0))s/S, where s is the current step and lambda(0) = 0.2N, lambda(S) = 0.01, eps(0) = 0.3, and eps(S) = 0.05. We use S=200N, which is usually enough to obtain a well-converged distribution of CG beads. A CG model built for a single molecule is repeatedly mapped on the other copies of the same molecule.
Once the CG beads are placed, an all-atom domain, the so-called Voronoi cell, is determined for each bead: this cell entails the set of atoms such that every atom from this set is closer to the considered bead than to any other bead. The total mass and charge of each domain is then assigned to the corresponding CG particle. Neighboring beads are connected by harmonic springs. Separate protein chains are not connected by bonds, instead, they interact through non-bonded forces, modeled by Lennard-Jones and Coulomb potentials.
Interaction potentials are parameterized using all-atom simulations, using the Boltzmann inversion. In regard to non-bonded interactions, a CG bead should represent an object approximately the size of the group of atoms (Voronoi cell) that the bead stands for. Therefore, the Lennard-Jones radius for each CG bead is calculated as the radius of gyration of the respective all-atom domain, perhaps extended by some value, depending on the application.
The dynamics of the CG system is described through the Langevin equation
dp/dt = F - g p + h G(t),
where p is the momentum of a bead, F is the force acting on the bead due to the chosen interaction potentials, g is a damping constant, G(t) is a univariate Gaussian random process, and h is connected to g through the dissipation-fluctuation theorem, h2 = 2 g kT/m, with m being the bead's mass, k the Boltzmann constant, and T the temperature. Use of the Langevin equation allows one to introduce a simple implicit solvent model, where the viscous damping and random forces from the solvent are described by the second and third terms on the right-hand side of the Langevin equation. For a single bead without force, this equation describes free diffusion. All-atom molecular dynamics simulations of the drift of protein segments, similar in size to the ones represented by the CG beads, in explicit water, can be used to infer the effective value of g.
Computational speed-up.
Simulations using the shape-based CG model can be performed employing the parallel molecular dynamics program NAMD (Phillips et al., 2005), or other molecular dynamics packages. The scaling of the performance, when increasing the number of parallel processors used for CG simulations, was found to be the same as for an average all-atom simulation with NAMD. With the dramatic reduction in the number of particles and with the simple solvent model used, the shape-based CG model achieves 200-500-fold increase in the integration time step (depending on the chosen level of coarse-graining), allowing one to reach time scales of tens of microseconds and beyond.
