A CHARMM forcefield topology file contains all of the information needed to convert a list of residue names into a complete PSF structure file. It also contains internal coordinates that allow the automatic assignment of coordinates to hydrogens and other atoms missing from a crystal PDB file.
The current versions of the CHARMM forcefield are CHARMM22 for proteins and
CHARMM27 for lipids and nucleic acids.
The individual topology files are named, respectively,
To enable hybrid systems, combinations are also provided, named
While the tools used with NAMD allow multiple topology and parameter files
to be used simultaneously, it is preferable to use these pre-combined files.
The CHARMM topology files are available for download from the MacKerell web site
We will examine
top_all27_prot_lipid.inp; the other files are similar.
At the beginning of the file is the header, indicated by lines beginning
with *s, followed in this case by "27 1" to indicate the version of CHARMM
that generated the file:
*>>>>>> Combined CHARMM All-Hydrogen Topology File for <<<<<<<<<
*>>>>>>>>> CHARMM22 Proteins and CHARMM27 Lipids <<<<<<<<<<
*from
*>>>>>>>>CHARMM22 All-Hydrogen Topology File for Proteins <<<<<<
*>>>>>>>>>>>>>>>>>>>>> August 1999 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
*>>>>>>> Direct comments to Alexander D. MacKerell Jr. <<<<<<<<<
*>>>>>> 410-706-7442 or email: alex,mmiris.ab.umd.edu <<<<<<<<<
*and
* \\\\\\\ CHARMM27 All-Hydrogen Lipid Topology File ///////
* \\\\\\\\\\\\\\\\\\ Developmental /////////////////////////
* Alexander D. MacKerell Jr.
* August 1999
* All comments to ADM jr. email: alex,mmiris.ab.umd.edu
* telephone: 410-706-7442
*
27 1
Comments in the topology file are indicated by ! anywhere on a line. Important usage information is often contained in these comments, so it is always a good idea to inspect the topology file in a text editor when using it to build a structure. The next part of the file is a long set of comments containing references to the papers and other sources of the parameters in the file:
!
! references
!
!PROTEINS
!
!MacKerell, Jr., A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr., R.L.;
!Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.;
!Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.T.K.; Mattos,
!C.; Michnick, S.; Ngo, T.; Nguyen, D.T.; Prodhom, B.; Reiher, III,
!W.E.; Roux, B.; Schlenkrich, M.; Smith, J.C.; Stote, R.; Straub, J.;
!Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-atom
!empirical potential for molecular modeling and dynamics Studies of
!proteins. Journal of Physical Chemistry B, 1998, 102, 3586-3616.
!
The topology file must define the type, mass, and charge of every atom in every residue, so that a PSF file can be constructed. While the partial charges assigned to atoms of the same type vary between residues, their masses do not. Therefore, the mass of every atom type is declared once at the beginning of the file in a MASS statement. This statement also pairs an integer with each type name, which is used in CHARMM formatted PSF files, but not in the X-PLOR formatted PSF files used by NAMD. The type indices are unique but not necessarily consecutive. Notice in the following excerpt that there are many types of hydrogen and carbon atoms defined, but the atomic masses are the same:
MASS     1 H      1.00800 H ! polar H
MASS     2 HC     1.00800 H ! N-ter H
MASS     3 HA     1.00800 H ! nonpolar H
MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
MASS     5 HP     1.00800 H ! aromatic H
MASS     6 HB     1.00800 H ! backbone H
MASS     7 HR1    1.00800 H ! his he1, (+) his HG,HD2
MASS     8 HR2    1.00800 H ! (+) his HE1
MASS     9 HR3    1.00800 H ! neutral his HG, HD2
MASS    10 HS     1.00800 H ! thiol hydrogen
MASS    11 HE1    1.00800 H ! for alkene; RHC=CR
MASS    12 HE2    1.00800 H ! for alkene; H2C=CR
MASS    20 C     12.01100 C ! carbonyl C, peptide backbone
MASS    21 CA    12.01100 C ! aromatic C
MASS    22 CT1   12.01100 C ! aliphatic sp3 C for CH
MASS    23 CT2   12.01100 C ! aliphatic sp3 C for CH2
MASS    24 CT3   12.01100 C ! aliphatic sp3 C for CH3
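As a rough illustration of how the MASS statements can be read, the following Python sketch collects the type index and mass for each type name. The function name parse_mass_lines and the shortened excerpt are our own choices for this example and are not part of CHARMM, NAMD, or psfgen.

```python
def parse_mass_lines(text):
    """Map each atom type name to (index, mass) from MASS statements."""
    types = {}
    for line in text.splitlines():
        fields = line.split("!", 1)[0].split()  # drop the trailing comment
        if len(fields) >= 4 and fields[0].upper() == "MASS":
            index, name, mass = int(fields[1]), fields[2], float(fields[3])
            types[name] = (index, mass)
    return types


# A few lines taken from the excerpt above (shortened for the example).
excerpt = """\
MASS     1 H      1.00800 H ! polar H
MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
MASS    12 HE2    1.00800 H ! for alkene; H2C=CR
MASS    20 C     12.01100 C ! carbonyl C, peptide backbone
MASS    22 CT1   12.01100 C ! aliphatic sp3 C for CH
"""

types = parse_mass_lines(excerpt)
indices = sorted(index for index, _ in types.values())
print(types["CT1"])
print(indices)  # unique, but with gaps between 4, 12, and 20
```

Note that the indices are kept exactly as written in the file, since they are not guaranteed to be consecutive.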
When specifying the connectivity of a chain of residues in a protein, it is necessary to refer to atoms in the previous or succeeding residue. The CHARMM topology file declares those atoms that will be referenced in adjoining residues as such, with a - prefix for the previous residue and a + prefix for the following one:
DECL -CA
DECL -C
DECL -O
DECL +N
DECL +HN
DECL +CA
The first and last residues of a chain obviously have different connectivity from those in the center, since they have one fewer neighbor. This is handled by applying patch residues, normally referred to as patches, to the terminal residues. As will be seen, any residue definition can specify the patch to be applied when it is the first or last residue in a segment. However, a default set is declared for the entire file, as in the following where the default patch is NTER for the first residue of a segment and CTER for the last:
DEFA FIRS NTER LAST CTER
While the covalent bond connectivity between atoms must necessarily be provided by the topology file, enumerating all of the required angles and dihedrals would be tedious and error-prone, as well as enormously complicated since every combination of residues joined by a peptide bond would require a different set. Therefore, angles and dihedrals are automatically generated for every pair or triple of connected bonds when a segment is built. This autogeneration may be enabled or disabled on a per-segment basis as it should never be used on segments of water, but the default is defined in the topology file:
AUTO ANGLES DIHE
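To make the idea of autogeneration concrete, here is a minimal Python sketch that derives angles and dihedrals from a bond list alone. It is only an illustration of the principle, not the algorithm used by CHARMM or psfgen; the function name autogenerate and the four-bond fragment are assumptions made for this example.

```python
from collections import defaultdict


def autogenerate(bonds):
    """Return (angles, dihedrals) implied by a list of bonded atom pairs."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # One angle for every pair of bonds that share a central atom.
    angles = set()
    for center, nbrs in neighbors.items():
        for a in nbrs:
            for c in nbrs:
                if a < c:
                    angles.add((a, center, c))

    # One dihedral for every chain of three connected bonds.
    dihedrals = set()
    for b, c in bonds:
        for a in neighbors[b] - {c}:
            for d in neighbors[c] - {b}:
                if a != d:
                    # Store each dihedral once, regardless of direction.
                    dihedrals.add(min((a, b, c, d), (d, c, b, a)))
    return angles, dihedrals


# A small backbone-like fragment, chosen only for illustration.
bonds = [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "HA")]
angles, dihedrals = autogenerate(bonds)
print(sorted(angles))
print(sorted(dihedrals))
```

Because the terms follow entirely from the bonds, a residue definition only needs to list bonds, and the same residue can be joined to any neighbor without a separate table of angles and dihedrals.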
We are now ready for the actual residue definitions, beginning with alanine, as shown below. A residue is indicated by the RESI statement with the residue name (ALA) and total charge (0.00). Next are listed all of the atoms in the residue in ATOM statements with the atom name (N, HN, CA), type (NH1, H, CT1), and partial charge (-0.47, 0.31, 0.07). The GROUP statements, dividing the atoms into integer-charge groups, are not used by NAMD and should not be confused with the hydrogen groups, each a non-hydrogen atom and all hydrogens bonded to it, that NAMD uses to accelerate distance-testing for nonbonded calculations.
RESI ALA          0.00
GROUP
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !  HN-N
ATOM CA   CT1     0.07  !     |     HB1
ATOM HA   HB      0.09  !     |    /
GROUP                   ! HA-CA--CB-HB2
ATOM CB   CT3    -0.27  !     |    \
ATOM HB1  HA      0.09  !     |     HB3
ATOM HB2  HA      0.09  !   O=C
ATOM HB3  HA      0.09  !     |
GROUP                   !
ATOM C    C       0.51
ATOM O    O      -0.51
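The integer-charge property of the groups can be checked directly from the ATOM lines. The sketch below is our own illustration (the function name group_charges is hypothetical); it sums the partial charges between successive GROUP statements of the ALA residue.

```python
def group_charges(residue_text):
    """Sum the partial charges of the ATOM lines in each GROUP."""
    groups = []
    for line in residue_text.splitlines():
        fields = line.split("!", 1)[0].split()  # ignore comments
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "GROUP":
            groups.append(0.0)
        elif keyword == "ATOM":
            if not groups:          # tolerate atoms listed before any GROUP
                groups.append(0.0)
            groups[-1] += float(fields[3])
    return groups


ala = """\
RESI ALA 0.00
GROUP
ATOM N   NH1 -0.47
ATOM HN  H    0.31
ATOM CA  CT1  0.07
ATOM HA  HB   0.09
GROUP
ATOM CB  CT3 -0.27
ATOM HB1 HA   0.09
ATOM HB2 HA   0.09
ATOM HB3 HA   0.09
GROUP
ATOM C   C    0.51
ATOM O   O   -0.51
"""

charges = group_charges(ala)
# Each group sums to zero (up to floating-point rounding), as does the residue.
print([abs(q) < 1e-9 for q in charges])
```

The three groups of ALA are each neutral, so the residue total agrees with the 0.00 given on the RESI line.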
The ALA residue continues by defining connectivity, with each BOND statement followed by a list of pairs of atoms to be connected with bonds. The DOUBLE statement is a synonym for BOND and does not affect the resulting PSF file. Observe that the atom C is bonded to +N, the N of the following residue. A bond between N and -C will be provided by the preceding residue. The order of bonds, or of the atoms within a bond, is not significant.
BOND CB CA   N  HN   N  CA
BOND C  CA   C  +N   CA HA   CB HB1   CB HB2   CB HB3
DOUBLE O  C
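The pairwise layout of the BOND and DOUBLE statements can be sketched in a few lines of Python. This is an illustration only; parse_bonds is a name we chose for the example, and the code simply treats DOUBLE as a synonym for BOND, as described above.

```python
def parse_bonds(text):
    """Collect bonded atom-name pairs from BOND and DOUBLE statements."""
    bonds = []
    for line in text.splitlines():
        fields = line.split("!", 1)[0].split()
        if fields and fields[0].upper() in ("BOND", "DOUBLE"):
            names = fields[1:]
            # Names are listed in consecutive pairs: a1 b1 a2 b2 ...
            bonds.extend(zip(names[0::2], names[1::2]))
    return bonds


ala_bonds = """\
BOND CB CA   N  HN   N  CA
BOND C  CA   C  +N   CA HA   CB HB1   CB HB2   CB HB3
DOUBLE O  C
"""

bonds = parse_bonds(ala_bonds)
# Bonds that reach into a neighboring residue carry a + or - prefix.
inter_residue = [b for b in bonds if any(name[0] in "+-" for name in b)]
print(len(bonds))
print(inter_residue)
```

For ALA the only bond crossing a residue boundary is the one from C to +N; the matching bond to -C is never listed, since the preceding residue supplies it.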
As noted above, the angle and dihedral terms will be autogenerated and are therefore not listed for this residue. The less common improper dihedrals (normally just called impropers), however, must be listed explicitly. In this case there are two impropers, which maintain the planarity of the peptide bonds. As with dihedrals, the order of atoms within an improper may be reversed. As shown below, impropers are specified by the IMPR statement followed by sets of four atoms, with the central atom to which the other three are bonded typically listed first.
IMPR N -C CA HN C CA +N O
Explicit hydrogen bond terms are no longer present in the CHARMM force field and are therefore not calculated by NAMD. The DONOR and ACCEPTOR statements, shown below, specify pairs of atoms eligible to form hydrogen bonds. The psfgen module in VMD ignores these statements and does not incorporate hydrogen bonding information into the PSF file.
DONOR HN N
ACCEPTOR O C
Finally in the residue definition are the internal coordinate IC statements. For each set of four atoms 1 2 3 4, the IC specifies in order the bond length 1-2, the angle 1-2-3, the dihedral 1-2-3-4, the angle 2-3-4, and the bond length 3-4. With this set of data, the position of atom 1 may be determined based on the positions of atoms 2-4, and the position of atom 4 may be determined from the positions of atoms 1-3, allowing the recursive generation of coordinates for all atoms in the structure based on a three-atom seed. Improper IC statements are indicated by a * preceding the third atom, the atom to which the other three are bonded, as in 1 2 *3 4. The order of atoms in an IC statement is different from that of an IMPR statement, and the values provide the length 1-3, the angle 1-3-2, the dihedral 1-2-3-4, the angle 2-3-4, and the length 3-4.
IC -C   CA   *N   HN    1.3551 126.4900  180.0000 115.4200  0.9996
IC -C   N    CA   C     1.3551 126.4900  180.0000 114.4400  1.5390
IC N    CA   C    +N    1.4592 114.4400  180.0000 116.8400  1.3558
IC +N   CA   *C   O     1.3558 116.8400  180.0000 122.5200  1.2297
IC CA   C    +N   +CA   1.5390 116.8400  180.0000 126.7700  1.4613
IC N    C    *CA  CB    1.4592 114.4400  123.2300 111.0900  1.5461
IC N    C    *CA  HA    1.4592 114.4400 -120.4500 106.3900  1.0840
IC C    CA   CB   HB1   1.5390 111.0900  177.2500 109.6000  1.1109
IC HB1  CA   *CB  HB2   1.1109 109.6000  119.1300 111.0500  1.1119
IC HB1  CA   *CB  HB3   1.1109 109.6000 -119.5800 111.6100  1.1114
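To show how one (non-improper) IC entry yields a position, the following Python sketch places atom 4 from atoms 1-3 using the last three values of the entry: the dihedral 1-2-3-4, the angle 2-3-4, and the length 3-4. It is a generic geometric construction under our own assumptions, not the code used by CHARMM or psfgen. The helper names and the seed placement (N at the origin, CA on the x axis, -C in the xy plane) are chosen only for this example; the numbers come from the ALA entries above.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    length = math.sqrt(sum(a * a for a in u))
    return tuple(a / length for a in u)


def place_atom4(p1, p2, p3, bond34, angle234, dihedral1234):
    """Position of atom 4 given atoms 1-3 and the IC values (degrees)."""
    theta = math.radians(angle234)
    phi = math.radians(dihedral1234)
    axis = unit(sub(p3, p2))                 # direction of the 2-3 bond
    normal = unit(cross(sub(p2, p1), axis))  # normal to the 1-2-3 plane
    inplane = cross(normal, axis)            # in the plane, on atom 1's side
    local = (-bond34 * math.cos(theta),
             bond34 * math.sin(theta) * math.cos(phi),
             bond34 * math.sin(theta) * math.sin(phi))
    return tuple(p3[i] + local[0] * axis[i] + local[1] * inplane[i]
                 + local[2] * normal[i] for i in range(3))


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, used here only to check the result."""
    u, v = unit(sub(a, b)), unit(sub(c, b))
    return math.degrees(math.acos(sum(x * y for x, y in zip(u, v))))


# Three-atom seed built from the ALA values (placement is an assumption).
n_pos = (0.0, 0.0, 0.0)
ca_pos = (1.4592, 0.0, 0.0)                          # N-CA length
prev_c = (1.3551 * math.cos(math.radians(126.49)),   # -C-N length and
          1.3551 * math.sin(math.radians(126.49)),   # -C-N-CA angle
          0.0)

# IC -C N CA C: dihedral 180.0, angle N-CA-C 114.44, length CA-C 1.5390
c_pos = place_atom4(prev_c, n_pos, ca_pos, 1.5390, 114.44, 180.0)
print(c_pos)
print(math.dist(ca_pos, c_pos), angle_deg(n_pos, ca_pos, c_pos))
```

With a dihedral of 180 degrees the new atom lies in the plane of the seed, on the opposite side of the N-CA bond from -C. Repeating the same step with each newly placed atom is what allows the recursive construction described above.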
It was mentioned above that special treatment is required for the first or last residue in a chain, and that this is implemented as a patch. The following is the default first-residue patch for proteins, NTER. The syntax is almost identical to a normal residue, but the ATOM statements may either add a new atom (HT1, HT2, HT3) or modify the type and charge of an existing atom of the given name (N, CA, HA). The DELETE statement operates as one would expect, removing the atom (HN) and any bonds that include it.
PRES NTER         1.00 ! standard N-terminus
GROUP                  ! use in generate statement
ATOM N    NH3    -0.30 !
ATOM HT1  HC      0.33 !         HT1
ATOM HT2  HC      0.33 !     (+)/
ATOM HT3  HC      0.33 ! --CA--N--HT2
ATOM CA   CT1     0.21 !   |    \
ATOM HA   HB      0.10 !   HA    HT3
DELETE ATOM HN
BOND HT1 N HT2 N HT3 N
DONOR HT1 N
DONOR HT2 N
DONOR HT3 N
IC HT1  N    CA   C     0.0000  0.0000  180.0000  0.0000  0.0000
IC HT2  CA   *N   HT1   0.0000  0.0000  120.0000  0.0000  0.0000
IC HT3  CA   *N   HT2   0.0000  0.0000  120.0000  0.0000  0.0000
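The effect of the NTER patch on the atoms of ALA can be mimicked with ordinary dictionaries. The following Python sketch is only an illustration of the add, modify, and delete semantics described above; apply_patch is a hypothetical helper and bonds are ignored for brevity.

```python
def apply_patch(atoms, patch_atoms, deleted):
    """Return a patched copy: delete atoms, then add or override by name."""
    patched = dict(atoms)
    for name in deleted:
        patched.pop(name, None)
    patched.update(patch_atoms)  # new names are added, existing ones replaced
    return patched


# name -> (type, charge), taken from the ALA residue definition
ala = {
    "N": ("NH1", -0.47), "HN": ("H", 0.31), "CA": ("CT1", 0.07),
    "HA": ("HB", 0.09), "CB": ("CT3", -0.27), "HB1": ("HA", 0.09),
    "HB2": ("HA", 0.09), "HB3": ("HA", 0.09), "C": ("C", 0.51),
    "O": ("O", -0.51),
}

# ATOM and DELETE statements of the NTER patch
nter_atoms = {
    "N": ("NH3", -0.30), "HT1": ("HC", 0.33), "HT2": ("HC", 0.33),
    "HT3": ("HC", 0.33), "CA": ("CT1", 0.21), "HA": ("HB", 0.10),
}
nter_deleted = ["HN"]

patched = apply_patch(ala, nter_atoms, nter_deleted)
total = sum(charge for _, charge in patched.values())
print(sorted(patched))
print(round(total, 2))
```

The patched residue carries a net charge of +1, in agreement with the 1.00 on the PRES NTER line.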
The comment "use in generate statement" indicates that the NTER patch is used during segment generation, and is applied before angles and dihedrals are generated. An example of the other type of patch, those applied after segment generation, is the LINK patch given below. The patch statement requires, in this case, two residues as arguments, the first being the last residue of a segment generated without the CTER patch and the second being the first residue of a segment generated without the NTER patch (the comments have the N and C termini reversed). The numbers 1 and 2 preceding atom names indicate which argument residue the named atom belongs to. Since angles and dihedrals will not be regenerated, they are enumerated.
PRES LINK         0.00 ! linkage for IMAGES or for joining segments
                       ! 1 refers to previous (N terminal)
                       ! 2 refers to next (C terminal)
                       ! use in a patch statement
BOND 1C 2N
ANGLE 1C 2N 2CA    1CA 1C 2N
ANGLE 1O 1C 2N     1C 2N 2HN
DIHE 1C 2N 2CA 2C    1C 2N 2CA 2HA    1C 2N 2CA 2CB
DIHE 1HA 1CA 1C 2N   1N 1CA 1C 2N     1CB 1CA 1C 2N
DIHE 1CA 1C 2N 2HN   1CA 1C 2N 2CA
DIHE 1O 1C 2N 2HN    1O 1C 2N 2CA
IMPR 2N 1C 2CA 2HN   1C 1CA 2N 1O
IC 1N   1CA  1C   2N    0.0000  0.0000  180.0000  0.0000  0.0000
IC 2N   1CA  *1C  1O    0.0000  0.0000  180.0000  0.0000  0.0000
IC 1CA  1C   2N   2CA   0.0000  0.0000  180.0000  0.0000  0.0000
IC 1C   2N   2CA  2C    0.0000  0.0000  180.0000  0.0000  0.0000
IC 1C   2CA  *2N  2HN   0.0000  0.0000  180.0000  0.0000  0.0000
These types of patches are used to alter protonation states (ASPP, GLUP, HS2), create disulphide bonds (DISU), attach HEME groups (PHEM) and their ligands (PLO2, PLIG), and even remove unwanted autogenerated angles (FHEM).
The following is the complete residue definition for GLY, the smallest of the amino acids. Compare GLY to the ALA residue dissected above.
RESI GLY          0.00
GROUP
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !     N-H
ATOM CA   CT2    -0.02  !     |
ATOM HA1  HB      0.09  !     |
ATOM HA2  HB      0.09  ! HA1-CA-HA2
GROUP                   !     |
ATOM C    C       0.51  !     |
ATOM O    O      -0.51  !     C=O
                        !     |
BOND N HN   N  CA    C  CA
BOND C +N   CA HA1   CA HA2
DOUBLE O  C
IMPR N -C CA HN   C CA +N O
DONOR HN N
ACCEPTOR O C
IC -C   CA   *N   HN    1.3475 122.8200  180.0000 115.6200  0.9992
IC -C   N    CA   C     1.3475 122.8200  180.0000 108.9400  1.4971
IC N    CA   C    +N    1.4553 108.9400  180.0000 117.6000  1.3479
IC +N   CA   *C   O     1.3479 117.6000  180.0000 120.8500  1.2289
IC CA   C    +N   +CA   1.4971 117.6000  180.0000 124.0800  1.4560
IC N    C    *CA  HA1   1.4553 108.9400  117.8600 108.0300  1.0814
IC N    C    *CA  HA2   1.4553 108.9400 -118.1200 107.9500  1.0817
PATCHING FIRS GLYP
Unlike ALA and most protein residues, in which CA is bonded to HA and CB, in GLY it is bonded to a pair of hydrogens, HA1 and HA2, and its type is CT2 rather than CT1. Since the NTER patch modifies the atoms CA and HA by name and assigns CA the type CT1, the default NTER patch cannot be applied to GLY and the PATCHING statement is used to change the default first-residue patch for GLY to GLYP. Similarly, the LINK patch above refers to atoms such as HA and CB that GLY lacks, so it cannot be used for GLY residues, and the additional patches LIG1, LIG2, and LIG3 are provided to link GLY to non-GLY, non-GLY to GLY, and GLY to GLY.
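The way a residue-level PATCHING statement overrides the file-wide DEFA setting can be summarized as a simple lookup. The Python sketch below is our own illustration, not psfgen code; the dictionaries hold only the defaults and the GLY override quoted in this section, and terminal_patches is a hypothetical name.

```python
# File-wide defaults from "DEFA FIRS NTER LAST CTER"
FILE_DEFAULTS = {"FIRS": "NTER", "LAST": "CTER"}

# Per-residue overrides from PATCHING statements (only GLY is shown here)
RESIDUE_OVERRIDES = {"GLY": {"FIRS": "GLYP"}}


def terminal_patches(residue_names):
    """Pick the first- and last-residue patches for one segment."""
    first, last = residue_names[0], residue_names[-1]
    first_patch = RESIDUE_OVERRIDES.get(first, {}).get(
        "FIRS", FILE_DEFAULTS["FIRS"])
    last_patch = RESIDUE_OVERRIDES.get(last, {}).get(
        "LAST", FILE_DEFAULTS["LAST"])
    return first_patch, last_patch


print(terminal_patches(["GLY", "ALA", "ALA"]))
print(terminal_patches(["ALA", "ALA", "GLY"]))
```

Only the position of a residue at the start or end of the segment matters, so a GLY at the last position still receives the default CTER patch in this sketch.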
Water is also defined as a residue, as shown below, but some care is required for its use. A third bond, not required by NAMD, is included to allow CHARMM to make the water molecule rigid. This ring structure would confuse angle and dihedral autogeneration, so segments of water must be generated with autogeneration disabled, and therefore an explicit angle is also included. If this third bond is removed from the water topology, then autogeneration must still be disabled to avoid duplicating the angle, unless the angle is also removed.
RESI TIP3         0.000 ! tip3p water model, generate using noangle nodihedral
GROUP
ATOM OH2  OT     -0.834
ATOM H1   HT      0.417
ATOM H2   HT      0.417
BOND OH2 H1 OH2 H2 H1 H2    ! the last bond is needed for shake
ANGLE H1 OH2 H2             ! required
ACCEPTOR OH2
PATCHING FIRS NONE LAST NONE
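The problem with autogeneration for water can be seen by enumerating the angles implied by the bond list. The following Python sketch is an illustration under our own assumptions (autogenerated_angles is a hypothetical helper, not the psfgen algorithm); it compares the three-bond definition above with one in which the H1-H2 bond has been removed.

```python
from collections import defaultdict
from itertools import combinations


def autogenerated_angles(bonds):
    """List one angle for every pair of bonds sharing a central atom."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sorted((a, center, c)
                  for center, nbrs in neighbors.items()
                  for a, c in combinations(sorted(nbrs), 2))


with_hh_bond = [("OH2", "H1"), ("OH2", "H2"), ("H1", "H2")]
without_hh_bond = [("OH2", "H1"), ("OH2", "H2")]

print(autogenerated_angles(with_hh_bond))     # three angles, one per vertex
print(autogenerated_angles(without_hh_bond))  # only H1-OH2-H2
```

With the H1-H2 bond present, every vertex of the triangle becomes the center of an angle, which is not wanted. Without it, the single autogenerated angle would duplicate the explicit ANGLE statement, which is why autogeneration must remain disabled unless that statement is removed as well.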
The residue definitions for the ions below are exceedingly simple.
RESI SOD          1.00 ! Sodium Ion
GROUP
ATOM SOD  SOD     1.00
PATCHING FIRST NONE LAST NONE

RESI CLA         -1.00 ! Chloride Anion
GROUP
ATOM CLA  CLA    -1.00
PATCHING FIRST NONE LAST NONE
We have discussed only those parts of the topology files associated with proteins and solvent. There is much additional information regarding proteins, not to mention lipids and nucleic acids, included in the comments in the topology files themselves.