
Topology Files

A CHARMM forcefield topology file contains all of the information needed to convert a list of residue names into a complete PSF structure file. It also contains internal coordinates that allow the automatic assignment of coordinates to hydrogens and other atoms missing from a crystal PDB file.

The current versions of the CHARMM force field are CHARMM22 for proteins, including the CMAP correction, and CHARMM27 for lipids and nucleic acids. The individual topology files are named, respectively, top_all22_prot_cmap.inp, top_all27_lipid.rtf, and top_all27_na.rtf. To enable computation on hybrid systems, combinations are also provided, named top_all27_na_lipid.rtf, top_all27_prot_lipid.rtf, and top_all27_prot_na.rtf, all of which can be found in the CHARMM31 release. While the tools used with NAMD allow multiple topology and parameter files to be used simultaneously, it is preferable to use these pre-combined files. The CHARMM31 release is available for download from the MacKerell web site:

http://www.pharmacy.umaryland.edu/faculty/amackere/force_fields.htm

We will examine top_all27_prot_lipid.rtf; the other files are similar. At the beginning of the file is the header, indicated by lines beginning with *, followed in this case by "31 1" to indicate the version of CHARMM that generated the file:

*> CHARMM22 All-Hydrogen Topology File for Proteins and Lipids <<
*>>>>>> Includes phi, psi cross term map (CMAP) correction <<<<<<
*>>>>>>>>>>>>>>>>>>>>>>   July 2004    <<<<<<<<<<<<<<<<<<<<<<<<<<
* All comments to ADM jr. via the CHARMM web site: www.charmm.org
*               parameter set discussion forum
*
31  1

Comments in the topology file are indicated by ! anywhere on a line. Important usage information is often contained in these comments, so it is always a good idea to inspect the topology file in a text editor when using it to build a structure. The next part of the file is a long set of comments containing references to the papers and other sources of the parameters in the file:

! references
!
!PROTEINS
!
!MacKerell, A.D., Jr,. Feig, M., Brooks, C.L., III, Extending the
!treatment of backbone energetics in protein force fields: limitations
!of gas-phase quantum mechanics in reproducing protein conformational
!distributions in molecular dynamics simulations, Journal of
!Computational Chemistry, 25: 1400-1415, 2004.
!
!MacKerell, Jr., A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr., R.L.;
!Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.;
!Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.T.K.; Mattos,
!C.; Michnick, S.; Ngo, T.; Nguyen, D.T.; Prodhom, B.; Reiher, III,
!W.E.; Roux, B.; Schlenkrich, M.; Smith, J.C.; Stote, R.; Straub, J.;
!Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.  All-atom
!empirical potential for molecular modeling and dynamics Studies of
!proteins.  Journal of Physical Chemistry B, 1998, 102, 3586-3616.
!

The topology file must define the type, mass, and charge of every atom in every residue, so that a PSF file can be constructed. While the partial charges assigned to atoms of the same type vary between residues, their masses do not. Therefore, the mass of every atom type is declared once at the beginning of the file in a MASS statement. This statement also pairs an integer with each type name, which is used in CHARMM-formatted PSF files, but not in the X-PLOR-formatted PSF files used by NAMD. The type indices are unique but not necessarily consecutive. Notice in the following excerpt that there are many types of hydrogen and carbon atoms defined, but the atomic masses are the same:

MASS     1 H      1.00800 H ! polar H
MASS     2 HC     1.00800 H ! N-ter H
MASS     3 HA     1.00800 H ! nonpolar H
MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
MASS     5 HP     1.00800 H ! aromatic H
MASS     6 HB     1.00800 H ! backbone H
MASS     7 HR1    1.00800 H ! his he1, (+) his HG,HD2
MASS     8 HR2    1.00800 H ! (+) his HE1
MASS     9 HR3    1.00800 H ! neutral his HG, HD2
MASS    10 HS     1.00800 H ! thiol hydrogen
MASS    11 HE1    1.00800 H ! for alkene; RHC=CR
MASS    12 HE2    1.00800 H ! for alkene; H2C=CR
MASS    13 HA1    1.00800 H ! alkane, CH, new LJ params (see toppar_all22_prot_aliphatic_c27.str)
MASS    14 HA2    1.00800 H ! alkane, CH2, new LJ params (see toppar_all22_prot_aliphatic_c27.str)
MASS    15 HA3    1.00800 H ! alkane, CH3, new LJ params (see toppar_all22_prot_aliphatic_c27.str)
MASS    16 HF1    1.00800 H ! Aliphatic H on fluorinated C (see toppar_all22_prot_fluoro_alkanes.str)
MASS    17 HF2    1.00800 H ! Aliphatic H on fluorinated C (see toppar_all22_prot_fluoro_alkanes.str)
MASS    20 C     12.01100 C ! carbonyl C, peptide backbone
MASS    21 CA    12.01100 C ! aromatic C
MASS    22 CT1   12.01100 C ! aliphatic sp3 C for CH
MASS    23 CT2   12.01100 C ! aliphatic sp3 C for CH2
MASS    24 CT3   12.01100 C ! aliphatic sp3 C for CH3
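
As a rough illustration of how these statements can be read, the following Python sketch collects MASS lines into a table keyed by type name. The function name parse_mass_lines is hypothetical, the excerpt is a shortened copy of the lines above, and the parser assumes the simple whitespace-separated layout shown here rather than the full CHARMM syntax.

```python
def parse_mass_lines(text):
    """Return {type_name: (index, mass)} from MASS statements.

    Anything after '!' on a line is treated as a comment and ignored.
    """
    masses = {}
    for line in text.splitlines():
        fields = line.split("!", 1)[0].split()
        if len(fields) >= 4 and fields[0].upper() == "MASS":
            index, type_name, mass = int(fields[1]), fields[2], float(fields[3])
            masses[type_name] = (index, mass)
    return masses


excerpt = """\
MASS     1 H      1.00800 H ! polar H
MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
MASS    17 HF2    1.00800 H ! Aliphatic H on fluorinated C
MASS    20 C     12.01100 C ! carbonyl C, peptide backbone
MASS    22 CT1   12.01100 C ! aliphatic sp3 C for CH
"""

masses = parse_mass_lines(excerpt)
indices = sorted(index for index, _ in masses.values())

print(masses["CT1"])   # index and mass stored for type CT1
print(indices)         # unique, but with gaps (17 is followed by 20)
```

Keying the table by type name rather than by index matches how the residue definitions later in the file refer to atom types.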

When specifying the connectivity of a chain of residues in a protein, it is necessary to refer to atoms in the previous or succeeding residue. The CHARMM topology file declares the names of those atoms that will be referenced in adjoining residues, prefixed with - for the previous residue and + for the succeeding one:

DECL -CA
DECL -C
DECL -O
DECL +N
DECL +HN
DECL +CA

The first and last residues of a chain obviously have different connectivity from those in the center, since they have one fewer neighbor. This is handled by applying patch residues, normally referred to as patches, to the terminal residues. As will be seen, any residue definition can specify the patch to be applied when it is the first or last residue in a segment. However, a default set is declared for the entire file, as in the following where the default patch is NTER for the first residue of a segment and CTER for the last:

DEFA FIRS NTER LAST CTER

While the covalent bond connectivity between atoms must necessarily be provided by the topology file, enumerating all of the required angles and dihedrals would be tedious and error-prone, as well as enormously complicated since every combination of residues joined by a peptide bond would require a different set. Therefore, angles and dihedrals are automatically generated for every pair or triple of connected bonds when a segment is built. This autogeneration may be enabled or disabled on a per-segment basis as it should never be used on segments of water, but the default is defined in the topology file:

AUTO ANGLES DIHE
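
The idea behind autogeneration can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the algorithm used by CHARMM or psfgen: the function name autogenerate is hypothetical, atoms are identified by bare names, and every pair of bonds sharing an atom yields an angle while every chain of three bonds yields a dihedral.

```python
from collections import defaultdict


def autogenerate(bonds):
    """Return (angles, dihedrals) implied by a list of bonded atom-name pairs."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # One angle for every pair of bonds that share a central atom.
    angles = set()
    for center, bonded in neighbors.items():
        for a in bonded:
            for c in bonded:
                if a < c:
                    angles.add((a, center, c))

    # One dihedral for every chain a-b-c-d of three connected bonds.
    dihedrals = set()
    for b, c in bonds:
        for a in neighbors[b] - {c}:
            for d in neighbors[c] - {b}:
                if a != d:
                    # Store each dihedral in a single canonical direction.
                    dihedrals.add(min((a, b, c, d), (d, c, b, a)))
    return sorted(angles), sorted(dihedrals)


# A four-atom chain: two angles and one dihedral.
chain_angles, chain_dihedrals = autogenerate(
    [("N", "CA"), ("CA", "C"), ("C", "O")])

# A water molecule with an extra H1-H2 bond, as in the TIP3 residue of this file:
# the three-membered ring produces three angles instead of one.
water_angles, water_dihedrals = autogenerate(
    [("OH2", "H1"), ("OH2", "H2"), ("H1", "H2")])

print(chain_angles, chain_dihedrals)
print(water_angles, water_dihedrals)
```

The water case shows why autogeneration is disabled for water segments: the extra bond closes a ring, so a naive generator adds two unwanted H-H-O angles.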

We are now ready for the actual residue definitions, beginning with alanine, as shown below. A residue is indicated by the RESI statement with the residue name (ALA) and total charge (0.00). Next are listed all of the atoms in the residue in ATOM statements with the atom name (N, HN, CA), type (NH1, H, CT1), and partial charge (-0.47, 0.31, 0.07). The GROUP statements, dividing the atoms into integer-charge groups, are not used by NAMD and should not be confused with hydrogen groups, each a non-hydrogen atom and all hydrogens bonded to it, that NAMD uses to accelerate distance-testing for nonbonded calculations.

RESI ALA          0.00
GROUP
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !  HN-N
ATOM CA   CT1     0.07  !     |     HB1
ATOM HA   HB      0.09  !     |    /
GROUP                   !  HA-CA--CB-HB2
ATOM CB   CT3    -0.27  !     |    \
ATOM HB1  HA      0.09  !     |     HB3
ATOM HB2  HA      0.09  !   O=C
ATOM HB3  HA      0.09  !     |
GROUP                   !
ATOM C    C       0.51
ATOM O    O      -0.51
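
The integer-charge property of the groups is easy to check by hand, and the following Python sketch does the same arithmetic. The function name group_charges is hypothetical, and the parser assumes the plain GROUP/ATOM layout of the listing above (the ASCII-art comments are dropped along with everything else after "!").

```python
def group_charges(text):
    """Sum the ATOM partial charges within each GROUP of a residue definition."""
    groups = []
    for line in text.splitlines():
        fields = line.split("!", 1)[0].split()
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "GROUP":
            groups.append(0.0)
        elif keyword == "ATOM":
            if not groups:          # tolerate atoms listed before any GROUP
                groups.append(0.0)
            groups[-1] += float(fields[3])
    # Round away floating-point noise; adding 0.0 turns -0.0 into 0.0.
    return [round(total, 4) + 0.0 for total in groups]


ala = """\
RESI ALA          0.00
GROUP
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !  HN-N
ATOM CA   CT1     0.07
ATOM HA   HB      0.09
GROUP
ATOM CB   CT3    -0.27
ATOM HB1  HA      0.09
ATOM HB2  HA      0.09
ATOM HB3  HA      0.09
GROUP
ATOM C    C       0.51
ATOM O    O      -0.51
"""

charges = group_charges(ala)
print(charges, sum(charges))
```

Each of the three groups sums to zero, so the residue total agrees with the 0.00 given on the RESI line.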

The ALA residue continues by defining connectivity, with each BOND statement followed by a list of pairs of atoms to be connected with bonds. The DOUBLE statement is a synonym for BOND and does not affect the resulting PSF file. Observe that the atom C is bonded to +N, the N of the following residue. A bond between N and -C will be provided by the preceding residue. The order of bonds, or of the atoms within a bond, is not significant.

BOND CB CA  N  HN  N  CA
BOND C  CA  C  +N  CA HA  CB HB1  CB HB2  CB HB3
DOUBLE O  C
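
To make the pairing explicit, here is a small Python sketch that splits BOND and DOUBLE statements into atom-name pairs. The function name parse_bonds is hypothetical, and the sketch assumes the keywords are spelled out as shown above.

```python
def parse_bonds(text):
    """Collect atom-name pairs from BOND and DOUBLE statements."""
    bonds = []
    for line in text.splitlines():
        fields = line.split("!", 1)[0].split()
        if fields and fields[0].upper() in ("BOND", "DOUBLE"):
            names = fields[1:]
            if len(names) % 2:
                raise ValueError(f"unpaired atom name in: {line!r}")
            # Consecutive names form one bond each: (1st, 2nd), (3rd, 4th), ...
            bonds.extend(zip(names[0::2], names[1::2]))
    return bonds


ala_bonds = parse_bonds("""\
BOND CB CA  N  HN  N  CA
BOND C  CA  C  +N  CA HA  CB HB1  CB HB2  CB HB3
DOUBLE O  C
""")

# Bonds that reach into a neighboring residue carry a + or - prefix.
inter_residue = [b for b in ala_bonds if any(name[0] in "+-" for name in b)]

print(len(ala_bonds), inter_residue)
```

Of the ten bonds, only C to +N crosses a residue boundary, which is the peptide bond to the following residue.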

As noted above, the angle and dihedral terms will be autogenerated and are therefore not listed for this residue. The less common improper dihedrals (normally just called impropers), however, must be listed explicitly. In this case there are two impropers, which maintain the planarity of the peptide bonds. As with dihedrals, the order of atoms within an improper may be reversed. As shown below, impropers are specified by the IMPR statement followed by sets of four atoms, with the central atom to which the other three are bonded typically listed first.

IMPR N -C CA HN  C CA +N O

The CMAP correction terms must also be listed explicitly, since they apply only to the backbone dihedrals. Each term is given by the two sets of four atoms following the CMAP statement, the first defining phi and the second psi.

CMAP -C  N  CA  C   N  CA  C  +N

Explicit hydrogen bond terms are no longer present in the CHARMM force field and are therefore not calculated by NAMD. The DONOR and ACCEPTOR statements, shown below, specify pairs of atoms eligible to form hydrogen bonds. The psfgen module in VMD ignores these statements and does not incorporate hydrogen bonding information into the PSF file.

DONOR HN N
ACCEPTOR O C

Finally in the residue definition are the internal coordinate IC statements. For each set of four atoms 1 2 3 4, the IC specifies in order the bond length 1-2, the angle 1-2-3, the dihedral 1-2-3-4, the angle 2-3-4, and the bond length 3-4. With this set of data, the position of atom 1 may be determined based on the positions of atoms 2-4, and the position of atom 4 may be determined from the positions of atoms 1-3, allowing the recursive generation of coordinates for all atoms in the structure based on a three-atom seed. Improper IC statements are indicated by a * preceding the third atom, the atom to which the other three are bonded, as in 1 2 *3 4. The order of atoms in an IC statement is different from that of an IMPR statement, and values provide the length 1-3, the angle 1-3-2, the dihedral 1-2-3-4, the angle 2-3-4, and the length 3-4.

IC -C   CA   *N   HN    1.3551 126.4900  180.0000 115.4200  0.9996
IC -C   N    CA   C     1.3551 126.4900  180.0000 114.4400  1.5390
IC N    CA   C    +N    1.4592 114.4400  180.0000 116.8400  1.3558
IC +N   CA   *C   O     1.3558 116.8400  180.0000 122.5200  1.2297
IC CA   C    +N   +CA   1.5390 116.8400  180.0000 126.7700  1.4613
IC N    C    *CA  CB    1.4592 114.4400  123.2300 111.0900  1.5461
IC N    C    *CA  HA    1.4592 114.4400 -120.4500 106.3900  1.0840
IC C    CA   CB   HB1   1.5390 111.0900  177.2500 109.6000  1.1109
IC HB1  CA   *CB  HB2   1.1109 109.6000  119.1300 111.0500  1.1119
IC HB1  CA   *CB  HB3   1.1109 109.6000 -119.5800 111.6100  1.1114

It was mentioned above that special treatment is required for the first or last residue in a chain, and that this is implemented as a patch. The following is the default first-residue patch for proteins, NTER. The syntax is almost identical to that of a normal residue, but an ATOM statement may either add a new atom (HT1, HT2, HT3) or modify the type and charge of an existing atom of the given name (N, CA, HA). The DELETE statement operates as one would expect, removing the atom (HN) and any bonds that include it.

PRES NTER         1.00 ! standard N-terminus
GROUP                  ! use in generate statement
ATOM N    NH3    -0.30 !
ATOM HT1  HC      0.33 !         HT1
ATOM HT2  HC      0.33 !     (+)/
ATOM HT3  HC      0.33 ! --CA--N--HT2
ATOM CA   CT1     0.21 !   |    \
ATOM HA   HB      0.10 !   HA    HT3
DELETE ATOM HN
BOND HT1 N HT2 N HT3 N
DONOR HT1 N
DONOR HT2 N
DONOR HT3 N
IC HT1  N    CA   C     0.0000  0.0000  180.0000  0.0000  0.0000
IC HT2  CA   *N   HT1   0.0000  0.0000  120.0000  0.0000  0.0000
IC HT3  CA   *N   HT2   0.0000  0.0000  120.0000  0.0000  0.0000

The comment "use in generate statement" indicates that the NTER patch is used during segment generation, and is applied before angles and dihedrals are generated. An example of the other type of patch, those applied after segment generation, is the LINK patch given below. The patch statement requires, in this case, two residues as arguments, the first being the last residue of a segment generated without the CTER patch and the second being the first residue of a segment generated without the NTER patch (the comments have the N and C termini reversed). The numbers 1 and 2 preceding atom names indicate which argument residue the named atom belongs to. Since angles and dihedrals will not be regenerated, they are enumerated.

PRES LINK         0.00 ! linkage for IMAGES or for joining segments
                       ! 1 refers to previous (N terminal)
                       ! 2 refers to next (C terminal)
                       ! use in a patch statement
                       ! follow with AUTOgenerate ANGLes DIHEdrals command
BOND 1C 2N   
!the need for the explicit specification of angles and dihedrals in
!patches linking images has not been tested
!ANGLE 1C 2N 2CA  1CA 1C 2N   
!ANGLE 1O 1C 2N   1C  2N 2HN   
!DIHE 1C  2N  2CA 2C   1C  2N  2CA 2HA  1C  2N  2CA 2CB   
!DIHE 1HA 1CA 1C  2N   1N  1CA 1C  2N   1CB 1CA 1C  2N   
!DIHE 1CA 1C  2N  2HN  1CA 1C  2N  2CA   
!DIHE 1O  1C  2N  2HN  1O  1C  2N  2CA   
IMPR 2N 1C 2CA 2HN  1C 1CA 2N 1O   
IC 1N   1CA  1C   2N    0.0000  0.0000  180.0000  0.0000  0.0000
IC 2N   1CA  *1C  1O    0.0000  0.0000  180.0000  0.0000  0.0000
IC 1CA  1C   2N   2CA   0.0000  0.0000  180.0000  0.0000  0.0000
IC 1C   2N   2CA  2C    0.0000  0.0000  180.0000  0.0000  0.0000
IC 1C   2CA  *2N  2HN   0.0000  0.0000  180.0000  0.0000  0.0000

These types of patches are used to alter protonation states (ASPP, GLUP, HS2), create disulphide bonds (DISU), attach HEME groups (PHEM) and their ligands (PLO2, PLIG), and even remove unwanted autogenerated angles (FHEM).

The following is the complete residue definition for GLY, the smallest of the amino acids. Compare GLY to the ALA residue dissected above.

RESI GLY          0.00
GROUP
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !     N-H
ATOM CA   CT2    -0.02  !     |
ATOM HA1  HB      0.09  !     |
ATOM HA2  HB      0.09  ! HA1-CA-HA2
GROUP                   !     |
ATOM C    C       0.51  !     |
ATOM O    O      -0.51  !     C=O
                        !     |
BOND N HN  N  CA  C CA
BOND C +N  CA HA1 CA HA2
DOUBLE O  C
IMPR N -C  CA HN  C CA   +N O
CMAP -C  N  CA  C   N  CA  C  +N
DONOR HN N
ACCEPTOR O C
IC -C   CA   *N   HN    1.3475 122.8200  180.0000 115.6200  0.9992
IC -C   N    CA   C     1.3475 122.8200  180.0000 108.9400  1.4971
IC N    CA   C    +N    1.4553 108.9400  180.0000 117.6000  1.3479
IC +N   CA   *C   O     1.3479 117.6000  180.0000 120.8500  1.2289
IC CA   C    +N   +CA   1.4971 117.6000  180.0000 124.0800  1.4560
IC N    C    *CA  HA1   1.4553 108.9400  117.8600 108.0300  1.0814
IC N    C    *CA  HA2   1.4553 108.9400 -118.1200 107.9500  1.0817
PATCHING FIRS GLYP

Unlike ALA and most protein residues, in which CA is bonded to HA and CB, in GLY it is bonded to a pair of hydrogens, HA1 and HA2, and has type CT2 rather than CT1. The default NTER patch, which assigns type CT1 to CA and modifies an atom named HA, therefore cannot be applied to GLY, and the PATCHING statement is used to change the default first-residue patch for GLY to GLYP. Similarly, the LINK patch above cannot be used for GLY residues, so the additional patches LIG1, LIG2, and LIG3 are provided to link GLY to non-GLY, non-GLY to GLY, and GLY to GLY.
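
The interplay between the file-wide DEFA statement and a per-residue PATCHING statement amounts to a lookup with a fallback. The Python sketch below models it with plain dictionaries; the function name terminal_patches and the data layout are hypothetical, and only the overrides that appear in the listings on this page are included.

```python
def terminal_patches(defaults, overrides, first_residue, last_residue):
    """Pick the first- and last-residue patches for a segment.

    A residue's own PATCHING entry wins; otherwise the DEFA default is used.
    """
    first = overrides.get(first_residue, {}).get("FIRS", defaults["FIRS"])
    last = overrides.get(last_residue, {}).get("LAST", defaults["LAST"])
    return first, last


# DEFA FIRS NTER LAST CTER
defaults = {"FIRS": "NTER", "LAST": "CTER"}

# PATCHING statements from the residue definitions shown on this page.
overrides = {
    "GLY": {"FIRS": "GLYP"},
    "TIP3": {"FIRS": "NONE", "LAST": "NONE"},
}

print(terminal_patches(defaults, overrides, "ALA", "GLY"))
print(terminal_patches(defaults, overrides, "GLY", "ALA"))
print(terminal_patches(defaults, overrides, "TIP3", "TIP3"))
```

A segment beginning with GLY picks up GLYP, while one merely ending in GLY still uses the default CTER because GLY overrides only the first-residue patch.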

Water is also defined as a residue, as shown below, but some care is required for its use. A third bond, not required by NAMD, is included to allow CHARMM to make the water molecule rigid. This ring structure would confuse angle and dihedral autogeneration, so segments of water must be generated with autogeneration disabled, and therefore an explicit angle is also included. If this third bond is removed from the water topology, then autogeneration must still be disabled to avoid duplicating the angle, unless the angle is also removed.

RESI TIP3         0.000 ! tip3p water model, generate using noangle nodihedral
GROUP
ATOM OH2  OT     -0.834
ATOM H1   HT      0.417
ATOM H2   HT      0.417
BOND OH2 H1 OH2 H2 H1 H2    ! the last bond is needed for shake
ANGLE H1 OH2 H2             ! required
ACCEPTOR OH2
PATCHING FIRS NONE LAST NONE

The residue definitions for the ions below are exceedingly simple.

RESI SOD          1.00 ! Sodium Ion
GROUP
ATOM SOD  SOD     1.00
PATCHING FIRST NONE LAST NONE

RESI CLA         -1.00 ! Chloride Anion
GROUP
ATOM CLA  CLA    -1.00
PATCHING FIRST NONE LAST NONE

We have discussed only those parts of the topology files associated with proteins and solvent. There is much additional information regarding proteins, not to mention lipids and nucleic acids, included in the comments in the topology files themselves.


namd@ks.uiuc.edu