Paratool User's Guide

Introduction

Paratool provides a graphical interface for force field parametrizations of molecules that are not contained in your force field. It is designed to generate CHARMM compliant parameters, while it is also possible to adhere to the AMBER protocol. Note that the recommended tool for AMBER parameters is Antechamber, a collection of commandline programs that can (among other things) find missing AMBER force field parameters .

The plugin helps you to generate the molecule or the moleular fragment that should be parametrized and to set up the necessary quantumchemical calculations (you'll need Gaussian, later versions will also support GAMESS). Paratool reads the logfiles of the QM simulations, computes force field parameters for bonds, angles dihedrals and impropers by transforming the Hessian into internal coordinates and converting all values into units based on kcal/mol, Å and degrees. You can assign atom types and the corresponding VDW parameters from a list of already existing types or generates new types (all chemical elements H - Rn supported). The charges can be determined using restricted electrostatic potential fitting RESP (AMBER style) or CHARMM style charges using their supramolecular approach which requires another Gaussian calculation.

Paratool generates input for all necessary external programs collects and compiles all the necessary data, organizes and displays them neatly in lists and projects them onto your molecule in VMD. Finally it will write the topology and parameter files you'll need to build the molecule using psfgen and run your simulation in your favourite simulation package, e.g. NAMD.

Outline of the Parametrization Scheme

In the following it will be assumed that you are familiar with the parametrization scheme of your favourite force field. It is highly recommended that you read the original publications [1-4]. You can refresh your memory about the differences in the general parametrization scheme in CHARMM and AMBER here. Another worthwile reading is [5] and the tutorials found here and here. An absolute must-read is Alex Mackerell's parameter development tutorial!

Parametrizing a novel component requires the following steps which will be explained in detail further below: First a molecular model compound containing unparametrized parts has to be defined. Then the atom type and its according VDW parameters have to be chosen for each atom from a list of existing atom types of the selected force field. Further, all transferable parameters from the existing force field have to be assigned to the model. If missing parameters remain then these will be determined de novo. In that case the model compound is the starting point for a quantumchemical geometry optimization. A QM frequency calculation based on the optimized geometry yields the Hessian matrix which can be transformed into the set of internal coordinates describing the bonded interactions. The diagonal elements of this internal Hessian denote the force constants related to these coordinates. The basis for the charges is obtained from a QM single point calculation from which Mulliken charges, natural population analysis (NPA) charges [7] and an electrostatic potential (ESP) charge fit [1,8] can be extracted. These raw data can subsequently be used to compute the CHARMM type charges according to the supramolecular approach or to setup a restricted electrostatic potential fit (RESP). Subsequently, the parameters should be refined in order to account for the implicit double accounting of electrostatics in 1-4 interactions. Finally the new parameters must be tested in a molecular dynamics simulation.

Limitations

Force fields that are desiged for a wide range of purposes covering many different molecular entities like CHARMM and AMBER must be parametrized in a very thorough manner and with great care following a coherent protocol. For the parametrization of additional compounds one might deviate from these protocols but one should always keep in mind that the quality of the results can depend on the consistency with the other force field parameters. Thus a few important aspects of the parametrization strategy of CHARMM and AMBER will be metioned. For a complete discussion we refer the reader to the original publications [1-4].

The current version of Paratool only considers quantum chemical target data such as equilibrium geometry, vibrational frequencies, electrostatic field and dipole moment. Nevertheless, you can manually replace the quantum chemical equilibrium geometries and force constants for bonds, angles, dihedrals and impropers by experimental values.

Paratool does not yet implement the aforementioned automatic selfconsistent iterative fitting against these data but you can of course manually loop through several cycles of refinement and hand-tune the data any time. Though in many cases the first iteration will already yield satisfactory results.

How to parametrize a new molecule in PARATOOL

The very first thing you should do is visiting Alex Mackerell's website and getting the latest CHARMM force field. Inspect the topology files and see if your molecule has been parametrized already. Paratool offers default top/par files top_all27_prot_lipid_na.inp and par_all27_prot_lipid_na.inp containing amino acids, nucleic acids, lipids and a few other common compounds. But there's more to find in the CHARMM files. Especially take a look at the file for the model compounds top_all22_model.inp.

Defining a model compound

While you're inspecting the molecules of the existing force field you'll most likely notice that some moieties of your molecule are the same as in one of the already parametrized molecules. We'll make use of that fact. The CHARMM and AMBER force fields are both designed for an additive approach. They consist of molecular building blocks, so called model compounds which can be combined into form bigger units. I.e. the parameters are meant to be transferable to similar molecules. For instance take a look at aspartate:

RESI ASP         -1.00
GROUP
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !  HN-N
ATOM CA   CT1     0.07  !     |   HB1   OD1
ATOM HA   HB      0.09  !     |   |    //
GROUP                   !  HA-CA--CB--CG
ATOM CB   CT2    -0.28  !     |   |    \
ATOM HB1  HA      0.09  !     |   HB2   OD2(-)
ATOM HB2  HA      0.09  !   O=C
ATOM CG   CC      0.62  !     |
ATOM OD1  OC     -0.76
ATOM OD2  OC     -0.76
GROUP
ATOM C    C       0.51
ATOM O    O      -0.51

It is stitched together from two smaller model compounds acetate and glycine (found in top_all22_model.inp and top_all27_prot_lipid.inp) where H2 and HA2 are removed and C1 and CA are connected by a bond (C1 then become CB in the new compound etc.) The charges of the two removed hydrogens were simply added to their mother atoms.

RESI ACET       -1.00 ! acetate, K. Kuczera
GROUP
ATOM  C1  CT3   -0.37 !
ATOM  C2  CC     0.62 !     H1    O1 (-)
ATOM  H1  HA     0.09 !     |    /
ATOM  H2  HA     0.09 ! H2--C1--C2
ATOM  H3  HA     0.09 !     |    \\
ATOM  O1  OC    -0.76 !     H3    O2
ATOM  O2  OC    -0.76 !

RESI GLY          0.00
GROUP   
ATOM N    NH1    -0.47  !     |
ATOM HN   H       0.31  !     N-H
ATOM CA   CT2    -0.02  !     |  
ATOM HA1  HB      0.09  !     |  
ATOM HA2  HB      0.09  ! HA1-CA-HA2
GROUP                   !     |  
ATOM C    C       0.51  !     |  
ATOM O    O      -0.51  !     C=O
                        !     |

But where do the parameters for the peptide bonds that are formed when the multiple residues are joined in a polypeptide come from? They were obtained from the model compound N-methylacetamide. For CR the peptide bond would be in N-terminal position for CL it would be C-terminal.

RESI NMA         0.00 ! N-methylacetamide, Louis Kuchnir
GROUP
ATOM CL   CT3   -0.27           !      HR2          
ATOM HL1  HA     0.09           !       |       
ATOM HL2  HA     0.09           !  HR1--CR--HR3 
ATOM HL3  HA     0.09           !	|   
ATOM C    C      0.51           !	|   
ATOM O    O     -0.51           !       N-H
ATOM N    NH1   -0.47           !       |      
ATOM H    H      0.31           !     O=C   	   
ATOM CR   CT3   -0.11           !       |		   
ATOM HR1  HA     0.09           !       |		   
ATOM HR2  HA     0.09           !  HL1--CL--HL3
ATOM HR3  HA     0.09           !       |
                                !      HL2

If one combines two NMA molecules the in the same manner we merged GLY and ACET we obtain a model compound with two peptice bonds. Clipping out the middle amino acid at the peptide bonds yields GLY as seen above. Try to split your molecule into fragments that are parametrized already! This saves a lot of time, you might end up with a much smaller and simpler molecule to parametrize which you then combine with the other pieces. This is also makes your parameters more CHARMM compliant. Unfortunately Paratool does not yet support automatic combining of model compounds so you'll have to do that last step manually. One thing has to be kept in mind when selecting the model compounds. When they are combined then parameters spanning the linkage like the link bond and the involved angles have to be present, too. However in most cases these links are something simple like a bond between two unpolar sp3 hybridized carbons for which numerous templates exist.

Preparing a model compound (base molecule)

To begin a new Paratool project you can simply start Paratool from the Tkcon command line or from the Extensions Menu in VMD's Main window. In the GUI that pops up (see fig. 1), you'll see entries for IDs and file names for 4 different molecules. The ID is the same as in the VMD Main window. The molecular model compound to which all further work will be related and into which all data will be displayed is the "base molecule". You can load it using the File menu. In the file open dialog you can choose between different file types. PDB is the default. XBGF is an internal file format derived from "biograf". It is similar to the PDB format, but is able to store additional info such as atom types, VDW parameters and partial charges.

Throughout the parametrization process a quantum mechanically (QM) optimized structure of the base molecule and a QM single point calculation will be generated. They'll populate the last two entries.

The parent molecule is the molecule embedding the base molecule, e.g. the protein containing a nonstandard aminoacid or an organic substrate. The parent molecule is actually only needed if there are covalent bonds or coordination complex bonds (e.g. metal complexes or iron-sulfur clusters) exist between the unparametrized part and the rest of the molecule. In this case the size of the environment which is added to the base molecule can be controlled interactively using the entry labelled with "Selection of atoms from parent molecule:" in the Paratool Main window. The parent molecule is set automatically if you start Paratool using AutoPSF.

While it is probably a good idea to prepare the model compound beforehannd in your favourite molecule editor you can also edit your base molecule on the fly using VMD's built in editor Molefacture. You can for example cap open valences where covalent bonds were chopped or delete atom groups that you want to parametrize separately. I'll repeat myself here: Generally you should try to split larger molecules into smaller fragments and parametrize these separately. You can save a lot of CPU time during the quantumchemical calculations and if the fragments are chosen properly you can generate transferable parameters for molecular building blocks. The CHARMM and AMBER force fields are both designed for this additive approach. These model compounds can finally be combined to real molecules again.

Note that whenever you change the selection of the base molecule or you modify it using Molefacture the previous base molecule will be deleted and a new one generated and loaded. Thus the molecule ID will not stay the same.

Editing atom properties

Now that your model compound has been determined, you can already set many atom based properties. Use "Edit->Atom properties" to open the appropriate interface. You'll see a list of properties for each atom from which you can select single or multiple atoms to edit them using the entry fields at the bottom of the window. There are three ways of selecting atoms. 1) You can directly click on atoms in the OpenGL window, the atom becomes marked with an orange sphere and seleccted in the listbox. Holding the shift key while picking adds the atom to the selection rather that replacing it. 2) You can select atoms from the list using the mouse or the arrow keys. Dragging or shift selects contiguous groups from the list, holding the control key allows to select arbitrary atoms. 3) You can use VMD's atom selection language.

The first property you want to set is the residue name (Rname). Select all atoms and type a name in the adequate field, confirming with "Submit". Make sure that the residue name is unique among the topology files you are planning to use. The residue number (Resid) and the segment ID (Segid) are only relevant for transition metal complexes and iron-sulfur clusters but they must be identical for all atoms of a model compound. If you are unhappy with the default atom names from you can change them but each name must be unique in a residue. Also note that atom names and the aforementioned strings should be in capital letters.

Assign atom types and VDW parameters

Next you have to choose the atom types. They describe a combination of periodic element and chemical environment. Thus equivalent atoms such as hydrogens in methyl groups of carboxylate oxygen have the same type. In simulations the parameters for all bonded and VDW interactions are looked up by the atom types. The user can choose types and VDW parameters from a list of atom types of the same chemical element existing in the current force field, i.e. that occur in the loaded parameter files. You'll get there pressing the "Choose type" button in the "Edit atom properties" window. It is up to you to decide which given atom type matches the chemical properties of the selected atom best. If you want these VDW params but need a different type, you can simply rename it. For elements that are not contained in the force field there are at the bottom of the list atom types from the Universal Force Field (UFF) [6]. This asserts that there are VDW params for each chemical element from H-Rn. These will also be chosen when you autoassign VDW parameters from the "Edit" menu. Here's a simple guideline for the assignments: Try to find an equivalent existing type of the same chemical element, Paratool lets you pick that from a list. Many of the types for a certain element have the same VDW params, e.g. CD and CC.

C      0.000000  -0.110000     2.000000 ! ALLOW   PEP POL ARO
                ! NMA pure solvent, adm jr., 3/3/93
CA     0.000000  -0.070000     1.992400 ! ALLOW   ARO
                ! benzene (JES)
CC     0.000000  -0.070000     2.000000 ! ALLOW   PEP POL ARO
                ! adm jr. 3/3/92, acetic acid heat of solvation
CD     0.000000  -0.070000     2.000000 ! ALLOW  POL
                ! adm jr. 3/19/92, acetate a.i. and dH of solvation

The reason why there are two types is that they are used in different contexts so that you can for instance distinguish dihedrals including CC or CD. Which type to use has to be decided using chemical insight. Just try to find the most similar atomtype by comparing structures where it appears. Factors influencing this similarity are for example hybridization state, polarization, same neighboring atoms. Always try to use an existing atom type, unless you end up with bonded parameters that are not equivalent enough. In the latter case you just change the type name, but keeping the VDW params. In case there is no type for the atom's chemical element you can use VDW parameters from the Universal Force Field (they are at the end of the list). You can of course change the predefined typename if you want. Assinging types implies the automatic assignment of bonded parameters for which all types are known and for which a matching entry could be found in the currently loaded parameter files.

Assigning bonded parameters

Choosing internal coordinates

Before bonded parameters can be assigned it has to be specified for which internal coordinates, i.e. bonds, angles, dihedrals and impropers bonded interactions must be determined. For each of these coordinates parameters for bonded interaction will either be taken from existing model compounds, or, in case no equivalent exists, be derived from the Hessian matrix (see below). "Edit->Intenal coordinates" opens a window containing a (initially empty) list of defined internal coordinates and a plethora of buttons. A good starting point is to autogenerate the internal coordinates. In this procedure all covalent bonds of a molecule (as displayed in VMD) are defined as bonds. Then all possible angles and dihedrals that can be constructed from the given bonds are generated. In many cases this is already sufficient but you can add and delete arbitrary coordinates. You can for example add impropers to stabilize planar geometries. More info about the choice of internal coordinates can be found here. Note: After you have generated internal coordinates you can already write valid topology files using the accordant button in the main window.

Finding equivalent parameters

As it was mentioned above already paratool tries to find known parameters automatically based on the type names. Assume you assign the type known CT3 to an atom then the according VDW params are already set automatically. But if you then assign the also known type CT2 to a neighboring atom then the CT3-CT2 parameters can be determind and are assigned to that bond. Similar things happen with angles and dihedrals if you continue assigning known atom types. By the way, in case the given atom name is found in the topology of the given resname then we assume that we are using this RESI as a template and the correct type, VDW params and even the charge are assigned.

We can make even more use of the existing force field through parametrization by analogy. In other words we are utilizing parameters of parts that have a similar chemical environment. You must decide what is chemically equivalent using your chemical insight. Among the properties you should compare are periodic elementss of the involved atoms, charges, bond orders and polarity of bonds. This technique has been applied a lot within the force field already. This is the reason why you find so many identical parameter entries. Here's an example:

CT1  CC    200.000     1.5220 ! ALLOW   POL
CT2  CC    200.000     1.5220 ! ALLOW   POL
CT3  CC    200.000     1.5220 ! ALLOW   POL

Only the type names of the involved atoms differ but the chemical nature of this bond is considered the same. The different type names were needed to distinguish other properties like angles or dihedrals.

QM geometry optimization

For the calculation of partial charges and the parameters for bonded interactions we need the QM optimized structure of the base molecule. In the current version Paratool supports the autogeneration of input files for Gaussian but later versions will also provide an interface to GAMESS. (For this task Paratool uses QMtool which you can also use stand-alone to setup and analyze QM simulations.)

You can obtain the respective interface through the "File->Setup QM optimization". The meaning of the buttons can be found here. In most cases the default values provided by Paratool should be fine, but it is important that you check the total charge and the multiplicity since these are difficult or impossible to guess automatically. a good choice for the simulation method and the basis set are RHF/6-31G* and UB3LYP/6-31+G* for metal containing molecules.

After the Gausian setup file has been written you have to run it yourself. When you load the resulting logfile, the last frame denoting the optimized structure will be appended to the base molecule which makes it possible to easily switch the view between the original and the optimized geometry. Additionally, a new molecule containing all optimization steps, will be loaded in VMD. It will not be shown automatically but you can enable it by hand in the VMD Main window.

Hessian and atomic charges from QM single point calculation

Based on the optimized geometry another QM calculation has to be performed. (In future versions these two steps might be merged). Choose "Hessian->Setup QM single point calc. (Hessian, charges)" to open the same form as for the QM optimization but now with different default values. Again, after writing the setup file you must run the simulation manually and subsequently load the logfile using the appropriate entry in the "File" menu.

...

Assigning charges

Suppose you want charmm style charges: First you must use copy base charges (e.g. Mulliken) into the charge field in the "Edit atom properties" window. These charges will be scaled such that the force field interaction with TIP3 water matches the QM interaction energy.

CHARMM style charges

Unfortunately Paratool does not support CHARMM type charge development yet but the following gives an idea what you have to do. For more info refer to Alex Mackerell's parameter optimization tutorial In the CHARMM protocol the charges for the polar groups are chosen such that the interaction energy in the force field between the molecule in question and a TIP3 water molecule is the same as the interaction energy determined by quantum mechanical calculations (HF/6-31G*). The interaction energy is determined by subtracting the energy of a water molecule and the energy of the system from the energy of the supramolecular complex:

  E(system+water)
- E(system)
- E(water)
----------------
= E(interaction)

Before comparision the QM interaction energy should be scaled with the empirical factor of 1.16. At the same time the force field based optimal distance should be 0.2A shorter that the HF/6-31G* value. The hydration sites are all polar subgroups of the model compound. One typically sets the charges of the direct interaction partner and its mother atom(s). Further all nonpolar hydrogens should obtain a charge of 0.09e. If your model compound is designed properly there will be carbons to use to "neutralize" the entire charge; this could be aliphatic or aromatic carbons that are not involved in significant hydrogen bonding.

AMBER style charges (RESP)

Charges for the AMBER force field are based on the restricted electrostatic potential fit (RESP) method. The charges are fitted in two stages against the molecular electrostatic potential (MEP) that was determined before in the QM single point calculation: 1) Fit to the potential is carried out with weak restraints (weight=0.0005au) but without forced symmetry on groups with equivalent nonpolar hygrogens e.g. methyl groups. The target charge of the restraint is normally zero, in order to counteract the tendency of ESP to produce very high charge values on buried atoms. No restraints on hydrogens because they are all well solvent exposed. Equivalent atoms other than these nonpolar hydrogens are constrained to have the same charge. 2) Refit of only those nonpolar hydrogen bearing groups (e.g. methyl, methylene, methine) but now using forced symmetry and strong restraints (weight=0.001au). Charges of all other atoms (e.g. polar groups) are fixed in the second state.

Parameter refinement

Another problem for automatic parametrization procedures is the the double accounting of electrostatics, especially in 1-4 interactions. It is explicitly modeled in the Coulomb term but it's is also implicitely included in the bonded terms since the internal Hessian is an equilibrium approximation of the totality of all interactions. In order to account for this problem Paratool offers a method to correct the bonded terms. The difficulty here is that the corresponding terms are often mutually dependent. Paratool computes an effective target potential of each internal coordinate for the refinement process. This means that not all parameters have to be fitted simultaneously which would not be practicable but only linear dependent subgroups.

To be continued...

References

[1] Besler, B. H., K. M. Merz, and P. A. Kollman. 1990. Atomic charges derived from semiempirical methods. J. Comp. Chem. 11:431-439.

[2] Cornell, W. D., P. Cieplak, C. I. Bayly, and P. A. Kollman. 1993. Application of RESP charges to calculate conformational energies, hydrogen bond energies, and free energies of solvation. J. Am. Chem. Soc. 115:9620-9631.

[3] MacKerell Jr., A. D., D. Bashford, M. Bellott, R. L. Dunbrack Jr., J. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, I. W. E. Reiher, B. Roux, M. Schlenkrich, J. Smith, R. Stote, J. Straub, M.Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. 1998. All-hydrogen empirical potential for molecular modeling and dynamics studies of proteins using the CHARMM22 force field. J. Phys. Chem. B 102:3586-3616.

[4] MacKerell Jr., A. D., J. Wiorkiewicz-Kuczera and M. Karplus 1995 An All-Atom Empirical Energy Function for the Simulation of Nucleic Acids J. Am. Chem, Soc. 117:11946-11975

[5] Mackerell Jr., A. D. 2004. Empirical Force Fields for Biological Macromolecules: Overview and Issues J. Comp. Chem. 25:1584--1604

[6] Rappé, A. K., C. J. Casewit, K. S. Colwell, W. A. Goddard III, and W. M. Skiff. 1992. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114:10024-10035.

[7] Reed, A., R. B. Weinstock, and F. Weinhold. 1985. Natural population analysis. J. Chem. Phys. 83:735-746.

[8] Singh, U. C. and P. A. Kollman. 1984. An approach to computing electrostatic charges for molecules. J. Comp. Chem. 5:129.