Large-scale molecular dynamics (MD) simulations produce an immense quantity of data. A realistic study of a medium-sized protein requires determining the positions of at least 10,000 atoms every 10^-15 s (one femtosecond). Principal Component Analysis is a standard mathematical tool for detecting correlations in large data sets. We are investigating how best to use this technique to automatically extract information from a molecular dynamics simulation. This is helpful in analyzing the motions of flexible regions in proteins. It can also be used to detect ill-equilibrated regions of a protein.
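
As a rough illustration of the idea, the following minimal sketch applies Principal Component Analysis to a trajectory stored as an array of atomic coordinates. It is not the analysis code used in this project: the function name trajectory_pca, the array layout, and the synthetic test trajectory are assumptions made for the example, and the frames are assumed to have been superimposed on a reference structure beforehand.

```python
import numpy as np

def trajectory_pca(coords):
    """PCA of a trajectory given as an array of shape (n_frames, n_atoms, 3).

    Assumption: the frames are already superimposed on a reference
    structure, so overall translation and rotation have been removed.
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)       # one row per frame, 3N columns
    mean = x.mean(axis=0)                  # average structure
    dx = x - mean                          # fluctuations about the average
    # The SVD of the fluctuation matrix gives the eigenvectors of the
    # covariance matrix without building the 3N x 3N matrix explicitly.
    _, s, vt = np.linalg.svd(dx, full_matrices=False)
    variances = s ** 2 / (n_frames - 1)    # eigenvalues of the covariance
    return mean, variances, vt             # rows of vt are the components

# Synthetic trajectory (hypothetical data): one collective oscillation
# along a fixed direction, plus small uncorrelated atomic noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 5
direction = rng.normal(size=n_atoms * 3)
direction /= np.linalg.norm(direction)
amplitude = 2.0 * np.sin(np.linspace(0.0, 6.0 * np.pi, n_frames))
noise = 0.05 * rng.normal(size=(n_frames, n_atoms * 3))
traj = (amplitude[:, None] * direction + noise).reshape(n_frames, n_atoms, 3)

mean, variances, modes = trajectory_pca(traj)
fraction = variances[0] / variances.sum()  # variance captured by mode 1
overlap = abs(modes[0] @ direction)        # alignment with the true motion
print(f"first mode: {fraction:.1%} of variance, overlap {overlap:.3f}")
```

In this toy case the first principal component should recover the imposed collective motion and account for most of the variance; real trajectories spread the variance over many more components.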

Most MD studies are one-shot experiments. MD simulations usually require months of computation and, most of the time, it is not practical to repeat them. It is therefore not trivial to distinguish systematic changes in the structure of a protein from artifacts that would not reproduce if the same study were repeated many times. Principal Component Analysis, combined with physical models of protein motion, can be used to define objective criteria that discriminate relevant conformational changes in a protein from the background of atomic fluctuations.

Principal components can be used to compare the motions of two MD trajectories.
Systematic displacements can then be identified.
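
One possible way to make such a comparison concrete is sketched below. It uses the root-mean-square inner product (RMSIP) between the leading principal components of two trajectories, a commonly used subspace-overlap measure that is swapped in here for illustration and is not necessarily the criterion used in this work. The function names, the synthetic trajectories, and the choice of two components are assumptions.

```python
import numpy as np

def principal_components(x, k):
    """First k principal components (rows) of an (n_frames, 3N) matrix."""
    dx = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(dx, full_matrices=False)
    return vt[:k]

def rmsip(modes_a, modes_b):
    """Root-mean-square inner product of two sets of orthonormal modes.

    A value of 1 means both sets span the same subspace; 0 means the
    subspaces are orthogonal.
    """
    k = modes_a.shape[0]
    return np.sqrt(np.sum((modes_a @ modes_b.T) ** 2) / k)

# Hypothetical synthetic data: random orthonormal directions in 3N = 30.
rng = np.random.default_rng(1)
n_frames, dim, k = 300, 30, 2
basis = np.linalg.qr(rng.normal(size=(dim, dim)))[0].T  # orthonormal rows

def synthetic(directions, shift):
    """Two collective motions along `directions`, plus noise and an offset."""
    amps = rng.normal(scale=[3.0, 2.0], size=(n_frames, 2))
    noise = 0.1 * rng.normal(size=(n_frames, dim))
    return amps @ directions + noise + shift

shift = 1.5 * basis[5]                    # systematic displacement
traj_a = synthetic(basis[:2], 0.0)        # reference trajectory
traj_b = synthetic(basis[:2], shift)      # same motions, displaced average
traj_c = synthetic(basis[2:4], 0.0)       # different collective motions

same = rmsip(principal_components(traj_a, k), principal_components(traj_b, k))
different = rmsip(principal_components(traj_a, k), principal_components(traj_c, k))
# Displacement of the average structure, projected on the shift direction.
displacement = (traj_b.mean(axis=0) - traj_a.mean(axis=0)) @ basis[5]
print(f"RMSIP same: {same:.2f}, different: {different:.2f}, "
      f"displacement: {displacement:.2f}")
```

Here the two trajectories that share collective motions give an RMSIP close to 1 even though their average structures differ, and the difference of the averages exposes the systematic displacement separately from the fluctuations.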

The shortness of MD simulations compared with the characteristic relaxation times of proteins imposes, however, severe constraints on the information that can be extracted from a Principal Component Analysis. For example, it cannot be used to extrapolate motions detected in a given simulation to longer time scales. We are working to clarify these issues and to define exactly what can be learned from MD simulations through Principal Component Analysis.


Manel Balsera
Willy Wriggers

References