Re: Correctness of Simulation Results

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Tue Jun 10 2008 - 11:45:03 CDT

Rahul,
for points 1 and 2, you said you observed "significant" differences, but
didn't mention what these differences were. How large are they?
Also, were you using a thermostat or barostat?

For point 3, this is on the order of the difference you'd expect when
converting between double-precision (CPU NAMD) and single-precision
(CUDA) floating-point numbers, and thus should not be surprising.

Best,
Peter

Rahul wrote:
> Thank you for your replies. Does anyone have an explanation for the
> three observations I have mentioned?
>
> On Mon, Jun 9, 2008 at 10:33 PM, Axel Kohlmeyer
> <akohlmey_at_cmm.chem.upenn.edu> wrote:
>
> On Mon, 9 Jun 2008, Rahul wrote:
>
> RM> Hi all,
>
> hi rahul,
>
>
> RM> I am a college student currently working on a research project aimed at
> RM> porting NAMD simulations to the NVIDIA Tesla. I have basic questions
> RM> regarding the importance of the accuracy of MD results, and the motivation
> RM> for these questions arises from the following observations.
> RM>
> RM> 1. I have run NAMD for the same input coordinates and other simulation
> RM> parameters on different machines. Even when NAMD is compiled without any
> RM> optimizations, output coordinates are significantly different for different
> RM> machines. This leads to the basic question: How important is the correctness
> RM> of the trajectories as far as molecular dynamics simulation is concerned,
> RM> and depending on this, how does one determine which result is correct, given
> RM> that results are different on different machines?
>
> a classical MD simulation is, in practice, the numerical solution of a
> system of coupled ordinary differential equations (Newton's equations
> of motion) and as such is not Lyapunov stable, i.e. the tiniest errors
> grow into exponential divergence of the trajectories. since you are
> using floating-point math (and particularly in single precision),
> numerical errors (e.g. from summing in a different order due to
> different load balancing) accumulate fast.
>
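A minimal Python sketch of the two effects described above, under loud assumptions: the three-element list and the logistic map are toy stand-ins chosen for illustration, not NAMD code and not an MD force field, and the function names are hypothetical. The first part shows that floating-point addition is not associative, so a different summation order (as different load balancing could produce) changes the result. The second part shows how a perturbation of 1e-15 in a chaotic iteration can grow to order one.

```python
# Toy illustration only: none of this is NAMD code.

def sum_in_order(values):
    """Accumulate left to right, as a simple serial reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, summed in two different orders.
order_a = sum_in_order([1e16, 1.0, -1e16])  # the 1.0 is rounded away when added to 1e16
order_b = sum_in_order([1e16, -1e16, 1.0])  # the large terms cancel first, so 1.0 survives

def logistic_orbit(x0, steps):
    """Iterate the chaotic map x -> 4x(1-x); a stand-in for a chaotic system, not MD."""
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ only by 1e-15.
orbit_ref = logistic_orbit(0.3, 100)
orbit_pert = logistic_orbit(0.3 + 1e-15, 100)
max_gap = max(abs(a - b) for a, b in zip(orbit_ref, orbit_pert))

print("sum, order a:", order_a)
print("sum, order b:", order_b)
print("largest gap between the two orbits:", max_gap)
```

Neither sum is "the" right answer in the sense of being what a particular machine must produce; both are valid floating-point evaluations of the same expression, which is why bitwise-identical trajectories across machines are not a useful correctness criterion.
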
> the resulting trajectories should still be meaningful in the
> statistical-mechanics sense, and to check that you have to look at
> properties related to it. the easiest property to follow is energy
> conservation (without using a thermostat!). there are other properties
> that should be preserved as well (average volume in variable cell,
> average pressure, individual forces for a given, identical conformation,
> the distribution of forces over time, radial distribution functions,
> velocity auto-correlation functions etc.).
>
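As a rough sketch of what following energy conservation can look like, the Python snippet below integrates a one-dimensional harmonic oscillator with velocity Verlet and no thermostat, and reports the largest relative deviation of the total energy from its initial value. The oscillator, the unit mass and spring constant, the time step, and the function name are all assumptions made for illustration; a real check would use the energies written by the MD code itself.

```python
# Toy illustration only: a 1D harmonic oscillator, not an MD system.

def velocity_verlet_energy(dt, steps, x=1.0, v=0.0, k=1.0, m=1.0):
    """Integrate with velocity Verlet and return the total energy at each step."""
    energies = []
    a = -k * x / m                      # initial acceleration from the spring force
    for _ in range(steps):
        energies.append(0.5 * m * v * v + 0.5 * k * x * x)
        v_half = v + 0.5 * dt * a       # half-step velocity update
        x = x + dt * v_half             # full-step position update
        a = -k * x / m                  # force at the new position
        v = v_half + 0.5 * dt * a       # second half-step velocity update
    return energies

energies = velocity_verlet_energy(dt=0.01, steps=10000)
e0 = energies[0]
max_rel_dev = max(abs(e - e0) / e0 for e in energies)

print("largest relative energy deviation:", max_rel_dev)
```

With these settings the energy fluctuates slightly but shows no systematic drift. Comparing this kind of number between two builds or two machines says more than comparing the final coordinates.
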
>
> RM> 2. Running the publicly available binaries of CUDA-accelerated NAMD gives
> RM> significantly different results from those obtained by running the
> RM> non-accelerated versions. For reference, I carried out tests on the ApoA1
> RM> benchmark, freely available on the NAMD website. Does this mean that the
> RM> CUDA accelerated version of NAMD gives inaccurate results? Or alternatively,
> RM> are results that important?
> RM> 3. I am trying to look for opportunities for CUDAfication within the
> RM> ComputeNonbondedUtil class, and just for experience, I am currently trying
> RM> to port the first j loop within the function 'calc_pair' to the GPU. This
> RM> loop is the one that follows the first call to 'pairlist_from_pairlist', and
> RM> is defined under the NORMAL flag. So I have created a kernel function and
> RM> enclosed the loop within it. I am ensuring that all required parameters are
> RM> being correctly passed and any changes in variables having a bigger scope
> RM> are reflected as the control comes out of the function. The odd thing is
> RM> that the output is equivalent to the reference output (i.e. the output I get
> RM> on the same machine using pre-compiled binaries) when this kernel function
> RM> is run on the host. However, when it is run on the GPU device, the output
> RM> becomes different. Analysis using gdb shows that during the first run of the
> RM> kernel function itself, the values pointed to by 'f_1' are different from what
> RM> they are in the reference (unmodified source) compilation. The difference is
> RM> very small though, of the order of 0.0000000000001 %. I am not sure whether
> RM> the overall difference in output is due to this very reason, or due to other
> RM> factors also. But all other variables in the scope of the kernel are
> RM> absolutely accurate in their values.
> RM>
> RM> Does anyone have an idea about the level of accuracy required in MD
> RM> simulations, and the reason for the above discrepancies?
>
> hope this helps. you may want to have a look into a textbook
> on MD and its errors, too.
>
> axel.
>
> RM>
> RM> Thank you.
> RM>
> RM> Rahul
> RM>
>
> --
> =======================================================================
> Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> Center for Molecular Modeling -- University of Pennsylvania
> Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.
>
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:49:34 CST