Re: numerical inaccuracy upon restart

From: nordgren_at_sas.upenn.edu
Date: Wed Aug 25 2004 - 22:40:31 CDT

Next message: Chris Samuel: "Re: Compilation on Opteron"
Previous message: Leandro Martinez: "Re: Compilation on Opteron"
In reply to: Blake Charlebois: "numerical inaccuracy upon restart"
Next in thread: Harald Tepper: "Re: Re: numerical inaccuracy upon restart"
Reply: Harald Tepper: "Re: Re: numerical inaccuracy upon restart"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Dear Blake (and list):

I believe there is a simple and logical answer here. Any time a NAMD run
begins, the random number generator is seeded (either by the user-supplied
seed or by the current system time). From then on, during a single run,
the (pseudo)random numbers needed by the program are taken from a
deterministic sequence.

Thus, if you split one run into two separate ones, the second run reseeds
the generator and starts the sequence over, rather than continuing from
where the single run would be at its halfway point. You are effectively
choosing a different random seed for the second half, and every random
number after that differs between the two cases. This is why the two cases
are identical at first, but differ slightly as soon as the second, shorter
run begins.
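
To make the argument concrete, here is a toy sketch in Python. It is not
NAMD code and says nothing about NAMD's actual generator; the seed value and
the draw() helper are made up for illustration. It only shows that reseeding
at the start of the second run replays the beginning of the sequence instead
of continuing it:

```python
import random

def draw(seed, n):
    """Return n pseudorandom numbers from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

SEED = 12345  # hypothetical value standing in for the user-supplied seed

# Run A: one continuous run of 200 steps, seeded once at startup.
run_a = draw(SEED, 200)

# Runs B1 and B2: two runs of 100 steps, each seeded again at startup.
run_b1 = draw(SEED, 100)
run_b2 = draw(SEED, 100)

print(run_a[:100] == run_b1)  # True: the first halves agree
print(run_a[100:] == run_b2)  # False: A continues the sequence, B2 does not
print(run_b2 == run_b1)       # True: B2 replays the start of the sequence
```

Making the split runs match the single run would require saving and
restoring the generator state across the restart, not just reusing the seed.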

This would be the case using *any* of the constant-temperature algorithms
(except, I believe, velocity rescaling) and not just the Langevin method,
since they all involve using random numbers. If you really aren't using
any CT algorithm, then I'm not sure where the random numbers are being used,
but I strongly suspect that this is still the cause of your observations.

In any case, seeing discrepancies at the 10^-7 level isn't exactly
catastrophic for most applications.

- Erik

C. Erik Nordgren, Ph.D.
Department of Chemistry
University of Pennsylvania

> Hello everyone,
>
> This is further to a post by Harald Tepper
> (http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/0972.html) in
> which he noted that one run of length t does not produce the same results
> as two consecutive runs of length t/2 each, which suggests that it is best
> to avoid splitting a run into several smaller runs. The details of my own
> similar test, in which I found that restarting introduces small
> differences in the total system energy at the 8th significant digit, are
> at the end of this message.
>
> This interests me because I am using a computational facility at which I
> cannot run jobs for more than 8 hours.
>
> From the discussion on shadow orbits, etc., by Frenkel & Smit
> (Understanding Molecular Simulation: From Algorithms to Applications,
> 2002, pp. 71-74) and from the discussion of single and double precision
> in Appendix A of the GROMACS manual (pp. 169-170), I would expect that
> neither slight imprecision nor slight inaccuracy leads to trajectories
> that are more inaccurate than would result from high numerical precision
> and accuracy. Therefore, repeatedly terminating and restarting
> simulations will not decrease accuracy.
>
> Am I correct in assuming that repeatedly terminating and restarting
> simulations is ok for free dynamics runs and for steered MD runs?
>
> Thank you,
> Blake Charlebois
>
> DETAILS:
>
> System: a protein solvated in water
> Atoms in system: 12696
> Number of processors: 4 (on one machine)
> Timestep: 1.0 fs
> Temperature control: none
>
> I tried several runs using binary coordinate and velocity files:
> A) a 200-fs run (initial step 0) starting from an initial set of
> coordinates and velocities
> B1) a 100-fs run (initial step 0) starting from the same initial state as A
> B2) a 100-fs run (initial step 100) starting from the end of B1
>
> I used the same random number seed for all runs.
>
> I specified an output every timestep:
> outputEnergies 1
> outputPressure 1
> stepspercycle 1
>
> I monitored total system energy.
>
> The first 100 fs of run A seems to be identical to run B1. However, the
> second 100 fs of run A is different from run B2. For instance, energies in
> A and B1 are the same while energies in A and B2 are the same to at least
> 7 significant figures. The discrepancy does not seem to increase over 100
> fs. I would have expected them to be the same to a few more digits. I
> suppose errors propagate quickly in such a large system.
>
> The backbone RMSD between A and B1 seems to be zero while the backbone
> RMSD between A and B2 seems to be (very roughly) 4e-7 Angstroms. It is
> 6e-7 Angstroms for the entire protein and 1e-6 Angstroms for all atoms
> including solvent.
>
> Loosely related:
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/0585.html
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/0780.html
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/0677.html
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/0937.html
>

