**From:** Peter Freddolino (*petefred_at_ks.uiuc.edu*)

**Date:** Thu Nov 29 2007 - 13:25:35 CST

**Next message:**Ilya Chorny: "Re: Is there solution to numerical inaccuracy"**Previous message:**Ilya Chorny: "Re: Is there solution to numerical inaccuracy"**In reply to:**Alok Juneja: "Re: Is there solution to numerical inaccuracy"**Next in thread:**Alok Juneja: "Re: Is there solution to numerical inaccuracy"**Reply:**Alok Juneja: "Re: Is there solution to numerical inaccuracy"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Hi Alok,

It sounds like there are several different issues here. Let's work our
way down the hierarchy; at the end I'll discuss how this actually
affects simulation reliability/accuracy.

If you run constant temperature in parallel, you should not expect
identical results even from identical runs with the same seed. This was
mentioned earlier in this discussion (see my first reply) and in the
manual (http://www.ks.uiuc.edu/Research/namd/2.6/ug/node26.html), and
occurs because of nondeterminism in the order of different processes
communicating with the head node.

If you run constant temperature on one processor, and you run the SAME
simulation twice, you should expect the same results; for example,
running B1 twice.

If you run constant temperature on one processor but you restart, you
should not expect identical results to a non-restarted run because the
same seed is being applied to a different point. So, for example,
running A with a seed and B1 with the same seed *should* give the same
result, but because you don't know the internal state of the
simulation's random number generator at the end of B1, you should *not*
expect the second half of A to correspond to B2. Note that this is the
one place where NAMD differs from CHARMM, since CHARMM's restart files
carry a seed. It sounds like this is what you're running into here.
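
To make the restart point concrete, here is a toy sketch in Python (not NAMD code). It assumes the thermostat's random forces come from a single seeded generator, uses Python's `random.Random` as a stand-in for the simulation's RNG, and the seed value and the helper name `langevin_kicks` are hypothetical. Reseeding at the restart replays the stream from its start, so B2 does not follow the second half of A; only carrying the generator's internal state across the restart reproduces it.

```python
import random


def langevin_kicks(rng, n_steps):
    """Draw one Gaussian 'random force' per step from the given generator."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_steps)]


SEED = 12345  # hypothetical seed, standing in for the value in the config file

# Run A: one continuous run of 10 steps.
run_a = langevin_kicks(random.Random(SEED), 10)

# Run B1: the first 5 steps with the same seed.
rng_b1 = random.Random(SEED)
run_b1 = langevin_kicks(rng_b1, 5)
state_after_b1 = rng_b1.getstate()  # internal RNG state at the end of B1

# Run B2, restarted by reseeding: the stream starts over from the beginning.
run_b2_reseeded = langevin_kicks(random.Random(SEED), 5)

# Run B2, restarted by restoring the saved generator state instead.
rng_restored = random.Random()
rng_restored.setstate(state_after_b1)
run_b2_restored = langevin_kicks(rng_restored, 5)

print("B1 matches first half of A:          ", run_b1 == run_a[:5])
print("reseeded B2 matches second half of A:", run_b2_reseeded == run_a[5:])
print("restored B2 matches second half of A:", run_b2_restored == run_a[5:])
```

In this toy model the reseeded B2 draws the same numbers B1 already used, which is one way "the same seed applied to a different point" can play out.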

If you run NVE simulations, you should always get the same results
(limited by floating point imprecision, if you're running in parallel)
from the same input coordinates and velocities.
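
The floating-point caveat can be seen in a toy example. Floating-point addition is not associative, so adding the same contributions in a different arrival order can change the last bits of the total. This is only a sketch of the general effect, assuming per-process contributions are accumulated in whatever order they arrive; it is not NAMD's reduction code, and `reduce_in_order` is a hypothetical helper.

```python
def reduce_in_order(contributions, arrival_order):
    """Accumulate contributions one by one in the given arrival order."""
    total = 0.0
    for index in arrival_order:
        total += contributions[index]
    return total


# Three hypothetical per-process partial sums.
partial_sums = [0.1, 0.2, 0.3]

total_forward = reduce_in_order(partial_sums, [0, 1, 2])
total_reverse = reduce_in_order(partial_sums, [2, 1, 0])

print(total_forward)   # 0.6000000000000001
print(total_reverse)   # 0.6
print(total_forward == total_reverse)  # False
```

A difference in the last bit is harmless by itself, but in a chaotic system such as an MD trajectory it grows over many timesteps until the two runs visibly diverge.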

Because of the third point above, what you did isn't really a fair test
of the input precision; the proper test of the input precision would be
to run A, B1, and B2 all as *NVE* simulations; if they're NVT, then even
on one processor, I believe you'd expect different effects from the RNG.
The differences between the second half of A and B2 should then be
compared to two separate B2 runs.
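
One way to carry out that comparison is sketched below in Python. It assumes each run's energies (or any other per-step quantity) have already been loaded into equal-length lists; the function names, the `slack` factor, and the `abs_tol` floor are all hypothetical choices, not anything prescribed by NAMD.

```python
def max_abs_diff(series_a, series_b):
    """Largest pointwise absolute difference between two equal-length series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    return max(abs(x - y) for x, y in zip(series_a, series_b))


def restart_within_baseline(a_second_half, b2_first, b2_second,
                            slack=10.0, abs_tol=1e-9):
    """Check whether A-vs-B2 differs no more than B2 differs from itself.

    baseline    : run-to-run difference between two separate B2 runs
    restart_gap : difference between the second half of A and one B2 run
    """
    baseline = max_abs_diff(b2_first, b2_second)
    restart_gap = max_abs_diff(a_second_half, b2_first)
    return restart_gap <= slack * baseline + abs_tol
```

If the two B2 runs are bit-identical (as expected for serial NVE), the baseline is zero and any gap larger than `abs_tol` points at the restart itself.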

Perhaps the most important question is: does this matter? For NVT
simulations, the nondeterminism of Langevin dynamics between
serial/parallel runs and across restarts should not matter if you do
sufficient sampling, since either way you're sampling from the same
ensemble. As long as you do enough sampling to get meaningful results,
all of your observables should come out the same within statistical
error. The precision of restarts themselves *does* matter, since
imprecision here actually changes the physics of what you're doing
(this is particularly important in NVE).
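
The sampling argument can be illustrated with a toy stochastic process. The sketch below is not Langevin dynamics of a real system; it assumes a simple discretized Ornstein-Uhlenbeck recursion as a stand-in, with hypothetical seeds and parameter values. Two runs with different random streams follow different trajectories, yet their averaged observable agrees.

```python
import math
import random


def sample_positions(seed, n_steps=200_000, memory=0.9):
    """Toy thermostatted coordinate: x <- memory*x + noise.

    The noise amplitude is chosen so the stationary variance is exactly 1.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - memory * memory)
    x = 0.0
    positions = []
    for _ in range(n_steps):
        x = memory * x + noise_scale * rng.gauss(0.0, 1.0)
        positions.append(x)
    return positions


def mean_square(values):
    """Time average of x^2, the observable compared between runs."""
    return sum(v * v for v in values) / len(values)


traj_1 = sample_positions(seed=1)  # e.g. an uninterrupted run
traj_2 = sample_positions(seed=2)  # e.g. a run with a different random stream

print("trajectories identical:", traj_1 == traj_2)
print("<x^2> run 1:", mean_square(traj_1))
print("<x^2> run 2:", mean_square(traj_2))
```

Both averages should land close to the stationary value of 1, with the residual gap shrinking as the number of steps grows.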

So, by my best understanding, barring any input/output imprecision
(which will only be apparent in NVE tests), B1 and B2 should be
considered as good as A because they're both sampling from the same
thermodynamic ensemble, and the only differences are in things that are
supposed to be random (i.e., the Langevin random forces); there's nothing
that makes the particular random force in a given timestep of A more or
less correct than that in B2. I just spoke with Jim Phillips, who
confirmed that the old imprecision-on-restart issues were fixed
immediately after the old discussion thread you linked to, so there
should be no problems with the restart files themselves.

Please let me know if any of this is unclear.

Best,

Peter

Alok Juneja wrote:

> Dear Peter,
>
> Yes, I am specifying the identical seed value in A (complete run), B1
> (1st half) and B2 (2nd half). A is one complete run, whereas the B1 & B2
> simulations are serial, which means that I am using the restart file of
> B1 for the B2 run.
> Peter, I am not clear on what you mean by serial or parallel. As
> I mentioned earlier, my runs B1 and B2 are serial. I am running this
> simulation on the same single processor. Kindly mention the link where the
> non-determinism of the Langevin thermostat in parallel has been talked
> about.
>
> So coming back to square one, after reading all the comments in this
> discussion, I believe there exists NO solution to this problem, which is
> occurring either because of numerical inaccuracy or non-determinism.
>
> Could the B1 and B2 MD runs be considered as good as a single A MD run?
>
> -Alok

>
> Peter Freddolino wrote:
>
>> Hi Alok,
>> just to verify, since you're running NVT, did you specify a seed value
>> in your config file for the A-B1-B2 simulations? And were your
>> production runs serial or parallel? If your production runs are done in
>> parallel then the differences you observe in the first part of your
>> email are really unremarkable, and have nothing to do with precision and
>> everything to do with the nondeterminism of the Langevin thermostat in
>> parallel that has been mentioned earlier.
>> Best,
>> Peter
>>

>> Alok Juneja wrote:
>>
>>> Dear Peter, Dave, Himanshu & other list members,
>>>
>>> Sorry for not answering earlier, though I was regularly following the
>>> discussion on this issue. As requested by Peter, I am providing my
>>> findings about this issue.
>>>
>>> I am running constant temperature 50 ns dynamics, a total of 25000000
>>> steps with a time step of 0.002 ps and dcdfreq of 100, but restartfreq
>>> of 100000. Somehow my MD crashed at 5459300 but my last restart was
>>> 5400000. I restarted with this. I am doing this MD to see the protein
>>> behaviour and am calculating the N and C terminal distance (Ang.).
>>> Following is the N-C terminal distance before crash and after crash. I
>>> am running this simulation in parallel.
>>>
>>> | (*) | TIME(PS) | Before-Crash | After-Crash |
>>> |-----|----------|--------------|-------------|
>>> |     | 10800    | 10.833       |             |
>>> |     | 10800.2  | 11.3259      | 11.0924     |
>>> |     | 10800.4  | 11.2417      | 11.1039     |
>>> |     | 10800.6  | 10.985       | 10.9962     |
>>> |     | 10800.8  | 10.7715      | 11.1593     |
>>> |     | 10801    | 11.3783      | 11.4828     |
>>> |     | 10801.2  | 11.1862      | 10.9861     |
>>> |     | 10801.4  | 11.3925      | 10.9671     |
>>> |     | 10801.6  | 10.8473      | 10.9287     |
>>> | (*) | 10801.8  | 10.5789      | 11.013      |
>>> |     | 10802    | 10.8792      | 10.4324     |
>>> |     | 10802.2  | 10.6182      | 10.4422     |
>>> |     | 10802.4  | 10.8918      | 10.6541     |
>>> |     | 10802.6  | 10.9267      | 10.7829     |
>>> |     | 10802.8  | 10.6352      | 10.8386     |
>>> |     | 10803    | 10.8069      | 10.4295     |
>>> | (*) | 10803.2  | 11.3242      | 10.5952     |
>>> | (*) | 10803.4  | 11.3397      | 10.4784     |
>>> | (*) | 10803.6  | 11.5822      | 10.4696     |
>>> | (*) | 10803.8  | 11.023       | 10.8231     |
>>> |     | 10804    | 10.9887      | 10.4586     |
>>> |     | 10804.2  | 10.5118      | 10.3266     |
>>> | (*) | 10804.4  | 10.4329      | 9.95989     |
>>> |     | 10804.6  | 10.6863      | 10.2366     |
>>> | (*) | 10804.8  | 11.3551      | 10.2149     |
>>> | (*) | 10805    | 11.3445      | 9.88589     |
>>> |     | 10805.2  | 10.7702      | 10.1757     |
>>> |     | 10805.4  | 10.4436      | 10.3636     |
>>> |     | 10805.6  | 10.3206      | 10.2086     |
>>> |     | 10805.8  | 10.8214      | 10.5937     |
>>> |     | 10806    | 11.2742      | 10.3849     |
>>> |     | 10806.2  | 11.44        | 10.2721     |
>>> | (*) | 10806.4  | 11.2566      | 10.1909     |
>>> |     | 10806.6  | 10.9381      | 10.7606     |
>>> |     | 10806.8  | 11.5617      | 10.8286     |
>>> |     | 10807    | 11.7283      | 11.246      |
>>> |     | 10807.2  | 11.4038      | 11.2901     |
>>> |     | 10807.4  | 10.5862      | 10.708      |
>>> |     | 10807.6  | 10.61        | 10.6308     |
>>> |     | 10807.8  | 11.1818      | 10.2391     |
>>> |     | 10808    | 11.3433      | 10.5278     |
>>> |     | 10808.2  | 11.1947      | 11.0142     |
>>> |     | 10808.4  | 10.9988      | 11.2578     |
>>> | (*) | 10808.6  | 10.447       | 11.334      |
>>> |     | 10808.8  | 10.3205      | 10.9368     |
>>> |     | 10809    | 10.7634      | 10.9165     |
>>> |     | 10809.2  | 10.7874      | 11.1041     |
>>> |     | 10809.4  | 11.011       | 11.15       |
>>> |     | 10809.6  | 10.8222      | 10.9214     |
>>> |     | 10809.8  | 10.8731      | 10.2806     |
>>> |     | 10810    | 11.0003      | 10.908      |
>>>
>>> You will find many time steps where the difference is remarkable
>>> (indicated by *). I believe that this difference is too much for me.
>>> I checked this and found that this is not the case with CHARMM, where
>>> you get identical results even after restart.
>>>
>>> For your ready reference, I am attaching the total energy graph for
>>> comparison (comparision.pdf
>>> [http://www.geocities.com/junejaalok/comparision.pdf]).
>>> As requested by Dave, I am attaching file A-B1-B2.pdf
>>> [http://www.geocities.com/junejaalok/A-B1-B2.pdf], the job run on
>>> the same single processor.
>>>
>>> Test A energy profile on
>>> [http://www.geocities.com/junejaalok/testA.txt]
>>> TestB1 energy profile on
>>> [http://www.geocities.com/junejaalok/testB1.txt]
>>> TestB2 energy profile on
>>> [http://www.geocities.com/junejaalok/testB2.txt]
>>>
>>> Since I am restricted in the number of characters that one can
>>> write in the NAMD forum and in the size of attachments, I am putting extra
>>> links for you to see the files and results. Hope you understand.
>>>
>>> I appreciate your efforts to get into the depth. But I believe the
>>> NAMD developers should really think over this issue. However, any
>>> solution and suggestions in this regard would be of great help for
>>> others as well.
>>>
>>> Best Wishes,
>>> Alok


*This archive was generated by hypermail 2.1.6: Wed Feb 29 2012 - 15:45:37 CST*