Re: Unpredictably Crashes upon Restarting

From: Victor Kwan (vkwan8_at_uwo.ca)
Date: Fri Mar 27 2020 - 17:32:01 CDT

Dear Matthew,

If you use a stochastic thermostat (e.g. Langevin), runs restarted from the
same point will diverge very quickly unless each one uses the same random seed.
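
To illustrate the point, here is a minimal toy sketch in Python. It is not
NAMD code: it integrates a 1D harmonic oscillator with a simple Langevin
thermostat, and the function name, parameters, and seed values are all
hypothetical choices for the example. Two runs with the same seed reproduce
each other exactly, while a run with a different seed follows a different
trajectory. In NAMD the corresponding setting should be the `seed` keyword
in the configuration file.

```python
import random


def langevin_trajectory(seed, steps=2000, dt=0.01, gamma=1.0, kT=1.0):
    """Toy 1D harmonic oscillator (unit mass, unit spring constant)
    coupled to a Langevin thermostat, integrated with Euler-Maruyama."""
    rng = random.Random(seed)  # all thermostat noise comes from this seeded generator
    x, v = 1.0, 0.0
    sigma = (2.0 * gamma * kT * dt) ** 0.5  # noise amplitude per step
    positions = []
    for _ in range(steps):
        force = -x
        # deterministic force + friction, plus the random thermostat kick
        v += (force - gamma * v) * dt + sigma * rng.gauss(0.0, 1.0)
        x += v * dt
        positions.append(x)
    return positions


# Same starting point, same seed: the two runs are identical.
same_a = langevin_trajectory(seed=12345)
same_b = langevin_trajectory(seed=12345)

# Same starting point, different seed: the run follows a different path.
other = langevin_trajectory(seed=54321)

print("same seed reproduces run:     ", same_a == same_b)
print("different seed reproduces run:", same_a == other)
```

The only difference between the runs is the seed handed to the random
number generator, which is why restarts with a stochastic thermostat need
not crash at the same step or with the same message.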

On Fri, Mar 27, 2020 at 5:57 PM Matthew Guberman-Pfeffer <
matthew.guberman-pfeffer_at_yale.edu> wrote:

> Dear NAMD community,
>
> I have restarted my simulation from the same point (at 13.8 ns) and end up
> with a different outcome each time. Most times the simulation crashes with
> an error message, but the messages are always slightly different. I detail
> the messages and what I've tried below.
>
> 1) First restart: ran from 13.8 to 14.4 ns:
>
> ERROR: Atom 119 velocity is 63537.5 -53607.4 -29502 (limit is 12000, atom
> 197 of 651 on patch 16 pe 9)
> ERROR: Atom 127 velocity is -63953.9 53411.6 29213.9 (limit is 12000, atom
> 200 of 651 on patch 16 pe 9)
> ERROR: Atoms moving too fast; simulation has become unstable (2 atoms on
> patch 16 pe 9).
>
> 2) Restarted from 13.8 ns, saving every 1 fs to the DCD to visualize the
> issue. However, the simulation did not crash, and I was forced to
> terminate the job at 18.9 ns because the DCD was consuming nearly a TB of
> space.
>
> 3) Restarted from 13.8 ns, saving less frequently, hoping to repeat the
> previous good performance while using less disk space. But at 16.3 ns I got
> the error below:
>
> ERROR: Atom 125 velocity is -113436 -53195.6 -112022 (limit is 12000, atom
> 448 of 632 on patch 14 pe 19)
> ERROR: Atoms moving too fast; simulation has become unstable (1 atoms on
> patch 14 pe 19).
>
> 4) Restarted from 13.8 ns again. Now it crashed at 15.1 ns with:
>
> ERROR: Atom 120 velocity is 20172.4 -16771.5 -221899 (limit is 12000, atom
> 83 of 609 on patch 16 pe 9)
> ERROR: Atom 127 velocity is -20158.6 16781.3 221594 (limit is 12000, atom
> 93 of 609 on patch 16 pe 9)
> ERROR: Atoms moving too fast; simulation has become unstable (2 atoms on
> patch 16 pe 9).
>
> 5) Restarted from 13.8 ns. The simulation now crashed at 18.8 ns with:
>
> ERROR: Margin is too small for 1 atoms during timestep 18841762.
> ERROR: Incorrect nonbonded forces and energies may be calculated!
> ERROR: Atom 284 velocity is -17748.9 688.258 -64061.4 (limit is 12000,
> atom 124 of 616 on patch 17 pe 22)
> ERROR: Atoms moving too fast; simulation has become unstable (1 atoms on
> patch 17 pe 22).
>
> I get the point that the simulation is unstable. But why does it become
> unstable after 13+ ns? Why do the time at which the simulation crashes and
> the error message vary from one restart to the next? More importantly, what
> can I try to resolve whatever is preventing this simulation from
> continuing?
>
> Best,
> Matthew
>
