Re: failure in RATTLE

From: Sangamesh B (forum.san_at_gmail.com)
Date: Wed Dec 17 2008 - 03:02:38 CST

Thanks for the suggestions.
On Wed, Dec 17, 2008 at 12:56 AM, Axel Kohlmeyer
<akohlmey_at_cmm.chem.upenn.edu> wrote:
> On Tue, 16 Dec 2008, Sangamesh B wrote:
>
> SB> Hello NAMD users,
> SB>
> SB> I'm running NAMD-2.6 built with MPICH2 and the GNU compilers on a Rocks-5
> SB> Linux cluster. There were no errors during compilation.
> SB> The examples included with the NAMD distribution also work.
> SB>
> SB> For one of the input files, the following error appears:
>
> for one of the input files from where?
> were they provided by a user, did you write
> them, or did you download them?
>
One of our customers gave us this input file for benchmarking, saying that
the job had run for 7 days with 16 processes on his cluster.
> SB> /opt/apps/namd26_gnu/Linux-amd64-MPI/charmrun +p4
> SB> /opt/apps/namd26_gnu/Linux-amd64-MPI/namd2 npt02.inp | tee
> SB> npt_result_out_ll
>
> you obviously didn't read the documentation well enough.
> when you compile with MPI, you _have_ to use mpirun and
> _not_ charmrun. however, if you simply run over GigE, there
> is no need to recompile; you can use the provided
> precompiled binaries (and _with_ charmrun).
>
Ok. But what might be wrong with the current install? I've used
FFTW-2.1.5 and Tcl-8.4.
FFTW was built with --enable-float --enable-type-prefix
--enable-shared, using the same GNU compilers.

> SB> ..
> SB>
> SB> ERROR: Atom 3601 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 3602 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 3603 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 20926 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 20927 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 20928 velocity is nan nan nan (limit is 10000)
>
> as an HPC consultant you should know that "nan" is a
> sign of invalid math, and that it can mean either
> invalid (bad) input or a miscompiled binary.
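The following minimal Python sketch makes the "nan means invalid math" point concrete. It shows how one invalid operation contaminates a velocity value. The update rule `v = v + a*dt` and all numbers are illustrative assumptions, not NAMD's integrator:

```python
import math

# Toy velocity update (hypothetical, NOT NAMD's integrator): v_new = v + a*dt.
# One invalid operation (inf - inf) yields NaN; NaN then propagates
# through every subsequent arithmetic update of v.
inf = float("inf")
a = inf - inf            # invalid operation -> NaN
v = 1.0
dt = 0.001
for _ in range(3):
    v = v + a * dt       # NaN in, NaN out

print(math.isnan(v))     # True
print(v > 10000.0)       # False: a plain "exceeds the limit" test misses NaN
print(v <= 10000.0)      # False too: NaN fails every ordered comparison
```

Under IEEE floating point, a NaN compares false against any limit. It therefore has to be detected explicitly, for example with `math.isnan`, rather than by a threshold test alone.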
>
>
> SB> ERROR: Atoms moving too fast; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 60!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 60!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 60!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 60!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 277!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 277!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 277!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB>
> SB> ..
> SB> ERROR: Atom 30757 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 30758 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 30759 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 34330 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 34331 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atom 34332 velocity is nan nan nan (limit is 10000)
> SB> ERROR: Atoms moving too fast; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 480!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 480!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 480!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 480!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 1082!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 1082!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 1082!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 1082!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 703!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 703!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Exiting prematurely.
> SB> ==========================================
> SB> WallClock: 4.136847 CPUTime: 4.136847 Memory: 128778 kB
> SB> End of program
> SB> ERROR: Exiting prematurely.
> SB> ==========================================
> SB> WallClock: 4.144087 CPUTime: 4.144086 Memory: 128778 kB
> SB> End of program
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 703!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Exiting prematurely.
> SB> ==========================================
> SB> WallClock: 4.156466 CPUTime: 4.156467 Memory: 128778 kB
> SB> End of program
> SB> ERROR: Constraint failure in RATTLE algorithm for atom 703!
> SB> ERROR: Constraint failure; simulation has become unstable.
> SB> ERROR: Exiting prematurely.
> SB> ==========================================
> SB> WallClock: 4.164990 CPUTime: 4.164990 Memory: 128778 kB
> SB> End of program
> SB>
> SB> I don't understand why these errors are appearing.
>
> simple: garbage in, garbage out.
>
> SB> Are these errors due to wrong parameters in the input file, NAMD not
> SB> being installed properly, or too little memory on the computer? (4 GB
> SB> of memory, run on a single node)
>
> classical MD does not require a lot of memory. please also note
> that the quoted output indicates that you are running 4 (serial)
> copies instead of a proper parallel run (it should be even more
> obvious from the first part of the output, which you didn't quote).
>
> for the reason, see above.
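The "4 serial copies" reading can be checked mechanically. The following hypothetical Python helper is not part of NAMD. It counts "End of program" lines in a log. It assumes, as Axel's reading of the output implies, that one proper parallel run prints that end-of-run summary only once. The sample text is the tail of the log quoted above, with the "> SB> " prefixes stripped:

```python
# Hypothetical helper (not part of NAMD). Heuristic assumption: a single
# parallel run prints "End of program" once, so N such lines suggest that
# N independent (serial) copies were started.
def count_end_of_program(log_text: str) -> int:
    return sum(1 for line in log_text.splitlines()
               if line.strip() == "End of program")

# Tail of the log quoted in this thread, quote prefixes removed.
LOG_TAIL = """\
ERROR: Exiting prematurely.
==========================================
WallClock: 4.136847 CPUTime: 4.136847 Memory: 128778 kB
End of program
ERROR: Exiting prematurely.
==========================================
WallClock: 4.144087 CPUTime: 4.144086 Memory: 128778 kB
End of program
ERROR: Constraint failure in RATTLE algorithm for atom 703!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Exiting prematurely.
==========================================
WallClock: 4.156466 CPUTime: 4.156467 Memory: 128778 kB
End of program
ERROR: Constraint failure in RATTLE algorithm for atom 703!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Exiting prematurely.
==========================================
WallClock: 4.164990 CPUTime: 4.164990 Memory: 128778 kB
End of program
"""

copies = count_end_of_program(LOG_TAIL)
print(f"{copies} end-of-run summaries found")  # 4 end-of-run summaries found
if copies > 1:
    print("looks like several serial copies, not one parallel run")
```

The four slightly different WallClock values point the same way: each copy reports its own timer.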
>
The initial lines indicate that it is using mpirun:

$ cat npt_result_out_ll

Running on 4 processors: /opt/apps/namd26_gnu/Linux-amd64-MPI/namd2 npt02.inp
charmrun> mpirun -np 4 /opt/apps/namd26_gnu/Linux-amd64-MPI/namd2 npt02.inp
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
Info: NAMD 2.6 for Linux-amd64-MPI-MPI
> SB>
> SB> Thanks in advance,
> SB> Sangamesh,
> SB> Consulatnt - HPC
>
> please note the typo. ;-)
>
> cheers,
> axel.
>
>
>
> --
> =======================================================================
> Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> Center for Molecular Modeling -- University of Pennsylvania
> Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:50:16 CST