Re: failure in RATTLE

From: Sangamesh B (forum.san_at_gmail.com)
Date: Thu Dec 18 2008 - 03:51:54 CST

On Wed, Dec 17, 2008 at 9:22 PM, Axel Kohlmeyer
<akohlmey_at_cmm.chem.upenn.edu> wrote:
>
> On Wed, 17 Dec 2008, Sangamesh B wrote:
>
> SB> One of our customers gave this input file for a benchmark, saying that
> SB> the job ran for 7 days with 16 processes on his cluster.
>
> yes. but was it the _exact_ same input? i doubt it.
> be that as it may, the output indicates that the input
> very likely starts from a high potential energy
> configuration and that it does not run stably.
>
Right. The input file may differ from the one the customer actually ran.
Today I downloaded the prebuilt binaries from the NAMD site, and they
produce the same error:

$ /opt/apps/NAMD_2.6_Linux-amd64/charmrun \
    /opt/apps/NAMD_2.6_Linux-amd64/namd2 ++local +p4 npt02.inp | tee npt02_charm_out
Info: NAMD 2.6 for Linux-amd64
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 50900 for net-linux-amd64-iccstatic
Info: Built Wed Aug 30 12:54:51 CDT 2006 by jim on belfast.ks.uiuc.edu
Info: 1 NAMD 2.6 Linux-amd64 4 locuzcluster.org user1
Info: Running on 4 processors.
Info: 7608 kB of memory in use.
Info: Memory usage based on mallinfo
Info: Configuration file is npt02.inp
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 1
Info: NUMBER OF STEPS 500000
Info: STEPS PER CYCLE 8
Info: PERIODIC CELL BASIS 1 103.131 0 0
Info: PERIODIC CELL BASIS 2 0 47.997 0
Info: PERIODIC CELL BASIS 3 0 0 70.482
Info: PERIODIC CELL CENTER 0 0 0
Info: LOAD BALANCE STRATEGY Other
Info: LDB PERIOD 1600 steps
Info: FIRST LDB TIMESTEP 40
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MAX SELF PARTITIONS 50
Info: MAX PAIR PARTITIONS 20
Info: SELF PARTITION ATOMS 125
Info: PAIR PARTITION ATOMS 200
Info: PAIR2 PARTITION ATOMS 400
Info: MIN ATOMS PER PATCH 100
Info: INITIAL TEMPERATURE 300
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 SCALE FACTOR 1
Info: DCD FILENAME npt02.dcd
Info: DCD FREQUENCY 1000
Info: DCD FIRST STEP 1000
Info: DCD FILE WILL CONTAIN UNIT CELL DATA
Info: XST FILENAME npt02.xst
Info: XST FREQUENCY 10000
Info: NO VELOCITY DCD OUTPUT
Info: OUTPUT FILENAME npt02
Info: BINARY OUTPUT FILES WILL BE USED
Info: RESTART FILENAME npt02
Info: RESTART FREQUENCY 10000
Info: BINARY RESTART FILES WILL BE USED
Info: SWITCHING ACTIVE
Info: SWITCHING ON 12
Info: SWITCHING OFF 13.5
Info: PAIRLIST DISTANCE 15
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0.525
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 18.025
Info: ENERGY OUTPUT STEPS 1000
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 10000
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 300
Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
Info: LANGEVIN DYNAMICS APPLIED TO HYDROGENS
Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
Info: TARGET PRESSURE IS 1 BAR
Info: OSCILLATION PERIOD IS 200 FS
Info: DECAY TIME IS 500 FS
Info: PISTON TEMPERATURE IS 300 K
Info: PRESSURE CONTROL IS GROUP-BASED
Info: INITIAL STRAIN RATE IS 0 0 0
Info: CELL FLUCTUATION IS ISOTROPIC
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.227942
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 120 60 80
Info: PME MAXIMUM GRID SPACING 1.5
Info: Attempting to read FFTW data from FFTW_NAMD_2.6_Linux-amd64.txt
Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
Info: Writing FFTW data to FFTW_NAMD_2.6_Linux-amd64.txt
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 4
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : ALL
Info: ERROR TOLERANCE : 1e-08
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: NONBONDED FORCES EVALUATED EVERY 2 STEPS
Info: RANDOM NUMBER SEED 1229582922
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB dimer_11087wat.pdb
Info: STRUCTURE FILE dimer_11087wat.psf
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS par_all22_prot.inp
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: SUMMARY OF PARAMETERS:
Info: 139 BONDS
Info: 345 ANGLES
Info: 452 DIHEDRAL
Info: 43 IMPROPER
Info: 0 CROSSTERM
Info: 95 VDW
Info: 0 VDW_PAIRS
Warning: Ignored 11087 bonds with zero force constants.
Warning: Will get H-H distance in rigid H2O from H-O-H angle.
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 34521 ATOMS
Info: 23440 BONDS
Info: 13363 ANGLES
Info: 3294 DIHEDRALS
Info: 232 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 33877 RIGID BONDS
Info: 69686 DEGREES OF FREEDOM
Info: 11731 HYDROGEN GROUPS
Info: TOTAL MASS = 208897 amu
Info: TOTAL CHARGE = 1.00583e-06 e
Info: *****************************
Info: Entering startup phase 0 with 14988 kB of memory in use.
Info: Entering startup phase 1 with 14992 kB of memory in use.
Info: Entering startup phase 2 with 21124 kB of memory in use.
Info: Entering startup phase 3 with 21124 kB of memory in use.
Info: PATCH GRID IS 5 (PERIODIC) BY 2 (PERIODIC) BY 3 (PERIODIC)
Info: REMOVING COM VELOCITY -0.011864 -0.00227754 -0.0185928
Info: LARGEST PATCH (13) HAS 1301 ATOMS
Info: CREATING 5741 COMPUTE OBJECTS
Info: Entering startup phase 4 with 26208 kB of memory in use.
Info: PME using 4 and 4 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 0 1 2 3
Info: PME TRANS LOCATIONS: 0 1 2 3
Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
Info: Entering startup phase 5 with 27732 kB of memory in use.
Info: Entering startup phase 6 with 27764 kB of memory in use.
Measuring processor speeds... Done.
Info: Entering startup phase 7 with 27764 kB of memory in use.
Info: CREATING 5741 COMPUTE OBJECTS
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: ABSOLUTE IMPRECISION IN VDWA TABLE ENERGY: 2.46519e-32 AT 13.4884
Info: RELATIVE IMPRECISION IN VDWA TABLE ENERGY: 1.58832e-16 AT 13.4884
Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 2.46519e-32 AT 13.4884
Info: RELATIVE IMPRECISION IN VDWA TABLE FORCE: 1.35812e-16 AT 13.4884
Info: ABSOLUTE IMPRECISION IN VDWB TABLE ENERGY: 2.06795e-25 AT 13.4884
Info: RELATIVE IMPRECISION IN VDWB TABLE ENERGY: 2.141e-16 AT 13.4884
Info: Entering startup phase 8 with 27764 kB of memory in use.
Info: Finished startup with 27764 kB of memory in use.
ERROR: Constraint failure in RATTLE algorithm for atom 711!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 1!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 17!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 258!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 783!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 703!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 762!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 634!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 1082!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 480!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 1074!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 60!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 277!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 76!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 459!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Exiting prematurely.
==========================================
WallClock: 30.211748 CPUTime: 29.200562 Memory: 27764 kB

I'll enquire about this with the customer.
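
To make the failure pattern easier to report, the RATTLE errors in the log
above can be tallied with a short script. This is only a minimal sketch: the
function name rattle_failures and the embedded sample text are illustrative
assumptions, and the pattern is based solely on the error wording shown in
this log. In the run above it would list 15 atoms, all numbered 1082 or lower.

```python
import re

# Pattern taken from the error wording in the log above; other NAMD
# versions may word the message differently (assumption).
RATTLE_RE = re.compile(r"Constraint failure in RATTLE algorithm for atom (\d+)!")


def rattle_failures(log_text):
    """Return the atom numbers named in RATTLE failure lines, in log order."""
    return [int(m.group(1)) for m in RATTLE_RE.finditer(log_text)]


# Shortened excerpt of the log above, used here as sample input.
sample = """\
Info: Finished startup with 27764 kB of memory in use.
ERROR: Constraint failure in RATTLE algorithm for atom 711!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 1!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Exiting prematurely.
"""

atoms = rattle_failures(sample)
print(len(atoms), "failures; atoms:", sorted(atoms))
```

For the real output, read the text from npt02_charm_out instead of the
embedded sample.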

Thanks,
Sangamesh
> [...]
>
> SB> Ok. But what might be wrong with the current install? I've used
>
> depending on the optimization level of the compiler, codes can
> give slightly different results. for a marginal input that
> may just be enough. as i wrote before, somebody in your
> line of work should be well aware of that.
>
> [...]
>
> SB> Initial lines indicate that it's using mpirun:
> SB>
> SB> $ cat npt_result_out_ll
> SB>
> SB> Running on 4 processors: /opt/apps/namd26_gnu/Linux-amd64-MPI/namd2 npt02.inp
> SB> charmrun> mpirun -np 4 /opt/apps/namd26_gnu/Linux-amd64-MPI/namd2 npt02.inp
> SB> Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
> SB> Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
> SB> Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
>
> this is not correct. how else would you get each output 4 times?
> this only indicates that you _compiled_ for MPI.
>
> if you look a little bit deeper in the output file (from the mail
> that didn't make it through the mailing list), you'll see that
> NAMD reports running on one processor four times.
>
> cheers,
> axel.
>
> --
> =======================================================================
> Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> Center for Molecular Modeling -- University of Pennsylvania
> Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.
>
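
The check Axel describes (one "Running on N processors" line for a real
parallel job, versus the same startup output repeated once per process) can
be sketched as a small script. This is a rough sketch under assumptions: the
function name diagnose and the return strings are made up for illustration,
and the "bad" sample is hypothetical, since the log from the MPI build did
not reach the list. Only the "Info: Running on 4 processors." wording is
taken from the log in this message.

```python
import re

# Matches lines such as "Info: Running on 4 processors." from the log above.
RUNNING_RE = re.compile(r"^Info: Running on (\d+) processors?\.", re.MULTILINE)


def diagnose(log_text, requested):
    """Classify a NAMD log by its 'Info: Running on N processors.' lines."""
    counts = [int(n) for n in RUNNING_RE.findall(log_text)]
    if len(counts) == 1 and counts[0] == requested:
        return "one parallel job"
    if len(counts) == requested and all(n == 1 for n in counts):
        return "independent serial copies"
    return "unclear"


# Line copied from the charmrun ++local run in this message.
good = "Info: NAMD 2.6 for Linux-amd64\nInfo: Running on 4 processors.\n"

# Hypothetical log: four copies that each believe they are the only process.
bad = "Info: Running on 1 processors.\n" * 4

print(diagnose(good, 4))
print(diagnose(bad, 4))
```

If the second case shows up, each of the processes started by mpirun is
running the whole simulation on its own, which matches the repeated
"Charm++> Running on MPI version" lines quoted above.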
