From: Dong Luo (us917_at_yahoo.com)
Date: Tue Mar 29 2011 - 11:00:22 CDT
I have the same problem: NAMD 2.7 and 2.8b1 run for only a limited number of
steps on Blue Gene/L. The number of steps depends only on the system size and is
quite repeatable for a given system, so it looks like a memory issue. Both NAMD
2.7 and 2.8b1 were compiled according to http://bluegene.bnl.gov/comp/buildnamd.html.
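One quick way to test the memory hypothesis is to pull every memory figure NAMD prints out of the log and see whether it grows before the run stops. Below is a minimal Python sketch. The regular expression is an assumption based only on the log lines quoted in this post, and the function names are hypothetical. It uses `\s+` so that figures split across wrapped lines (e.g. "MB" at the end of one line and "memory" on the next) are still matched.

```python
import re

# Assumed pattern, based only on log lines like:
#   "Info: Startup phase 4 took 0.566039 s, 17.3047 MB of memory in use"
#   "Info: Benchmark time: 256 CPUs 0.0585523 s/step 0.677689 days/ns 17.3047 MB memory"
# \s+ also matches a line break, so wrapped log lines are handled.
MEM_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s+MB\s+(?:of\s+)?memory")


def memory_readings(log_text: str) -> list[float]:
    """Return every memory figure (in MB) found in a NAMD log, in order."""
    return [float(m.group(1)) for m in MEM_RE.finditer(log_text)]


def memory_growth(log_text: str) -> float:
    """Last reading minus first reading, or 0.0 if there are fewer than two."""
    readings = memory_readings(log_text)
    if len(readings) < 2:
        return 0.0
    return readings[-1] - readings[0]


if __name__ == "__main__":
    sample = (
        "Info: Entering startup at 55.472 s, 17.3047 MB of memory in use\n"
        "Info: Benchmark time: 256 CPUs 0.0585523 s/step 0.677689 days/ns 17.3047 MB\n"
        "memory\n"
    )
    print(memory_readings(sample))  # [17.3047, 17.3047]
```

Running it over a full log shows whether the reported figure drifts upward before the hang.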
NAMD 2.7 hangs with the error message:
"FATAL ERROR: Memory allocation failed on processor 0."
NAMD 2.8b1 prints no error message; the only warning is about binary file
conversion. Below are parts of the log:
Charm++> Running on MPI version: 2.0 multi-thread support: 0 (max supported: -1)
[0] isomalloc.c> Disabling isomalloc because no free virtual address space
Charm++> Running on 128 unique compute nodes (1-way SMP).
Info: NAMD CVS-2011-03-28 for BlueGeneL-MPI
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60303 for mpi-bluegenel-xlc
Info: Built Mon Mar 28 14:50:58 EDT 2011 by dongluo on lee
Info: Running on 256 processors, 256 nodes, 128 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 17.6502 s
Info: 17.3047 MB of memory in use based on /proc/self/stat
Info: STRUCTURE SUMMARY:
Info: 179021 ATOMS
Info: 139534 BONDS
Info: 156216 ANGLES
Info: 163705 DIHEDRALS
Info: 1990 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 537063 DEGREES OF FREEDOM
Info: 63886 HYDROGEN GROUPS
Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
Info: 63886 MIGRATION GROUPS
Info: 4 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 1.06581e+06 amu
Info: TOTAL CHARGE = 5.83753e-05 e
Info: MASS DENSITY = 1.04144 g/cm^3
Info: ATOM DENSITY = 0.105341 atoms/A^3
Info: Entering startup at 55.472 s, 17.3047 MB of memory in use
Info: Startup phase 0 took 0.000834967 s, 17.3047 MB of memory in use
Info: Startup phase 1 took 3.16115 s, 17.3047 MB of memory in use
Info: Startup phase 2 took 0.00247623 s, 17.3047 MB of memory in use
Info: Startup phase 3 took 0.000712251 s, 17.3047 MB of memory in use
Info: PATCH GRID IS 15 (PERIODIC) BY 7 (PERIODIC) BY 6 (PERIODIC)
Info: PATCH GRID IS 2-AWAY BY 1-AWAY BY 1-AWAY
Info: LARGEST PATCH (112) HAS 330 ATOMS
Info: Startup phase 4 took 0.566039 s, 17.3047 MB of memory in use
Info: PME using 68 and 63 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 3 7 11 15 19 23 27 31 35 39 ...
Info: PME TRANS LOCATIONS: 1 5 9 13 17 21 25 29 33 37 ...
Info: Startup phase 5 took 0.111723 s, 17.3047 MB of memory in use
Info: Startup phase 6 took 0.101994 s, 17.3047 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 7 took 1.2017 s, 17.3047 MB of memory in use
Info: CREATING 19170 COMPUTE OBJECTS
Info: useSync: 0 useProxySync: 0
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: Benchmark time: 256 CPUs 0.0585523 s/step 0.677689 days/ns 17.3047 MB memory
Info: Benchmark time: 256 CPUs 0.0526625 s/step 0.60952 days/ns 17.3047 MB memory
Info: Benchmark time: 256 CPUs 0.0528017 s/step 0.611131 days/ns 17.3047 MB memory
The last position output (seq=23000) takes 0.122 seconds, 17.305 MB of memory in use
WRITING VELOCITIES TO RESTART FILE AT STEP 23000
FINISHED WRITING RESTART VELOCITIES
The last velocity output (seq=23000) takes 0.105 seconds, 17.305 MB of memory in use
The above is the last message in the log: only 23000 steps run before the job hangs.
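For scale, the Benchmark lines can be turned into an estimate of how long the run lasts before it stops. Below is a minimal Python sketch. The 1 fs timestep is an assumption: the post does not state it, although it reproduces the logged days/ns figures. The function names are hypothetical. At about 0.053 s/step, 23000 steps come to roughly 20 minutes of wall-clock time, excluding startup and output.

```python
SECONDS_PER_DAY = 86400.0


def days_per_ns(seconds_per_step: float, timestep_fs: float = 1.0) -> float:
    """Convert a NAMD s/step benchmark figure to days/ns.

    timestep_fs is an assumption (1 fs by default); it is not stated in the
    post, but 1 fs reproduces the days/ns values printed in the log.
    """
    steps_per_ns = 1.0e6 / timestep_fs  # 1 ns = 1e6 fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def wall_seconds(steps: int, seconds_per_step: float) -> float:
    """Estimated wall-clock seconds for `steps` MD steps, ignoring startup and I/O."""
    return steps * seconds_per_step


if __name__ == "__main__":
    print(round(days_per_ns(0.0585523), 6))               # 0.677689
    print(round(wall_seconds(23000, 0.0528017) / 60, 1))  # 20.2 (minutes)
```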
Dong