namd 2.62b FATAL ERROR: Memory allocation failed on processor 0 or higher

From: Thomas Caulfield (thomas.caulfield_at_chemistry.gatech.edu)
Date: Sun Sep 03 2006 - 20:54:06 CDT

Hello All (NAMD community):

I am running a large system on a Linux Networx Evolocity II cluster with
60 nodes (120 processors). My question is whether this is a hardware
problem or a software problem.

I am running into a memory error. When I ran a smaller simulation as a
step toward this full system (which had 1,000,000 atoms), there were no
problems. With the full system, the run sometimes gets to processor 6 or
7 before the crash occurs.

Each slave node has the following:

* Evolocity (.8U wide) Intel Rackmount Compute Module, incl. P/S
* EIDE hard drive: 120GB PATA, 7200 RPM
* 2 Pentium Xeon 2.8 GHz, PC533 processor, 512k L2 cache
* 2 x 512MB PC2700 DDR memory, ECC REG, incl.
* 1 Super Micro X5DPR-8G2+, 6 DIMM slots, dual Intel Xeon (533/400MHz FSB)
* Intel E7501 chipset
* (1) 64-bit 133MHz PCI-X
* Adaptec AIC-7902 Ultra320 SCSI controller
* Intel 82546EB dual port Gigabit
* ATI Rage XL 8MB PCI graphic controller

Here is an overview of the error. (I am assuming that this system size
simply exceeds the memory capacity per node?)

For Full System:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 2524826 ATOMS
Info: 1774106 BONDS
Info: 1224931 ANGLES
Info: 691809 DIHEDRALS
Info: 43710 IMPROPERS
Info: 0 EXCLUSIONS
Info: 250859 FIXED ATOMS
Info: 6821901 DEGREES OF FREEDOM
Info: 911489 HYDROGEN GROUPS
Info: 148790 HYDROGEN GROUPS WITH ALL ATOMS FIXED
Info: TOTAL MASS = 1.60067e+07 amu
Info: TOTAL CHARGE = 19.9999 e
Info: *****************************
Info: Entering startup phase 0 with 685641 kB of memory in use.
Info: Entering startup phase 1 with 685641 kB of memory in use.
FATAL ERROR: Memory allocation failed on processor 0.

It did work for the partial system below though:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 251459 ATOMS
Info: 262671 BONDS
Info: 470453 ANGLES
Info: 693542 DIHEDRALS
Info: 43830 IMPROPERS
Info: 0 EXCLUSIONS
Info: 106193 FIXED ATOMS
Info: 435798 DEGREES OF FREEDOM
Info: 149190 HYDROGEN GROUPS
Info: 53056 HYDROGEN GROUPS WITH ALL ATOMS FIXED
Info: TOTAL MASS = 2.21721e+06 amu
Info: TOTAL CHARGE = -3835 e
Info: *****************************
Info: Entering startup phase 0 with 88793 kB of memory in use.
Info: Entering startup phase 1 with 88793 kB of memory in use.
Info: Entering startup phase 2 with 174897 kB of memory in use.
Info: Entering startup phase 3 with 174897 kB of memory in use.
Info: PATCH GRID IS 13 BY 11 BY 9
Info: REMOVING COM VELOCITY 0 0 0
Info: Entering startup phase 4 with 194193 kB of memory in use.
Info: Entering startup phase 5 with 194193 kB of memory in use.
Info: Entering startup phase 6 with 194193 kB of memory in use.
Info: Entering startup phase 7 with 194193 kB of memory in use.
Info: COULOMB TABLE R-SQUARED SPACING: 0.0625
Info: COULOMB TABLE SIZE: 2309 POINTS
Info: Entering startup phase 8 with 194193 kB of memory in use.
Info: Finished startup with 194193 kB of memory in use.
TCL: Minimizing for 50 steps
ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP
<More output continues; i.e., it works in this case>
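
A rough back-of-the-envelope check of the memory assumption, using only the numbers from the two logs and the hardware list above. This is a sketch, not a statement about how NAMD actually allocates memory: it assumes that the growth in memory use between startup phase 0 and phase 2 seen for the partial system carries over proportionally to the full system, and that "kB" in the log means 1024 bytes. The names in the script are made up for illustration.

```python
# Figures copied from the two NAMD logs above (kB of memory in use).
PARTIAL_PHASE0_KB = 88793    # partial system, startup phase 0
PARTIAL_PHASE2_KB = 174897   # partial system, startup phase 2
FULL_PHASE0_KB = 685641      # full system, startup phase 0 (crashes in phase 1)

# Per-node RAM from the hardware list: 2 x 512 MB DIMMs.
NODE_RAM_KB = 2 * 512 * 1024


def estimate_full_phase2_kb():
    """Extrapolate the full system's phase 2 memory use.

    ASSUMPTION: the phase 0 -> phase 2 growth ratio measured on the
    partial system also applies to the full system.
    """
    growth = PARTIAL_PHASE2_KB / PARTIAL_PHASE0_KB
    return FULL_PHASE0_KB * growth


if __name__ == "__main__":
    estimate = estimate_full_phase2_kb()
    print(f"node RAM:                  {NODE_RAM_KB} kB")
    print(f"full system at phase 0:    {FULL_PHASE0_KB} kB")
    print(f"estimated need at phase 2: {estimate:.0f} kB")
    print(f"exceeds node RAM:          {estimate > NODE_RAM_KB}")
```

Under that assumption the estimate for the full system comes out above the 1 GB installed per node, and the two processors on a node share that 1 GB, so the failure would be consistent with simply running out of memory on processor 0.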

Thanks in advance for any insights.

Best regards,

-Tom Caulfield
****************************************
Tom Caulfield, Ph.D. Candidate
School of Chemistry & Biochemistry
Cherry Emerson Bldg., RM 329
Georgia Institute of Technology
Atlanta, GA 30332-0400
Harvey Laboratory:
http://rumour.biology.gatech.edu
****************************************
