Re: NAMD hangs at "OPENING EXTENDED SYSTEM TRAJECTORY FILE" / PME error

From: Jim Pfaendtner (jpfaendt_at_hec.utah.edu)
Date: Mon Aug 06 2007 - 14:13:54 CDT

Hi Cesar,

Thanks for your note. The system has about 250,000 atoms, so it's
big but not too big. I have submitted the job on anywhere between
180 and 250 processors, and I get the same result every time.

Here is some of the text from my logfile:

Info: *****************************
Info: Entering startup phase 0 with 257619 kB of memory in use.
Info: Entering startup phase 1 with 259206 kB of memory in use.
Info: Entering startup phase 2 with 293414 kB of memory in use.
Info: Entering startup phase 3 with 295499 kB of memory in use.
Info: PATCH GRID IS 8 (PERIODIC) BY 6 (PERIODIC) BY 8 (PERIODIC)
Info: REMOVING COM VELOCITY 0 0 0
Info: LARGEST PATCH (73) HAS 771 ATOMS
Info: Entering startup phase 4 with 330320 kB of memory in use.
Info: PME using 156 and 114 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 1 2 3 5 6 7 9 10 11 12 ...
Info: PME TRANS LOCATIONS: 1 2 3 5 6 7 9 10 11 12 ...
Info: Entering startup phase 5 with 330332 kB of memory in use.
Info: Entering startup phase 6 with 298704 kB of memory in use.
Measuring processor speeds... Done.
Info: Entering startup phase 7 with 298716 kB of memory in use.
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: ABSOLUTE IMPRECISION IN FAST TABLE FORCE: 2.64698e-23 AT 11.9138
Info: RELATIVE IMPRECISION IN FAST TABLE FORCE: 1.40436e-16 AT 11.9138
Info: ABSOLUTE IMPRECISION IN SCOR TABLE FORCE: 4.10282e-22 AT 11.9138
Info: RELATIVE IMPRECISION IN SCOR TABLE FORCE: 3.56912e-15 AT 11.9138
Info: ABSOLUTE IMPRECISION IN VDWA TABLE ENERGY: 1.75 AT 0.0441942
Info: RELATIVE IMPRECISION IN VDWA TABLE ENERGY: 6.75473e-14 AT 11.9138
Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 19968 AT 0.0441942
Info: RELATIVE IMPRECISION IN VDWA TABLE FORCE: 6.16499e-15 AT 0.0441942
Info: ABSOLUTE IMPRECISION IN VDWB TABLE ENERGY: 4.1359e-25 AT 11.8295
Info: RELATIVE IMPRECISION IN VDWB TABLE ENERGY: 7.56853e-15 AT 11.9138
Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 3.87741e-26 AT 11.9138
Info: RELATIVE IMPRECISION IN VDWB TABLE FORCE: 5.97409e-16 AT 11.9138
Info: Entering startup phase 8 with 303991 kB of memory in use.
Info: Finished startup with 303991 kB of memory in use.
TCL: Running for 8000 steps
REASSIGNING VELOCITIES AT STEP 0 TO 0 KELVIN.
PRESSURE: 0 -1392.67 -17.5046 -1.16142 -17.5046 -1447.22 34.7197 -1.16142 34.7197 -1420.01
GPRESSURE: 0 -1418.28 -20.6952 8.56921 -14.6756 -1476.51 40.2623 -4.83769 34.4974 -1438.89
ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP TOTAL2 TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG

ENERGY: 0 1625.1381 4944.6790 7293.3891 165.6977 -1189127.0325 144389.1296 0.0000 0.0000 0.0000 -1030708.9991 0.0000 -1030687.5292 -1030687.5292 0.0000 -1419.9673 -1444.5611 2811452.6146 -1419.9673 -1444.5611

OPENING EXTENDED SYSTEM TRAJECTORY FILE
<this is where it locks>

So NAMD seems to want about 300 MB of memory (the log peaks at
roughly 330 MB in startup phases 4 and 5), and each compute node on
the cluster I am submitting to has 16 GB. I tried submitting the job
(we use LSF here) with:

#BSUB -R "rusage[Scratch=1:Memory=800]"

But that didn't work, so I'm checking with the sysadmins at the
supercomputing site. I will report back to namd-l if it turns out
to be a memory error.
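
For anyone who wants to check the memory figures in a log like the one
above, here is a minimal sketch that pulls the "Entering startup phase"
lines out of the log text and reports the peak. The function name and
the sample string are only illustrative and are not part of NAMD; the
sample lines are copied from the log above.

```python
import re

# Matches lines such as:
#   Info: Entering startup phase 4 with 330320 kB of memory in use.
PHASE_RE = re.compile(
    r"Entering startup phase (\d+) with (\d+) kB of memory in use"
)


def startup_memory_kb(log_text):
    """Return {phase_number: memory_in_kB} for every startup-phase line."""
    return {int(phase): int(kb) for phase, kb in PHASE_RE.findall(log_text)}


# A few lines taken from the log in this message.
sample = """\
Info: Entering startup phase 3 with 295499 kB of memory in use.
Info: Entering startup phase 4 with 330320 kB of memory in use.
Info: Entering startup phase 5 with 330332 kB of memory in use.
Info: Entering startup phase 6 with 298704 kB of memory in use.
"""

usage = startup_memory_kb(sample)
peak_phase = max(usage, key=usage.get)  # phase with the largest value
print(f"peak: phase {peak_phase}, {usage[peak_phase] / 1024:.1f} MiB")
```

To run it on a real log, read the file into a string and pass that to
startup_memory_kb() in place of the sample.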

On Aug 6, 2007, at 10:16 AM, Cesar Luis Avila wrote:

> Are you running on a single machine? How large is the simulated
> system? Perhaps you are running out of memory?
>
> Jim Pfaendtner wrote:
>> Hi,
>>
>> NAMD is hanging at the start of a job. The last message in the
>> log is "OPENING EXTENDED SYSTEM TRAJECTORY FILE", at the first
>> time step. The hang does not occur when I use "PME off". However,
>> I need to have PME on, so I have to figure out what's going on.
>>
>> I am using an identical input file that I have previously used,
>> and I'm just changing the box size and PME grid size to fit the
>> system I'm using.
>>
>> Has anyone experienced this problem before? Or does anyone have
>> any suggestions on how to fix this?
>>
>> thank you,
>>
>> Jim
>>
>>
>
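
The original message mentions adjusting the PME grid size to fit a new
box. As a rough sketch only, and not a diagnosis of the hang, the
helper below picks for each box length the smallest grid dimension
that keeps the spacing at or below a target and whose only prime
factors are 2, 3 and 5, which is the usual guidance for the
PMEGridSizeX/Y/Z options. The function names, the 1.0 Angstrom target
spacing and the box lengths are assumptions made for illustration;
they are not taken from the job in this thread.

```python
import math


def has_only_small_factors(n, primes=(2, 3, 5)):
    """True if n can be written as a product of the given primes only."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def pme_grid_size(box_length, max_spacing=1.0):
    """Smallest grid dimension with spacing <= max_spacing and only
    2, 3 and 5 as prime factors. Lengths are in Angstroms."""
    n = max(1, math.ceil(box_length / max_spacing))
    while not has_only_small_factors(n):
        n += 1
    return n


# Hypothetical box lengths, NOT the dimensions of the system above.
box = (150.0, 112.5, 166.0)
for axis, length in zip("XYZ", box):
    print(f"PMEGridSize{axis} {pme_grid_size(length)}")
```

The printed lines have the form used in a NAMD configuration file, so
they can be compared directly against the values already set there.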
