NAMD2.6 appears to be freezing at or during load balancing

From: Jonathan Bourne (jwb268_at_gmail.com)
Date: Thu Nov 06 2008 - 12:33:23 CST

Hello All,
     I'm following up on an e-mail I sent to the list on 11/1/2008
at about 4 PM (EST). Our lab recently added a custom-built desktop for
dedicated NAMD simulations. The PC has an Intel Core 2 Quad Processor
Q6600 @ 2.40 GHz, Kingston 2 GB RAM, Asus P5N-T Deluxe motherboard,
aftermarket cooling, Windows XP Pro (SP3), and is running NAMD 2.6
i686 version from www.ks.uiuc.edu/Research/namd/
     Using the new desktop described above, I am running a simulation
that equilibrated successfully on another computer. The simulation
proceeds normally for a variable length of time (minutes to hours), and
then NAMD spontaneously hangs the whole machine (the system clock stops
and mouse input fails to register). No error message or system error
log is generated by NAMD or Windows, and the computer has to be hard
restarted. This occurs identically whether NAMD is run alone or with
Charmrun. I also booted a Kubuntu 8.04.1 LiveCD and re-ran the
simulation under Linux (NAMD_Linux-amd64 port, also from ks.uiuc.edu)
and got an identical spontaneous hang, so the freeze happens with both
the Linux and Windows XP NAMD ports.
     After carefully going through the log files, it looks like the
hang occurs at the start of load balancing. The last entries in the
log before everything hangs are:
     WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 6000
     WRITING COORDINATES TO DCD FILE AT STEP 6000
     WRITING COORDINATES TO RESTART FILE AT STEP 6000
     FINISHED WRITING RESTART COORDINATES
     WRITING VELOCITIES TO RESTART FILE AT STEP 6000
     FINISHED WRITING RESTART VELOCITIES

     This led me to suspect that something related to load balancing
was causing the freeze. By increasing ldbPeriod from 2,000 steps to
50,000 steps, I was able to complete 1 ns of equilibration of
ubq_ws_eq, which previously could not be achieved. Because of this, I
now strongly suspect that the system hang is somehow related to load
balancing.
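
The reasoning above can be sketched in a few lines of Python. This is
only a rough heuristic, not how NAMD itself schedules load balancing:
it assumes the hang step can be read from the last "AT STEP n" line in
the log and simply tests whether that step is a multiple of the
load-balancing period. The function names are hypothetical, and NAMD's
real schedule depends on other settings as well.

```python
import re

# Matches NAMD log lines such as
# "WRITING COORDINATES TO RESTART FILE AT STEP 6000".
STEP_RE = re.compile(r"AT STEP (\d+)")


def last_logged_step(log_lines):
    """Return the step number from the last 'AT STEP n' line, or None."""
    last = None
    for line in log_lines:
        match = STEP_RE.search(line)
        if match:
            last = int(match.group(1))
    return last


def coincides_with_period(step, period):
    """True if the step is an exact multiple of the given period (heuristic)."""
    return step is not None and step % period == 0


# Tail of the log quoted in this message.
tail = [
    "WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 6000",
    "WRITING COORDINATES TO DCD FILE AT STEP 6000",
    "WRITING COORDINATES TO RESTART FILE AT STEP 6000",
    "FINISHED WRITING RESTART COORDINATES",
    "WRITING VELOCITIES TO RESTART FILE AT STEP 6000",
    "FINISHED WRITING RESTART VELOCITIES",
]

step = last_logged_step(tail)
# Compare against the old period (2000) and the new one (50000).
print(step, coincides_with_period(step, 2000), coincides_with_period(step, 50000))
# prints: 6000 True False
```

With the original 2,000-step period the last logged step lines up with
a period boundary; with 50,000 steps it does not, which is consistent
with (but does not prove) the load-balancing suspicion.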

My questions are:
1) Beyond the performance penalty I seem to be incurring, does
increasing ldbPeriod in some way compromise my simulations?
2) Does anyone know why load balancing may be causing my computer to hang?
3) Are there any other suggested workarounds or fixes for this type
of problem beyond changing ldbPeriod?
4) Does anyone have NAMD 2.6 working on a Core 2 Quad system?

Thank you for your time and consideration.

Sincerely,
Jonathan (jwb268(at)gmail.com)

Graduate Student
Physiology, Biophysics, and Systems Biology Program
Weill Graduate School of Medical Sciences
Cornell University

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:50:03 CST