Re: hanging at startup phase

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Fri May 21 2004 - 00:59:55 CDT

Hi,

Have you tried running on only one node but with 2 processes (using the
++local charmrun option)?

Step 0 only creates some internal data structures for communication,
followed by quiescence detection. I suspect it hangs at the quiescence
detection, where the parallel system waits for all processors to
finish processing all messages.

The "Info: REMOVING COM VELOCITY -0.0259478 -0.0273245 -0.014764" printout
should appear in step 3. Each startup phase (step) is followed by quiescence
detection. I suspect that quiescence detection is somehow failing on your
machine, but I don't know why.
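To illustrate why a startup phase can hang at quiescence detection, here is a toy model of counter-based quiescence detection. It is a sketch under stated assumptions, not Charm++'s actual QD implementation: processors count messages sent and received, and quiescence is declared only when the global totals match and stay unchanged across two consecutive checks. The `lose` parameter is a hypothetical stand-in for a message that never arrives (for example, because of a network problem between nodes). Such a message keeps "sent" ahead of "received" forever, so the phase never completes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    src: int
    dst: int
    hops_left: int  # how many follow-up messages this one triggers


@dataclass
class Processor:
    pe: int
    sent: int = 0
    received: int = 0
    queue: deque = field(default_factory=deque)


def run_phase(num_pes, initial, lose=frozenset(), max_rounds=100):
    """Toy model: one startup phase followed by counter-based QD.

    initial: Messages injected at the start of the phase.
    lose: hypothetical indices (in global send order) of messages that
          are counted as sent but never delivered.
    Returns the round at which quiescence is detected, or None if it
    never is within max_rounds (i.e. the phase would hang).
    """
    pes = [Processor(i) for i in range(num_pes)]
    in_flight = deque()
    send_count = 0

    def send(msg):
        nonlocal send_count
        pes[msg.src].sent += 1          # sender always counts the send
        if send_count not in lose:      # a lost message is never delivered
            in_flight.append(msg)
        send_count += 1

    for m in initial:
        send(m)

    prev = None
    for rnd in range(max_rounds):
        # The network delivers everything currently in flight.
        while in_flight:
            m = in_flight.popleft()
            pes[m.dst].queue.append(m)
        # Each processor handles its queued messages, possibly sending more.
        for p in pes:
            while p.queue:
                m = p.queue.popleft()
                p.received += 1
                if m.hops_left > 0:
                    send(Message(p.pe, (p.pe + 1) % num_pes, m.hops_left - 1))
        # QD check: sum the counters over all processors.
        snapshot = (sum(p.sent for p in pes), sum(p.received for p in pes))
        # Quiescent only if sent == received and nothing changed since
        # the previous check.
        if snapshot[0] == snapshot[1] and snapshot == prev:
            return rnd
        prev = snapshot
    return None
```

For example, `run_phase(2, [Message(0, 1, 2)])` returns 3, while `run_phase(2, [Message(0, 1, 2)], lose={1})` returns None: one undelivered message is enough to keep quiescence from ever being declared, which matches the symptom of processes sleeping on the other nodes.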

BTW, can other people using the same cluster run NAMD without problems?

Gengbin

On Wed, 19 May 2004, allison wrote:

> Hello,
> I am fairly new to using NAMD. I am trying to run it on a cluster running
> Debian where each node has 2 processors. If I launch two processes on one
> node using charmrun everything works fine. If I try to use multiple nodes
> everything works fine until:
> Info: Entering startup phase 0 with 22577 kB of memory in use
>
> If I just try to use two nodes it gets further into the startup phase with:
> Info: REMOVING COM VELOCITY -0.0259478 -0.0273245 -0.014764
>
> Both times it just hangs at those lines. I haven't let it run for more
> than about an hour before killing it. One of the nodes shows two NAMD
> processes running full tilt, but all of the other spawned processes on the
> other nodes are sleeping. I asked a few people I know who have run similar
> NAMD simulations, and they said they had never seen it take this long.
>
> Any idea what's going on? Or a way I could get more information about
> what it's trying to do at these points?
>
> Thank you
>
> Allison Heath
>
