"Timeout waiting for node-program to connect" error, only sometimes

From: Himanshu Khandelia (hkhandel_at_memphys.sdu.dk)
Date: Wed May 09 2007 - 02:58:29 CDT

Hello,

I have been running NAMD (NAMD_2.6_Linux-i686) on our Linux cluster fairly
successfully. However, once in a while, NAMD crashes with the following
error:

#########
Charmrun> error 22 attaching to node:
Timeout waiting for node-program to connect
#########

The hardware is Dell PowerEdge 2950, 2x 2.66 GHz Intel Woodcrest CPUs, 8 GB
RAM. So that is 4 CPU cores per node. The jobs in question run on 8 nodes (32
cores).

A more detailed overview of the cluster hardware is available here:
http://www.dcsc.sdu.dk/overview.php.

Are there any hints as to why this might be happening?

Thank you,

-Himanshu

----------------------------
Himanshu Khandelia, PhD
Research Assistant Professor (Postdoc)
MEMPHYS, Center for Membrane Physics: www.memphys.sdu.dk
University of Southern Denmark (SDU)
Odense M 5230, Denmark

Phone: +4565503510
email: hkhandel_at_memphys.sdu.dk
WWW: www.memphys.sdu.dk/~hkhandel
-----------------------------
