cluster node freezes while running namd 2.5/2.5b1

From: Richard Brown (richardlbj_at_yahoo.com)
Date: Sat Oct 18 2003 - 22:48:30 CDT

I have been trying to figure this out for the past two
months with no luck.

I have an 8-node PC cluster that consists of 16 Athlon
MP 2200+ CPUs, MSI K7D Master-L motherboards, Intel
i82557/i82558 10/100 on-board LAN, 500 MB Kingston
DDR266 PC2100 unbuffered memory, and a 3Com SuperStack
III Baseline 24-port 10/100 switch.

The cluster was built using OSCAR 2.1/Red Hat 7.3 with
the kernel update 2.4.20-20. The NAMD versions used are
2.5b1 and the latest 2.5, both as Linux binary
distributions and as source code builds. The simulation
tested is the apoa1 benchmark example.

NAMD/apoa1 runs without problems only on a single
cluster node, with either one or two CPUs. Every time
it runs on two or more nodes, using either one or two
CPUs from each node, NAMD/apoa1 stops somewhere in the
middle of the run. One of the nodes freezes and does
not respond to ping, ssh or the directly attached
keyboard. Most of the time there were no error
messages. A few times I received an APIC error or a
socket receive failure. I tried plugging a PS/2 mouse
into the nodes, as some people suggested as a
workaround for a motherboard bug, but it did not help.

I don't know how to proceed from here. Any suggestions
would be appreciated.

Thanks,
Richard

