socket closed error

From: brady chang (brady_chang_at_hotmail.com)
Date: Fri Mar 25 2005 - 14:54:01 CST

Hi all, I'm having a very peculiar problem with NAMD.

I was wondering if anybody has seen this?

Platform: Rocks 3.3
dual Xeon; ASUS PRDL533 motherboard.

command:
#!/bin/csh -f

setenv CONV_RSH ssh

~~/apps/NAMD/NAMD_2.5_Linux-i686-TCP/charmrun \
    ~~/apps/NAMD/NAMD_2.5_Linux-i686-TCP/namd2 +p26 ++verbose \
    ++nodelist ./.nodelist md_1ns.inp >logmd

After running for ~12 hours I get:

Charmrun: error on request socket--
Socket closed before recv.

and it brought the node down.

I modified the run to exclude the downed node from my .nodelist.
Then, after running for ~4 hours, I got the same error and it brought down
another node.
So I'm running it again, excluding the downed nodes.
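
For anyone who wants to script the exclusion step rather than edit the file by hand, here is a rough Python sketch. It assumes the nodelist uses the usual charmrun layout of a "group main" line followed by "host <name>" lines; my actual .nodelist is not shown here, and the function name drop_hosts and the host names are made up for illustration.

```python
def drop_hosts(nodelist_text, downed):
    """Return the nodelist text without 'host' lines that name a downed node.

    Assumes each host entry looks like: host <name> [options]
    Every other line (e.g. 'group main', blank lines) is kept unchanged.
    """
    kept = []
    for line in nodelist_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "host" and fields[1] in downed:
            continue  # skip the entry for a node known to be down
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical nodelist contents and host names, for illustration only.
    example = (
        "group main\n"
        "host compute-0-0\n"
        "host compute-0-1\n"
        "host compute-0-2\n"
    )
    print(drop_hosts(example, {"compute-0-1"}), end="")
```

Note that the +p value on the command line would probably also need to be lowered to match the processors still available once hosts are removed.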

Temperature is normal and load is average. I'm not seeing anything that could
cause a node to go down.
