From: brady chang (brady_chang_at_hotmail.com)
Date: Fri Mar 25 2005 - 14:54:01 CST
Hi all, I'm having a very peculiar problem with NAMD.
I was wondering if anybody has seen this?
Platform: Rocks 3.3
Hardware: dual Xeon; ASUS PRDL533 motherboard.
command:
#!/bin/csh -f
setenv CONV_RSH ssh
~~/apps/NAMD/NAMD_2.5_Linux-i686-TCP/charmrun \
    ~~/apps/NAMD/NAMD_2.5_Linux-i686-TCP/namd2 +p26 ++verbose \
    ++nodelist ./.nodelist md_1ns.inp >logmd
After running for ~12 hours I get:
Charmrun: error on request socket--
Socket closed before recv.
This brought the node down.
I modified my .nodelist to exclude the downed node.
Then, after running for ~4 hours, I got the same error, and it brought down
another node.
So I'm running it again, excluding the downed nodes.
Temperature is normal and load is average. I'm not seeing anything that could
cause the node to go down.
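
For reference, excluding a downed node from the nodelist can be sketched roughly as below. This assumes the usual charmrun nodelist layout (a "group main" line followed by one "host <name>" line per node); the helper name exclude_host and the host name compute-0-3 are hypothetical and not taken from the actual setup.

```shell
#!/bin/sh
# Sketch: print a charmrun nodelist with one host removed.
# Assumption: each node appears as a line "host <name> [options]";
# every other line (e.g. "group main") is passed through unchanged.
exclude_host() {
    # $1 = host name to drop (hypothetical, e.g. compute-0-3)
    # $2 = path to the nodelist file
    awk -v h="$1" '!($1 == "host" && $2 == h)' "$2"
}
```

It could be used as `exclude_host compute-0-3 ./.nodelist > ./.nodelist.new`, writing to a new file so the original list is kept for comparison.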