Premature stop

From: paco ty (typaco_at_inbox.com)
Date: Thu Mar 08 2007 - 03:57:15 CST

Hi fellows,
I need some advice from friends with cluster experience.
I set up a 500,000-step simulation of an 18,000-atom system
in order to test a 2-node cluster. Short simulations run
perfectly and scale well, but the 500,000-step simulation stopped
prematurely at step 135,300. The non-TCP version stops at the first
or second step.
Another thing that worries me is that my expensive
3Com gigabit switch performs no better than its small, cheap
10/100 sibling. Judging by the LED color on the switch, my
network adapter seems to be running at 1000 Mbit/s. I don't know
any other way to test the link rate.
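One rough way to check the link rate without relying on the switch LEDs is to time a bulk TCP transfer between the two nodes. The following is a minimal sketch, assuming Python 3.9+ is installed on both machines. The script name `nettest.py`, the port number and the host names are hypothetical placeholders. It measures application-level throughput, not the negotiated link speed (on Linux, `ethtool` on the interface reports the latter), and it does not reproduce how charmrun/NAMD actually communicate.

```python
import socket
import sys
import time

CHUNK = 64 * 1024                 # bytes per send/recv call
TOTAL = 500 * 1024 * 1024         # 500 MiB test transfer (arbitrary choice)


def serve_once(srv: socket.socket, total_bytes: int) -> int:
    """Accept one connection, read total_bytes, then send a 1-byte ack."""
    conn, _ = srv.accept()
    received = 0
    with conn:
        while received < total_bytes:
            data = conn.recv(CHUNK)
            if not data:          # sender closed early
                break
            received += len(data)
        conn.sendall(b"k")        # tell the sender everything arrived
    return received


def send_and_time(host: str, port: int, total_bytes: int) -> float:
    """Send total_bytes to host:port and return the throughput in Mbit/s."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as sock:
        start = time.perf_counter()
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
        sock.recv(1)              # wait for the ack so buffered data is counted
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6


def main(argv: list[str]) -> None:
    if len(argv) == 3 and argv[1] == "recv":
        with socket.create_server(("", int(argv[2]))) as srv:
            print(f"received {serve_once(srv, TOTAL)} bytes")
    elif len(argv) == 4 and argv[1] == "send":
        rate = send_and_time(argv[2], int(argv[3]), TOTAL)
        print(f"{rate:.1f} Mbit/s")
    else:
        print("usage: nettest.py recv PORT | nettest.py send HOST PORT")


if __name__ == "__main__":
    main(sys.argv)
```

For example, run `python3 nettest.py recv 5001` on the second node, then `python3 nettest.py send node2 5001` on the first. A 100 Mbit/s link cannot report much more than 100 Mbit/s here, so a result well above that suggests the adapters and switch really are running at gigabit speed.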

I run namd2 with "./charmrun namd2 +p2 configfile.txt > logfile.txt"

Here is my machine configuration:

OS: Scientific Linux 4.4 i386 (kernel 2.6.9-42.0.3-ELsmp)
Software: NAMD2_2.6_Linux-i686-TCP (precompiled)
rsh as root works with no password
CPU: Intel(R) Pentium(R) 4 CPU 3.8 GHz
RAM: 1 G
Network adapter: Linksys EG1032 (10/100/1000)

I have attached the first part of the log file.

Well, does anybody know the reason for such behaviour?

Thank you in advance

Georgios


This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:44:27 CST