Re: linux cluster trouble

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Tue Nov 09 2004 - 08:19:10 CST

Hi,

  The error says that one of the processes on a compute node died and
charmrun detected it. My guess is that the prebuilt binary is not fully
compatible with your machines. Your best option is to compile NAMD from
source.
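
For what it is worth, "Socket closed before recv" is the kind of message a launcher reports when the process at the other end of a connection exits before sending its reply. The sketch below only illustrates that mechanism in Python; it is not charmrun's actual code. The names `worker`, `wait_for_reply`, and `run` are hypothetical, and a local socket pair stands in for the TCP link between nodes.

```python
import socket
import threading


def worker(conn: socket.socket, crash: bool) -> None:
    """Hypothetical compute-node process: replies, or dies without replying."""
    try:
        if not crash:
            conn.sendall(b"done")
    finally:
        # Closing without sending anything mimics a process that died.
        conn.close()


def wait_for_reply(conn: socket.socket) -> str:
    """Hypothetical launcher side: block until the worker answers."""
    data = conn.recv(4)
    if data == b"":
        # recv() returning zero bytes means the peer closed the connection.
        return "socket closed before recv"
    return data.decode()


def run(crash: bool) -> str:
    """Start one worker thread and report what the launcher side observed."""
    launcher_end, worker_end = socket.socketpair()
    t = threading.Thread(target=worker, args=(worker_end, crash))
    t.start()
    try:
        return wait_for_reply(launcher_end)
    finally:
        t.join()
        launcher_end.close()


if __name__ == "__main__":
    print(run(crash=False))  # worker replied normally
    print(run(crash=True))   # worker exited before replying
```

The launcher cannot tell why the worker went away; it only sees the closed socket, which is why the message alone does not identify the cause of the crash.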

Gengbin

Mounir Tarek wrote:

>Hi
>
>
>I have a similar problem, which I submitted to the list a couple of weeks
>ago.
>I set up my cluster using the ROCKS NPACI distribution.
>
>I have not solved it yet;
>at some point I thought the cause was the rsh protocol.
>
>Once in a while (at random) I have got it to run on 4 procs past the first
>iteration, but the results are not reproducible.
>
>If you have any suggestions, please don't hesitate to share them.
>
>
>
>mpappala_at_unict.it wrote:
>
>
>
>>Hi,
>>I built a small Linux cluster (Red Hat 9.0) configured as follows:
>>
>>Pentium IV 2.8 GHz, 1 GB RAM, connected to each other through an
>>HP ProCurve 2724 switch (gigabit)
>>
>>I use NAMD 2.5, the Linux-i686-TCP version; the other versions do not work.
>>
>>When I start my simulation on 2 processors, it works well. If I increase the
>>number of processors to 4, 6, 8, etc., the program generates the following
>>error message:
>>
>>Charmrun: error on request socket--
>>Socket closed before recv.
>>
>>This message appears sooner as I increase the number of processors.
>>
>>Could someone help me?
>>
>>Greetings,
>>
>>Matteo Pappalardo Ph.D.
>>Biophysical Chemistry Lab.
>>Department of Chemical Science
>>University of Catania
>>Viale A.Doria 6, 95125 Catania
>>ITALY
>>Tel. +39-95-7385204
>>Fax +39-95-580138
>>Email:mpappala_at_dipchi.unict.it
>>
>>
>>
>>----------------------------------------------------------------
>>Università di Catania - C.E.A.
>>Servizio di Posta Elettronica
>>http://www.cea.unict.it
