Re: linux cluster trouble

From: Mounir Tarek (Mounir.Tarek_at_edam.uhp-nancy.fr)
Date: Tue Nov 09 2004 - 08:43:03 CST

Thanks, Gengbin.
I am not sure that's the case.

NAMD runs on any compute node.

Charmrun (NAMD/Charm++) runs on the master dual-processor node, and on any
other dual-processor node, as long as I use a single tower.

NAMD_2.5_Linux-i686-TCP/charmrun +p 2 ..... ok runs on node 1 or node 4 or
...

Now, if I use more than a single tower (node), i.e. 3 procs and above, I get
the following:

.........................
.........................
After the initial info,
the energies are calculated ...

then
I get the following message:

Charmrun: error on request socket--
Socket closed before recv......

________________________________________________
During this, I can see the procs running namd2; initialization is done, ....

Why would recompiling fix this?

thanks
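
Since every single node works but multi-node runs fail, one way to narrow this down might be to test each pair of nodes separately. The following is a minimal sketch, not a definitive procedure. It assumes the usual charmrun nodelist layout (a "group main" line followed by one "host" line per machine) and the ++nodelist option; the host names, the nodelist file names and the <config> placeholder are hypothetical, and it assumes charmrun places one process on each listed host when +p2 is used with two hosts.

```python
from itertools import combinations


def nodelist_text(hosts):
    """Return the text of a charmrun nodelist file listing the given hosts."""
    lines = ["group main"]
    lines += ["host " + h for h in hosts]
    return "\n".join(lines) + "\n"


def pairwise_nodelists(hosts):
    """Yield (pair, nodelist text) for every unordered pair of hosts."""
    for pair in combinations(hosts, 2):
        yield pair, nodelist_text(pair)


if __name__ == "__main__":
    # Hypothetical host names; replace with the names of your compute nodes.
    hosts = ["node1", "node2", "node3", "node4"]
    for pair, text in pairwise_nodelists(hosts):
        filename = "nodelist." + "-".join(pair)  # hypothetical file name
        print("# contents of " + filename)
        print(text, end="")
        # Command to try by hand once the file above has been saved.
        print("charmrun +p2 ++nodelist " + filename + " namd2 <config>")
        print()
```

If only the pairs that include one particular node fail, that would point at that machine or its network link rather than at the binary.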

Quoting Gengbin Zheng <gzheng_at_ks.uiuc.edu>:

>
> Hi,
>
> The error says that one of the processes on a compute node died and charmrun
> detected it. I guess the binary may not be fully compatible with your
> machines. Your best bet is to compile NAMD from source.
>
> Gengbin
>
> Mounir Tarek wrote:
>
> >Hi
> >
> >
> >I have a similar problem, which I submitted to the list a couple of weeks
> >ago.
> >I have set up my cluster using the ROCKS NPACI distribution.
> >
> >I have not solved it yet;
> >at some point I thought it was the rsh protocol.
> >
> >I have gotten it to run on 4 procs past the first iteration once in a while
> >(at random); the results are not reproducible.
> >
> >If you have any suggestions, please don't hesitate.
> >
> >
> >
> >Quoting mpappala_at_unict.it:
> >
> >
> >
> >>Hi,
> >>I built a small Linux cluster (Red Hat 9.0) configured as follows:
> >>
> >>Pentium IV 2.8 GHz, 1 GB RAM, connected to each other through an
> >>HP ProCurve 2724 switch (gigabit)
> >>
> >>I use NAMD 2.5, version Linux-i686-TCP; the other versions do not work.
> >>
> >>When I start my simulation on 2 processors, it works well. If I increase
> >>the number of processors to 4, 6, 8, etc., the program generates the
> >>following error message:
> >>
> >>Charmrun: error on request socket--
> >>Socket closed before recv.
> >>
> >>This message appears sooner if I increase the number of processors.
> >>
> >>Could someone help me?
> >>
> >>greetings
> >>
> >>Matteo Pappalardo Ph.D.
> >>Biophysical Chemistry Lab.
> >>Department of Chemical Science
> >>University of Catania
> >>Viale A.Doria 6, 95125 Catania
> >>ITALY
> >>Tel. +39-95-7385204
> >>Fax +39-95-580138
> >>Email:mpappala_at_dipchi.unict.it
> >>
> >>
> >>
> >>----------------------------------------------------------------
> >>Università di Catania - C.E.A.
> >>Servizio di Posta Elettronica
> >>http://www.cea.unict.it
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>

-- 
Mounir Tarek
Equipe de dynamique des assemblages membranaires
Unité Mixte de Recherches CNRS UHP  7565
Université Henri-Poincaré, Nancy I
BP 239, 54506 Vandoeuvre-lès-Nancy, cedex France
tel: (33) 3 83 68 40 95 
Fax: (33) 3 83 68 43 87  
