Re: linux cluster trouble

From: Brian Bennion (brian_at_youkai.llnl.gov)
Date: Wed Nov 10 2004 - 11:26:36 CST

Hello,
Just as a test, try compiling Charm++ from CVS, then run the megatests on
one tower, then two towers, and so on.
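
A rough sketch of that escalation test, under stated assumptions. The script
only writes charmrun nodelist files and prints the commands to try; it does
not launch anything. The host names (compute-0-0, compute-0-1), the count of
two processes per node, and the megatest binary name ./pgm are all
placeholders, so substitute whatever your cluster and Charm++ build use.

```shell
#!/bin/sh
# Hypothetical host names; replace with the cluster's real node names.
NODES="compute-0-0 compute-0-1"

# Write a charmrun nodelist file containing the first N hosts.
make_nodelist() {
    n=$1
    out="nodelist.$n"
    echo "group main" > "$out"
    i=0
    for h in $NODES; do
        [ "$i" -ge "$n" ] && break
        echo "host $h" >> "$out"
        i=$((i + 1))
    done
}

# One node first, then two; assumes two processes per dual-processor node.
for n in 1 2; do
    make_nodelist "$n"
    # Print the command instead of running it (./pgm is an assumed binary name).
    echo "charmrun +p$((n * 2)) ++nodelist nodelist.$n ./pgm"
done
```

If the one-node run passes and the two-node run fails in the same way as
NAMD does, that points at the network or remote-shell setup rather than at
the NAMD binary.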

brian

On Tue, 9 Nov 2004, Mounir Tarek wrote:

> Thanks, Gengbin.
> I am not sure that's the case;
>
> NAMD runs on any compute node.
>
> charmrun (NAMD's Charm++ launcher) runs on the master dual-processor node,
> and on any other dual-processor node, as long as I use a single tower
>
> NAMD_2.5_Linux-i686-TCP/charmrun +p 2 ..... ok runs on node 1 or node 4 or
> ...
>
> Now, if I use more than a single tower (node), i.e. 3 processors and
> above, I get the following:
>
> .........................
> .........................
> after initial info
> the energies are calculated ...
>
> then
> I have the following message
>
> Charmrun: error on request socket--
> Socket closed before recv......
>
> ________________________________________________
> During this I see the processes running namd2; initialization is done, ....
>
> Why would recompiling fix this?
>
>
> thanks
>
>
>
> Quoting Gengbin Zheng <gzheng_at_ks.uiuc.edu>:
>
> >
> > Hi,
> >
> > The error says that one of the processes on a compute node died and
> > charmrun detected it. I guess the binary may not be fully compatible with
> > your machines. Your best bet is to compile NAMD from source.
> >
> > Gengbin
> >
> > Mounir Tarek wrote:
> >
> > >Hi
> > >
> > >
> > >I have a similar problem that I submitted to the list a couple of weeks
> > >ago.
> > >I have set up my cluster using the NPACI Rocks distribution.
> > >
> > >I have not solved it yet,
> > >though at some point I thought it was the rsh protocol.
> > >
> > >I have got it to run on 4 processors past the first iteration once in a
> > >while (at random); the results are not reproducible.
> > >
> > >If you have any suggestions, please don't hesitate.
> > >
> > >
> > >
> > >Quoting mpappala_at_unict.it:
> > >
> > >
> > >
> > >>Hi,
> > >>I built a small Linux cluster (Red Hat 9.0) configured as follows:
> > >>
> > >>Pentium 4 2.8 GHz, 1 GB RAM, connected to each other through an
> > >>HP ProCurve 2724 switch (gigabit)
> > >>
> > >>I use NAMD 2.5, version Linux-i686-TCP; other versions do not work.
> > >>
> > >>When I start my simulation on 2 processors, it works well. If I increase
> > >>the number of processors to 4, 6, 8, etc., the program generates the
> > >>following error message:
> > >>
> > >>Charmrun: error on request socket--
> > >>Socket closed before recv.
> > >>
> > >>This message appears sooner if I increase the number of processors.
> > >>
> > >>Could someone help me?
> > >>
> > >>greetings
> > >>
> > >>Matteo Pappalardo Ph.D.
> > >>Biophysical Chemistry Lab.
> > >>Department of Chemical Science
> > >>University of Catania
> > >>Viale A.Doria 6, 95125 Catania
> > >>ITALY
> > >>Tel. +39-95-7385204
> > >>Fax +39-95-580138
> > >>Email:mpappala_at_dipchi.unict.it
> > >>
> > >>
> > >>
> > >>----------------------------------------------------------------
> > >>Università di Catania - C.E.A.
> > >>Servizio di Posta Elettronica
> > >>http://www.cea.unict.it
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
>
>
> --
> Mounir Tarek
> Equipe de dynamique des assemblages membranaires
> Unité Mixte de Recherches CNRS UHP 7565
> Université Henri-Poincaré, Nancy I
> BP 239, 54506 Vandoeuvre-lès-Nancy, cedex France
> tel: (33) 3 83 68 40 95
> Fax: (33) 3 83 68 43 87
>

*****************************************************************
**Brian Bennion, Ph.D. **
**Computational and Systems Biology Division **
**Biology and Biotechnology Research Program **
**Lawrence Livermore National Laboratory **
**P.O. Box 808, L-448 bennion1_at_llnl.gov **
**7000 East Avenue phone: (925) 422-5722 **
**Livermore, CA 94550 fax: (925) 424-6605 **
*****************************************************************
