Re: Rif: Re: Rif: Re: Rif: Re: linux cluster trouble

From: Brian Bennion (brian_at_youkai.llnl.gov)
Date: Thu Nov 18 2004 - 10:56:17 CST

Hello Matteo

I am out of ideas here. It might be something really simple that I am
missing.
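
One simple thing that may be worth ruling out, offered as a sketch and not
a diagnosis: "setenv" is csh syntax, so a line such as "setenv CONV_RSH ssh"
in a .bashrc does not set the variable under bash. Since charmrun reports
that the node programs started and connected, this may well not be the
cause. The helper names below (list_hosts, check_hosts) are made up for
illustration, and the nodelist layout ("host NAME" lines) is assumed to
follow notes.txt.

```shell
# bash equivalent of the csh line "setenv CONV_RSH ssh"
export CONV_RSH=ssh

# Print the host names from a Charm++ nodelist file.
# Assumes the notes.txt layout: a "group main" line, then "host NAME" lines.
list_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

# Hypothetical helper: try a non-interactive ssh login on every host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_hosts() {
    status=0
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            status=1
        fi
    done
    return "$status"
}

# Example: check_hosts ./nodelist
```

Running check_hosts from the same shell that launches charmrun should show
whether every node accepts a password-free login.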

Jim, Gengbin, Sameer, any ideas?

Brian

On Thu, 18 Nov 2004, max wrote:

> Hello Brian,
>
> Yes, I see pgm running on the other three machines.
> Enclosed you will find the NAMD output;
> the bash output is:
>
> /Linux-i686-TCP-icc/namd2 /home/matteo/NAMD_2
> 5_Source/Linux-i686-TCP-icc/PrP.namd > namd.out
> Charmrun> charmrun started...
> Charmrun> using ./nodelist as nodesfile
> Charmrun> rsh (ctcfgr6:0d) started
> Charmrun> rsh (ctcfgr10:1d) started
> Charmrun> rsh (ctcfgr11:2d) started
> Charmrun> rsh (ctcfgr9:3d) started
> Charmrun> node programs all started
> Charmrun> node programs all connected
> Charmrun: error on request socket--
> Socket closed before recv.
> [matteo_at_ctcfgr6 megatest]$
>
> I cannot use rsh, because Red Hat 9.0 disables it by default, so I use ssh
> instead; I can ssh to each node without a password.
> As suggested in notes.txt, I inserted "setenv CONV_RSH ssh" in my .bashrc.
>
>
>
> I tried running the command strace charmrun ..... etc. It shows a lack of
> send and receive data before NAMD stops.
>
> matteo
>
> -------Original Message-------
>
> From: Brian Bennion
> Date: 11/18/04 08:31:54
> To: max
> Subject: Re: Rif: Re: Rif: namd-l: Re: linux cluster trouble
>
> Hi Matteo
>
> Okay, things seem fine here.
> Try this:
> ../charmrun +p4 ++verbose /pathtonamd2/ namd.configfile
> Replace /pathtonamd2/ and namd.configfile with the real paths and names,
> and tell me what happens.
>
> Can you rsh into each node without a password?
>
> Can you see pgm running on the other three machines during your pgm
> tests?
>
>
> On Thu, 18 Nov 2004, max wrote:
>
> > Hi Brian,
> >
> > I used exactly the commands given in notes.txt, which you can find on the
> > NAMD site:
> >
> > ./charmrun ++local +p1 ./pgm
> > first, and then
> >
> > ./charmrun ++p 4 ++verbose ./pgm
> > with the nodelist file in the ./ directory.
> >
> > These two tests show strange results:
> > the 1-processor test finished in about 0.23 s;
> > the 4-processor test finished in about 3.2 min.
> > Is this useful?
> >
> > My cluster is connected through a 10/100/1000 Ethernet switch, an HP
> > ProCurve 2724 (J4897A); each PC is equipped with a 3Com gigabit card.
> >
> > Yes, the linux-tcp version should be net-linux-tcp; in any case, I
> > compiled net-linux-tcp-icc.
> >
> > Thanks for the help; this problem is driving me crazy.
> >
> >
> > matteo
> >
> >
> > -------Original Message-------
> >
> > From: Brian Bennion
> > Date: 11/17/04 19:56:22
> > To: max
> > Subject: Re: Rif: namd-l: Re: linux cluster trouble
> >
> >
> > Hi Matteo,
> >
> > Thanks for the info. Could you also share the exact commands used to
> > launch the megatest and your namd jobs?
> > Is your cluster connected by a special switch or vanilla 100 Mbit
> > Ethernet, etc.?
> >
> > Finally, you state below that you downloaded the linux-tcp version; is
> > this the net-linux-tcp version?
> >
> > Regards
> > Brian
> >
> > On Wed, 17 Nov 2004, max wrote:
> >
> > > Hello,
> > > I downloaded NAMD 2.5 from the NAMD home page, both the linux-tcp
> > > version and the source code.
> > > My cluster runs Red Hat Linux 9.0.
> > > I launched the simulation on two single PC processors, and the
> > > simulation is OK;
> > > conversely, when I launched the same simulation on three, four, or more
> > > processors, NAMD stopped.
> > > I tried the charmrun pgm test on four machines and it is OK.
> > >
> > > I look forward to any further suggestions.
> > > Bye
> > >
> > > matteo
> > >
> > > -------Original Message-------
> > >
> > > From: Brian Bennion
> > > Date: 11/16/04 19:00:37
> > > To: max
> > > Cc: namd-l_at_ks.uiuc.edu
> > > Subject: namd-l: Re: linux cluster trouble
> > >
> > > Hello Matteo
> > >
> > > Can you give more details about your setup?
> > > Are you running on more than one machine?
> > > What operating system do you have?
> > > Which version of charm++ and NAMD are you using?
> > >
> > > The only help I can suggest based on the info below is that you are
> > > trying to run on more CPUs than you have available.
> > >
> > >
> > > Regards
> > > Brian
> > >
> > > On Tue, 16 Nov 2004, max wrote:
> > >
> > > > Hi,
> > > > I tried compiling NAMD on my PC, but the result is the same:
> > > > Charmrun: error on request socket--
> > > > Socket closed before recv.
> > > >
> > > >
> > > > Any suggestions?
> > > >
> > > > matteo pappalardo
> > > >
> > > >
> > > >
> > > >
> > >
> > > *****************************************************************
> > > **Brian Bennion, Ph.D. **
> > > **Computational and Systems Biology Division **
> > > **Biology and Biotechnology Research Program **
> > > **Lawrence Livermore National Laboratory **
> > > **P.O. Box 808, L-448 bennion1_at_llnl.gov **
> > > **7000 East Avenue phone: (925) 422-5722 **
> > > **Livermore, CA 94550 fax: (925) 424-6605 **
> > > *****************************************************************
> > >
> >
> >
>
>

*****************************************************************
**Brian Bennion, Ph.D. **
**Computational and Systems Biology Division **
**Biology and Biotechnology Research Program **
**Lawrence Livermore National Laboratory **
**P.O. Box 808, L-448 bennion1_at_llnl.gov **
**7000 East Avenue phone: (925) 422-5722 **
**Livermore, CA 94550 fax: (925) 424-6605 **
*****************************************************************
