Re: scalability problem on linux cluster

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Wed Nov 12 2008 - 10:29:57 CST

You need to give namd a host list (as described in the release notes --
http://www.ks.uiuc.edu/Research/namd/2.6/notes.html) if you want it to
run on multiple networked machines (unless you're in a special
clustering environment) -- otherwise namd has no way of knowing where to
run. The fact that it says running on 4 processors indicates, as Giacomo
noted, that 4 processes have been spawned, but since you're running with
++local it is still only running on the head node.
Best,
Peter
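
A minimal sketch of the host list mentioned above, assuming the nodelist
format given in the NAMD 2.6 release notes (a "group main" line followed
by one "host" line per machine). The host names node01 and node02 are
placeholders, and make_nodelist is a hypothetical helper, not part of
NAMD or Charm++:

```shell
# Write a Charm++ nodelist to stdout from a space-separated host list.
# $1: host names, one entry per process slot (repeats are allowed).
make_nodelist() {
    echo "group main"
    # $1 is deliberately unquoted so the shell splits it on whitespace.
    for host in $1; do
        echo "host $host"
    done
}

# Placeholder host names, for illustration only.
make_nodelist "node01 node01 node02 node02" > nodelist
```

With that file in place, the launch line would drop ++local and pass the
file to charmrun, along the lines of
"charmrun ++nodelist nodelist +p4 namd2 apoa1.namd > apoa1.log".
Under LSF the hosts assigned to a job are normally exposed in the
LSB_HOSTS environment variable, so make_nodelist "$LSB_HOSTS" is one way
to fill in the list from inside a bsub job; check your site's setup
before relying on this.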

Ruchi Sachdeva wrote:
> Hi Giacomo,
>
> If I don't use ++local and run the job on 4 CPUs, I get the following
> error in the log file:
>
> connect to address 127.0.0.1: No route to host
> connect to address 127.0.0.1: No route to host
> connect to address 127.0.0.1: No route to host
> connect to address 127.0.0.1: No route to host
> trying normal rsh (/usr/bin/rsh)
> connect to address 127.0.0.1: No route to host
> trying normal rsh (/usr/bin/rsh)
> localhost.localdomain: No route to host
> Charmrun> Error 1 returned from rsh (localhost:0)
> No route to host
> localhost.localdomain: No route to host
>
> And with ++local, the log file mentions the number of processors on
> which I launch the job, like this:
>
> Info: Based on Charm++/Converse 50900 for net-linux-tcp-iccstatic
> Info: Built Wed Aug 30 13:00:33 CDT 2006 by jim on verdun.ks.uiuc.edu
> Info: 1 NAMD 2.6 Linux-i686-TCP 4 n98 rsachdeva
> Info: Running on 4 processors.
>
> So that means the job is being distributed across the right number of
> processors, doesn't it? Am I understanding this correctly?
>
> Well, thanks for your reply
>
> Ruchi
>
> On 11/12/08, Giacomo Fiorin <gfiorin_at_seas.upenn.edu> wrote:
>
> Hi Ruchi, if you use ++local, you'll keep running only on the first
> node. You do create N processes, but they are always distributed
> among the same two processors.
>
> Giacomo
>
>
> ---- -----
> Giacomo Fiorin
> Center for Molecular Modeling at
> University of Pennsylvania
> 231 S 34th Street, Philadelphia, PA 19104-6323
> phone: (+1)-215-573-4773
> fax: (+1)-215-573-6233
> mobile: (+1)-267-324-7676
> mail: giacomo.fiorin_<at>_gmail.com
> web: http://www.cmm.upenn.edu/
> ---- ----
>
>
>
>
> On Wed, Nov 12, 2008 at 9:28 AM, Ruchi Sachdeva
> <ruchi.namd_at_gmail.com> wrote:
> > Dear All,
> >
> > I am using NAMD 2.6 (precompiled binaries) on a 288-node Linux (x86_64)
> > cluster of HP ProLiant systems with Intel Xeon processors and a 10 Gbps
> > InfiniBand interconnect. I ran the apoA1 test job on different numbers
> > of processors as follows:
> >
> > /nfshomen278/rsachdeva/NAMD_2.6_Linux-i686-TCP/charmrun \
> >   /nfshomen278/rsachdeva/NAMD_2.6_Linux-i686-TCP/namd2 ++local +p2 \
> >   apoa1.namd > apoa1.log &
> >
> > The jobs were submitted using the bsub command. I got the following
> > speeds:
> >
> > Benchmark time: 1 CPUs 3.12916 s/step 36.2171 days/ns
> >
> > Benchmark time: 2 CPUs 1.62206 s/step 18.7738 days/ns
> >
> > Benchmark time: 4 CPUs 1.65563 s/step 19.1624 days/ns
> >
> > Benchmark time: 8 CPUs 1.64875 s/step 19.0828 days/ns
> >
> > Benchmark time: 16 CPUs 1.67945 s/step 19.4381 days/ns
> >
> > As we can see, performance does not improve beyond 2 CPUs. With 4 or
> > more CPUs the time per step does not decrease; rather, it increases
> > slightly with 4 and 16 CPUs. Can anybody please tell me why I am
> > getting poor performance with larger numbers of CPUs?
> >
> > Would I get better scalability if I compiled NAMD on the cluster rather
> > than using the precompiled binaries? And which version of NAMD would be
> > better: Charm-based or MPI-based?
> >
> > Thanks in advance
> >
> > Ruchi
> >
> >
>
>
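
The benchmark figures quoted above can be turned into speedup and
parallel efficiency to make the plateau explicit. This is a rough
sketch: speedup is taken as the 1-CPU time divided by the N-CPU time,
efficiency as speedup divided by N, and scaling_table is a hypothetical
helper name:

```shell
# Print speedup and parallel efficiency relative to the first input row.
# Input: one "ncpus seconds_per_step" pair per line on stdin.
scaling_table() {
    awk 'NR == 1 { t1 = $2 }
         { s = t1 / $2
           printf "%2d CPUs  speedup %.2f  efficiency %.2f\n", $1, s, s / $1 }'
}

# s/step values copied from the benchmark lines quoted above
scaling_table <<'EOF'
1 3.12916
2 1.62206
4 1.65563
8 1.64875
16 1.67945
EOF
```

On these numbers the speedup stays a little under 2 from 2 CPUs onward,
so the efficiency roughly halves each time the CPU count doubles, which
is what one would expect if every run is confined to two processors.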

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:50:05 CST