Re: namd scale-up

From: Revthi Sanker (revthi.sanker1990_at_gmail.com)
Date: Thu Sep 05 2013 - 08:30:40 CDT

Dear Sir,
The details you had requested:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 7
cpu MHz : 1200.000
cache size : 20480 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx
smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave
avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips : 5200.05
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

There are a total of 16 processors in each node.
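
In case it is useful, here is a quick way to cross-check this on a node (a
minimal sketch, assuming the usual /proc/cpuinfo layout, where each processor
entry carries its own "physical id" and "core id" lines):

# count logical processors
grep -c '^processor' /proc/cpuinfo
# count unique (physical id, core id) pairs, i.e. real physical cores;
# if both numbers match, hyperthreading is not active
grep -E '^(physical id|core id)' /proc/cpuinfo | paste - - | sort -u | wc -l
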
Thank you so much for your time, sir.

Revathi.S
M.S. Research Scholar
Indian Institute Of Technology, Madras
India
_________________________________

On Thu, Sep 5, 2013 at 6:28 PM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

> Give us the output of "cat /proc/cpuinfo" from one of the compute nodes. As
> you are using openmpi, you already have the idlepoll behavior enabled by
> default.
>
> Norman Geist.
>
> > -----Original Message-----
> > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> > Behalf Of Revthi Sanker
> > Sent: Thursday, 5 September 2013 10:47
> > To: Norman Geist
> > Cc: Axel Kohlmeyer; Namd Mailing List
> > Subject: Re: namd-l: namd scale-up
> >
> > Dear Sir,
> > This is the script I am using to run NAMD on my cluster:
> >
> > #!/bin/bash
> > #@ output = test.out
> > #@ error = test.err
> > #@ job_type = MPICH
> > #@ class = Medium128
> > #@ node = 8
> > #@ tasks_per_node = 16
> > #@ environment = COPY_ALL
> > #@ queue
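> > # derive a numeric job id from the LoadLeveler step id and run from a
> > # per-job scratch directory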
> > Jobid=`echo $LOADL_STEP_ID | cut -f 6 -d .`
> > tmpdir=$HOME/scratch/job$Jobid
> > mkdir -p $tmpdir; cd $tmpdir
> > cat $LOADL_HOSTFILE >xx
> > cp -R $LOADL_STEP_INITDIR/* $tmpdir
> > cat $LOADL_HOSTFILE > ./host.list
> > export LD_LIBRARY_PATH=/sware/openmpi1.6/lib:$LD_LIBRARY_PATH
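> > # launch 128 MPI ranks (8 nodes x 16 tasks per node) over InfiniBand (openib BTL)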
> > /sware/openmpi1.6/bin/mpirun --mca btl openib,self -np 128 \
> >   -hostfile $LOADL_HOSTFILE \
> >   /sware/NAMD_2.9_Source/Linux-x86_64-g++/namd2 md9.namd > cetp_ana_md9.log
> > mv ../job$Jobid $LOADL_STEP_INITDIR
> >
> > Am I failing to include something? Kindly provide your valuable suggestions
> > in this regard.
> > Thanks in advance.
> >
> > Revathi.S
> > M.S. Research Scholar
> > Indian Institute Of Technology, Madras
> > India
> > _________________________________
> >
> >
> > On Thu, Sep 5, 2013 at 12:02 PM, Norman Geist
> > <norman.geist_at_uni-greifswald.de> wrote:
> >
> > > Hi Revthi,
> > >
> > > You should also have mentioned whether you use a NAMD compiled against
> > > charm++ or against MPI. If charm++, try adding "+idlepoll" to the namd2
> > > command; it should additionally improve scaling, sometimes two-fold.
> > > Furthermore, if you have hyperthreading or Magny-Cours CPUs, try using half
> > > of the cores claimed per node and bind the processes to real physical cores
> > > only. You can use "/proc/cpuinfo" to determine that. "processor" entries
> > > with the same "physical id" and "core id" usually belong to the same
> > > physical core; these should not be used, as they are bottlenecked by shared
> > > memory and FPU resources. Using "taskset" on the namd2 command, you can
> > > easily control which cores are allowed.
> > >
> > > Example:
> > >
> > > charmrun +p 64 ++nodelist nodelist taskset -c 0,2,4,6 namd2 +idlepoll my.in
> > >
> > > If you do not have virtual cores, forget about the above for now, but keep
> > > it in mind for the future, as it has a large impact.
> > >
> > > Additionally, it is easy to say how good the scaling is if you just compare
> > > the speedup to the ideal linear case. Simply divide the time/step on 1 node
> > > by the time/step on n nodes. This number will usually be <= n. The nearer it
> > > is to n, the better. Do some benchmarks while increasing the number of
> > > nodes, and keep in mind that there can be a point where scaling breaks down
> > > and the time/step starts rising again. But you do not seem to have hit that
> > > case yet.
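> > >
> > > For example (with purely hypothetical numbers): if one node needs 0.096
> > > s/step and eight nodes need 0.018 s/step, the speedup is 0.096/0.018 = 5.3
> > > out of an ideal 8, i.e. roughly 67% parallel efficiency.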
> > >
> > > So far, I think there is a little more to squeeze out for a 300K-atom system
> > > doing about 2.5 ns/day.
> > >
> > > Good luck
> > >
> > > Norman Geist.
> > >
> > > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> > > Behalf Of Axel Kohlmeyer
> > > Sent: Wednesday, 4 September 2013 10:01
> > > To: Revthi Sanker
> > > Cc: namd-l_at_ks.uiuc.edu
> > > Subject: Re: namd-l: namd scale-up
> > >
> > > On Wed, Sep 4, 2013 at 9:43 AM, Revthi Sanker
> > > <revthi.sanker1990_at_gmail.com> wrote:
> > >
> > > Dear all,
> > >
> > > I am running NAMD on the super cluster at my institute. My system consists
> > > of 3 L atoms roughly.
> > >
> > > please keep in mind that most people on this mailing list (and in the
> > > world in general) do not know what a lakh is, so it is better to talk about
> > > 300,000 atoms instead. what would you think if somebody talked to you about
> > > a system with 2000 gross atoms?
> > >
> > > I am aware that the scale-up depends on the configuration of the cluster I
> > > am currently using. But the people at the computer center would like to get
> > > a rough estimate of the benchmark (ns/day) for a system of my size. Anybody
> > > who is aware of the yield for this system size, please let me know, as I am
> > > not sure whether what I am getting currently (2.5 ns/day on 8 nodes x 16
> > > processors = 128) is optimal or whether it can be tweaked further.
> > >
> > > the only way to find out the optimum is by doing a (strong) scaling
> > > benchmark, i.e. use a different number of nodes and plot the resulting
> > > speedup. the performance depends not only on the hardware (CPU (type,
> > > generation, clock rate), memory bandwidth, interconnect, BIOS configuration
> > > (e.g. hyper-threading, turbo boost)), but also on the software (kernel,
> > > NAMD version, compiler, configuration (SMP, MPI, ibverbs)) and on your
> > > system and input. so there is no way to tell from the number of atoms in
> > > the system and the number of nodes/cores whether you have good or bad
> > > performance.
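> > >
> > > a minimal sketch of collecting such numbers, assuming the runs were kept in
> > > separate log files named like run_1nodes.log, run_2nodes.log, etc. (namd2
> > > prints "Benchmark time" lines with the measured s/step):
> > >
> > > for n in 1 2 4 8; do grep "Benchmark time" run_${n}nodes.log | tail -1; done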
> > >
> > > you can compare your numbers (absolute per-CPU-core performance and
> > > speedup) to published data from other machines (even if much older)
>
>
>
