Re: ibverb&&smp build NAMD

From: ukulililixl (ukulililixl_at_gmail.com)
Date: Fri Aug 29 2014 - 06:41:21 CDT

Dear Norman,
Thanks for your help, and I am sorry that I didn't phrase my question
clearly. Let me restate it more precisely.

I built a net-linux-x86_64-ibverbs-smp version with CUDA 6.5 and gcc 4.8.2.
I built charm++ with:

./build charm++ net-linux-x86_64 gcc ibverbs smp -j16 --with-production

and configured NAMD with:

./config Linux-x86_64-g++ --charm-base charm-6.6.0-rc3 \
  --charm-arch net-linux-x86_64-ibverbs-smp-gcc \
  --with-tcl --tcl-prefix ~/tcl-threaded/ \
  --with-fftw --fftw-prefix ~/fftw \
  --with-cuda --cuda-prefix /public/opt/cuda-6.5 \
  --cuda-gencode arch=compute_35,code=sm_35 \
  --cuda-gencode arch=compute_35,code=compute_35

Now I am having difficulty running this net-linux-x86_64-ibverbs-smp
version. Our cluster has 5 nodes; each node has 16 cores and 2 GPUs.
The NAMD 2.9 release notes, under "Shared-Memory and Network-Based
Parallelism (SMP Builds)", say: 'SMP builds combine multiple worker threads
and an extra communication thread into a single process' and 'SMP builds
launched with charmrun use +p to specify the total number of PEs (worker
threads) and +ppn to specify the number of PEs per process. Thus, to run one
process with one communication and three worker threads on each of four
quad-core nodes one would specify:
charmrun namd2 +p12 +ppn 3 <configfile>'.
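
(Checking the arithmetic in that example: 12 PEs / 3 PEs per process = 4
processes; each process runs 3 worker threads plus 1 communication thread,
i.e. 4 threads, which exactly fills one quad-core node.)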

So I ran the program many times to work out the right command line for this
version. First I tried:

./charmrun ++nodelist test.nodelist ./namd2 +p12 ++ppn 3 +idlepoll <configfile>

The release notes use '+ppn', but only '++ppn' makes it run. I can also run:

./charmrun ++nodelist test.nodelist ./namd2 ++p 12 ++ppn 3 +idlepoll <configfile>

(I don't know why, but these two command lines seem to have the same effect,
judging from top.)

I understand 12 to be the total number of worker threads and 3 the number of
worker threads per process, so before running I expected 4 processes. With
two nodes in my nodelist file, I expected 2 processes per node, each at more
than 300% CPU usage. When I ran it, each node indeed had 2 processes at
about 370% usage each, which I take to mean 3 worker threads plus 1
communication thread per process.
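
For reference, my test.nodelist follows the standard charmrun nodelist form
(the hostnames below are placeholders, not our real node names):

group main
  host node1
  host node2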

Based on the understanding that +p gives the total number of worker threads
and ++ppn the number of threads per process, I did some tests.

With 1 node in the nodelist and ++ppn 1: for +p x with x from 1 to 12, the
job runs well and top shows x processes. With x = 13, I get
'ssh_exchange_identification: Connection closed by remote host'. With
x = 14, the run stops at 'Entering startup at 1.53176 s, 333.883 MB of
memory in use', and top shows only 11 processes where I would expect 14.
With x = 15 or 16 I see the same problem.

With 2, 3, 4, or 5 nodes and ++ppn greater than 1 (+p x, ++ppn y, n nodes),
I expect (x/y)/n processes on each node (when x is divisible by y*n), but it
seems the number of processes per node cannot be more than about 12, even
though each node has 16 cores. So I want to know whether there is a limit
lower than 16 (for our cluster).

And if x is not divisible by y*n, I want to know how the threads are
allocated to the nodes.
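
One guess on my side: since charmrun starts each remote process over a
separate ssh connection, the 'ssh_exchange_identification: Connection closed
by remote host' error at 13 or more processes might come from sshd's limit
on concurrent unauthenticated connections (OpenSSH's MaxStartups defaults to
around 10) rather than from charmrun or NAMD themselves. If I read the
charmrun options correctly, ++batch makes charmrun start only a few remote
processes at a time, so a command like this could test that theory:

./charmrun ++nodelist test.nodelist ++batch 8 ./namd2 +p14 ++ppn 1 +idlepoll <configfile>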

Lee.

2014-08-28 18:25 GMT+08:00 Norman Geist <norman.geist_at_uni-greifswald.de>:

> Hi anonymous,
>
>
>
> please let me first say that if you phrase your question this poorly, you
> may get a similarly phrased answer, or none at all.
>
>
>
> The limit you are asking for is of course the number of cores your
> machines have.
>
>
>
> total_cores = x * y
>
>
>
> Where x is the number of processes and y the number of threads per
> process. Processes show up in top as individual lines, whereas threads
> are shown as multiples of 100% utilization. So a process having 6 threads
> would show 600% CPU usage at full utilization.
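>
> For 16-core nodes like yours that means, for example, 4 processes with 4
> threads each (4 * 4 = 16), or 2 processes with 8 threads each (2 * 8 = 16),
> to fill a node.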
>
>
>
> Please also read up on the difference between “distributed memory
> parallelism” (processes) and “shared memory parallelism (smp)” (threads).
>
> There may be an optimum ratio between processes and threads that utilizes
> all the available cores.
>
>
>
> Processes are started in round-robin fashion, one per entry in the nodelist
> file, until the number x is reached; each process then tries to fork y
> threads. The total shouldn’t exceed the number of cores available.
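>
> As a worked example: with 3 nodelist entries and x = 7, round robin places
> processes 1, 4 and 7 on the first entry, 2 and 5 on the second, and 3 and 6
> on the third.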
>
>
>
> If the above doesn’t contain the answer you were looking for, please feel
> free to rephrase your question so carefully and in such detail that someone
> is actually able to figure out what you want to know, and is actually
> willing to read it.
>
>
>
> Norman Geist.
>
>
>
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of ukulililixl
> Sent: Thursday, August 28, 2014 11:31
> To: namd-l_at_ks.uiuc.edu
> Subject: namd-l: ibverb&&smp build NAMD
>
>
>
> I have run commands like:
>
> charmrun ++nodelist *.nodelist namd2 +idlepoll +p16 ++ppn 4 *.namd
>
> I ran 'charmrun ++nodelist *.nodelist namd2 +idlepoll +p x ++ppn y *.namd'
> many times, and for some values of x and y the program cannot run.
>
> Is there an upper limit on x or y?
>
> From my experiments, watching top to see how many processes exist while the
> program runs, there are at most 10 or 11 per node (every node has 16 cores).
>
> Another question: if x is not divisible by y or by the number of nodes, how
> are the PEs allocated?
