Re: ibverb&&smp build NAMD

From: Norman Geist
Date: Mon Sep 01 2014 - 02:17:13 CDT

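sshd has a default limit on the number of simultaneous ssh connections that are still starting up (not yet authenticated), and it is too low in your case. Therefore, if you try to start more than this number of processes at once, some connections are refused and the launch fails. See “/etc/ssh/sshd_config” on the compute nodes, increase “MaxStartups” to allow more simultaneous starting connections, and restart sshd so the change takes effect.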
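To make the effect of “MaxStartups” concrete, here is a minimal sketch of how sshd decides whether to refuse a new connection, assuming the `start:rate:full` form described in the sshd_config(5) manual page. The function name `refusal_probability` and the value "10:30:100" are illustrative assumptions, not taken from this cluster's configuration; check the actual setting on your nodes.

```python
def refusal_probability(unauthenticated, max_startups="10:30:100"):
    """Estimate the chance that sshd refuses a new connection.

    unauthenticated: connections that are currently still authenticating.
    max_startups: a MaxStartups value, either "N" or "start:rate:full".
    The default used here is an illustrative assumption.
    """
    parts = [int(value) for value in max_startups.split(":")]
    if len(parts) == 1:
        # A single number means: refuse everything once N is reached.
        start, rate, full = parts[0], 100, parts[0]
    elif len(parts) == 3:
        start, rate, full = parts
    else:
        raise ValueError("expected 'N' or 'start:rate:full'")

    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    # Between start and full the refusal rate rises linearly
    # from rate% up to 100%.
    percent = rate + (100 - rate) * (unauthenticated - start) / (full - start)
    return percent / 100.0


# With 13 connections starting at once, some may already be refused.
print(refusal_probability(13))
```

Under these assumptions, launches with only a few processes per node always succeed, while launches with more than about ten simultaneous connections fail intermittently, which matches the kind of "Connection closed by remote host" error reported in this thread.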


Norman Geist.


From: [] On behalf of ukulililixl
Sent: Friday, 29 August 2014 13:41
To: Norman Geist
Subject: Re: namd-l: ibverb&&smp build NAMD


Dear Norman,

Thanks for your help. I am sorry that I didn't phrase my question clearly, so let me rephrase it.


I built a net-linux-x86_64-ibverbs-smp version with CUDA 6.5 and gcc 4.8.2.

I built Charm++ with:

./build charm++ net-linux-x86_64 gcc ibverbs smp -j16 --with-production

and configured NAMD with:

./config Linux-x86_64-g++ --charm-base charm-6.6.0-rc3 --charm-arch net-linux-x86_64-ibverbs-smp-gcc --with-tcl --tcl-prefix ~/tcl-threaded/ --with-fftw --fftw-prefix ~/fftw --with-cuda --cuda-prefix /public/opt/cuda-6.5 --cuda-gencode arch=compute_35,code=sm_35 --cuda-gencode arch=compute_35,code=compute_35


I am now having difficulties running this net-linux-x86_64-ibverbs-smp version. Our cluster has 5 nodes, and each node has 16 cores and 2 GPUs.

The 2.9 release notes, in the section "Shared-Memory and Network-Based Parallelism (SMP Builds)", say 'SMP builds combine multiple worker threads and an extra communication thread into a single process' and 'SMP builds launched with charmrun use +p to specify the total number of PEs (worker threads) and +ppn to specify the number of PEs per process. Thus, to run one process with one communication and three worker threads on each of four quad-core nodes one would specify:

charmrun namd2 +p12 +ppn 3 <configfile>'.


So I ran many tests to figure out the right command line for this version.

First, I tested:

./charmrun ++nodelist test.nodelist ./namd2 +p12 ++ppn 3 +idlepoll <configfile>

The release notes use '+ppn', but the run only starts with '++ppn'. I can also run:

./charmrun ++nodelist test.nodelist ./namd2 ++p 12 ++ppn 3 +idlepoll <configfile>

(I don't know why, but judging from what I see in top, these two command lines seem to have the same effect.)

I think 12 is the total number of threads and 3 is the number of threads in each process, so before running I expected 4 processes. I put two nodes in my nodelist file, so I expected 2 processes on each node, each with more than 300% CPU usage. When I ran it, each node indeed had 2 processes at about 370% CPU usage each, which I think means 3 worker threads and 1 communication thread.
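The arithmetic in the paragraph above can be written down as a small sketch. It assumes, following the release notes quoted earlier, that each process runs `++ppn` worker threads plus one communication thread. The function name `smp_layout` is hypothetical, and the sketch deliberately rejects the case where +p is not divisible by ++ppn because this thread does not establish what charmrun does then.

```python
def smp_layout(total_pes, pes_per_process):
    """Work out how many processes and threads an SMP launch should create.

    total_pes: value given to +p (total worker threads).
    pes_per_process: value given to ++ppn (worker threads per process).
    """
    if total_pes % pes_per_process != 0:
        # Behavior for the non-divisible case is not covered by this sketch.
        raise ValueError("this sketch only covers +p divisible by ++ppn")

    processes = total_pes // pes_per_process
    # Each process has its worker threads plus one communication thread.
    threads_per_process = pes_per_process + 1
    return {
        "processes": processes,
        "threads_per_process": threads_per_process,
        # Upper bound on the %CPU that top can show for one process.
        "top_cpu_ceiling_percent": 100 * threads_per_process,
    }


# The +p12 ++ppn 3 example: 4 processes, each with 3 workers + 1 comm thread.
print(smp_layout(12, 3))
```

For +p12 ++ppn 3 this gives 4 processes with a ceiling of 400% CPU each, which is consistent with the roughly 370% per process observed in top.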


Assuming that +p is the total number of threads and ++ppn is the number of threads in each process, I did some tests.


With 1 node in the .nodelist file and ++ppn 1, +px runs well for x from 1 to 12, and I can see the processes in top. When x=13, it tells me 'ssh_exchange_identification: Connection closed by remote host'. When x=14, it stops at 'Entering startup at 1.53176 s, 333.883 MB of memory in use', and top shows only 11 processes where I think there should be 14. For x=15 and x=16 it has the same problem.

With 2, 3, 4 or 5 nodes and ++ppn greater than 1 (+px, ++ppn y, n nodes), I think there should be (x/y)/n processes on each node (if x is divisible by y*n). However, it seems that the number of processes on each node can't be more than 12, although there are 16 cores in each node. So I want to know whether there is a limit lower than 16 (for our cluster).

And if x is not divisible by y*n, I want to know how the threads are allocated to the nodes.







2014-08-28 18:25 GMT+08:00 Norman Geist <>:

Hi anonymous,


Please let me first say that if you phrase your question this poorly, you might get a similarly phrased answer or none at all.


The limit you are asking for is of course the number of cores your machines have.


total_cores = x * y


Where x is the number of processes and y the number of threads per process. Processes show up in top as individual lines, whereas threads show up as multiples of 100% utilization. So a process with 6 threads would show 600% CPU usage at full utilization.


Please also inform yourself about the difference between “distributed memory parallelism” (processes) and “shared memory parallelism (smp)” (threads).

There might be an optimum ratio between processes and threads to utilize all the cores available.


Processes are started in round-robin fashion, one for each entry in the nodelist file in turn, until x processes have been started; each process then tries to start y threads. The resulting number of threads on a node shouldn’t exceed the number of cores available.
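As an illustration of the round-robin placement described above, here is a minimal sketch that distributes x processes over the nodelist entries and flags nodes that end up with more threads than cores. It models only the description in this message, not charmrun's actual implementation; the function name `place_processes` and the host names are hypothetical.

```python
def place_processes(nodes, num_processes, threads_per_process, cores_per_node):
    """Distribute processes over nodes in round-robin order.

    nodes: host names in nodelist order (hypothetical names in the example).
    num_processes: x, the number of processes to start.
    threads_per_process: y, the number of threads each process starts.
    cores_per_node: cores available on every node.
    """
    counts = {node: 0 for node in nodes}
    for index in range(num_processes):
        # Process 0 goes to the first node, process 1 to the second, and so on,
        # wrapping around to the first node when the list is exhausted.
        counts[nodes[index % len(nodes)]] += 1

    placement = {}
    for node, processes in counts.items():
        threads = processes * threads_per_process
        placement[node] = {
            "processes": processes,
            "threads": threads,
            "oversubscribed": threads > cores_per_node,
        }
    return placement


# 5 processes with 4 threads each on two 16-core nodes.
for node, info in place_processes(["node1", "node2"], 5, 4, 16).items():
    print(node, info)
```

When x is not a multiple of the number of nodes, the nodes listed first receive one process more than the rest, so an uneven load is expected in that case.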


If the above doesn’t contain the answer you were looking for, please feel free to rephrase your question carefully and in enough detail that someone is actually able to figure out what you want to know and is actually willing to read it.


Norman Geist.


From: [] On behalf of ukulililixl
Sent: Thursday, 28 August 2014 11:31
Subject: namd-l: ibverb&&smp build NAMD


like 'charmrun ++nodelist *.nodelist namd2 +idlepoll +p16 ++ppn 4 *.namd'

I ran 'charmrun ++nodelist *.nodelist namd2 +idlepoll +px ++ppn y *.namd' many times and found that for some x and y the program can't run.

Is there an upper limit for x or y?

In my experiments, I used top to see how many processes were running during the program, and there were never more than 10 or 11 on each node (every node has 16 cores).


Another question: if x is not divisible by y or by the number of nodes, how are the PEs allocated?





This e-mail is free of viruses and malware because avast! Antivirus protection is active.




This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:48 CST