From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon May 07 2012 - 00:49:22 CDT
Congratulations, you are now faced with the main problem in high performance
computing. People call it parallel scaling. If a program runs in parallel
on multiple cores or nodes, the individual processes need to communicate to
share the work and gain speedup. There can be many reasons why you don't see
the speedup you want. The most common issues are the user ^^ and the network,
and, within the node itself, the memory bandwidth.
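To put a number on "scaling", here is a minimal Python sketch that computes speedup and parallel efficiency from two wall-clock timings. It uses the timings from the quoted message below (22 min with ++p 8, 20 min with ++p 16). The function names are made up for illustration, and the 8-process run is taken as the baseline, which is an assumption.

```python
def speedup(t_base, t_new):
    """How many times faster the new run is compared to the baseline run."""
    return t_base / t_new


def relative_efficiency(t_base, p_base, t_new, p_new):
    """Achieved speedup divided by the ideal speedup (p_new / p_base)."""
    return speedup(t_base, t_new) / (p_new / p_base)


# Timings from the quoted message: 22 min on 8 processes, 20 min on 16.
s = speedup(22.0, 20.0)
e = relative_efficiency(22.0, 8, 20.0, 16)
print(f"speedup {s:.2f}x, efficiency {e:.0%}")
```

Perfect scaling from 8 to 16 processes would give a speedup of 2 and an efficiency of 100%. The further the efficiency drops below that, the more time is lost to communication or other bottlenecks.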
So to benchmark the scaling of a node itself, run namd from 1 to 8 cores on
only one node and see what speedup you get. This is what most people don't do.
Most people are interested in scaling over multiple nodes. To test that,
just run jobs on 1 to 3 nodes, each with 8 processes. The nodelist should
contain only the nodes that are to be used for the current test.
So 1 node test -> one node in the nodelist. Why? Because charmrun (and MPI
also) distributes processes in a round-robin fashion. That means it goes through
the nodelist and starts one process for each line there. If not enough
processes have been started at the end of the nodelist, it starts from the top again.
Means with a nodelist of
And a charmrun of ++p8, it allocates
As this is not what you want to see, the nodelist should only contain the
machines for the current test.
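The round-robin behaviour described above can be sketched in a few lines of Python. This is only a simplified model of the placement rule as explained in this message, not charmrun's actual code, and the host names node1 to node3 are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def round_robin_placement(nodelist, num_processes):
    """Walk the nodelist line by line, one process per line, wrapping around."""
    return list(islice(cycle(nodelist), num_processes))


def processes_per_host(nodelist, num_processes):
    """Count how many processes end up on each host."""
    return dict(Counter(round_robin_placement(nodelist, num_processes)))


# Hypothetical host names; a real nodelist would name your own machines.
print(processes_per_host(["node1", "node2", "node3"], 8))
print(processes_per_host(["node1"], 8))
```

In this model, 8 processes on a three-line nodelist are spread over all three machines, while a one-line nodelist keeps all 8 on the same machine. That is the reason for trimming the nodelist before a single-node test.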
To give reliable advice, we also should know more about your hardware
(machines and network)
Let us know
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of
Sent: Saturday, 5 May 2012 20:37
Subject: Fwd: Re: namd-l: charmrun setup
On 5/5/2012 8:14 PM, Pedro Armando Ojeda May wrote:
1) Running namd2
TCP/namd2) using charmrun
2) Launch the run as follows:
charmrun namd2 ++remote-shell ssh ++verbose +netpoll
++ppn 8 ++p 16 inputfile
my nodefile looks like:
Each node has 8 processors. Running on the command line (not with torque or
any other queue system).
It works fine in the sense that my program finishes the number of steps
I assigned. My concern is the simulation time: if I use "++p 8",
the run lasts 22 min (measured with the "time" command), but if I
use "++p 16", the run takes 20 min.
I expected the simulation time to be reduced by at least half, but it is
almost the same.
Does anyone have a comment about this issue?
- periodic cubic box 70x70x70
- 33933 atoms
This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:21:58 CST