From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Oct 19 2012 - 00:41:54 CDT
Hi,
like any other parallel program, NAMD does not distinguish between physical
and virtual (hyperthreaded) cores. If you tell SGE that your nodes have 16
slots each and you submit a job with 64 slots, SGE generates a node list
covering that number of slots, i.e. 4 nodes. NAMD is then told to use these
nodes and to spawn 64 processes. If it does so, that is the expected
behavior.
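To see what the scheduler actually granted, a job script can summarize
$PE_HOSTFILE before launching anything. A minimal sketch, assuming the usual
SGE hostfile layout (host name in the first column, slot count in the second,
which the awk line in the quoted script below also relies on); the function
name summarize_slots is made up for illustration:

```shell
#!/bin/bash
# summarize_slots FILE: print the slots per host and the totals
# from an SGE-style hostfile (column 1 = host, column 2 = slots).
summarize_slots() {
    awk '{ hosts++; slots += $2; printf "%s: %d slots\n", $1, $2 }
         END { printf "total: %d hosts, %d slots\n", hosts, slots }' "$1"
}

# Inside a job script one would call: summarize_slots "$PE_HOSTFILE"
```

With 16 slots configured per node, a 64-slot request should show up here as
4 hosts.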
Another point is that you can hardly expect a speedup from HT. HT is
basically a hardware strategy to improve multitasking by letting each
physical core hold a second hardware thread. This makes it possible to fill
the idle cycles of one process with work from another. But since NAMD is
well-performing code that leaves few idle cycles on the cores, in most
cases you won't see a gain from this.
Also keep in mind that when running across multiple nodes, every process
needs to communicate with all other processes. So if you start twice as
many processes per node, which already has a more or less negative impact
on performance due to oversubscribed memory and the additional parallel
overhead, the need for communication also more than doubles:
Example (counting each pair of processes on different hosts once):
1. 2 nodes, 32 processes -> every process needs to talk to 31 other
processes. There are 16 processes per node, each opening 31 active
connections, 16 of them to the other host. So your network is loaded
with 16 x 16 = 256 active connections.
2. 4 nodes, 64 processes -> every process needs to talk to 63 other
processes. There are 16 processes per node, each opening 63 active
connections, 48 of them to other hosts. So your network is loaded
with 64 x 48 / 2 = 1536 active connections.
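The arithmetic behind such an example can be checked with a few lines of
bash. This is only a sketch under the all-to-all assumption made above; the
totals depend on what is counted as a connection, and here each pair of
processes on different nodes is counted once. The function names are
hypothetical:

```shell
#!/bin/bash
# remote_peers NODES PER_NODE: peers of one process that live on other nodes.
remote_peers() {
    local nodes=$1 per_node=$2
    echo $(( (nodes - 1) * per_node ))
}

# cross_node_pairs NODES PER_NODE: distinct process pairs on different nodes,
# i.e. all pairs minus the pairs that share a node.
cross_node_pairs() {
    local nodes=$1 per_node=$2
    local total=$(( nodes * per_node ))
    echo $(( total * (total - 1) / 2 - nodes * per_node * (per_node - 1) / 2 ))
}

for nodes in 2 4; do
    echo "$nodes nodes x 16: $(remote_peers "$nodes" 16) remote peers per process," \
         "$(cross_node_pairs "$nodes" 16) cross-node pairs"
done
```

Doubling the node count at a fixed number of processes per node multiplies
the cross-node pairs by much more than two, which is the point of the
example.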
So first of all, try not to use the HT cores (turn HT off in the BIOS or
halve the slots per machine in SGE). Then check the parallel scaling again.
NAMD usually scales quite well even over Gigabit Ethernet, so if this
doesn't make a difference, we should check your network setup.
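If changing the BIOS or the SGE slot count is not convenient, one way to
test without the HT cores is to let the job script start only one process
per physical core. A sketch, assuming two hardware threads per core and the
same hostfile layout the quoted script below relies on;
make_physical_nodelist is a hypothetical helper, not part of NAMD or SGE:

```shell
#!/bin/bash
# make_physical_nodelist HOSTFILE NODELIST
# Write a charmrun nodelist with half the granted slots per host
# (assumption: 2 hardware threads per physical core) and print the
# resulting process count for use with +p.
make_physical_nodelist() {
    local hostfile=$1 nodelist=$2
    echo "group main" > "$nodelist"
    awk '{ n = int($2 / 2); if (n < 1) n = 1
           for (i = 0; i < n; ++i) print "host", $1 }' "$hostfile" >> "$nodelist"
    # Every line except the "group main" header is one process.
    grep -c '^host ' "$nodelist"
}

# Possible use in the job script:
#   nprocs=$(make_physical_nodelist "$PE_HOSTFILE" "$TMPDIR/namd2.nodelist")
#   then pass +p$nprocs instead of +p$NSLOTS to charmrun
```

The job still requests all slots of each node from SGE, so no other job
lands on the idle hardware threads while you benchmark.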
To give reliable advice, we would also need to know which molecular system
you benchmarked (number of atoms, for instance) and what kind of network you
have. Also, for benchmarking, don't look at the CPU utilization; watch the
timings in the NAMD output, that's what counts.
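One way to pull those timings out of a log is sketched below. It assumes the
log contains the "Info: Benchmark time:" lines that NAMD normally prints
early in a run (CPU count, s/step, days/ns); if your build formats them
differently, the pattern needs adjusting. benchmark_times is a made-up name:

```shell
#!/bin/bash
# benchmark_times LOG: print CPU count, s/step and days/ns for each
# "Info: Benchmark time:" line found in a NAMD log.
benchmark_times() {
    awk '/^Info: Benchmark time:/ {
             for (i = 2; i <= NF; ++i) {
                 if ($i == "CPUs")    cpus = $(i - 1)
                 if ($i == "s/step")  step = $(i - 1)
                 if ($i == "days/ns") days = $(i - 1)
             }
             print cpus, step, days
         }' "$1"
}
```

Comparing the s/step values of the 32-process and 64-process runs side by
side shows directly whether the larger job is faster or slower.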
Good luck
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Aditya Ranganathan
Sent: Thursday, 18 October 2012 15:05
To: NAMD list; namd_at_ks.uiuc.edu
Subject: namd-l: NAMD SGE Job Script Scale Up issues!
Hello,
We installed a Linux cluster based on Xeon processors. Its specifications
are as follows: master node + 8 compute nodes, each node with 8 physical
processor cores (16 hyperthreaded cores). So in total 64 physical processor
cores and 128 virtual cores are available for computing.
I tried submitting a NAMD job via SGE using the following script (ours is a
Rocks 5.4 based cluster):
#$ -cwd
#$ -j y
#$ -S /bin/bash
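# Build a charmrun nodelist: one "host" line per granted slot
nodefile=$TMPDIR/namd2.nodelist
echo group main > $nodefile
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' $PE_HOSTFILE >> $nodefile
dir=/home/moldyn/sim/namd
# Launch NAMD on all granted slots (one command, continued with a backslash)
$dir/charmrun ++remote-shell ssh ++nodelist $nodefile +p$NSLOTS $dir/namd2 \
    $dir/p53_equi.conf > $dir/p53onGATI.log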
Now, when I submit the job with the command qsub -pe mpich 32 namd.sh,
the job generates 32 processes on 2 nodes with CPU utilization around 70%.
However, if I increase the number to, say, 64, it submits the job on
4 nodes and the benchmarks slow down drastically compared to
qsub -pe mpich 32 namd.sh. I might sound a bit naive, but is this script not
equipped to handle our cluster configuration (8 physical processors per
node), or is there anything to be added or changed in my script to get it
working properly? Please let me know, because with this script I'm not able
to scale up beyond 32 processors.
Thanks
Srivastav Ranganathan
Senior Research Fellow,
IIT Bombay