NAMD SGE Job Script Scale Up issues!

From: Aditya Ranganathan (aditya.sia_at_gmail.com)
Date: Thu Oct 18 2012 - 08:04:53 CDT

Hello,

We installed a Linux cluster based on Xeon processors. It consists of a
master node plus 8 compute nodes. Each compute node has 8 physical
processor cores (16 hyperthreaded cores), so in total 64 physical cores
and 128 virtual cores are available for computing.

Ours is a Rocks 5.4 based cluster. I tried submitting a NAMD job via SGE
using the following script:

#$ -cwd
#$ -j y
#$ -S /bin/bash

nodefile=$TMPDIR/namd2.nodelist
echo group main > $nodefile
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' $PE_HOSTFILE >> $nodefile

dir=/home/moldyn/sim/namd
$dir/charmrun ++remote-shell ssh ++nodelist $nodefile +p$NSLOTS $dir/namd2 \
    $dir/p53_equi.conf > $dir/p53onGATI.log
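
To make the nodelist step concrete, here is a minimal sketch of what the
awk line in the script produces. The host names and slot counts in the
sample file are hypothetical; in a real job SGE writes $PE_HOSTFILE itself,
with the host name in column 1 and the number of granted slots in column 2.
The function name make_nodelist is only for illustration.

```shell
#!/bin/bash
# Build a charmrun nodelist from an SGE-style hostfile:
# one "host <name>" line per slot granted on that host.
make_nodelist() {
    echo "group main"
    awk '{ for (i = 0; i < $2; ++i) print "host", $1 }' "$1"
}

# Hypothetical hostfile standing in for $PE_HOSTFILE (names are made up).
sample=$(mktemp)
printf '%s\n' \
    'compute-0-0.local 2 all.q@compute-0-0.local UNDEFINED' \
    'compute-0-1.local 1 all.q@compute-0-1.local UNDEFINED' > "$sample"

make_nodelist "$sample"
rm -f "$sample"
```

Each host appears once per slot, so charmrun starts as many processes on a
node as SGE granted slots there.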

When I submit the job with the command

qsub -pe mpich 32 namd.sh

the job generates 32 processes on 2 nodes, with CPU utilization around 70%.
However, if I increase the slot count to, say, 64, the job is submitted on
4 nodes and the benchmarks slow down drastically compared to the run with
qsub -pe mpich 32 namd.sh. I might sound a bit naive, but is this script
not equipped to handle our cluster configuration (8 physical processor
cores per node), or is there anything to be added or changed in my script
to get it working properly? Please let me know, because with this script
I'm not able to scale up beyond 32 processors.
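
Since 32 processes on 2 nodes works out to 16 per node, the scheduler seems
to be filling the hyperthreaded cores rather than spreading the job over
physical cores. One way to check this from inside a job is sketched below.
It is only a diagnostic under stated assumptions: the function name
check_slots and the sample hostfile are hypothetical, and the limit of 8
is taken from the cluster description above.

```shell
#!/bin/bash
# Print "<host> <slots> <status>" for each line of an SGE-style hostfile.
# Status is "exceeds-physical" when more slots were granted on a host
# than it has physical cores, and "ok" otherwise.
check_slots() {
    awk -v phys="$2" '{
        status = ($2 + 0 > phys + 0) ? "exceeds-physical" : "ok"
        print $1, $2, status
    }' "$1"
}

# Hypothetical hostfile matching the reported 32-slot run (names made up).
sample=$(mktemp)
printf '%s\n' \
    'compute-0-0.local 16 all.q@compute-0-0.local UNDEFINED' \
    'compute-0-1.local 16 all.q@compute-0-1.local UNDEFINED' > "$sample"

check_slots "$sample" 8
rm -f "$sample"
```

Inside the job script the call would be check_slots "$PE_HOSTFILE" 8, and
any host reported as exceeds-physical is running more NAMD processes than
it has physical cores.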

Thanks

Srivastav Ranganathan
Senior Research Fellow,
IIT Bombay
