Using Your Rocks Cluster
This exercise should be done while logged in as a normal user,
not as root. You can create a normal user account with the command
"useradd username" and then set the password with
"passwd username".
Part 1: Run NAMD
NAMD is a parallel molecular dynamics application developed in our
group. It is the main application run on our clusters.
- Copy the files NAMD_2.6b1_Linux-i686.tar.gz (NAMD binary)
and apoa1.tar.gz (sample NAMD simulation)
from the workshop CD and untar them in your home directory with:
tar xzf apoa1.tar.gz
tar xzf NAMD_2.6b1_Linux-i686.tar.gz
- cd NAMD_2.6b1_Linux-i686
- Use a text editor to create the file nodelist containing:
group main
host master_hostname
host compute-0-0
host compute-0-1
host compute-0-2
The nodelist file tells NAMD what nodes to run on. When we run
under the queueing system below we'll use a script to create this
file. NAMD does all of its I/O on the first node, so by
including the master node in the calculation we can access
fileservers or disks that are only available to the master.
This is how we run NAMD in our group, with a single job for
the entire cluster and the queueing system spanning multiple
clusters.
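If you would rather not type the nodelist by hand, a short shell loop
can generate it; this sketch assumes you run it on the master node, so
that "hostname" fills in the master's name:
echo "group main" > nodelist
echo "host $(hostname)" >> nodelist
for h in compute-0-0 compute-0-1 compute-0-2; do
  echo "host $h" >> nodelist
done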
- Start NAMD on all four machines with:
./charmrun ++remote-shell ssh ++nodelist nodelist +p4 ./namd2 ~/apoa1/apoa1.namd
If you have problems, or want to see what's going on in the launch
process, add ++verbose to the charmrun command line.
- When NAMD reaches the line that says "TIMING 20 ..." kill it with
Control-C and note the wall-clock seconds per step (s/step) value it reports.
- Run NAMD again on two processors (change +p4 above to +p2) for
20 steps and compare the performance of the two runs. Do four
processors run twice as fast as two? How close to twice?
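For example, if the two-processor run reports 0.9 s/step and the
four-processor run reports 0.5 s/step, the speedup is 0.9/0.5 = 1.8,
or 90% of ideal (these numbers are only illustrative; yours will differ).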
Part 2: Compile and Run Tachyon
Tachyon is a parallel ray tracer developed by John Stone for his
master's thesis. It is an example of a typical MPI application.
- Copy the file tachyon-0.97.tar.gz (Tachyon source and examples)
from the workshop CD and untar it in your home directory with:
tar xzf tachyon-0.97.tar.gz
- cd tachyon/unix
- Use a text editor to open the file Make-arch
- Search for the config options for "linux-lam"
- Copy this set of options to a new entry.
- Change (in the new entry) linux-lam to linux-mpich
- Change "CC = hcc" to "CC = gcc"
- Change -I$(LAMHOME)/h to -I/opt/mpich/gnu/include
- Change -L$(LAMHOME)/lib to -L/opt/mpich/gnu/lib
- Change -lmpi to -lmpich
- Save, quit the editor and run "make linux-mpich"
to build tachyon. If this doesn't work you probably missed
one of the edits above, or applied them in the wrong place.
The tachyon binary will end up in compile/linux-mpich/.
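If you would rather script the edits above than make them in an editor,
something like the following should work (a sketch only: it assumes GNU
sed and that each Make-arch entry ends at a blank line, so check the
result before building):
sed -n '/^linux-lam:/,/^$/p' Make-arch | \
  sed -e 's/linux-lam/linux-mpich/' \
      -e 's/CC = hcc/CC = gcc/' \
      -e 's|-I$(LAMHOME)/h|-I/opt/mpich/gnu/include|' \
      -e 's|-L$(LAMHOME)/lib|-L/opt/mpich/gnu/lib|' \
      -e 's/-lmpi\b/-lmpich/' > linux-mpich.entry
cat linux-mpich.entry >> Make-arch
Then build with "make linux-mpich" as above.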
- cd (back to your home directory)
- Use a text editor to create the file machines containing:
compute-0-0
compute-0-1
compute-0-2
- Run Tachyon on the three slave machines with:
/opt/mpich/gnu/bin/mpirun -v -np 3 -machinefile machines \
tachyon/compile/linux-mpich/tachyon +V tachyon/scenes/balls.dat
- Look at the timing output, which is broken into different
stages of the calculation. Run on two and one processors
(change -np 3) and calculate speedups for the different
stages as well as the total time.
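To collect all three timings in one pass, a loop like this works (paths
as above; the output file names are just a suggestion):
for n in 1 2 3; do
  /opt/mpich/gnu/bin/mpirun -np $n -machinefile machines \
    tachyon/compile/linux-mpich/tachyon +V tachyon/scenes/balls.dat \
    > balls.np$n.out
done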
Part 3: Run Under Grid Engine
Sun Grid Engine (SGE) is a free, open source, general purpose,
cross-platform queueing system. In the genealogy of queueing systems,
it is a descendant of the free DQS package, which was commercialized
by a German company that was recently bought by Sun.
- Run "qstat -f" to see the queues that were automatically
created. There should be one queue for each compute node.
The states column at far right is used for error flags.
- Run "qconf -sq compute-0-0.q" to see the queue setup for the
0th compute node. Note that there are many options to restrict
user access, memory usage, runtime, etc. that are turned off
by default. The only settings unique to this queue are its qname and hostname.
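If you just want the list of queue names, "qconf -sql" should print it
on a standard SGE install.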
- Use a text editor to create the file tachyon.job containing:
#$ -cwd
#$ -j y
#$ -S /bin/bash
/opt/mpich/gnu/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines \
tachyon/compile/linux-mpich/tachyon +V tachyon/scenes/balls.dat
Notice the similarity to the command for running Tachyon
manually. SGE will create a temporary working directory containing
a machines file (list of nodes to run on) and set the NSLOTS and
TMPDIR environment variables automatically.
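If you are curious what SGE provides, a quick check is to add two lines
like these near the top of the job script (after the #$ options) and
look for their output in the job's output file:
echo "NSLOTS = $NSLOTS"
cat $TMPDIR/machines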
The options preceded by #$ are parsed by SGE as if they were
specified on the command line. -cwd causes the job to execute in
the current working directory. -j y merges standard error and standard
output into a single file. -S /bin/bash says to run the script under
the bash shell; this is only needed because csh is SGE's default shell
and Rocks lacks /bin/csh. Without it the queue will get stuck in an
error state that root must clear with "qmod -c".
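Since the #$ lines are treated like command-line options, an equivalent
approach is to drop them from the script and pass them to qsub directly,
for example:
qsub -cwd -j y -S /bin/bash tachyon.job
(combined with the parallel environment request described in the next step).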
- Submit the job to run on three processors under the mpich
parallel environment with the command "qsub -pe mpich 3 tachyon.job".
Note that there is no queue for the master node, so we can't use 4 nodes.
- Use "qstat -f" to check on the job until it is scheduled,
then look for output files named tachyon.job.oX and tachyon.job.poX, where
X is the job number output by qsub. View these files to see the output.
- Submit several jobs requesting 1, 2, and 3 processors in random
order so that a backlog develops. You can use the same tachyon.job
file for all of them; just use the up arrow, possibly edit the
processor request, and hit return to submit jobs quickly.
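A shell loop can also generate the backlog in one shot (the processor
counts here are arbitrary):
for n in 2 1 3 1 2; do
  qsub -pe mpich $n tachyon.job
done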
Use qstat to monitor how the jobs are executed (the default scheduling
policy is to take the earliest-submitted job that can be run, i.e.,
for which enough processors are available, and the scheduler runs at
regular intervals).
- Use a text editor to create the file namd.job containing:
#$ -cwd
#$ -j y
#$ -S /bin/bash
nodefile=$TMPDIR/namd2.nodelist
echo group main > $nodefile
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' $PE_HOSTFILE >> $nodefile
dir=$HOME/NAMD_2.6b1_Linux-i686
$dir/charmrun ++remote-shell ssh ++nodelist $nodefile +p$NSLOTS $dir/namd2 ~/apoa1/apoa1.namd
Since NAMD does not use MPICH, we need a small shell script
and awk program to translate the SGE hostfile to charmrun format.
The second column of the hostfile is the number of processors available,
which is always one for these clusters, but this script will handle more.
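For example, a three-slot job on these machines would get a $PE_HOSTFILE
containing lines something like the following (hostname, slot count,
queue name, processor range):
compute-0-0 1 compute-0-0.q UNDEFINED
compute-0-1 1 compute-0-1.q UNDEFINED
compute-0-2 1 compute-0-2.q UNDEFINED
which the awk program above translates into:
group main
host compute-0-0
host compute-0-1
host compute-0-2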
- Submit the job with the command "qsub -pe mpich 3 namd.job".
Note that we are pretending to use the mpich parallel environment, but we
do not use any of the special files it sets up.
- Use qstat to monitor the job until it starts running, then use
"tail -f namd.job.oX" (X is the job number) to watch the job output.
- When you get tired of this, Control-C out of tail and use
"qdel X" (X is the job number) to kill the job. Use qstat
to monitor the job until it is killed.
Part 4: There Is No Part 4
Compiling a program and running it under a queueing system is likely
all you will ever do on your cluster. We've done a typical application
(Tachyon) and a not-so-typical one (NAMD). At this point you might want to
ssh to a compute node to see what that environment is like, to try the
pretty graphical cluster tools, or go see how the Clustermatic folks are
doing. If you're really ambitious, download your own code and see if it
compiles and runs.
See Also
Rocks web site (http://www.rocksclusters.org/Rocks/)
Grid Engine web site (http://gridengine.sunsource.net/)
NAMD web site (http://www.ks.uiuc.edu/Research/namd/)
Tachyon web site (http://jedi.ks.uiuc.edu/~johns/raytracer/)