Using Your Rocks Cluster
This exercise should be done while logged in as a normal user,
not as root. You can create a normal user account with the command
"useradd username" and then set the password with
"passwd username".
Part 1: Run NAMD
NAMD is a parallel molecular dynamics application developed in our
group. It is the main application run on our clusters.
- Copy the files NAMD_2.6b1_Linux-i686.tar.gz (NAMD binary)
and apoa1.tar.gz (sample NAMD simulation)
from the workshop CD and untar them in your home directory with:
tar xzf apoa1.tar.gz
tar xzf NAMD_2.6b1_Linux-i686.tar.gz
- cd NAMD_2.6b1_Linux-i686
- Use a text editor to create the file nodelist containing:
group main
host master_hostname
host compute-0-0
host compute-0-1
host compute-0-2
The nodelist file tells NAMD what nodes to run on. When we run
under the queueing system below we'll use a script to create this
file. NAMD does all of its I/O on the first node, so by
including the master node in the calculation we can access
fileservers or disks that are only available to the master.
This is how we run NAMD in our group, with a single job for
the entire cluster and the queueing system spanning multiple
clusters.
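If you would rather not type the nodelist by hand, a short shell loop
can generate it; this sketch assumes you run it on the master node, so
that "hostname" fills in the master's name:
echo "group main" > nodelist
echo "host $(hostname)" >> nodelist
for h in compute-0-0 compute-0-1 compute-0-2; do
  echo "host $h" >> nodelist
done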
- Start NAMD on all four machines with:
./charmrun ++remote-shell ssh ++nodelist nodelist +p4 ./namd2 ~/apoa1/apoa1.namd
If you have problems, or want to see what's going on in the launch
process, add ++verbose to the charmrun command line.
- When NAMD reaches the line that says "TIMING 20 ..." kill it with
Control-C and note the wall-clock seconds per step (s/step) value it reports.
- Run NAMD again on two processors (change +p4 above to +p2) for
20 steps and compare the performance of the two runs. Do four
processors run twice as fast as two? How close to twice?
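For example, if the two-processor run reports 0.9 s/step and the
four-processor run reports 0.5 s/step, the speedup is 0.9/0.5 = 1.8,
or 90% of ideal (these numbers are only illustrative; yours will differ).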
Part 2: Compile and Run Tachyon
Tachyon is a parallel ray tracer developed by John Stone for his
master's thesis. It is an example of a typical MPI application.
- Copy the file tachyon-0.97.tar.gz (Tachyon source and examples)
from the workshop CD and untar it in your home directory with:
tar xzf tachyon-0.97.tar.gz
- cd tachyon/unix
- Use a text editor to open the file Make-arch
- Search for the config options for "linux-lam"
- Copy this set of options to a new entry.
- Change (in the new entry) linux-lam to linux-mpich
- Change "CC = hcc" to "CC = gcc"
- Change -I$(LAMHOME)/h to -I/opt/mpich/gnu/include
- Change -L$(LAMHOME)/lib to -L/opt/mpich/gnu/lib
- Change -lmpi to -lmpich
- Save, quit the editor and run "make linux-mpich"
to build tachyon. If this doesn't work you probably missed
one of the edits above, or applied them in the wrong place.
The tachyon binary will end up in compile/linux-mpich/.
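If you would rather script the edits above than make them in an editor,
something like the following should work (a sketch only: it assumes GNU
sed and that each Make-arch entry ends at a blank line, so check the
result before building):
sed -n '/^linux-lam:/,/^$/p' Make-arch | \
  sed -e 's/linux-lam/linux-mpich/' \
      -e 's/CC = hcc/CC = gcc/' \
      -e 's|-I$(LAMHOME)/h|-I/opt/mpich/gnu/include|' \
      -e 's|-L$(LAMHOME)/lib|-L/opt/mpich/gnu/lib|' \
      -e 's/-lmpi\b/-lmpich/' > linux-mpich.entry
cat linux-mpich.entry >> Make-arch
Then build with "make linux-mpich" as above.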
- cd (back to your home directory)
- Use a text editor to create the file machines containing:
compute-0-0
compute-0-1
compute-0-2
- Run Tachyon on the three slave machines with:
/opt/mpich/gnu/bin/mpirun -v -np 3 -machinefile machines \
tachyon/compile/linux-mpich/tachyon +V tachyon/scenes/balls.dat
- Look at the timing output, which is broken into different
stages of the calculation. Run on two and one processors
(change -np 3) and calculate speedups for the different
stages as well as the total time.
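To collect all three timings in one pass, a loop like this works (paths
as above; the output file names are just a suggestion):
for n in 1 2 3; do
  /opt/mpich/gnu/bin/mpirun -np $n -machinefile machines \
    tachyon/compile/linux-mpich/tachyon +V tachyon/scenes/balls.dat \
    > balls.np$n.out
done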
Part 3: Run Under Grid Engine
Sun Grid Engine (SGE) is a free, open source, general purpose,
cross-platform queueing system. In the genealogy of queueing systems,
it is a descendant of the free DQS package, which was commercialized
by a German company that was recently bought by Sun.
- Run "qstat -f" to see the queues that were automatically
created. There should be one queue for each compute node.
The states column at far right is used for error flags.
- Run "qconf -sq compute-0-0.q" to see the queue setup for the
0th compute node. Note that there are many options to restrict
user access, memory usage, runtime, etc. that are turned off
by default. The only settings unique to this queue are its qname and hostname.
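If you just want the list of queue names, "qconf -sql" should print it
on a standard SGE install.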
- Use a text editor to create the file tachyon.job containing:
#$ -cwd
#$ -j y
#$ -S /bin/bash
/opt/mpich/gnu/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines \
tachyon/compile/linux-mpich/tachyon +V tachyon/scenes/balls.dat
Notice the similarity to the command for running Tachyon
manually. SGE will create a temporary working directory containing
a machines file (list of nodes to run on) and set the NSLOTS and
TMPDIR environment variables automatically.
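If you are curious what SGE provides, a quick check is to add two lines
like these near the top of the job script (after the #$ options) and
look for their output in the job's output file:
echo "NSLOTS = $NSLOTS"
cat $TMPDIR/machines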
The options preceded by #$ are parsed by SGE as if they were
specified on the command line. -cwd causes the job to execute in
the current working directory. -j y merges standard error and standard
output into a single file. -S /bin/bash says to run the script under
the bash shell; this is only needed because csh is SGE's default shell
and Rocks lacks /bin/csh. Without it the queue will get stuck in an
error state that root must clear with "qmod -c".
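Since the #$ lines are treated like command-line options, an equivalent
approach is to drop them from the script and pass them to qsub directly,
for example:
qsub -cwd -j y -S /bin/bash tachyon.job
(combined with the parallel environment request described in the next step).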
- Submit the job to run on three processors under the mpich
parallel environment with the command "qsub -pe mpich 3 tachyon.job".
Note that there is no queue for the master node, so we can't use 4 nodes.
- Use "qstat -f" to check on the job until it is scheduled,
then look for output files named tachyon.job.oX and tachyon.job.poX, where
X is the job number output by qsub. View these files to see the output.
- Submit several jobs requesting 1, 2, and 3 processors in random
order so that a backlog develops. You can use the same tachyon.job
file for all of them; just use the up arrow, possibly edit the
processor request, and hit return to submit jobs quickly.
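A shell loop can also generate the backlog in one shot (the processor
counts here are arbitrary):
for n in 2 1 3 1 2; do
  qsub -pe mpich $n tachyon.job
done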
Use qstat to monitor how the jobs are executed (the default scheduling
policy is to take the earliest-submitted job that can be run, i.e.,
for which enough processors are available, and the scheduler runs at
regular intervals).
- Use a text editor to create the file namd.job containing:
#$ -cwd
#$ -j y
#$ -S /bin/bash
nodefile=$TMPDIR/namd2.nodelist
echo group main > $nodefile
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' $PE_HOSTFILE >> $nodefile
dir=$HOME/NAMD_2.6b1_Linux-i686
$dir/charmrun ++remote-shell ssh ++nodelist $nodefile +p$NSLOTS $dir/namd2 ~/apoa1/apoa1.namd
Since NAMD does not use MPICH, we need a small shell script
and awk program to translate the SGE hostfile to charmrun format.
The second column of the hostfile is the number of processors available,
which is always one for these clusters, but this script will handle more.
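For example, a three-slot job on these machines would get a $PE_HOSTFILE
containing lines something like the following (hostname, slot count,
queue name, processor range):
compute-0-0 1 compute-0-0.q UNDEFINED
compute-0-1 1 compute-0-1.q UNDEFINED
compute-0-2 1 compute-0-2.q UNDEFINED
which the awk program above translates into:
group main
host compute-0-0
host compute-0-1
host compute-0-2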
- Submit the job with the command "qsub -pe mpich 3 namd.job".
Note that we are pretending to use the mpich parallel environment, but we
do not use any of the special files it sets up.
- Use qstat to monitor the job until it starts running, then use
"tail -f namd.job.oX" (X is the job number) to watch the job output.
- When you get tired of this, Control-C out of tail and use
"qdel X" (X is the job number) to kill the job. Use qstat
to monitor the job until it is killed.
Part 4: There Is No Part 4
Compiling a program and running it under a queueing system is likely
all you will ever do on your cluster. We've done a typical application
(Tachyon) and a not-so-typical one (NAMD). At this point you might want to
ssh to a compute node to see what that environment is like, to try the
pretty graphical cluster tools, or go see how the Clustermatic folks are
doing. If you're really ambitious, download your own code and see if it
compiles and runs.
See Also
Rocks web site (http://www.rocksclusters.org/Rocks/)
Grid Engine web site (http://gridengine.sunsource.net/)
NAMD web site (http://www.ks.uiuc.edu/Research/namd/)
Tachyon web site (http://jedi.ks.uiuc.edu/~johns/raytracer/)