NAMD Wiki: NamdOnClustermatic

What is Clustermatic?

One of the many distributions for LinuxClusters.

See NamdOnScyld for the first part of the story.

Clustermatic, http://www.clustermatic.org/, is a complete cluster solution project from Los Alamos National Lab. It includes LinuxBIOS and BProc as components. The BProc in Clustermatic is a newer version than the one in Scyld, so the two are not binary compatible and some of the utilities are different, but they are very closely related.

Clustermatic is completely free and version 3.0 (Nov. 2002) is based on Red Hat 8.0.

The NAMD 2.5 Clustermatic binaries are based on Clustermatic 3 and will not work with the new (Nov. 2003) Clustermatic 4. TCBG now (12/16/03) has a small Clustermatic 4 test cluster and we have found that source code changes to Charm++ are needed. This is a high priority for us, since we plan to upgrade our own clusters soon.

Modification of Charm++ to support Clustermatic 4 was completed on 12/17/03. The changes are in the Charm++ source, so you need to download the latest Charm++ source from CVS (for download instructions, see http://charm.cs.uiuc.edu).

NAMD 2.5 works on Clustermatic 5 as reported by users.

Please download the latest Charm++ from the nightly autobuild website: http://charm.cs.uiuc.edu/autobuild/cur.

For Opteron, compile Charm++ with: ./build charm++ net-linux-amd64 clustermatic
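
For illustration, a complete fetch-and-build might look like the following. This is only a sketch: the tarball name and the unpacked directory name are guesses, so substitute whatever the autobuild page actually lists.

wget http://charm.cs.uiuc.edu/autobuild/cur/Charm-src.tar.gz   # tarball name is hypothetical
tar xzf Charm-src.tar.gz                                       # unpacked directory assumed to be "charm"
cd charm
./build charm++ net-linux-amd64 clustermatic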

Configuring Shared Libraries

The shipped NAMD binaries are dynamically linked. Clustermatic 4 and 5 have poor default sets of shared libraries that are copied to the slave nodes. This is controlled by the "libraries" lines in /etc/clustermatic/config.

If a binary doesn't run on the slave nodes, try the following to check for shared library issues:

bpcp namd2 0:/tmp        # copy the binary to slave node 0
bpsh 0 ldd /tmp/namd2    # list its shared libraries as resolved on the slave

In our experience, adding the following line to /etc/clustermatic/config and rebooting (or restarting the Clustermatic daemons and rebooting the slaves) will allow NAMD to run:

libraries /lib/libtermcap* /lib/libdl* /usr/lib/libz* /lib/libgcc_s*
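
After the slaves come back up, repeating the earlier check should show every library resolving (a quick sanity check, assuming node 0 is one of the slaves):

bpcp namd2 0:/tmp
bpsh 0 ldd /tmp/namd2    # no library should be reported as "not found"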

Clustermatic at TCBG

We switched from Scyld to Clustermatic for our three gigabit 24-node dual-processor Athlon clusters in 2003 because Scyld was still based on Red Hat 6.2, hadn't updated their low-cost edition recently, and didn't support gigabit in the low-cost edition. We have a tutorial based on Clustermatic (but using the old machines) from our 2003 Summer School.

Clustermatic doesn't provide the polish or bells and whistles of Scyld (if you consider supporting hard drives on the slave nodes a bell or whistle), but it provides everything we need for our floppy-only machines (a floppy drive is inexpensive but, we now know, unreliable). Scyld is a commercial product; if you pay them money they will make it work, and you can expect it to be reliable.

The queueing system (still DQS) is set up the same way it was with NamdOnScyld. The only difference is that `bpstat -u` + 1 was changed to (`bpstat -t allup` + 1) * 2 in the script used to launch NAMD.
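
For illustration, the relevant fragment of such a launch script might look like this. It is only a sketch; it assumes that bpstat -t allup prints one line per up slave node, and the config file name is a placeholder.

NODES=`bpstat -t allup | wc -l`        # slave nodes that are up (one per line, assumed)
PROCS=$(( (NODES + 1) * 2 ))           # +1 for the master node, *2 for dual processors
./charmrun +p$PROCS ./namd2 sim.namd   # sim.namd stands in for the real config file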

Other ways to use Clustermatic

There are two ways to run a NAMD job, but don't mix the command line options of one with the other.

(1) By default, charmrun for Scyld/Clustermatic automatically looks for compute nodes that are up.

charmrun for Scyld or Clustermatic provides several extra options:

++skipmaster       Do not assign any process to master node (SCYLD)  [0]
++singlemaster     Only assign one process to master node (SCYLD)    [0]
++endpe            last pe to start job (SCYLD)                      [1000000]
++startpe          first pe to start job (SCYLD)                     [0]
++ppn              start multiple processes per node                 [1]
++debug            verbose debugging printout during startup

The ++startpe and ++endpe options actually refer to BProc node numbers, which start at -1 for the master and at 0 for the first slave. The master node (-1) will be included in the set of slaves, on the assumption that I/O is done there, unless you add ++skipmaster. ++singlemaster will use the master node only once. ++ppn will use every node multiple times, which is useful for SMP nodes.

Some examples:

jim@umbriel>bpstat
Node(s)                        Status       Mode       User       Group
4,6                            down         ---------- root       root
0-3,5,7-22                     up           ---x--x--x root       root

jim@umbriel>./charmrun ++startpe 12 ++endpe 13 ++verbose +p6 ./namd2
Charmrun> charmrun started...
Charmrun> adding client 0: "-1", IP:192.168.1.1
Charmrun> adding client 1: "12", IP:192.168.1.112
Charmrun> adding client 2: "13", IP:192.168.1.113
Charmrun> There are 2 slave nodes available.
Charmrun> adding client 3: "-1", IP:192.168.1.1
Charmrun> adding client 4: "12", IP:192.168.1.112
Charmrun> adding client 5: "13", IP:192.168.1.113

jim@umbriel>./charmrun ++startpe 12 ++endpe 13 ++singlemaster ++verbose +p5 ./namd2
Charmrun> charmrun started...
Charmrun> adding client 0: "-1", IP:192.168.1.1
Charmrun> adding client 1: "12", IP:192.168.1.112
Charmrun> adding client 2: "13", IP:192.168.1.113
Charmrun> There are 2 slave nodes available.
Charmrun> adding client 3: "12", IP:192.168.1.112
Charmrun> adding client 4: "13", IP:192.168.1.113

jim@umbriel>./charmrun ++startpe 12 ++endpe 13 ++skipmaster ++verbose +p4 ./namd2
Charmrun> charmrun started...
Charmrun> adding client 0: "12", IP:192.168.1.112
Charmrun> adding client 1: "13", IP:192.168.1.113
Charmrun> There are 1 slave nodes available.
Charmrun> adding client 2: "12", IP:192.168.1.112
Charmrun> adding client 3: "13", IP:192.168.1.113
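
On dual-processor nodes such as these, ++ppn 2 places two processes on each selected node. A hypothetical invocation (not taken from the runs above) would be:

./charmrun ++startpe 12 ++endpe 13 ++skipmaster ++ppn 2 ++verbose +p4 ./namd2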

You can also use bpsh to run any serial program on the slave nodes, but you will need to either figure out how to NFS-mount the head node's disk (not so great if you have a separate file server, though) or use bpcp to copy input and output files to RAM disks on the slave nodes. We don't mess with this and have several normal machines on our queueing system to handle serial jobs. If we needed more, I'd look at stealing cycles from desktop machines.
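
If you do want the bpcp route, a rough sketch looks like this (the node number, program name, and file names are placeholders):

bpcp input.dat 5:/tmp/input.dat                        # stage the input on node 5's RAM disk
bpsh 5 ./serial_prog /tmp/input.dat /tmp/output.dat    # run the serial program on node 5
bpcp 5:/tmp/output.dat output.dat                      # copy the result back to the master

As with namd2, any shared libraries the program needs must be covered by the "libraries" lines in /etc/clustermatic/config.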

(2) Use a machine file and the command line option ++nodelist. (This option is only available in the latest Charm++ from CVS; charm-5.8 is a year old now.)

A nodelist file looks like this:

group main
host -1
host 0
host 3

Note "-1" is the master node. For cluster that is not NFS-mounted, you need to start the first process on -1 the master node in order to read NAMD config files.

Now run NAMD with: ./charmrun +p3 ++nodelist ./nodelist ./namd2 config

Clustermatic and SGE

There is a short tutorial on setting up Clustermatic and SGE at:

http://noel.feld.cvut.cz/magi/sge+bproc.html

It is a little dated, and doesn't quite show how to set up SGE for NAMD (you only need one queue that runs a script using charmrun), but it's a great place to start. SGE works with Clustermatic 4, and seems to scale just fine. Once I get a final configuration set up, I'll post better details on how to make it all work.
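
As a rough starting point, that single queue only needs a job script along these lines; the paths are placeholders, and the process count is computed as in the dual-processor launch script above.

#!/bin/sh
#$ -cwd
#$ -N namd
# hypothetical job script for the single NAMD queue; adjust the paths
NODES=`bpstat -t allup | wc -l`
PROCS=$(( (NODES + 1) * 2 ))
/usr/local/namd/charmrun +p$PROCS /usr/local/namd/namd2 sim.namd > sim.log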

In the meantime, if you're interested in getting Clustermatic and SGE to work, feel free to send me an email at skearns AT emdlexigen.com