NAMD Wiki: NamdOnScyld
Who or what is Scyld?
One of the many distributions for LinuxClusters.
Scyld Computing Corporation (http://www.scyld.com/), now part of Penguin Computing.
They pioneered the bproc-based "Beowulf 2" concept: running a bare-minimum OS on the slave nodes, downloading the kernel from the master node during boot, and providing a unified process space for monitoring the entire cluster.
NAMD on Scyld at TCBG
Starting in 2001, we ran the Scyld Beowulf operating system (based on Red Hat 6.2) on our three 32-processor clusters, each built from single-processor Athlon nodes. A tutorial based on these clusters is at http://www.ks.uiuc.edu/Research/namd/tutorial/NCSA2001/
The model we adopted was that each cluster would run a single job at a time, using all available processors. The head node (the only node on the public network) mounted our file servers, and all file I/O happened on this node (NAMD already worked this way, so no changes were needed). A simple script calculated the number of available processors as the output of `bpstat -u` plus 1 for the head node. Our existing (and ancient) DQS queueing system was sufficient to start jobs.
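A minimal Python sketch of that processor count follows. It assumes `bpstat -u` prints a single integer (the number of compute nodes currently up) and that the extra 1 accounts for the head node; since every node has one processor, nodes equal processors. The function names `count_processors` and `available_processors` are hypothetical and not part of Scyld or NAMD.

```python
import subprocess


def count_processors(bpstat_up_output: str) -> int:
    """Turn the text printed by `bpstat -u` into a processor count.

    Assumption: the output is one integer, the number of compute nodes
    that are up. We add 1 for the head node, which also takes part in the
    run. All nodes are single-processor, so nodes == processors.
    """
    up_nodes = int(bpstat_up_output.strip())
    if up_nodes < 0:
        raise ValueError(f"unexpected node count: {up_nodes}")
    return up_nodes + 1


def available_processors() -> int:
    """Run `bpstat -u` on the head node and return the processor count."""
    result = subprocess.run(
        ["bpstat", "-u"], capture_output=True, text=True, check=True
    )
    return count_processors(result.stdout)
```

In a job script, the value from `available_processors()` would then be handed to the NAMD launch command as the number of processors to use, so a job always spans whatever nodes are currently up.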
To allow on-demand access to the clusters for short setup and test runs, we used DQS to create a second "short" 30-minute queue on each cluster and made the main 24-hour production queue subordinate to the short queue. When a job starts in the short queue, the job running in the main queue is suspended until the short job finishes. Amazingly, this worked without any changes to DQS (the binaries were built in 1999), Scyld, or NAMD. This strategy has kept every node busy as long as three jobs (one per cluster) were available to run, while short test runs start with zero wait time.
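The subordinate-queue behaviour can be illustrated with a toy Python model. This is a sketch of the policy described above, not DQS or Grid Engine code: the class `Cluster` and its methods are hypothetical, and only the 30-minute and 24-hour limits come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Queue limits from the text: 30-minute short queue, 24-hour production queue.
SHORT_LIMIT_MIN = 30
MAIN_LIMIT_MIN = 24 * 60


@dataclass
class Cluster:
    """Toy model of one cluster running one job at a time on all processors.

    The main (production) queue is subordinate to the short queue: starting
    a short job suspends the main job, and finishing it resumes the main job.
    """

    main_job: Optional[str] = None
    short_job: Optional[str] = None
    log: List[str] = field(default_factory=list)

    @staticmethod
    def _check_limit(minutes: int, limit: int) -> None:
        if minutes > limit:
            raise ValueError(f"{minutes} min exceeds the {limit} min queue limit")

    def start_main(self, job: str, minutes: int) -> None:
        self._check_limit(minutes, MAIN_LIMIT_MIN)
        # One main job per cluster; in this model a suspended main queue
        # (short job running) accepts no new work.
        if self.main_job is not None or self.short_job is not None:
            raise RuntimeError("main queue is busy or suspended")
        self.main_job = job
        self.log.append(f"start {job}")

    def start_short(self, job: str, minutes: int) -> None:
        self._check_limit(minutes, SHORT_LIMIT_MIN)
        if self.short_job is not None:
            raise RuntimeError("short queue is busy")
        self.short_job = job
        if self.main_job is not None:
            self.log.append(f"suspend {self.main_job}")
        self.log.append(f"start {job}")

    def finish_short(self) -> None:
        if self.short_job is None:
            raise RuntimeError("no short job running")
        self.log.append(f"finish {self.short_job}")
        self.short_job = None
        if self.main_job is not None:
            self.log.append(f"resume {self.main_job}")


if __name__ == "__main__":
    cluster = Cluster()
    cluster.start_main("production", 24 * 60)
    cluster.start_short("test", 10)
    cluster.finish_short()
    print(cluster.log)
```

Running the example prints `['start production', 'suspend production', 'start test', 'finish test', 'resume production']`: the test run never waits, and the production job simply loses the time the test run took.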
For those who don't have access to DQS (it's no longer around), Sun Grid Engine appears to be a direct descendant of it and can be set up exactly as described above. For more details see NamdOnClustermatic.
See also NamdOnClustermatic.