Build Your Own Rocks Cluster

You should have the following parts:

Part 1: Frontend (Master Node) Installation

  1. Plug the monitor into the power strip and turn it on.
  2. Find the machine with two network cards; this is the master node, and should be the leftmost computer.
  3. Plug the master node into the power strip and connect the monitor, keyboard, and mouse.
  4. Power on the master node and insert the Rocks Kernel CD into the CD-ROM drive, then press the reset switch.
    If you wait too long and your machine starts booting off of the hard drive, just press the reset button to make it boot from the CD-ROM. If your machine still insists on booting from the hard drive you may need to modify its BIOS settings.
  5. As soon as the boot menu appears, type "frontend" to boot the frontend installation. If you wait too long without pressing any keys, it will attempt to boot as a cluster. If this happens, simply restart the machine and try again.
  6. The first screen to appear should list the available "rolls," with the first four already selected (ganglia, kernel, hpc, base). Select "sge" in addition to these and select "OK."
    "SGE" or the "Sun Grid Engine" is what allows us to queue jobs on the cluster.
  7. The installer will ask you if you have another roll CD-ROM to load. Choose "Yes" to process another CD.
  8. Insert OS Disc 1 into the drive and press Enter.
  9. Repeat for OS Disc 2.
  10. Once all three CDs are loaded, select "No" to continue. The current CD will be ejected, and you may have to wait several minutes before the next step appears while the installer works with no visible progress.
  11. You should be presented with various fields to input. The only required field is "Fully Qualified Hostname." Enter the full host and domain name of the master node as given to you by the instructor. The other fields can be left at their defaults or filled in with anything you like; they are optional and only used internally. Select "OK" when done.
  12. You will then be prompted whether you want the installer to autopartition your drive or manually set up the partitions yourself using Disk Druid. Select "Autopartition."
  13. Next you'll configure both ethernet cards of your frontend. "Eth0" is used for the private network, and its settings can be left at their defaults. "Eth1" is for the outside network, and you should enter the IP address given to you by your instructor. The default netmask should be fine. Select "OK" when done with each interface. (A quick way to verify these settings after the install is sketched at the end of this part.)
  14. Enter these settings:
    Gateway:        130.126.120.1
    Primary DNS:    130.126.120.32
    Secondary DNS:  130.126.120.33
    Tertiary DNS:   130.126.116.194
    
    and select "OK" to continue.
    Note: these values are specific to our network. If you want to set up your own cluster later on, you'll have to get these addresses from your local sysadmin (which might be you!).
  15. On the next screen, select "America/Chicago" as the timezone, and change the network timeserver to "timehost.ks.uiuc.edu." Select "OK" to continue.
  16. You will then be prompted for your root password. This can be any password agreed upon by your group for this exercise, but should be extremely secure for a live server in the real world. Enter this password twice (and write it down somewhere) and select "OK" to continue.
  17. Now begins the copying process. The installer will prompt you for CD changes as necessary, and copy their contents to a temporary location on the hard drive. You should only have to provide each CD once.
  18. When the three CDs have been copied over, the installer will merge the rolls on the hard drive and begin the actual install. While this is happening, feel free to start setting up the rest of the nodes. Each node needs a power cable connected to your power strip and a network cable going from its only* network card to the mini switch. When the install finishes, the frontend will automatically reboot.
    *Note: your master node has two ethernet cards. Connect the Intel card to the mini switch and the 3COM card to our larger central switch, using the long network cable provided.
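
Once the frontend reboots and you log back in as root (the first root login is covered at the start of Part 2), you can sanity-check the network settings from steps 13 and 14. This is only a suggested spot-check, assuming the interface names above:

    ifconfig eth0         # private network; normally left at the Rocks defaults
    ifconfig eth1         # public network; should show the IP address from your instructor
    cat /etc/resolv.conf  # should list the DNS servers entered in step 14

If eth1 is missing its address or resolv.conf looks wrong, it's much easier to fix now than after the slave nodes have been installed.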

Part 2: Configuring the Frontend

  1. The first time you log in as root, you'll be prompted about setting up ssh keys. Press Enter three times to accept the default location and enter (and confirm) a blank password for the key pair.
  2. The normal Rocks distribution is lacking a single library needed for NAMD, a program we'll use later. To fix this, follow these steps:
    # download the XML node file that tells Rocks to install the extra package on compute nodes
    cd /home/install/site-profiles/4.0.0/nodes
    wget http://www.ks.uiuc.edu/~beacham/extend-compute.xml
    # install the library on the frontend itself
    cd /home/install
    rpm -i rocks-dist/lan/i386/RedHat/RPMS/compat-libstd*
    # rebuild the Rocks distribution so the slave nodes pick it up
    rocks-dist dist
    
    This will install the "compat-libstdc++-33" package and rebuild the Rocks distribution so it automatically installs on all slave nodes. A script is also available to automate this step at http://www.ks.uiuc.edu/~beacham/namdrocks.sh.
  3. Once that finishes, create a normal user for your group to use (you don't always want to use root) by running "useradd username." This will set up a default account with a blank password. Run "passwd username" to set the password for the account.
  4. Test this account by trying to log in as the new user on a different terminal: Press Ctrl+Alt+F2 to switch to tty2, log in as the new user, exit, then Ctrl+Alt+F1 to switch back to tty1.
  5. Back at your root login, run "insert-ethers" to start detecting any nodes that boot.
  6. Press Enter to select "Compute" as the type of node to listen for.
  7. Make sure your slave nodes are connected and have one of the Rocks kernel CDs (the one you first booted with) inserted. Boot them all now, and you'll see them appear on the screen as they connect to the master node.
    The CD only needs to be in the drive for a matter of seconds in order to load the PXE boot system (for network booting). Most modern motherboards have this support built in, making even the CDs unnecessary, but these particular computers are just old enough to require them. Make sure the CDs are then removed, since the installation program will automatically reboot the computers afterwards (and we don't want them installing in an infinite loop). You can also use disposable floppies instead of CDs, which can easily be generated at http://www.rom-o-matic.net. The downside of floppy disks is that they can usually hold only one set of network card drivers at a time, so if you have many slaves all using different network cards, you'll have to make many unique boot disks. A CD, however, can hold all of the network card drivers supported by Linux.
  8. Each slave entry should contain an empty "( )", which will be replaced by "(*)" when the node properly requests its kickstart file. When all of the nodes have done this, press F10 to exit the insert-ethers program.
  9. Each node will do a full installation and then reboot.
What's happening: When each slave boots, it accesses the network and searches for the master node. The master then sends an entire Linux distribution over to the slave to be installed. This makes the slaves easier to maintain, since you can easily swap nodes in and out with replacements, which are automatically installed and configured without any help from you.
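
Once a slave node finishes installing and comes back up, you can check it directly from the frontend. By default, insert-ethers names the nodes compute-0-0, compute-0-1, and so on; assuming that default naming, a quick test is:

    ssh compute-0-0 uptime    # should run without a password prompt, thanks to the keys from step 1

If the node answers with its uptime, it is installed and reachable over the private network.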

Part 3: Checking Your Cluster Status

Rocks doesn't use bproc like Clustermatic, so you can't use bpstat, but it has an incredibly powerful web interface. We'll configure and boot into X so that we can properly view this.

  1. While logged in as root, run "system-config-display" in order to initially set up your X config file.
  2. Using the mouse (finally!), set the resolution to 1024x768 and the color depth to "Millions of Colors," and click "OK."
  3. Back at the root prompt, run "startx" to actually start your X session.
  4. Once X boots, start Firefox by navigating to "Applications > Internet > Firefox Web Browser." Its homepage should default to "http://localhost/," so it should already display the Rocks web interface.
  5. Feel free to explore the menus of the web interface. You can generate graphs, check the health of your cluster, and view job submissions to the SGE queue. Most of these entries will be blank now, but you can watch the "Cluster Status (Ganglia)" page to see when your nodes are fully up. A listing of each individual node and its status is at the bottom of the page.
  6. For the rest of the activities, you can either stay in X and use a virtual terminal (Right-click on the Desktop, "Open Terminal"), or log out of X (Actions > Log Out) to return back to your console session.
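
If you prefer the command line to the Ganglia pages, Rocks also includes a small tool called cluster-fork, which runs a command on every compute node over ssh. A minimal check of the whole cluster (run as root) might look like:

    cluster-fork uptime    # each installed node should answer with its uptime

Nodes that are still installing or rebooting simply won't answer yet.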
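
To see something appear on the SGE pages of the web interface, you can submit a trivial test job from your normal user account. The script below is only an illustration (the file name and contents are arbitrary), and it assumes the SGE commands are on your path, which the SGE roll normally arranges for new logins:

    #!/bin/bash
    # test.sh -- a throwaway job script (hypothetical name)
    # the next line asks SGE to run the job in the directory it was submitted from
    #$ -cwd
    hostname

Submit it with "qsub test.sh", watch the queue with "qstat", and look for output files named test.sh.o<jobid> and test.sh.e<jobid> once the job has run.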

See Also

Rocks web site (www.rocksclusters.org)