Cluster Workshop - Build Your Own Clustermatic Cluster
Installation Instructions for Fedora Core 4 and Clustermatic 5
You should have the following parts:
- 4 single-processor Athlon PCs
- 5 Intel network cards (already installed)
- 4 network cables
- 1 fast ethernet switch and its power adapter
- 1 keyboard
- 1 mouse
- 1 monitor
- 1 power strip
- 5 power cables
- 4 Fedora Core 4 CD-ROMs (discs 1, 2, 3 & 4, from fedora.redhat.com)
- 3 Clustermatic 5 CD-ROMs (from www.clustermatic.org)
- 1 TCB Cluster CD-ROM (Sun Grid Engine packages, NAMD examples and binaries)
Part 1: Install Fedora Core 4 on the Master Node
If you've installed Fedora Core before, the following may be quite
tedious. If you've never installed Fedora Core before, the
following may be quite mysterious. It's a necessary evil in either
case.
- Plug the monitor into the power strip and turn it on.
- Find the machine with two network cards; this is the master node.
- Plug the master node into the power strip and connect the
monitor, keyboard, and mouse.
- Power on the master node, open the CD-ROM drive, insert Fedora
Core disk 1, and press the reset button. The machine should boot
from the CD-ROM.
If you wait too long and your machine starts booting off of
the hard drive, just press the reset button to make it boot from
the CD-ROM. If your machine still insists on booting from the
hard drive, you may need to modify its BIOS settings.
- When the Fedora Core screen comes up, hit enter.
If you don't have a mouse, it is suggested that you type
linux text
at this point to do a text-based install.
The process is very similar to the graphical install.
- Skip testing the CD media.
This takes far too long and has no real benefit for fresh
installs.
- Click Next at the "Welcome to Fedora Core Linux!" message.
- Select English as your installation language.
- Select a US model keyboard.
- If your mouse was not automatically detected, select a generic
3-button PS/2 mouse.
- If the installer detects that another version of Linux is
installed, click Install Fedora Core and hit Next.
- Select a Workstation install.
We typically use Custom, but Workstation is good enough for
now.
- Select Autopartition.
We don't store files on our cluster machines, even on the
master nodes, so it doesn't matter how the disk is set up. We
use dedicated fileservers for storage.
- Select "Remove all partitions on this system".
Again, we don't keep data on cluster machines.
- Yes, you really want to delete all existing data.
Of course, at home you might not want to do this.
- Click Next at the GRUB boot loader screen.
- At the network configuration screen set both cards to "Activate
on boot." Select device eth0 and click Edit. In the dialog that
appears, uncheck the "Configure using DHCP" checkbox and then
enter 10.0.4.1 in the IP Address field and
255.255.255.0 in the Netmask field. Click OK after you
have made these changes, then click Next on the network
configuration screen.
This will be the interface to the private network.
- "Eth1" is for the outside network, and you should
input the IP address given to you by your instructor. The
netmask should be 255.255.255.0. Select "OK" when done
with the interface.
- Enter these settings:
Gateway: 130.126.120.1
Primary DNS: 130.126.120.32
Secondary DNS: 130.126.120.33
Tertiary DNS: 130.126.116.194
and select "OK" to continue.
Note: these values are specific to our network. If you want
to set up your own cluster later on, you'll have to get these
addresses from your local sysadmin (which might be you!).
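For reference, the installer records these settings in the
standard Fedora network-script files. A rough sketch of what
/etc/sysconfig/network-scripts/ifcfg-eth0 should end up
containing (the real file may carry extra keys such as HWADDR):
    DEVICE=eth0
    BOOTPROTO=static
    IPADDR=10.0.4.1
    NETMASK=255.255.255.0
    ONBOOT=yes
The ifcfg-eth1 file is analogous but with the outside address
from your instructor; the gateway and DNS servers end up in
/etc/sysconfig/network and /etc/resolv.conf. Knowing where these
live is handy if you need to correct a typo after the install.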
- Disable the firewall and SELinux.
In most cases your cluster will not be connecting to the
outside world, so it should be safe to disable the firewall if
you trust your network; if not, you'll need to enable it.
- Click Proceed when the installer warns you about not having a
firewall.
- The hardware clock should be set to GMT. Pick your time zone.
- Pick a root password that you will remember. Write it down.
- You don't need to customize the software selection or pick
individual packages.
However, you may want to do this for a production system.
This is by far the easiest time to add packages to your cluster.
On the other hand, the default install has 2 GB of software, so
you could save some time in the next step if you pared the list
down.
- Start the installation. It will take between 15 and 25 minutes to
install Fedora Core 4 and will prompt you as necessary for
additional disks.
- Make a boot floppy.
Having a Linux boot floppy can be invaluable. A floppy made
now will be unable to load kernel modules once you install
Clustermatic, but it will still allow you to boot your machine
and fix any misconfigurations. You probably won't need it today,
though.
- At this point your Fedora Core 4 box is installed. Reboot the
system when prompted to.
- After rebooting, the Welcome to Fedora Core 4 screen will pop up.
Click Next.
- You will need to Agree to the License Agreement before
continuing.
- Verify that the computer is set to the correct time; if not,
change it.
- At the Display Configuration screen you can either just click
next, or adjust the monitor configuration to "Generic LCD
1024x768". After you have done this you can adjust the default
screen resolution to 1024x768.
You would normally only use the console of a production
cluster during initial configuration or adding nodes, and you
don't need a GUI for either of those, so there is little reason
to configure X-Windows. Having multiple terminals available will
be useful for this exercise, so we'll go ahead and configure
X-Windows anyway.
- Create a username and password for yourself.
- Click Next at the sound card configuration screen.
- Click Next at the Additional CDs screen.
- Click Next at the Finish Setup screen.
- Congratulations, you've installed Fedora Core 4.
Part 2: Install Clustermatic 5 on the Master Node
The following will be new to everyone. You'll need to know how to
use a Unix text editor. The examples below use the mouse-driven editor
"gedit" rather than the more common "vi".
- Log in as root.
- Go to Desktop, System Settings, Security Level. Enable the
Firewall, and set eth0 to be a trusted device, or the slaves
won't be able to download the kernel.
- Open a terminal (right-click on the desktop, Open
Terminal).
- In order to ensure that one network interface is persistently
named "eth0" and the other "eth1", in practice we use two
completely different cards (one Intel, the other 3COM). If you
wish to do this with your own clusters, you can perform the
following step. Today, however, you have two identical cards
(both Intel), and the Linux kernel keeps the naming of network
cards consistent across reboots (barring hardware changes), so
you only need to figure out which card is which interface once.
Simply connect the private and public cables to the two cards,
and if that assignment turns out to be wrong, swap the two
cables.
- For the mixed-card setup described above, run gedit
/etc/modules.conf and swap eth0 and eth1 if necessary so that
the network alias lines read:
alias eth0 eepro100
alias eth1 3c59x
This will ensure that the Intel card always appears as "eth0"
and the 3COM card as "eth1."
- Insert the Clustermatic 5 CD and wait for it to appear on your
desktop.
If you're not running X-Windows you need to mount it with
mount /media/cdrom
- Install Clustermatic with rpm -ivh --force
/media/cdrom/RPMS/i686/kernel*
Substitute the directory for your architecture if you are not
using a 32-bit Intel or AMD processor. The --force option allows
you to install a kernel version older than the one currently
running.
- Make an initrd image for this new kernel with /sbin/mkinitrd
/boot/initrd-2.6.9-cm46 2.6.9-cm46
If you installed a different kernel in the previous step,
adjust accordingly.
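As a quick sanity check before editing the boot loader, you can
confirm that the new kernel and its initrd image are both in
place (filenames as above; adjust if you installed a different
kernel):
    ls -l /boot/vmlinuz-2.6.9-cm46 /boot/initrd-2.6.9-cm46
If either file is missing, repeat the rpm or mkinitrd step.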
- gedit /boot/grub/grub.conf and add:
title Clustermatic
kernel /vmlinuz-2.6.9-cm46 root=/dev/VolGroup00/LogVol00
initrd /initrd-2.6.9-cm46
and edit the default line (a zero-based index into the list of
kernel entries that follows) to point to the new entry you made.
Note that the kernel and initrd paths are given relative to the
/boot partition, which GRUB treats as its root filesystem.
Make sure that the root device is the same as for the old
kernel. If you installed a different kernel in the previous step,
you should adjust the kernel and initrd image names appropriately.
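For orientation, here is a rough sketch of how grub.conf might
look after the edit; the existing Fedora entry shown is only
illustrative, and your kernel version, root (hd0,0) line, and
root device will be whatever the installer actually wrote:
    default=1
    timeout=5
    title Fedora Core (2.6.11-1.1369_FC4)
        root (hd0,0)
        kernel /vmlinuz-2.6.11-1.1369_FC4 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        initrd /initrd-2.6.11-1.1369_FC4.img
    title Clustermatic
        kernel /vmlinuz-2.6.9-cm46 root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.9-cm46
With the entries in this order, default=1 selects the second
title entry (counting from zero), i.e. Clustermatic.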
- Copy the compat-libstdc++-33 package (required for NAMD) from the
workshop CD or website and install it with rpm -i
compat-libstdc++*
- Unmount the Clustermatic CD with eject and remove it.
- Reboot the computer into the new kernel that you just installed.
- Login again and open up a terminal window.
- Insert the Clustermatic 5 CD again.
- Install the remaining packages with rpm -ivh
/media/cdrom/RPMS/i586/beo*.rpm /media/cdrom/RPMS/i586/m*.rpm
/media/cdrom/RPMS/i586/bp*.rpm
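As a quick check that the packages went in, you can query the
RPM database; the exact names and versions listed will depend on
the Clustermatic release:
    rpm -qa | grep -i -E 'beo|bproc'
You should see the beoboot and bproc packages you just installed.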
- gedit /etc/clustermatic/config and edit the following lines:
- Verify that your interface line reads:
interface eth0
- Add a nodes line for the number of slave nodes you will
have:
nodes 3
- Change the iprange line to provide the corresponding number
of addresses:
iprange 0 10.0.4.10 10.0.4.12
- Add a kernelimage line to point at the proper kernel:
kernelimage /boot/vmlinuz-2.6.9-cm46
If you installed a different kernel in previous steps,
adjust accordingly.
- Add this line to the libraries section:
libraries /lib/libtermcap* /lib/libdl* /usr/lib/libz* /lib/libgcc_s*
This ensures that libraries needed by NAMD are available on
the slave nodes.
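Putting these edits together, the active (non-comment) lines of
/etc/clustermatic/config should now read roughly as follows; any
other default lines already in the file are left alone:
    interface eth0
    nodes 3
    iprange 0 10.0.4.10 10.0.4.12
    kernelimage /boot/vmlinuz-2.6.9-cm46
    libraries /lib/libtermcap* /lib/libdl* /usr/lib/libz* /lib/libgcc_s*
If you have a different number of slaves, scale nodes and
iprange to match (for example, 7 slaves would use nodes 7 and
iprange 0 10.0.4.10 10.0.4.16).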
- If you are going to want to share home directories to the slave
nodes, then gedit /etc/clustermatic/config.boot to add
bootmodule nfs
modprobe nfs
also gedit /etc/clustermatic/fstab to add
MASTER:/home /home nfs defaults 0 0
and gedit /etc/exports (a new file) to add (with a tab
between /home and *)
/home *(rw,sync)
and finally /sbin/chkconfig nfs on and /sbin/service
nfs start
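Once NFS is running you can sanity-check the export from the
master node itself; showmount comes with the standard nfs-utils
package:
    /usr/sbin/exportfs -v
    /usr/sbin/showmount -e localhost
Both commands should show /home exported to everyone (*). The
slave nodes will then mount it automatically at boot via the
MASTER:/home line in /etc/clustermatic/fstab.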
- You may remove the CD.
Part 3: Attach and Boot the Slave Nodes
- Plug the private network switch into an outlet on the power
strip.
- Connect one of the master node's network cards to the switch, and
the other to the "outside world" (we should have several larger
switches connected to the outside network; ask if there is any
confusion).
- Log in as root and open a terminal.
- Test the outside connection by pinging www.ks.uiuc.edu with
"ping -c 1 www.ks.uiuc.edu"; pinging the IP address
130.126.120.32 directly will test connectivity without relying
on DNS. If either fails ("unknown host www.ks.uiuc.edu" or no
response), then swap the master node's network cables, wait a
few seconds, and try again. If you still have trouble, feel free
to ask for assistance.
- Create the level 2 boot image with beoboot -2 -n
This builds the second stage boot image, which the slaves
will download from the master over the network. You only need to
run this command when you change the boot options in
/etc/clustermatic/config or /etc/clustermatic/config.boot.
- Start up Clustermatic services with /sbin/service clustermatic
start
- Open a second terminal and run /usr/lib/beoboot/bin/nodeadd -a
eth0 there. The nodeadd program will run until you kill it
with Ctrl-C. Leave it running!
This process is only needed when adding new nodes to the
cluster. The nodeadd program captures the hardware ethernet
address of any machine trying to boot on the private network
(eth0), adds it to the node list in /etc/clustermatic/config, and
makes the beoboot daemon reread the list (the -a option). When a
new node is detected, nodeadd will print the hardware address
followed by a message about sending SIGHUP to beoserv.
- For each slave node plug in its power cable and network cable.
- Power on each slave node and insert a Clustermatic 5 CD.
- Switch to the second terminal and kill nodeadd with Ctrl-C
- Run tail /etc/clustermatic/config to see the new
(uncommented) node addresses.
- Check the status of the cluster with bpstat
Make sure that the number of nodes listed as up matches the
number of slaves you have. If you had not modified the nodes and
iprange lines in /etc/clustermatic/config to match the size of
your cluster, you would see the extra nodes harmlessly listed as
down.
- Examine the log file from node 0 with less
/var/log/beowulf/node.0
Each slave node has its own log file, node.N, in the same directory.
These log files only contain output from the final stages of
slave startup, after the second stage kernel has contacted the
master node.
- View the kernel messages from node 0 with bpsh 0 dmesg |
less
The bpsh command allows any binary installed on the master
node to execute on one or more slave nodes (see options in the
appendix). Interpreted scripts or programs requiring files found
only on the master node cannot be run via bpsh.
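For example, combining the node-selection and output options
listed in the appendix, the following one-liner (run as root on
the master) reports the kernel version of every slave that is
up, prefixing each line with its node number:
    bpsh -a -p uname -r
Similarly, bpsh 0 ls /lib shows which libraries actually made it
onto node 0.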
- Reboot all the slaves with bpctl --slave all --reboot
If you see any "Node is down" messages, these indicate that
fewer than the number of nodes given in /etc/clustermatic/config
were up when you issued the command.
- Log out.
There is more information about using Clustermatic at the end of
this guide.
Part 4: Installing Sun Grid Engine
- Log into the system and open a terminal.
- Begin the installation by adding a user to run SGE: adduser
sgeadmin
- Change to the home directory of the sgeadmin user you just
created. cd /home/sgeadmin
- Insert the TCB Cluster CD into the Master Node.
- Unpack the common and platform-specific packages from the CD
into the home directory of the sgeadmin user.
tar xzf /media/cdrom/sge-6.0u6-bin-lx24-x86.tar.gz
tar xzf /media/cdrom/sge-6.0u6-common.tar.gz
- Set your SGE_ROOT environment variable to the sgeadmin's home
directory. export SGE_ROOT=/home/sgeadmin
- Run the setfileperm.sh script provided to fix file permissions.
util/setfileperm.sh $SGE_ROOT
- Run gedit /etc/services and add the lines
sge_qmaster 536/tcp
sge_execd 537/tcp
in the appropriate place. Save the file.
- Run the QMaster installer.
# cd $SGE_ROOT
# ./install_qmaster
- Hit Return at the introduction screen.
- When choosing the Grid Engine admin user account, hit y to
specify a non-root user, and then enter sgeadmin as the
user account. Hit return to proceed.
- Verify that the Grid Engine root directory is set to
/home/sgeadmin.
- Since we set the ports needed by sge_qmaster and sge_execd in
a previous step, we should be able to hit Return through the
next two prompts.
- Hit return to set the name of your cell to "default".
- Use the default install options for the spool directory
configuration.
- We already ran the file permission script, so we can hit yes
and skip this step.
- Since we are only going to have one execution host (the
cluster itself), we can say y when asked if all hosts are in a
single domain.
- The install script will then create some directories.
- Use the default options for the Spooling/DB questions.
- When prompted for the group ID range, use the default range of
20000-20100 unless you have a reason to do otherwise.
- Use the default options for the spool directory.
- The next step asks you to input an email address for the user who
should receive problem reports. Typically this will be the person
responsible for maintaining the cluster, but for now enter
root@localhost
- Verify that your configuration options are correct.
- Hit yes so that the qmaster will start up when the computer boots.
- The next step asks you to enter the names of your Execution
Hosts (clusters). Say no to using a filename, and when
prompted for a host, enter localhost.
- The next thing that the configuration program will ask you to do
is to select a scheduler profile. Normal will work for most
situations, so that's what we'll use now.
- Our queue master is now installed. Run
. /home/sgeadmin/default/common/settings.sh
to set up some environment variables. Note that you should add
this line to your shell's login script so that you have access
to the Grid Engine utilities, as sketched below.
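A minimal way to make this permanent for the root account,
assuming a bash login shell (adjust the path if you unpacked SGE
somewhere other than /home/sgeadmin):
    echo '. /home/sgeadmin/default/common/settings.sh' >> /root/.bashrc
Any ordinary user who will submit jobs needs the same line in
their own shell startup file.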
- Each cluster must also have an execution host installed on it.
In this case our only cluster is the one we've been setting up.
Begin by running qconf -sh, which lists the administrative
hosts. If your host is not listed, you will need to add it as an
administrative host by running qconf -ah hostname.
- Run /home/sgeadmin/install_execd to start up the execution
host configuration script.
- As with the qmaster installation, we can use all of the default
options.
- After install_execd finishes running, use .
/home/sgeadmin/default/common/settings.sh to set our
environment variables accordingly.
- Congratulations, you now have a queueing system set up for your
cluster. Now to do some real work, starting with the test job
sketched below.
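To convince yourself that the queue works end to end, you can
submit a trivial test job as the normal user you created during
the Fedora install (the job and output file names below are just
what a default SGE install typically produces):
    su - yourusername   # substitute the account you created earlier
    . /home/sgeadmin/default/common/settings.sh
    echo 'hostname; date' | qsub
    qstat
qstat should show the job waiting or running; once it finishes,
files named STDIN.o1 and STDIN.e1 appear in your home directory
with its output and any errors.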
Appendix: Usage Options for Common Bproc Utilities
bpstat: monitor status of slave nodes
Usage: bpstat [options] [nodes ...]
-h,--help Display this message and exit.
-v,--version Display version information and exit.
The nodes argument is a comma delimited list of the following:
Single node numbers - "4" means node number 4
Node ranges - "5-8" means node numbers 5,6,7,8
Node classes - "allX" means all slave nodes with status X
"all" means all slave nodes
More than one nodes argument can be given.
Valid node states are:
down boot error unavailable up
Node list display flags:
-c,--compact Print compacted listing of nodes. (default)
-l,--long Print long listing of nodes.
-a,--address Print node addresses.
-s,--status Print node status.
-n,--number Print node numbers.
-t,--total Print total number of nodes.
Node list sorting flags:
-R,--sort-reverse Reverse sort order.
-N,--sort-number Sort by node number.
-S,--sort-status Sort by node status.
-O,--keep-order Don't sort node list.
Misc options:
-U,--update Continuously update status
-L,--lock "locked" mode for running on an unattended terminal
-A hostname Print the node number that corresponds to a
host name or IP address.
-p Display process state.
-P Eat "ps" output and augment. (doesnt work well.)
bpctl: alter state of slave nodes
Usage: bpctl [options]
-h,--help Print this message and exit
-v,--version Print version information and exit
-M,--master Send a command to the master node
-S num,--slave num Send a command to slave node num
-s state,--state state Set the state of the node to state
-r dir,--chroot dir Cause slave daemon to chroot to dir
-R,--reboot Reboot the slave node
-H,--halt Halt the slave node
-P,--pwroff Power off the slave node
--cache-purge-fail Purge library cache fail list
--cache-purge Purge library cache
--reconnect master[:port[,local[:port]]]
Reconnect to front end.
-m mode,--mode mode Set the permission bits of a node
-u user,--user user Set the user ID of a node
-g group,--group group Set the group ID of a node
-f Fast - do not wait for acknowledgement from
remote nodes when possible.
The valid node states are:
down boot error unavailable up
bpsh: run programs on slave nodes
Usage: bpsh [options] nodenumber command
bpsh -a [options] command
bpsh -A [options] command
-h Display this message and exit
-v Display version information and exit
Node selection options:
-a Run the command on all nodes which are up.
-A Run the command on all nodes which are not down.
IO forwarding options:
-n Redirect stdin from /dev/null
-N No IO forwarding
-L Line buffer output from remote nodes.
-p Prefix each line of output with the node number
it is from. (implies -L)
-s Show the output from each node sequentially.
-d Print a divider between the output from each
node. (implies -s)
-b ## Set IO buffer size to ## bytes. This affects the
maximum line length for line buffered IO. (default=4096)
-I file
--stdin file
Redirect standard in from file on the remote node.
-O file
--stdout file
Redirect standard out to file on the remote node.
-E file
--stderr file
Redirect standard error to file on the remote node.
bpcp: copy files to slave nodes
Usage: bpcp [-p] f1 f2
bpcp [-r] [-p] f1 ... fn directory
-h Display this message and exit.
-v Display version information and exit.
-p Preserve file timestamps.
-r Copy recursively.
Paths on slave nodes are prefixed by nodenumber:, e.g., 0:/tmp/
See Also
Clustermatic web site
(www.clustermatic.org)
and Clustermatic 5 README
BProc: Beowulf Distributed Process Space web site
(bproc.sourceforge.net)