[cluster-l] another system summary
rdrobert at uiuc.edu
Tue Jul 31 13:08:07 CDT 2007
In the spirit of Nils's passing along information about his new cluster...
Jerry Nelson and I have been putting together a cluster in the Agricultural and Consumer Economics department here at UIUC. We are doing statistical / econometric / empirical / data-driven models (pick whichever adjective you like best) of land use. Specifically, we are trying to fit multinomial-logit and feed-forward neural network models for (predictively) classifying land use based on the attributes of the land. Since we like to work at a reasonably fine resolution (think 30-meter squares), this adds up to a lot of data: storing everything as 4-byte floats, the data for the island of Sumatra in Indonesia takes up about 13GB. The problem is an "embarrassingly parallel" one and involves passing around little information other than the initial distribution of the data to work on. We have written our algorithms in Java (for various minor reasons), which is handy for running them on different platforms.
At various junctures in the past and present, we have also had allocations on NCSA's tungsten, but have always been frustrated by the over-allocation and subsequent difficulty of getting jobs through the queues. That provided the primary motivation for putting together a mini-cluster of our own.
We have been going the bargain-basement route. In general, we expect to be processor-bound rather than memory- or disk-bound, so the strategy is to maximize the number of processors while minimizing memory and hard-drive space. Indeed, it seems better to get more, slower processors for the same money than fewer, faster ones. After fooling around testing various combinations, this is what we have ended up with.
Cluster Level Costs:
~$100 * 2 = $200 -> A couple shelving "racks" to put the nodes on
~$100 -> several power strips
~$60 * 2 = $120 -> A couple unmanaged network switches
sub-TOTAL = $420
Node Level Costs (prices from newegg.com):
$43 -> motherboard [JetWay JP4M9MP LGA 775 VIA P4M890 Micro ATX Intel Motherboard - Retail]
$40 -> 1GB memory (Transcend 1GB 240-Pin DDR2 SDRAM DDR2 533 (PC2 4200) Desktop Memory Model JM533QLJ-1G - Retail)
$117 -> Core 2 Duo CPU (Intel Core 2 Duo E4300 Allendale 1.8GHz LGA 775 Processor Model BX80557E4300 - Retail)
$35 -> box and power supply
$37 -> 80GB hard drive (EXCELSTOR Jupiter Series ESJ8080S 80GB 7200 RPM SATA 3.0Gb/s Hard Drive - OEM)
sub-TOTAL = $272
So far, we have 10 nodes like those described above, 2 more that are very similar (from the testing phase), and 2 more scavenged computers that are half as fast, but still useful.
Assuming the 12 nodes, the cost is roughly: $420 + 12 * $272 = $3684 so far. We plan to pick up roughly 10 more nodes for a total cost of around $6000.
For software, we have been using exclusively free stuff. We run a minimal ROCKS installation (kernel, os, base, ganglia, hpc, webserver). We manually install Java on the head node and all compute nodes; all the other software and scripts are written in house. At the moment we are not using any queueing software: we have a very small number of users (at most 2) and thus find it easier just to be civil at this point.
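For the curious, the manual Java rollout amounts to a one-line loop. This is a sketch, not our actual script: the tarball path /share/apps/jdk.tar.gz is illustrative (any NFS-exported location works), and the compute-0-N names are ROCKS's defaults. The dry-run version below just echoes what it would do; swap the echo for the commented ssh line to run it for real.

```shell
#!/bin/sh
# Sketch of pushing a JDK to every compute node. ROCKS names nodes
# compute-0-0, compute-0-1, ...; we have 12 at the moment.
NODES=$(seq 0 11 | sed 's/^/compute-0-/')

for node in $NODES; do
  # Real version (tarball path is an assumption, not our actual path):
  #   ssh "$node" 'tar xzf /share/apps/jdk.tar.gz -C /usr/local'
  echo "would install JDK on $node"
done
```

Since ROCKS sets up password-less root ssh across the cluster, no extra plumbing is needed; cluster-fork would do the same job with less typing.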
Our extra "scavenged" computers happen to be Celerons, which are 32-bit. So far, we have been too lazy to figure out how to "cross-kickstart" them as ROCKS nodes. The simple solution is to install some version of Linux (we have used Fedora 7) and then:
- create all the directories that ROCKS normally puts on the compute nodes for the main local partition (/state/partition1/),
- set up NFS to share all the goodies that we normally share between the front-end and the compute nodes,
- put in a symbolic link for the .ssh directory so that you can do password-less logins, and
- install the ganglia monitor (stealing the configuration file from a "real" compute node).
It's kinda ugly, but it works just dandy. (Oh, and you have to have "insert-ethers" running on the front-end the first time you boot up the fake nodes so that they will be assigned an IP address and name via DHCP. insert-ethers will complain that it was unable to kickstart the new nodes, but no harm done...) As long as the .ssh directories are linked up, even cluster-fork will still work.
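The fake-node setup above boils down to a handful of commands. This is a sketch under a couple of assumptions: the symlink target /home/install/.ssh and the export names on the front-end are illustrative, and ROOT defaults to a scratch directory so the sketch can be dry-run safely (on a real node you would set ROOT=/ and run as root).

```shell
#!/bin/sh
# Faking a ROCKS compute node on a plain Fedora install.
ROOT=${ROOT:-/tmp/fakenode}   # set ROOT=/ on the real machine

# 1. Recreate the local partition layout ROCKS expects.
mkdir -p "$ROOT/state/partition1"

# 2. Mount the shares the front-end exports (commented out in the
#    dry run; export paths are assumptions):
#      mount frontend:/export/home /home
#      mount frontend:/export/apps /share/apps

# 3. Link up root's .ssh so password-less logins (and cluster-fork)
#    work. The target path here is an assumption.
mkdir -p "$ROOT/root"
rm -f "$ROOT/root/.ssh"
ln -s /home/install/.ssh "$ROOT/root/.ssh"

# 4. Install the ganglia monitor (gmond), copy gmond.conf from a real
#    compute node, then boot with insert-ethers running on the
#    front-end so DHCP hands out a name and address.
```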
As a performance comparison, we made a little benchmark program that runs a few iterations of a neural network solver on a bit of real data. It turns out that the nodes in our mini-cluster run roughly 2.2 times as fast as the nodes on tungsten. Part of the difference is naturally newer hardware. Another possible source of disparity is that the tungsten nodes have 64-bit Xeon processors, but for some reason their operating system won't let you install a 64-bit Java, so we are left running 32-bit Java on a 64-bit OS and CPU. Furthermore, the tungsten nodes are dual-processor, but we get much less than perfect parallelism there: two identical tasks run at only 74.9% of the speed of a single task run alone. In contrast, the nodes in our mini-cluster (dual core on one chip) seem to achieve just about perfect parallelism (sometimes a bit better, sometimes a bit worse: probably just noise in the system).
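The parallelism measurement is simple to reproduce: time one copy of the benchmark alone, then two concurrently, and compare. A rough harness, assuming BENCH is whatever command runs one copy (our real one is a java invocation; the sleep default just makes the sketch runnable as-is):

```shell
#!/bin/sh
# Crude dual-core parallelism check: perfect parallelism means the
# concurrent pair takes about as long as a single run.
BENCH=${BENCH:-"sleep 2"}     # stand-in; real use: BENCH="java Benchmark" (name is hypothetical)

t0=$(date +%s)
$BENCH                        # one task alone
t1=$(date +%s)
$BENCH & $BENCH & wait        # two tasks concurrently
t2=$(date +%s)

single=$((t1 - t0))
double=$((t2 - t1))
echo "single: ${single}s  concurrent pair: ${double}s"
```

On tungsten's dual-processor nodes the pair takes noticeably longer than a single run (the 74.9% figure above); on our dual-core nodes the two times come out nearly equal.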
Overall, we now have a computing capacity that is on the same order as the amount we were ever able to consistently obtain from tungsten. When we add more nodes, we will have more power than we were able to get our hands on. Additionally, we do not have to fight through the queue and control of the cluster is only bounded by our competence (or lack thereof).
Hope you enjoyed the blurb.
Post-Doc, Agricultural and Consumer Economics