Linux PC Clusters
The TCB Group has used computing clusters and various other computing hardware since 1993 to perform its molecular dynamics analysis. We have been using Linux PC clusters since 1998. The following describes our existing computing environment.
The Ariel cluster, consisting of three 24-node dual-processor Xeon systems, went into production in June 2004. The machines are housed in Sun Fire 900 racks; only the head nodes are UPS-backed. The cooling is provided by a pair of 5-ton Liebert air conditioners.
The TCB Group also leverages the power of Graphics Processing Units (GPUs), found in the video cards common to every desktop computer. These GPUs are often many times faster than CPUs when running GPU-optimized code, such as VMD and NAMD. The GPU cluster consists of 8 Sun Ultra 24 workstations, each with 4 CPU cores, 8 gigabytes of memory, and an nVidia C1060 graphics accelerator. These workstations are connected via an InfiniBand network for internal data and communications traffic.
An additional GPU cluster consisting of Supermicro workstations each with 12 CPU cores, 72 gigabytes of memory, and two nVidia C2050 graphics accelerators went into production in January 2011. This cluster also utilizes InfiniBand networking.
In addition to computing clusters, the TCB Group uses large standalone servers, each with many CPU cores and a large amount of memory, for large-memory simulations. The full list is available here.
All facilities (power, cooling, space, and building support) are generously provided by the Beckman Institute.
- December 1993: Group's first cluster, using HP 735 workstations and an ATM interconnect, is installed.
- July 1998: First eight nodes, switch, etc. installed for Oberon.
- August 1998: Additional eight nodes of Oberon installed.
- August 2000: Oberon retirement begins, as its nodes are moved to desktops.
- January 2001: First eighteen nodes, switch, etc. installed for Titania.
- February 2001: Additional fourteen nodes installed for Titania, bringing total up to 32. puck is installed with 4 nodes. Old Oberon cluster retirement completed.
- April 2001: First sixteen nodes, switch, etc. installed for the second part of the Titania cluster, known as oberon. An additional sixteen nodes have been ordered and will arrive within a few weeks.
- May 2001: a third cluster of 32 nodes, known as umbriel, is ordered for the Titania cluster.
- June 2001: Titania cluster completed with the installation of the second sixteen nodes of oberon and all of umbriel.
- August 2001: Half of puck is retired as desktops; the resulting space is used for portia.
- May 2002: titania is split into two clusters of 16, the second of which is named ariel.
- April 2003: the first 24 nodes of the new Umbriel cluster, made up of dual-processor Athlon systems, are ordered.
- May 2003: the first 24 nodes of the new Umbriel cluster arrive and are installed as umbriel. Two more rounds of 24 nodes are ordered, to be named miranda and caliban. One of the Titania clusters is de-racked for use as desktops.
- June 2003: miranda and caliban arrive and are installed, completing Umbriel. The remaining Titania cluster nodes are retired, with the intention of moving some systems to the front machine room when power becomes available.
- October 2003: 16 nodes of Titania are installed in 3137 as darwin.
- March-April 2004: additional machine room space is offered by Beckman, and three rounds of 24 dual-processor Xeon systems are ordered to fill it. These will make up the Ariel cluster.
- June 2004: Ariel cluster configuration is complete.
- September 2005: darwin, made of the remaining nodes of the titania cluster, is retired; its nodes are used to run a series of cluster building workshops.
- June 2008: Cancun, a SunFire X4600 M2 (16 CPU cores and 128 gigabytes of memory) is put into production.
- June 2008: TCB's first GPU cluster (eight Sun Ultra 24s with nVidia GTX 9700s and InfiniBand) is built.
- June 2009: Cancun is upgraded to 256 gigabytes of memory.
- June 2009: The Sun Ultra 24 GPU cluster is upgraded with nVidia C1060 GPUs.
- June 2009: Cardiff, a SunFire X4600 M2 (32 CPU cores and 256 gigabytes of memory) is put into production.
- October 2010: Canberra, a large memory compute server (48 CPU cores and 256 gigabytes of memory) is put into production.
- December 2010: Harare, a large memory compute server (48 CPU cores and 256 gigabytes of memory) with an nVidia S2050 (four GPUs) is put into production.
- January 2011: A GPU cluster with 8 Supermicro workstations (each with 12 CPU cores, 72 gigabytes of memory, and two nVidia C2050 graphics cards) with InfiniBand networking is completed.
- February 2011: Abijan, Asmara, Philo, Paria, and Rockford, all large memory compute servers (each with 48 CPU cores and 256 gigabytes of memory) are put into production.
- March 2012: Aegir, Fenrir, Narvi, Surtur, Thrymr, Hyperion, Lagos, Multan, Riyadh, Tarvos, and Vladivostok, all compute servers (each with 64 CPU cores and 64 gigabytes of memory) are put into production.
The TCB Group has given several tutorials on cluster building:
- TCB Cluster Building Workshops
- NAMD Linux Cluster Tutorial 2003
  - Part of the Summer School on Theoretical and Computational Biophysics.
  - Lecture: 11 Jun 2003, 3:45-4:30pm
  - Hands-On Sessions
    - Tutorial Handouts
    - Session One: 11 Jun 2003, 4:30-6:30pm
    - Session Two: 12 Jun 2003, 3:30-5:30pm
    - Session Three: 12 Jun 2003, 5:30-7:30pm
- NAMD Linux Cluster Tutorial 2001
  - Part of NCSA's Linux Revolution Conference.
  - Lecture/Session One: 19 Jun 2001
  - Lecture/Session Two: 25 Jun 2001