Papadopoulos, Philip M.
Extending clusters to Amazon EC2 using the Rocks toolkit
INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 25:317-327, AUG 2011

In 2006, Amazon introduced their elastic computing cloud (EC2) where customers could rent, by the hour, Xen-based virtual machines hosted in Amazon's data center. In this so-called infrastructure as a service (IAAS) cloud, users have full root-level access to virtual machines so that they can fully customize and optionally publish machine images. The generally accepted approach to provisioning within Amazon is to first start with a base image already residing within the cloud. Then, the user customizes this base configuration to match their specific requirements. While this works well for very simple, standalone software configurations, this approach has users starting with a black box of software (the base image) and then adding/modifying this system using techniques that range from very rigorous (excellent system configuration techniques) to completely ad hoc methods. A quick survey of public machine images available in Amazon's cloud shows a growth of 28% from September 2010 (~5500 images) to December 2010 (~7050 images). The sheer number of public images makes the selection of the base configuration all the more daunting for the non-expert system administrator. In 2004, we introduced Rocks cluster toolkit rolls as pluggable, programmatic components to extend the definition of a Beowulf-style cluster. In contrast to the black box characteristics of most cloud images, we describe how the Rocks EC2 roll automatically handles the specific administrative changes needed to make any Rocks-defined computing appliance bootable within the EC2 infrastructure. When coupled with the Condor roll, it becomes straightforward to build an extended cluster as a single Condor pool with local job submission. This extended pool consists of both the user's local cluster infrastructure (head node and local worker nodes) and EC2 nodes.
Because the EC2 nodes are configured identically to their local counterparts, users need not modify their job submission scripts, executable paths or other parameters of their jobs simply to take advantage of cloud resources. These extensions are included in the EC2 and Condor rolls and enable the systematic reuse of software that has already been developed for clusters. These systems can function in the cloud as either standalone entities or as integrated components of a user's existing cluster. With this work we demonstrate that cloud computing does not require entirely new approaches to systems definition and use.

DOI:10.1177/1094342011414747
