Scaling Science: 1 Million Compute Hours in 1 week
For many scientists, the computer has become as important as the test tube, the centrifuge, or the grad student in delivering groundbreaking research. Whether screening for active cancer treatments or colliding atoms, the availability of compute cycles can significantly affect the time it takes for scientists to crunch their numbers. Indeed, compute resources are often so constrained that researchers have to scale back the scope of their work to fit the capacity available.
Not so with Amazon EC2, where the general purpose, utility computing model is a perfect fit for scientific workloads of any scale. Researchers (and their grad students) can access the computational resources they need to deliver on their scientific vision, while staying focused on their analysis and results.
Scaling up at the Morgridge Institute
Victor Ruotti faced this exact problem. His team at the Morgridge Institute at the University of Wisconsin-Madison is studying the genes expressed as stem cells, the body's template cells, begin to take on the specialized functions our tissues need, such as absorbing nutrients or conducting nervous impulses. This is impressive and important work with large computational requirements: millions of RNA sequence reads and a data footprint of 78 TB.
Victor’s research was selected as the winner of Cycle Computing’s inaugural Big Science Challenge, and using Cycle’s software his team ran through 15,376 alignment runs on Amazon EC2, clocking up over a million compute hours in a week at an effective cost of just $116 per hour of wall-clock time.
A Century of Compute
Over 1,000,000 compute hours, roughly 115 years of work for a single processor, were used to build the genetic map the team needed to quickly identify which regions of the genome are important for establishing cell types with clinical importance. The entire analysis started running on Spot instances, using high memory instance types (the M2 class), in just 20 minutes. Running on Spot meant the team could use Cycle Server to stretch their budget further and build an extremely high resolution genetic map: the Spot price was typically 12 times lower than the equivalent On-Demand price. Their cluster ran across an average of 5,000 instances (8,000 at peak), for a total cost of $19,555. That’s less than the price of 20 lab pipettes.
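The headline figures are easy to sanity-check with a little back-of-the-envelope arithmetic. This sketch uses the numbers quoted above and approximates the one-week wall-clock time as 168 hours; the exact run time was not reported, so the per-hour figures are illustrative:

```python
# Quick sanity check of the numbers in this post. Figures are from the
# post itself; the 168-hour wall-clock time is an assumption (one week).

compute_hours = 1_000_000          # total compute hours consumed
hours_per_year = 24 * 365          # one processor running non-stop
processor_years = compute_hours / hours_per_year

wall_clock_hours = 7 * 24          # assumed: the run took about a week
total_cost = 19_555                # USD, as reported
cost_per_wall_clock_hour = total_cost / wall_clock_hours
cost_per_compute_hour = total_cost / compute_hours

print(f"{processor_years:.0f} processor-years of work")
print(f"${cost_per_wall_clock_hour:.0f} per wall-clock hour")
print(f"${cost_per_compute_hour:.3f} per compute hour")
```

The numbers line up: a million compute hours is about 114 processor-years (the post rounds up to 115), and $19,555 spread over a week works out to roughly $116 per hour, or about two cents per compute hour.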
Cycle Computing on the AWS Report
Our very own Jeff Barr was lucky enough to spend a few minutes chatting with Cycle Computing CEO Jason Stowe for the AWS Report. Here is the episode they recorded:
Cycle also has a blog post with more information on this run and on the 2012 Big Science Challenge.
We’re very happy to see the utility computing platform of AWS used for such groundbreaking work. If you’re working with data and would like to discuss how to get up and running at this, or any other, scale, I do hope you’ll get in touch.
If you would like to know more I’ll be hosting a webinar on big data and HPC on the 16th of October. We’ll discuss some customer success stories and common best practices for using tools such as Elastic MapReduce, DynamoDB and the broad range of services on the AWS Marketplace to accelerate your own applications and analytics.
Registration is free. See you there.