AWS Official Blog

Scaling to the Stars

by Jeff Barr | on | in Amazon EC2 | | Comments

Recently I blogged about The Server Labs, a consultancy that specializes in high-performance computing including on Amazon Web Services.

Heres another story that I found fascinating: nominally it is about how The Server Labs uses Amazon Web Services as a scale-out solution that also implements Oracle databases; however its really about space exploration (or should I say nebula computing). It began with an email asking whether there would be a problem running up to 1,000 Amazon EC2 High-CPU Extra-Large instances.

The Server Labs is a software development/consulting group based in Spain and the UK that works closely with the European Space Agency, and they needed to prove the scalability of an application that they helped build for ESA’s Gaia project. In addition to the instances, they also requested 2 large and 3 X-Large instances to host Oracle databases that coordinate the work being performed by the high-CPU instances.

Gaias goal is to make the largest, most precise three-dimensional map of our Galaxy by surveying an unprecedented number of stars – more than one billion. This, by the way, is less than 1% of all stars! The plan is to launch a mission in 2011, collect data until 2017; and then publish a completed catalog no later than 2019.

I had the opportunity to see a PowerPoint deck created and presented by The Server Labs founder, Paul Parsons, and their software architect, Alfonso Olias, who is currently assigned to this project.

The deck explained that the expected number of samples in Gaia is 1 billion stars x 80 observations x 10 readouts, which is approximately equal to 1 x 1012 samplesor as much as 42 GB per day transferred back to Earth. Theres a slide in the deck that says Put another way, if it took 1 millisecond to process one image, the processing time for just one pass through the data on a single processor) would take 30 years.

As the spacecraft travels, it will continuously scan the sky in 0.7 degree arcs, sending the data back to Earth. Some involved algorithms will come into play in order to process the data; and the result is a fairly complex computing architecture that is linked to an Oracle database. Scheduling the cluster of computational servers is not quite so complicated, and is based on a scheduler that is focused on keeping each machine as busy as possible.

However the amount of data to process is not steadyit will increase over time. Which means that infrastructure needs will also vary over time. And of course idle computing capacity is deadly to a budget.

The opportunity to solve large computational problems usually turns to grid computing. No difference this time either except that as mentioned above, the required size of the grid is not constant. Because Amazon Web Services is on-demand, its possible to apply just enough computational resources to the problem at any given time.

In their test, The Server Labs set up an Oracle database using an AWS Large Instance running a pre-defined public AMI. Then they mounted 5 EBS volumes of 100 GB each, and mounted them to the instance.

Then they created Amazon Machine Images (AMIs) to run the actual analysis software. These images were based on large instances and included Java, Tomcat, the AGIS software and an rc.local script to self-configure an instance when its launched.

The requirements break down as follows:

To process 5 years of data for 2 million stars, they will need to run 24 iterations of 100 minutes each, which works out to 40 hours running a grid of 20 Amazon EC2 instances. A secondary update has to be run once and requires 30 minutes per run, or 5 hours running a grid of 20 EC2 instances.

For the full 1 billion star project numbers extrapolate out more or less as follows: They calculated that they will analyze 100 million primary stars, plus 6 years of data, which will require a total of 16,200 hours of a 20-node EC2 cluster. Thats an estimated total computing cost of 344,000 Euros. By comparison, an in-house solution would cost roughly 720,000 EUR (at todays prices) which doesnt include electricity or storage or sys-admin costs. (Storage alone would be an additional 100,000 EUR.)

Its really exciting to see the Cloud used in this manner; especially when you realize that an entire set of problem solutions that were beyond economic possibility before the Cloud became a reality.