The International Centre for Radio Astronomy Research (ICRAR) began in 2009 as a joint venture between Curtin University and The University of Western Australia. Based in Perth, Western Australia, ICRAR’s 110 employees are currently part of an international effort to develop the biggest radio telescope in the world, known as the Square Kilometre Array (SKA). During its 50-plus year lifetime, the SKA will expand our understanding of the Universe.
Once operational, the SKA is expected to gather and process as much data from the sky every day as the world currently produces in a year. That data will be turned into maps of the sky that scientists can use to study the Universe. A single SKA image could be as large as 600 TB, and each sky map will need thousands of images.
“We need to address computing challenges that are immeasurable,” says Kevin Vinsen, Research Associate Professor at ICRAR. “When it’s fully operational in the next decade, depending on the science case, the SKA might collect between 500 TB and 1 PB of imaging data every day. The sheer amount of raw compute power that we need to do that is mind-boggling.”
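To put these daily volumes in perspective, they can be converted into a sustained data rate. This worked arithmetic assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and data arriving evenly over a 24-hour day; the article itself specifies neither.

```latex
\begin{aligned}
\frac{1\ \text{PB/day}}{86\,400\ \text{s/day}} &= \frac{10^{15}\ \text{B}}{86\,400\ \text{s}} \approx 1.16 \times 10^{10}\ \text{B/s} \approx 11.6\ \text{GB/s} \\
\frac{500\ \text{TB/day}}{86\,400\ \text{s/day}} &= \frac{5 \times 10^{14}\ \text{B}}{86\,400\ \text{s}} \approx 5.79 \times 10^{9}\ \text{B/s} \approx 5.8\ \text{GB/s}
\end{aligned}
```

On these assumptions, the SKA's imaging pipeline would need to absorb roughly 6 to 12 GB every second, around the clock.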
To amass compute resources for a series of preliminary experiments, ICRAR formed a community computing initiative called theSkyNet. The initiative lets ICRAR pool spare CPU cycles volunteered by the public to simulate a supercomputer. Vinsen and his colleagues use this pooled compute power to analyze images of galaxies from the Pan-STARRS1 telescope in Hawaii.
Crowd-sourced computing projects often run into problems matching physical server capacity to the load of incoming data. ICRAR needed to run experiments using theSkyNet in a cost-effective and flexible way that would allow Vinsen and his team to obtain results quickly.
The scalable, on-demand nature of Amazon Web Services (AWS) made it a logical choice for the experiments needed to design the SKA, because AWS can provide the resources that ICRAR needs to analyze vast amounts of imaging data. Vinsen won an AWS Grant in Education to start theSkyNet in 2012, and over the past year the project has grown to 40 teraFLOPS. One teraFLOPS is one trillion floating-point operations per second.
“We see cloud-based solutions and supercomputing facilities as complementary and expect that both will play a role in the processing, storage, and dissemination of the enormous volumes of data created by the next generation of observatories,” says Associate Professor Vinsen. “We want to be flexible and we can easily use AWS for our experiments in place of a dedicated supercomputer.”
ICRAR uses Amazon Route 53 to route all external users to theSkyNet websites. The scientists then use one medium Amazon Elastic Compute Cloud (Amazon EC2) instance and on-demand Amazon Machine Images (AMIs) to manage the work performed with theSkyNet’s crowd-sourced CPU cycles, and another small Amazon EC2 instance as a network file server.
To store imaging data, ICRAR mounts two 60 GB Amazon Elastic Block Store (Amazon EBS) volumes and archives the data using Amazon Glacier. The ICRAR team also uses Amazon Simple Storage Service (Amazon S3) as a key-value store to show volunteers the galaxies that their PCs are helping to analyze. Figure 1 illustrates theSkyNet on AWS.
ICRAR set up theSkyNet on AWS in only four days. The team can now quickly expand the cloud infrastructure as members of the public volunteer more CPU cycles to support the initiative.
“The scalability of AWS has been enormously helpful,” says Associate Professor Vinsen. “I can add more capacity as I need it with minimal fuss. Using AWS allows us to process upwards of 150 GB of sky images and store more than 400 GB of imaging data every month.”
By using Amazon S3 as a key-value store, ICRAR can index and manage input from hundreds of thousands of public CPUs around the world. Elastic Load Balancing (ELB) helps ICRAR manage the flow of data to and from theSkyNet community.
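The case study does not say how ICRAR names its Amazon S3 objects, so the following is only a minimal sketch of the key-value idea under a hypothetical key scheme (`galaxies/<galaxy_id>/area-<n>/<kind>`). A plain Python dictionary stands in for the S3 bucket so the example runs without AWS credentials; the comments point to the boto3 S3 client calls that play the same roles. The function `galaxy_key`, the class `InMemoryStore`, and the IDs `G0001`/`G0002` are all illustrative names, not part of theSkyNet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def galaxy_key(galaxy_id: str, area: int, kind: str) -> str:
    """Build a hypothetical key, e.g. 'galaxies/G0001/area-000042/result.json'."""
    # Zero-padding keeps numeric order when keys are listed lexicographically.
    return f"galaxies/{galaxy_id}/area-{area:06d}/{kind}"


@dataclass
class InMemoryStore:
    """Stand-in for an S3 bucket: a flat mapping from string keys to bytes."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        # Plays the role of s3.put_object(Bucket=..., Key=key, Body=body).
        self.objects[key] = body

    def get(self, key: str) -> bytes:
        # Plays the role of s3.get_object(Bucket=..., Key=key)["Body"].read().
        return self.objects[key]

    def list_prefix(self, prefix: str) -> List[str]:
        # Plays the role of s3.list_objects_v2(Bucket=..., Prefix=prefix),
        # which returns keys in ascending lexicographic order.
        return sorted(k for k in self.objects if k.startswith(prefix))


if __name__ == "__main__":
    store = InMemoryStore()
    # Hypothetical results uploaded by volunteers, keyed by galaxy and image area.
    store.put(galaxy_key("G0001", 10, "result.json"), b"result for area 10")
    store.put(galaxy_key("G0001", 2, "result.json"), b"result for area 2")
    store.put(galaxy_key("G0002", 1, "result.json"), b"result for area 1")
    # One prefix query gathers every stored result for a single galaxy.
    for key in store.list_prefix("galaxies/G0001/"):
        print(key, store.get(key))
```

Putting the galaxy ID near the front of the key means that a single prefix listing retrieves everything for one galaxy, which is the kind of lookup needed to show a volunteer the galaxy their PC is working on.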
ICRAR uses Amazon EBS to store more than 400 GB of imaging data each month as the community processes it. Amazon EC2 provides the compute capacity for ICRAR to analyze data from between 400 and 500 galaxies simultaneously.
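The storage figures in this case study can be cross-checked with a little arithmetic. The Python sketch below assumes decimal gigabytes, a 30-day month, and data arriving at a steady rate, none of which the text states. Under those assumptions, two 60 GB EBS volumes (120 GB) would hold about nine days of a 400 GB monthly intake. That suggests imaging data does not stay on EBS for long, which fits with the text's mention of archiving to Amazon Glacier; the article does not say how often data is moved.

```python
# Hypothetical back-of-envelope check, not ICRAR's actual capacity planning.
# Figures from the text: more than 400 GB stored per month, two 60 GB EBS volumes.
# Assumptions (not from the text): decimal gigabytes, 30-day month, uniform intake.

MONTHLY_INTAKE_GB = 400
EBS_VOLUME_COUNT = 2
EBS_VOLUME_SIZE_GB = 60
DAYS_PER_MONTH = 30


def days_until_full(monthly_gb: float, capacity_gb: float,
                    days_per_month: int = DAYS_PER_MONTH) -> float:
    """Number of days of uniform intake that fit into capacity_gb."""
    # capacity / (monthly / days), rearranged to avoid an inexact intermediate.
    return capacity_gb * days_per_month / monthly_gb


if __name__ == "__main__":
    capacity_gb = EBS_VOLUME_COUNT * EBS_VOLUME_SIZE_GB
    print(f"EBS capacity: {capacity_gb} GB")
    print(f"Daily intake: {MONTHLY_INTAKE_GB / DAYS_PER_MONTH:.1f} GB/day")
    print(f"Days until full: {days_until_full(MONTHLY_INTAKE_GB, capacity_gb):.1f}")
```

Because the text gives the monthly intake as "more than" 400 GB, roughly nine days is an upper bound on how long the two volumes could absorb new data without offloading.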
The project has proved enormously popular, and shortly after the migration to AWS, online communities in Russia, America, and Australia overloaded ICRAR’s theSkyNet server. However, it took Associate Professor Vinsen only two hours to add capacity. “Other community computing projects have taken days to recover from overloads because they have to find more infrastructure resources to bring up new servers,” he says. “With AWS, I can just provision a bigger instance.”
ICRAR plans to use AWS to meet the ongoing computing requirements of future experiments within theSkyNet.
To learn more about how AWS can help with your data needs, visit our Big Data details page: http://aws.amazon.com/big-data/.