AWS Government, Education, & Nonprofits Blog

The Italian National Institute of Astrophysics Explores the Universe with the Cloud

The National Institute for Astrophysics (Istituto Nazionale di Astrofisica or INAF) is an Italian institution that conducts scientific research in astronomy and astrophysics. INAF research ranges from the study of the planets and minor bodies of the solar system to the large-scale structure of the Universe.

Recently, INAF has been involved in two large projects for which they turned to the Amazon Web Services (AWS) Cloud: the ESO Extremely Large Telescope (E-ELT) and the Cherenkov Telescope Array (CTA).

Is there complex life outside of Earth?

The first project is the design of the ultra-high resolution spectrograph HIRES for the ESO Extremely Large Telescope (E-ELT). Thanks to the unprecedented quality of the data and the instrument's structural stability, researchers will be able to detect biosignatures in the atmospheres of planets outside our solar system for the first time. The aperture of the European Extremely Large Telescope will give them the ability to detect the presence of complex life outside of Earth and to complete a census of the composition of Earth-like planets that orbit their host stars at distances that allow them to sustain life. The system is complex, and the simulations required to assess its potential each produce terabytes of data.

The second project involves scientific simulations of the Cherenkov Telescope Array (CTA), a large facility that will observe galactic and extragalactic sources that emit photons in the gamma-ray band, enabling the study of ultra-high-energy physics. As in the previous scenario, each CTA simulation run collects terabytes of data.

Both projects require a large amount of computational power to handle terabytes of data. Each HIRES simulation requires a million GPU hours and produces more than 5 TB of raw data, while each CTA simulation requires more than 300,000 CPU hours to produce events and process more than 60 TB of data in the cloud each time.

INAF evaluated the possibility of procuring the necessary hardware to perform these computing tasks, but the Total Cost of Ownership (TCO), coupled with the on-demand nature of this research, led them to the cloud.

AWS Cloud for on-demand computing

For both E-ELT and CTA, the team used Amazon Elastic Compute Cloud (Amazon EC2) to perform the large-scale calculations shown in Figures 1 and 2. For both projects, INAF used Amazon Simple Storage Service (Amazon S3) to store the processed data, and AWS Lambda and Amazon Simple Queue Service (Amazon SQS) to manage the flow of tasks between EC2 instances. The availability of long-term storage with Amazon Glacier allowed the team to store data cost-effectively.
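As a rough illustration of how the long-term storage piece might be wired up, the sketch below (using boto3) adds an S3 lifecycle rule that transitions processed outputs to Glacier after 30 days. The bucket name, prefix, and 30-day threshold are hypothetical; the article does not describe INAF's exact configuration.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; the article does not name the real ones.
s3.put_bucket_lifecycle_configuration(
    Bucket="inaf-simulation-results",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-processed-data",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                # Move processed outputs to Glacier after 30 days for
                # cost-effective long-term storage.
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```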

Figure 1 – AWS architecture for the ESO HIRES simulation. Inputs from the spectrograph design are uploaded to Amazon S3. AWS Lambda then launches EC2 g2x.large instances to perform a CUDA simulation, and the results are stored back in S3.
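A minimal sketch of the trigger described in Figure 1 might look like the Lambda handler below, assuming a pre-built AMI that contains the CUDA simulation code. The AMI ID, instance type, and on-instance script path are placeholders, not details from the article.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical AMI pre-loaded with the CUDA simulation software.
SIMULATION_AMI = "ami-0123456789abcdef0"

def handler(event, context):
    """Triggered by an S3 upload of a spectrograph design file.

    Launches a GPU instance that fetches the input, runs the CUDA
    simulation, and writes the results back to S3.
    """
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Pass the input location to the instance via user data so the
        # simulation script knows what to fetch.
        user_data = (
            "#!/bin/bash\n"
            f"/opt/hires/run_simulation.sh s3://{bucket}/{key}\n"
        )

        ec2.run_instances(
            ImageId=SIMULATION_AMI,
            InstanceType="g3.4xlarge",  # placeholder GPU instance type
            MinCount=1,
            MaxCount=1,
            UserData=user_data,
            InstanceInitiatedShutdownBehavior="terminate",
        )
```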

Figure 2 – AWS architecture for CTA simulations. As with HIRES, uploads of the simulation inputs to S3 trigger the workflow. An Amazon SQS FIFO queue dispatches simulations between EC2 instances, and the processed data is sent back to S3. The team uses Docker to containerize the software and Amazon Glacier for long-term storage.
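The dispatch pattern in Figure 2 could be sketched along these lines: a producer pushes each simulation job onto the FIFO queue, and each EC2 worker polls the queue and runs the containerized simulation. The queue name, message fields, and the run_containerised_simulation helper are hypothetical illustrations, not INAF's actual code.

```python
import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical FIFO queue; the real queue is not named in the article.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/cta-simulations.fifo"

def enqueue_simulation(bucket, key, run_id):
    """Called (e.g. from the S3-triggered Lambda) to queue one CTA run."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": bucket, "key": key, "run_id": run_id}),
        MessageGroupId="cta-simulations",   # FIFO queues require a group ID
        MessageDeduplicationId=run_id,      # avoid dispatching the same run twice
    )

def run_containerised_simulation(job):
    # Placeholder: in practice this would `docker run` the containerized
    # simulation against the input object and upload results back to S3.
    print(f"Running simulation {job['run_id']} on s3://{job['bucket']}/{job['key']}")

def worker_loop():
    """Runs on each EC2 instance: poll the queue and process jobs one by one."""
    while True:
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for message in response.get("Messages", []):
            job = json.loads(message["Body"])
            run_containerised_simulation(job)
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]
            )
```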

“Thanks to AWS, we were able to concentrate on science and simulations. We were able to scale as soon as the project required us to do so. It was critical to obtain the required power quickly,” said Marco Landoni, Research Fellow, INAF. “AWS services like SQS and Lambda allowed us to deliver the architecture in the fastest way possible, producing hundreds of TB of data and consuming millions of CPU or GPU hours with almost no impact on the allocated budget for each project.”

Learn more about the AWS Region in Italy that will open in early 2020.