Five ways to use AWS for research (starting right now)

If you are a scientific researcher, you are likely more interested in getting your research done than in the computational resources you use to do it. With the rise in remote work, you may also be looking for ways to continue your research remotely. Did you know that the cloud and Amazon Web Services (AWS) can accelerate your research and shorten your time to science? Here are five ways:

1. Get the workstation you need, when you need it

Computational needs and research directions change over time, making it hard to predict what platforms you will need even within a typical grant cycle. For example, GPU acceleration and machine learning can help in many research fields, but many labs do not have access to a GPU.

AWS can launch virtual machines with different underlying hardware and software on demand through Amazon Elastic Compute Cloud (Amazon EC2); these virtual machines are called instances. This means you can use a web interface to create a fully configured machine in a few minutes. You can launch an inexpensive instance suitable for interactive work with your scientific code correctly installed, stop it, and restart it with multiple cores and hundreds of GB of memory to run your application. Or maybe you need a Windows machine to run a critical application that you cannot recompile on Linux. All Amazon EC2 instances include access to NICE DCV, a high-performance remote display protocol that gives you a full remote desktop, so you can use familiar graphical interfaces and visualization tools.
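The stop, resize, and restart cycle described above can also be scripted. The following is a minimal sketch, not an official recipe: it assumes the boto3 library, an EBS-backed instance, and configured AWS credentials. The function name `resize_instance` and the example instance ID and type in the usage note are hypothetical placeholders.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an instance, change its instance type, and start it again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Assumes an EBS-backed instance, so data on its volumes survives the stop.
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Switch to the new hardware configuration (more cores, more memory, ...).
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # Start the same machine on the new hardware and wait until it is running.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return new_type
```

With boto3 installed, a call might look like `resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "r5.4xlarge")`, where both the instance ID and the instance type are illustrative values to replace with your own.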

Learn how to launch your own virtual machine in this Quick Start Guide to launching an Amazon EC2 instance.

2. Share your research workflow with others and replicate/reproduce scientific results

Reproducibility is a critical requirement of research, but it is difficult to achieve when shared computers are used for overlapping analyses. The National Institutes of Health (NIH) requires a plan for ensuring scientific rigor and reproducibility of results. Reviewers may also ask for changes to an analysis done a year ago, by students who have since graduated, on machines that have since been upgraded, using data that has been saved somewhere safe.

In my former lab at the University of Washington, we struggled to keep analytic environments identical while also taking advantage of upgrades, bug fixes, and operating system patches that changed those environments. Amazon EC2 instances (including the entire machine image and project data) can be saved in long-term object storage like Amazon Simple Storage Service (Amazon S3) and then restored when needed. The entire machine can also be shared with another lab that wishes to reproduce your workflow with different data.
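As a rough illustration of saving and sharing a machine, the sketch below creates an Amazon Machine Image (AMI) from an instance and grants another AWS account permission to launch it. It assumes boto3 and an EBS-backed instance with unencrypted volumes; the function name `snapshot_and_share` and all IDs in the usage note are hypothetical.

```python
def snapshot_and_share(ec2, instance_id, name, partner_account_id):
    """Create a machine image from an instance and share it with one account.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # create_image captures the instance's volumes as a reusable image.
    image_id = ec2.create_image(
        InstanceId=instance_id,
        Name=name,
        Description="Frozen analysis environment",
    )["ImageId"]

    # Wait until the image is available before changing its permissions.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Grant launch permission to the collaborating lab's AWS account.
    ec2.modify_image_attribute(
        ImageId=image_id,
        LaunchPermission={"Add": [{"UserId": partner_account_id}]},
    )
    return image_id
```

A call such as `snapshot_and_share(boto3.client("ec2"), "i-0123456789abcdef0", "paper-2020-analysis", "123456789012")` would return the new image ID, which the other lab can then use to launch an identical machine. All four argument values here are placeholders.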

3. Store and share your data

Research data is valuable and increases in volume with every analysis that generates new results. In some research fields, the data used for computation is too large to store locally. Sometimes data is not comprehensively or frequently backed up. It can also be difficult to share data with other researchers without copying it. Amazon S3 is a durable way to store, save, and share research data.
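One possible way to store a file in Amazon S3 and share it without making a copy is a time-limited download link (a presigned URL). This is a minimal sketch assuming boto3 and an existing bucket you can write to; the function name `upload_and_share` and the bucket, key, and file names in the usage note are hypothetical.

```python
def upload_and_share(s3, local_path, bucket, key, expires_seconds=3600):
    """Upload a local file to S3 and return a temporary download URL.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Copy the local file into the bucket under the given object key.
    s3.upload_file(local_path, bucket, key)

    # Create a link that lets a collaborator download this one object
    # until it expires, without needing their own AWS credentials.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_seconds,
    )
```

For example, `upload_and_share(boto3.client("s3"), "results.csv", "my-lab-bucket", "study1/results.csv")` would return a URL valid for one hour. A presigned URL is only one option; bucket policies or cross-account access may suit long-term collaborations better.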

You can also make data available on AWS Data Exchange. Many datasets are already available on the Registry of Open Data on AWS, such as data from the Allen Institute for Brain Science.

Learn how to store and retrieve files in S3 in this Quick Start Guide.

4. Have your own supercomputer for a day

If you submit compute jobs to your institution’s high performance computing (HPC) center through your favorite scheduler (such as Slurm, SGE, or Torque), you may have to wait in the queue even as the deadline for a conference abstract approaches. Using AWS ParallelCluster, you can create your own cluster to run large workloads, like genomics, that consist of largely independent jobs.

The University of Sydney, together with RONIN, used this capability to sequence the genome of the Tasmanian devil and other endangered species, helping conservationists to ensure their survival. The team, led by Dr. Carolyn Hogg, was able to complete 18 months of work in six weeks on AWS, and save more than four weeks of queue wait for national supercomputing resources.

You can also use AWS ParallelCluster for more traditional, tightly coupled HPC applications that use MPI, as the US Naval Research Laboratory did with their atmospheric modeling application. Their performance on AWS scaled similarly to their performance on their Cray supercomputer.

Learn how to start your own supercomputer on AWS.

5. Access the latest tools for analytics, artificial intelligence (AI), and machine learning (ML)

The cloud gives researchers access to the most current compute architectures to keep up with scientific advances, including innovations in analytics, AI, and ML. AWS has tools for querying data with SQL (such as Amazon Athena) and for pooling data from multiple sources into a data lake that can be queried (such as AWS Lake Formation). You can also quickly set up a Research Electronic Data Capture (REDCap) environment for collecting and organizing your research data.
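Amazon Athena runs SQL queries over data stored in Amazon S3. As a rough sketch of what querying looks like from code, the function below submits a query, polls until it finishes, and returns the first page of results. It assumes boto3, an existing Athena database, and an S3 location for query output. The function name `run_query` and the database, table, and bucket names in the usage note are hypothetical.

```python
import time


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a SQL query with Athena and return the first page of rows.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    Each row is returned as a list of strings; for SELECT queries the first
    row holds the column names.
    """
    # Submit the query; Athena writes its results to the given S3 location.
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError("Athena query ended in state " + state)

    # Only the first page of results is read here; larger result sets
    # would need pagination.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

A call might look like `run_query(boto3.client("athena"), "SELECT subject_id, age FROM participants LIMIT 10", "my_study_db", "s3://my-lab-bucket/athena-results/")`, with every name replaced by your own.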

In biomedical research, analytic tools can enable large-scale endeavors to combine genomic data, clinical data, and behavioral data to support research in precision medicine, such as those envisioned by the NIH STRIDES initiative.

Researchers can also take advantage of a broad set of AI and ML services. These range from Amazon Machine Images that are preconfigured with the latest CUDA drivers for GPU acceleration and popular ML frameworks like PyTorch and TensorFlow, to higher level services for text analysis (Amazon Comprehend), time series analysis (Amazon Forecast), computer vision, and recommendations. You can also launch a Jupyter notebook on Amazon SageMaker to access resources for training ML models. The higher level services are particularly useful because they are fully managed: they handle the underlying compute resources for you in a scalable way and are designed to be fault tolerant and highly available.
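To give a sense of how little code a fully managed service can require, here is a minimal sketch that asks Amazon Comprehend for the key phrases in a piece of English text. It assumes boto3 and configured credentials. The function name `key_phrases` and the `min_score` threshold of 0.9 are illustrative choices, not recommended settings.

```python
def key_phrases(comprehend, text, min_score=0.9):
    """Return key phrases that Comprehend detects with high confidence.

    `comprehend` is assumed to be a boto3 Comprehend client,
    e.g. boto3.client("comprehend"). The text is assumed to be English.
    """
    response = comprehend.detect_key_phrases(Text=text, LanguageCode="en")

    # Each detected phrase carries a confidence score between 0 and 1;
    # keep only the phrases at or above the chosen threshold.
    return [
        phrase["Text"]
        for phrase in response["KeyPhrases"]
        if phrase["Score"] >= min_score
    ]
```

For example, `key_phrases(boto3.client("comprehend"), abstract_text)` could be used to pull candidate keywords out of a paper abstract, where `abstract_text` is a string you supply.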

An interdisciplinary team of scientists at Duke University, led by Geraldine Dawson, the director of the Duke Center for Autism and Brain Development, and Guillermo Sapiro, a professor of electrical and computer engineering, used ML and computer vision on AWS to explore the potential of an early application-based screening tool for autism. Children with autism are less attentive to social cues, so researchers used coded facial movements from a child watching videos with different social and non-social stimuli to train an ML model. Their application was almost 90 percent accurate for some behaviors, an improvement from 50 percent accuracy with questionnaires.

Learn how to get started with AI and analytics tools.

Learn more about research and technical computing on AWS, read more stories on research on AWS, and contact us. And register to attend the AWS Public Sector Summit Online on June 30 to learn more about the cloud for research in sessions like “Processing data and enabling research remotely.”