Genomics labs can unlock new frontiers by using AWS to access powerful computing tools without the need for costly, space-consuming infrastructure. You can access the resources necessary to run large-scale genomics pipelines, store petabytes of data, and share your results with collaborators around the world with no more hardware than a standard off-the-shelf computer.
AWS provides all of the basic building blocks necessary to create your own cloud supercomputer. You can create an account and launch your first machine in minutes, without the need for specialized hardware or extensive facilities to host servers, and at the scale you need to complete your project.
Share your data with your collaborators whether they are down the hall or on the other side of the globe. AWS can provide a central, shared workspace where you and your colleagues can store and analyze the data with the analysis tools of your choice.
AWS has the most expansive ecosystem of tooling available for you to build and run your genomics pipeline. Whether you need free public data sets like The Cancer Genome Atlas (TCGA) or containers to use in your analysis, you can rely on AWS as your one-stop shop.
“For us to maintain a real-world data platform would be prohibitively expensive. [AWS allows us] to scale up our experiments and try out our new software on realistic configurations of hundreds or even thousands of computers.”
Michael Franklin, Professor, Computer Science and Director, AMP Lab, UC Berkeley
“The AWS Cloud enables swift collaboration even with hundreds of terabytes of data.”
Dr. Narayanan Veeraraghavan, Lead Programmer Scientist, Genome Sequencing Center, Baylor College of Medicine
“The whole ecosystem of the tools that are developed around AWS APIs, like the cookbooks that we use to launch infrastructure....helped us a great deal.”
Ravi Madduri, Research Fellow and Project Manager, University of Chicago
You can use AWS to access controlled repositories such as the NIH Database of Genotypes and Phenotypes (dbGaP). AWS has the tools you need to address the security and compliance requirements for working with these sensitive data sets, and you can analyze them with tools like GATK and Galaxy.
AWS has published a dbGaP whitepaper that describes how to work with controlled data sets using AWS.
You can quickly and securely build genomics pipelines by taking advantage of AWS Cloud-based platform offerings from popular vendors. Working with these platforms makes it easy to use built-in standardized tooling or create your own custom tooling to analyze both public data sets (like TCGA or ICGC) and your own data. If you are planning to analyze a large population, you can use the platform’s tools or transfer the variant call files, along with phenotypic data, into your own custom-built population analysis infrastructure running on AWS, built with Amazon Redshift or Apache Spark.
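As a rough illustration of the variant call file transfer described above, the sketch below flattens VCF text into one row per variant and sample, then writes those rows as CSV, a delimited format that tools such as Amazon Redshift and Apache Spark can typically load. The function names (`vcf_to_rows`, `rows_to_csv`), the column layout, and the sample data are hypothetical and chosen only for this example. It assumes simple, uncompressed VCF records that carry a `GT` field, and it is not a complete VCF parser.

```python
import csv
import io

# Hypothetical column layout for the flattened table (an assumption of this sketch).
COLUMNS = ["chrom", "pos", "ref", "alt", "sample", "genotype"]


def vcf_to_rows(vcf_text):
    """Yield one flat dict per (variant, sample) genotype call in VCF text."""
    samples = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _variant_id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")  # locate GT within FORMAT
        for sample, call in zip(samples, fields[9:]):
            yield {
                "chrom": chrom,
                "pos": int(pos),
                "ref": ref,
                "alt": alt,
                "sample": sample,
                "genotype": call.split(":")[gt_index],
            }


def rows_to_csv(rows):
    """Serialize flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example data: two variants called in two samples.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "sampleA", "sampleB"]),
    "\t".join(["chr1", "12345", ".", "A", "G", "50", "PASS", ".",
               "GT:DP", "0/1:30", "1/1:25"]),
    "\t".join(["chr2", "67890", ".", "C", "T", "99", "PASS", ".",
               "GT", "0/0", "0/1"]),
])

if __name__ == "__main__":
    print(rows_to_csv(vcf_to_rows(EXAMPLE_VCF)))
```

Keeping one row per variant and sample makes it straightforward to join genotype calls against a separate phenotype table keyed on the sample name. A production pipeline would more likely use a dedicated VCF library and a columnar file format.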
Some suggestions for other areas to explore to learn about genomics in the AWS Cloud: