Genomics in the Cloud
Simplify and securely scale your genomic analyses with AWS.
AWS provides inherent scalability and an ecosystem of partners and tools prepared to handle sensitive data and workloads. AWS customers can accelerate their genomics insights and build a bridge from their existing on-premises infrastructure to the cloud.
With AWS, you can efficiently and dynamically store and compute your data, collaborate with peers, incorporate analytics and machine learning, and integrate your findings into clinical practice.
Benefits
Increase the pace of scientific discovery
AWS enables genomics customers to derive actionable insights from large, complex data. You don’t need to make large upfront investments of time and money to build and maintain infrastructure. AWS gives you flexible, cost-effective access to as many resources as you need, almost instantly, and you pay only for what you use.
Ensure scalability & provide dynamic resourcing
Running genomic analysis pipelines on AWS means you can efficiently scale up to meet demand, then scale back down when the demand is gone. AWS also offers alternative pricing and compute options, such as Amazon EC2 Spot Instances, that can help lower the cost of genomic analysis.
Customize & optimize your workflows
From building your genomics pipeline to integrating genomic findings into diagnostic and treatment practice, AWS has a broad ecosystem of partners that you can work with to optimize and customize your workflows. This ecosystem gives you a variety of flexible options so you can choose the tools and approaches that best fit your solutions.
Cromwell on AWS
Cromwell, a workflow management system from the Broad Institute, is now supported in the AWS Cloud. With Cromwell on AWS, researchers and scientists have even more flexibility to scale genomics experiments using compute capacity in the cloud instead of contending for limited on-premises resources.
Genomics use cases
- Containers
- Workflow Management
- Big Data Analytics
- Data Sets
- Collaboration
Containers for genomics pipeline
To make your genomics pipeline easier to distribute and execute, you can encapsulate your processes in containers and run them in the AWS Cloud. Configure your own plug-and-play workflow architecture and build an environment specific to your workflow and research needs. Using Amazon Elastic Container Service (Amazon ECS) or running Docker on AWS, you can break a larger genomics problem into smaller parts, making the output reproducible and the data easier to share.
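As a minimal sketch of this pattern (not an official AWS example), the following Python snippet uses boto3 to register an Amazon ECS task definition for one containerized alignment step and launch it on an existing cluster. The container image, cluster name, and command are hypothetical placeholders.

```python
# Sketch: run one containerized genomics pipeline step on Amazon ECS with boto3.
# The image, cluster name, and command are hypothetical placeholders.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Register a task definition that wraps a genomics tool packaged as a Docker image.
task_def = ecs.register_task_definition(
    family="genomics-align-step",
    containerDefinitions=[
        {
            "name": "aligner",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/bwa:latest",
            "cpu": 2048,
            "memory": 8192,
            "command": ["bwa", "mem", "ref.fa", "sample_R1.fq", "sample_R2.fq"],
        }
    ],
)

# Launch the task on an existing ECS cluster backed by EC2 instances.
ecs.run_task(
    cluster="genomics-cluster",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    launchType="EC2",
    count=1,
)
```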
Related resources
Architecture video
Learn about powerful and reusable application pipelines built on AWS by Human Longevity, Inc. in this architecture video. See how they process up to 12 TB of raw data per day in Amazon S3 with custom analytics tools running in Docker containers.
Architecture diagram
Explore the detailed architecture diagram from Human Longevity, Inc. for utilizing Docker on AWS for genomic analysis.
Customer case study
In this case study, learn how Benchling cut their search times by 90% and scaled to hundreds of genomes using AWS Lambda.
Workflow management
To help make your genomics pipeline more efficient to manage, you can design workflow management rules in the AWS Cloud. Compose and execute a series of computational or data manipulation steps, and optimize the parallelization of jobs to accelerate your time to results.
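As an illustrative sketch of chaining steps (assuming an AWS Batch job queue and job definitions already exist; the names here are hypothetical), the following Python snippet uses boto3 so that a variant-calling job starts only after the alignment job succeeds:

```python
# Sketch: chain genomics pipeline steps with AWS Batch job dependencies.
# The job queue and job definition names are hypothetical placeholders.
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Step 1: submit the alignment job.
align = batch.submit_job(
    jobName="align-sample-001",
    jobQueue="genomics-queue",
    jobDefinition="bwa-mem-align",
    containerOverrides={
        "command": ["bwa", "mem", "ref.fa", "sample_R1.fq", "sample_R2.fq"]
    },
)

# Step 2: submit the variant-calling job, which waits for the alignment job to succeed.
batch.submit_job(
    jobName="call-variants-sample-001",
    jobQueue="genomics-queue",
    jobDefinition="haplotype-caller",
    dependsOn=[{"jobId": align["jobId"]}],
)
```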
Related resources
Customer case study
In this case study, learn how Baylor College of Medicine worked with DNAnexus to move their genome analysis pipelines to the AWS Cloud. See how they powered large clinical genomic studies that required a secure and compliant environment.
Customer case study
In this case study, learn how DNAnexus built a platform for genomic analysis on AWS. See how the combination of AWS infrastructure and DNAnexus platform controls and certified compliance allowed them to meet the demanding requirements of HIPAA, CAP/CLIA, GxP, and other privacy laws and regulations.
Big data analytics for genomics
Genomics organizations face a tsunami of data generated by their sequencing pipelines. To make this data more actionable, you can deploy AWS components to support your entire analytical pipeline, from data ingestion and analysis through visualization, storage, warehousing, and archiving. Gain the flexibility to choose from a broad set of database services to fit your needs and get the best results.
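As an illustrative sketch of interactive analysis over data in Amazon S3 (the database, table, result location, and genomic region are hypothetical assumptions), the following Python snippet uses boto3 to run an Amazon Athena query against a table of variants derived from VCF files:

```python
# Sketch: query variant data stored in Amazon S3 with Amazon Athena.
# The database, table, S3 output location, and example region are hypothetical placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT chrom, pos, ref, alt, COUNT(*) AS sample_count
        FROM variants
        WHERE chrom = 'chr17' AND pos BETWEEN 43000000 AND 43200000  -- example region
        GROUP BY chrom, pos, ref, alt
        ORDER BY sample_count DESC
        LIMIT 20
    """,
    QueryExecutionContext={"Database": "genomics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-genomics-results/athena/"},
)

print("Query started:", response["QueryExecutionId"])
```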
Related resources
Architecture video
In this architecture video, see how UC Santa Cruz analyzes petabytes of genomics data using a low-cost solution with Docker containers and Amazon EC2 Spot Instances.
Customer case study
In this case study, learn how the Guttman lab at the California Institute of Technology utilizes High Performance Computing (HPC) clusters on AWS to reduce its genomics computing time from weeks to days.
Customer case study
In this case study, learn how GENALICE was able to process the complete genomes of 800 patients in 60 minutes using AWS high performance compute services.
Working with public & private data sets
With AWS, you can access your own private data sets or controlled repositories, such as the NIH Database of Genotypes and Phenotypes (dbGaP) or The Cancer Genome Atlas (TCGA), among others. Use the toolset of your choice (like GATK or Galaxy) to analyze your data. AWS has all the tools you need to address the security and compliance requirements for working with these sensitive datasets, including built-in features to encrypt your data at rest or in transit.
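As a minimal sketch of encrypting data at rest (the bucket name, file path, and KMS key alias are hypothetical assumptions), the following Python snippet uses boto3 to upload a file to Amazon S3 with server-side encryption using an AWS KMS key; boto3 uses HTTPS endpoints by default, so the transfer is also encrypted in transit:

```python
# Sketch: upload genomic data to Amazon S3 with server-side encryption (SSE-KMS).
# The bucket, file path, and KMS key alias are hypothetical placeholders.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

s3.upload_file(
    Filename="sample-001.bam",
    Bucket="my-controlled-access-genomics-bucket",
    Key="project-x/sample-001.bam",
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": "alias/genomics-data-key",
    },
)
```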
Related resources
Customer case study
In this case study, learn how the Icahn School of Medicine at Mount Sinai analyzes and shares huge genomic data sets with external collaborators while maintaining stringent privacy and security controls on AWS.
Whitepaper
Architecting for Genomic Data Security and Compliance in AWS
Learn how to work with controlled-access data sets from dbGaP, GWAS, and other individual-level genomic research repositories.
Customer case study
In this case study, learn how Illumina massively scales its DNA sequencing technologies using AWS. See how they support the Illumina BaseSpace Sequence Hub and store 10 PB of genomics data using products like Amazon Redshift.
AWS public datasets program
The AWS public datasets program covers the cost of storage for publicly available, high-value, cloud-optimized datasets. Search for AWS-hosted public genomic datasets using the genomic registry search tool. AWS public genomic datasets include the following (a short access sketch follows the list):
The Cancer Genome Atlas (TCGA) Open Data Set »
International Cancer Genome Consortium (ICGC) Open Data Set »
1000 Genomes Project Open Data Set »
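As a minimal sketch, assuming anonymous access to the public 1000 Genomes bucket on Amazon S3 (the bucket name and prefix shown here are assumptions; check the dataset's registry page for the authoritative location), the following Python snippet uses boto3 with unsigned requests to list a few objects:

```python
# Sketch: list objects in a public genomics dataset bucket on Amazon S3
# without AWS credentials, using unsigned (anonymous) requests.
# The bucket name and prefix are assumptions; verify them in the dataset registry.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

response = s3.list_objects_v2(Bucket="1000genomes", Prefix="phase3/", MaxKeys=10)

for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```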
Fostering collaboration
Share your data with your collaborators whether they are down the hall or on the other side of the globe. AWS can provide a central, shared workspace where you and your colleagues can create datasets, write algorithms, or create tools, all without having to physically move the data back and forth or worry about intellectual property infringement.
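As an illustrative sketch of one way to share data with an external collaborator (the bucket name and collaborator account ID are hypothetical), the following Python snippet uses boto3 to attach a bucket policy that grants another AWS account read-only access to a shared workspace bucket:

```python
# Sketch: grant a collaborator's AWS account read-only access to a shared S3 bucket.
# The bucket name and account ID are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")

bucket = "shared-genomics-workspace"
collaborator_account = "111122223333"  # hypothetical collaborator account ID

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CollaboratorReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{collaborator_account}:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```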
Related resources
Customer case study
In this case study, learn how ThermoFisher provides its customers with a scalable and secure platform on which to conduct research, collaborate, and improve medical treatments for patients by using the AWS Cloud globally.
Customer case study
In this case study, learn how Celgene enables secure collaboration between its own researchers and academic research labs using AWS, providing isolated access for researchers from different organizations.
Customer case study
In this case study, learn how UC Santa Cruz Genomics Institute was able to process samples faster and securely get results to collaborators using AWS.
Learn how to run genomics workflows on AWS
Access our online guide to learn how to use AWS services, such as Amazon S3, AWS Step Functions, and AWS Batch, as well as popular open-source workflow orchestrators like Cromwell and Nextflow, to run large-scale genomics workflows on AWS. Source code, documentation, and relevant scripts are available on GitHub.
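As a minimal sketch of orchestrating batch steps with AWS Step Functions (the role ARN, job queue, and job definition names are hypothetical; production pipelines would typically use Cromwell, Nextflow, or the published AWS genomics workflow guidance rather than a hand-written definition), the following Python snippet creates and starts a two-step state machine that submits AWS Batch jobs:

```python
# Sketch: a two-step genomics workflow as an AWS Step Functions state machine
# that submits AWS Batch jobs and waits for each to finish (.sync integration).
# The role ARN, job queue, and job definition names are hypothetical placeholders.
import json
import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

definition = {
    "StartAt": "Align",
    "States": {
        "Align": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "align",
                "JobQueue": "genomics-queue",
                "JobDefinition": "bwa-mem-align",
            },
            "Next": "CallVariants",
        },
        "CallVariants": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "call-variants",
                "JobQueue": "genomics-queue",
                "JobDefinition": "haplotype-caller",
            },
            "End": True,
        },
    },
}

state_machine = sfn.create_state_machine(
    name="genomics-secondary-analysis",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsBatchRole",
)

sfn.start_execution(stateMachineArn=state_machine["stateMachineArn"])
```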
Select case studies and resources
View all Genomics customer case studies and related resources
Illumina case study
In this case study, learn how Illumina massively scales its DNA sequencing technologies using AWS. See how they support the Illumina BaseSpace Sequence Hub and store 10 PB of genomics data using products like Amazon Redshift.
Sequence Bio case study
In this case study, learn how Sequence Bio quickly built a safe and secure platform for data-driven drug discovery on AWS.
Smithsonian case study
In this case study, learn how the Smithsonian Institution data science team scales AWS compute instances up and down as needed, allowing the team to annotate genomes in parallel while also managing costs.
UC Santa Cruz case study
In this case study, learn how the UC Santa Cruz Genomics Institute was able to reduce their genomic computational time from three months down to four days using AWS high-performance compute services, and reduce overall costs.
Blog: Driving momentum in genomics research
AWS Education Blog: AWS collaborates with the Broad Institute
Learn how Cromwell, an execution engine that simplifies the orchestration of computing tasks needed for genomic analysis, is now enabled on the AWS Cloud.
Blog: Deploy Illumina DRAGEN with new quick start
AWS What's New: New Quick Start to deploy Illumina DRAGEN on the AWS Cloud.
Learn how this Quick Start deploys Dynamic Read Analysis for GENomics Complete Suite (DRAGEN CS), a data analysis platform by Illumina, on the AWS Cloud in about 15 minutes.
Blog: Precision medicine at scale
AWS Compute Blog: Accelerating Precision Medicine at Scale
Learn how Edico Genome developed a novel solution to accelerate genomics analysis using FPGA-enabled applications on AWS.
Blog: Human Longevity, Inc.
AWS News Blog: Changing medicine through genomics research.
Learn how Human Longevity, Inc. is using AWS to store the massive amount of data being generated as part of its effort to revolutionize medicine.