AstraZeneca’s Genomics Data Processing Solution Runs 51 Billion Tests in 1 Day on AWS

2021

Around 20 years after the publication of the first human genome, genomics is transitioning from a research-heavy practice to a driver for personalized medicine. Engaged in this transition, global biopharmaceutical company AstraZeneca is accelerating the use of genomics in precision medicine and furthering the translation of genomics to transform drug discovery.

AstraZeneca uses petabytes of genomic sequencing data to inform drug research and development. To rapidly process data at scale, AstraZeneca used Amazon Web Services (AWS) to build a fast, efficient solution for extracting impactful genomics insights.

“We’ve provided genetics input into more than 40 of AstraZeneca’s drug discovery projects in 2020 using these capabilities.”

Slavé Petrovski
VP, Head of Genome Analytics and Informatics, Centre for Genomics Research, R&D, AstraZeneca

Building a Solution That Frees Scientists to Innovate

In addition to quickly gleaning insights from genomic data, AstraZeneca wanted to reallocate resources to scientific exploration and avoid spending its bioinformaticians’ time on relatively low-value data management activities. Because the company collects petabytes of data from multiple sources in large bursts, it needed powerful, scalable compute capacity.

Having built on AWS before, AstraZeneca decided to expand its use of AWS tools to develop a cloud-based bioinformatics solution for rapid genomic processing and analytics.

Automating on AWS to Produce Fast Insights

AstraZeneca’s high-throughput solution performs many steps of genomic data processing and analysis. One of them, genomic secondary analysis, takes raw sequencing reads, reconstructs a genome, and identifies genomic variants that can be analyzed further in later stages. To automate the data orchestration of those steps, the architecture uses AWS Lambda, a serverless compute service that enables users to run code without provisioning or managing servers. To build out a task execution layer, the architecture uses AWS Batch, which dynamically provisions the optimal quantity and type of compute resources (such as CPU or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. Along the way, the architecture stores data in buckets using Amazon Simple Storage Service (Amazon S3), an object storage service.
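The case study does not publish AstraZeneca’s code, so the following is only a minimal sketch of how the three services described above could fit together: an AWS Lambda function reacts to a new object in Amazon S3 and submits an AWS Batch job to process it. The queue name, job definition name, `INPUT_URI` variable, and the `build_job_request` helper are hypothetical placeholders, not details from AstraZeneca’s solution. The `submit_job` call and the S3 event layout follow the public boto3 and Amazon S3 event notification formats.

```python
import os
import re
from urllib.parse import unquote_plus

# Hypothetical resource names, used for illustration only.
JOB_QUEUE = os.environ.get("JOB_QUEUE", "genomics-analysis-queue")
JOB_DEFINITION = os.environ.get("JOB_DEFINITION", "secondary-analysis-job")


def build_job_request(record, job_queue=JOB_QUEUE, job_definition=JOB_DEFINITION):
    """Turn one S3 event record into keyword arguments for Batch submit_job."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    # Batch job names accept letters, digits, hyphens, and underscores
    # (up to 128 characters), so other characters are replaced.
    safe_key = re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]
    return {
        "jobName": f"analysis-{safe_key}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Tell the (assumed) container where to find its input.
            "environment": [
                {"name": "INPUT_URI", "value": f"s3://{bucket}/{key}"},
            ]
        },
    }


def handler(event, context, batch_client=None):
    """Lambda entry point: submit one Batch job per newly uploaded object."""
    if batch_client is None:
        import boto3  # available by default in the Lambda Python runtime
        batch_client = boto3.client("batch")
    job_ids = []
    for record in event.get("Records", []):
        response = batch_client.submit_job(**build_job_request(record))
        job_ids.append(response["jobId"])
    return {"submitted": job_ids}
```

In this arrangement the Lambda function only routes work. AWS Batch chooses and provisions the instances that run each job, which is the division of labor the architecture description implies.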

On AWS, AstraZeneca accelerated its work and improved productivity by drawing on scale, compute power, and access to rich technology services. Because the company can run processing at scale when needed, data is available for analysis sooner. “We can now run over 51 billion statistical tests in under 24 hours, studying the effects of individual mutations or individual genes, each with a broad range of phenotypes,” says Slavé Petrovski, VP, head of Genome Analytics and Informatics at AstraZeneca’s Centre for Genomics Research. The company’s efforts are paying off. “We’ve provided genetics input into more than 40 of AstraZeneca’s drug discovery projects in 2020 using these capabilities,” says Petrovski.
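To put the quoted throughput in perspective, the short calculation below converts 51 billion tests in 24 hours into a sustained per-second rate. It uses only the two figures from the quote. Because the quote says “over” 51 billion and “under” 24 hours, the result is a lower bound, and the function name is purely illustrative.

```python
TOTAL_TESTS = 51_000_000_000      # "over 51 billion statistical tests"
WINDOW_SECONDS = 24 * 60 * 60     # "in under 24 hours" = at most 86,400 s


def min_sustained_rate(total_tests, window_seconds):
    """Average tests per second needed to finish within the window."""
    return total_tests / window_seconds


rate = min_sustained_rate(TOTAL_TESTS, WINDOW_SECONDS)
print(f"{rate:,.0f} tests per second")  # prints "590,278 tests per second"
```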

Boosting Scientific Innovation

AstraZeneca produced a rapid, efficient genomics bioinformatics pipeline that gives its scientists the time and resources to pursue innovation. As a result, the company’s Centre for Genomics Research is progressing toward its goal of analyzing two million genomes by 2026.


About AstraZeneca

AstraZeneca is a global biopharmaceutical company developing innovative medicines in a range of therapy areas. The Centre for Genomics Research is an AstraZeneca initiative seeking to analyze two million genomes by 2026.

Benefits of AWS

  • Supports running over 51 billion statistical tests in less than 24 hours
  • Facilitated delivery of genomic insights to more than 40 drug discovery projects in 2020
  • Scales up or down to accommodate the erratic demands for genomic sequencing
  • Frees up resources for scientific exploration

AWS Services Used

AWS Lambda

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.

AWS Batch

AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

