Regeneron Brings Large-Scale Genomics to Drug Discovery Using AWS
In 2020 alone, the 12 largest biopharma companies spent over $96 billion on pharmaceutical product R&D, continuing a steady increase in drug development costs. With a vast majority of experimental medicines failing to make it to market, only a handful of new drugs each year succeed in gaining approval and reaching commercialization. To increase success rates, biopharma organizations are bringing genomics into the pharmaceutical R&D process, leveraging genetic data to better understand drug reactions. While still in the earlier stages of adoption, this practice—known as pharmacogenomics—has doubled the rate of success. Today, only five percent of genes in the human genome are targets for approved drugs, even though far more are implicated in diseases.
Regeneron, an international biotech and pharmaceutical company, is helping the industry shift toward accelerating and improving drug discovery through the integration of genomic insights using Amazon Web Services (AWS). The Regeneron Genetics Center (RGC) is a research initiative focused on sequencing exomes—gene-encoding regions of the human genome—and enabling large-scale analysis of genomic and health data to yield actionable scientific results that can be applied in Regeneron’s own drug development programs and by the broader research community.
Large-Scale Genomic Discoveries on AWS-Powered Platforms
Analyzing thousands or millions of genomes at a time allows researchers to uncover connections between diseases and specific genetic variations that would not be obvious in a small population. The larger and more diverse the dataset, the greater the certainty that scientific findings will apply to a wide range of patients. To build these comprehensive datasets on a global level, Regeneron has worked closely with organizations like the UK Biobank and AWS Partner DNAnexus to get genetic samples and health information from millions of volunteers.
Using Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides secure, resizable compute capacity in the cloud, Regeneron and its collaborators have been able to accelerate exome sequencing and processing of these genetic samples. The resulting petabyte of de-identified health and genomic data is stored securely using Amazon Simple Storage Service (Amazon S3), an object storage service. Storing this information on Amazon S3 offers Regeneron 90 percent cost savings compared to on-premises servers. By 2021, the RGC had sequenced over one million exomes at 10 times the rate that would have been possible with local storage and compute.
Once genomic data is obtained, AWS offers the analytical power Regeneron needs to make scientific discoveries based on the data. In particular, the RGC is a leader in performing a deep analysis process—“all-by-all analyses”—which involves searching massive genomic datasets to identify every association between any phenotype and genotype that exists in a database, to ultimately inform drug discovery and development efforts.
“The association results tables for all-by-all analyses have more than one trillion cells,” says Jeffrey Reid, PhD, chief data officer for the RGC. “We couldn’t perform these insightful large-scale agnostic analyses without the unmatched scalability of AWS cloud infrastructure.”
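To make the idea of an all-by-all analysis concrete, here is a minimal sketch on synthetic data. It is not the RGC's pipeline: the source does not describe the statistical method used, so a plain Pearson correlation stands in for the association test, and every name in it (`pearson`, `all_by_all`, `variant_0` through `variant_4`, `trait_a` through `trait_c`) is hypothetical. It only illustrates the shape of the computation, where every variant is tested against every phenotype.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def all_by_all(genotypes, phenotypes):
    """Return one association score per (variant, trait) pair.

    genotypes:  dict of variant name -> per-sample allele counts (0, 1, or 2)
    phenotypes: dict of trait name   -> per-sample measurements
    """
    return {
        (variant, trait): pearson(calls, values)
        for variant, calls in genotypes.items()
        for trait, values in phenotypes.items()
    }


# Synthetic cohort (assumption: purely illustrative, seeded for repeatability).
rng = random.Random(0)
n_samples = 200

genotypes = {
    f"variant_{i}": [rng.choice([0, 1, 2]) for _ in range(n_samples)]
    for i in range(5)
}

phenotypes = {
    # trait_a is constructed to depend on variant_2, plus a little noise.
    "trait_a": [2 * g + rng.gauss(0, 0.1) for g in genotypes["variant_2"]],
    # trait_b and trait_c are unrelated noise.
    "trait_b": [rng.gauss(0, 1) for _ in range(n_samples)],
    "trait_c": [rng.gauss(0, 1) for _ in range(n_samples)],
}

results = all_by_all(genotypes, phenotypes)

# Rank variants by strength of association with trait_a.
best_for_trait_a = max(genotypes, key=lambda v: abs(results[(v, "trait_a")]))
print(len(results), "cells; strongest variant for trait_a:", best_for_trait_a)
```

The results table has one cell per variant-phenotype pair, so its size grows as the product of the two counts. That multiplication is why tables of this kind can reach the scale Reid describes, and why the work benefits from compute capacity that can be resized on demand.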
The RGC has initiated research collaborations with a wide array of academic and pharmaceutical industry groups, making data available for custom project-based analyses. Researchers can access the data securely on Amazon S3 and manage the files they need for their projects using Amazon Elastic File System (Amazon EFS), a simple, serverless, set-and-forget elastic file system that lets individuals share file data without provisioning or managing storage.
“Using AWS we’re able to deliver the best of both worlds—allowing research to proceed as it would in an academic setting while providing researchers with more control over the infrastructure they use, such as the compute instances they spin up,” says Reid. “To maximize the impact of the data, you have to maximize access to the data. That’s what we’ve done using AWS. There’s no way we could have delivered this scale of data on this timeline to this many partners all around the world without AWS solutions.”
Enabling Global Collaboration to Improve Human Health Using AWS
International industry collaboration is key to accelerating new genomic discoveries. “We needed a way of working across multiple institutions and localities, and this is one reason we focused on cloud computing,” says Reid. “Using AWS helped us provide a secure data science platform on which we can generate and share the data with collaborators across the world, apply large-scale analytics, and then disseminate those results.”
Maximizing the impact of genomic data means maximizing representation in samples and in the different research projects being performed. The RGC has over 100 collaborators across the world working together to gather diverse genomic datasets that will make analyses more powerful and results more broadly applicable.
“We're actively trying to improve the diversity of genetic ancestry in our databases because we know there's a lot of insight that is left undiscovered due to a historic focus on European ancestry,” says Reid. “By building on AWS, we can democratize global access to make sure that in the future, precision medicine and polygenic risk scores are used to really improve care equitably for people of all ancestries.”
About Regeneron
Regeneron is a biotechnology and pharmaceutical company dedicated to accelerating and improving the traditional drug development process.
Benefits of AWS
- Sequenced over 1 million exomes at a 10x accelerated pace
- Stored genomic data on AWS with 90% cost savings over on-premises storage
- Enabled agnostic all-by-all data analyses to uncover genomic insights
- Democratized access to improve diversity in genomic datasets that inform precision medicine
- Discovered novel drugs based on genetic targets
AWS Services Used
Amazon Elastic File System
Amazon Elastic File System is a simple, serverless, set-and-forget, elastic file system that enables you to create and configure shared file systems quickly and simply for AWS compute services.
Amazon Elastic Compute Cloud
Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 475 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.
Amazon Simple Storage Service
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.