Baylor College of Medicine’s HGSC Analyzes Genomics Data Faster Using Illumina DRAGEN on AWS


The Human Genome Sequencing Center (HGSC) at the Baylor College of Medicine (Baylor) is one of the few genomics sequencing labs that receives funding from the National Institutes of Health’s All of Us Research Program in the United States. In 2019, the HGSC began exploring solutions to provide a highly scalable, secure analysis of its large genomics datasets. The HGSC engaged AWS Advanced Partner Illumina to thoroughly assess the Illumina DRAGEN (Dynamic Read Analysis for GENomics) Bio-IT Platform, a bioinformatics solution that provides highly accurate, comprehensive, and efficient secondary genomic analysis of sequencing data and uses field programmable gate array (FPGA) technology for acceleration. 

Using DRAGEN alongside AWS services, the HGSC can analyze hundreds of genomic samples a day. This solution has helped Baylor increase its participation in research initiatives such as the National Institutes of Health’s All of Us Research Program, a nationwide effort to accelerate progress toward precision medicine by collecting and analyzing the health and genetic data of one million volunteers.


“When large amounts of data come off the sequencing instruments, we use FPGAs on AWS to process that data quickly.”

Eric Venner
Associate Professor and Head of the Clinical Informatics Group, Human Genome Sequencing Center, Baylor College of Medicine

Searching for Solutions to Accelerate Research

Baylor is a health sciences university in Houston, Texas. It ranks twentieth in the United States in National Institutes of Health funding and ranks first in genetics. In 2016, Baylor launched the HGSC Clinical Laboratory to support large-scale sequencing efforts that prepare genomics data for clinical use. The HGSC played a crucial role in the Human Genome Project and the All of Us Research Program, for which Baylor, Johns Hopkins University, and the University of Texas Health Science Center in Houston work as genome centers alongside other groups in the United States. 

For Baylor to be able to participate in the program, the HGSC needed to scale to meet large sequencing workloads and simplify compute and storage management. It also had to meet strict compliance standards, including ISO/IEC 27001, which comprises over 100 security requirements and federal regulations that control data accessibility and classified information. Genetic data is considered Controlled Unclassified Information, which is subject to additional safeguarding controls. “We had to meet standards that are a couple of notches higher than what we’ve had to do for HIPAA,” says Richard Gibbs, director of the HGSC. 

Baylor had previously worked with Illumina, which provides sequencing and software technology, and had used AWS for genomics computing since 2012. All research centers participating in the All of Us Research Program have standardized on DRAGEN for secondary analysis. The HGSC also realized that migrating from on premises to DRAGEN on the cloud held the most immediate and long-term potential, including for attracting and retaining team members. “Working on the cloud is an advantage because people want to learn technologies that will be popular for the next several decades,” says Eric Venner, associate professor and head of the clinical informatics group at the HGSC. “Now we can attract some very talented junior engineers.”

Using DRAGEN on AWS to Find Accuracy, Scalability, and Security

In spring 2019, the Baylor team began collaborating with Illumina on variant calling technology, which identifies variants in sequencing data. In the fall of 2019, Noora Siddiqui, the team’s engineer, began building the production pipeline using Illumina DRAGEN on AWS, an AWS Quick Start that sets up a configurable AWS environment for DRAGEN. The first scale test was performed after 3 weeks, and the pipeline was up and running in just over 3 months. “We completed the pipeline, using AWS technical support,” says Venner. “By using Illumina DRAGEN on AWS, our engineer was able to build the solution into a new production system.”

Using Illumina’s DRAGEN on AWS, Baylor is able to scale on demand and process data significantly faster than before. The HGSC processes about 5,000 genomes each month. “Everyone’s got a sequencer and a plan, but to crank out 5,000 genomes a month with reliability is enormously difficult,” Gibbs says. “That’s what we’re doing here. It takes a lot of engineering skill and support, but it serves the industry well.”

Baylor’s DRAGEN solution accelerates analysis of genomic data using Amazon Elastic Compute Cloud (Amazon EC2) F1 Instances; DRAGEN uses FPGAs to deliver custom hardware acceleration. “Our workloads transmit in short, sudden bursts,” says Venner. “When large amounts of data come off the sequencing instruments, we use FPGAs on AWS to process the data quickly.” The center uses Amazon EC2 F1 Instances alongside Amazon EC2 Spot Instances, which let users run fault-tolerant workloads at up to a 90 percent discount compared to Amazon EC2 On-Demand Instances. “We can save on computing costs using Spot Instances,” says Venner.
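
To make the Spot pricing claim concrete, the following is a minimal sketch of the arithmetic only. The function name, the hourly rate, and the discount value are hypothetical placeholders, not figures from Baylor or AWS; actual Spot prices vary over time and by instance type, and the only grounded number is the ceiling of 90 percent.

```python
def spot_cost(on_demand_hourly: float, hours: float, discount: float) -> float:
    """Estimate the cost of a burst of work on Spot capacity.

    `discount` is the fraction saved relative to the On-Demand rate.
    It is capped at 0.9 here because the discount is described as
    "up to" 90 percent; the real value is not fixed.
    """
    if not 0.0 <= discount <= 0.9:
        raise ValueError("discount must be between 0.0 and 0.9")
    if on_demand_hourly < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return on_demand_hourly * hours * (1.0 - discount)


# Hypothetical inputs: a 10.00/hour On-Demand rate, 100 instance-hours,
# and a 50 percent Spot discount.
estimate = spot_cost(on_demand_hourly=10.0, hours=100.0, discount=0.5)
print(f"Estimated Spot cost: {estimate:.2f}")
```

Because Spot capacity can be reclaimed, an estimate like this applies only to jobs that can tolerate interruption and be retried.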

The HGSC stores its live data using Amazon Simple Storage Service (Amazon S3), an object storage service built to retrieve any amount of data from anywhere. Data that doesn’t need to be used right away is automatically passed to Amazon S3 Glacier, which offers secure, durable, and extremely low-cost cloud storage classes for data archiving and long-term backup. “Storage management and the automated data life cycle on AWS are very important,” says Venner. The HGSC’s information technology team uses AWS to help meet security and compliance standards. “It’s simpler to perform security audits in a new cloud solution than it is in a legacy environment with a lot of built-in baggage,” says Venner.
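
An automated data life cycle of this kind is typically expressed as an Amazon S3 lifecycle rule. The sketch below is an illustration under stated assumptions, not the HGSC’s actual configuration: the helper name, the rule ID, the `archive/` prefix, the 30-day threshold, and the bucket name are all hypothetical. It only builds the rule document; the commented call shows where it would be passed to the AWS SDK for Python (boto3).

```python
def glacier_lifecycle(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that moves objects under
    `prefix` to the GLACIER storage class once they are `days` old."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",      # hypothetical rule name
                "Filter": {"Prefix": prefix},    # which objects the rule covers
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Hypothetical values: objects under "archive/" transition after 30 days.
config = glacier_lifecycle(prefix="archive/", days=30)
print(config)

# Applying it would require AWS credentials and an existing bucket:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",                 # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```

Keeping the rule as plain data makes it easy to review during an audit before it is applied to a bucket.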

The HGSC can quickly build computing environments because its DRAGEN solution uses Amazon Elastic Container Service (Amazon ECS), a fully managed container orchestration service for deploying, managing, and scaling containerized applications. “It’s valuable to capture the environment that a job is running in,” says Venner. “Traditionally, people would create and manage complex environments to run different types of jobs in the same location. Now, we just create an environment that is specific to the job that’s running, which makes debugging simpler.” 

Applying Illumina and AWS Solutions to Healthcare

Using Illumina’s DRAGEN on AWS, the HGSC hopes to fully integrate its technology into medical practices. HGSC’s analysis of human genomes can predict an individual’s health risks, explain underlying conditions, and alter clinical management, facilitating more comprehensive care for patients. For instance, the HGSC recently collaborated with cardiovascular clinics in the Texas Medical Center to perform the HeartCare study, which focuses on identifying genes related to cardiovascular disease. “We’ve been looking at high-penetrance cardiovascular alleles in key genes and how individuals might benefit from that information under the clinical care model,” says Venner.

In the future, Baylor will continue to explore AWS services and Illumina solutions to further improve the security and speed of its data processing. “We hope to scale accessibility to genetic data, particularly for those underserved in the medical care system or who experience many gaps in care,” says Gibbs. “They will be at the forefront. We want to help them access genetic information that will be useful in their health profiles.”

About Baylor College of Medicine

Baylor College of Medicine in Houston, Texas, is home to the Human Genome Sequencing Center, one of the few genome sequencing centers in the United States that receives funding from the National Institutes of Health.

Benefits of AWS

  • Built its data pipeline in just over 3 months
  • Processes data faster compared to its previous pipeline
  • Processes about 5,000 genomes per month
  • Recruits top engineering talent
  • Simplifies security and compliance
  • Automates storage management and data life cycle processes
  • Scales automatically to meet bursts of volume

AWS Services Used

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.


Amazon EC2 F1 Instances

Amazon EC2 F1 instances use FPGAs to enable delivery of custom hardware accelerations. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an FPGA Developer AMI and supporting hardware level development on the cloud.


Amazon EC2 Spot Instances

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.


Illumina DRAGEN on AWS

The DRAGEN Bio-IT Platform enables ultra-rapid analysis of next-generation sequencing (NGS) data, significantly reduces the time required to analyze genomic data, and improves accuracy.

