Quantitative Biology Center Drives New Genomics Research Faster Using AWS


Analyzing Gene Expression Differences

Every day, researchers at the Quantitative Biology Center (QBiC) use high-performance computing (HPC) platforms to analyze genomics data and determine, for example, gene expression differences between diseased and normal tissue. QBiC is located at the University of Tübingen in Germany, and it supports genomics research within the university and at other research organizations across the globe.

QBiC’s HPC workloads are mostly hosted in an on-premises data center. However, as the volume of research data continues to grow rapidly, QBiC foresees difficulties in scaling quickly and cost-effectively. “As our data volume grew larger, we realized we needed much more computational capacity than our on-premises infrastructure could provide,” says Alex Peltzer, senior bioinformatics research scientist at QBiC. “The researchers using our platform also needed better performance, so they could analyze more data and complete their research faster.” QBiC’s highest priority is processing data according to the FAIR data principles: findable, accessible, interoperable, and reusable. “Meeting the FAIR processing requirements involves the need to scale efficiently, which we couldn’t do easily,” Peltzer says.

“The evaluated setup can potentially cut our genomics research time by 50 percent because of the automation and orchestration we get with AWS Batch.”

Alex Peltzer, Senior Bioinformatics Research Scientist, Quantitative Biology Center, University of Tübingen

  • About the Quantitative Biology Center
    • The Quantitative Biology Center (QBiC) is a research unit of the University of Tübingen in Germany. QBiC hosts an HPC research platform that internal and external researchers use to analyze and process genomics data.

  • Benefits
    • Can process up to 100,000 genetic samples in a single research project
    • Can potentially reduce genomics research time by 50%
    • Speeds research of gene expression differences
    • Drives down the cost of analysis

Leveraging an HPC Research Platform and AWS-Based Cloud Computing

QBiC’s need for scalability and performance led it to the Amazon Web Services (AWS) Cloud. “We knew the cloud would meet our needs, and AWS offers more advanced technology than the other providers we looked at,” says Peltzer. AWS also integrates with the Nextflow and nf-core frameworks, which support scalable scientific workflows using software containers. “AWS works very well with Nextflow, and no other cloud provider could do that,” says Peltzer. “Without that integration, we would have had to spend a lot of time and money rewriting the scheduling capabilities ourselves.”

QBiC chose Amazon Elastic Compute Cloud (Amazon EC2) instances, powered by Intel Xeon Scalable processors, to supplement its existing on-premises infrastructure. QBiC now runs Nextflow on AWS for workflow management and uses AWS Batch to automate and orchestrate Nextflow batch jobs.

The organization is also using Amazon EC2 Spot Instances to drive down the costs of analysis. EC2 Spot Instances are spare compute capacity on AWS available at discounts of up to 90 percent compared to the price for On-Demand Instances. “We are driving down the cost of analysis by using Amazon EC2 Spot Instances,” says Peltzer. “That represents cost savings that we can put into the research.”
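As a rough illustration of the arithmetic behind that discount, the sketch below compares the compute cost of a workload at On-Demand and Spot prices. The hourly rate, the instance-hours, and the function names are hypothetical placeholders, not QBiC figures or published AWS prices. Only the "up to 90 percent" ceiling comes from the text above, and real Spot discounts vary by instance type, Region, and time.

```python
def on_demand_cost(hourly_rate: float, instance_hours: float) -> float:
    """Cost of a workload billed at the On-Demand hourly rate."""
    if hourly_rate < 0 or instance_hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate * instance_hours


def spot_cost(hourly_rate: float, instance_hours: float, discount: float) -> float:
    """Cost of the same workload at a given Spot discount.

    `discount` is a fraction of the On-Demand price. It is capped at 0.9
    here because the text cites discounts of "up to 90 percent".
    """
    if not 0.0 <= discount <= 0.9:
        raise ValueError("discount must be between 0.0 and 0.9")
    return on_demand_cost(hourly_rate, instance_hours) * (1.0 - discount)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to make the arithmetic easy to follow.
    rate = 1.00      # assumed On-Demand price per instance-hour
    hours = 1_000    # assumed instance-hours for one analysis run
    for discount in (0.5, 0.7, 0.9):
        saved = on_demand_cost(rate, hours) - spot_cost(rate, hours, discount)
        print(f"discount {discount:.0%}: estimated savings {saved:.2f}")
```

Because Spot capacity can be reclaimed, an estimate like this is a best case for jobs that tolerate interruption and retry.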

Processing 100,000 Genetic Samples

Running its analysis workloads in Nextflow on AWS, QBiC can take advantage of parallel processing and can scale on demand. “Using AWS, we can scale our HPC platform up or down quickly, whether processing 30 genetic samples or 100,000 samples in one research project,” says Peltzer. In addition, QBiC and its research customers are experiencing increased reliability for genomics sequencing jobs. “We no longer have to worry about system outages and slowed performance because too many people are queuing for processing jobs,” says Peltzer. Although the setup currently runs only in technical research projects, the results so far point toward potential production use.
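The idea of scaling from 30 samples to 100,000 can be pictured as splitting the sample list into independent chunks that run side by side. The following is a minimal local sketch of that pattern using Python's standard library. It is not how Nextflow or AWS Batch work internally, and the names `chunk_samples`, `process_chunk`, and `process_all` are hypothetical. In the setup described above, Nextflow and AWS Batch would handle the scheduling that the thread pool stands in for here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunk_samples(sample_ids: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `chunk_size` sample IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(sample_ids), chunk_size):
        yield list(sample_ids[start:start + chunk_size])


def process_chunk(chunk: List[str]) -> int:
    """Placeholder for a real analysis step; it only counts the samples."""
    return len(chunk)


def process_all(sample_ids: Sequence[str], chunk_size: int, max_workers: int = 4) -> int:
    """Process every chunk concurrently and return the number of samples handled."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(process_chunk, chunk_samples(sample_ids, chunk_size)))


if __name__ == "__main__":
    # The same code path serves a small project and a large one.
    small = [f"sample_{i:06d}" for i in range(30)]
    large = [f"sample_{i:06d}" for i in range(100_000)]
    print(process_all(small, chunk_size=10))
    print(process_all(large, chunk_size=1_000))
```

Each chunk is independent of the others, which is what lets a scheduler add or remove workers as the sample count changes.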

Reducing Genomics Research Time by 50%

In its benchmarking projects, QBiC has reduced its research and processing time for all jobs by using Amazon EC2 and AWS Batch, for both the university and private research organizations that share resources with the center. “The evaluated setup can potentially cut our genomics research time by 50 percent because of the automation and orchestration we get with AWS Batch,” says Peltzer. “We can do things much faster compared to our on-premises environment.”

As a result, QBiC and other research institutions across Germany see huge potential in using AWS Cloud applications. The distributed infrastructure can help QBiC more quickly complete research that analyzes gene expression to find mutations that may be involved in cancer. “Performing genomics sequencing on AWS, we are looking at plant and animal data to see how experimental treatments change how certain genes are expressed,” Peltzer says.

QBiC will continue to evaluate the use of AWS services as its research requirements grow. “We expect to be part of one of the largest public genomic sequencing hubs in Germany in the next few years,” Peltzer says. “AWS will help us to make that possible.”

Learn More

To learn more, visit aws.amazon.com/hpc.