2023

BioNTech Accelerates Data Processing for Proteomics Workflows by 500x Using AWS
Learn how BioNTech used parallelized workflows on AWS to process mass spectrometry data up to 500 times faster.
50%–75% reduction in file search times
Significantly reduced the cost of compute instances
Ran hundreds of data searches simultaneously
Improved scientists’ productivity while maintaining strong data security
Increased data accessibility and reusability in the organization
Overview
Headquartered in Germany, BioNTech is a global company that specializes in developing immunotherapies and vaccines for cancer and infectious diseases, such as the Pfizer-BioNTech COVID-19 vaccine. Mass spectrometry (MS) is a powerful technology for direct identification of peptides bound to human leukocyte antigen (HLA) molecules from patient-derived tumor tissue or cell lines. These HLA immunopeptidomes can be interrogated as a source for antigen discovery for cell-based therapies and used to train machine learning models to inform vaccine development.
BioNTech aimed to make its workflows for storing, organizing, and processing terabytes of MS data more efficient and scalable. It decided to migrate its on-premises MS software and data storage to Amazon Web Services (AWS) for scalable, secure data handling. Now, BioNTech has accelerated its time to insights and made it simpler for researchers to share and collaborate on MS data using AWS Storage Gateway, a service that provides on-premises applications with access to virtually unlimited cloud storage.

Opportunity | Using AWS Storage Gateway to Further Streamline and Accelerate the Processing of BioNTech’s Mass Spectrometry Data
Mass spectrometry is a powerful methodology for immunopeptidomics because it can detect and identify thousands of unique HLA-bound peptides in a single analysis of clinically relevant tissues and cell lines. The raw data set produced in a single acquisition is a large collection of spectra that can be searched against a reference proteome database to yield peptide and protein identifications. In proteomics and immunopeptidomics workflows, software packages such as Spectrum Mill MS Proteomics Software are vital components in processing and analyzing the large volumes of MS data that are routinely collected.
Until 2022, the company ran this software on local servers. Scientists had to move data manually from instrument computers to local workstations running Spectrum Mill, and these devices would fill up quickly, requiring additional steps to archive the data. “Our total data was easily 10–15 terabytes, and moving it to the on-premises device was time consuming and challenging,” says Akhil Chaudhary, data engineer at BioNTech. “As our research activities were growing, our MS data collection was also significantly increasing,” says Michael McCarthy, solutions architect at BioNTech. “The local hardware could no longer support our scale.”
To accelerate data processing and access to the interpreted results, BioNTech’s computational biology team needed a way to process hundreds of requests simultaneously with different search parameters and protein sequence databases as part of their effort to maximize the peptide and protein information for novel discoveries. The department approached the BioNData team—a central data and analytics group within the company—to build tools to scale the data processing capabilities horizontally. The team chose AWS to build a hybrid lab data model and create horizontally scaling APIs. “In the US, we have a long history of using AWS successfully in products,” says McCarthy. “It was the natural choice.”

“On AWS, our scientists are generating and sharing exponentially more data with the aim of finding effective, targeted, and personalized therapies for patients. It’s really the imagination that limits you, and I haven’t yet found something that I couldn’t build in AWS.”
Michael McCarthy
Solutions Architect, BioNTech
Solution | Massively Accelerating Data Processing Using Parallelized Workflows
In the first phase, BioNTech focused on moving data seamlessly from the MS instrument computers to the cloud and hosting Spectrum Mill on AWS. The second phase involved building a system for running the search requests simultaneously.
To move the MS raw data to the cloud, BioNTech installed the AWS Storage Gateway agent on every instrument computer. Following acquisition, MS raw data is quickly and automatically moved to Amazon Simple Storage Service (Amazon S3), an object storage service built to retrieve any amount of data from anywhere. “The speed is extremely fast. A file of 5 GB takes only 5–10 seconds to appear on Amazon S3,” says Chaudhary. With multiple instruments generating large data sets, this MS data pipeline moves the data more efficiently to a centralized location, where it is easy to access for processing and archiving.
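As a rough sanity check on the quoted transfer figure, the short sketch below converts "5 GB in 5–10 seconds" into an average transfer rate. It assumes decimal gigabytes and treats the quoted time as the full transfer time for one file; the function name is illustrative and not part of any AWS tool.

```python
def throughput_gb_per_s(file_size_gb: float, seconds: float) -> float:
    """Average transfer rate implied by moving file_size_gb in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return file_size_gb / seconds


# Figures quoted in the story: a 5 GB file appears on Amazon S3 in 5-10 seconds.
fast = throughput_gb_per_s(5, 5)    # best case: 1.0 GB/s
slow = throughput_gb_per_s(5, 10)   # worst case: 0.5 GB/s

# Convert gigabytes per second to gigabits per second (1 byte = 8 bits).
print(f"{slow * 8:.0f}-{fast * 8:.0f} Gbit/s")  # prints "4-8 Gbit/s"
```

Sustained rates across many instruments may differ from this single-file figure.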
BioNTech’s computational biology team quickly adopted the new workflow. “Everyone’s using the cloud-based system, and the researchers find it much simpler,” says McCarthy. “We automate data management in AWS, letting scientists focus on the science.”
Next, the team installed Spectrum Mill on Amazon Elastic Compute Cloud (Amazon EC2), which provides secure and resizable compute capacity for virtually any workload. “By running Spectrum Mill on the cloud, we cut individual search times by 50–75 percent,” says Chaudhary. In addition, BioNTech runs Amazon EC2 Spot Instances, which can run fault-tolerant workloads for up to 90 percent off compared to On-Demand prices. Because the company only pays for the time it’s using the instances, it has reduced compute costs significantly.
To scale the number of workflows it can run at a time, the team uses Amazon Machine Images, which provide the information required to launch an instance, and Amazon EC2 Auto Scaling, which can add or remove compute capacity to meet changing demand. “Now, we run our searches 50–75 percent faster, and with Amazon EC2 Auto Scaling, we can run hundreds of instances in parallel, massively accelerating data processing up to 500 times,” says McCarthy.
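One way to read the 500-times figure is as the product of two effects: each search finishes faster, and many searches run at once. The sketch below works through that arithmetic. It assumes the searches are independent and scale perfectly across instances, ignoring queueing and instance start-up time. The instance counts of 125 and 250 are illustrative values chosen to show how "hundreds of instances" could reach 500 times; they are not figures from BioNTech.

```python
def per_search_speedup(time_reduction: float) -> float:
    """Speedup factor for one search, given the fractional cut in its run time."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - time_reduction)


def combined_speedup(time_reduction: float, parallel_instances: int) -> float:
    """Idealized throughput gain: per-search speedup times instances in parallel."""
    return per_search_speedup(time_reduction) * parallel_instances


# A 50-75 percent cut in search time corresponds to a 2x-4x per-search speedup.
print(per_search_speedup(0.50), per_search_speedup(0.75))  # prints "2.0 4.0"

# Hypothetical instance counts that would reach 500x under these assumptions.
print(combined_speedup(0.75, 125))  # prints "500.0"
print(combined_speedup(0.50, 250))  # prints "500.0"
```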
BioNTech manages Spectrum Mill workflows using Amazon Simple Queue Service (Amazon SQS), a fully managed message queuing service, and executes Spectrum Mill searches through Amazon API Gateway, a service for creating, maintaining, and securing APIs at any scale. It then pulls the data from a data warehouse on Amazon Redshift, which offers excellent price performance for cloud data warehousing. The scientific teams use these datasets to identify therapeutic targets and build artificial intelligence algorithms for vaccine design.
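The queue-based fan-out described above can be sketched as follows. This is a minimal illustration, not BioNTech's implementation: it expands every combination of raw file, search parameter set, and protein sequence database into one message, then groups the messages into batches of 10, the maximum that a single Amazon SQS `SendMessageBatch` call accepts. The message fields (`raw_file`, `parameters`, `database`), the file names, and the helper names are all hypothetical, and the sketch only builds the payloads without calling AWS.

```python
import itertools
import json
from typing import Dict, Iterator, List

SQS_MAX_BATCH = 10  # SQS accepts at most 10 messages per SendMessageBatch call


def build_requests(raw_files: List[str], parameter_sets: List[str],
                   databases: List[str]) -> List[Dict[str, str]]:
    """Create one queue entry per (raw file, parameter set, database) combination."""
    entries = []
    combos = itertools.product(raw_files, parameter_sets, databases)
    for i, (raw, params, db) in enumerate(combos):
        # Hypothetical message schema, for illustration only.
        body = json.dumps(
            {"raw_file": raw, "parameters": params, "database": db},
            sort_keys=True,
        )
        entries.append({"Id": f"search-{i}", "MessageBody": body})
    return entries


def batches(entries: List[Dict[str, str]],
            size: int = SQS_MAX_BATCH) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


# Hypothetical inputs: 3 raw files x 2 parameter sets x 2 databases = 12 searches.
entries = build_requests(
    ["run1.raw", "run2.raw", "run3.raw"],
    ["params_a", "params_b"],
    ["proteome_v1.fasta", "proteome_v2.fasta"],
)
print(len(entries), [len(b) for b in batches(entries)])  # prints "12 [10, 2]"
```

In a real deployment, each batch would be handed to an SQS client, and workers launched by Amazon EC2 Auto Scaling would each pull one message and run the corresponding search.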
The team connects processed results with data consumers across the company with data.all, an open-source tool for sharing datasets across AWS accounts. As a result, researchers no longer need to spend time on data management. “On AWS, our scientists are generating and sharing exponentially more data with the aim of finding effective, targeted, and personalized therapies for patients,” says McCarthy.
Outcome | Expanding Speed and Scalability to More Workflows
BioNTech has quickly seen the benefits of its new workflows on AWS. “We could redo all the work from the past 7 years in 60 hours for a fraction of the price,” says Chaudhary. In its next phase, the team is looking to improve and automate mass spectrometry analysis tools to lower the false discovery rate of peptides. It’s also creating a graphical wrapper around its API so that all teams at BioNTech can benefit from the API in their day-to-day workflows.
“The Spectrum Mill project is just the first of many we’re planning,” says McCarthy. “This project inspired confidence that we can solve similar problems for our global teams. It’s really the imagination that limits you, and I haven’t yet found something that I couldn’t build in AWS.”
About BioNTech
BioNTech is a global immunotherapy research and development company that creates and manufactures active immunotherapies and performs clinical trials of treatments and vaccines for cancer and other diseases.
AWS Services Used
AWS Storage Gateway
AWS Storage Gateway is a set of hybrid cloud storage services that provide on-premises access to virtually unlimited cloud storage.
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 750 instance types and a choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.
Amazon SQS
Amazon Simple Queue Service (Amazon SQS) lets you send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available.