Lifebit Powers Collaborative Research Environment for Genomics England on AWS


Less than 1 year after the COVID-19 pandemic outbreak, professionals went from diagnosing the first case to administering a vaccine. Genomic advancement, among other breakthroughs, is widely credited with the rapid understanding of the disease and the expedited vaccine deployment.

Since the first whole human genome was sequenced in 2003, genomics has become commonplace in the healthcare and life sciences industries, resulting in exponential growth in genomic data. Each human genome contains enough data to fill 200 phone books. Within this data lie life-altering discoveries, including the causes of diseases, knowledge that can lead to treatments. But disease causes, which are often “typos” (mutations in genetic sequences), can be challenging to find. Genomic data is also highly regulated and stored in siloed data lakes, which further impedes research.

Facing this challenge is Lifebit Biotech Ltd. (Lifebit), an Amazon Web Services (AWS) Select Consulting Partner. Working with biobanks, research institutions, and pharmaceutical companies, Lifebit provides solutions that analyze clinicogenomic datasets to accelerate drug discovery, diagnostics, disease surveillance, drug-response predictions, and wellness models.


“We use the whole roster of AWS computations, from general-purpose computation to graphically accelerated units, to run large production pipelines faster and more efficiently.”

Thorben Seeger
Vice President of Commercial, Lifebit Biotech Ltd.

Unlocking Access to Siloed Genomic Data

Lifebit CloudOS, a fully federated cloud operating system, uses AWS to unlock clinicogenomic data for drug and biomarker discovery. This facilitates greater research collaboration and accelerates drug development and disease prevention. At the onset of the COVID-19 pandemic, Genomics England (GEL) turned to Lifebit CloudOS. A pioneer of population genomics, GEL oversees the 100,000 Genomes Project, a cohort of cancer and rare-disease whole genomes.

Earlier genomics research used fewer, smaller datasets, and the industry could rely on centralized technologies to analyze them. Data protection regulation was also more lenient, and collaboration was easier to manage. But genomic data has since become the largest source of data in history, and that centralized model cannot support today’s research. “Data centralization is no longer feasible or affordable,” says Thorben Seeger, vice president of commercial for Lifebit. “The data is too big to move efficiently, and many regulations forbid data to leave an organization, state, or nation.” As a result, 80–90 percent of these datasets are unavailable for research. “GEL is widely known as the ‘Fort Knox’ of genomics,” Seeger says. “But when you lock data up, it’s nearly impossible to access or combine with other data.”

Lifebit reengineered the traditional model for securing data—bringing its compute engine and analytics to the data itself. This new model is powered by Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides secure, resizable compute capacity in the cloud. “We are deploying our cutting-edge research in our clients’ own environments on AWS,” says Seeger. “Each user receives a clean-room environment to access and analyze data separately. The fully managed service provides maximum research utility without sacrificing security or control.”

Lifebit uses the scalability of AWS to obtain the compute capacity needed to keep pace with the exponential relationship between dataset size and research outcomes. The company works on projects with more than 100 PB of stored data, requiring billions of virtual CPU hours. “We use the whole roster of AWS computations to run production pipelines faster and more efficiently,” says Seeger. “That was critical because GEL needed rapid data processing for faster insights.”

Standing Up a Secure, Robust Collaboration Service

During the COVID-19 pandemic, GEL launched an initiative with the UK government to deliver a cohort to eight leading pharmaceutical companies—as well as research organizations—to fuel vaccine, treatment, and early-detection research. The cohort included sequenced genomes from 20,000 COVID-19 patients with severe cases and 15,000 patients with mild cases, plus data from the 100,000 Genomes Project. Yet GEL needed a federated data analytics system to make that cohort available to multiple parties. “We were setting up a new research environment, and we needed a company that could go live within 7–8 weeks,” says Parker Moss, chief commercial officer at GEL.

Lifebit built upon GEL’s existing AWS architecture to deliver the fully live system in under 3 months. Today, pharmaceutical companies and researchers can access the cohort and connect their own private datasets. “The user’s external data doesn’t move into the GEL environment,” says Moss. “However, through federated links, you can research as if that data is in one place. It’s a very powerful value proposition.” This system saves time and offers extra protection. “Data stays in our clients’ environments, and all of the AWS safety features keep it secure,” says Seeger.
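To make the idea of federated links concrete, here is a minimal sketch of one common federated pattern, assuming the simplest possible shared statistic: a pooled allele frequency for a single genetic variant. Each data owner computes counts inside its own environment and releases only those totals, so no individual genotypes move between sites. The names `SiteSummary`, `local_summary`, and `pooled_allele_frequency` are hypothetical illustrations, not the Lifebit CloudOS interface or GEL’s actual method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SiteSummary:
    """Hypothetical aggregate a site releases; contains no per-patient rows."""
    alt_alleles: int    # alternate alleles observed at one variant
    total_alleles: int  # 2 x number of genotyped (diploid) individuals


def local_summary(genotypes: Iterable[int]) -> SiteSummary:
    """Runs inside the data owner's environment.

    Each genotype is the number of alternate alleles (0, 1, or 2) one
    individual carries at the variant. Only the totals leave the site.
    """
    gts = list(genotypes)
    if any(g not in (0, 1, 2) for g in gts):
        raise ValueError("genotypes must be 0, 1, or 2 alternate alleles")
    return SiteSummary(alt_alleles=sum(gts), total_alleles=2 * len(gts))


def pooled_allele_frequency(summaries: Iterable[SiteSummary]) -> float:
    """Combines site-level totals as if all cohorts were in one place."""
    alt = total = 0
    for s in summaries:
        alt += s.alt_alleles
        total += s.total_alleles
    if total == 0:
        raise ValueError("no genotyped individuals across sites")
    return alt / total


# Usage: two hypothetical sites, one variant.
site_a = local_summary([0, 1, 2, 0])  # 3 alternate alleles out of 8
site_b = local_summary([1, 1, 0])     # 2 alternate alleles out of 6
freq = pooled_allele_frequency([site_a, site_b])
print(f"{freq:.3f}")  # 5 / 14, printed as 0.357
```

Because the pooled statistic is built from sums, combining per-site totals gives exactly the same answer as pooling the raw genotypes, which is what lets researchers work “as if that data is in one place.”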

On the system, researchers use automated tools to securely query, analyze, and collaborate on large datasets in seconds. “We are bridging the dichotomy between security and usability,” says Seeger. “This fosters global collaboration between public institutions like GEL, other leading cohorts, research organizations, and private institutions.”

Scaling at the Speed of Genomics on AWS

Lifebit CloudOS makes genomic research more accessible. “The cloud, combined with our data environment, is the great democratizer,” explains Seeger. “Millions of researchers can access and perform big data analysis on demand—something only a few trained specialists with high-performance computing could do previously.”

Critically, Lifebit customers and their users gain virtually unlimited storage using Amazon Simple Storage Service (Amazon S3), which offers industry-leading scalability, data availability, security, and performance. One whole human genome equates to 120–300 GB of data, and Lifebit is running simulations on databases covering more than 10 million patients with thousands of clinical and phenotypic variables. “Connecting global datasets is driving ethnic genomic diversity,” says Seeger. “This helps us understand diseases in general but also enables us to cater to previously underserved populations.”
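A back-of-the-envelope calculation shows how quickly the per-genome figure adds up. This sketch assumes decimal units (1 PB = 1,000,000 GB) and counts a single copy of each raw genome at the 120–300 GB range quoted above; it ignores replicas, intermediate pipeline outputs, and clinical records. The cohort sizes come from this case study: the nominal 100,000 genomes of the 100,000 Genomes Project and the 35,000-patient COVID-19 cohort. The function name `cohort_storage_pb` is a hypothetical illustration.

```python
# Decimal (SI) units assumed: 1 PB = 10**15 bytes = 1,000,000 GB.
GB_PER_PB = 1_000_000


def cohort_storage_pb(genomes: int, gb_per_genome: float) -> float:
    """Raw storage for one copy of a cohort's genomes, in petabytes."""
    return genomes * gb_per_genome / GB_PER_PB


cohorts = {
    "100,000 Genomes Project (nominal)": 100_000,
    "COVID-19 cohort (20,000 severe + 15,000 mild)": 35_000,
}

for name, genomes in cohorts.items():
    low = cohort_storage_pb(genomes, 120)   # lower per-genome bound from the text
    high = cohort_storage_pb(genomes, 300)  # upper per-genome bound from the text
    print(f"{name}: {low:.1f}-{high:.1f} PB")

# Prints:
# 100,000 Genomes Project (nominal): 12.0-30.0 PB
# COVID-19 cohort (20,000 severe + 15,000 mild): 4.2-10.5 PB
```

Even a single copy of one national cohort’s raw genomes lands in the tens of petabytes, which illustrates why virtually unlimited object storage matters at this scale.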

On AWS, Lifebit delivered a system that led to one of the most significant cloud computing deals in the history of life sciences. “The prevalence of AWS in the healthcare and life sciences markets is extremely helpful,” Seeger says. “We have seen incredible flexibility from AWS, which, in the London region, is helping us set up the security GEL is famous for. The scale and global presence of AWS is of huge strategic importance for us as we pursue large government initiatives.”

Accelerating Global Collaboration in Drug Research and Disease Prevention

By using AWS, Lifebit enabled GEL to rapidly deliver a research environment for COVID-19 data and analytics. Now, Lifebit is speaking to nations about combining datasets to facilitate research outcomes and speed up drug development for cancer and rare diseases. “Our federated analysis system doesn’t only exist for the singular purpose of serving one country or one disease cohort,” says Seeger. “It works with other cohorts worldwide, making this scientific field the most collaborative it has ever been.”

About Lifebit Biotech Ltd.

Lifebit Biotech is a global leader in population genomics software and AI-powered drug discovery. Operating in North America, Europe, the Middle East, Africa, and the Asia-Pacific region, it powers population genomics initiatives, biobanks, research organizations, and pharmaceutical companies.

Benefits of AWS

  • Launched a federated data analytics system in under 3 months
  • Processes more than 100 PB of project data
  • Enables collaborative research on disparate datasets worldwide
  • Maintains compliance with data privacy regulations
  • Performs analysis in clients’ own environments
  • Efficiently orchestrates billions of CPU hours
  • Democratizes access to bioinformatic analysis
  • Enables sustainable self-funding business models

AWS Services Used

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.


Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

