AWS for Industries

Executive Conversations: Future-proofing population genomics initiatives through federation with Thorben Seeger of Lifebit

Population genomics initiatives amass a multitude of clinical, omics, and phenotypic data from diverse participants. This data is critical to a range of healthcare use cases, including biomarker discovery, drug repurposing, patient stratification, and precision medicine. However, strict data security and regulatory frameworks often make large-scale data migration infeasible, creating a fragmented data landscape that prevents researchers from accessing and leveraging these datasets. In an interview led by Ankit Malhotra, Worldwide Genomics Lead for Amazon Web Services (AWS) Public Sector Healthcare, Lifebit’s Chief Business Development Officer, Thorben Seeger, discusses the role of federation technology in breaking down data silos to accelerate knowledge sharing and distributed data analysis, and its impact on healthcare today and in the future.

Ankit Malhotra: I’m excited to speak with you today. To get us started, can you tell us about Lifebit’s mission and your role in the organization?

Thorben Seeger: I’m excited to be here as well. Thank you, Ankit. I am the Chief Business Development Officer at Lifebit. I look after our global government and pharmaceutical clients and lead our commercial efforts.

Research and healthcare organizations need data to cure diseases. The good news is that there is a significant amount of data out there to answer some of the most complex questions. More than 200 public and private data custodians worldwide (like national precision medicine projects, healthcare providers, biobanks, research organizations, and academia) are generating real-world clinical and genomic data at population scale. However, the majority of this data is currently inaccessible and unusable by data consumers (like researchers and pharma companies). Moving these highly sensitive datasets to a central location is impossible because of their size, regulatory restrictions, security concerns, and control issues, which creates extensive data silos.

Lifebit empowers biomedical data owners to make their data findable and usable for data consumers, securely and in a trusted environment, so they can generate novel insights that accelerate therapeutic breakthroughs. Our proprietary platform federates biomedical data for wider access by researchers, allowing them to run analyses on multiple distributed datasets in situ, while avoiding risky movement of highly sensitive data.

AM: Let’s talk about large-scale population genomics programs. Why are they important and how do they affect the life sciences industry?

TS: Population genomics programs generate and aggregate genomic data at a massive scale. Coupled with advanced data analytics tools, this data answers questions we couldn’t answer before, and answers them faster, helping to transform precision medicine and provide equitable solutions for diverse healthcare needs. In the UK alone, programs such as Genomics England’s 100,000 Genomes Project have led to a several-fold increase in the number of patients receiving an accurate rare disease diagnosis. In the pharmaceutical industry, we know that drugs supported by genetic evidence are more than twice as likely to achieve regulatory approval. Recognizing the growing value of this data, pharma organizations are partnering with biobanks to power massive sequencing projects, such as AstraZeneca’s 2 Million Genomes Project, and Boehringer Ingelheim has joined a nationwide research collaboration in Finland to analyze 500,000 genomes.

From biomarker discovery to more targeted clinical trials, population genomics data has been used across the life sciences value chain. However, accessing and leveraging this data has been fraught with major usability, security, privacy, ethical, and commercial challenges, meaning the data is more siloed than ever before.

AM: What are some of the challenges researchers and organizations face in accessing and leveraging these large-scale genomic datasets?

TS: Firstly, there are technical constraints. Genomic datasets, like a person’s whole genome sequencing data, are massive, making them too large and too expensive to move around for large cohorts. It’s simply unsustainable in the long term. Additionally, different datasets follow differing technology standards, systems, and implementation methodologies because of proprietary and legacy tools and software, causing fragmentation.

Secondly, genomic data sharing is especially challenging because of its highly sensitive and private nature. Data movement is tightly guarded, with strict local regulatory and compliance requirements prohibiting large-scale transfer, access, and secondary use. Compounding this challenge, a growing lack of interoperability and increasingly fragmented policies are eroding trust among data custodians. It is well known that even anonymized data carries a high re-identification risk, so moving data ultimately means a loss of control and added security risk. Custodians’ fear of losing control over the IP of their data further complicates the issue.

These security, operational, business continuity, and economic factors make it challenging to share, integrate, and aggregate datasets—limiting the opportunity for continued innovation and discovery using that data.

AM: How do federated data systems help address these challenges?

TS: Federated data platforms bridge the gap between flexible data accessibility and security for very large, unwieldy datasets by allowing analysis of data in situ, without moving it out of the data custodian’s environment. With a security-by-design approach, these platforms ensure that data custodians remain in control of their data at all times, while allowing data consumers to combine and analyze the data, maximizing its value and fueling collaboration.
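To make "analysis in situ" concrete, here is a minimal sketch of one common federated pattern. It is not Lifebit’s implementation, and every name in it is hypothetical. Each custodian computes summary statistics (count, sum, sum of squares) inside its own environment, and a coordinator pools only those summaries to obtain a combined mean and variance. Row-level records never leave any site.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site releases; contains no row-level records."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values: List[float]) -> SiteSummary:
    # Hypothetical step that runs inside the custodian's environment.
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pool(summaries: List[SiteSummary]) -> Tuple[float, float]:
    # Runs at the coordinator, which only ever sees the summaries.
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance reconstructed from the pooled sufficient statistics.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    site_a = summarize_locally([1.0, 2.0, 3.0])  # stays at site A
    site_b = summarize_locally([4.0, 5.0])       # stays at site B
    print(pool([site_a, site_b]))
```

The result matches what a centralized computation over all five values would give. The sum-of-squares formula keeps the sketch short, but it can lose precision on large or poorly scaled data. A production system would more likely use a numerically stable merge (for example, Chan et al.'s parallel variance update) and would add disclosure controls such as minimum cell sizes.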

The underlying architecture of federated data systems should be designed to provide seamless, authorized access to secure data, along with differential privacy, security, authentication, authorization, and system auditing. This provides two crucial benefits. First, because the data never leaves the organization or business, the legal, technical, and societal risks inherent in data transfer and/or centralization are greatly reduced. Second, it facilitates cross-border collaboration while respecting local governance and legal regulations.
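Differential privacy comes up here only in passing. One textbook way to apply it to a federated count query is the Laplace mechanism, sketched below. The interview does not say which mechanism any particular platform uses, and the function names are hypothetical. A counting query has sensitivity 1, because adding or removing one participant changes the count by at most 1. Noise drawn from Laplace(0, 1/ε) therefore gives ε-differential privacy for that single release.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Hypothetical helper: a counting query has sensitivity 1, so the
    noise scale is sensitivity / epsilon = 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # e.g. number of carriers of a variant at one site, released with noise
    print(dp_count(42, epsilon=1.0, rng=rng))
```

A smaller ε adds more noise and gives stronger privacy. A real deployment would also track the cumulative privacy budget across repeated queries, which this sketch omits.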

AM: Could you tell us about the work Lifebit is doing in this space?

TS: In 2019, Lifebit unveiled its federated data platform (or ‘trusted research environment’), Lifebit CloudOS, to ensure population-scale clinical and genomic data is both accessible and secure. The platform connects and interlinks clinical and genomic data silos, while all data remains under the control of the data custodian, in its own environment. Beyond securing the data, it brings computational analysis to where the data resides, enabling faster innovation and a broader diversity of life-saving discoveries.

Its unique architecture abstracts cloud and high-performance computing resources in a unified, federated way, so that analysis and machine learning can run over distributed data. The system deploys and automates analysis tools, such as its Data Transformation Suite, whose automated ETL pipelines rapidly transform raw data into research-ready data using industry gold standards like the Observational Medical Outcomes Partnership (OMOP) common data model, producing faster insights. By providing intuitive, no-code tools for data exploration and visualization, it also helps address the shortage of highly trained bioinformaticians needed to generate insights from sequencing data.
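As a rough illustration of what one ETL step toward OMOP involves, the sketch below maps a hypothetical raw patient export onto a few columns of the OMOP CDM PERSON table. It is not how the Data Transformation Suite works. The raw field names (`patient_ref`, `sex`, `dob`) are assumptions. The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP concepts, and 0 is OMOP’s "no matching concept" value.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Standard OMOP gender concept IDs; 0 means "no matching concept".
GENDER_CONCEPTS: Dict[str, int] = {"M": 8507, "F": 8532}


@dataclass(frozen=True)
class OmopPerson:
    # A subset of the OMOP CDM PERSON table columns (race, ethnicity, etc. omitted).
    person_id: int
    gender_concept_id: int
    year_of_birth: int
    month_of_birth: int
    day_of_birth: int
    person_source_value: str
    gender_source_value: str


def to_omop_person(person_id: int, raw: Dict[str, str]) -> OmopPerson:
    # Hypothetical raw export row: {"patient_ref": ..., "sex": ..., "dob": "YYYY-MM-DD"}.
    sex = raw["sex"].strip().upper()
    dob = date.fromisoformat(raw["dob"].strip())
    return OmopPerson(
        person_id=person_id,
        gender_concept_id=GENDER_CONCEPTS.get(sex, 0),
        year_of_birth=dob.year,
        month_of_birth=dob.month,
        day_of_birth=dob.day,
        person_source_value=raw["patient_ref"],  # original identifier kept for traceability
        gender_source_value=raw["sex"],          # original code kept verbatim
    )


def transform(rows: List[Dict[str, str]]) -> List[OmopPerson]:
    # Assign surrogate person_ids; source identifiers survive only as *_source_value.
    return [to_omop_person(i + 1, row) for i, row in enumerate(rows)]


if __name__ == "__main__":
    raw_rows = [
        {"patient_ref": "A-001", "sex": "f", "dob": "1980-05-17"},
        {"patient_ref": "A-002", "sex": "X", "dob": "1975-12-01"},
    ]
    for person in transform(raw_rows):
        print(person)
```

Once every site emits the same columns and concept IDs, one query can run unchanged against each of them. That uniformity is what makes federated analysis over OMOP-shaped data practical.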

The platform is designed for use by organizations in the pharmaceutical, drug discovery, healthcare, and population genomics space. Some of the largest life sciences organizations, like Boehringer Ingelheim, have deployed Lifebit’s platform to accelerate discoveries. In fact, as part of the DARE UK (Data and Analytics Research Environments UK) program and in consortium with the University of Cambridge, NIHR Cambridge Biomedical Research Centre, Genomics England, Eastern AHSN, and Cambridge University Health Partners, Lifebit has achieved federated analysis across the trusted research environments of two institutions, which is a first for the industry.

AM: How has cloud computing in general, and AWS in particular, helped in bridging the gap of usability and security needed for federated data ecosystems like Lifebit CloudOS?

TS: Cloud technology, and AWS in particular, is a critical enabler for us. First, AWS offers near-unlimited high-performance compute and storage resources that power our platform, along with the scalable infrastructure that researchers need to go deep into the data.

Secondly, the global presence of AWS data centers makes AWS ideal for international collaboration and for launching new national programs.

Thirdly, AWS addresses important data security needs with features such as encryption at every point, firewalls, and monitoring software, so that research can be performed safely. Thanks to AWS, we can offer tightly secured access using granular controls that give each researcher access only to the type of data they are authorized to see.

And finally, AWS helps us apply advanced analytics to the data, powered by machine learning and artificial intelligence.

A good example is our work with the NIHR Cambridge Biomedical Research Centre and the University of Cambridge. We were able to rapidly set up AWS accounts, providing researchers with next-generation computational infrastructure. Their new cloud-based trusted research environment will enable secure analysis of their clinical and genomic data in a system that is more scalable and flexible and, importantly, that supports a more collaborative research environment.

AM: How are you leveraging other aligned technologies, like NVIDIA Clara Parabricks on AWS, to innovate?

TS: With AWS as our technology partner, we use NVIDIA Clara Parabricks to accelerate our pipelines. We run Clara Parabricks on GPUs specifically for large-scale processing of raw whole-genome sequencing files.

The GPUs that NVIDIA produces and AWS hosts make this process very fast and extremely affordable compared with other options on the market. That’s important for institutions that need to be conscious of budget, like academia and the public sector.

AM: What does the future look like for federated data ecosystems for large-scale population genomics analyses?

TS: By 2025, more than 60 million patients are expected to have had their genomes sequenced, which can transform precision medicine and drive innovation in therapeutics at an unprecedented level. Federated analysis of complex data allows seamless integration of distributed datasets and is set to be a disruptor. However, realizing its full potential depends heavily on the adoption of data standards. Lifebit and other platforms use key standards adopted widely across the healthcare industry, such as the OMOP common data model (CDM). OMOP CDM captures data uniformly across different health institutions, making clinical data analysis more efficient and reproducible.

Thus, federated analysis and learning are increasingly allowing researchers to apply their algorithms and analytics to distributed data, avoiding compliance issues because no data needs to be moved. It’s a big step forward, not just for genomics, but for any kind of clinical healthcare data where access to multiple data sources is needed.

AM: At AWS, we’re excited about that future as well. We work closely with our partners in healthcare and life sciences to align our offerings to their needs, so they can continue this groundbreaking work. To learn more, visit us at AWS for Health.

Thorben Seeger

As Chief Business Development Officer, Thorben is the mastermind behind Lifebit’s international commercial operations. Bringing over 14 years of experience in enterprise technology and financial solutions, Thorben has successfully developed commercial strategies and led high-performance business development teams for global powerhouses, including Goldman Sachs, Morgan Stanley, and Duel Tech. With Lifebit, Thorben plans multi-million GBP cloud-based precision medicine programs for the world’s leading population genomics and national precision medicine programs, including Genomics England and the Danish National Genome Center. Additionally, Thorben guides partnerships with pharma giants, such as Boehringer Ingelheim, helping them to leverage international data cohorts and pursue therapeutic breakthroughs.

Ankit Malhotra

Ankit Malhotra is the worldwide genomics lead on the Amazon Web Services (AWS) Public Sector healthcare team. At AWS, Ankit helps healthcare and biomedical research customers in the public sector integrate genomics into their workloads, helping them accelerate and innovate using the AWS Cloud. With cross-training in computer science, molecular biology, and genetics, he has over 10 years of experience as an NIH-funded computational genomic scientist.