AWS for Industries

Simplifying Multi-modal & Multi-omics Analysis with AWS for Health

New AWS for Health Guidance: Multi-modal and Multi-omics

The new era of personalized health relies on data to guide more customized patient treatments, therapeutics, and diagnoses. Genomics sits at the core of personalized health, and by taking into account the individual variability among people and diseases, clinicians can create more personalized care journeys and targeted treatments. Across clinical and research disciplines, combining and analyzing different modalities including multiple molecular data types and imaging data is powering a more holistic view of patients and more robust insights into an area of study.

A great example of this is the work being done by Philips to incorporate multi-modal data into its Philips HealthSuite Platform, which was recently presented at the 2022 AWS Industry Innovators: Healthcare & Life Sciences event. To help determine the best treatment options on an individualized basis, Philips created a platform on AWS that integrates the different modalities of medical data involved in cancer treatment, including genomic, imaging, digital pathology, and clinical data. As a result, leading healthcare organizations like MD Anderson Cancer Center can now run more data-driven, personalized oncology treatments and clinical trial matching.

While the promise of multi-modal and multi-omics analysis is becoming evident, the integration and analysis of varying forms of structured and unstructured data pose a unique set of challenges, including:

  • Addressing the influx of diverse data types and formats
  • Extracting insights from unstructured data, such as voice and imaging
  • Ingesting, normalizing, structuring, and formatting differing data types for consumption
  • Creating cohorts and defining relevant data subsets

To reduce barriers to handling and analyzing multi-modal and multi-omics data, AWS for Health has released the new Guidance for Multi-Omics and Multi-Modal Data Integration and Analysis on AWS.

It is a prescriptive deep dive on how to prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis, and perform interactive queries, using The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA) as example datasets. The ETL code provided in this guidance can be customized to ingest and transform additional datasets.

This comprehensive guidance provides step-by-step instructions and recommendations for:

  • optimizing data formats and structures,
  • querying and accessing data from different sources with ease, and
  • integrating and analyzing genomics data together with other omics (for example, epigenomics, proteomics, transcriptomics, metabolomics) as well as other modalities of data (for example, X-rays, health records, recorded audio, wearables data).
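To make the first point concrete, here is a minimal sketch of one way to flatten a line of Variant Call Format (VCF) data into records suited to a columnar format. It is an illustration only and is not taken from the ETL code in the guidance; the function name and the output field names are assumptions made for this example.

```python
from typing import Dict, List, Optional, Union

# The eight fixed, tab-separated columns that start every VCF data line.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

Record = Dict[str, Union[str, int, None]]


def flatten_vcf_line(line: str) -> List[Record]:
    """Turn one VCF data line into one flat record per alternate allele.

    Hypothetical helper: the output field names are illustrative only.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_FIXED_COLUMNS):
        raise ValueError("expected at least 8 tab-separated VCF columns")
    chrom, pos, variant_id, ref, alt, _qual, filter_status, info = fields[:8]

    # VCF uses "." for a missing value; store it as None instead.
    record_id: Optional[str] = None if variant_id == "." else variant_id

    # ALT may list several comma-separated alleles; emit one row for each
    # so that every row describes exactly one variant.
    return [
        {
            "chrom": chrom,
            "pos": int(pos),
            "id": record_id,
            "ref": ref,
            "alt": allele,
            "filter": filter_status,
            "info": info,
        }
        for allele in alt.split(",")
    ]


if __name__ == "__main__":
    example = "20\t14370\trs6054257\tG\tA,T\t29\tPASS\tNS=3;DP=14"
    for row in flatten_vcf_line(example):
        print(row)
```

Records shaped like this can be written to a columnar format and partitioned (for example, by chromosome) so that interactive queries scan less data.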

Following the six pillars of the AWS Well-Architected Framework, the guidance is designed to help healthcare and life sciences organizations build a secure, resilient, and scalable environment on AWS. It shows how to prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis and how to run interactive queries against a data lake.

The modern data architecture (Image 1) in this guidance demonstrates how to ingest common multi-omics datasets into a centralized data lake and work with that data using Amazon Athena and low-code Jupyter notebooks. There are example ingestion pipelines for clinical, mutation, gene expression, and copy number data (TCGA), imaging metadata (TCIA), genomic variant call data (1000 Genomes), annotation data (ClinVar), and individual Variant Call Format (VCF) files.

Image 1: AWS for Health Guidance: The Modern Data Architecture


This guidance demonstrates how to:

  • Build, package, and deploy libraries
  • Provision serverless data ingestion pipelines for multi-modal data preparation and cataloging
  • Visualize and explore clinical data through an interactive interface
  • Run interactive analytics queries against a multi-modal data lake
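As a rough sketch of the last point, the snippet below builds a cohort-style SQL query that joins clinical and gene expression tables and submits it to Amazon Athena. The table names, column names (`patient_id`, `age_at_diagnosis`, `gene_symbol`, `expression_value`), and helper functions are hypothetical and will differ from the schema the guidance actually creates; only the boto3 `start_query_execution` call is a real API.

```python
import re

# Conservative patterns so caller-supplied values cannot alter the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_GENE_SYMBOL = re.compile(r"^[A-Za-z0-9._-]+$")


def build_cohort_query(
    clinical_table: str, expression_table: str, gene: str, min_age: int
) -> str:
    """Build a SQL query joining clinical and expression data for one gene.

    All table and column names here are assumptions for illustration.
    """
    for name in (clinical_table, expression_table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe table name: {name!r}")
    if not _GENE_SYMBOL.match(gene):
        raise ValueError(f"unsafe gene symbol: {gene!r}")
    return (
        "SELECT c.patient_id, c.age_at_diagnosis, e.expression_value\n"
        f"FROM {clinical_table} AS c\n"
        f"JOIN {expression_table} AS e ON e.patient_id = c.patient_id\n"
        f"WHERE e.gene_symbol = '{gene}' "
        f"AND c.age_at_diagnosis >= {int(min_age)}"
    )


def submit_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query to Athena and return its execution ID.

    Requires boto3 and AWS credentials; output_location is an S3 URI
    that you choose for query results.
    """
    import boto3  # imported lazily so the query builder works without it

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    print(build_cohort_query("clinical_patient", "expression", "TP53", 50))
```

Keeping query construction separate from submission makes the SQL easy to inspect or reuse in a notebook before running it against the data lake.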

This guidance was built in collaboration with AWS for Health featured consulting partner BioTeam. BioTeam is a scientific IT consulting company expert in applying strategies, advanced technologies, and IT services to solve the most challenging research, technical, and operational problems in the life sciences. They can help implement and customize this guidance to ingest customized datasets.

The full guidance is now available here: Guidance for Multi-Omics and Multi-Modal Data Integration and Analysis on AWS

Additional AWS Resources for Multi-modal and Multi-omics:

Related resources:
AWS cloud healthcare services

Stephanie Black


Stephanie Black is the Worldwide Head of Life Sciences and Genomics Marketing at Amazon Web Services (AWS). Specializing in the intersection of life sciences and cloud technology, Stephanie has spent the last decade helping leading life sciences organizations bring new products to market and expand their market reach. She holds a graduate certificate in genetics from Stanford University, in addition to dual undergraduate degrees in business and strategic marketing.

Pantea Khodami


Pantea Khodami is the Worldwide Head of Healthcare and Life Sciences Solutions Portfolio for AWS. She has over a decade of global experience in product management, business development, and sales/commercial strategy in healthcare and life sciences with a focus on genomics. She holds bachelor's and master's degrees in Materials Science and Engineering from the Massachusetts Institute of Technology, with a minor in Management from the MIT Sloan School of Management.