AWS Solutions Library

Guidance for Multi-Modal Data Analysis with AWS Health and ML Services

Overview

This Guidance demonstrates how to set up an end-to-end framework to analyze multimodal healthcare and life sciences (HCLS) data. It analyzes this data using purpose-built HCLS services (such as AWS HealthOmics, AWS HealthLake, and AWS HealthImaging) together with machine learning (ML) and analytics services (such as Amazon SageMaker, Amazon Athena, and Amazon QuickSight). It ingests raw HCLS data formats such as variant call format (VCF), Fast Healthcare Interoperability Resources (FHIR), and Digital Imaging and Communications in Medicine (DICOM), and provides a zero-ETL (extract, transform, load) architecture for customers who want to run their data analysis at scale on AWS.

The architecture shows how to store, transform, and analyze linked genomic, clinical, and medical imaging data of patients. The effectiveness of the Guidance is demonstrated on a coherent synthetic patient dataset with multiple disease scenarios, released by MITRE and available on the AWS Registry of Open Data. The Guidance then trains an ML model to predict patient outcomes. It also includes an interactive dashboard for visualizing summary statistics of the data and ML model reports, which can be customized based on the user persona.

How it works

These technical details feature an architecture diagram to illustrate how to effectively use this solution. The architecture diagram shows the key components and their interactions, providing an overview of the architecture's structure and functionality step-by-step.

Download the architecture diagram


Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

HealthOmics integrates with Amazon EventBridge and provides notifications for actions such as Variant or Annotation store creation and deletion, as well as the start and completion of data import jobs. You can overlay rules and handling targets onto this Guidance to monitor and respond to any incidents that may occur, such as repeated import failures.
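As an illustration of responding to repeated import failures, the following is a minimal sketch of logic that an EventBridge rule target (for example, a Lambda function) might run. The event shape used here (`time`, `detail.storeId`, `detail.status`, and the `FAILED` value) is a simplified assumption made for the example, not the documented HealthOmics payload, and the class name is hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class RepeatedFailureDetector:
    """Flags a store once it accumulates `threshold` failures inside `window`.

    Assumes events are observed in chronological order and use the
    hypothetical shape:
    {"source": "aws.omics", "time": "<ISO 8601>",
     "detail": {"storeId": "...", "status": "FAILED"}}
    """

    def __init__(self, threshold=3, window=timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._failures = defaultdict(deque)  # store id -> recent failure times

    def observe(self, event):
        """Record one event; return True if the failure threshold is reached."""
        if event.get("source") != "aws.omics":
            return False
        detail = event.get("detail", {})
        if detail.get("status") != "FAILED":  # assumed status value
            return False
        when = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
        recent = self._failures[detail.get("storeId")]
        recent.append(when)
        # Drop failures that have fallen outside the sliding window.
        while recent and when - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold
```

When `observe` returns `True`, the handler could notify an operator or pause further imports; which action is appropriate depends on your incident process.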

Read the Operational Excellence whitepaper

HealthImaging enforces AWS Key Management Service (AWS KMS) encryption: it does not allow the creation of an unencrypted data store. In addition, encryption at rest and in transit is supported by HealthOmics, HealthLake, Amazon SageMaker, Athena, QuickSight, Lake Formation, and Amazon S3. This Guidance uses AWS-owned keys, but customers can bring their own keys if needed.

Read the Security whitepaper

When deploying this Guidance in an environment with pre-existing HealthOmics resources, you should be aware of HealthOmics Analytics quotas. This Guidance creates 1 Variant store and 1 Annotation store. By default, HealthOmics has a limit of 10 Variant stores and 10 Annotation stores. There are also default limits on the number of import jobs to HealthOmics Analytics stores and on the file sizes they can handle. The default limit is 5 concurrent Variant or Annotation store import jobs. This Guidance uses 1 Variant import job and 1 Annotation import job. Variant import jobs have a default limit of 1,000 sources, each with a limit of 20 GB. The example variant data used by this Guidance consists of about 800 variant files, each about 1 GB. Annotation import jobs have a default limit of 1 source, with a size limit of 20 GB. The example annotation data in this Guidance is a single file of about 10 GB.
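The quota arithmetic above can be checked before deployment with a short script. This is a minimal sketch that hard-codes the default limits quoted in the paragraph; the function and constant names are hypothetical, and the quotas in your own account may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportQuota:
    max_sources: int      # sources allowed per import job
    max_source_gb: float  # size limit per source, in GB


# Default limits as quoted in the text; verify them for your account.
VARIANT_IMPORT = ImportQuota(max_sources=1000, max_source_gb=20)
ANNOTATION_IMPORT = ImportQuota(max_sources=1, max_source_gb=20)
MAX_STORES_PER_TYPE = 10
MAX_CONCURRENT_IMPORTS = 5


def import_job_problems(file_sizes_gb, quota):
    """Return a list of human-readable quota violations (empty if none)."""
    problems = []
    if len(file_sizes_gb) > quota.max_sources:
        problems.append(
            f"{len(file_sizes_gb)} sources exceeds limit of {quota.max_sources}"
        )
    oversized = [size for size in file_sizes_gb if size > quota.max_source_gb]
    if oversized:
        problems.append(f"{len(oversized)} source(s) exceed {quota.max_source_gb} GB")
    return problems


def stores_available(existing_stores, needed=1):
    """True if `needed` more stores of one type fit under the store limit."""
    return existing_stores + needed <= MAX_STORES_PER_TYPE


def imports_can_start(running_imports, needed=2):
    """True if the Guidance's 2 import jobs fit under the concurrency limit."""
    return running_imports + needed <= MAX_CONCURRENT_IMPORTS
```

For the example data described above, `import_job_problems([1.0] * 800, VARIANT_IMPORT)` and `import_job_problems([10.0], ANNOTATION_IMPORT)` both return an empty list, so the dataset fits within the default limits.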

Read the Reliability whitepaper

The data in HealthLake is automatically available through Lake Formation. This allows customers to create organizational units (OUs) of users and then grant row-level and column-level access to those users depending on their data access requirements.
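Row-level and column-level access of this kind is typically expressed in Lake Formation as a data cells filter. The sketch below only builds the request payload in the shape accepted by the boto3 `lakeformation` client's `create_data_cells_filter(TableData=...)` call; it does not call AWS. The account ID, database, table, column names, and filter expression are all hypothetical placeholders.

```python
def build_data_cells_filter(catalog_id, database, table, name, row_filter, columns):
    """Build a TableData payload for Lake Formation's create_data_cells_filter.

    `row_filter` is a filter expression selecting the visible rows;
    `columns` lists the only columns the grantee may read.
    """
    if not columns:
        raise ValueError("at least one column must be exposed")
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_filter},
        "ColumnNames": list(columns),
    }


# Hypothetical example: researchers see only three columns of female patients.
table_data = build_data_cells_filter(
    catalog_id="111122223333",           # placeholder account ID
    database="healthlake_db",            # hypothetical database name
    table="patient",                     # hypothetical table name
    name="researchers_female_patients",
    row_filter="gender = 'female'",
    columns=["id", "gender", "birthdate"],
)
# With credentials configured, this could then be submitted as:
#   boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)
```

Keeping the payload construction separate from the API call makes the access policy easy to review and unit test before it is applied.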

Read the Performance Efficiency whitepaper

HealthLake automatically transforms the clinical data stored in your data catalog so that you can run SQL queries on it. This eliminates the need to export HealthLake data and pay the associated data transfer costs.

Read the Cost Optimization whitepaper

By establishing a centralized data lake for all modalities, this Guidance removes the need to create redundant data. Data stores provided by HealthLake, HealthOmics, and HealthImaging become the single source of truth for each of their respective data types. Lake Formation can govern and filter each data type to provide users with the appropriate access to data without duplication. Similarly, you can create common database constructs, such as "views" in Athena, to support multiple analysis use cases without data replication.
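To make the "views" idea concrete, here is a minimal sketch that generates an Athena `CREATE OR REPLACE VIEW` statement over an existing table, so that several analyses can share one definition without copying data. The view, table, and column names are hypothetical, and the helper names are introduced only for this example.

```python
import re

# Conservative identifier check so names can be interpolated into SQL safely.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check(name):
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def create_view_sql(view, source_table, columns):
    """Return a CREATE OR REPLACE VIEW statement selecting `columns`."""
    if not columns:
        raise ValueError("a view needs at least one column")
    cols = ", ".join(_check(c) for c in columns)
    return (
        f"CREATE OR REPLACE VIEW {_check(view)} AS "
        f"SELECT {cols} FROM {_check(source_table)}"
    )


# Hypothetical example: a demographics view over a patient table.
sql = create_view_sql("patient_demographics", "patient", ["id", "gender", "birthdate"])
# The statement could then be run through Athena, for example with the boto3
# `athena` client's start_query_execution(QueryString=sql, ...).
```

Because the view stores only the query definition, the underlying data store remains the single source of truth.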

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open sample code on GitHub

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
