This Guidance demonstrates how to set up an end-to-end framework to analyze multimodal healthcare and life sciences (HCLS) data. It analyzes this data using purpose-built healthcare and life sciences services (such as AWS HealthOmics, AWS HealthLake, and AWS HealthImaging) together with machine learning (ML) and analytics services (such as Amazon SageMaker, Amazon Athena, and Amazon QuickSight). It ingests raw HCLS data formats such as variant call format (VCF), Fast Healthcare Interoperability Resources (FHIR), and Digital Imaging and Communications in Medicine (DICOM), and provides a zero-ETL (extract, transform, load) architecture for customers who want to run their data analysis at scale on AWS.

The architecture shows how to store, transform, and analyze linked genomic, clinical, and medical imaging data of patients. The effectiveness of the Guidance is demonstrated on a coherent synthetic patient dataset with multiple disease scenarios, released by MITRE and available on the AWS Registry of Open Data. The Guidance then trains an ML model for predicting patient outcomes. It also includes an interactive dashboard for visualizing summary statistics of the data and ML model reports, which can be customized based on the user persona.

Please note: [Disclaimer]

Architecture Diagram

[Architecture diagram]

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • HealthOmics integrates with Amazon EventBridge and provides notifications for actions such as Variant or Annotation store creation and deletion, as well as the start and completion of data import jobs. You can overlay rules and targets onto this Guidance to monitor and respond to incidents, such as repeated import failures.

    Read the Operational Excellence whitepaper 
  • HealthImaging enforces the use of AWS Key Management Service (AWS KMS) encryption: it does not allow the creation of an unencrypted data store. In addition, encryption at rest and in transit is supported by HealthOmics, HealthLake, Amazon SageMaker, Athena, QuickSight, Lake Formation, and Amazon S3. This Guidance uses AWS-owned keys, but customers can bring their own keys if needed.

    Read the Security whitepaper 
  • When deploying this Guidance in an environment with pre-existing HealthOmics resources, you should be aware of the HealthOmics analytics quotas. This Guidance creates 1 Variant store and 1 Annotation store. By default, HealthOmics has a limit of 10 Variant stores and 10 Annotation stores. There are also default limits on the number of import jobs to HealthOmics analytics stores and on the file sizes they can handle. The default limit is 5 concurrent Variant or Annotation store import jobs. This Guidance uses 1 Variant import job and 1 Annotation import job. Variant import jobs have a default limit of 1,000 sources, each with a size limit of 20 GB. The example variant data used by this Guidance consists of about 800 Variant files, each about 1 GB. Annotation import jobs have a default limit of 1 source, with a size limit of 20 GB. The example annotation data in this Guidance is a single file of about 10 GB.

    Read the Reliability whitepaper 
  • The data in HealthLake is automatically available through Lake Formation. This allows customers to create organizational units (OUs) of users and then grant row-level and column-level access to those users depending on their data access requirements.

    Read the Performance Efficiency whitepaper 
  • HealthLake automatically transforms the clinical data and exposes it through your data catalog so that you can run SQL queries on it. This eliminates the need to export HealthLake data and pay the associated data transfer costs.

    Read the Cost Optimization whitepaper 
  • By establishing a centralized data lake for all modalities, this Guidance removes the need to create redundant data. Data stores provided by HealthLake, HealthOmics, and HealthImaging become the single source of truth for their respective data types. Lake Formation can govern and filter each data type to give users the appropriate access to data without duplication. Similarly, you can create common database constructs, such as “views” in Athena, to support multiple analysis use cases without data replication.

    Read the Sustainability whitepaper 
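
The default HealthOmics quotas listed under the Reliability pillar above can be checked against a planned import before deployment. The following is a minimal Python sketch, not part of the Guidance's sample code. The limits are copied from the text above and may differ from the quotas in your account, and the names ImportPlan, check_plan, and check_concurrency are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Default quotas as quoted in the Reliability text above. Actual quotas can
# differ per account and Region, so treat these constants as assumptions.
MAX_SOURCES_PER_JOB = {"variant": 1000, "annotation": 1}
MAX_SOURCE_SIZE_GB = 20
MAX_CONCURRENT_IMPORT_JOBS = 5


@dataclass
class ImportPlan:
    """Hypothetical description of one planned import job."""
    kind: str                     # "variant" or "annotation"
    source_sizes_gb: List[float]  # size of each source file, in GB


def check_plan(plan: ImportPlan) -> List[str]:
    """Return human-readable quota violations for one job (empty if none)."""
    if plan.kind not in MAX_SOURCES_PER_JOB:
        raise ValueError(f"unknown import kind: {plan.kind!r}")
    problems = []
    limit = MAX_SOURCES_PER_JOB[plan.kind]
    count = len(plan.source_sizes_gb)
    if count > limit:
        problems.append(f"{plan.kind}: {count} sources exceeds limit of {limit}")
    oversized = sum(1 for size in plan.source_sizes_gb if size > MAX_SOURCE_SIZE_GB)
    if oversized:
        problems.append(
            f"{plan.kind}: {oversized} source(s) larger than {MAX_SOURCE_SIZE_GB} GB"
        )
    return problems


def check_concurrency(plans: List[ImportPlan]) -> List[str]:
    """Check the number of jobs that would run at the same time."""
    if len(plans) > MAX_CONCURRENT_IMPORT_JOBS:
        return [
            f"{len(plans)} concurrent jobs exceeds limit of {MAX_CONCURRENT_IMPORT_JOBS}"
        ]
    return []


# The example data described above: about 800 variant files of about 1 GB
# each, and a single annotation file of about 10 GB.
example_plans = [
    ImportPlan("variant", [1.0] * 800),
    ImportPlan("annotation", [10.0]),
]
for plan in example_plans:
    print(plan.kind, check_plan(plan) or "within default quotas")
print(check_concurrency(example_plans) or "concurrency within default quota")
```

If you bring your own data, replace the example plans with your actual file counts and sizes. A non-empty result suggests either splitting the import or requesting a quota increase before deploying.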

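The EventBridge integration described under the Operational Excellence pillar above could be used to react to failed import jobs. The sketch below is illustrative only and rests on assumptions that should be verified against the HealthOmics documentation: the event source "aws.omics", the detail-type strings, and the "status" field in the event detail. The functions build_failed_import_pattern, matches, and create_rule are hypothetical names, and matches implements only the exact-value subset of EventBridge pattern matching for local testing.

```python
import json


def build_failed_import_pattern() -> dict:
    """Event pattern for failed Variant or Annotation import jobs.

    Assumption: the source, detail-type, and status values below reflect
    how HealthOmics publishes events; confirm them before relying on this.
    """
    return {
        "source": ["aws.omics"],
        "detail-type": [
            "Variant Import Job Status Change",
            "Annotation Import Job Status Change",
        ],
        "detail": {"status": ["FAILED"]},
    }


def matches(pattern: dict, event: dict) -> bool:
    """Local check covering only exact-value matching on listed fields."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def create_rule(rule_name: str, target_arn: str) -> None:
    """Create the rule and attach one target (requires boto3 and credentials)."""
    import boto3  # imported lazily so the pattern helpers run without it

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_failed_import_pattern()),
        State="ENABLED",
    )
    # target_arn is a placeholder, for example an SNS topic you own.
    events.put_targets(Rule=rule_name, Targets=[{"Id": "notify", "Arn": target_arn}])


sample_event = {
    "source": "aws.omics",
    "detail-type": "Variant Import Job Status Change",
    "detail": {"status": "FAILED"},
}
print(matches(build_failed_import_pattern(), sample_event))
```

Keeping the pattern in a plain function makes it possible to test it locally against sample events before a rule is created in an account.
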
Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Guidance

Guidance for Multi-Omics and Multi-Modal Data Integration and Analysis on AWS

This Guidance helps users prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis and perform interactive queries against a data lake.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
