AWS HealthOmics FAQs


AWS HealthOmics is a purpose-built service that helps healthcare and life science organizations and their software partners store, query, and analyze genomic, transcriptomic, and other omics data and then generate insights from that data to improve health. It supports large-scale analysis and collaborative research.

AWS HealthOmics provides scalable workflows and integrated tools for preparing and analyzing omics data. It automatically provisions and scales the underlying infrastructure so that you can spend more time on research and innovation.

AWS HealthOmics can process data directly from Amazon Simple Storage Service (Amazon S3) or AWS HealthOmics storage using AWS HealthOmics private and Ready2Run workflows. You can import data such as raw genomic sequence files, variant call format files, and annotation datasets from Amazon S3 into bioinformatics-compatible AWS HealthOmics storage and analytics stores. You can control access to AWS HealthOmics variant and annotation stores using AWS Lake Formation and use Amazon Athena to make data easier to query and combine with other forms of data, such as medical health records from Amazon HealthLake. Additionally, you can use the transformed data in Amazon QuickSight for advanced analytics, and you can use Amazon SageMaker to build, train, and deploy novel machine learning algorithms on your multiomic and multimodal data. Finally, you can use Amazon EventBridge to publish events as part of your event-driven architecture.

AWS HealthOmics has two types of data stores: one for raw biological data and one for variant and annotation data. AWS HealthOmics storage can import FASTA-formatted reference genomes and gzipped FASTQ, BAM, and CRAM formatted raw sequence files. AWS HealthOmics analytics stores can import (g)VCF-formatted files for variant data and VCF, GFF, and TSV/CSV files for genomic annotations. AWS HealthOmics workflows can read any data supported by your workflow definition and tooling from either AWS HealthOmics storage or Amazon S3.
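As a rough illustration of the format split described above, the following Python sketch maps a file name to the kind of store that could import it. The file-extension lists and the store labels (`reference`, `sequence`, `variant`, `annotation`) are assumptions made for this example; the paragraph above names formats, not extensions, and this helper is not part of any AWS SDK.

```python
from typing import List

# Assumed mapping from file suffix to import target, based on the formats
# listed above. Order matters: gVCF suffixes are checked before plain VCF.
SUFFIX_TARGETS = [
    ((".fasta", ".fa"), ["reference"]),                          # storage: reference genomes
    ((".fastq.gz", ".fq.gz", ".bam", ".cram"), ["sequence"]),    # storage: raw sequence files
    ((".g.vcf", ".g.vcf.gz", ".gvcf", ".gvcf.gz"), ["variant"]), # analytics: variant data
    ((".vcf", ".vcf.gz"), ["variant", "annotation"]),            # VCF is accepted for both
    ((".gff", ".tsv", ".csv"), ["annotation"]),                  # analytics: annotations
]


def import_targets(uri: str) -> List[str]:
    """Return the store kinds that could import the file, or [] if none match."""
    name = uri.rsplit("/", 1)[-1].lower()
    for suffixes, targets in SUFFIX_TARGETS:
        if name.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            return list(targets)
    return []
```

A file such as `reads.fastq` (not gzipped) returns an empty list here, because the paragraph above lists only gzipped FASTQ as importable.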

Private workflows enable you to bring your own bioinformatics scripts written in the most commonly used workflow languages: WDL, CWL, and Nextflow. Each execution of a private workflow is known as a run. For private workflows, you pay only for what you request, and you are billed separately for omics instance types and run storage. Each task in your workflow is mapped to the instance that is the best fit for the resources you define.
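The best-fit mapping can be pictured with a small Python sketch. It assumes that "best fit" means the smallest instance whose vCPU and memory both cover what the task requests; the service's actual selection logic is not described here, and the instance names and sizes in `CATALOG` are hypothetical placeholders, not real omics instance types or prices.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical catalog for illustration only.
CATALOG = (
    InstanceType("example.c.large", 2, 4),
    InstanceType("example.m.large", 2, 8),
    InstanceType("example.r.large", 2, 16),
    InstanceType("example.m.xlarge", 4, 16),
    InstanceType("example.m.2xlarge", 8, 32),
)


def best_fit(cpus: int, memory_gib: int,
             catalog: Sequence[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Pick the smallest instance that satisfies both requested resources."""
    candidates = [i for i in catalog
                  if i.vcpus >= cpus and i.memory_gib >= memory_gib]
    if not candidates:
        return None  # the request exceeds everything in the catalog
    # Smallest vCPU count first, then smallest memory, as a simple size ordering.
    return min(candidates, key=lambda i: (i.vcpus, i.memory_gib))
```

Under these assumptions, a task requesting 2 vCPUs and 6 GiB lands on `example.m.large`, while one requesting 3 vCPUs is rounded up to `example.m.xlarge`. This shows why the resources declared in a task definition determine what you are billed for.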

Ready2Run workflows are pre-built workflows designed by industry-leading third-party software companies like Sentieon, Inc., NVIDIA, and Element Biosciences, along with common open-source pipelines such as Broad Institute’s GATK best practice workflow and AlphaFold for protein structure prediction. You can use Ready2Run workflows to process your data with the most commonly used workflows, like Germline and Broad Institute’s GATK-BP. Ready2Run workflows are pay per run with a pre-determined price. This means you are charged the same price for every run of a given workflow.

Privacy and Security

AWS HealthOmics is HIPAA eligible. You can use attribute-based access controls to define who has access to AWS HealthOmics resources. All persistent storage supports customer-managed keys. Row and column permissions are also available with AWS HealthOmics analytics stores. AWS HealthOmics APIs are integrated with AWS CloudTrail and Amazon CloudWatch Logs so that you can generate detailed data provenance and access audit trails.
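As a sketch of what attribute-based access control can look like, the Python below builds an IAM policy document that allows the listed actions only when a tag on the resource matches the same tag on the calling principal. The tag key `project` and the choice of actions are assumptions for illustration; check the service authorization reference to confirm which HealthOmics actions and resources support tag-based conditions before relying on a policy like this.

```python
import json
from typing import Dict, List


def abac_policy(tag_key: str, actions: List[str]) -> Dict:
    """Build an IAM policy that allows actions only when tags match."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowWhenTagsMatch",
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # The resource tag must equal the caller's principal tag.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


# Illustrative usage: the tag key and action list are assumptions.
policy = abac_policy("project", ["omics:GetReadSetMetadata", "omics:GetRun"])
policy_json = json.dumps(policy, indent=2)
```

With a policy of this shape, access follows tags rather than individual resource ARNs, so adding a new resource with the right tag does not require a policy change.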

AWS HealthOmics is a HIPAA-eligible service. If you are storing protected health information (PHI) on AWS, you are required to have a Business Associate Addendum (BAA) with AWS. You can quickly enter into a BAA online using AWS Artifact.