AWS HealthOmics features

AWS HealthOmics makes it easier to store, query, and analyze genomic, transcriptomic, and other omics data and then generate insights from that data. It simplifies and accelerates the process of storing and analyzing multiomic information for research and clinical applications, so you can focus on deriving deeper insights from your data.

With AWS HealthOmics storage, you can store petabytes of omics data efficiently and cost-effectively, allowing scientific discovery at population scale. AWS HealthOmics private and Ready2Run workflows automate provisioning and scaling of compute infrastructure, so you can run bioinformatics analysis pipelines at production scale and spend less time managing infrastructure and more time conducting research. AWS HealthOmics comes with a collection of Ready2Run workflows that are pre-built and priced per run. AWS HealthOmics analytics simplifies preparing omics data for multimodal analyses, letting you bring multiomics and health record data together to support more targeted and personalized therapies. These features are also HIPAA eligible.

General

AWS HealthOmics storage is compatible with bioinformatics file formats such as FASTQ, BAM, and CRAM and allows you to store, discover, and share this data efficiently and at low cost. These file formats are stored as read-set objects within a sequence store. You can also store reference genomes in the FASTA format. Data is imported as immutable objects with unique identifiers to support workloads that require strict data provenance. Access to individual data objects, including references and read-set objects, can be controlled using tags and attribute-based access controls through AWS Identity and Access Management (IAM). To reduce long-term storage costs, data objects that have not been accessed within 30 days are automatically moved to an archive storage class. Archived objects can be reactivated at any time with an API call.
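As a rough illustration of the archive and reactivation behavior described above, the following Python sketch estimates whether a read set has passed the 30-day window and builds the arguments for an activation request. It assumes the boto3 "omics" client and its start_read_set_activation_job call; the helper names, store ID, and read-set IDs are hypothetical, and the service itself (not this local estimate) decides when an object is archived.

```python
from datetime import datetime, timedelta, timezone

# 30-day window taken from the description above.
ARCHIVE_AFTER = timedelta(days=30)


def is_archive_candidate(last_accessed, now=None):
    """Local estimate only: True if 30 or more days have passed since last access."""
    now = now or datetime.now(timezone.utc)
    return now - last_accessed >= ARCHIVE_AFTER


def build_activation_request(sequence_store_id, read_set_ids):
    """Build keyword arguments for a read-set activation job."""
    return {
        "sequenceStoreId": sequence_store_id,
        "sources": [{"readSetId": rs_id} for rs_id in read_set_ids],
    }


def activate_read_sets(sequence_store_id, read_set_ids):
    """Submit the activation job. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("omics")
    return client.start_read_set_activation_job(
        **build_activation_request(sequence_store_id, read_set_ids)
    )
```

Keeping the request builder separate from the network call makes it possible to inspect or log the request before any job is started.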

AWS HealthOmics helps you run bioinformatics workflows at scale. You can choose Ready2Run workflows or bring-your-own private workflows to process your biological data without the need to manage the underlying infrastructure.

Ready2Run workflows are pre-built workflows designed by industry-leading third-party software companies like Sentieon, Inc., NVIDIA, and Element Biosciences, along with common open-source pipelines such as Broad Institute’s GATK best practice workflow and AlphaFold for protein structure prediction. You can use Ready2Run workflows to process your data without managing the software tools or workflow scripts. Ready2Run workflows are billed per run at a predetermined price.

Private workflows enable you to bring your own workflow scripts written in Workflow Description Language (WDL) or Nextflow, two commonly used workflow languages. Each execution of a private workflow is known as a run. For private workflows, you pay only for what you request, and you are billed separately for omics instance types and run storage. Each task in your workflow is mapped to the instance type that best fits the resources you define for it.
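Starting a run could look roughly like the sketch below. It assumes the boto3 "omics" client and its start_run call; the function names, workflow ID, role ARN, and S3 URI are placeholders for illustration, not values from this page.

```python
def build_start_run_request(workflow_id, role_arn, output_uri, parameters,
                            name=None, storage_capacity_gib=None,
                            workflow_type="PRIVATE"):
    """Build keyword arguments for starting a private or Ready2Run workflow run."""
    if workflow_type not in ("PRIVATE", "READY2RUN"):
        raise ValueError("workflow_type must be 'PRIVATE' or 'READY2RUN'")
    request = {
        "workflowId": workflow_id,
        "workflowType": workflow_type,
        "roleArn": role_arn,        # role the service assumes to read inputs and write outputs
        "outputUri": output_uri,    # S3 location for run outputs
        "parameters": parameters,   # inputs defined by the workflow itself
    }
    if name is not None:
        request["name"] = name
    if storage_capacity_gib is not None:
        # Run storage is billed separately from instance usage.
        request["storageCapacity"] = storage_capacity_gib
    return request


def start_run(**kwargs):
    """Submit the run. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    return boto3.client("omics").start_run(**build_start_run_request(**kwargs))
```

The same builder covers both workflow kinds; only workflow_type and the workflow ID change between a private workflow and a Ready2Run workflow.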

With AWS HealthOmics, you can quickly ingest and transform genomics data in formats such as (g)VCF, GFF3, and TSV/CSV into Apache Iceberg tables. You can then make the genomics data accessible through analytics services such as Amazon Athena. You can transform both variant data (data from an individual sample) and annotation data (known information about positions in the genome). You can control access to analytics stores with AWS Lake Formation, making it easier to perform queries across diverse data sources while implementing fine-grained access controls. For example, you can securely combine the genome data of individuals with their medical history from Amazon HealthLake (which can include prior treatments, medications, or lab reports) to facilitate precision medicine.
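Once variant data is available as a table, a query through Amazon Athena might be sketched as follows. The table and column names (sample_id, contig, position), the database, and the output location are hypothetical and will differ from the actual schema of your analytics store; the only AWS call assumed is the boto3 Athena client's start_query_execution.

```python
import re

_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_CONTIG_NAME = re.compile(r"[A-Za-z0-9_.]+")


def build_variant_query(table, contig, start, end, limit=100):
    """Build a SQL query for variants in a genomic interval (hypothetical schema)."""
    # Validate names before interpolating them into SQL text.
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError("invalid table name")
    if not _CONTIG_NAME.fullmatch(contig):
        raise ValueError("invalid contig name")
    start, end, limit = int(start), int(end), int(limit)
    return (
        "SELECT sample_id, contig, position "
        f"FROM {table} "
        f"WHERE contig = '{contig}' AND position BETWEEN {start} AND {end} "
        f"LIMIT {limit}"
    )


def run_query(query, database, output_location):
    """Submit the query to Athena. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    return boto3.client("athena").start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
```

Access to the underlying tables would still be governed by the Lake Formation permissions described above, so the same query can return different rows or fail for different principals.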

AWS HealthOmics makes it easier for researchers to collaborate through tagging, setting permissions, and sharing data securely with collaborators. This simplifies how you make your omics data findable, accessible, interoperable, and reusable (FAIR). With domain-specific metadata, you can link AWS HealthOmics data stores with other omics and healthcare data to facilitate multiomic and multimodal analyses. For data provenance, AWS HealthOmics archives all workflow run metadata in Amazon CloudWatch Logs, where you can store and easily query this information. You can export this information from CloudWatch to Amazon S3 for long-term storage. This information can help you track which algorithms were used with your input data to generate your output data, which supports your compliance requirements.
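Exporting run logs from CloudWatch to S3 could be sketched as below. This assumes the boto3 "logs" client and its create_export_task call, which takes its time range in milliseconds since the Unix epoch; the log group name, bucket, and prefix are placeholders, so check which log group your runs actually write to.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def build_export_request(log_group, bucket, start, end, prefix="healthomics-runs"):
    """Build keyword arguments for exporting a time range of logs to S3."""
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        "taskName": f"export-{to_epoch_ms(start)}",
        "logGroupName": log_group,   # assumed example: "/aws/omics/WorkflowLog"
        "fromTime": to_epoch_ms(start),
        "to": to_epoch_ms(end),
        "destination": bucket,       # bucket name, not an s3:// URI
        "destinationPrefix": prefix,
    }


def export_logs(log_group, bucket, start, end):
    """Submit the export task. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    return boto3.client("logs").create_export_task(
        **build_export_request(log_group, bucket, start, end)
    )
```

The destination bucket needs a policy that allows CloudWatch Logs to write to it; that setup is outside this sketch.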

Security, privacy, and compliance

AWS HealthOmics is HIPAA eligible. You can apply attribute-based access controls to define fine-grained data access and governance. Comprehensive logging and provenance capture are built in, so you know what data was accessed, who accessed it, and when.
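An attribute-based access policy of the kind described above might be sketched like this: access is allowed only when a tag on the resource matches the same tag on the calling principal. The tag key "project" and the action list are illustrative assumptions; confirm the exact action names and supported condition keys against the service's IAM reference before use.

```python
import json


def build_abac_policy(tag_key, actions, resource_arn="*"):
    """Build an IAM policy allowing actions only when resource and principal tags match."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource_arn,
                "Condition": {
                    "StringEquals": {
                        # IAM policy variable: resolved to the caller's tag value at request time.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


# Hypothetical example: read access to read sets tagged with the caller's project.
policy = build_abac_policy("project", ["omics:GetReadSet", "omics:GetReadSetMetadata"])
policy_json = json.dumps(policy, indent=2)
```

With a policy like this, adding a collaborator to a study becomes a matter of tagging their IAM principal rather than editing per-resource permissions.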