Overview

With AWS HealthOmics, you pay only for what you use. You are charged for the amount of data you store and for the compute instances that process your workflows. You can store sequence and reference data objects as well as variant and annotation data, and you can run bioinformatics workflows to analyze and transform genomic, transcriptomic, and other omics data. AWS HealthOmics is optimized for the storage and computation of omics data and works with other AWS services such as Amazon SageMaker, Amazon Simple Storage Service (Amazon S3), and Amazon Athena.

Free Tier

As part of the AWS Free Tier, you can get started with AWS HealthOmics for free. Your Free Tier starts from the first month when you create your first AWS HealthOmics resource. The details of the AWS HealthOmics Free Tier are in the table below.

 

Free Tier usage per month for the first 2 months

AWS HealthOmics storage: 1,500 gigabase-months in active storage class and 1,500 gigabase-months in archive storage class

AWS HealthOmics workflows: 275 omics.m.xlarge instance hours or equivalent compute instances and 49,000 GB-hours of run storage

AWS HealthOmics analytics: 200 gigabyte-months

AWS HealthOmics storage pricing

When you store genomic sequences in AWS HealthOmics storage, you pay for storage per gigabase per month. A gigabase is one billion bases from your imported sequence files (such as FASTQ, BAM, and CRAM). AWS HealthOmics storage stores the bases, quality scores, alignments, and other metadata from your source files. You pay per gigabase stored, so you don’t need to worry about optimal file formats or compression techniques. AWS HealthOmics takes care of all of that for you.

Sequence objects are called read sets and are logically equivalent to a FASTQ, BAM, or CRAM file. AWS HealthOmics storage offers an active storage class and an archive storage class for your read sets. Read sets in the archive class cost less per month to store than read sets in the active class. Read sets in the active class can be accessed in milliseconds, while read sets in the archive class need to be activated before they can be accessed. Read sets that have not been accessed for 30 days automatically move to the lower-cost archive storage class, where they stay until they are reactivated.

There is no import fee for read sets. AWS HealthOmics storage data is charged for a minimum storage duration of 30 days, and data deleted before 30 days incurs a prorated charge equal to the storage charge for the remaining days. AWS HealthOmics storage is designed for long-lived but infrequently accessed data that is retained for years.
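The minimum-duration rule can be sketched as a small calculation. This is an illustrative sketch, not the service's billing logic: the function name, the 365/12 days-per-month conversion, and the sample rate (taken from Example 1 on this page) are assumptions.

```python
# Assumption: an average month of 365/12 days is used to convert days to months.
DAYS_PER_MONTH = 365 / 12
# Minimum billable storage duration stated in the text above.
MIN_DAYS = 30


def storage_charge(gigabases, price_per_gigabase_month, days_stored):
    """Return the storage charge in dollars, billing at least MIN_DAYS days."""
    billable_days = max(days_stored, MIN_DAYS)
    return gigabases * price_per_gigabase_month * billable_days / DAYS_PER_MONTH


# A read set deleted after 10 days is billed as if it had been kept for 30 days.
early = storage_charge(130, 0.005769, days_stored=10)
full = storage_charge(130, 0.005769, days_stored=30)
print(f"deleted after 10 days: ${early:.4f}, kept 30 days: ${full:.4f}")
```

Under this reading, deleting data early never lowers the charge below the 30-day amount; it only avoids charges beyond day 30.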

You pay for GET requests made to your read-set objects. All other request types on read sets are free.

AWS HealthOmics analytics pricing

AWS HealthOmics analytics helps you prepare your genomic variant data and genomic annotations for use with the broad suite of AWS analytics and machine learning services such as Amazon Athena and Amazon SageMaker. You can store any amount of genomic variant data, and you only pay for what you store. Data size is defined as the size of transformed data. However, when you query and analyze the data in other services, you pay for the use of those services.

AWS HealthOmics analytics data is charged for a minimum storage duration of 30 days, and data deleted before 30 days incurs a prorated charge equal to the storage charge for the remaining days.

AWS HealthOmics private and Ready2Run workflows pricing

AWS HealthOmics also manages the execution of bioinformatics workflows with private and Ready2Run workflows.

Private workflows enable you to bring your own bioinformatics scripts that are written in the most commonly used workflow languages. You can run private workflows with a single execution. You pay only for what you request and you are billed separately for omics instance types and run storage. All tasks in your workflow are mapped to the instance that is the best fit for their defined resources. For example, a task that is defined to use 8 CPUs and 60 GB of RAM will map to the omics.r.2xlarge instance type for execution.
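The best-fit mapping can be illustrated with a short sketch. The instance table below is a hypothetical subset whose sizes are assumed to follow the usual 2, 4, and 8 GiB-per-vCPU ratios of the c, m, and r families, and the selection rule (fewest vCPUs, then least memory) is an assumption rather than the documented behavior of the service.

```python
# Hypothetical subset of omics instance types: (name, vCPUs, memory in GiB).
# Assumed sizes; consult the pricing tables for the real list.
INSTANCES = [
    ("omics.c.xlarge", 4, 8),
    ("omics.m.xlarge", 4, 16),
    ("omics.r.xlarge", 4, 32),
    ("omics.c.2xlarge", 8, 16),
    ("omics.m.2xlarge", 8, 32),
    ("omics.r.2xlarge", 8, 64),
    ("omics.c.4xlarge", 16, 32),
    ("omics.m.4xlarge", 16, 64),
    ("omics.r.4xlarge", 16, 128),
    ("omics.c.8xlarge", 32, 64),
    ("omics.m.8xlarge", 32, 128),
    ("omics.r.8xlarge", 32, 256),
]


def best_fit(vcpus, memory_gib):
    """Pick the smallest listed instance that satisfies both requirements."""
    candidates = [i for i in INSTANCES if i[1] >= vcpus and i[2] >= memory_gib]
    if not candidates:
        raise ValueError("no listed instance type is large enough")
    # Assumed tie-break: fewest vCPUs first, then least memory.
    name, _, _ = min(candidates, key=lambda i: (i[1], i[2]))
    return name


print(best_fit(8, 60))
```

With this table, the task from the paragraph above (8 CPUs, 60 GB) lands on the r family because the c and m instances with 8 vCPUs do not offer enough memory.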

Ready2Run workflows are pre-built workflows based on open-source pipelines or packaged by third-party software companies. You can use Ready2Run workflows to process your data with commonly used workflows such as Germline and GATK-BP. Ready2Run workflows are pay-per-run, which means you are charged a fixed price for each run of a given workflow.

For both private and Ready2Run workflows, logs are stored in Amazon CloudWatch Logs in your account and billed by CloudWatch for as long as you choose to retain them. You can configure the service to report resource use per run for simplified budgeting, planning, and accounting.


Pricing examples

Example 1

A population sequencing initiative is starting to sequence individuals from a biobank it has collected, working in the EU West (Ireland) Region. It sequences 100,000 individuals, each at 130 gigabases, and stores the raw sequencing data in AWS HealthOmics storage. Each read set spends the first 30 days after import in the active storage class and then moves to the archive storage class. Over the next five years, each read set is accessed twice on average, and each access returns it to the active storage class for 30 days, for 90 active days in total. Each genome is downloaded in 500 parts, generating 500 GET API calls per download. The total cost over five years for a single genome is:
Active storage class: $0.005769 per gigabase-month * 130 gigabases * 90 days / (365/12 days per month) = $2.22
Archive storage class: $0.001154 per gigabase-month * 130 gigabases * (1825 - 90) days / (365/12 days per month) = $8.56
GET APIs: $0.005 / 1000 API calls * (2 * 500 API calls) = $0.005
Total cost for 5 years: $2.22 + $8.56 + $0.005 = $10.79 (or $2.16/year)
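The arithmetic in Example 1 can be reproduced with a few lines of Python. This is a sketch under one assumption: days are converted to months using an average month of 365/12 days, which is the conversion that matches the rounded figures in the example. The variable names are illustrative.

```python
# Assumption: average month length used to convert days to months.
DAYS_PER_MONTH = 365 / 12

gigabases = 130
active_price = 0.005769    # $ per gigabase-month, from the example
archive_price = 0.001154   # $ per gigabase-month, from the example
total_days = 5 * 365       # 1825 days
active_days = 90           # 30 days after import + 2 accesses * 30 days

active = active_price * gigabases * active_days / DAYS_PER_MONTH
archive = archive_price * gigabases * (total_days - active_days) / DAYS_PER_MONTH
get_requests = 0.005 / 1000 * (2 * 500)

total = active + archive + get_requests
print(f"active:  ${active:.2f}")
print(f"archive: ${archive:.2f}")
print(f"GET:     ${get_requests:.3f}")
print(f"total (unrounded components): ${total:.4f}")
```

The example's total adds the components after rounding each to cents, so summing the unrounded components gives a result that is slightly lower, by less than one cent.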

Example 2

A bioinformatics scientist wants to run a Nextflow workflow in AWS HealthOmics workflows in the US East (N. Virginia) Region. She has three tasks in the workflow. The first reserves 16 vCPUs and 30 GB of memory and takes 3 hours to run. The second reserves 32 vCPUs and 160 GB of memory and takes 2 hours to run. The third reserves 4 vCPUs and 10 GB of memory and takes 10 minutes to run. She registers the workflow and calls the StartRun API with the default 1200 GB file system. Her overall costs are:
Task 1 (omics.c.4xlarge): $0.9180/hr * 3 hrs = $2.754
Task 2 (omics.r.8xlarge): $2.7216/hr * 2 hrs = $5.4432
Task 3 (omics.m.xlarge): $0.2592/hr * 1/6 hrs = $0.0432
Storage: $0.0001918/GB-hour * (1200 GB * (3 hrs + 2 hrs + 1/6 hrs)) = $1.18916
Total: $9.42956
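The same calculation can be written as a short script, which makes it easy to substitute other task durations or rates. It is a sketch that assumes, as the example does, that the three tasks run one after another, so run storage is billed for the sum of the task durations. The variable names are illustrative.

```python
# (instance type, $ per hour, hours) for each task, taken from the example.
tasks = [
    ("omics.c.4xlarge", 0.9180, 3),
    ("omics.r.8xlarge", 2.7216, 2),
    ("omics.m.xlarge", 0.2592, 1 / 6),
]
storage_price = 0.0001918  # $ per GB-hour of run storage
storage_gb = 1200          # default file system size in the example

compute = sum(rate * hours for _, rate, hours in tasks)
# Assumption: tasks run sequentially, so run time is the sum of task times.
run_hours = sum(hours for _, _, hours in tasks)
storage = storage_price * storage_gb * run_hours

total = compute + storage
print(f"compute: ${compute:.4f}")
print(f"storage: ${storage:.5f}")
print(f"total:   ${total:.5f}")
```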

Example 3

A data scientist has 3,202 variant call format (VCF) files that he wants to analyze in Amazon Athena in the US East (N. Virginia) Region. He creates a variant store and ingests these files using the AWS HealthOmics APIs. The ingested data is 1.5 TB in size. Over the course of the next month, he runs 1,000 queries in Athena to calculate allele frequencies for different subpopulations, each scanning 50 GB on average. His overall monthly costs are:
Variant store: $0.035 per GB-month * (1024 GB/TB * 1.5 TB) = $53.76
Amazon Athena: $5 per TB * 1000 queries * 50 GB / (1024 GB/TB) = $244.14
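As a quick check, the two monthly charges in Example 3 can be computed as follows. The sketch uses only the rates and sizes quoted in the example; the variable names are illustrative.

```python
GB_PER_TB = 1024

variant_store_price = 0.035   # $ per GB-month, from the example
ingested_tb = 1.5
athena_price_per_tb = 5.0     # $ per TB scanned, from the example
queries = 1000
gb_scanned_per_query = 50

variant_store = variant_store_price * GB_PER_TB * ingested_tb
athena = athena_price_per_tb * queries * gb_scanned_per_query / GB_PER_TB

print(f"variant store: ${variant_store:.2f}")
print(f"Athena:        ${athena:.2f}")
```

The Athena charge is billed by Athena rather than by AWS HealthOmics, which is why the two amounts are reported separately.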

Example 4

A computational scientist wants to run the Ready2Run workflow "GATK-BP Germline fq2vcf for 30x genome" in the US East (N. Virginia) Region for 3 samples. The scientist provides the input data and calls the StartRun API once for each sample. The cost for the 3 runs is:
GATK-BP Germline fq2vcf for 30x genome Ready2Run workflow: $10.00/run * 3 runs = $30.00
Total: $30.00
