Overview
With AWS HealthOmics, you pay only for what you use. You are charged based on the amount of data you store and the compute instances you use for processing your workflow. With AWS HealthOmics, you can store sequence and reference data objects or variant and annotation data. You can also run bioinformatics workflows to analyze and transform genomic, transcriptomic, and other omics data. AWS HealthOmics is optimized for the storage and computation of omics data and works with other AWS services such as Amazon SageMaker, Amazon Simple Storage Service (S3), and Amazon Athena.
Free Tier
As part of the AWS Free Tier, you can get started with AWS HealthOmics for free. Your Free Tier starts from the first month when you create your first AWS HealthOmics resource. The details of the AWS HealthOmics Free Tier are in the table below.
Service | Free Tier usage per month for the first 2 months |
AWS HealthOmics storage | 1500 gigabase-months in active storage class and 1500 gigabase-months in archive storage class |
AWS HealthOmics workflows | 275 omics.m.xlarge instance hours or equivalent compute instances and 49,000 GB-hours of run storage |
AWS HealthOmics analytics | 200 gigabyte-months |
AWS customers receive 100 GB of data transfer out to the internet free each month, aggregated across all AWS Services and Regions (except China and GovCloud).
AWS HealthOmics storage pricing
When you store genomic sequences in AWS HealthOmics storage, you pay for storage per gigabase per month. A gigabase is one billion bases from your imported sequence files (such as FASTQ, BAM, and CRAM). AWS HealthOmics storage stores the bases, quality scores, alignments, and other metadata from your source files. You pay per gigabase stored, so you don’t need to worry about optimal file formats or compression techniques. AWS HealthOmics takes care of all of that for you.
Sequence objects are called read sets and are logically equivalent to a FASTQ, BAM, or CRAM file. AWS HealthOmics storage offers an active storage class and an archive storage class for your read sets. Read sets in the archive class cost less per month to store than read sets in the active class. Read sets in the active class can be accessed in milliseconds, while read sets in the archive class must be activated before they can be accessed. Read sets that have not been accessed for 30 days automatically move to the lower-cost archive storage class, where they stay until they are reactivated.
There is no import fee for read sets. AWS HealthOmics storage data is charged for a minimum storage duration of 30 days, and data deleted before 30 days incurs a prorated charge equal to the storage charge for the remaining days. AWS HealthOmics storage is designed for long-lived but infrequently accessed data that is retained for years.
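The minimum storage duration can be read as a floor on billable days: deleting early still bills the remainder of the 30 days. The sketch below illustrates that reading only. The per-day rate is a hypothetical input (the prices on this page are quoted per month), and the function names are made up for the example.

```python
MIN_STORAGE_DAYS = 30  # minimum billable storage duration described above


def billable_days(days_stored: int) -> int:
    """Return the number of days billed for data kept for `days_stored` days."""
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    return max(days_stored, MIN_STORAGE_DAYS)


def early_deletion_charge(days_stored: int, rate_per_unit_day: float, units: float) -> float:
    """Prorated charge for the days remaining in the 30-day minimum.

    `rate_per_unit_day` is a hypothetical per-day rate (for example, a
    monthly rate divided by the days in a month); `units` is the amount
    stored, such as gigabases.
    """
    remaining_days = billable_days(days_stored) - days_stored
    return remaining_days * rate_per_unit_day * units


# Data deleted after 10 days is still billed for 30 days in total.
print(billable_days(10))
# Data kept for 45 days is billed for the 45 days it was stored.
print(billable_days(45))
```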
There are two ways to access AWS HealthOmics storage data: read, write, and update operations through the HealthOmics APIs, and read operations through the S3 APIs. For access through HealthOmics APIs, you pay for GET requests made to your read-set objects. All other HealthOmics request types on read sets are free. For access through S3 APIs, COPY and LIST requests are billed separately from all other request types.
AWS HealthOmics analytics pricing
AWS HealthOmics analytics helps you prepare your genomic variant data and genomic annotations for use with the broad suite of AWS analytics and machine learning services such as Amazon Athena and Amazon SageMaker. You can store any amount of genomic variant data, and you only pay for what you store. Data size is defined as the size of transformed data. However, when you query and analyze the data in other services, you pay for the use of those services.
AWS HealthOmics analytics data is charged for a minimum storage duration of 30 days, and data deleted before 30 days incurs a prorated charge equal to the storage charge for the remaining days.
AWS HealthOmics Ready2Run workflows pricing
Ready2Run workflows are pre-built workflows packaged from third-party software companies and open-source pipelines. You can use Ready2Run workflows to process your data with the most commonly used workflows, such as Germline and GATK-BP. Ready2Run workflows are pay-per-run, which means you are charged a fixed price for each run of a given workflow. To view detailed information on each Ready2Run workflow, visit the HealthOmics console.
AWS HealthOmics private workflows pricing
Private workflows enable you to bring your own bioinformatics scripts that are written in the most commonly used workflow languages. You can run private workflows with a single execution. You pay only for what you request, and you are billed separately for omics instance types and run storage. All tasks in your workflow are mapped to the instance that is the best fit for their defined resources. For example, a task that is defined to use 8 CPUs and 60 GB of RAM will map to the omics.r.2xlarge instance type for execution. For run storage, you can choose a statically provisioned file system with greater file system throughput or a file system that scales dynamically.
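One way to picture the best-fit mapping is to choose the smallest instance whose vCPUs and memory both cover the task's request. The sketch below is illustrative only: the catalog is a small hypothetical subset built on the assumption that the c, m, and r families offer 2, 4, and 8 GiB of memory per vCPU, it treats the GB figures on this page as GiB, and the selection rule is an assumption rather than the service's documented algorithm.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Assumptions for illustration: memory-per-vCPU ratios and sizes are not
# taken from this page and cover only a subset of instance types.
GIB_PER_VCPU = {"c": 2, "m": 4, "r": 8}
VCPUS_BY_SIZE = {"xlarge": 4, "2xlarge": 8, "4xlarge": 16, "8xlarge": 32}

CATALOG = [
    InstanceType(f"omics.{family}.{size}", vcpus, vcpus * ratio)
    for family, ratio in GIB_PER_VCPU.items()
    for size, vcpus in VCPUS_BY_SIZE.items()
]


def best_fit(vcpus: int, memory_gib: float) -> Optional[InstanceType]:
    """Return the smallest catalog entry that satisfies both requests.

    "Smallest" here means fewest vCPUs, then least memory. Returns None
    when nothing in the hypothetical catalog is large enough.
    """
    candidates = [
        inst for inst in CATALOG
        if inst.vcpus >= vcpus and inst.memory_gib >= memory_gib
    ]
    return min(candidates, key=lambda inst: (inst.vcpus, inst.memory_gib), default=None)


# The task from the paragraph above: 8 CPUs and 60 GB of RAM.
print(best_fit(8, 60))
```

Under these assumptions the rule reproduces the mapping stated above (8 CPUs and 60 GB to omics.r.2xlarge) as well as the instance types used in the pricing examples on this page.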
Data transfer
You pay for all bandwidth out of HealthOmics. Data transfer fees do not apply to data transferred to any AWS services within the same AWS Region as the data store. The pricing below is based on data transferred "in" and "out" of AWS HealthOmics (over the public internet)†††. Learn more about AWS Direct Connect pricing. For Data Transfers exceeding 500 TB/Month, please contact us.
Rate tiers take into account your aggregate usage for Data Transfer Out to the Internet across all AWS services.
††† Data Transfer Out may differ from the data received by your application if you terminate the connection prematurely, for example, if you request a 10 GB object and terminate the connection after receiving the first 2 GB of data. AWS HealthOmics attempts to stop the streaming of data, but this does not happen instantaneously. In this example, the Data Transfer Out may be 3 GB (1 GB more than the 2 GB you received). As a result, you will be billed for 3 GB of Data Transfer Out.
Pricing examples
Example 1
A population sequencing initiative is starting to sequence individuals from a biobank it has collected. It chooses to do this in the EU West (Ireland) Region. It sequences 100,000 individuals, each yielding 130 gigabases (about 50 gigabytes) of data, and stores the raw sequencing data in AWS HealthOmics storage. Over the next five years, the read sets move to the archive storage class 30 days after import and are each accessed twice, on average, returning to the active storage class for 30 days each time. The team uses S3 APIs to access the files. Each genome is downloaded in 500 parts, generating 500 GET API calls per download. The total cost over five years for a single genome is:
Active storage class: $0.005769 per gigabase-month * 130 gigabases * 90 days (about 2.96 months) = $2.22
Archive storage class: $0.001154 per gigabase-month * 130 gigabases * (1825 – 90) days (about 57.04 months) = $8.56
S3 GET APIs: $0.0004 / 1000 API calls * (2 * 500 API calls) = $0.0004
Total cost for 5 years: $2.22 + $8.56 + $0.0004 = $10.78 (or about $2.16/year)
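The arithmetic in Example 1 can be checked with a short script. This is a sketch under one assumption not stated in the example: a month is taken as 365/12 days, which is the conversion that reproduces the rounded figures. The rates and usage figures are the ones quoted in the example; the variable and function names are illustrative.

```python
DAYS_PER_MONTH = 365 / 12  # assumption: average month length over a year

ACTIVE_RATE = 0.005769        # $ per gigabase-month, active storage class
ARCHIVE_RATE = 0.001154       # $ per gigabase-month, archive storage class
GET_RATE_PER_1000 = 0.0004    # $ per 1,000 S3 GET requests

GIGABASES = 130
TOTAL_DAYS = 5 * 365          # five years
ACTIVE_DAYS = 30 + 2 * 30     # 30 days after import plus two 30-day reactivations


def storage_cost(rate_per_gigabase_month: float, gigabases: float, days: float) -> float:
    """Storage cost for `gigabases` held for `days` at a monthly rate."""
    return rate_per_gigabase_month * gigabases * days / DAYS_PER_MONTH


active = storage_cost(ACTIVE_RATE, GIGABASES, ACTIVE_DAYS)
archive = storage_cost(ARCHIVE_RATE, GIGABASES, TOTAL_DAYS - ACTIVE_DAYS)
gets = GET_RATE_PER_1000 / 1000 * (2 * 500)  # two downloads of 500 parts each
total = active + archive + gets

print(f"active:  ${active:.2f}")
print(f"archive: ${archive:.2f}")
print(f"GETs:    ${gets:.4f}")
print(f"total:   ${total:.2f} over 5 years, ${total / 5:.2f} per year")
```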
Example 2
A bioinformatics scientist wants to run a Nextflow workflow in AWS HealthOmics workflows in the US East (N. Virginia) Region. She has three tasks in the workflow. The first reserves 16 vCPUs and 30 GB memory and takes 3 hours to run. The second reserves 32 vCPUs and 160 GB memory and takes 2 hours to run. The third reserves 4 vCPUs and 10 GB memory and takes 10 minutes to run. She registers the workflow and calls the StartRun API with the default 1200 GB file system. Her overall costs are:
Task 1 (omics.c.4xlarge): $0.9180/hr * 3 hrs = $2.754
Task 2 (omics.r.8xlarge): $2.7216/hr * 2 hrs = $5.4432
Task 3 (omics.m.xlarge): $0.2592/hr * 1/6 hrs = $0.0432
Static run storage: $0.0001918/GB-hour * (1200 GB * (3 hr + 2 hr + 1/6 hr)) = $1.18916
Total: $9.42956
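The same calculation can be written as a short script. It is a sketch that follows the example's own simplification: the tasks run one after another, so the static file system is billed for the sum of the task durations. The rates are the ones quoted in the example; the data layout and names are illustrative.

```python
# (instance type, $ per hour, hours) for each task in Example 2
tasks = [
    ("omics.c.4xlarge", 0.9180, 3.0),
    ("omics.r.8xlarge", 2.7216, 2.0),
    ("omics.m.xlarge", 0.2592, 10 / 60),  # 10 minutes
]

STATIC_STORAGE_RATE = 0.0001918  # $ per GB-hour of static run storage
FILE_SYSTEM_GB = 1200            # default static file system size

compute_cost = sum(rate * hours for _, rate, hours in tasks)

# Assumes sequential tasks, so run duration is the sum of task durations.
run_hours = sum(hours for _, _, hours in tasks)
storage_cost = STATIC_STORAGE_RATE * FILE_SYSTEM_GB * run_hours

total_cost = compute_cost + storage_cost

print(f"compute: ${compute_cost:.4f}")
print(f"storage: ${storage_cost:.5f}")
print(f"total:   ${total_cost:.5f}")
```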
Example 3
A bioinformatics scientist is developing a new WDL workflow in AWS HealthOmics in the US East (N. Virginia) Region. She has two tasks in the workflow. The first reserves 16 vCPUs and 30 GB memory and takes 3.5 hours to run. The second reserves 32 vCPUs and 160 GB memory and takes 2.25 hours to run. She registers the workflow and calls the StartRun API with the dynamic file system. Over the course of the 5.75-hour workflow run, the file system grows linearly from 0 GB to 1043 GB, totaling approximately 3,000 GB-hr of file storage. Her overall costs are:
Task 1 (omics.c.4xlarge): $0.9180/hr * 3.5 hrs = $3.213
Task 2 (omics.r.8xlarge): $2.7216/hr * 2.25 hrs = $6.1236
Dynamic run storage: $0.0004110/GB-hr * 3,000 GB-hr = $1.233
Total: $10.5696
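The 3,000 GB-hr figure is a rounded value. For a file system that grows linearly, the GB-hours used equal the area under the usage line, which is the average size multiplied by the duration. The sketch below shows that step and then reproduces the example's total using the rounded figure; the helper name is illustrative.

```python
def linear_growth_gb_hours(start_gb: float, end_gb: float, hours: float) -> float:
    """GB-hours consumed by a file system growing linearly over `hours`."""
    return 0.5 * (start_gb + end_gb) * hours


DYNAMIC_STORAGE_RATE = 0.0004110  # $ per GB-hour of dynamic run storage

# Exact area under the usage line: 0 GB to 1043 GB over 5.75 hours.
gb_hours = linear_growth_gb_hours(0, 1043, 5.75)

# Compute charges for the two tasks, using the rates quoted in the example.
compute_cost = 0.9180 * 3.5 + 2.7216 * 2.25

# The example rounds the storage usage to 3,000 GB-hr before pricing it.
storage_cost_rounded = DYNAMIC_STORAGE_RATE * 3000
total_cost = compute_cost + storage_cost_rounded

print(f"exact usage: {gb_hours} GB-hr")
print(f"total:       ${total_cost:.4f}")
```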
Example 4
A data scientist has 3,202 variant call format (VCF) files that he wants to analyze in Amazon Athena in the US East (N. Virginia) Region. He creates a variant store and ingests these files using the AWS HealthOmics APIs. The ingested data is 1.5 TB in size. Over the course of the next month, he executes 1,000 queries in Athena, calculating allele frequencies for different subpopulations, each consuming 50 GB on average. His overall monthly costs are:
Variant store: $0.035 per GB-month * (1024 GB/TB * 1.5 TB) = $53.76
Amazon Athena: $5/TB * (1,000 queries * 50 GB / 1024 GB/TB) = $244.14
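The two charges in Example 4 and their combined monthly total can be reproduced as follows. This is a sketch using only the rates and usage figures quoted in the example; the variable names are illustrative.

```python
GB_PER_TB = 1024

VARIANT_STORE_RATE = 0.035  # $ per GB-month of transformed data
ATHENA_RATE_PER_TB = 5.0    # $ per TB consumed by queries, as quoted above

stored_gb = 1.5 * GB_PER_TB            # 1.5 TB of ingested data
scanned_tb = 1000 * 50 / GB_PER_TB     # 1,000 queries at 50 GB each

variant_store_cost = VARIANT_STORE_RATE * stored_gb
athena_cost = ATHENA_RATE_PER_TB * scanned_tb
monthly_total = variant_store_cost + athena_cost

print(f"variant store: ${variant_store_cost:.2f}")
print(f"Amazon Athena: ${athena_cost:.2f}")
print(f"monthly total: ${monthly_total:.2f}")
```

The Athena charge is billed by Athena rather than by AWS HealthOmics, which matches the earlier note that querying the data in other services is paid for through those services.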
Example 5
A computational scientist wants to run the GATK-BP Germline fq2vcf for 30x genome Ready2Run workflow in the US East (N. Virginia) Region for 3 samples. The scientist provides the input data and calls the StartRun API for each sample. The cost for the 3 runs is:
GATK-BP Germline fq2vcf for 30x genome Ready2Run workflow: $10.00/run * 3 = $30.00
Total: $30.00