reference deployment

Hail on AWS

Simplified genomic analysis on Amazon EMR

This Quick Start, built in collaboration with Goldfinch Bio, Inc. and Privo IT, helps to simplify building, managing, and interacting with Hail clusters in your Amazon Web Services (AWS) account. Hail is an open-source library built for Apache Spark to provide scalable data exploration and analysis, with a particular emphasis on genomics.

Using Hail, researchers can perform genomic analysis more quickly and efficiently. Hail makes it easier to use Spark programming techniques to process genetic data (genomic data frames). It also helps to simplify dealing with multiple input formats by creating a common data structure (Hail MatrixTable). 
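As a rough illustration of how several input formats can end up in one common structure, the sketch below routes a file path to a Hail import function, each of which returns a MatrixTable. The helper names `pick_importer` and `load_matrix_table` and the extension mapping are assumptions made for this example, not part of the Quick Start. `hl.import_vcf` and `hl.import_plink` are existing Hail 0.2 functions.

```python
from pathlib import PurePosixPath


def pick_importer(path: str) -> str:
    """Return the name of the Hail import function suited to a file path.

    The extension-to-importer mapping is an illustrative assumption.
    """
    name = PurePosixPath(path).name.lower()
    if name.endswith((".vcf", ".vcf.bgz")):
        return "import_vcf"
    if name.endswith(".bed"):
        return "import_plink"
    raise ValueError(f"no importer known for {path!r}")


def load_matrix_table(path: str):
    """Load a VCF or PLINK file into a Hail MatrixTable (hypothetical helper)."""
    # Imported lazily so pick_importer can be used without Hail installed.
    import hail as hl

    if pick_importer(path) == "import_vcf":
        return hl.import_vcf(path)
    # PLINK datasets come as three files that share a prefix.
    prefix = path[: -len(".bed")]
    return hl.import_plink(bed=path, bim=prefix + ".bim", fam=prefix + ".fam")
```

Whichever format the data starts in, downstream analysis code then works against the same MatrixTable interface.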

This deployment uses Amazon EMR with Apache Spark to scale the processing of large datasets across instances. It supports workloads ranging from single-node ad hoc processes to production-scale genome-wide association studies (GWAS).

This Quick Start was developed by Goldfinch Bio, Inc. and Privo IT in collaboration with AWS.
Privo is an APN Partner.

  •  What you'll build
  • The Quick Start sets up the following:

    • A Hail 0.2 AWS Service Catalog portfolio, allowing you to create and manage your own Hail clusters.
    • Four AWS CodeBuild pipelines to support building various combinations of Hail 0.2.x releases, Variant Effect Predictor (VEP) versions, and Loss-Of-Function Transcript Effect Estimator (LOFTEE) plug-ins.
    • An Amazon SageMaker instance that lets you stand up and tear down JupyterLab notebook environments that integrate with Hail clusters (through Sparkmagic and Livy).
    • An Amazon EMR cluster that lets you stand up and tear down Hail 0.2 clusters as needed.
    • An Amazon Simple Storage Service (Amazon S3) SageMaker bucket to back up launched notebook environments.
    • An Amazon S3 bucket for staging Hail artifacts.
    • An optional virtual private cloud (VPC) configured with a private subnet, according to AWS best practices, to provide you with your own virtual network on AWS.
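To give a feel for the notebook-to-cluster integration listed above, here is a minimal sketch of the kind of REST requests that a client such as Sparkmagic sends to Livy: one to open a PySpark session and one to submit code to it. The host name is hypothetical, and the requests are only constructed, not sent. Port 8998 is Livy's default, which your deployment may override.

```python
import json
from urllib.request import Request

LIVY_PORT = 8998  # Livy's default port; an assumption about the cluster setup
JSON_HEADERS = {"Content-Type": "application/json"}


def livy_url(host: str, path: str) -> str:
    """Build a Livy endpoint URL for the given host."""
    return f"http://{host}:{LIVY_PORT}{path}"


def new_session_request(host: str) -> Request:
    """Build the POST /sessions request that opens a PySpark session."""
    body = json.dumps({"kind": "pyspark"}).encode("utf-8")
    return Request(livy_url(host, "/sessions"), data=body,
                   headers=JSON_HEADERS, method="POST")


def statement_request(host: str, session_id: int, code: str) -> Request:
    """Build the POST request that runs a code snippet in an open session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(livy_url(host, f"/sessions/{session_id}/statements"),
                   data=body, headers=JSON_HEADERS, method="POST")


# Hypothetical host name, used only for illustration.
session_req = new_session_request("emr-master.example.internal")
stmt_req = statement_request("emr-master.example.internal", 0,
                             "import hail as hl")
```

In practice, Sparkmagic handles these calls for you from the notebook; the sketch only shows what travels between the notebook environment and the cluster.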
  •  How to deploy
  • To deploy Hail, follow the instructions in the deployment guide. The deployment process takes about 10 minutes and includes these steps:

    1. If you don't already have an AWS account, sign up for one, and then sign in to your account.
    2. Launch the Quick Start, choosing from the following options. Both options are based on a single template.
    3. Test the deployment.

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on the Quick Start.  

  •  Cost and licenses
  • You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start. 

    The AWS CloudFormation template for this Quick Start includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

    Tip: After you deploy the Quick Start, we recommend that you enable the AWS Cost and Usage Report. This report delivers billing metrics to an S3 bucket in your account. It provides cost estimates based on usage throughout each month and finalizes the data at the end of the month. For more information about the report, see the AWS documentation.

    Hail 0.2 is released under the MIT License.