Introducing Genomics Tertiary Analysis and Data Lakes Using AWS Glue and Amazon Athena

Posted on: Jul 10, 2020

Genomics Tertiary Analysis and Data Lakes Using AWS Glue and Amazon Athena is a new AWS Solutions Implementation that creates a scalable environment in AWS for preparing genomic data for large-scale analysis and running interactive queries against a genomics data lake. The solution demonstrates how to 1) build, package, and deploy libraries used for genomics data conversion, 2) provision data ingestion pipelines for genomics data preparation and cataloging, and 3) run interactive queries against a genomics data lake. The solution uses AWS CloudFormation to automate its deployment in the AWS Cloud. It includes continuous integration and continuous delivery (CI/CD), using AWS CodeCommit source code repositories and AWS CodePipeline to build and deploy updates to the data preparation jobs, crawlers, data analysis notebooks, and the data lake infrastructure. It follows infrastructure as code principles and best practices, which lets you evolve the solution rapidly.
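
To give a feel for the third capability, interactive queries against the data lake, here is a minimal Python sketch of preparing an Amazon Athena query request. It is an illustration under stated assumptions and not code from the solution itself: the table name `variants`, the column `chrom`, the database `genomics_db`, and the S3 results location are all hypothetical placeholders. The request keys match the parameters of the boto3 Athena client's `start_query_execution` call.

```python
import re

# Only plain SQL identifiers are accepted, since the table name is
# interpolated directly into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_variant_query(table: str, chromosome: str, limit: int = 10) -> str:
    """Build a simple SELECT over a variant table (hypothetical schema)."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    if not re.fullmatch(r"[A-Za-z0-9_]+", chromosome):
        raise ValueError(f"invalid chromosome value: {chromosome!r}")
    # 'chrom' is an assumed column name, not one confirmed by the solution.
    return (
        f"SELECT * FROM {table} "
        f"WHERE chrom = '{chromosome}' "
        f"LIMIT {int(limit)}"
    )


def build_athena_request(query: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit(request: dict) -> str:
    """Submit the request to Athena and return the query execution ID.

    Requires boto3 and AWS credentials; imported lazily so the helpers
    above can be used and tested without either.
    """
    import boto3

    client = boto3.client("athena")
    return client.start_query_execution(**request)["QueryExecutionId"]
```

For example, `build_athena_request(build_variant_query("variants", "22"), "genomics_db", "s3://example-bucket/athena-results/")` produces a request that `submit` could send once the placeholder names are replaced with the ones in your own deployment.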

To learn more about Genomics Tertiary Analysis and Data Lakes Using AWS Glue and Amazon Athena, see the AWS Solutions Implementation webpage.

Additional AWS Solutions are available on the AWS Solutions Implementations webpage, where customers can browse solutions by product category or industry to find AWS-vetted, automated, turnkey reference implementations that address specific business needs.