FAQ

Q: What does the Genomics Tertiary Analysis and Data Lakes Using AWS Glue and Amazon Athena solution do?

A: The solution creates a scalable environment in AWS to prepare genomic data for large-scale analysis and perform interactive queries against a genomics data lake. This solution demonstrates how to build, package, and deploy libraries used for genomics data conversion; provision data ingestion pipelines for genomics data preparation and cataloging; and run interactive queries against a genomics data lake.

Q: Can I modify the solution to work with my own genomics data, queries, and notebooks?

A: Yes, you can modify the solution to fit your needs. For example, you can add new AWS Glue jobs and crawlers to ingest, prepare, and catalog your data, and add new Jupyter notebooks and Amazon Athena queries to analyze it. The CI/CD pipeline tracks each change, which supports change control, rollbacks, and auditing.
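
As a rough illustration of adding your own Amazon Athena query, the sketch below assembles the parameters for Athena's StartQueryExecution API and submits them through a client object. The database, table, column, and bucket names are hypothetical placeholders, not names created by this solution. The client is assumed to be one you create yourself with boto3.client("athena").

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(athena_client: Any, request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID.

    athena_client is assumed to be boto3.client("athena"). boto3 is not
    imported here, so the sketch can be read and tested without AWS access.
    """
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical table and column names: replace them with the ones that
# your own AWS Glue crawlers registered in the data catalog.
EXAMPLE_SQL = (
    "SELECT chromosome, position, reference_allele, alternate_allele "
    "FROM my_variants "
    "WHERE chromosome = '17' "
    "LIMIT 10"
)

# Hypothetical database name and results bucket.
request = build_athena_request(
    EXAMPLE_SQL, "my_genomics_db", "s3://my-results-bucket/athena/"
)
```

Keeping the request construction separate from the API call makes the query definition easy to review and version alongside the rest of your changes.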

Q: What bioinformatics tools are used for data preparation?

A: This solution demonstrates how to use third-party bioinformatics tools to prepare data for ingestion into a genomics data lake. The example provided uses Hail, from the Broad Institute, to read genomic variant data in Variant Call Format (VCF) into a Spark data frame for processing. The solution also demonstrates how to build a third-party tool such as Hail from source using AWS CodeBuild and deploy it to an Amazon S3 bucket for use in an AWS Glue job.
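
To give a sense of what reading VCF data into tabular records involves, here is a minimal pure-Python sketch. It is not Hail and not the solution's code. It only shows how tab-separated VCF data lines map to columns, using made-up sample lines, and the function name parse_vcf_lines is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Union

Record = Dict[str, Union[str, int, List[str]]]


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per VCF data line, keeping the first five fixed columns."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information lines (##...), and the header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, variant_id, ref, alt = fields[:5]
        yield {
            "chrom": chrom,
            "pos": int(pos),        # VCF positions are 1-based
            "id": variant_id,
            "ref": ref,
            "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        }


# Made-up lines for illustration only; not taken from ClinVar or 1000 Genomes.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG,T\t50\tPASS\t.",
]
records = list(parse_vcf_lines(sample))
```

A tool such as Hail handles far more than this sketch does, including genotype columns, INFO fields, compression, and distributed processing on Spark.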

Q: What bioinformatics datasets are used in the solution?

A: This solution deploys the ClinVar dataset, a portion of the 1000 Genomes dataset, and an individual 1000 Genomes VCF into the solution data lake bucket. These datasets are used to demonstrate how to ingest, prepare, and analyze genomics data using AWS Glue and Amazon Athena. The solution also provides a Jupyter notebook that demonstrates how to create a drug response report.

Q: Can I deploy the solution in any AWS Region?

A: No. This solution uses the AWS CodePipeline service, which is currently available in specific AWS Regions only. Therefore, you must launch this solution in an AWS Region where this service is available. For the most current availability by Region, see AWS service offerings by Region.

Training and Certification

AWS Training and Certification builds your competence, confidence, and credibility through practical cloud skills that help you innovate and build your future.

Introduction to AWS CodeCommit

This course introduces you to AWS CodeCommit, the fully managed source control service that makes it easy to host secure and highly scalable private Git repositories. Throughout this course, you will learn more about the service’s features and benefits and how best to use CodeCommit for your own development needs. The course also demonstrates how to create a new repository.

Introduction to AWS CodeBuild

In this introductory course, we discuss what AWS CodeBuild is and how it works and review some common use cases and best practices.

AWS Certified Solutions Architect – Associate

This exam validates your ability to effectively demonstrate knowledge of how to architect and deploy secure and robust applications on AWS technologies.

Partner resources

The AWS Partner Network (APN) is focused on helping partners build successful AWS-based businesses to drive superb solutions and customer experiences. APN Partners are focused on customer success, helping you take full advantage of all the business benefits that AWS has to offer. With their deep expertise on AWS, APN Partners are uniquely positioned to help your company at any stage of your Cloud Adoption Journey and to help you solve some of your most complex problems.
