AWS HPC Blog

Tag: Genomics

Running accurate, comprehensive, and efficient genomics workflows on AWS using Illumina DRAGEN v4.0

In this blog, we provide a walkthrough of running Illumina DRAGEN v4.0 genomic analysis pipelines on AWS, covering accuracy and efficiency, copy number analysis, structural variants, SMN callers, repeat expansion detection, and pharmacogenomics insights for complex genes. We also highlight some benchmarking results for runtime, cost, and concordance from the Illumina DRAGEN DNA sequencing pipeline.

Cost-effective and accurate genomics analysis with Sentieon on AWS

In this blog post, we benchmark the performance of Sentieon’s DNAseq and DNAscope pipelines using publicly available genomics datasets on AWS. You will gain an understanding of the runtime, cost, and accuracy performance of these germline variant calling pipelines across a wide range of Amazon EC2 instances.

BioContainers are now available in Amazon ECR Public Gallery

Today we are excited to announce that all 9000+ applications provided by the BioContainers community are available within ECR Public Gallery! You don’t need an AWS account to access these images, but having one allows many more pulls over the internet, and unmetered usage within AWS. If you perform any sort of bioinformatics analysis on AWS, you should check it out!

Accelerating Genomics Pipelines Using Intel’s Open Omics Acceleration Framework on AWS

In this blog, we showcase the first version of Open Omics and benchmark three applications used in processing NGS data on Xeon-based Amazon Elastic Compute Cloud (Amazon EC2) instances: the sequence alignment tools BWA-MEM and minimap2, and single-cell ATAC-Seq.

Analyzing Genomic Data using Amazon Genomics CLI and Amazon SageMaker

In this blog post, we demonstrate how to leverage the Amazon Genomics CLI and Amazon SageMaker to analyze large-scale exome sequences and derive meaningful insights. We use the bioinformatics workflow manager Nextflow, its open-source library of pipelines, nf-core, and AWS Batch.

Getting Started with NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation

In this blog post, we’ll show how you can run NVIDIA Parabricks on AWS Batch leveraging AWS CloudFormation templates. Parabricks is a GPU-accelerated tool for secondary genomic analysis. It reduces the runtime of variant calling on a 30x human genome from 30 hours to just 30 minutes, and leverages AWS Batch to provide an interface that scales compute jobs across multiple instances in the cloud.

Benchmarking NVIDIA Clara Parabricks Somatic Variant Calling Pipeline on AWS

Somatic variants are genetic alterations which are not inherited but acquired during one’s lifespan, for example those that are present in cancer tumors. In this post, we will demonstrate how to perform somatic variant calling from matched tumor and normal genome sequence data, as well as tumor-only whole genome and whole exome datasets using an NVIDIA GPU-accelerated Parabricks pipeline, and compare the results with baseline CPU-based workflows.

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks from the AWS Marketplace. It focuses on germline analysis for whole genome and whole exome applications using GPU-accelerated BWA-MEM and GATK’s HaplotypeCaller.