AWS for Industries
Amazon Omics now supports Sentieon genomic analysis pipelines
This post is guest authored by Don Freed and Brendan Gallagher from Sentieon. To help customers easily build, deploy, and scale workloads, Amazon Omics now supports pre-built Ready2Run workflows from third-party software companies and open-source pipelines. Read more about the launch here.
Since 2014, AWS Partner Sentieon has focused on developing highly optimized algorithms for bioinformatics applications, drawing on its expertise in algorithm, software, and system optimization. Today, the company has made its DNAseq and TNseq pipelines available on Amazon Omics as Ready2Run workflows, making it easier for researchers and clinicians to analyze genomic data.
Sentieon’s DNAseq and TNseq pipelines produce results that match those of the Broad Institute’s GATK best-practices pipelines for germline and somatic variant calling. Sentieon DNAscope provides improved accuracy, with fewer variant calling errors, while still running faster than the GATK. Sentieon also provides tools for large-scale joint calling, long-read alignment and variant calling, and handling of UMI-tagged reads.
Sentieon’s Ready2Run workflows provide a robust, scalable, and timely solution for genomic analysis. Ready2Run workflows are a set of pre-built workflows from third-party software companies and open-source pipelines. With just a few clicks or a single API call, customers can run pre-built pipelines. Ready2Run workflows are priced per run to give customers predictable pricing. Sentieon will support nine Ready2Run workflows, including workflows for alignment and for germline and somatic variant calling of short-read and long-read datasets.
Figure 1: Sentieon Ready2Run workflows on Amazon Omics displaying list price per run and estimated run time
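The "single API call" mentioned above can be sketched with the AWS SDK for Python (boto3), whose `omics` client exposes a `start_run` operation that accepts a `workflowType` of `READY2RUN`. This is a minimal sketch, not Sentieon's or AWS's reference code: the workflow ID, role ARN, S3 locations, and the workflow parameter names (`fastq_r1`, `fastq_r2`, `sample_name`) are hypothetical placeholders. The real parameter names come from the parameter template of the specific Ready2Run workflow shown in the Amazon Omics console.

```python
from typing import Any, Dict


def build_ready2run_request(
    workflow_id: str,
    role_arn: str,
    output_uri: str,
    parameters: Dict[str, Any],
    run_name: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Amazon Omics StartRun operation."""
    return {
        "workflowId": workflow_id,
        "workflowType": "READY2RUN",  # as opposed to "PRIVATE"
        "roleArn": role_arn,
        "outputUri": output_uri,
        "parameters": parameters,
        "name": run_name,
    }


def start_ready2run(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the run and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the request can be built without it

    client = boto3.client("omics", region_name=region)
    response = client.start_run(**request)
    return response["id"]


if __name__ == "__main__":
    # Every value below is a placeholder chosen for illustration only.
    request = build_ready2run_request(
        workflow_id="1234567",
        role_arn="arn:aws:iam::111122223333:role/ExampleOmicsRunRole",
        output_uri="s3://example-bucket/omics-output/",
        parameters={
            "fastq_r1": "s3://example-bucket/sample_R1.fastq.gz",
            "fastq_r2": "s3://example-bucket/sample_R2.fastq.gz",
            "sample_name": "sample1",
        },
        run_name="sentieon-germline-example",
    )
    print(request)
    # To actually submit the run, uncomment the next line:
    # print(start_ready2run(request))
```

Building the request separately from the submission keeps the inputs easy to inspect before any billable run is started.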
Sentieon Ready2Run workflows support five different reference genomes, including multiple versions of GRCh38 and GRCh37 as well as UCSC’s hg19, giving customers the flexibility to choose the reference genome that best integrates with their downstream data analysis.
Existing Sentieon customers can start using Sentieon on Amazon Omics today through their existing license agreement. New Sentieon customers will automatically receive a free, two-week evaluation license for the Sentieon software. For production, an active Sentieon license is required.
Supported pipelines
The initial launch of Sentieon Ready2Run workflows on Amazon Omics supports nine pipelines. Sentieon germline workflows support alignment (with FASTQ input), preprocessing, and germline variant calling using either the Sentieon Haplotyper or Sentieon DNAscope variant callers. Variant calling with DNAscope can utilize DNAscope model files for Illumina, Element Biosciences, Ultima Genomics, and MGI/Complete Genomics to correct platform-specific data biases, further improving variant calling accuracy. Variant calls can be output in either the VCF format for a single-sample callset or the gVCF format for later integration through joint calling.
The Sentieon somatic workflows support alignment, preprocessing, and somatic variant calling with Sentieon TNseq, matching the Mutect2 GATK best-practices pipeline for somatic variant calling.
The Sentieon LongRead workflows support data from either the PacBio HiFi or Oxford Nanopore technologies. Alignment and germline structural variant calling are supported in both workflows while the PacBio HiFi workflow additionally supports germline small variant calling.
All pipelines produce analysis-ready CRAM files that can be used with other bioinformatics tools or stored for later reference. The CRAM file is efficiently compressed through a lossless data compression algorithm, effectively maintaining the read information from the supplied input file.
Sentieon pipelines may also be run as private workflows on Amazon Omics. To run Sentieon workflows as private workflows, visit Sentieon’s GitHub repository for instructions.
Conclusion
Sentieon Ready2Run workflows give customers the ability to run consistent, accurate, and efficient genomic analysis pipelines at scale.
To get started with Sentieon’s Ready2Run workflows, visit the Amazon Omics console.
To learn more about the price for each workflow, visit Amazon Omics Ready2Run pricing.
Authors
Don Freed is a Senior Bioinformatics Scientist at Sentieon, Inc. and is passionate about genomic discovery. Prior to joining Sentieon, Don performed graduate research at the Johns Hopkins School of Medicine and the Kennedy Krieger Institute where he studied the connection between somatic mosaic mutations and autism spectrum disorder.
Brendan Gallagher is the Head of Business Development for Sentieon Inc. Brendan has almost 10 years’ experience in bioinformatics software tools. Prior to joining Sentieon, Brendan helped develop the approved precision medicine “Lutathera” in the chemistry/wet lab at BioSynthema. Lutathera is a peptide receptor radionuclide therapy for cancer patients, developed at BioSynthema in St. Louis and currently marketed by Novartis.