AWS HPC Blog
Category: AWS Batch
BioContainers are now available in Amazon ECR Public Gallery
Today we are excited to announce that all 9000+ applications provided by the BioContainers community are available within ECR Public Gallery! You don’t need an AWS account to access these images, but having one allows many more pulls to the internet, and unmetered usage within AWS. If you perform any sort of bioinformatics analysis on AWS, you should check it out!
Optimize Protein Folding Costs with OpenFold on AWS Batch
In this post, we describe how to orchestrate protein folding jobs on AWS Batch. We also compare the performance of OpenFold and AlphaFold on a set of public targets. Finally, we will discuss how to optimize your protein folding costs.
Rearchitecting AWS Batch managed services to leverage AWS Fargate
AWS service teams continuously improve the underlying infrastructure and operations of managed services, and AWS Batch is no exception. The AWS Batch team recently moved most of their job scheduler fleet to a serverless infrastructure model leveraging AWS Fargate. I had a chance to sit with Devendra Chavan, Senior Software Development Engineer on the AWS Batch team, to discuss the move to AWS Fargate and its impact on the Batch managed scheduler service component.
Analyzing Genomic Data using Amazon Genomics CLI and Amazon SageMaker
In this blog post, we demonstrate how to leverage the AWS Genomics Command line and Amazon SageMaker to analyze large-scale exome sequences and derive meaningful insights. We use the bioinformatics workflow manager Nextflow, it’s open source library of pipelines, NF-Core, and AWS Batch.
Efficient and cost-effective rendering pipelines with Blender and AWS Batch
This blog post explains how to run parallel rendering workloads and produce an animation in a cost and time effective way using AWS Batch and AWS Step Functions. AWS Batch manages the rendering jobs on Amazon Elastic Compute Cloud (Amazon EC2), and AWS Step Functions coordinates the dependencies across the individual steps of the rendering workflow. Additionally, Amazon EC2 Spot instances can be used to reduce compute costs by up to 90% compared to On-Demand prices.
Getting Started with NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation
In this blog post, we’ll show how you can run NVIDIA Parabricks on AWS Batch leveraging AWS CloudFormation templates. Parabricks is a GPU-accelerated tool for secondary genomic analysis. It reduces the runtime of variant calling on a 30x human genome from 30 hours to just 30 minutes, and leverages AWS Batch to provide an interface that scales compute jobs across multiple instances in the cloud.
Understanding the AWS Batch termination process
In this blog post, we help you understand the AWS Batch job termination process and how you may take actions to gracefully terminate a job by capturing SIGTERM signal inside the application. It provides you with an efficient way to exit your Batch jobs. You also get to know about how job timeouts occur, and how the retry operation works with both traditional AWS Batch jobs and array jobs.
Bayesian ML Models at Scale with AWS Batch
Ampersand is a data-driven TV advertising technology company that provides aggregated TV audience impression insights and planning on 42 million households, in every media market, across more than 165 networks and apps and in all dayparts (broadcast day segments). The Ampersand Data Science team estimated that building their statistical models would require up to 600,000 physical CPU hours to run, which would not be feasible without using a massively parallel and large-scale architecture in the cloud. AWS Batch enabled Ampersand to compress their time of computation over 500x through massive scaling while optimizing their costs using Amazon EC2 Spot. In this blog post, we will provide an overview of how Ampersand built their TV audience impressions (“impressions”) models at scale on AWS, review the architecture they have been using, and discuss optimizations they conducted to run their workload efficiently on AWS Batch.
Encoding workflow dependencies in AWS Batch
This post covers the different ways you can encode a dependency between basic and array jobs in AWS Batch. We also cover why you may want to encode dependencies outside of Batch altogether using a workflow system like AWS Step Functions or Apache Airflow.
AWS Batch updates: higher compute utilization, AWS PrivateLink support, and updatable compute environments
In this post, I cover some of the recent updates to AWS Batch, including improvements to job placement, addition of AWS PrivateLink support, and the new capabilities to update your AWS Batch compute environments.