AWS HPC Blog

Category: AWS Batch

Figure 2: Identification of redun jobs and grouping them into Array Jobs to run on AWS Batch. (Top) redun represents the workflow as an Expression Graph (top-left) and identifies reductions (red boxes) that are ready to be executed. The redun Scheduler creates a redun Job (J1, J2, J3) for each reduction and dispatches those jobs to Executors based on the task-specific configuration. The Batch Executor allows jobs to accumulate for up to three seconds (default) in order to identify compatible jobs for grouping into an Array Job, which is then submitted to AWS Batch (top-right). (Bottom) As jobs complete in AWS Batch, the success (green) or failure (red) of each job is propagated back to the Executors and the Scheduler, and eventually substituted back into the Expression Graph (bottom-left).

Data Science workflows at insitro: how redun uses the advanced service features from AWS Batch and AWS Glue

Matt Rasmussen, VP of Software Engineering at insitro, expands on his first post on redun, insitro’s data science tool for bioinformatics, to describe how redun makes use of advanced AWS features. Specifically, Matt describes how AWS Batch Array Jobs are used to support workflows with large fan-out, and how AWS Glue’s DynamicFrame is used to run computationally heterogeneous workflows with different back-end needs, such as Spark, all in the same workflow definition.
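
For readers who want a feel for the fan-out pattern, here is a minimal sketch of a redun workflow; the executor name, task body, and sample identifiers are hypothetical, and the grouping of compatible jobs into Array Jobs is done by redun’s AWS Batch executor (configured separately, typically in .redun/redun.ini), not by user code.

```python
# A minimal sketch (hypothetical names) of a large fan-out redun workflow.
# Each call to align_sample becomes a redun Job; jobs submitted within the
# same short window can be grouped by the Batch executor into one Array Job.
from typing import List
from redun import task

redun_namespace = "example"

@task(executor="batch")  # "batch" refers to an AWS Batch executor configured in .redun/redun.ini
def align_sample(sample_id: str) -> str:
    # ... run the per-sample alignment here ...
    return f"aligned/{sample_id}.bam"

@task()
def main(sample_ids: List[str]) -> List[str]:
    # Fan out: one job per sample; compatible jobs are batched into Array Jobs.
    return [align_sample(s) for s in sample_ids]
```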

Figure 1: Evaluating a sequence alignment workflow using graph reduction. In redun, workflows are represented as an Expression Graph (left), which contains concrete value nodes (grey) and Expression nodes (blue). The redun scheduler identifies tasks that are ready to execute by finding subtrees that can be reduced (red boxes), substituting task results back into the Expression Graph (red arrows). The scheduler continues to find reductions until the Expression Graph reduces to a single concrete value (grey box, far right). If any reduction has been done before (determined by comparing an Expression's hash), the redun scheduler can replay the reduction from a central cache and skip task re-execution.

Data Science workflows at insitro: using redun on AWS Batch

Matt Rasmussen, VP of Software Engineering at insitro, describes their recently released, open-source data science framework, redun, which allows data scientists to define complex scientific workflows that scale from their laptop to large-scale distributed runs on serverless platforms like AWS Batch and AWS Glue. In this post, Matt shows how redun lends itself to bioinformatics workflows, which typically involve wrapping Unix-based programs that require file staging to and from object storage. In the next blog post, Matt describes how redun scales to large and heterogeneous workflows by leveraging AWS Batch features such as Array Jobs and AWS Glue features such as Glue DynamicFrame.
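
As a rough illustration of the file-staging pattern the post describes, here is a minimal sketch of wrapping a Unix tool with redun; the S3 paths, the samtools command, and the exact keyword arguments are illustrative assumptions rather than a definitive redun recipe.

```python
# A minimal sketch (hypothetical paths and command) of wrapping a Unix-based
# program with staging of files to and from Amazon S3, in the style of redun.
from redun import task, script, File

redun_namespace = "example"

@task(executor="batch")  # assumes an AWS Batch executor named "batch" is configured
def sort_bam(input_path: str, output_path: str) -> File:
    return script(
        "samtools sort -o output.bam input.bam",
        inputs=[File(input_path).stage("input.bam")],   # stage S3 object to local disk
        outputs=File(output_path).stage("output.bam"),  # copy the result back to S3
    )
```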

Choosing between AWS Batch or AWS ParallelCluster for your HPC Workloads

It’s an understatement that AWS has a lot of services (more than 200 at the time of this post!). We’re usually the first to point out that there’s more than one way to solve a problem. HPC is no different in this regard, because we offer a choice: customers can run their HPC workloads using AWS […]

Optimize your Monte Carlo simulations using AWS Batch

Monte Carlo methods are a class of methods based on the idea of sampling to study mathematical problems for which analytical solutions may be unavailable. The basic idea is to create samples through repeated simulations that can be used to derive approximations of a quantity we’re interested in, and of its probability distribution. In this […]
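
To make the sampling idea concrete, here is a toy, self-contained example that estimates pi by drawing random points in the unit square; it is only an illustration of the method, not part of the workload described in the post.

```python
# Toy Monte Carlo illustration: estimate pi from the fraction of random points
# in the unit square that fall inside the quarter circle of radius 1.
import random

def estimate_pi(num_samples: int = 1_000_000) -> float:
    inside = sum(
        1
        for _ in range(num_samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4.0 * inside / num_samples

if __name__ == "__main__":
    print(estimate_pi())  # approaches 3.14159... as num_samples grows
```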

Accelerating drug discovery with Amazon EC2 Spot Instances

We have been working with a team of researchers at the Max Planck Institute, helping them adopt the AWS cloud for drug research applications in the pharmaceutical industry. In this post, we’ll focus on how the team at Max Planck obtained thousands of EC2 Spot Instances spread across multiple AWS Regions for running their compute-intensive simulations in a cost-effective manner, and how their solution will be enhanced further using the new Spot Placement Score API.
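
For a sense of what querying the Spot Placement Score API looks like, here is a minimal boto3 sketch; the instance types, target capacity, and Region list are illustrative only, not the values used by the Max Planck team.

```python
# A minimal sketch of calling the Spot Placement Score API with boto3 to
# compare Regions for a hypothetical Spot capacity request.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.get_spot_placement_scores(
    InstanceTypes=["c5.24xlarge", "c5a.24xlarge", "m5.24xlarge"],
    TargetCapacity=5000,
    TargetCapacityUnitType="vcpu",
    RegionNames=["us-east-1", "us-east-2", "eu-west-1", "eu-central-1"],
)

# Higher scores (on a 1-10 scale) indicate a better likelihood that the
# requested Spot capacity can be fulfilled in that Region.
for entry in response["SpotPlacementScores"]:
    print(entry["Region"], entry["Score"])
```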

Figure 6: With three shareIdentifier values and 75% capacity reserved, each identifier has exclusive access to 25% of the capacity.

Introducing fair-share scheduling for AWS Batch

Today we are announcing fair-share scheduling (FSS) for AWS Batch, which provides fine-grained control of scheduling behavior by using a scheduling policy. With FSS, customers can prevent “unfair” situations caused by strict first-in, first-out scheduling, where high-priority jobs can’t “jump the queue” without draining other jobs first. You can now balance resource consumption between groups of workloads and have confidence that the shared compute environment is not dominated by a single workload. In this post, we’ll explain how fair-share scheduling works in more detail. You’ll also find a link to a step-by-step workshop at the end of this post, so you can try it out yourself.
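
As a rough sketch of the moving parts, the example below creates a scheduling policy with boto3 and submits a job under a share identifier; the policy name, decay and reservation values, queue, and job definition are hypothetical, and the policy must also be attached to the job queue for it to take effect.

```python
# A minimal sketch of fair-share scheduling setup with boto3 (hypothetical names).
import boto3

batch = boto3.client("batch")

policy = batch.create_scheduling_policy(
    name="example-fss-policy",
    fairsharePolicy={
        "shareDecaySeconds": 3600,   # how far back usage is considered when ranking shares
        "computeReservation": 50,    # hold back capacity for share identifiers that are idle
        "shareDistribution": [
            {"shareIdentifier": "teamA", "weightFactor": 1.0},
            {"shareIdentifier": "teamB", "weightFactor": 1.0},
        ],
    },
)
print(policy["arn"])  # attach this ARN to the job queue's scheduling policy

# Jobs submitted to a queue using this policy declare which share they belong to.
batch.submit_job(
    jobName="example-job",
    jobQueue="example-fss-queue",          # a queue created with schedulingPolicyArn set
    jobDefinition="example-job-definition",
    shareIdentifier="teamA",
    schedulingPriorityOverride=10,
)
```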

Running 20k simulations in 3 days to accelerate early stage drug discovery with AWS Batch

In this blog post, we’ll describe an ensemble run of 20K simulations to accelerate the drug discovery process, while also optimizing for run time and cost. We used two popular open-source packages: GROMACS, which performs molecular dynamics simulations, and pmx, a free-energy calculation package from the Computational Biomolecular Dynamics Group at the Max Planck Institute in Germany.

Using AWS Batch Console Support for Step Functions Workflows

Last year, we published the Genomics Secondary Analysis Using AWS Step Functions and AWS Batch solution as a companion solution to the Genomics Data Transfer, Analytics, and Machine Learning Using AWS Services whitepaper. Since then, many customers have used the secondary analysis solution to automate their bioinformatics pipelines in AWS. A common pain point expressed […]

AWS Batch Dos and Don’ts: Best Practices in a Nutshell

AWS Batch is a service that enables scientists and engineers to run computational workloads at virtually any scale without requiring them to manage a complex architecture. In this blog post, we share a set of best practices and practical guidance drawn from our experience working with customers to run and optimize their computational workloads. Readers will learn how to optimize their costs with Amazon EC2 Spot on AWS Batch, how to troubleshoot their architecture should an issue arise, and how to tune their architecture and container layout to run at scale.

Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx

Batch processing is a common need across varied machine learning use cases, such as video production, financial modeling, drug discovery, and genomic research. The elasticity of the cloud provides efficient ways to scale and simplify batch processing workloads while cutting costs. In this post, you’ll learn a scalable and cost-effective approach to configuring AWS Batch Array Jobs to process datasets that are stored on Amazon S3 and presented to compute instances with Amazon FSx for Lustre.
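
To illustrate the basic mechanics, here is a minimal boto3 sketch of submitting an Array Job and of how each child job might pick its slice of the dataset via the AWS_BATCH_JOB_ARRAY_INDEX environment variable; the queue, job definition, and FSx for Lustre mount layout are hypothetical.

```python
# A minimal sketch of an AWS Batch Array Job (hypothetical names and paths).
import os
import boto3

def submit_array_job(num_items: int) -> str:
    batch = boto3.client("batch")
    response = batch.submit_job(
        jobName="process-dataset",
        jobQueue="example-queue",
        jobDefinition="example-job-definition",
        arrayProperties={"size": num_items},  # spawns num_items child jobs
    )
    return response["jobId"]

def worker() -> None:
    # Inside the container: AWS Batch sets each child job's index automatically.
    index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    # e.g., process the index-th object, exposed through an FSx for Lustre
    # file system linked to the S3 bucket that holds the dataset.
    input_path = f"/fsx/dataset/part-{index:05d}"
    print(f"processing {input_path}")
```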