AWS HPC Blog

Category: High Performance Computing

Figure 1: Evaluating a sequence alignment workflow using graph reduction.** In redun, workflows are represented as an Expression Graph (left) which contain concrete value nodes (grey) and Expression nodes (blue). The redun scheduler identifies tasks that are ready to execute by finding subtrees that can be reduced (red boxes), substituting task results back into the Expression Graph (red arrows). The scheduler continues to find reductions until the Expression Graph reduces to a single concrete value (grey box, far right). If any reduction has been done before (determine by comparing an Expression's hash), the redun scheduler can replay the reduction from a central cache and skip task re-execution.

Data Science workflows at insitro: using redun on AWS Batch

Matt Rasmussen, VP of Software Engineering at insitro describes their recently released, open-source data science framework, redun, which allows data scientists to define complex scientific workflows that scale from their laptop to large-scale distributed runs on serverless platforms like AWS Batch and AWS Glue. I this post, Matt shows how redun lends itself to Bioinformatics workflows which typically involve wrapping Unix-based programs that require file staging to and from object storage. In the next blog post, Matt describes how redun scales to large and heterogenous workflows by leveraging AWS Batch features such as Array Jobs and AWS Glue features such as Glue DynamicFrame.

Choosing between AWS Batch or AWS ParallelCluster for your HPC Workloads

It’s an understatement that AWS has a lot of services (more than 200 at the time of this post!). We’re usually the first to point out that there’s more than one way to solve a problem. HPC is no different in this regard, because we offer a choice: customers can run their HPC workloads using AWS […]

Getting the best OpenFOAM Performance on AWS

OpenFOAM is one the most widely used Computational Fluid Dynamics (CFD) packages and helps companies in a broad range of sectors (automotive, aerospace, energy, and life-sciences) to conduct research and design new products. In this post, we’ll discuss six practical things you can do as an OpenFOAM user to run your simulations faster and more cost effectively.

Figure 2: AWS HTC-Grid’s Amazon EKS-based Compute Plane

Cloud-native, high throughput grid computing using the AWS HTC-Grid solution

We worked with our financial services customers to develop an open-source, scalable, cloud-native, high throughput computing solution on AWS — AWS HTC-Grid. HTC-Grid allows you to submit large volumes of short and long running tasks and scale environments dynamically. In this first blog of a two-part series, we describe the structure of HTC-Grid and its objective to provide a configurable blueprint for HPC grid scheduling on the cloud.

Optimize your Monte Carlo simulations using AWS Batch

Introduction Monte Carlo methods are a class of methods based on the idea of sampling to study mathematical problems for which analytical solutions may be unavailable. The basic idea is to create samples through repeated simulations that can be used to derive approximations about a quantity we’re interested in, and its probability distribution. In this […]

GROMACS performance on Amazon EC2 with Intel Ice Lake processors

We recently launched two new Amazon EC2 instance families based on Intel’s Ice Lake – the C6i and M6i. These instances provide higher core counts and take advantage of generational performance improvements on Intel’s Xeon scalable processor family architectures. In this post we show how GROMACS performs on these new instance families. We use similar methodologies as for previous posts where we characterized price-performance for CPU-only and GPU instances (Part 1, Part 2, Part 3), providing instance recommendations for different workload sizes.

Introducing AWS ParallelCluster multiuser support via Active Directory

Today we’re announcing the release of AWS ParallelCluster 3.1 which now supports multiuser authentication based on Active Directory (AD). Starting with v3.1.1 clusters can be configured to use an AD domain managed via one of the AWS Directory Service options like Simple AD or AWS Managed Microsoft AD (MSAD). This blog post describes the new feature, and gives an example of a configuration block for ParallelCluster 3 configuration files.