AWS HPC Blog

How we enabled uncompressed live video with CDI over EFA

Figure 2: CDI transmits the frame buffer using EFA. SRD is a multipath, self-healing transport. This creates a kernel-bypass method that effectively enables a memory copy from one framebuffer to another.

We’re going to take you into the world of broadcast video, and explain how it led to today’s announcement of the general availability of EFA on smaller instance sizes. For a range of applications, this is going to save customers a lot of money, because they no longer need to use the biggest instances in each instance family to get HPC-style network performance. But the story of how we got there involves our Elastic Fabric Adapter (EFA), some difficult problems presented to us by customers in the entertainment industry, and an invention called the Cloud Digital Interface (CDI). And it started not very far from Hollywood.

Read More

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog provides an overview of NVIDIA’s Clara Parabricks, along with a guide to using Parabricks from the AWS Marketplace. It focuses on germline analysis for whole-genome and whole-exome applications, using GPU-accelerated bwa-mem and GATK’s HaplotypeCaller.

Read More

Running a 3.2M vCPU HPC Workload on AWS with YellowDog

OMass Therapeutics, a biotechnology company identifying medicines against highly validated target ecosystems, used YellowDog on AWS to analyze and screen 337 million compounds in 7 hours, a task that would have taken two months using an on-premises HPC cluster. YellowDog, based in Bristol in the UK, ran the drug discovery application on an extremely large, multi-region cluster in AWS using the AWS ‘pay-as-you-go’ pricing model. The YellowDog platform provided a central, unified interface to monitor and manage AWS Region selection, compute provisioning, job allocation, and execution. The entire workload completed in 65 minutes, enabling scientists to start work on analysis the same day, significantly accelerating the drug discovery process. In this post, we’ll discuss the AWS and YellowDog services we deployed, and the mechanisms used to scale to 3.2M vCPUs in 33 minutes using multiple EC2 instance types across multiple Regions, running at a 95% utilization rate.

Read More

Coming soon: dedicated HPC instances and hybrid functionality

This year, we’ve launched a lot of new capabilities for HPC customers, making AWS the best place for the length and breadth of their workflows. EFA went mainstream and is now available in sixteen instance families, providing fast-fabric capabilities for scaling MPI and NCCL codes. We’ve written deep-dive studies to explore and explain the optimizations that will drive your workloads faster in the cloud than elsewhere. We released a major new version of AWS ParallelCluster with its own API for controlling the cluster lifecycle. AWS Batch became deeply integrated with AWS Step Functions and now supports fair-share scheduling, with multiple levers to control the experience. Today we’re signaling the arrival of a new HPC-dedicated instance family – the Hpc6a – and an enhanced EnginFrame that will bring the best of the cloud and on-premises together in a single interface.

Read More

How to manage HPC jobs using a serverless API

HPC systems are traditionally accessed through a Command Line Interface (CLI), where users submit and manage their computational jobs. Depending on their experience and sophistication, users who aren’t accustomed to a CLI can find it daunting. Fortunately, the cloud offers many other options for submitting and managing computational jobs. In this blog post, we’ll cover how to create a serverless API to interact with an HPC system in the cloud built with AWS ParallelCluster.
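To make the idea concrete, here’s a minimal sketch of what a client call to such a serverless job-submission API could look like. The endpoint URL, resource path, and JSON fields are hypothetical placeholders for whatever Amazon API Gateway route you build in front of the cluster; they are not a published ParallelCluster API.

```python
# Hypothetical client for a serverless job-submission API.
# The endpoint and payload shape are illustrative assumptions.
import requests

API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod"  # hypothetical endpoint


def submit_job(script_path: str, partition: str = "compute") -> str:
    """POST a job submission request and return the scheduler's job ID."""
    resp = requests.post(
        f"{API_URL}/jobs",  # hypothetical resource path
        json={"command": f"sbatch -p {partition} {script_path}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["jobId"]  # assumed response field


if __name__ == "__main__":
    print("Submitted job:", submit_job("/shared/run_simulation.sh"))
```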

Read More
Using the Slurm REST API to integrate with distributed architectures on AWS

Figure 1. Architecture of Slurm and user workflows, demonstrating two methods of interacting with Slurm. In the first method, the user accesses the Head Node via SSH and runs helper scripts like sinfo, squeue, sbatch, and scontrol. In the second method, the user issues REST API calls through HTTP to slurmrestd.

The Slurm Workload Manager by SchedMD is a popular HPC scheduler and is supported by AWS ParallelCluster, an elastic HPC cluster management service offered by AWS. Traditional HPC workflows involve logging into a head node and running shell commands to submit jobs to a scheduler and check job status. Modern distributed systems often use representational […]
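As a rough sketch of the REST-based method, the snippet below submits a job to slurmrestd with Python’s requests library. It assumes JWT authentication is enabled, that the daemon listens on the head node at port 8082, and that the API version path matches your Slurm build (the environment field’s shape also varies across API versions), so treat every value here as a deployment-specific assumption.

```python
# Sketch: submit a job through the Slurm REST API (slurmrestd).
import requests

SLURMRESTD = "http://head-node:8082"  # assumed address and port
HEADERS = {
    "X-SLURM-USER-NAME": "ec2-user",                      # assumed user
    "X-SLURM-USER-TOKEN": "<JWT from `scontrol token`>",  # paste a real token
}

payload = {
    "job": {
        "name": "hello",
        "partition": "compute",                    # assumed partition name
        "current_working_directory": "/home/ec2-user",
        "environment": {"PATH": "/usr/bin:/bin"},  # shape differs in newer API versions
    },
    "script": "#!/bin/bash\nsrun hostname\n",
}

# The version segment (v0.0.36 here) must match the slurmrestd build.
resp = requests.post(
    f"{SLURMRESTD}/slurm/v0.0.36/job/submit",
    headers=HEADERS,
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print("job_id:", resp.json().get("job_id"))
```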

Read More

Deep dive into the AWS ParallelCluster 3 configuration file

In September, we announced the release of AWS ParallelCluster 3, a major release with lots of changes and new features. To help get you started migrating your clusters, we provided the Moving from AWS ParallelCluster 2.x to 3.x guide. We know moving versions can be quite an undertaking, so we’re augmenting that official documentation with additional color and context on a few key areas. In this blog post, we’ll focus on the configuration file format changes for ParallelCluster 3, and how they map back to the same configuration sections for ParallelCluster 2.
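As a taste of the format change, here’s a minimal sketch of a ParallelCluster 3 configuration built as a Python dict and dumped to YAML, with comments noting the ParallelCluster 2 INI option each key roughly replaces. The subnet, key pair, and instance choices are placeholders; the migration guide remains the authoritative mapping.

```python
# Sketch: a minimal ParallelCluster 3 config, annotated with the
# ParallelCluster 2 options each key roughly corresponds to.
import yaml  # pip install pyyaml

config = {
    "Region": "us-east-1",                    # v2: aws_region_name in [aws]
    "Image": {"Os": "alinux2"},               # v2: base_os in [cluster]
    "HeadNode": {                             # v2: the master_* options
        "InstanceType": "c5.xlarge",          # v2: master_instance_type
        "Networking": {"SubnetId": "subnet-12345678"},  # v2: master_subnet_id
        "Ssh": {"KeyName": "my-key"},         # v2: key_name
    },
    "Scheduling": {
        "Scheduler": "slurm",                 # v2: scheduler
        "SlurmQueues": [{                     # v2: [queue ...] sections
            "Name": "compute",
            "Networking": {"SubnetIds": ["subnet-12345678"]},
            "ComputeResources": [{            # v2: [compute_resource ...] sections
                "Name": "c5n",
                "InstanceType": "c5n.18xlarge",  # v2: instance_type
                "MinCount": 0,                # v2: min_count
                "MaxCount": 10,               # v2: max_count
            }],
        }],
    },
}

print(yaml.safe_dump(config, sort_keys=False))
```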

Read More

EFA is now mainstream, and that’s a Good Thing

We have recently launched three new Amazon EC2 instance types enabled with Elastic Fabric Adapter (EFA), our network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communication at scale on AWS. These bring our EFA-enabled count to sixteen different instance families covering a wide range of use cases. EFA is going mainstream, and we are just getting started.
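If you want to check which instance types support EFA in your own Region, the EC2 API exposes this directly. Here’s a short boto3 sketch; the filter name is documented, and the results will vary by Region:

```python
# List EC2 instance types in the current Region that support EFA.
import boto3

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instance_types")

efa_types = []
for page in paginator.paginate(
    Filters=[{"Name": "network-info.efa-supported", "Values": ["true"]}]
):
    efa_types += [t["InstanceType"] for t in page["InstanceTypes"]]

print(f"{len(efa_types)} EFA-enabled instance types:")
print(", ".join(sorted(efa_types)))
```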

Read More
Introducing fair-share scheduling for AWS Batch

Figure 6: With three shareIdentifier values and 75% capacity reserved, each identifier has exclusive access to 25% of the capacity.

Today we are announcing fair-share scheduling (FSS) for AWS Batch, which provides fine-grained control of scheduling behavior through a scheduling policy. With FSS, customers can prevent the “unfair” situations caused by strict first-in, first-out scheduling, where high-priority jobs can’t “jump the queue” without other jobs draining first. You can now balance resource consumption between groups of workloads and have confidence that the shared compute environment is not dominated by a single workload. In this post, we’ll explain how fair-share scheduling works in more detail. You’ll also find a link to a step-by-step workshop at the end of this post, so you can try it out yourself.
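As a minimal sketch of the moving parts, the boto3 calls below create a fair-share scheduling policy and submit a job against a share identifier. The share names, weights, decay, and reservation values are illustrative, and the job queue and job definition are assumed to exist already.

```python
# Sketch: create an AWS Batch fair-share scheduling policy and use it.
import boto3

batch = boto3.client("batch")

policy = batch.create_scheduling_policy(
    name="team-fairshare",
    fairsharePolicy={
        "shareDecaySeconds": 3600,   # how long past usage keeps counting against a share
        "computeReservation": 25,    # hold back some capacity for inactive shares
        "shareDistribution": [
            # A lower weightFactor discounts that share's usage,
            # letting it consume relatively more of the environment.
            {"shareIdentifier": "teamA", "weightFactor": 1.0},
            {"shareIdentifier": "teamB", "weightFactor": 0.5},
        ],
    },
)

# Jobs pick their share at submission time. The queue is assumed to have been
# created with schedulingPolicyArn=policy["arn"].
batch.submit_job(
    jobName="screen-compounds",
    jobQueue="fss-queue",            # assumed queue attached to the policy
    jobDefinition="my-jobdef:1",     # assumed existing job definition
    shareIdentifier="teamA",
)
```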

Read More
Scaling a read-intensive, low-latency file system to 10M+ IOPS

Figure 1: High-level architecture of the file system.

Many shared file systems are used to support read-intensive applications, like financial backtesting. These applications typically exploit copies of datasets whose authoritative copy resides somewhere else. For small datasets, in-memory databases and caching techniques can yield impressive results. However, low-latency, flash-based, scalable shared file systems can provide both massive IOPS and bandwidth. They’re also easy to adopt because they use a file-level abstraction. In this post, I’ll share how to easily create and scale a shared, distributed, POSIX-compatible file system that performs at local NVMe speeds for files opened read-only.
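If you want a quick feel for the read path before reaching for a full benchmark harness like fio, here’s a tiny single-threaded random-read sketch. The mount point and test file are assumptions, and real measurements would use many threads and clients.

```python
# Sketch: estimate single-threaded random-read IOPS on a mounted file system.
import os
import random
import time

TEST_FILE = "/mnt/sharedfs/testfile"  # assumed mount point; pre-create a large file here
BLOCK = 4096                          # 4 KiB reads
DURATION = 10                         # seconds to run

fd = os.open(TEST_FILE, os.O_RDONLY)
blocks = os.fstat(fd).st_size // BLOCK  # number of addressable blocks

ops = 0
deadline = time.monotonic() + DURATION
while time.monotonic() < deadline:
    os.pread(fd, BLOCK, random.randrange(blocks) * BLOCK)
    ops += 1
os.close(fd)

print(f"{ops / DURATION:.0f} random {BLOCK}-byte read IOPS (single thread)")
```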

Read More