*Post Types | AWS HPC Blog

Abstract background circuit board futuristic server. Can be used as digital dynamic wallpaper, technology background. 3d rendering

Join us for our HPC “Speeds n’ Feeds” event on Feb. 9

It’s often difficult to keep track of all the announcements AWS is making around HPC. Come and join us on Feb. 9th for a quick overview of the latest and greatest AWS HPC products and services launched over the past year. You will hear directly from the AWS HPC engineers and product managers who have built these exciting new offerings.

Running Windows HPC Workloads using HPC Pack in AWS

This blog post shows you how to deploy an HPC cluster for Windows workloads. We have provided an AWS CloudFormation template that automates the creation process to deploy an HPC Pack 2019 Windows cluster. This will help you get started quickly to run Windows-based HPC workloads, while leveraging highly scalable, resilient, and secure AWS infrastructure. As an example, we show how to run a sample parametric sweep for EnergyPlus, an open source energy simulation tool maintained by the U.S. Department of Energy’s Building Technology Office.

Accelerating drug discovery with Amazon EC2 Spot Instances

We have been working with a team of researchers at the Max Planck Institute, helping them adopt the AWS cloud for drug research applications in the pharmaceutical industry. In this post, we’ll focus on how the team at Max Planck obtained thousands of EC2 Spot Instances spread across multiple AWS Regions for running their compute intensive simulations in a cost-effective manner, and how their solution will be enhanced further using the new Spot Placement Score API.

Introducing AWS HPC Connector for NICE EnginFrame

Today we’re introducing AWS HPC Connector, a new feature in NICE EnginFrame that allows customers to leverage managed HPC resources on AWS. With this release, EnginFrame provides a unified interface for administrators to make hybrid HPC resources available to their users both on-premises and within AWS. In this post, we’ll provide some context around EnginFrame’s typical use cases, and show how you can use AWS HPC Connector to stand up HPC compute resources on AWS.

Figure 2: CDI transmits the frame buffer using EFA. SRD is a multipath, self-healing transport. This creates a kernel bypass method that effectively enables a memory copy from one framebuffer to another.

How we enabled uncompressed live video with CDI over EFA

We’re going to take you into the world of broadcast video, and explain how it led to us announcing today the general availability of EFA on smaller instance sizes. For a range of applications, this is going to save customers a lot of money because they no longer need to use the biggest instances in each instance family to get HPC-style network performance. But the story of how we got there involves our Elastic Fabric Adapter (EFA), some difficult problems presented to us by customers in the entertainment industry, and an invention called the Cloud Digital Interface (CDI). And it started not very far from Hollywood.

Coming soon: dedicated HPC instances and hybrid functionality

This year, we’ve launched a lot of new capabilities for HPC customers, making AWS the best place for the length and breadth of their workflows. EFA went mainstream and is now available in sixteen instance families for fast fabric capabilities for scaling MPI and NCCL codes. We’ve written deep-dive studies to explore and explain the optimizations that will drive your workloads faster in the cloud than elsewhere. We released a major new version of AWS ParallelCluster with its own API for controlling the cluster lifecycle. AWS Batch became deeply integrated into AWS Step Functions and now supports fair-share scheduling, with multiple levers to control the experience. Today we’re signaling the arrival of a new HPC-dedicated instance family – the Hpc6a – and an enhanced EnginFrame that will bring the best of the cloud and on-premises together in a single interface.

Figure 1. Architecture of Slurm and user workflows, demonstrating two methods of interacting with Slurm. In the first method, the user accesses the Head Node via SSH and runs helper scripts like sinfo, squeue, sbatch, and scontrol. In the second method, the user issues REST API calls through HTTP to slurmrestd.

Using the Slurm REST API to integrate with distributed architectures on AWS

The Slurm Workload Manager by SchedMD is a popular HPC scheduler and is supported by AWS ParallelCluster, an elastic HPC cluster management service offered by AWS. Traditional HPC workflows involve logging into a head node and running shell commands to submit jobs to a scheduler and check job status. Modern distributed systems often use representational […]

Figure 6: With three shareIdentifier values and 75% capacity reserved, each identifier has exclusive access to 25% of the capacity.

Introducing fair-share scheduling for AWS Batch

Today we are announcing fair-share scheduling (FSS) for AWS Batch, which provides fine-grain control of the scheduling behavior by using a scheduling policy. With FSS, customers can prevent “unfair” situations caused by strict first-in, first-out scheduling where high priority jobs can’t “jump the queue” without draining other jobs first. You can now balance resource consumption between groups of workloads and have confidence that the shared compute environment is not dominated by a single workload. In this post, we’ll explain how fair-share scheduling works in more detail. You’ll also find a link to a step-by-step workshop at the end of this post, so you can try it out yourself.

Figure 1: High level architecture of the file system.

Scaling a read-intensive, low-latency file system to 10M+ IOPs

Many shared file systems are used in supporting read-intensive applications, like financial backtesting. These applications typically exploit copies of datasets whose authoritative copy resides somewhere else. For small datasets, in-memory databases and caching techniques can yield impressive results. However, low latency flash-based scalable shared file systems can provide both massive IOPs and bandwidth. They’re also easy to adopt because of their use of a file-level abstraction. In this post, I’ll share how to easily create and scale a shared, distributed POSIX compatible file system that performs at local NVMe speeds for files opened read-only.

Using AWS Batch Console Support for Step Functions Workflows

Last year, we published the Genomics Secondary Analysis Using AWS Step Functions and AWS Batch solution as a companion solution to the Genomics Data Transfer, Analytics, and Machine Learning Using AWS Services whitepaper. Since then, many customers have used the secondary analysis solution to automate their bioinformatics pipelines in AWS. A common pain point expressed […]

Category: *Post Types