AWS HPC Blog

Figure 2: AWS HTC-Grid’s Amazon EKS-based Compute Plane

Cloud-native, high throughput grid computing using the AWS HTC-Grid solution

We worked with our financial services customers to develop an open-source, scalable, cloud-native, high throughput computing solution on AWS — AWS HTC-Grid. HTC-Grid allows you to submit large volumes of short and long running tasks and scale environments dynamically. In this first blog of a two-part series, we describe the structure of HTC-Grid and its objective to provide a configurable blueprint for HPC grid scheduling on the cloud.

Optimize your Monte Carlo simulations using AWS Batch

Introduction Monte Carlo methods are a class of methods based on the idea of sampling to study mathematical problems for which analytical solutions may be unavailable. The basic idea is to create samples through repeated simulations that can be used to derive approximations about a quantity we’re interested in, and its probability distribution. In this […]

GROMACS performance on Amazon EC2 with Intel Ice Lake processors

We recently launched two new Amazon EC2 instance families based on Intel’s Ice Lake – the C6i and M6i. These instances provide higher core counts and take advantage of generational performance improvements on Intel’s Xeon scalable processor family architectures. In this post we show how GROMACS performs on these new instance families. We use similar methodologies as for previous posts where we characterized price-performance for CPU-only and GPU instances (Part 1, Part 2, Part 3), providing instance recommendations for different workload sizes.

Introducing AWS ParallelCluster multiuser support via Active Directory

Today we’re announcing the release of AWS ParallelCluster 3.1 which now supports multiuser authentication based on Active Directory (AD). Starting with v3.1.1 clusters can be configured to use an AD domain managed via one of the AWS Directory Service options like Simple AD or AWS Managed Microsoft AD (MSAD). This blog post describes the new feature, and gives an example of a configuration block for ParallelCluster 3 configuration files.

How to Arm a world-leading forecast model with AWS Graviton and Lambda

The Met Office is the UK’s National Meteorological Service, providing 24×7 world-renowned scientific excellence in weather, climate and environmental forecasts and severe weather warnings for the protection of life and property. They provide forecasts and guidance for the public, to our government and defence colleagues as well as the private sector. As an example, if you’ve been on a plane over Europe, Middle East, or Africa; that plane took off because the Met Office (as one of two World Aviation Forecast Centres) provided a forecast. This article explains one of the ways they use AWS to collect these observations, which has freed them to focus more on top quality delivery for their customers.

Abstract background circuit board futuristic server. Can be used as digital dynamic wallpaper, technology background. 3d rendering

Join us for our HPC “Speeds n’ Feeds” event on Feb. 9

It’s often difficult to keep track of all the announcements AWS is making around HPC. Come and join us on Feb. 9th for a quick overview of the latest and greatest AWS HPC products and services launched over the past year. You will hear directly from the AWS HPC engineers and product managers who have built these exciting new offerings.

Using Spot Instances with AWS ParallelCluster and Amazon FSx for Lustre

Processing large amounts of complex data often requires leveraging a mix of different Amazon EC2 instance types. These types of computations also benefit from shared, high performance, scalable storage like Amazon FSx for Lustre. A way to save costs on your analysis is to use Amazon EC2 Spot Instances, which can help to reduce EC2 costs up to 90% compared to On-Demand Instance pricing. This post will guide you in the creation of a fault-tolerant cluster using AWS ParallelCluster. We will explain how to configure ParallelCluster to automatically unmount the Amazon FSx for Lustre filesystem and resubmit the interrupted jobs back into the queue in the case of Spot interruption events.