AWS HPC Blog

Category: AWS ParallelCluster

How to manage HPC jobs using a serverless API

HPC systems are traditionally accessed through a Command Line Interface (CLI), where users submit and manage their computational jobs. Depending on their experience and sophistication, the CLI can be a daunting experience for users not accustomed to using it. Fortunately, the cloud offers many other options for users to submit and manage their computational jobs. In this blog post we cover how to create a serverless API to interact with an HPC system in the cloud built with AWS ParallelCluster.
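As a rough illustration of the pattern the post describes, the sketch below submits a job through a hypothetical serverless REST endpoint (for example, Amazon API Gateway in front of the cluster). The endpoint URL, resource path, and payload fields are assumptions for illustration, not the API defined in the post.

```python
# Minimal sketch: submitting an HPC job through a serverless REST API.
# The endpoint URL, resource path, and JSON fields below are hypothetical
# placeholders -- the actual API shape is defined by the deployment in the post.
import requests

API_ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod"  # placeholder

def submit_job(script_path: str, partition: str = "compute") -> dict:
    """POST a job submission request to the (hypothetical) serverless API."""
    with open(script_path) as f:
        payload = {"partition": partition, "script": f.read()}
    resp = requests.post(f"{API_ENDPOINT}/jobs", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. {"job_id": "42"} in this sketch

if __name__ == "__main__":
    print(submit_job("hello.sbatch"))
```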

Figure 1. Architecture of Slurm and user workflows, demonstrating two methods of interacting with Slurm. In the first method, the user accesses the Head Node via SSH and runs Slurm client commands like sinfo, squeue, sbatch, and scontrol. In the second method, the user issues REST API calls over HTTP to slurmrestd.

Using the Slurm REST API to integrate with distributed architectures on AWS

The Slurm Workload Manager by SchedMD is a popular HPC scheduler and is supported by AWS ParallelCluster, AWS's open source tool for deploying and managing elastic HPC clusters. Traditional HPC workflows involve logging into a head node and running shell commands to submit jobs to a scheduler and check job status. Modern distributed systems often use representational […]
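For a sense of what talking to Slurm over HTTP looks like, here is a hedged sketch against slurmrestd. The API version in the URL path (v0.0.36 here) and some payload field names vary across Slurm releases, and the host, port, and JWT token are placeholders.

```python
# Minimal sketch: querying and submitting jobs through slurmrestd over HTTP.
# The API version path, host/port, and token are placeholders; the exact
# submit schema depends on the Slurm release running on the cluster.
import requests

BASE = "http://slurmrestd.example.internal:6820"        # placeholder endpoint
HEADERS = {
    "X-SLURM-USER-NAME": "ec2-user",                     # authenticated user
    "X-SLURM-USER-TOKEN": "<jwt-token>",                 # e.g. from `scontrol token`
}

# List jobs currently known to the controller.
jobs = requests.get(f"{BASE}/slurm/v0.0.36/jobs", headers=HEADERS, timeout=30)
jobs.raise_for_status()

# Submit a simple batch job.
payload = {
    "job": {
        "name": "hello",
        "partition": "queue1",
        "current_working_directory": "/home/ec2-user",
        "environment": {"PATH": "/usr/bin:/bin"},
    },
    "script": "#!/bin/bash\nsrun hostname\n",
}
submit = requests.post(f"{BASE}/slurm/v0.0.36/job/submit",
                       headers=HEADERS, json=payload, timeout=30)
print(submit.json())
```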

Deep dive into the AWS ParallelCluster 3 configuration file

In September, we announced the release of AWS ParallelCluster 3, a major release with lots of changes and new features. To help get you started migrating your clusters, we provided the Moving from AWS ParallelCluster 2.x to 3.x guide. We know moving versions can be quite an undertaking, so we're augmenting that official documentation with additional color and context on a few key areas. In this blog post, we'll focus on the configuration file format changes in ParallelCluster 3, and how they map back to the same configuration sections in ParallelCluster 2.
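To give a flavor of the new YAML-based format, the sketch below parses a minimal, hypothetical ParallelCluster 3 style configuration. The instance types, subnet ID, key name, and queue names are illustrative placeholders only, not a recommended setup; refer to the migration guide for the authoritative schema.

```python
# Illustrative only: a minimal ParallelCluster 3 style configuration embedded
# as YAML and parsed with PyYAML. All values are placeholders.
import yaml  # pip install pyyaml

CONFIG_YAML = """
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5.xlarge
  Networking:
    SubnetId: subnet-0123456789abcdef0   # placeholder
  Ssh:
    KeyName: my-key                      # placeholder
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: compute
          InstanceType: c5.2xlarge
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0     # placeholder
"""

config = yaml.safe_load(CONFIG_YAML)
print(config["Scheduling"]["SlurmQueues"][0]["Name"])  # -> "queue1"
```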

Figure 4: Relative price-to-performance ratio ($USD/ns) while scaling the simulation across single and multi-GPU instances, compared with the price-to-performance of CPU (EFA-enabled) instances as the baseline.

Running GROMACS on GPU instances: multi-node price-performance

This three-part series of posts covers the price-performance characteristics of running GROMACS on Amazon Elastic Compute Cloud (Amazon EC2) GPU instances. Part 1 covered some background on GROMACS and how it utilizes GPUs for acceleration. Part 2 covered the price-performance of GROMACS on a particular GPU instance family running on a single instance. […]
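The $USD/ns metric shown in Figure 4 can be reproduced from an instance's hourly price and the simulation throughput it achieves. The sketch below shows the arithmetic; the prices and throughputs are made-up placeholders, not benchmark results from the posts.

```python
# Back-of-the-envelope sketch of the $USD/ns metric: cost per simulated
# nanosecond from an hourly instance price and measured throughput.
# The prices and throughputs below are placeholders, not benchmark results.
def usd_per_ns(hourly_price_usd: float, ns_per_day: float, num_instances: int = 1) -> float:
    """Dollars spent per simulated nanosecond for a given instance count."""
    total_hourly_cost = hourly_price_usd * num_instances
    ns_per_hour = ns_per_day / 24.0
    return total_hourly_cost / ns_per_hour

# Hypothetical comparison of a single GPU instance vs. a multi-instance CPU run.
print(usd_per_ns(hourly_price_usd=3.06, ns_per_day=40.0))                   # GPU, 1 instance
print(usd_per_ns(hourly_price_usd=1.53, ns_per_day=25.0, num_instances=4))  # CPU, 4 instances
```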

Running the Harmonie numerical weather prediction model on AWS

The Danish Meteorological Institute (DMI) is responsible for running atmospheric, climate, and ocean models covering the Kingdom of Denmark. We worked together with DMI to port and run a full numerical weather prediction (NWP) cycling dataflow with the Harmonie NWP model on AWS. You can find a report of the porting and operational experience in the ACCORD community newsletter. In this blog post, we expand on that report to present initial timing results from running the forecast component of the Harmonie model on AWS. We also present these as-is timing results alongside as-is timings obtained on Cray XC40 and Intel Xeon-based Cray XC50 supercomputing systems.

Cost-optimization on Spot Instances using checkpoint for Ansys LS-DYNA

A major portion of the costs incurred for running Finite Element Analysis (FEA) workloads on AWS comes from the usage of Amazon EC2 instances. Amazon EC2 Spot Instances offer a cost-effective architectural choice, allowing you to take advantage of unused EC2 capacity for up to a 90% discount compared to On-Demand Instance prices. In this post, we describe how you can run fault-tolerant FEA workloads on Spot Instances using Ansys LS-DYNA's checkpointing and auto-restart utility.
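A key building block for this kind of fault tolerance is reacting to the Spot interruption notice before the instance is reclaimed. The sketch below polls the EC2 instance metadata service (IMDSv2) for the interruption notice; how the checkpoint is actually triggered (here a placeholder sense-switch file, one common LS-DYNA convention) depends on the solver setup described in the post.

```python
# Minimal sketch: poll IMDSv2 for a Spot interruption notice and request a
# checkpoint before the instance is reclaimed. The checkpoint trigger below
# (a sense-switch file at a placeholder path) is an assumption for illustration.
import time
import pathlib
import requests

IMDS = "http://169.254.169.254"

def imds_token() -> str:
    """Fetch a short-lived IMDSv2 session token."""
    resp = requests.put(f"{IMDS}/latest/api/token",
                        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
                        timeout=2)
    resp.raise_for_status()
    return resp.text

def interruption_pending(token: str) -> bool:
    """True if a Spot interruption notice has been issued (404 means none yet)."""
    resp = requests.get(f"{IMDS}/latest/meta-data/spot/instance-action",
                        headers={"X-aws-ec2-metadata-token": token},
                        timeout=2)
    return resp.status_code == 200

if __name__ == "__main__":
    token = imds_token()
    while not interruption_pending(token):
        time.sleep(5)
    # Placeholder checkpoint trigger: ask the running solver to dump a restart file.
    pathlib.Path("/scratch/job/d3kil").write_text("sw1.\n")  # hypothetical path
    print("Spot interruption notice received; checkpoint requested.")
```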

Quantum Chemistry Calculation with FHI-aims code on AWS

This article was contributed by Dr. Fabio Baruffa, Sr. HPC and QC Solutions Architect at AWS, and Dr. Jesús Pérez Ríos, Group Leader at the Fritz Haber Institute, Max-Planck Society. Quantum chemistry – the study of the inherently quantum interactions between atoms forming part of molecules – is a cornerstone of modern chemistry. […]

New: Introducing AWS ParallelCluster 3

Running HPC workloads like computational fluid dynamics (CFD), molecular dynamics, or weather forecasting typically involves a lot of moving parts. You need hundreds or thousands of compute cores, a job scheduler for keeping them fed, a shared file system that’s tuned for throughput or IOPS (or both), loads of libraries, a fast network, and […]

Supporting climate model simulations to accelerate climate science

Through the Amazon Sustainability Data Initiative (ASDI), AWS is donating cloud resources, technical support, and access to scalable infrastructure and fast networking that provide high performance computing solutions to support simulations of near-term climate using the National Center for Atmospheric Research (NCAR) Community Earth System Model Version 2 (CESM2) and its Whole Atmosphere Community Climate Model (WACCM). In collaboration with ASDI, AWS, and SilverLining, a nonprofit dedicated to ensuring a safe climate, NCAR will run an ensemble of 30 climate-model simulations on AWS. The climate runs will simulate the Earth system over the years 2022-2070 under a median warming scenario, and the results will be made available through the AWS Open Data Program. The simulation work will demonstrate the ability of cloud infrastructure to advance climate models in support of robust scientific studies by researchers around the world, and aims to accelerate and democratize climate science.