AWS HPC Blog

Getting the Best Price Performance for Numerical Weather Prediction Workloads on AWS

This post was contributed by Timothy Brown, Principal Solutions Architect, HPC, Karthik Raman, Principal Solutions Architect, HPC, Chethan Rao, Principal GTM Specialist, HPC, and Sean Smith, Senior Solutions Architect, HPC.

The global science community's ongoing need to accurately predict, plan for, and manage weather and climate has been a key driver of on-premises High-Performance Computing (HPC) advancements since the 1950s.

More recently, the economic and societal risks posed by extreme weather events and climate change have been fueling net new demand for higher resolution global forecasts (in both spatial and temporal domains) and on-demand regional weather predictions in industries like weather and climate, renewable energy, aerospace, maritime, agriculture, and even commodity trading. You can find out more about the challenges and opportunities in weather and climate science by reading the World Meteorological Organization (WMO) whitepaper.

In this post, we will provide an overview of Numerical Weather Prediction (NWP) workloads and the AWS HPC-optimized services for running them. We’ll test three popular NWP codes: WRF, MPAS, and FV3GFS.

By referring to the analysis and results presented in this blog, you will be able to evaluate the performance, cost, and price performance of running your NWP workloads on AWS HPC infrastructure.

Numerical Weather Prediction (NWP)

More commonly known as weather forecasting, NWP is a family of workloads that uses mathematical models to process current weather observations and predict future weather conditions, typically over time horizons of 24 hours, 48 hours, 5 days, or 10 days.

The NWP output is calculated based on current weather observations that include temperature, precipitation, and hundreds of other meteorological elements from the oceans to the top of the atmosphere. At its core, NWP models consist of a 3-dimensional grid of cells that represent the Earth system, with each cell defined by a set of multi-physics processes. The computational results of multi-physics processes modeled in each cell are passed to neighboring cells to model the exchange of matter and energy over time.

Two main components define the resolution of an NWP model. First, the grid cell size: this defines the spatial resolution of the model, most commonly in units of kilometers (km). When the size of the grid cells is reduced, the level of computational detail in the model increases. Second, the time step: this is the forecast time increment that defines the temporal resolution of the model. Time steps can range from seconds to years. Like grid cell size, the smaller the time step, the more detailed (and potentially more accurate) the results will be. The net effect of the ever-growing demand for higher resolution NWP workloads is greater demand for performant, elastic, and reliable HPC infrastructure.
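
As a rough illustration of how the time step drives computational cost (the numbers below are our own example values, not taken from any particular model), a 24-hour forecast with a 15-second time step already requires thousands of model steps:

FORECAST_HOURS=24        # example forecast length
TIMESTEP_SECONDS=15      # example model time step
echo $(( FORECAST_HOURS * 3600 / TIMESTEP_SECONDS ))   # 5760 model steps
# Halving the horizontal grid spacing roughly quadruples the number of grid
# columns, and stability constraints usually force a smaller time step too,
# so compute cost grows much faster than resolution.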

AWS HPC for NWP

NWP codes benefit from features like high memory bandwidth, a high-performance network interconnect, and access to a fast parallel file system, all of which support efficient scaling to a large number of nodes. In January 2022, AWS announced the Amazon EC2 Hpc6a instance family, which delivers 100 Gbps networking through the Elastic Fabric Adapter (EFA), 96 physical cores of third-generation AMD EPYC™ (Milan) processors, and 384 GB of RAM. These instances are powered by the AWS Nitro System, an advanced hypervisor technology that delivers the compute and memory resources required for increased performance and security.
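
As a quick sanity check on a compute node (not a required step, just a convenient way to confirm the setup), you can list the EFA provider through libfabric:

fi_info -p efa    # lists the EFA libfabric provider and its endpoints if EFA is configured correctly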

The following table (Table 1) summarizes the configuration of the EC2 instance type we used.

Table 1: Amazon EC2 instances used in this blog
Instance Type  | Processor      | No. of Physical Cores (per instance) | Memory (GiB) | EFA Network Bandwidth (Gbps)
Hpc6a.48xlarge | AMD EPYC Milan | 96                                   | 384          | 100

Creating an HPC Cluster on AWS and Key Performance Elements

All these benchmarks ran on clusters built with AWS ParallelCluster, an AWS-supported open-source cluster orchestration tool (we used version 3.1.1 for this post). Along with the EC2 instance type we mentioned, we also used Amazon FSx for Lustre, a fully-managed, high-performance Lustre file system offering up to hundreds of GB/s of throughput and sub-millisecond latencies for improved I/O performance.

We ran the tests with simultaneous multithreading disabled on the instance. You can see all the details of the solution components, along with step-by-step instructions to recreate this environment yourself, by going to our NWP Workshop.
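
To make the moving parts concrete, here is a minimal sketch of what the relevant pieces of an AWS ParallelCluster 3 configuration can look like for this setup: an Hpc6a queue with EFA enabled and simultaneous multithreading disabled, plus an FSx for Lustre file system mounted at /fsx. This is not the exact workshop template; the subnet IDs, key name, and storage capacity below are placeholders you would replace with your own values.

cat > nwp-cluster.yaml <<'EOF'
Region: us-east-2
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5a.xlarge
  Networking:
    SubnetId: subnet-0123456789abcdef0       # placeholder
  Ssh:
    KeyName: my-key-pair                     # placeholder
  Dcv:
    Enabled: true                            # remote desktop for visualization
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: hpc6a
      ComputeResources:
        - Name: hpc6a-48xl
          InstanceType: hpc6a.48xlarge
          MinCount: 0
          MaxCount: 128
          Efa:
            Enabled: true
          DisableSimultaneousMultithreading: true
      Networking:
        SubnetId: subnet-0123456789abcdef0   # placeholder
        PlacementGroup:
          Enabled: true
SharedStorage:
  - Name: fsx
    MountDir: /fsx
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200                  # GiB; placeholder sizing
      DeploymentType: SCRATCH_2
EOF
pcluster create-cluster --cluster-name nwp-cluster --cluster-configuration nwp-cluster.yaml

With DisableSimultaneousMultithreading set to true, Slurm schedules only the 96 physical cores per instance, which matches how we ran these benchmarks.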

There are two additional important components in the workshop that help you quickly create the HPC cluster and manage your application codes once the cluster is built: PCluster Manager and Spack.

PCluster Manager

To create clusters, view jobs, and access your infrastructure, we use PCluster Manager, which is a web UI for interacting with AWS ParallelCluster. This simplifies tasks such as mounting a pre-existing file system, debugging cluster failures, or connecting to the cluster. The UI is built using the AWS ParallelCluster 3 API, and utilizes a low-cost serverless architecture.

In the NWP workshop, we provide a template that plugs into PCluster Manager. This template builds a cluster with best practices in mind and optimizes the compute, networking, and file system for NWP workloads. We then connect to the cluster through AWS Systems Manager Session Manager, which provides browser-based interactive shell access to your cluster without opening an inbound SSH port. After running the job, we visualize the results on a remote graphics desktop using NICE DCV and the NCAR Command Language (NCL).

At the end of the workflow, we delete the cluster to cost-effectively manage our usage. All these steps are done through the PCluster Manager GUI. It’s a convenient way to manage and access your clusters, and it gives you greater visibility into what the cluster is doing without incurring any unnecessary AWS costs.
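
If you prefer the command line to the web UI, roughly the same lifecycle can be driven with the AWS ParallelCluster 3 CLI and AWS Systems Manager; the cluster name and instance ID below are placeholders:

pcluster describe-cluster --cluster-name nwp-cluster    # wait for the cluster to reach CREATE_COMPLETE
aws ssm start-session --target i-0123456789abcdef0      # shell on the head node without opening an SSH port
pcluster dcv-connect --cluster-name nwp-cluster         # open a NICE DCV remote desktop session
pcluster delete-cluster --cluster-name nwp-cluster      # tear everything down when you are done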

Spack

To manage the installation of different NWP codes, we use Spack, a package manager built for HPC workflows that lets users customize their software installations in detail. For example, the NWP workshop calls for WRF 4.3.3 compiled with the Intel compiler, Intel MPI, and an external libfabric (which supports EFA). You can specify all of these dependencies in a single Spack command:

spack install wrf@4.3.3%intel build_type=dm+sm ^intel-oneapi-mpi+external-libfabric

To speed up install times for the NWP codes, we provide a Spack binary cache for WRF, MPAS, and FV3GFS. This binary cache contains pre-built binaries, currently optimized for Amazon EC2 Hpc6a instances. Spack is a convenient way to install complex HPC codes along with their dependencies, and the binary cache cuts installation time from a few days down to a few hours.
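
As a sketch of how a binary cache is consumed (the mirror URL below is a placeholder; the workshop provides the actual cache location and signing keys), the workflow looks roughly like this:

spack mirror add nwp-cache https://example-bucket.s3.us-east-2.amazonaws.com/spack-cache   # placeholder URL
spack buildcache keys --install --trust   # trust the keys that signed the pre-built packages
spack install wrf@4.3.3%intel build_type=dm+sm ^intel-oneapi-mpi+external-libfabric        # now pulls from the cache where possible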

Scaling Up

It’s time to look at the scale-up performance and cost results across WRF, MPAS, and FV3GFS. We use two metrics in the charts, Simulation Speed and Cost per Simulation, and we define those as follows:

  • Simulation Speed = Forecast Time (sec) / Wall-clock Time (Compute + File I/O) (sec)
  • Cost per Simulation ($) = Wall-clock Time (hours) * EC2 On-Demand price per instance-hour (us-east-2 pricing) * number of instances (a worked example follows the note below).

Note that the Cost per Simulation does not include additional services, such as Amazon Elastic Block Store (Amazon EBS) and FSx for Lustre.
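
As a worked example of both metrics (the wall-clock time and hourly rate below are assumptions for illustration only, not measured results; check current us-east-2 On-Demand pricing):

FORECAST_SECONDS=10800    # a 3-hour forecast
WALL_CLOCK_SECONDS=1800   # assumed wall-clock time (compute + file I/O)
INSTANCES=32
PRICE_PER_HOUR=2.88       # assumed hpc6a.48xlarge On-Demand rate in us-east-2
echo "scale=1; $FORECAST_SECONDS / $WALL_CLOCK_SECONDS" | bc                      # Simulation Speed = 6.0
echo "scale=2; $WALL_CLOCK_SECONDS / 3600 * $PRICE_PER_HOUR * $INSTANCES" | bc    # Cost per Simulation = $46.08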

Weather Research and Forecasting (WRF)

WRF is an NWP system designed to serve both research and operational forecasting needs. The WRF model serves a wide range of meteorological applications across scales from meters to thousands of kilometers, and it is one of the most widely used NWP models in academia and industry, with over 48,000 registered users in more than 160 countries.

The benchmark case used for this blog is the University Corporation for Atmospheric Research (UCAR) CONUS 2.5 km dataset for WRFv4. We used the first 3 hours of this 6-hour, 2.5 km case covering the Continental U.S. (CONUS) domain from November 2019, with a 15-second time step and a total of around 90 million grid points. Note that in the past, WRFv3 was commonly benchmarked using a similar CONUS 2.5 km dataset; however, the WRFv3 benchmark datasets are not compatible with WRFv4.
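
To give a concrete sense of how such a run is launched on the cluster, here is a minimal Slurm batch script sketch. It assumes WRF was installed with Spack as shown earlier, and that wrf.exe and the CONUS 2.5 km input files have been staged into a run directory on the FSx for Lustre file system (the node count and paths are illustrative):

#!/bin/bash
#SBATCH --job-name=wrf-conus-2.5km
#SBATCH --nodes=32
#SBATCH --ntasks-per-node=96     # one MPI rank per physical core (SMT is disabled)
#SBATCH --exclusive

spack load wrf@4.3.3             # Spack-built WRF
spack load intel-oneapi-mpi      # Intel MPI built against the EFA-aware libfabric
ulimit -s unlimited              # WRF needs a large stack size

cd /fsx/conus_2.5km              # illustrative run directory on FSx for Lustre
mpirun ./wrf.exe                 # Intel MPI picks up the node list from the Slurm allocation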

When deciding on the best instance type for tightly-coupled workloads like WRF, there are several factors to consider, such as time to solution, cost per simulation, or a balance of both. Scale-up results across 1 to 128 instances are shown in Figure 1. Note that these WRF results reflect compute times (not wall-clock times), since we did not optimize for file I/O in these runs.

Based on the WRF CONUS 2.5 km results, the simulation scales well out to 128 instances. However, the lowest cost per simulation occurs at 32 instances, which also gives the best balance of cost versus performance, as shown in Figure 1c.

Figure 1: Performance of WRF v4.3.3 using the CONUS 2.5 km model: (a) simulation speed, (b) cost per simulation, (c) cost vs simulation speed.

Model for Prediction Across Scales (MPAS)

The Model for Prediction Across Scales (MPAS) Atmosphere model is a mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. It is developed and maintained by the National Center for Atmospheric Research (NCAR).

We tested the performance and scaling of MPAS in collaboration with DTN using a model of Hurricane Laura, a deadly and destructive Category 4 hurricane that made landfall in Louisiana on August 27, 2020. The model uses a 15 km global mesh, refined to a 3 km mesh over the Gulf of Mexico. As seen in Figure 2, we ran scale-up tests across 32 to 128 instances, measuring total wall-clock times (compute + file I/O) for a 6-hour forecast. Performance scales linearly out to 128 instances, with the minimum cost per simulation at 64 instances.

Figure 2: Performance of MPAS-A v7.1 using the Hurricane Laura model: (a) simulation speed, (b) cost per simulation, (c) cost vs simulation speed.

Finite Volume Cubed Global Forecasting System (FV3GFS)

In this section, we show price-performance results for the Unified Forecast System (UFS) atmospheric model, FV3GFS. The UFS is a community-based, coupled, comprehensive Earth modeling system whose numerical applications span local to global domains and predictive time scales from sub-hourly analyses to seasonal predictions. It is designed to support the National Weather Service’s (NWS) Weather Enterprise and to be the source system for NOAA’s operational numerical weather prediction applications. The UFS Weather Model is a prognostic model that can be used for short- and medium-range research and operational forecasts, as exemplified by its use in the operational Global Forecast System (GFS) of the National Oceanic and Atmospheric Administration (NOAA).

The component models currently used in the UFS are the Global Forecast System 15 (GFSv15) atmosphere, the Modular Ocean Model 6 (MOM6), the WAVEWATCH III wave model, the Los Alamos sea ice model 5 (CICE5), the Noah and Noah-MP land models, the Goddard Chemistry Aerosol Radiation and Transport (GOCART) aerosol model, the Ionosphere-Plasmasphere Electrodynamics (IPE) model, and the Advanced Circulation (ADCIRC) model for storm surge, tides, and coastal circulation.

We tested the performance and scaling of FV3GFS with a 10-day global forecast at a resolution of approximately 13 km (C768). As shown in Figure 3, we ran scale-up tests across 9 to 144 instances, measuring total wall-clock times (compute + file I/O). Performance scales linearly up to 64 instances and then tapers off as we scale out to 144 instances.

Figure 3: Performance of UFS v2.0.0 using the C768 (13 km) model: (a) simulation speed, (b) cost per simulation, (c) cost vs simulation speed.

Summary

In this post, we provided an overview of running NWP workloads on AWS and showed the scale-up performance and cost for running three popular NWP codes: WRF, MPAS, and FV3GFS.

We’ll keep evaluating the performance of NWP codes on AWS, because we know that software changes and Amazon EC2 is not standing still. Keep an eye out for future posts about running weather and climate codes on AWS, where we plan to cover topics like containerization and serverless architectures for these workflows.

If you are interested in getting started with running NWP workloads on AWS, head over to our NWP Workshop and try it yourself. For detailed HPC-specific best practices, based on the five pillars of the AWS Well-Architected Framework, download the whitepaper, High Performance Computing Lens.