Modeling clouds in the cloud for air pollution planning: 3 tips from LADCO on using HPC
In the spring of 2019, environmental modelers at the Lake Michigan Air Directors Consortium (LADCO) had a new problem to solve. Emerging research on air pollution along the shores of the Great Lakes in the United States showed that properly simulating pollution episodes in the region required running our models at a finer spatial resolution than our in-house high performance computing (HPC) cluster could handle. The LADCO modelers turned to AWS ParallelCluster to access the HPC resources needed to do this modeling faster and scale for our member states.
LADCO provides technical assistance to the states in the Great Lakes region on problems of urban- to regional-scale air quality. We use complex computer models of weather, emissions, and atmospheric chemistry to investigate the main drivers of air pollution in the region. We work with our member states—Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin—to use the models for exploring air pollution control programs pursuant to their clean air goals.
We entered into the work with AWS with a goal to create a model that could be used by all member states, but no clear path on how to meet that goal. We did have a few constraints: our infrastructure needed to be cost-effective, reliable, and secure. So we got to work developing and testing modeling platforms on the Amazon Elastic Compute Cloud (Amazon EC2).
After extensive prototyping and testing a weather modeling platform using Amazon EC2 Spot Instances, we configured the system to simulate atmospheric chemistry. LADCO then ran our first operational application on AWS for modeling ground level ozone in Chicago. We used the results of this simulation to support an Environmental Protection Agency (EPA) regulatory process called a State Implementation Plan (SIP).
We originally configured Message Passing Interface (MPI) clusters of eight c4.2xlarge On-Demand Instances (32 virtual CPUs/instance) and reduced the simulation time relative to our local compute server by over 90%. To optimize our costs, we then ran the model on C4 instances through Spot Instances, which offered us 75% cost savings over the On-Demand Instances but took a bit longer to complete because the jobs had to occasionally wait in the queue for Spot capacity to become available.
3 tips for using HPC resources
Based on our experience with weather and air quality modeling on AWS, here are three tips for flattening the learning curve for getting operational with HPC resources:
1. Use AWS ParallelCluster. Configuring and deploying custom HPC clusters is quick and simple with AWS ParallelCluster. It took a few weeks of experimenting with the AWS ParallelCluster configuration file to learn how to launch a new cluster, attach storage devices, and select the best instances for our jobs. Now it takes only minutes on AWS compared to days on our local servers to bring up a new cluster that has a fully functional modeling platform.
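To make this concrete, here is a minimal sketch of what such a configuration file might look like. This assumes the ParallelCluster 2.x INI format that was current in 2019; the key name, VPC and subnet IDs, and instance counts are placeholders, not LADCO's actual settings.

```ini
[aws]
aws_region_name = us-east-2

[cluster default]
key_name = my-keypair
scheduler = slurm
master_instance_type = c4.xlarge
compute_instance_type = c4.2xlarge
initial_queue_size = 0
max_queue_size = 8
; Request compute nodes from the Spot market instead of On-Demand
cluster_type = spot
vpc_settings = public
ebs_settings = modeling

[vpc public]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx

[ebs modeling]
; Shared volume mounted on every node of the cluster
shared_dir = /data
volume_size = 500
```

With a file like this in place, `pcluster create mycluster` brings up the head node and lets the scheduler scale compute nodes up to `max_queue_size` as jobs arrive.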
2. Use Amazon EBS Snapshots. Once we had an operational modeling system that included all of our codes, libraries, scripts, and executables on an Amazon Elastic Block Store (Amazon EBS) volume, we created a snapshot of that volume. Starting a new volume from the snapshot and attaching the volume to the instances we launched through AWS ParallelCluster allows us to have an operational modeling system up and running in minutes with no need to go through a software installation and compilation process. This approach can also work for input data volumes, but make sure to monitor the snapshot storage costs.
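The snapshot-and-restore cycle can be sketched with the AWS CLI as follows. All of the volume, snapshot, and instance IDs below are placeholders for illustration, and the Availability Zone must match the one your cluster runs in.

```shell
# Snapshot the volume that holds codes, libraries, scripts, and executables
aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "Modeling platform: codes, libraries, scripts"

# Later, create a fresh volume from that snapshot in the cluster's AZ
aws ec2 create-volume \
    --snapshot-id snap-0123456789abcdef0 \
    --availability-zone us-east-2a \
    --volume-type gp2

# Attach the restored volume to the cluster's head node
aws ec2 attach-volume \
    --volume-id vol-0fedcba9876543210 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf
```

ParallelCluster can also do the restore step for you: in the 2.x config format, setting `ebs_snapshot_id` in an `[ebs]` section tells it to build the shared volume from your snapshot at cluster launch.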
3. Rethink data management. LADCO’s traditional data management paradigm was to save everything and to buy more disks. As AWS has different costs for different types of data storage, you can take advantage of inexpensive, long-term storage options in the cloud. For example, after we complete a modeling run, we post-process the results and move the raw model outputs to a less expensive Amazon Simple Storage Service (Amazon S3) storage class. All of this workflow is automated through a combination of traditional modeling scripts and the AWS Command Line Interface (CLI).
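An archival step of this kind might look like the following sketch. The run directory and bucket name are hypothetical, and the workflow is an assumption about how such a script could be structured, not LADCO's actual scripts.

```shell
#!/bin/bash
# Hypothetical post-run archival step; paths and bucket are placeholders.
RUN_DIR=/data/output/run_2017_ozone
ARCHIVE=s3://example-modeling-archive/run_2017_ozone

# Copy raw model outputs to a cheaper S3 storage class
aws s3 cp "$RUN_DIR" "$ARCHIVE/" --recursive --storage-class STANDARD_IA

# Free local disk only after the upload is confirmed present
aws s3 ls "$ARCHIVE/" > /dev/null && rm -rf "$RUN_DIR"
```

An S3 Lifecycle rule can then transition the archived objects to Glacier after a set number of days, pushing long-term storage costs down further.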
LADCO’s experience with AWS HPC resources has been overwhelmingly positive. The flexibility and elasticity provided by the AWS Cloud to provision computing and storage, modify existing operational resources, optimize cost performance, and track costs in real time led to a paradigm shift in how we plan and develop modeling applications. We are no longer planning around constraints or bottlenecks on our local cluster. The AWS ParallelCluster capability to customize clusters and deploy inexpensive compute nodes on Spot Instances only when we need them meets the computing needs of our organization. The AWS ParallelCluster-Spot Instance combination is now a demonstrated winner for us, and it provides massive potential for running computationally intensive simulations.
LADCO is now working with state air pollution planning agencies in our region and across the country, and with U.S. EPA, to help integrate similar cloud computing solutions into their own environmental modeling work.