AWS Public Sector Blog

What if swapping your weather model was boring? How dynamical.org is making AI weather forecasting accessible on AWS

As part of the Registry of Open Data on AWS, AWS invited Marshall Moutenot of dynamical.org to share how their team is making operational weather forecasts easier to use for researchers, developers, and downstream applications.

Weather data can be hard to use

I’ve spent the last 10 years forecasting rivers. As you can imagine, weather (past, present, and future) plays an outsized role in how much water is going to flow through a river tomorrow. It also represented an outsized portion of the infrastructure our engineering team had to build.

To train our deep learning foundational hydrology models, we needed to process and store not just the latest forecast, but a vast historical archive in our own cloud. Integrating a new weather model required a big processing effort, which in some cases would take months of arduous backfill (tape data access is slow!).

Integrating new weather models was a primary driver of performance improvements, especially as we expanded globally—but the ingestion effort for each new model was painful enough that it was sometimes hard to justify before we knew the benefits.

With the relatively recent proliferation of AI weather forecasts, the urgency to speed up experimentation was high. That’s when the idea took shape.

In this post, I share how dynamical.org is making weather data products, including AI weather forecasts, accessible on AWS.

It doesn’t have to be hard

We weren’t alone. Talking to organizations ranging from peer startups to giant energy utilities, we realized the challenge of integrating new weather models was widespread. Ease of access was a bottleneck. If we were going to solve it for ourselves, we might as well solve it for everyone.

So we started dynamical.org, a not-for-profit with the mission to advance humanity’s ability to access, understand, and act on accurate weather and climate data. The most useful weather datasets in the world should be as simple to open as any other modern dataset.

No data format spelunking, no renaming files, no parameter name bingo, no reprojecting, and most importantly: no more manually creating a separate ingest pipeline for every model you want to try.

The catalog

dynamical.org hosts a set of the world’s most widely used operational weather forecast models on AWS, each in the same cloud-optimized Icechunk 2.0 Zarr format on Amazon Simple Storage Service (Amazon S3) in the Registry of Open Data on AWS.

Each of these comes from a different modeling center, with its own metadata conventions, variable naming, release cadence, and (in the case of ECMWF) its own opinions about European geography. When we re-host them, we normalize: same chunking, same variable names where possible, same coordinate conventions, same access pattern. We create full-history analysis versions of the forecast archives. The goal is that once you have learned how to read one of our datasets, you have learned how to read all of them.
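
To give a feel for what this kind of normalization involves, here is a minimal sketch in plain Python. It is not dynamical.org's pipeline: the variable aliases, the shared target names, and the choice of a -180 to 180 longitude convention are all assumptions made up for illustration.

```python
# Illustrative only: these source names and the shared target names are
# assumptions, not dynamical.org's actual schema.
VARIABLE_ALIASES = {
    "2t": "temperature_2m",
    "t2m": "temperature_2m",
    "tp": "precipitation_surface",
    "apcp": "precipitation_surface",
}


def normalize_longitude(lon_deg: float) -> float:
    """Map a longitude in degrees onto the half-open range [-180, 180)."""
    return ((lon_deg + 180.0) % 360.0) - 180.0


def normalize_record(record: dict) -> dict:
    """Return a copy of one grid-point record using shared names and coordinates."""
    normalized = {}
    for name, value in record.items():
        if name == "longitude":
            normalized[name] = normalize_longitude(value)
        else:
            # Names without an alias pass through unchanged.
            normalized[VARIABLE_ALIASES.get(name, name)] = value
    return normalized


# Two hypothetical sources describing the same location in different conventions.
model_a = {"longitude": 273.25, "latitude": 36.0, "2t": 4.5}
model_b = {"longitude": -86.75, "latitude": 36.0, "t2m": 5.1}

# After normalization both records expose the same keys and longitude convention.
print(normalize_record(model_a))
print(normalize_record(model_b))
```

Once every source passes through a step like this, downstream code only ever needs to know one set of names and one coordinate convention.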

AI weather with a few keystrokes

The latest addition to our catalog, now available on the Registry of Open Data on AWS, marks an exciting milestone. Our existing suite is made up of the most popular traditional numerical weather prediction models. ECMWF’s Artificial Intelligence Forecasting System (AIFS) Single is our first AI weather model, which means that, for the first time, you can line up physics-based and AI-based forecasts side by side with just a few keystrokes.

The proliferation of AI weather models is, in our opinion, one of the most exciting advancements in weather forecasting of the last decade. Rather than numerically integrating the equations of atmospheric motion on a supercomputer for several hours, AIFS produces a full global forecast in minutes on a single GPU using a neural network trained on decades of reanalysis data. It is competitive with (and on some metrics, frequently ahead of!) the best operational physics-based models.

In the dynamical.org catalog, the AIFS Single dataset looks like the following table.

Spatial Domain: Global
Spatial Resolution: 0.25° (~28 km)
Time Domain: Forecasts initialized from ECMWF operational runs to present
Forecast Domain: 0–15 days
Forecast Resolution: 6-hourly
Format: Icechunk 2 Zarr

And in code, it looks like the following example:

import dynamical_catalog

# Physics-based forecast
gfs = dynamical_catalog.open("noaa-gfs-forecast")

# AI-based forecast
aifs = dynamical_catalog.open("ecmwf-aifs-single-forecast")

Boring, right? Well, that’s the point. We wanted the implementation details of ingesting and aligning the data products to vanish into the proverbial abstraction that “just works.”
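
For a sense of scale, the resolution figures in the table above can be turned into rough array dimensions. This is a back-of-the-envelope sketch, not a description of the dataset's actual shape: it assumes the grid includes a row at each pole and that both the 0-hour and the final lead time are stored.

```python
EARTH_EQUATORIAL_CIRCUMFERENCE_KM = 40_075.0


def grid_shape(resolution_deg: float) -> tuple[int, int]:
    """Return (n_lat, n_lon) for a regular global latitude/longitude grid."""
    n_lon = round(360 / resolution_deg)
    # Assumption: latitude rows at both -90 and +90, hence the extra row.
    n_lat = round(180 / resolution_deg) + 1
    return n_lat, n_lon


def lead_time_count(max_days: int, step_hours: int) -> int:
    """Count lead times from hour 0 through max_days, both ends included."""
    return (max_days * 24) // step_hours + 1


def equatorial_spacing_km(resolution_deg: float) -> float:
    """Approximate east-west distance between grid points at the equator."""
    return EARTH_EQUATORIAL_CIRCUMFERENCE_KM * resolution_deg / 360


n_lat, n_lon = grid_shape(0.25)
n_leads = lead_time_count(max_days=15, step_hours=6)

print(f"grid: {n_lat} x {n_lon} points")
print(f"lead times per forecast: {n_leads}")
print(f"spacing at the equator: {equatorial_spacing_km(0.25):.1f} km")
print(f"values per variable per forecast: {n_lat * n_lon * n_leads:,}")
```

Under these assumptions, a single variable from a single forecast already holds tens of millions of values, which is why chunked, cloud-optimized storage matters for this kind of data.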

Get your hands dirty

To make all of the above concrete, we’ve published a Python notebook that walks through opening different weather models and comparing their outputs.

You can open it in Amazon SageMaker AI, or in Amazon SageMaker Studio Lab if you don’t have an AWS account. We’ve also listed it on the dataset’s Registry of Open Data on AWS page under Tutorials.

In the notebook, we compute heating degree days at Nashville International Airport from both NOAA GFS and ECMWF AIFS, then compare both against hourly ASOS observations. The downstream analysis code is the same in both cases! Swapping in the AI model becomes trivial.

Figure 1: Charts showing 1-day, 3-day, and 5-day lead forecasts comparing GFS and AIFS models against observed heating degree days
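
The heating degree day comparison described above can be sketched in a few lines of plain Python. This is not the notebook's code: the 18 °C base temperature, the daily-mean method, and the sample temperatures are assumptions chosen for illustration (a 65 °F base, about 18.3 °C, is also common). What it shows is that one function serves every temperature source unchanged.

```python
from collections import defaultdict
from datetime import datetime

BASE_TEMP_C = 18.0  # assumed base temperature for this sketch


def heating_degree_days(samples, base_c=BASE_TEMP_C):
    """Compute heating degree days per calendar day.

    samples: iterable of (datetime, temperature in degrees Celsius).
    Returns {date: max(0, base_c - daily mean temperature)}.
    """
    temps_by_day = defaultdict(list)
    for timestamp, temp_c in samples:
        temps_by_day[timestamp.date()].append(temp_c)
    return {
        day: max(0.0, base_c - sum(temps) / len(temps))
        for day, temps in sorted(temps_by_day.items())
    }


def six_hourly(day, temps):
    """Pair four temperatures with 00, 06, 12, and 18 UTC on the given day."""
    return [(datetime(2025, 1, day, hour), t) for hour, t in zip((0, 6, 12, 18), temps)]


# Made-up 6-hourly temperatures standing in for two forecast sources.
physics_model = six_hourly(1, [8.0, 10.0, 12.0, 10.0]) + six_hourly(2, [19.0, 21.0, 22.0, 18.0])
ai_model = six_hourly(1, [7.0, 9.0, 13.0, 11.0]) + six_hourly(2, [18.0, 20.0, 23.0, 19.0])

# The downstream analysis is the same call regardless of the source.
hdd_physics = heating_degree_days(physics_model)
hdd_ai = heating_degree_days(ai_model)

for day in hdd_physics:
    print(day, hdd_physics[day], hdd_ai[day])
```

Swapping in another model means swapping the input series; the analysis function does not change.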

Start building

The full catalog (GFS, GEFS, HRRR, IFS ENS, MRMS, and now AIFS Single) is available on the Registry of Open Data on AWS through the AWS Open Data Sponsorship Program.

You can read the data directly from any compute environment that can reach Amazon S3, whether that is SageMaker, your laptop, or your cluster.

If you build something interesting, we’d love to hear about it—reach out at feedback@dynamical.org.

Thanks to Chris Stoner and the AWS Open Data team for the sponsorship and collaboration, and to the contributors and supporters of dynamical.org!

Enjoy the weather.

—MM, co-founder, dynamical.org

Marshall Moutenot

Marshall Moutenot is the co-founder of dynamical.org and CEO of Upstream Tech. Marshall likes computers, computing, and making things that feel a little magical.