Making weather forecasts more accessible using serverless infrastructure and open data on AWS

As part of the Registry of Open Data on AWS, Amazon Web Services (AWS) invited Alexander Rey, creator of Pirate Weather, to share how AWS technologies and open data are supporting his efforts to provide a no-cost, open weather forecast API.

Weather is everywhere

For everything from farming to picnic planning, weather data is valuable for a wide range of applications. Weather forecasts are central to making informed decisions as extreme weather events become more common.

Where do weather forecasts come from? While there are many data sources available, most forecasts rely on the same government-run models. Similar to how cities keep track of roads and addresses and share the data with companies to make usable maps, agencies like the National Oceanic and Atmospheric Administration (NOAA) run computer simulations to predict the weather, providing the raw data that drives most weather services.

I created Pirate Weather, an open weather API built using AWS serverless tools and the Registry of Open Data on AWS, to move weather forecasts from weather models to a no-cost, open, and documented API.

Starting from the source: Open data on AWS

Since a weather API needs source data, finding forecast data was the first step. The Registry of Open Data on AWS, in collaboration with the NOAA Open Data Dissemination (NODD) Program, makes most of NOAA’s forecast models available at no cost on AWS. It was simple to find out what was available and integrate it into a serverless architecture. To create a comprehensive, global, and high-resolution weather API, the solution uses four different datasets: the NOAA High-Resolution Rapid Refresh (HRRR) model, the NOAA Global Forecast System (GFS), the NOAA Global Ensemble Forecast System (GEFS), and the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 Reanalysis (provided by Intertrust). Each dataset provides different parameters for different areas, and together they cover everything that is required for a comprehensive forecast.

The HRRR model provides high-resolution (3 km) forecasts every hour for a 48-hour period over the US and southern Canada, offering detailed short-term weather data. This dataset includes most of the important variables in a weather forecast, like temperature, humidity, and wind speed. Since it provides results in 15-minute intervals, a minute-by-minute forecast of precipitation (rainfall or snowfall) can be estimated, which is useful for tracking weather like fast-approaching storms. Outside of the US and Canada, or for longer-range forecasts, the GFS model provides the same comprehensive range of parameters, but at a lower resolution. Some specialized parameters (such as the UV index) are only available as part of the GFS model, so these are used globally.
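
As a rough illustration of how 15-minute model output could be turned into a minute-by-minute series, here is a minimal sketch that uses plain linear interpolation. This is an assumption for illustration only: the method Pirate Weather actually uses is not described here, and the function name and sample values are hypothetical.

```python
def interpolate_to_minutes(values_15min):
    """Linearly interpolate values spaced 15 minutes apart to 1-minute steps.

    Assumption: linear interpolation between neighbouring model values.
    The real service may use a different method.
    """
    if len(values_15min) < 2:
        return list(values_15min)

    minutes = []
    # Walk over each pair of neighbouring 15-minute values.
    for start, end in zip(values_15min, values_15min[1:]):
        for step in range(15):
            minutes.append(start + (end - start) * step / 15)
    # Include the final 15-minute value itself.
    minutes.append(values_15min[-1])
    return minutes


# Hypothetical precipitation intensities (mm/h) at 0, 15, and 30 minutes.
rates = [0.0, 1.5, 3.0]
per_minute = interpolate_to_minutes(rates)
```

Three 15-minute values cover a 30-minute window, so the result holds 31 one-minute values, with the original model values preserved at minutes 0, 15, and 30.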

One very important, but challenging to calculate, forecast parameter is the Probability of Precipitation, or POP. This number describes the chance that it will rain or snow; however, since the HRRR and GFS models only run one simulation, a statistical estimate of precipitation likelihood cannot be calculated. The GEFS model addresses this issue, since this model runs 21 simulations at the same time and can produce a probability that it will rain or snow at a given location, depending on how many of its simulations show precipitation.
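
The idea of deriving a probability from ensemble members can be sketched in a few lines. The example below counts the fraction of members that show precipitation at one location and time. The 0.1 mm threshold for counting a member as "wet", the function name, and the sample values are assumptions for illustration, not details taken from GEFS or Pirate Weather.

```python
def probability_of_precipitation(member_precip_mm, threshold_mm=0.1):
    """Return the fraction of ensemble members with precipitation.

    member_precip_mm: precipitation amount from each ensemble member
                      at one location and time.
    threshold_mm:     hypothetical cutoff above which a member counts
                      as showing precipitation.
    """
    if not member_precip_mm:
        raise ValueError("at least one ensemble member is required")

    wet_members = sum(1 for p in member_precip_mm if p >= threshold_mm)
    return wet_members / len(member_precip_mm)


# Hypothetical output from 21 ensemble members: 7 wet, 14 dry.
members = [0.0] * 14 + [0.4, 1.2, 0.3, 2.5, 0.8, 0.2, 1.0]
pop = probability_of_precipitation(members)
```

With 7 of 21 members showing precipitation, the estimate is one in three. A single deterministic run, as with HRRR or GFS, would only ever yield 0 or 1 under this scheme, which is why an ensemble is needed.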

Finally, accessing historic weather is an important part of many weather APIs. However, since the NOAA models are designed for weather forecasts, they do not capture what the weather actually was. Answering that question requires another type of model, called a reanalysis, which simulates what happened in the past. This is provided on the Registry of Open Data on AWS by the ERA5 dataset. For more details on the datasets used in this service, check out the Data Sources section of the Pirate Weather Documentation.

Figure 1: Pirate Weather architecture.

Data ingest and processing

Most of the NOAA datasets are stored on Amazon Simple Storage Service (Amazon S3) in an archival format called GRIB, which is useful for storing model results but can be challenging to read. To address this, Pirate Weather uses a combination of AWS serverless tools to read, process, and save these files every hour, when new forecasts become available:

Amazon EventBridge triggers an AWS Step Functions state machine to start the processing pipeline.
The state machine pulls a public Docker image from the Amazon Elastic Container Registry (Amazon ECR) containing Python and wgrib2, a tool for processing GRIB files.
This image then starts on Amazon Elastic Container Service (Amazon ECS) using AWS Fargate, copies the model results from their public Amazon S3 bucket, and runs an ingest script.
The script extracts the important variables, performs some processing to correct wind directions, and saves them as a chunked NetCDF file on Amazon Elastic File System (Amazon EFS).

This pipeline quickly and automatically takes NOAA’s model results and turns them into simple-to-access files. AWS Step Functions provides orchestration, restarting the pipeline if it crashes, letting multiple copies run at the same time, and eliminating the need to manage servers. For more information on this workflow, check out the data pipeline section of the Pirate Weather Documentation.
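
The exact wind correction applied by the ingest script is not described here. As background, weather models typically store wind as two components (u, toward the east, and v, toward the north), and a common processing step is converting these into a speed and a compass direction. The sketch below shows only that generic conversion; the function name is hypothetical, and it should not be read as the script's actual logic.

```python
import math


def wind_from_components(u, v):
    """Convert u/v wind components to speed and meteorological direction.

    u: eastward component (positive when air moves toward the east)
    v: northward component (positive when air moves toward the north)

    Returns (speed, direction), where direction is the compass bearing
    the wind is blowing FROM, in degrees (0 = north, 90 = east).
    """
    speed = math.hypot(u, v)
    # Negating both components flips "blowing toward" into "coming from";
    # atan2(x, y) measures the angle clockwise from north.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# Air moving toward the east is a westerly wind (from 270 degrees).
westerly = wind_from_components(10.0, 0.0)
# Air moving toward the southwest is a northeasterly wind (from 45 degrees).
northeasterly = wind_from_components(-5.0, -5.0)
```

The meteorological convention reports where the wind comes from, not where it is going, which is why both components are negated before taking the angle.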

To try this process out, an Amazon SageMaker Studio Lab notebook is available from GitHub. This Jupyter Notebook follows the data processing steps to go from GRIB files to chunked NetCDF files for fast data retrieval.

Models in, forecasts out

With the model results processed and saved on Amazon EFS, the Pirate Weather API can serve forecasts. When a forecast is requested, Amazon API Gateway receives the request, checks that the requester hasn’t exceeded their quota for the month, and then starts an AWS Lambda Python function. This function quickly reads the model results at the nearest point from Amazon EFS, calculates a few additional parameters (such as wind chill and humidex), formats the data into a Dark Sky-compatible JSON structure, and passes it back to API Gateway. This reply is then sent back to the requester, providing an hourly seven-day forecast in just under a second.
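
Two of the steps in this request path, finding the nearest grid point and deriving wind chill, can be sketched as follows. This is not the Lambda function's actual code. The grid layout (0.25 degree spacing, latitudes from 90 down to -90, longitudes from 0 up to 359.75) and the function names are assumptions for illustration. The wind chill formula is the standard metric wind chill index used by Environment Canada and the US National Weather Service.

```python
def nearest_index(value, start, step, count):
    """Index of the grid coordinate closest to `value` on a regular axis.

    start: coordinate of index 0; step: spacing (may be negative);
    count: number of points along the axis.
    """
    index = round((value - start) / step)
    # Clamp so requests at the edge never fall off the grid.
    return max(0, min(count - 1, index))


def wind_chill_c(temp_c, wind_kmh):
    """Wind chill in deg C from air temperature (deg C) and wind (km/h).

    The index is only defined for temperatures at or below 10 deg C and
    wind speeds above 4.8 km/h; otherwise the air temperature is returned.
    """
    if temp_c > 10.0 or wind_kmh <= 4.8:
        return temp_c
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v


# Hypothetical request location, on an assumed 0.25 degree global grid.
lat, lon = 45.42, -75.69
row = nearest_index(lat, start=90.0, step=-0.25, count=721)
col = nearest_index(lon % 360.0, start=0.0, step=0.25, count=1440)

# Hypothetical model values read at that grid point.
feels_like = wind_chill_c(temp_c=-10.0, wind_kmh=20.0)
```

Because the grid is regular, the lookup is simple arithmetic rather than a search, which fits the goal of answering a request in under a second.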

While there are many steps to make this happen, the combination of the Registry of Open Data on AWS and AWS serverless tools is what makes all of this possible. This pairing allows large amounts of data to be provided in a simple-to-use format, ready for development or research.

The datasets used by this project can be found in the Registry of Open Data on AWS, which contains over 100 PB of data for you to use in your applications.

Learn more about open data on AWS and how you can get started with serverless technology on AWS.

AWS Public Sector Blog
