AWS Public Sector Blog

The ERA5 Reanalysis Dataset Provides a Sharper View on Past Weather

Reanalysis is the term for using modern-day technology to analyze weather data from the past. By doing so, meteorologists and climatologists can produce a more accurate analysis of previous weather conditions, which is important for climate change research.

The European Centre of Medium Range Forecasts (ECMWF) is producing its latest reanalysis dataset, called ERA5. Recently, Chris Kalima and his team at Intertrust, in conjunction with the AWS Public Datasets Program, have been working to bring the ERA5 data to AWS.

Chris is the Director of Product Management for Intertrust’s secure data platform and is directly involved with the development of the Planet OS Datahub. Planet OS provides easy access to earth science data, particularly for application developers and solution providers who may not be comfortable with scientific data formats and protocols.

Read on to understand more about reanalysis, why ERA5 is special, and what it can be used for.

What is the ERA5 dataset and why is it special?

ERA5 is the latest generation of ECMWF atmospheric reanalysis of the global climate, and the first reanalysis produced as an operational service. The dataset is produced using ECMWF’s Integrated Forecast System (IFS) numerical model, the same model used to produce their global weather forecast.

Reanalysis datasets are important because they assimilate vast amounts of observational data with computer models to create a detailed and consistent overview of weather and climate conditions. This is valuable in areas where observational data is either sparse or non-existent, such as the oceans and polar regions. ERA5 is particularly exciting because of its improved model accuracy and higher geographical and temporal resolution than its predecessors.

Where is the data available on AWS?

An initial tranche of ERA5 data has been published to Amazon S3. This data has been acquired from the ECMWF Meteorological Archival and Retrieval System (MARS) in GRIB format and transcoded into NetCDF files, each containing one month of data for one variable.

The S3 bucket currently contains a curated set of 18 widely used surface and single-level parameters from the ERA5 High Resolution Realisation (HRES) sub-daily forecast stream. This includes 10-meter and 100-meter wind components, 2-meter temperature, total precipitation, significant wave height, and downwards solar radiation.

More information on accessing the data, additional resources and details on the available variables and data structure are available from the Registry of Open Data on AWS.

How can I use this data and what can it be used for?

Because reanalysis datasets provide comprehensive, gridded estimates of atmospheric conditions at consistent spatial resolution and temporal frequency, they are widely used for meteorological and climatological studies. They are most valuable as a proxy for observational data, particularly for long-term studies where such data is limited or unavailable.

We’ve published a sample Jupyter notebook on GitHub to help you get started using ERA5 data on Amazon S3. The notebook examines how the data is organized, shows how to download files in NetCDF format, and demonstrates some basic data analysis.

An additional Jupyter notebook that uses ERA5 data to analyze potential wind energy production is also available in GitHub. This example uses the Planet OS Datahub API to acquire a subset of the ERA5 data, however it can easily be adapted to use Amazon S3 instead.

If you’re not familiar with the NetCDF file format, we recommend reviewing Unidata’s Introduction to NetCDF. Unidata also provides a list of software tools for manipulating and displaying NetCDF data that is quite comprehensive. For quick visualization of NetCDF data, we recommend NASA’s Panoply data viewer.

If you’re looking to access specific subsets of the ERA5 data, it is also available on the Planet OS Datahub, which provides additional query options and response formats. This interface may be more suitable for those performing coordinate-based analyses or for those who are more familiar with ASCII data formats, such as CSV and JSON.

Can you describe the solution reached and the AWS technology used?

We’re using Amazon EC2 instances to run our data extraction and transformation pipeline, which acquires the data from ECMWF and transcodes it into the variable-based NetCDF granules. These granules are then written to the Amazon S3 bucket where they’re publicly accessible.

What is next?

Intertrust will continuously update the Amazon S3 bucket with new ERA5 data as it is released and plans to support the full temporal extent once published by ECMWF.

We also love feedback, and encourage ERA5 users to reach out with their comments or suggestions. If there are additional variables that you’d like to see made available on Amazon S3, please email datahub@intertrust.com and let us know!