AWS Public Sector Blog

Now available: CMIP6 dataset to foster climate innovation and study the impact of future climate conditions

Today, Amazon announced that it is now hosting petabytes of data from the largest and most up-to-date climate simulation dataset in the world. Through two cloud grants from the Amazon Sustainability Data Initiative (ASDI) to the Earth System Grid Federation (ESGF), Amazon is enabling climate researchers worldwide to access and analyze, on the Amazon Web Services (AWS) Cloud, the dataset used for the Sixth Assessment Report (IPCC-AR6) of the United Nations Intergovernmental Panel on Climate Change. The report, scheduled to be published in May 2022, provides policymakers worldwide with the latest assessment of the scientific basis of climate change, its impacts and future risks, and options for adaptation and mitigation.

The climate simulation dataset, known as the Coupled Model Intercomparison Project Phase 6 (CMIP6) data archive, has traditionally been hosted and distributed through ESGF servers. It aggregates the output of climate models developed by approximately 30 working groups and 1,000 researchers contributing to IPCC-AR6. A portion of the CMIP6 data archive is now hosted in the first cloud-resident ESGF data node available to climate researchers. This initial subset consists of CMIP6 model runs that climate scientists consider high value. Because the dataset is hosted in the cloud, researchers can carry out their analyses in the AWS Cloud without downloading and storing replicas of the data. Given the size of these datasets, providing compute next to the data helps democratize access to, and use of, this data.

The ESGF data node on AWS allows users to search for and discover CMIP6 data directly in the cloud, and the ESGF team is making sample Jupyter notebooks available to facilitate analysis. A few groups have already benefited from this improved accessibility. “The AWS CMIP6 data has made it possible to scale up a novel experimental machine learning analysis from one model to the whole CMIP6 archive,” said Redouane Lguensat, a researcher at the Laboratoire des Sciences du Climat et de l’Environnement (LSCE) in Saclay, France. “Combined with data-cataloging libraries such as ‘intake-esm’, the development process became very efficient and allowed us to spend more time on the scientific problem than on data preparation issues.”

Through this collaboration between ASDI and ESGF, CMIP6 data staged on AWS is traceable to the reference ESGF datasets. In addition to providing access to the CMIP6 data in its native NetCDF format, AWS is also hosting the dataset in the cloud-optimized Zarr format, in partnership with Pangeo initiative members Columbia University, NCAR, and the Rhodium Group, so that researchers can access and analyze the data more efficiently in the AWS Cloud. ESGF, Pangeo, and AWS will continue to work with the community of climate data users to identify additional relevant CMIP6 data layers and stage them on AWS.

In addition to supporting climate scientists, hosting the CMIP6 data on AWS is also enabling a number of private sector companies to build products and services that help assess climate-related risks and develop climate resilience. Climate data poses processing challenges because of the raw size of climate model outputs: a single file can be hundreds of megabytes or more, and an entire dataset can range from tens of terabytes to multiple petabytes. On-demand assessments of climate risks require rapid access to data and a cloud infrastructure that can process it and deliver results on demand. “Accessing CMIP6 through AWS has been critical to our ability to quickly test and better understand the latest climate data. Having this data on AWS allows us to efficiently pipe the updates through to our climate risk tools,” said Nik Steinberg, managing director and head of research, Four Twenty Seven (Moody’s).

According to Truman Semans, executive director for OS-Climate, democratizing access to this dataset will support sustainable investing. “Open access to CMIP6 data in the AWS cloud will enable the development of new and better climate data and analytic tools to help us understand how future climate conditions will impact the assets and operations of companies and governments. This is crucial for informing decisions about investment and scaling up reallocation of capital to mitigation and resilience, to close the $1.2 trillion gap in funding to meet Paris goals. OS-Climate, as an open source platform for climate risk assessment, [and pension funds, asset managers, and banks] will greatly benefit from the knowledge derived from this data.”

ASDI seeks to foster sustainability innovation and problem solving by promoting cloud literacy and leveraging the AWS Cloud to facilitate access to, and analysis of, key data. This global program invites leaders from academia, government, and the nonprofit and for-profit sectors to participate in cloud-based experimentation that generates sustainability insights.

Researchers wishing to use the CMIP6 data on AWS can also apply for cloud grants through the AWS Promotional Credit Program. Through this program, sustainability researchers, developers, and decision-makers can harness AWS technology to develop solutions for their biggest data-related sustainability challenges. We especially encourage applications that support a rich ecosystem of open source tooling for sustainability work, innovative methods for sustainability-focused solutions, the transition of on-premises sustainability workloads to the AWS Cloud, and cloud-optimized data formats for sustainability-relevant data.

View the CMIP6 dataset and learn more about ASDI.