AWS Public Sector Blog
24 new or updated datasets available on the Registry of Open Data on AWS

The Amazon Web Services (AWS) Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on AWS. AWS works with data providers to democratize access to data by making it available to the public.
People can use these datasets to perform analysis on AWS or to develop new cloud-based techniques, formats, and tools that lower the cost of working with data. The goal is to encourage the development of communities that benefit from access to shared datasets.
Through the AWS Open Data Sponsorship Program, customers are making over 300 petabytes of high-value, cloud-optimized data available for public use.
All publicly available datasets can be found in the Registry of Open Data on AWS and are also discoverable on AWS Data Exchange. This quarter, AWS released 24 new or updated datasets.
What are people doing with the Registry of Open Data on AWS?
Organizations are using the Registry of Open Data on AWS in many different ways, including:
- Purdue University's Data to Science (D2S) Initiative democratizes geospatial data through the AWS Open Data Sponsorship Program. Through D2S, researchers across disciplines can share and access a unified collection of geospatial datasets from around the world. AWS recently participated in and helped sponsor Purdue Geographic Information Systems (GIS) Day 2025: Unlocking GeoAI Data and Tools, where we presented to faculty, students, and researchers on the value of cloud technology in the geospatial space.
- Scientists used the Registry of Open Data on AWS to map the aging brain in unprecedented detail, revealing clues to Alzheimer's disease and more. Hosting the dataset on the Registry of Open Data on AWS makes it widely accessible and removes the heavy computational barriers typically involved in handling large biological datasets. With nearly 900,000 spatially mapped cells in the cloud, scientists around the world can explore the data without needing specialized infrastructure.
- Arnis, an open source tool, transforms real-world locations into playable Minecraft worlds by processing geospatial data hosted on AWS. By migrating to Terrain Tiles, a global elevation dataset on the Registry of Open Data on AWS, Arnis eliminated its data retrieval costs while serving nearly 300,000 users.
- Columbia University's Learning the Earth with Artificial Intelligence and Physics (LEAP), a U.S. National Science Foundation (NSF) Science and Technology Center, collaborated with AWS to build AutoClimDS. AutoClimDS is an agentic AI system that lets researchers without specialized coding expertise run climate data science workflows using natural language.
OpenFold3 Training Data from OpenFold Consortium
The OpenFold Consortium announced a major OpenFold3 update as well as the public release of training datasets and full-stack tooling for reproducible biomolecular AI. OpenFold3 is an open source deep learning system for cofolding that predicts the 3D structures of biomolecular complexes from sequence and molecular inputs, including proteins interacting with small molecules and nucleic acids. OpenFold3 enables structure prediction for biomolecular complexes relevant to drug discovery, protein engineering, and basic research, supporting both evaluation workflows and downstream method development.
With this update, OpenFold3 is available as an end-to-end open cofolding stack, released under permissive licenses. The stack includes training datasets, model weights, training and inference code, and evaluation scripts. This full-stack release supports independent reproduction of reported results, rigorous benchmarking, and extension through fine-tuning and method development. These capabilities are difficult to achieve with closed or inference-only systems.
The OpenFold3 dataset joins 23 other new or updated datasets on the Registry of Open Data on AWS in the following categories:
Climate and weather
- Global Cache of Japan from Japan Meteorological Agency
- Met Office UK Land Surface Observations from Met Office
- NOAA GEFS – dynamical.org Icechunk Zarr from National Oceanic and Atmospheric Administration (NOAA)
- NOAA MRMS – dynamical.org Icechunk Zarr from NOAA
- NOAA S-104 Water Level Data from NOAA
- Met Office UK Marine Observations from Met Office
- NEXRAD on AWS from Unidata
- NEXRAD ARCO – Analysis-Ready Cloud-Optimized Weather Radar from Atmoscale
- ECMWF AIFS Single – dynamical.org Icechunk Zarr from dynamical.org
Geospatial
- Data to Science Catalog from Geospatial Data Science Lab at Purdue University
- LGND Clay v1.5 Sentinel-2 from Source Cooperative
- Version 2 High Resolution Canopy Height Maps from Meta
- CANOE (Canadian Aquatic Navigation for Observation of the Environment) Dataset from Autonomous Space Robotics Laboratory (ASRL)
- FoMo – A Multi-Season Dataset for Robot Navigation in Forêt Montmorency from Norlab, Université Laval
- Kepler Mission Data from Space Telescope Science Institute
- NOAA JISAO’s Seasonal Coastal Ocean Prediction of the Ecosystem (J-SCOPE) from NOAA
Life sciences
- NHGRI AnVIL Project from the AnVIL project
- OpenFold3 Training Data from OpenFold Consortium
- run_dbcan CAZyme and CGC annotation database on AWS from run_dbCAN
- Somatic Mosaicism across Human Tissues (SMaHT) from SMaHT Data Analysis Center (DAC)
- Epilepsy.Science from University of Pennsylvania
- OpenTargets from the OpenTargets platform
- RNA structure by fragmentation frequency from The Genome Institute of Singapore and UMass Chan Medical School’s RNA Therapeutics Institute
AI/ML
How can you make your data available?
The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-based techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.