AWS Public Sector Blog
32 new or updated datasets available on the Registry of Open Data on AWS
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-based techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through the AWS Open Data Sponsorship Program, customers are making over 300 PB of high-value, cloud-optimized data available for public use.
All publicly available datasets can be found in the Registry of Open Data on AWS and are now also discoverable on AWS Data Exchange. This quarter, AWS released 32 new or updated datasets.
What are people currently doing with AWS Open Data?
- AI3 Protein-Ligand Binding Affinity Dataset is now available on the Registry of Open Data on AWS. This dataset features molecular dynamics (MD) trajectories for over 16,000 protein-ligand complexes (PLCs), making it a valuable resource for research at the intersection of machine learning and structural biology. The dataset can help address essential challenges in modern drug design, potentially aiding the discovery of new therapeutics.
- The AWS Open Data team partnered with NVIDIA to host an Open Data knowledge graph hackathon on October 1-3, 2025. The 3-day event brought together 53 researchers in seven teams. Projects used AWS services including Amazon Neptune for graph database management, Open Data on AWS for accessing public datasets, NVIDIA resources for PyTorch Geometric (PyG) retrieval-augmented generation (RAG), and various compute and machine learning services to build end-to-end solutions.
- Brightband is democratizing advanced weather forecasting by building artificial intelligence (AI)-powered forecasting tools with Open Data on AWS. The startup is at the forefront of a new era in weather forecasting, developing accessible tools designed to help humanity adapt to increasingly extreme weather. Brightband recently won the 2024 Compute for Climate Fellowship, a global funding initiative created by the International Research Centre on Artificial Intelligence (IRCAI) under the auspices of UNESCO and AWS.
- The Ocean Biodiversity Information System (OBIS) has released new data products in Parquet format, including species grids and the full OBIS occurrence dataset, which is now widely accessible through the Registry of Open Data on AWS. The full occurrence dataset provides the most comprehensive view of marine biodiversity available through OBIS and now includes sequence records as well as the full set of variables from the Extended Measurement or Fact (eMoF) extension, such as sampling effort, environmental variables, and biological traits.
What will you build with these datasets?
RSNA Intracranial Aneurysm Detection Dataset
We are excited to announce the release of the Radiological Society of North America Intracranial Aneurysm Detection (RSNA-ICA) dataset. The dataset is a collection of over 4,000 CT brain scans annotated by a cohort of over 40 volunteer radiologists from RSNA and the American Society of Neuroradiology to show the presence and location of intracranial aneurysms. It also includes about 200 imaging studies annotated with AI-generated segmentations highlighting abnormalities. The imaging data was provided by 18 institutions. Initially compiled in 2025 for the RSNA Intracranial Aneurysm Detection AI Challenge hosted on the Kaggle competition platform, it is the largest publicly available collection of its kind.
RSNA Intracranial Aneurysm Detection Dataset joins 31 other new or updated datasets on the Registry of Open Data in the following categories.
Climate and weather
- ARCO-OCEAN from OGS
- ECMWF IFS ENS from Dynamical.org
- EPA Hourly Prognostic Meteorological Data from US Environmental Protection Agency
- NOAA GFS from National Oceanic and Atmospheric Administration (NOAA)
- NOAA HRRR from NOAA
- NOAA nClimGrid and Livneh Gridded Historical Climate Observation Thresholds from NOAA
- Planette ERA5 Archive from Planette AI
- Rain over Africa from Geoscience and Remote Sensing at Chalmers University of Technology
Geospatial
- ASKAP Radio Telescope from Australia Telescope National Facility, CSIRO
- CCRS MODIS albedo over Canada | Albédo CCRS MODIS au-dessus du Canada from Canada Centre for Remote Sensing (CCRS), Canada Centre for Mapping and Earth Observation (CCMEO), and Department of Natural Resources Canada (NRCan)
- Danish Meteorological Institute (DMI) Reanalysis dataset v0.5 from Danish Meteorological Institute
- Google Satellite Embedding V1 from Source Cooperative
- Japan Prefectures, 3D Point Cloud Data from Association for Promotion of Infrastructure Geospatial Information Distribution (AIGID)
- Kanagawa, 3D Point Cloud Data from AIGID
- NUVIEW – Multi-State Geospatial Data from NUVIEW
Life sciences
- Aging Mouse Brain Epigenetic from Salk Institute
- Alliance of Genome Resources from Alliance of Genome Resources Consortium
- BrainGlobe Atlases from BrainGlobe
- CHAMMI-75 from Morgridge Institute for Research
- Epigenomes of the Human Pangenome Reference Consortium (HPRC) Release 2 from Ting Wang Lab
- FLAb: Fitness Landscapes for Antibodies from Jeffrey Gray Lab, Johns Hopkins University
- Human and Mammalian Brain Atlas from Allen Institute
- Impact of Genomic Variation on Function Consortium (IGVF) from IGVF Data Administration and Coordination Center at Stanford University
- LongBench – cross-platform reference dataset profiling cancer cell lines with bulk and single-cell approaches from Richie Lab, Walter and Eliza Hall Institute of Medical Research
- ONT Methylation Benchmarking Datasets from CSIR-Centre for Cellular and Molecular Biology
- Open Human Genome Library from Heng Li lab at Dana-Farber Cancer Institute and Harvard Medical School
- OpenRoboCare Multi-Modal Expert Demonstration Dataset for Robot-Assisted Caregiving from EmPRISE Lab at Cornell University
- Reference Indexes for krepp from Mirarab Lab at UC San Diego
- RNA structure by fragmentation frequency from The Genome Institute of Singapore and UMass Chan Medical School’s RNA Therapeutics Institute
- RSNA Intracranial Aneurysm Detection Dataset (RSNA-ICA) from Radiological Society of North America
- SnpEff & SnpSift Genomic Variant Annotation Databases from Pablo Cingolani
AI/ML
How can you make your data available?
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.
