AWS Public Sector Blog
82 new or updated datasets available on the Registry of Open Data on AWS
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-based techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through the AWS Open Data Sponsorship Program, customers are making over 300 PB of high-value, cloud-optimized data available for public use.
All publicly available datasets can be found in the Registry of Open Data on AWS and are now also discoverable on AWS Data Exchange. This quarter, AWS released 82 new or updated datasets.
What are people currently doing with AWS Open Data?
- Amazon employees are revolutionizing Earth observation with geospatial foundation models on AWS using open data. The post explores how Clay Foundation’s Clay foundation model, available on Hugging Face, can be deployed for large-scale inference and fine-tuning on Amazon SageMaker.
- A tutorial on how to use life sciences data from the AWS Open Data program in Amazon Bedrock. The tutorial looks at how to use datasets in the Registry of Open Data on AWS with Amazon Bedrock Knowledge Bases. With Amazon Bedrock Knowledge Bases, you can give foundation models (FMs) and agents contextual information from private and public data sources to deliver more relevant, accurate, and customized responses.
- The AWS Open Data team hosted an Open Data Life Sciences Hackathon from October 1-3, 2025, at Amazon HQ2 in Arlington, Virginia. This was an in-person-only, three-day hackathon for researchers interested in building knowledge graphs using large publicly available life sciences datasets from AWS Open Data.
- The POWER Project from NASA provides direct access to its complete datastore in Amazon Simple Storage Service (Amazon S3) buckets in cloud-optimized formats. The datastore and associated access are provided through the Registry of Open Data on AWS and are free of charge to everyone.
- E11 Bio released a new brain tissue dataset (E11bio PRISM) within the Registry of Open Data on AWS. This new dataset is a key first demonstration of a novel technology that will increase our ability to trace neurons and their connections through complicated brain tissue. This will allow neuroscientists to better understand the wiring of mammalian brains and ultimately revolutionize neuroscience and the treatment of neurological diseases.
- Interactive access and visualization of geospatial data from the AWS Open Data Program. Access to high-quality geospatial data is no longer limited to technical experts with large computing resources. Thanks to collaborations between open data initiatives such as AWS Open Data, Amazon Sustainability Data Initiative (ASDI), and the Maxar Open Data program, coupled with intuitive tools such as Leafmap and Solara, anyone can explore and visualize critical Earth data in minutes.
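Several of the items above describe datasets served from public Amazon S3 buckets. As a minimal sketch of what that access can look like, the snippet below builds an unsigned HTTPS URL for the S3 ListObjectsV2 REST operation using only the Python standard library. The bucket name, Region, and prefix are hypothetical placeholders, not values from any dataset listed here; the real ones appear on each dataset's Registry page. Not every public bucket permits anonymous listing, so treat this as an illustration under those assumptions.

```python
from urllib.parse import urlencode


def public_listing_url(bucket: str, region: str, prefix: str = "", max_keys: int = 100) -> str:
    """Build an unsigned S3 ListObjectsV2 URL for a publicly readable bucket.

    No credentials are involved: the request only succeeds if the bucket
    policy allows anonymous listing.
    """
    query = urlencode({
        "list-type": "2",         # selects the ListObjectsV2 operation
        "prefix": prefix,         # only return keys under this prefix
        "max-keys": str(max_keys),
    })
    # Virtual-hosted-style endpoint: <bucket>.s3.<region>.amazonaws.com
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


# Hypothetical bucket, Region, and prefix, for illustration only.
url = public_listing_url("example-open-data-bucket", "us-west-2", prefix="daily/")
print(url)
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would return an XML listing of object keys when the bucket allows it; individual objects can then be downloaded from the same endpoint by key.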
What will you build with these datasets?
E11 Bio PRISM
We are excited to announce the release of E11 Bio’s brain tissue dataset on AWS as part of the Registry of Open Data on AWS (E11bio PRISM). This novel dataset from E11 Bio, a nonprofit Convergent Research Focused Research Organization (FRO), was produced in collaboration with the Francis Crick Institute, Massachusetts Institute of Technology (MIT), and the Max Planck Institute. It contains light microscopy images and the traced paths of individual neurons. The publication of this dataset is a key first demonstration of a novel technology that will increase our ability to trace neurons and their connections through complicated brain tissue. This will allow neuroscientists to better understand the wiring of mammalian brains and ultimately revolutionize neuroscience and the treatment of neurological diseases.
E11bio PRISM joins 81 other new or updated datasets on the Registry of Open Data on AWS in the following categories.
Climate and weather
- MERRA-2 tavg1_2d_slv_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Single-Level Diagnostics 0.625 x 0.5 degree from NASA
- MERRA-2 inst3_3d_aer_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Aerosol Mixing Ratio 0.625 x 0.5 degree from NASA
- MERRA-2 inst3_3d_asm_Np: 3d,3-Hourly,Instantaneous,Pressure-Level,Assimilation,Assimilated Meteorological Fields from NASA
- MERRA-2 inst3_3d_asm_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Assimilated Meteorological Fields 0.625 x 0.5 degree from NASA
- AIRS/Aqua L1C Infrared (IR) resampled and corrected radiances V6.7 (AIRICRAD) at GES DISC from NASA
- GPM IMERG Early Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHHE) at GES DISC from NASA
- GPM IMERG Final Precipitation L3 1 month 0.1 degree x 0.1 degree V07 (GPM_3IMERGM) at GES DISC from NASA
- GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHH) at GES DISC from NASA
- GPM IMERG Late Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHHL) at GES DISC from NASA
- OPERA Dynamic Surface Water Extent from Harmonized Landsat Sentinel-2 product (Version 1) from NASA
- GPM IMERG Early Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDE) at GES DISC from NASA
- GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDF) at GES DISC from NASA
- GPM IMERG Late Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDL) at GES DISC from NASA
- OPERA Radiometric Terrain Corrected SAR Backscatter from Sentinel-1 validated product (Version 1) from NASA
- GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1) from NASA
- OPERA Land Surface Disturbance Annual from Harmonized Landsat Sentinel-2 product (Version 1) from NASA
- OPERA Radiometric Terrain Corrected SAR Backscatter from Sentinel-1 Static Layers validated product (Version 1) from NASA
- GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002 from NASA
- GHRSST Level 2P Global Sea Surface Skin Temperature from the MODIS on the NASA Terra satellite (GDS2) from NASA
- GPM DPR Precipitation Profile L2A 1.5 hours 5 km V07 (GPM_2ADPR) at GES DISC from NASA
- ABoVE: Bias-Corrected IMERG Monthly Precipitation for Alaska and Canada, 2000-2020 from NASA
- AIRS/Aqua L1B Infrared (IR) geolocated and calibrated radiances V005 (AIRIBRAD) at GES DISC from NASA
- MODIS/Aqua Surface Reflectance Daily L2G Global 1km and 500m SIN Grid V061 from NASA
- MODIS/Aqua Surface Reflectance Daily L2G Global 250m SIN Grid V061 from NASA
- MODIS/Terra Calibrated Radiances 5-Min L1B Swath 500m from NASA
- MODIS/Terra Net Evapotranspiration 8-Day L4 Global 500m SIN Grid V061 from NASA
- MODIS/Terra Surface Reflectance 8-Day L3 Global 500m SIN Grid V061 from NASA
- MODIS/Terra Surface Reflectance Daily L2G Global 1km and 500m SIN Grid V061 from NASA
- MODIS/Terra Surface Reflectance Daily L2G Global 250m SIN Grid V061 from NASA
- MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061 from NASA
- MODIS/Terra+Aqua BRDF/Albedo Albedo Daily L3 Global – 500m V061 from NASA
- MODIS/Terra+Aqua BRDF/Albedo Model Parameters Daily L3 Global – 500m V061 from NASA
- MODIS/Terra+Aqua BRDF/Albedo Nadir BRDF-Adjusted Ref Daily L3 Global – 500m V061 from NASA
Geospatial
- OPERA Coregistered Single-Look Complex from Sentinel-1 Static Layers validated product (Version 1) from NASA
- OPERA Coregistered Single-Look Complex from Sentinel-1 validated product (Version 1) from NASA
- OPERA Dynamic Surface Water Extent from Sentinel-1 (Version 1) from NASA
- OPERA Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 product (Version 1) from NASA
- OPERA Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 provisional product (Version 0) from NASA
- NCEP/CPC L3 Half Hourly 4km Global (60S – 60N) Merged IR V1 (GPM_MERGIR) at GES DISC from NASA
- OPERA Surface Displacement from Sentinel-1 validated product (Version 1) from NASA
- SENTINEL-1A_DUAL_POL_GRD_HIGH_RES from NASA
- SENTINEL-1A_SLC from NASA
- SENTINEL-1B_DUAL_POL_GRD_HIGH_RES from NASA
- SENTINEL-1B_SLC from NASA
- MISR Level 1B2 Ellipsoid Data V004 from NASA
- Gaia DR3 from Space Telescope Science Institute
- SatPM2.5 from Atmospheric Composition Analysis Group
- Sanborn Maps Data Package from Library of Congress
- GeoJSON Files for Geo-TIDE from MIT Climate & Sustainability Consortium
- GLAD Landsat ARD from Global Land Analysis and Discovery Lab
- CanElevation – Canada Digital Elevation Models from Natural Resources Canada
- ICEYE Synthetic Aperture Radar (SAR) Open Dataset from ICEYE
- NOAA National Blend of Models (NBM) Parallel from National Oceanic and Atmospheric Administration (NOAA)
- ASTER Level 1T Precision Terrain Corrected Registered At-Sensor Radiance V004 from NASA
- ATLAS/ICESat-2 L2A Global Geolocated Photon Data V006 from NASA
- ATLAS/ICESat-2 L3A Land and Vegetation Height V006 from NASA
- GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1 from NASA
- HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0 from NASA
- HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30m v2.0 from NASA
- Land/Sea static mask relevant to IMERG precipitation 0.1×0.1 degree V2 (GPM_IMERG_LandSeaMask) at GES DISC from NASA
- State of Colorado Elevation Data from State of Colorado Governor’s Office of Information Technology (OIT) GIS team
- Community coral reef image classification training data from MERMAID
- SpaceEye-T VVHR EO Open Data from SI Imaging Services
Life sciences
- AI3 Protein-Ligand Binding Affinity Dataset from International Institute of Information Technology Hyderabad
- CartoStore from University of Michigan School of Public Health
- Dendritic Consortium Multimodal Dataset from Dendritic Consortium
- BUSCO Datasets from Computational Evolutionary Genomics Group, University of Geneva
- BioLiP from Zhang Lab
- Meta-Organized Stimuli And fMRI Imaging data for Computational modeling (MOSAIC) from Massachusetts Institute of Technology
- IBL Neuropixels Brainwide Map on AWS from International Brain Laboratory
- PET 1.6k – Whole-/Total-Body [18F]FDG-PET/CT with CT-Derived Segmentations from ENHANCE.PET Team
- Clinical Ultrasound Image Repository from MONAI Development Team
- UCSF Renal Mass CT Dataset from UCSF Larson Advanced Imaging Lab
- IBL Neuropixels Brainwide Map on AWS from International Brain Laboratory
- Meta-Organized Stimuli And fMRI Imaging data for Computational modeling (MOSAIC) from Massachusetts Institute of Technology (MIT)
- Clinical Ultrasound Image Repository from MONAI Development Team
- E11bio PRISM from E11 Bio
- DeepDrug Protein Embeddings Bank (DPEB) from Louisiana State University
- The Cancer Dependency Map (DepMap) Cancer Cell Line Encyclopedia (CCLE) Dataset from Broad Institute
Machine learning
- Essential-Web v1.0: 24T tokens of organized web data from EssentialAI
- CarbonPDF from Pittcps Lab
- Indian Supreme Court Judgments from Dattam Labs
How can you make your data available?
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.