AWS Public Sector Blog
34 new or updated datasets available on the Registry of Open Data on AWS
The Amazon Web Services (AWS) Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on AWS. We work with data providers to:
- Democratize access to data by making it available to the public for analysis on AWS
- Develop new cloud-based techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Through this program, customers are making more than 100 petabytes (PB) of high-value, cloud-optimized data available for public use. The full list of publicly available datasets is on the Registry of Open Data on AWS, and the datasets are also discoverable on AWS Data Exchange. This quarter, AWS released 34 new or updated datasets. What will you build with these datasets?
More AI analysis-ready datasets on the Registry of Open Data
The Wind AI Bench data lake contains multiple datasets related to fundamental problems in wind energy research. These include data on wind plant power production for various layouts and wind flow scenarios, data on two- and three-dimensional flow around different wind turbine airfoils or blades, and data on wind turbine noise production, among others. The purpose of these datasets is to establish a standard benchmark against which new artificial intelligence and machine learning (AI/ML) methods can be tested, compared, and deployed. Each dataset's metadata describes how the data was generated and formatted, and example notebooks and documentation show how to access the data for ML modeling.
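Many datasets on the Registry of Open Data are stored in Amazon S3 buckets that allow public reads. As a rough illustration of what accessing such data can look like, the following Python sketch uses only the standard library to build an anonymous S3 ListObjectsV2 request and parse the XML response. The bucket name, Region, and prefix shown are hypothetical placeholders, not the actual location of Wind AI Bench or any other dataset, and not every public bucket permits anonymous listing. Check each dataset's Registry page for its real bucket and recommended access method.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by S3 REST API responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket, region, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style)."""
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_public_objects(bucket, region, prefix="", max_keys=10):
    """Fetch and parse one page of an anonymous object listing.

    This makes a network request and only works for buckets that
    allow unauthenticated listing.
    """
    with urlopen(list_url(bucket, region, prefix, max_keys)) as response:
        return parse_listing(response.read())
```

With a real bucket name and Region taken from a dataset's Registry page, a call such as `list_public_objects("example-open-data-bucket", "us-west-2", prefix="some/prefix/")` would return the first few object keys and their sizes. The AWS SDKs and the AWS CLI offer the same unsigned access with pagination and retries handled for you.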
Full list of new or updated datasets
The Wind AI Bench dataset joins 33 other new or updated datasets on the Registry of Open Data in the following categories.
Climate and weather
- NOAA Historical Maps and Charts from NOAA
- USGS COAWST (Coupled Ocean Atmosphere Wave and Sediment Transport) Forecast Model Archive from Fathom Science
- Wind AI Bench from National Renewable Energy Laboratory
- Department of Energy’s Marine Energy Data Lake from National Renewable Energy Laboratory
- Department of Energy’s Geothermal Data Repository (GDR) Data Lake from National Renewable Energy Laboratory
- Southern South American Drought Information System (SISSA) daily forecast retrospective database from SISSA
- World Bank Climate Change Knowledge Portal (CCKP) from the World Bank Group
- Open-Meteo Weather API Database from Open-Meteo
- Sofar Spotter Archive buoy global network from Sofar Ocean
- Blended TROPOMI+GOSAT Satellite Data Product for Atmospheric Methane from Nicholas Balasus
- HYCOM-OceanTrack Integrated HYCOM Eulerian Fields and Lagrangian Trajectories Dataset from Shane Elipot
- Chalmers Cloud Ice Climatology from Geoscience and Remote Sensing at Chalmers University of Technology
- Whiffle WINS50 Open Data on AWS from Whiffle
- NOAA Multi-Year Reanalysis of Remotely Sensed Storms (MYRORSS) from NOAA
- NOAA GraphCast Global Forecast System (GFS) from NOAA
- Community Multiscale Air Quality (CMAQ) 2019 3D Gridded and Column data from the EPA’s Air Quality Time Series (EQUATES) Project from EPA
- My School Today from SDSN SDGs Today
Geospatial
- Northern California Earthquake Data from Northern California Earthquake Data Center
- OpenStreetMap real-changesets from OpenStreetMap US
- Global 30m Height Above Nearest Drainage (HAND) from the Alaska Satellite Facility (ASF)
- Overture Open Map Data from Overture Maps Foundation
- CitrusFarm Dataset from Autonomous Robots and Control Systems Lab
- Pan-STARRS PS1 Survey from Space Telescope Science Institute
- Alaska Satellite Facility (ASF) Synthetic Aperture Radar (SAR) data products for Disaster Events from the Alaska Satellite Facility (ASF)
- Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) from the Alaska Satellite Facility (ASF)
- Catalina Sky Survey (CSS) subset data on AWS from the Planetary Data Systems Small Bodies Node (SBN)
Life sciences
- USearch Molecules from Ash Vardanian
- International Cardiac Arrest REsearch consortium (I-CARE) Database from Brain Data Science Platform
- UK Biobank Pharma Proteomics Project (UKB-PPP) from Sage Bionetworks
- Automated Segmentation of Intracellular Substructures in Electron Microscopy (ASEM) from Kirchhausen Lab at Harvard Medical School
- Single-Cell Atlas of Human Blood During Healthy Aging from Sage Bionetworks
Machine learning
- nuPlan autonomous driving benchmark from Motional, Inc.
- nuScenes large-scale dataset for autonomous driving from Motional, Inc.
- DARPA Invisible Headlights Dataset from Kitware
What are people doing with open data?
- Amazon SageMaker uses datasets from the Registry of Open Data in the SageMaker Geospatial feature.
- The COVID Moonshot Consortium crowdsourced the discovery of potent COVID-19 therapeutics.
- Scion Research applies AI to open aerial data to assess the impact of Cyclone Gabrielle on New Zealand’s forests.
How can you make your data available?
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.