AWS Public Sector Blog
53 new or updated datasets available on the Registry of Open Data on AWS
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-based techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making more than 100 petabytes (PB) of high-value, cloud-optimized data available for public use.
The full list of publicly available datasets is on the Registry of Open Data on AWS and is now also discoverable on AWS Data Exchange. This quarter, AWS released 53 new or updated datasets.
What will you build with these datasets?
Biodiversity Heritage Library Metadata and Page Images
Biodiversity Heritage Library (BHL) data is now hosted with AWS Open Data. It comprises more than 62 million pages of scientific text from the 15th to the 21st centuries. BHL’s vast collection is an unparalleled biodiversity resource with enormous potential for longitudinal studies and conservation efforts.
The Biodiversity Heritage Library Metadata and Page Images dataset joins 52 other new or updated datasets on the Registry of Open Data in the following categories.
Climate and weather
- EPA Dynamically Downscaled Ensemble (EDDE) Version 2 from National Oceanic and Atmospheric Administration (NOAA)
- Sentinel-2 ACOLITE-DSF Aquatic Reflectance for the Conterminous United States from United States Geological Survey
- IGP Coal Plant from Air Pollution Asset Database (APAD)
- SeeFar V0 from Coastal Carbon
- Global Carbon Budget Data from Global Carbon Budget Office at the University of Exeter, UK
- WIS2 Global Cache on AWS from Met Office
- NOAA Unified Forecast System (UFS) Global Ensemble Forecast System (GEFS) Version 13 Replay from NOAA
- EPA Dynamically Downscaled Ensemble (EDDE) Version 2 from Environmental Protection Agency
- Met Office Global Ocean model on a 2-year rolling archive from Met Office
- Marine Animal – Satellite Relay Tagging – Quality controlled profiles from Open Access to Ocean Data (AODN)
- Ocean Radar – Bonney coast site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Capricorn bunker group site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Coffs Harbour site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Coral coast site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Newcastle site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Northwest shelf site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Rottnest shelf site – Sea water velocity – Delayed mode from AODN
- Ocean Radar – South Australia Gulfs site (South Australia, Australia) – Sea water velocity – Delayed mode from AODN
- Ocean Radar – Turquoise coast site – Sea water velocity – Delayed mode from AODN
- Ships of Opportunity – Air-sea fluxes – Meteorological and flux – Real time from AODN
- Ships of Opportunity – Air-sea fluxes – Meteorological and sea surface temperature – Real time from AODN
Geospatial
- Satellogic EarthView dataset from Satellogic
- NASA SOTERIA Simulation Testbed Data from NASA
- James Webb Space Telescope (JWST) from Space Telescope Science Institute
- Indiana Statewide Elevation Catalog from Indiana Geographic Information Office
- DE Africa Waterbodies Monitoring Service from Digital Earth Africa
- OpenAerialMap on AWS from Humanitarian OpenStreetMap Team
- Virtual Shizuoka, 3D Point Cloud Data from Association for promotion of Infrastructure Geospatial Information Distribution (AIGID)
- SatPM2.5 from Washington University in St. Louis
- DIWASA Rainfed and Irrigated Cropland Map for Africa from IWMI
- NOAA NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis from AODN
- SSL4EO S12 Landsat Multi Product Dataset from AIGID
- Unblurred Coadds of the Wide-field Infrared Survey Explorer (unWISE) from NASA/IPAC Infrared Science Archive (IRSA)
- Animal Tracking – Acoustic Telemetry – Quality controlled detections from AODN
- Satellite – Sea surface temperature – Level 4 – Multi sensor – Global Australian from AODN
- Satellite – Sea surface temperature – Level 4 – Multi sensor – Regional Australian from AODN
Life sciences
- GenomeKit genomic data from Deep Genomics
- Emory Knee Radiograph (MRKR) dataset from HITI lab
- SocialGene RefSeq Databases from University of Wisconsin–Madison
- Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021 from PhysioNet
- MIMIC-IV Clinical Database Demo from PhysioNet
- MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset from PhysioNet
- Blue Brain Open Data from École Polytechnique Fédérale de Lausanne
- Platinum Pedigree from Platinum Pedigree Consortium
- LEarning biOchemical Prostate cAncer Recurrence from histopathology sliDes challenge (LEOPARD) Dataset from Radboud University Medical Center
- Boltz-1 Training Data from MIT Computer Science & Artificial Intelligence Laboratory
Machine learning
- Estimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2020 Census Production Run) from United States Census Bureau
- MAN TruckScenes from MAN Truck and Bus SE
- Met Office Global Ensemble Prediction System (MOGREPS-G) on a 30-day rolling archive
- AI Weather Prediction (AIWP) Model Reforecasts from Met Office
- PD12M from Spawning
- Biodiversity Heritage Library Metadata and Page Images from The Biodiversity Heritage Library
- Gretel Synthetic Safety Alignment Dataset from Gretel AI
What are people doing with open data?
- Satellogic EarthView Dataset now Openly Accessible via Registry of Open Data on AWS.
- SonarX Partners with Amazon to integrate on-chain datasets into AWS Open Data.
- Researchers can now use Amazon Q to extract insights from NCBI’s PubMed Central, a leading resource for biomedical literature that offers a vast repository of full-text biomedical and life sciences journal articles.
- Examine genomic variation across populations with the 1000 Genomes dataset on AWS Open Data.
- Amazon and Berkeley release a dataset of product images and metadata. The dataset includes multiple images of 147,702 products, including 360-degree rotations and 3D models for thousands of them.
- Data dissemination for public sector on AWS with AWS Open Data.
- A global community of researchers and innovators is using open data for sustainability-related work as part of the Amazon Sustainability Data Initiative (ASDI).
How can you make your data available?
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.