AWS Public Sector Blog
33 new or updated datasets on the Registry of Open Data for Earth Day and more
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making over 100PB of high-value, cloud-optimized data available for public use.
The full list of publicly available datasets is on the Registry of Open Data on AWS and is also discoverable on AWS Data Exchange. As April 22 is Earth Day, the AWS Open Data team wanted to highlight some new datasets from our geospatial and environmental communities of practice.
Ensemble Meteorological Dataset for Planet Earth
Gridded meteorological estimates are essential for applications in hydrological, meteorological, and climate research. Many of these datasets are deterministic in nature, limiting their use and application. The Ensemble Meteorological Dataset for Planet Earth (EM-Earth) was developed to address these limitations and meet the diverse requirements for global hydro-meteorological applications. It provides hourly and daily deterministic estimates and daily probabilistic estimates for global land areas from 1950 to 2019 on a 0.1-degree grid.
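To give a feel for the scale of a dataset with that resolution and time span, here is a minimal Python sketch of the arithmetic. It is not taken from the EM-Earth documentation: it assumes a regular latitude/longitude grid covering the whole globe (EM-Earth itself covers land areas only, so this is an upper bound), a row 0 that starts at 90°N, a column 0 that starts at 180°W, and a period that spans the full calendar years 1950 through 2019. The function names are hypothetical.

```python
from datetime import date

CELL_DEG = 0.1  # grid spacing stated in the post


def grid_shape(cell_deg=CELL_DEG):
    """Return (rows, cols) for a regular global lat/lon grid."""
    rows = round(180 / cell_deg)  # latitude spans 180 degrees
    cols = round(360 / cell_deg)  # longitude spans 360 degrees
    return rows, cols


def cell_index(lat, lon, cell_deg=CELL_DEG):
    """Map a point to (row, col).

    Assumption: row 0 starts at 90N and col 0 starts at 180W.
    The real files may use a different origin or orientation.
    """
    rows, cols = grid_shape(cell_deg)
    row = min(int((90.0 - lat) / cell_deg), rows - 1)
    col = min(int((lon + 180.0) / cell_deg), cols - 1)
    return row, col


def daily_steps(first_year=1950, last_year=2019):
    """Count daily time steps, assuming full calendar years."""
    return (date(last_year, 12, 31) - date(first_year, 1, 1)).days + 1


if __name__ == "__main__":
    rows, cols = grid_shape()
    print("grid cells (global upper bound):", rows * cols)
    print("daily time steps:", daily_steps())
    print("hourly time steps:", daily_steps() * 24)
    print("cell for 52.13N, 106.67W:", cell_index(52.13, -106.67))
```

Under these assumptions the grid has 1,800 rows and 3,600 columns, which is why cloud-optimized formats that allow reading only a region or time slice matter for data of this size.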
VENUS L2A Cloud-Optimized GeoTIFFs
The VENµS (Vegetation and Environment New micro (µ) Satellite) science mission captures repeat Earth observations every two days for unique locations around the world at a spatial resolution of five meters. By precisely monitoring plant growth and health status, VENµS helps scientists study the impacts of human- and environmental-influenced change on the Earth’s land surfaces. The VENUS L2A Cloud-Optimized GeoTIFFs dataset is distributed by EarthDaily Analytics in Analysis Ready Data (ARD) formats that include Cloud Optimized GeoTIFFs (COGs) and SpatioTemporal Asset Catalog (STAC) metadata.
Umbra Synthetic Aperture Radar (SAR) open data and JAXA PALSAR-2 Turkey & Syria Earthquake dataset
The Umbra Synthetic Aperture Radar (SAR) open dataset monitors ten diverse locations around the world, allowing users to detect changes in each location over time by providing the highest spatial resolution commercial SAR imagery captured from space. Applications include monitoring floating oil lid inventory, deforestation, container ports, and more. The PALSAR-2 ScanSAR Turkey & Syria Earthquake dataset from the Japan Aerospace Exploration Agency (JAXA) provides SAR imagery for areas impacted by the magnitude 7.8 earthquake that struck Turkey and Syria on February 6, 2023.
Full list of new or updated datasets
These datasets join 29 other new or updated datasets on the Registry of Open Data in the following categories.
Astronomy:
- Mars Spectrometry: Detect Evidence for Past Habitability from National Aeronautics and Space Administration (NASA)
- Mars Spectrometry 2: Gas Chromatography for the Sample Analysis at Mars Data (SAM) Instrument from NASA
- NASA / USGS Controlled THEMIS Mosaics from NASA
Climate and weather:
- Grid Algorithms and Data Analytics Library (GADAL) from National Renewable Energy Laboratory (NREL)
- Ensemble Meteorological Dataset for Planet Earth, EM-Earth from Computational Hydrology at the University of Saskatchewan
- NOAA Unified Forecast System (UFS) Land Data Assimilation (DA) System from National Oceanic and Atmospheric Administration (NOAA)
- NOAA Whole Atmosphere Model-Ionosphere Plasmasphere Electrodynamics (WAM-IPE) Forecast System (WFS) from NOAA
- NOAA Wang Sheeley Arge (WSA) Enlil from NOAA
- NOAA Multi-Radar/Multi-Sensor System (MRMS) from NOAA
- HYbrid Coordinate Ocean Model Global Ocean Forecast System Reanalysis from Center for Ocean-Atmospheric Prediction Studies (COAPS)
Internet and networking:
- End of Term Web Archive Dataset from End of Term Web Archive
Geospatial:
- Sentinel Near Real-time Canada Mirror from Natural Resources Canada
- PALSAR-2 ScanSAR Turkey & Syria Earthquake (L2.1 & L1.1) from JAXA
- Umbra Synthetic Aperture Radar (SAR) Open Data from Umbra
- High Resolution Canopy Height Maps by WRI and Meta from Meta
- VENUS L2A Cloud-Optimized GeoTIFFs from EarthDaily Analytics
- Argoverse autonomous driving dataset from Argoverse
- (Updated) 10m Annual Land Use Land Cover (9-class) now includes STAC metadata and catalog endpoint
- (Updated) National Agriculture Imagery Program (NAIP) now includes 2021 imagery
Life sciences:
- Allen Institute for Neural Dynamics – Mouse Neuroanatomy and Physiology Data from the Allen Institute for Neural Dynamics
- Classification Of Basal cell carcinoma, Risky skin cancers and Abnormalities (COBRA) from Radboud University Medical Center
- EMory BrEast Imaging Dataset (EMBED) from Emory University
- GX database for NCBI Foreign Contamination Screen (FCS) Tool Suite from the National Institutes of Health
- Guy’s Breast Cancer Lymph Nodes (GRAPE) from King’s College London
- Synthea Coherent Data Set from MITRE
- VitalDB from Seoul National University
- Allen Institute for Brain Science – Synaptic Physiology Public Data Set from the Allen Institute for Brain Science
- NASA Physical Sciences Informatics (PSI) from NASA
- (Updated) Gabriella Miller Kids First Pediatric Research Program (Kids First) from the Children’s Hospital of Philadelphia now lists additional study cohorts
- (Updated) UniProt from the Swiss Institute for Bioinformatics now includes the 2023_01 UniProt release
- (Updated) The Singapore Nanopore Expression Dataset from the Genome Institute of Singapore now includes BLOW5 files for sequenced cell lines
Machine learning:
- Shopping Humor Generation from Amazon
- Multi Token Completion from Amazon
Using open data available on AWS
What are people doing with open data? Here are a few projects using open data available on AWS.
- The University of Wisconsin-Madison used the Speedtest by Ookla Global Fixed and Mobile Network Performance Maps dataset to define the digital gap in internet speed during the COVID-19 pandemic in this peer-reviewed publication.
- Pirate Weather uses the NOAA High-Resolution Rapid Refresh (HRRR) model, NOAA Global Forecast System (GFS), and NOAA Global Ensemble Forecast System (GEFS) Re-forecast datasets, along with ERA5 datasets from the European Centre for Medium-Range Weather Forecasts (ECMWF), to provide a no-cost, open, and documented API for weather forecasters.
- FruitPunch.AI is running an AI for Forest Elephants challenge using data from the Elephant Listening Project at the K. Lisa Yang Center for Conservation Bioacoustics at Cornell University, which makes the Sounds of Central Africa dataset openly available.
- Meta provides the Daylight Map Distribution of OpenStreetMap as analysis-ready Parquet files on the Registry of Open Data, containing nearly one billion queryable map features. Learn how you can query this data with Amazon Athena in this step-by-step tutorial.
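As a rough illustration of what querying Parquet data with Amazon Athena can look like from Python, the sketch below builds the keyword arguments for Athena's StartQueryExecution API. It is not the tutorial's code. The database name, table name, and results bucket are hypothetical placeholders, and the query is a generic row preview rather than anything specific to the Daylight schema.

```python
def build_athena_request(database, table, output_s3_uri, limit=10):
    """Build keyword arguments for Athena's StartQueryExecution call.

    All three names are placeholders supplied by the caller; none of
    them come from the Daylight tutorial.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("output_s3_uri must be an s3:// URI")

    # A simple preview query; quoting the table name guards against
    # names that contain characters such as hyphens.
    query = f'SELECT * FROM "{table}" LIMIT {int(limit)}'

    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


if __name__ == "__main__":
    request = build_athena_request(
        database="example_database",              # hypothetical
        table="example_map_features",             # hypothetical
        output_s3_uri="s3://example-results-bucket/athena/",  # hypothetical
    )
    print(request["QueryString"])
    # With boto3 installed and AWS credentials configured, the request
    # could be submitted like this:
    #   import boto3
    #   boto3.client("athena").start_query_execution(**request)
```

Building the request separately from submitting it keeps the example runnable without AWS credentials. Athena writes query results to the S3 location you provide, so the output bucket must be one you own.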
Get started with open data on AWS
Looking to make your data openly accessible on AWS? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to democratize access to data by making it available for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Learn how to propose your dataset to the AWS Open Data Sponsorship Program.
Learn more about open data on AWS.
Read more about open data on AWS:
- 34 new or updated datasets on the Registry of Open Data: New data for land use, Alzheimer’s Disease, and more
- Creating access control mechanisms for highly distributed datasets
- How researchers can meet new open data policies for federally-funded research with AWS
- Making weather forecasts more accessible using serverless infrastructure and open data on AWS
- Understanding wildfire risk in a changing climate with open data and AWS
- Accelerating and democratizing research with the AWS Cloud
- Introducing 10 minute cloud tutorials for research