AWS Public Sector Blog

Satellite imagery over Africa, a large-scale climate ensemble, and product listings with 3D renderings: The latest open data on AWS

The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). We work with data providers to democratize access to data by making it available for analysis on AWS; to develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and to encourage the development of communities that benefit from access to shared datasets.

Our full list of publicly available datasets is on the Registry of Open Data on AWS. This quarter, we released 44 new or updated datasets, including satellite imagery over Africa, a large-scale climate ensemble, and product listings with 3D renderings. Check out some highlights below.

Landsat, Sentinel-2, and Sentinel-1 data over Africa

Digital Earth Africa aims to process openly accessible data to produce decision-ready products with a focus on meeting the information needs, challenges, and priorities of the African continent. As part of this work, they have built a platform hosted in the AWS Cape Town Region. To make that platform as performant as possible, they also needed high-quality, cloud-optimized data in the same Region. To that end, Digital Earth Africa is making several datasets available in the AWS Cape Town Region, each with a footprint covering the African continent. They include a Sentinel-1 backscatter product, the Sentinel-2 Level 2A product, and the Landsat Collection 2 Level 2 products. Using their platform with in-region access to the data, Digital Earth Africa has worked with organizations to monitor drought conditions in Lake Sulunga, Tanzania; protect mangroves in Zanzibar; and image the Table Mountain Fire in South Africa.

CSIRO Climate retrospective Analysis and Forecast Ensemble system (CAFE60)

CAFE60 provides the first large ensemble reconstruction of the climate over the last six decades. The ensemble size is an order of magnitude larger than any other comparable reanalysis, making it the first self-consistent data product with sufficiently many realizations and at spatio-temporal resolutions suitable to enable probabilistic studies of the recent climate. It is the first reconstruction of the climate to explicitly account for correlations between ocean, atmosphere, sea ice, and ocean biogeochemistry observations and provide robust uncertainty estimates of the reconstructed mean climate.

CAFE60 is a comprehensive and unique data resource for studying internal climate variability and predictability, including the recent climate response to human interactions with the environment on multi-year to decadal time scales.

Amazon Berkeley Objects

Amazon Berkeley Objects (ABO) is a collection of over 100,000 Amazon product listings with multilingual metadata and over 350,000 unique catalog images. Over 8,000 of the listings come with turntable photography, showing a 360-degree view of the product as sequences of 24 or 72 images, for a total of over 550,000 images. For almost 8,000 of the products, the collection also provides high-quality three-dimensional (3D) models in the standard GL Transmission Format (glTF) for easy use with 3D rendering and visualization applications.

Read more about Amazon Berkeley Objects in the Amazon.science blog post.

PubMed Central full-text biomedical and life sciences journal articles

PubMed Central (PMC) is a free full-text archive of biomedical and life sciences journal articles at the United States National Institutes of Health’s National Library of Medicine (NIH/NLM). The PMC collection on AWS includes article packages for almost 3.2 million articles; further, the Author Manuscript (AM) Collection includes the metadata, full text, and table data of about 700,000 articles and counting.

The PMC collection presents an unprecedented opportunity to apply text mining and natural language processing to biomedical research articles that span over a decade.

Find these and other recently released datasets in the latest What’s New documentation.

How to get involved in the new datasets

We’re excited to see how you can put these great datasets to work. If you have examples of tutorials, applications, tools, or publications that use these datasets, make sure to list them on the Registry of Open Data on AWS so the community can find them. Learn how to propose your dataset to the AWS Open Data Sponsorship Program and learn more about open data on AWS.

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.

Joe Flasher

Joe Flasher is the open data lead at Amazon Web Services (AWS), helping organizations most effectively make data available for analysis in the cloud. The AWS Open Data program has democratized access to petabytes of data, including satellite imagery, climate and weather data, genomic data, and data used for natural language processing. He has been working with geospatial data and open source projects for the past decade, both as a contributor and maintainer. He has been a member of the Landsat Advisory Group and has worked on projects ranging from building GIS software to making the space shuttle fly. His background is in astrophysics, but he kindly requests that you don’t ask him any questions about constellations.