AWS Public Sector Blog

OpenFold, OpenAlex catalog of scholarly publications, and Capella Space satellite data: The latest open data on AWS

The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). We work with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making over 100 petabytes (PB) of high-value, cloud-optimized data available for public use.

Our full list of publicly available datasets is on the Registry of Open Data on AWS, and the datasets are now also discoverable on AWS Data Exchange. This quarter, we released 15 new or updated datasets including OpenFold, OpenAlex, and radar data from Capella Space. Check out some highlights:

OpenFold training data for protein structure prediction

OpenFold, an Open Molecular Science Foundation project driven by a private-public consortium including Columbia University, Arzeda, and Cyrus Biotechnology, was developed as a trainable, fully open source improvement on AlphaFold2, which disrupted the protein structure prediction space with its debut in 2021. Its accompanying training dataset is a comprehensive, open source, machine-learning (ML) ready dataset for protein structure prediction.

OpenAlex, an index of the entire scholarly research ecosystem

Launched this quarter, OpenAlex is an open and comprehensive index of the entire scholarly research ecosystem. Named after the ancient Library of Alexandria, the dataset aims to discover, disambiguate, index, and document the connections between all the world’s scholarly papers, journals, authors, institutions, and concepts. In keeping with the theme of openness, the code behind it is all open source, and the data is all permissively licensed and designed to be easy to use in production workloads. Whether you want to understand the impact of a given research area, discover how ideas and authors are linked through time, or build a front end to help researchers find papers, the data is all there. OpenAlex joins PubMed Central® and CORD-19 as textual repositories collecting scholarly articles across a number of domains.

Capella Space Open Data and Sentinel-1 Single Look Complex (SLC) data for Germany

Two new Synthetic Aperture Radar (SAR) datasets launched this quarter: Capella Space SAR Open Dataset and Sentinel-1 Single Look Complex (SLC) for Germany. Capella Space is providing a growing collection of radar products and formats from its constellation of very high resolution SAR satellites to help further its mission to make Earth observation (EO) an essential tool for problem solving. Sentinel-1 SLC data for Germany includes radar data processed in a format that enables a wide array of applications including natural hazards and emergency response, oil spill monitoring, and monitoring sea-ice conditions. LiveEO has released the historical archive from 2014 to present from the Alaska Satellite Facility (ASF) DAAC as unzipped files, which makes this data much more efficient to process.

Here is a full list of the datasets released this quarter, joining the more than 300 datasets already available:

Climate and weather:


Life sciences:

Machine learning:

Renewable energy:


Statistical and regulatory:

We’re excited to see how you can put these great datasets to work. If you have examples of tutorials, applications, tools, or publications that use these datasets, make sure to list them on the Registry of Open Data on AWS so the community can find them. Learn how to propose your dataset to the AWS Open Data Sponsorship Program and learn more about open data on AWS.

