AWS Public Sector Blog

24 new or updated datasets available on the Registry of Open Data on AWS

The Amazon Web Services (AWS) Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on AWS. AWS works with data providers to democratize access to data by making it available to the public.

People can use this data to run analyses on AWS or to develop new cloud-based techniques, formats, and tools that lower the cost of working with data. The program also aims to encourage the development of communities that benefit from access to shared datasets.

Through the AWS Open Data Sponsorship Program, customers are making over 300 petabytes of high-value, cloud-optimized data available for public use.

All publicly available datasets can be found in the Registry of Open Data on AWS and are also discoverable on AWS Data Exchange. This quarter, AWS released 24 new or updated datasets.

What are people doing with the Registry of Open Data on AWS?

Organizations are using the Registry of Open Data on AWS in many different ways, including:

OpenFold3 Training Data from OpenFold Consortium

The OpenFold Consortium announced a major OpenFold3 update as well as the public release of training datasets and full-stack tooling for reproducible biomolecular AI. OpenFold3 is an open source deep learning system for cofolding that predicts the 3D structures of biomolecular complexes from sequence and molecular inputs, including proteins interacting with small molecules and nucleic acids. OpenFold3 enables structure prediction for biomolecular complexes relevant to drug discovery, protein engineering, and basic research, supporting both evaluation workflows and downstream method development.

With this update, OpenFold3 is available as an end-to-end open cofolding stack, including training datasets, model weights, training and inference code, and evaluation scripts, all released under permissive licenses. This full-stack release supports independent reproduction of reported results, rigorous benchmarking, and extension through fine-tuning and new method development, capabilities that are difficult to achieve with closed or inference-only systems.

The OpenFold3 dataset joins 23 other new or updated datasets on the Registry of Open Data on AWS in the following categories:

  • Climate and weather
  • Geospatial
  • Life sciences
  • AI/ML

How can you make your data available?

The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:

  • Democratize access to data by making it available for analysis on AWS
  • Develop new cloud-based techniques, formats, and tools that lower the cost of working with data
  • Encourage the development of communities that benefit from access to shared datasets

Learn how to propose your dataset to the AWS Open Data Sponsorship Program.

Learn more about open data on AWS.

Kyle Cook

Kyle is the technical program manager for Amazon Web Services (AWS) Open Data, focusing on initiatives to make high-value datasets publicly accessible. He works with customers globally across AWS and supports internal Amazon teams that seek to democratize access to data by making it available on AWS. In his free time, he enjoys traveling, cooking, reading, and watching sports.