36 new or updated datasets on the Registry of Open Data: AI analysis-ready datasets and more

The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making over 100 PB of high-value, cloud-optimized data available for public use.

The full list of publicly available datasets is on the Registry of Open Data on AWS and is now also discoverable on AWS Data Exchange. This quarter, AWS released 36 new or updated datasets. As July 16 is Artificial Intelligence (AI) Appreciation Day, the AWS Open Data team is highlighting three unique datasets that are analysis-ready for AI.

What will you build with these datasets?

Three AI analysis-ready datasets on the Registry of Open Data

NYUMets Brain Dataset from the NYU Langone Medical Center is one of the largest cranial imaging datasets in existence and the largest dataset of metastatic cancer. It contains over 8,000 brain MRI studies, clinical data, and treatment records from cancer patients. Over 2,300 images have been annotated for metastatic tumor segmentations, making NYUMets: Brain a valuable source of segmented medical imaging. An AI model for segmentation tasks and a longitudinal tracking tool are available for NYUMets through MONAI. Learn more about this dataset.

RACECAR Dataset from the University of Virginia is the first open dataset for full-scale and high-speed autonomous racing. RACECAR is suited to exploring issues in localization, object detection and tracking (LiDAR, radar, and camera), and mapping that arise at the limits of operation of the autonomous vehicle. You can get started with RACECAR using this SageMaker Studio Lab notebook.

Aurora Multi-Sensor Dataset from Aurora Operations, Inc. is a large-scale multi-sensor dataset with highly accurate localization ground truth, captured between January 2017 and February 2018 in the metropolitan area of Pittsburgh, PA, USA. The de-identified dataset contains rich metadata, such as weather and semantic segmentation, and spans all four seasons, rain, snow, overcast and sunny days, different times of day, and a variety of traffic conditions. This data can be used to develop and evaluate large-scale long-term approaches to autonomous vehicle localization. Aurora is applicable to many research areas including 3D reconstruction, virtual tourism, HD map construction, and map compression.
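Many datasets on the Registry of Open Data are hosted in Amazon S3 buckets that can be read without AWS credentials, so a first look at a dataset can be as simple as listing a few object keys. The following is a minimal sketch under stated assumptions, not an official access method for any of the datasets above: the bucket name in the usage note is a hypothetical placeholder, the helper names are illustrative, and the code assumes the bucket allows anonymous listing through the standard S3 `ListObjectsV2` REST call. Check each dataset's Registry page for its actual bucket, Region, and access requirements.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_objects_url(bucket, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL for a publicly listable bucket."""
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Extract object keys from a ListBucketResult XML document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{S3_NS}Key")]


def list_public_keys(bucket, prefix="", max_keys=10):
    """Fetch and parse one page of keys (requires network access).

    Assumes anonymous listing is allowed; pagination and Region-specific
    endpoints are deliberately left out of this sketch.
    """
    with urlopen(list_objects_url(bucket, prefix, max_keys), timeout=30) as resp:
        return parse_keys(resp.read())
```

For example, `list_public_keys("example-open-data-bucket", prefix="images/")` would return up to ten keys under that prefix, where both the bucket name and the prefix are placeholders to replace with values from the dataset's Registry entry.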

Full list of new or updated datasets

These three datasets join 33 other new or updated datasets on the Registry of Open Data in the following categories: climate and weather, geospatial, life sciences, and machine learning.

What are people doing with open data?

Amazon Location Service launched Open Data Maps for Amazon Location Service, a data provider option for the Maps feature based on OpenStreetMap.
Oxford Nanopore Technologies benchmarked their genomic basecalling algorithms, which decode DNA or RNA into sequences for analysis, on 20 different Amazon Elastic Compute Cloud (Amazon EC2) instances.
Hugging Face hosted a Bio x ML Hackathon that challenged teams to use AI tools, open data, and cloud resources to solve problems at the intersection of the life sciences and artificial intelligence.

How can you make your data available?

Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:

Democratize access to data by making it available for analysis on AWS
Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
Encourage the development of communities that benefit from access to shared datasets

Learn how to propose your dataset to the AWS Open Data Sponsorship Program.
Learn more about open data on AWS.

AWS Public Sector Blog
