About AWS Open Data Sponsorship Program

This program exists to help people discover and share data sets that are available by using AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine whether a data set may be used for your application. If you have a project using a listed data set, please tell us about it at opendata@amazon.com.

AWS Open Data Sponsorship Program


AWS Open Data Sponsorship Program Products (297)

showing 141 - 150

Medical Segmentation Decathlon

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

With recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general-purpose and translatable to unseen tasks. Many key algorithmic advances in medical imaging are validated on only a small number of tasks, which limits our understanding of how well the proposed contributions generalise. A model that works out of the box on many tasks, in the spirit of AutoML, would have a tremendous impact on healthcare. The field of medical imaging also lacks a fully open-source, comprehensive benchmark for general-purpose algorithmic validation and testing that covers a wide span of challenges, such as small data, unbalanced labels, widely ranging object scales, multi-class labels, and multimodal imaging. This challenge and dataset aim to provide such a resource by open-sourcing large medical imaging datasets on several highly different tasks and by standardising the analysis and validation process.

Epoch of Reionization Dataset

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

The data are from observations with the Murchison Widefield Array (MWA), a Square Kilometre Array (SKA) precursor in Western Australia. This particular dataset is from the Epoch of Reionization project, a key science driver of the SKA. Nearly 2 PB of such observations have been recorded to date; this dataset is a small subset that has been exported from the MWA data archive in Perth and made available to the public on AWS. The data were taken to detect signatures of the first stars and galaxies forming and the effect of these early stars and galaxies on the evolution of the universe.

Genome in a Bottle on AWS

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

Several reference genomes to enable the translation of whole human genome sequencing to clinical practice. On 11/12/2020 these data were updated to reflect the most up-to-date GIAB release.

AI2 Meaningful Citations Data Set

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

630 paper annotations

VirtualFlow Ligand Libraries

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

VirtualFlow versions of ligand libraries in ready-to-dock format

NASA / USGS Europa Controlled Observations

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

The Solid State Imager (SSI) on NASA's Galileo spacecraft acquired more than 500 images of Jupiter's moon Europa. These images range from relatively low-resolution hemispherical imaging to high-resolution targeted images that cover a small portion of the surface. Here we provide a set of 481 minimally processed, projected Galileo images with photogrammetrically improved locations on Europa's surface. These individual images were subsequently used as input to a set of 92 observation mosaics. Together they provide users with nearly the entire Galileo Europa imaging dataset at its native resolution and with improved relative image locations. The Solid State Imager on NASA's Galileo spacecraft provided the only moderate- to high-resolution images of Jupiter's moon Europa. Unfortunately, uncertainty in the position and pointing of the spacecraft, as well as the position and orientation of Europa, when the images were acquired resulted in significant errors in image locati[...]

10m Annual Land Use Land Cover (9-class)

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

This dataset, produced by Impact Observatory, Microsoft, and Esri, displays a global map of land use and land cover (LULC) derived from ESA Sentinel-2 imagery at 10-meter resolution for the years 2017 to 2023. Each map is a composite of LULC predictions for 9 classes throughout the year, generating a representative snapshot of each year. This dataset was generated by Impact Observatory, which used billions of human-labeled pixels (curated by the National Geographic Society) to train a deep learning model for land classification. Each global map was produced by applying this model to the Sentinel-2 annual scene collections from the Microsoft Planetary Computer. Each of the maps has an assessed average accuracy of over 75%. These maps have been improved from Impact Observatory’s previous release and provide a relative reduction in the amount of anomalous change between classes, particularly between “Bare” and any of the vegetative classes “Trees,” “Crops,” “Flooded V[...]

Oregon Health & Science University Chronic Neutrophilic Leukemia Dataset

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

The OHSU-CNL study offers whole-exome and RNA sequencing data for a cohort of 100 cases with rare hematologic malignancies such as chronic neutrophilic leukemia (CNL), atypical chronic myeloid leukemia (aCML), and unclassified myelodysplastic syndrome/myeloproliferative neoplasms (MDS/MPN-U). This dataset contains open RNA-Seq Gene Expression Quantification data.

Refgenie reference genome assets

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

Pre-built refgenie reference genome data assets used for aligning and analyzing DNA sequence data.

The Singapore Nanopore Expression Data Set

Sold by AWS Open Data Sponsorship Program

Free | Publicly available

The Singapore Nanopore Expression (SG-NEx) project is an international collaboration to generate reference transcriptomes and a comprehensive benchmark data set for long-read Nanopore RNA-Seq. Transcriptome profiling is done using PCR-cDNA sequencing (PCR-cDNA), amplification-free cDNA sequencing (direct cDNA), direct sequencing of native RNA (direct RNA), and short-read RNA-Seq. The SG-NEx core data include 5 of the most commonly used cell lines and are extended with additional cell lines and samples that cover a broad range of human tissues. All core samples are sequenced with at least 3 high-quality replicates. For a subset of samples, spike-in RNAs are used and matched m6A profiling data are available.
