
About AWS Open Data Sponsorship Program
This program exists to help people discover and share data sets that are available by using AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine if a data set may be used for your application. If you have a project using a listed data set, please tell us about it at opendata@amazon.com.
AWS Open Data Sponsorship Program Products (319)
showing 11 - 20
Free | Publicly available
The data are a subset of the EPA Dynamically Downscaled Ensemble (EDDE), Version 1. EDDE is a collection of physics-based modeled data that represent 3D atmospheric conditions for historical and future periods under different scenarios. The EDDE Version 1 datasets cover the contiguous United States at a horizontal grid spacing of 36 kilometers at hourly increments. EDDE Version 1 includes simulations that have been dynamically downscaled from multiple global climate models (GCMs) under both mid- and high-emission scenarios from the Fifth Coupled Model Intercomparison Project (CMIP5) using the Weather Research and Forecasting (WRF) model. Scenarios were downscaled from the Community Earth System Model (CESM) and the Geophysical Fluid Dynamics Laboratory (GFDL) Coupled Model version 3 (CM3). Simulations followed the historical periods 1975-2005 (CESM only) and 1995-2005 (both CESM and CM3), and Representative Concentration Pathways (RCP) 4.5 for 2025-2100 (CESM only), RCP6.0 for 20[...]
Free | Publicly available
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. CPTAC-2 is the Phase II of the CPTAC Initiative (2011-2016). Datasets contain open RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantification, and miRNA Expression Quantification data.
Free | Publicly available
Radiant MLHub is an open library for geospatial training data that hosts datasets generated by Radiant Earth Foundation's team as well as other training data catalogs contributed by Radiant Earth’s partners. Radiant MLHub is open to anyone to access, store, register and/or share their training datasets for high-quality Earth observations. All of the training datasets are stored using a SpatioTemporal Asset Catalog (STAC) compliant catalog and exposed through a common API. Training datasets include pairs of imagery and labels for different types of machine learning problems including image classification, object detection, and semantic segmentation. Labels are generated from ground reference data and/or image annotation.
Free | Publicly available
Tabula Muris is a compendium of single cell transcriptomic data from the model organism Mus musculus comprising more than 100,000 cells from 20 organs and tissues. These data represent a new resource for cell biology, reveal gene expression in poorly characterized cell populations, and allow for direct and controlled comparison of gene expression in cell types shared between tissues, such as T-lymphocytes and endothelial cells from different anatomical locations. Two distinct technical approaches were used for most organs: one approach, microfluidic droplet-based 3’-end counting, enabled the survey of thousands of cells at relatively low coverage, while the other, FACS-based full length transcript analysis, enabled characterization of cell types with high sensitivity and coverage. The cumulative data provide the foundation for an atlas of transcriptomic cell biology. See: https://www.nature.com/articles/s41586-018-0590-4
Free | Publicly available
Multiple sequence alignments (MSAs) for 140,000 unique Protein Data Bank (PDB) chains and 16,000,000 UniClust30 clusters. Template hits are also provided for the PDB chains and 270,000 UniClust30 clusters chosen for maximal diversity and MSA depth. MSAs were generated with HHBlits (-n3) and JackHMMER against MGnify, BFD, UniRef90, and UniClust30 while templates were identified from PDB70 with HHSearch, all according to procedures outlined in the supplement to the AlphaFold 2 Nature paper, Jumper et al. 2021. We expect the database to be broadly useful to structural biologists training or validating deep learning models for protein structure prediction and related tasks.
Free | Publicly available
Blunt force abdominal trauma is among the most common types of traumatic injury, with the most frequent cause being motor vehicle accidents. Abdominal trauma may result in damage to, and bleeding of, internal organs, including the liver, spleen, kidneys, and bowel. Detection and classification of injuries are key to effective treatment and favorable outcomes. A large proportion of patients with abdominal trauma require urgent surgery. Abdominal trauma often cannot be diagnosed clinically by physical exam, patient symptoms, or laboratory tests. Prompt diagnosis of abdominal trauma using medical imaging is thus critical to patient care. AI tools that assist and expedite diagnosis of abdominal trauma have the potential to substantially improve patient care and health outcomes in the emergency setting. To create the ground truth dataset, RSNA collected imaging data sourced from 23 sites in 14 countries on six continents, including more than 4,000 CT exams with various abdomina[...]
Free | Publicly available
Umbra satellites generate the highest-resolution Synthetic Aperture Radar (SAR) imagery ever offered from space, up to 16-cm resolution. SAR can capture images at night and through cloud cover, smoke, and rain. SAR is unique in its ability to monitor changes. The Open Data Program (ODP) features over twenty diverse time-series locations that are updated frequently, allowing users to experiment with SAR's capabilities. We offer single-looked spotlight mode in 16-cm, 25-cm, 35-cm, 50-cm, or 1-m resolution, and multi-looked spotlight mode. The ODP also features an assorted, growing collection of more than 250 images of various locations around the world, ranging from emergency response to gee-whiz sites. If you have a suggestion for a new location, feedback on the dataset, or any questions, contact us at umbra.space/open-data.
Free | Publicly available
The OIDA Data on AWS contain the metadata, documents, and extracted text for all of the documents in the UCSF-JHU Opioid Industry Documents Archive, a growing corpus of internal corporate records and other documents arising from the opioid industry.
Free | Publicly available
9092 crowd-sourced science questions and 68 tables of curated facts
Free | Publicly available
OCMR is an open-access repository that provides multi-coil k-space data for cardiac cine. The fully sampled MRI datasets are intended for quantitative comparison and evaluation of image reconstruction methods. The free-breathing, prospectively undersampled datasets are intended for qualitative evaluation of those methods' performance and generalizability.