
About AWS Open Data Sponsorship Program
This program exists to help people discover and share data sets that are available by using AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine whether a data set may be used for your application. If you have a project using a listed data set, please tell us about it at opendata@amazon.com.
AWS Open Data Sponsorship Program
AWS Open Data Sponsorship Program Products (319)
showing 41 - 50
Free | Publicly available
The study describes an integrative analysis of genetic lesions in 574 diffuse large B cell lymphomas (DLBCL) involving exome and transcriptome sequencing, array-based DNA copy number analysis, and targeted amplicon resequencing. The dataset contains open RNA-Seq Gene Expression Quantification data.
Free | Publicly available
NASA missions like the Curiosity and Perseverance rovers carry a rich array of instruments suited to collecting data and building evidence on whether Mars ever had livable environmental conditions. These rovers can collect rock and soil samples and take measurements that can be used to determine their chemical makeup. Because communication between rovers and Earth is severely constrained, with limited transfer rates and short daily communication windows, scientists have limited time to analyze the data and make difficult inferences about the chemistry in order to prioritize the next operations and send those instructions back to the rover. This project aimed to build a model that automatically analyzes gas chromatography mass spectrometry (GCMS) data collected for Mars exploration, to help scientists in their analysis of the past habitability of Mars. More information is available at https://mars.nasa.gov/msl/sp[...]
Free | Publicly available
This dataset contains 8,000+ brain MRIs of 2,000+ patients with brain metastases.
Free | Publicly available
MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework. The MIMIC-III dataset is freely available. Researchers seeking to use the database must formally request access. For details, see the getting started page. Once you have a PhysioNet account, you must enable access to the MIMIC-III dataset from your AWS account. To do this, please input your AWS account number
Free | Publicly available
Collection of 7 billion small molecules in SMILES notation with 28 billion fingerprints, including MACCS, ECFP4, FCFP4, and PubChem, with pre-constructed USearch indexes over them.
Free | Publicly available
Space weather forecast and observation data is collected and disseminated by NOAA’s Space Weather Prediction Center (SWPC) in Boulder, CO. SWPC produces forecasts for multiple types of space weather phenomena and the resulting impacts on Earth and human activities. A variety of products provide these forecast expectations, and their respective measurements, in formats that range from detailed technical forecast discussions to NOAA Scale values to simple bulletins that give information in layman's terms. Forecasting is the prediction of future events, based on analysis and modeling of the past and present conditions of the environment you are interested in. In space weather, persistence and recurrence of active regions on the sun over the 27-day solar rotational period play an important role in accurately forecasting the space environment.
Free | Publicly available
Comprehensive, large-scale single-cell profiling of healthy human blood at different ages is one of the critical pending tasks required to establish a framework for systematic understanding of human aging. Here, using single-cell RNA/TCR/BCR-seq with protein feature barcoding (20 antibodies), we profiled 317 samples from 166 healthy individuals aged 25 to 85 years, drawn over a 3-year period. The dataset, spanning ~2 million cells, describes 50 subpopulations of blood immune cells, with 14 subpopulations changing with age, including a novel NKG2C+ CD8 Tcm population that decreases with age. We describe age-associated accumulation of Th2 and HLA-DR+ memory CD4 T cells, CCR4+ CD8 Tcm cells and GZMK+ CD8 Tem cells. We validate key findings using a 30-plex spectral cytometry panel. We characterize patterns of antigen receptor clonality across subpopulations of T and B cells and describe their age-dependence. Our work provides novel insights into healthy human aging and unique annotated resou[...]
Free | Publicly available
A centralized repository of pre-formatted BLAST databases created by the National Center for Biotechnology Information (NCBI).
Free | Publicly available
This dataset captures sunflower genetic diversity originating from thousands of wild, cultivated, and landrace sunflower individuals distributed across North America. The data consists of raw sequences and associated botanical metadata, aligned sequences (to three different reference genomes), and sets of SNPs computed across several cohorts.
Free | Publicly available
EMBED is a racially diverse mammography dataset containing 3.4M screening and diagnostic images from 110,000 patients collected from 2013 to 2020, with an equal representation of black and white women. The dataset comprises 2D, synthetic 2D (C-view), and 3D (digital breast tomosynthesis, i.e., DBT) images. It contains 60,000 annotated lesions linked to structured imaging descriptors and ground truth pathologic outcomes grouped into six severity classes. This release represents 20% of the total 2D and C-view dataset and is available for research use. DBT, US, and MRI exams will be added at a later date. Acknowledgements - We would like to thank Glendor, Inc and MD.ai for assistance with image de-identification.