
About AWS Open Data Sponsorship Program
This program exists to help people discover and share data sets that are available via AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine if a data set may be used for your application. If you have a project using a listed data set, please tell us about it at opendata@amazon.com.
AWS Open Data Sponsorship Program Products (321)
showing 271 - 280
Free | Publicly available
OpenCRAVAT is a modular variant annotation tool developed by KarchinLab at Johns Hopkins. This dataset is a mirror of the OpenCRAVAT store available at https://store.opencravat.org. You can configure OpenCRAVAT to use this mirror by editing the "cravat-system.yml" file. The path to this file is in the first output line of the command "oc config system". In that file, change the value of "store_url" to "https://opencravat-store-aws.s3.amazonaws.com".
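The configuration change described above could be scripted roughly as follows. This is a minimal sketch, not part of OpenCRAVAT: the helper names `set_store_url` and `update_config_file` are hypothetical, and it assumes "store_url" appears as an unindented top-level key in "cravat-system.yml". It rewrites the file line by line rather than parsing YAML, so it only needs the Python standard library.

```python
from pathlib import Path

# Mirror URL taken from the dataset description above.
MIRROR_URL = "https://opencravat-store-aws.s3.amazonaws.com"


def set_store_url(config_text: str, url: str = MIRROR_URL) -> str:
    """Return config_text with the top-level store_url key set to url.

    Assumption: store_url is an unindented top-level key. If the key is
    missing, it is appended at the end of the file.
    """
    new_line = f"store_url: {url}"
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Match only an unindented key so nested keys are left alone.
        if line.startswith("store_url:"):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def update_config_file(path: str) -> None:
    """Rewrite the config file at path in place (hypothetical helper)."""
    config_path = Path(path)
    config_path.write_text(set_store_url(config_path.read_text()))
```

To use it, pass the path printed on the first output line of "oc config system" to `update_config_file`. Keeping a backup copy of the file before rewriting it is advisable.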
Free | Publicly available
The GATK test data resource bundle is a collection of files for resequencing human genomic data with the Broad Institute's Genome Analysis Toolkit (GATK).
Free | Publicly available
Detection of nighttime combustion (fire and gas flaring) from daily top of atmosphere data from NASA's Black Marble VNP46A1 product using VIIRS Day/Night Band and VIIRS thermal bands.
Free | Publicly available
Human genome reference builds hg19/hg38 and decoy references, maintained by the Broad Institute.
Free | Publicly available
Therapeutically Applicable Research to Generate Effective Treatments (TARGET) is the collaborative effort of a large, diverse consortium of extramural and NCI investigators. The goal of the effort is to accelerate molecular discoveries that drive the initiation and progression of hard-to-treat childhood cancers and facilitate rapid translation of those findings into the clinic. TARGET projects provide comprehensive molecular characterization to determine the genetic changes that drive the initiation and progression of childhood cancers. The dataset contains open Clinical Supplement, Biospecimen Supplement, RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantification, and miRNA-Seq miRNA Expression Quantification data from Genomic Data Commons (GDC), and open data from GDC Legacy Archive.
Free | Publicly available
Uncompressed video used for video compression and video processing research.
Free | Publicly available
A dataset of all images of Open Food Facts, the biggest open dataset of food products in the world.
Free | Publicly available
Beat AML 1.0 is a collaborative research program involving 11 academic medical centers that worked collectively to better understand drugs and drug combinations that should be prioritized for further development within clinical and/or molecular subsets of acute myeloid leukemia (AML) patients. Beat AML 1.0 provides the largest-to-date dataset on primary acute myeloid leukemia samples, offering genomic, clinical, and drug response data. This dataset contains open Clinical Supplement and RNA-Seq Gene Expression Quantification data. This dataset also contains controlled Whole Exome Sequencing (WXS) and RNA-Seq Aligned Reads, WXS Annotated Somatic Mutation, WXS Raw Somatic Mutation, and RNA-Seq Splice Junction Quantification data.
Free | Publicly available
The Human Sleep Project (HSP) sleep physiology dataset is a growing collection of clinical polysomnography (PSG) recordings. Beginning with PSG recordings from ~15K patients evaluated at the Massachusetts General Hospital, the HSP will grow over the coming years to include data from >200K patients, as well as people evaluated outside of the clinical setting. This data is being used to develop CAISR (Complete AI Sleep Report), a collection of deep neural networks, rule-based algorithms, and signal processing approaches designed to provide better-than-human detection of conventional PSG scoring metrics, including sleep stages, arousals, apnea and hypopnea events and their subtypes, and periodic limb movements. Beyond conventional scoring, the HSP dataset is intended to support research seeking to identify "hidden" information within the brain's activity during sleep that can be used to directly measure brain health. These brain health indicators include measures of risk for co[...]
Free | Publicly available
This dataset consists of images of glioblastoma human brain tumor tissue sections that have been probed for expression of particular genes believed to play a role in development of the cancer. Each tissue section is adjacent to another section that was stained with a reagent useful for identifying histological features of the tumor. Each of these types of images has been completely annotated for tumor features by a machine learning process trained by expert medical doctors.