Free | Publicly available
Preprocessed databases for use with the Hecatomb pipeline for viral and phage sequence annotation.
This program exists to help people discover and share data sets that are available by using AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine if a data set may be used for you application. If you have a project using a listed data set please tell us about it at opendata@amazon.com.
Free | Publicly available
Preprocessed databases for use with the Hecatomb pipeline for viral and phage sequence annotation.
Free | Publicly available
The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-06-30) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9 , and implemented in https://github.com/uscensusbureau/DAS2020RedistrictingProductionCode). The NMF was produced using the official “production settings,” the final set of algorithmic parameters and privacy-loss budget allocations, that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File. The NMF consists of the full set of privacy-protected statistical queries (counts of individuals or housing units with particular combinations of characteristics) of confidential 2010 Census data relating to the 2010 Demonstration Data Products Suite – Redistricting (P.L. 94-171) and De[...]
Free | Publicly available
DANDI is a public archive of neurophysiology datasets, including raw and processed data, and associated software containers. Datasets are shared according to a Creative Commons CC0 or CC-BY licenses. The data archive provides a broad range of cellular neurophysiology data. This includes electrode and optical recordings, and associated imaging data using a set of community standards: NWB:N - NWB:Neurophysiology, BIDS - Brain Imaging Data Structure, and NIDM - Neuro Imaging Data Model. Development of DANDI is supported by the National Institute of Mental Health.
Free | Publicly available
The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), aims to generate comprehensive, multi-dimensional maps of the key genomic changes in major types and subtypes of cancer. TCGA has analyzed matched tumor and normal tissues from 11,000 patients, allowing for the comprehensive characterization of 33 cancer types and subtypes, including 10 rare cancers. The dataset contains open Clinical Supplement, Biospecimen Supplement, RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantification, miRNA Expression Quantification, Genotyping Array Copy Number Segment, Genotyping Array Masked Copy Number Segment, Genotyping Array Gene Level Copy Number Scores, and WXS Masked Somatic Mutation data from Genomic Data Commons (GDC). This dataset also contains controlled Whole Exome Sequencing (WXS), RNA-Seq, miRNA-Seq, ATAC-Seq Aligned Reads, WXS Annotated Somatic Mutation, WXS Raw Somat[...]
Free | Publicly available
Various kinds of weather raw data and charts from Central Weather Administration.
Free | Publicly available
The Sequence Read Archive (SRA), produced by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) at the National Institutes of Health (NIH), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms. The SRA provides open access to these biological sequence data to support the research community's efforts to enhance reproducibility and make new discoveries by comparing data sets. Buckets in this registry contain public SRA data in the original (user submitted) format from select high value and newly-released studies as well as all public-access SRA formatted ETL+BQS data. Also included is all SRA metadata that can be leveraged for attribute-based data discovery.
Free | Publicly available
The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. The CCLE provides public access to genomic data, visualization and analysis for over 1100 cancer cell lines. This dataset contains RNA-Seq Aligned Reads, WXS Aligned Reads, and WGS Aligned Reads data.
Free | Publicly available
Japanese Tokenizer Dictionaries for use with MeCab.
Free | Publicly available
1,076 textbook lessons, 26,260 questions, 6229 images
Free | Publicly available
Canopy Tree Height maps for California in 2020. Created using a deep learning model on very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) by United States Department of Agriculture (USDA).
showing 281 - 290