Free | Publicly available
PhysioNet offers free web access to large collections of recorded physiologic signals (PhysioBank) and related open-source software (PhysioToolkit).
This program exists to help people discover and share data sets that are available by using AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine if a data set may be used for your application. If you have a project using a listed data set, please tell us about it at opendata@amazon.com.
Free | Publicly available
real-changesets is an augmented representation of OpenStreetMap changesets in JSON format. It contains the current and the previous version of each feature in a changeset. It is primarily used by OSMCha, the main OpenStreetMap validation tool, to visualize a changeset and help the user understand what was changed on the map. The real-changesets are created by combining the changeset metadata with the augmented diff generated by Overpass.
Free | Publicly available
Open-Meteo integrates weather models from reputable national weather services, offering a swift and efficient weather API. Real-time weather forecasts are unified into a time-series database that provides historical and future weather data for any location worldwide. Through Open-Meteo on AWS Open Data, you can download the Open-Meteo weather database and analyze weather data locally. Docker images are provided to download data and to expose an HTTP API endpoint. Using Open-Meteo SDKs, you can seamlessly integrate weather data into your Python, TypeScript, Swift, Kotlin, or Java applications. The entire source code is open-source, and contributions are welcome! To get started, familiarize yourself with the available weather models and explore tutorials on downloading 80 years of historical weather data from ERA5 or setting up your own real-time weather API.
Free | Publicly available
The Molecular Profiling to Predict Response to Treatment (MP2PRT) program is part of the NCI's Cancer Moonshot Initiative. The aim of this program is the retrospective characterization and analysis of biospecimens collected from completed NCI-sponsored trials of the National Clinical Trials Network and the NCI Community Oncology Research Program. This study, titled "Identification of Genetic Changes Associated with Relapse and/or Adaptive Resistance in Patients Registered as Favorable Histology Wilms Tumor on AREN03B2", performs genomic characterization (WGS 30X, Total RNAseq, miRNAseq) on a discovery set of 70 trio cases (normal tissue, tumor tissue at time of diagnosis, tumor tissue at time of relapse) from patients who relapsed with Favorable Histology Wilms Tumor. Prioritized findings from the discovery set will be validated using Targeted Sequencing in an independent validation set of 47 relapse samples. The MP2PRT study is made available on AWS via the NIH STRIDES Initiative[...]
Free | Publicly available
The Public Utility Data Liberation Project (PUDL) provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists. PUDL is an open source data processing pipeline that makes US energy data easier to access and use programmatically. Hundreds of gigabytes of valuable data are published by US government agencies, but it's often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation. This information allows users to explore the operating costs of individual power plants, and see how fuel costs impact the viability of different types of generation. It can highlight the competitiveness of renewable electricity in the market today. It can show how the generation mix of different util[...]
Free | Publicly available
Some of the most important datasets for image localization research, including CamVid and PASCAL VOC (2007 and 2012). This is part of the fast.ai datasets collection hosted by AWS for the convenience of fast.ai students. See the documentation link for citation and license details for each dataset.
Free | Publicly available
Database for use with Kraken2 (taxonomic annotation of metagenomic sequencing reads), including all NCBI RefSeq genomes available in release V205.
Free | Publicly available
A collection of 51,701 product pages from 8,175 e-commerce websites across 8 markets (US, GB, SE, NL, FI, NO, DE, AT) with 5 manually labelled elements: the product price, name, and image, and the add-to-cart and go-to-cart buttons. The dataset was collected between 2018 and 2019 and is made available as MHTML and as WebTraversalLibrary-format snapshots.
Free | Publicly available
The 3000 Rice Genome Project is an international effort to sequence the genomes of 3,024 rice varieties from 89 countries.
Free | Publicly available
recount3 is an online resource consisting of RNA-seq gene, exon, and exon-exon junction counts as well as coverage bigWig files for 8,679 and 10,088 different studies for human and mouse respectively. It is the third generation of the ReCount project and part of recount.bio. recount2 is also included for historical purposes. The pipeline used to generate the data in recount3 (but not recount2) is available here.