Free | Publicly available
VirtualFlow Versions of Ligand Libraries in Ready-To-Dock Format
This program exists to help people discover and share data sets that are available by using AWS resources. Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine if a data set may be used for you application. If you have a project using a listed data set please tell us about it at opendata@amazon.com.
Free | Publicly available
VirtualFlow Versions of Ligand Libraries in Ready-To-Dock Format
Free | Publicly available
The Solid State Imager (SSI) on NASA's Galileo spacecraft acquired more than 500 images of Jupiter's moon, Europa. These images vary from relatively low-resolution hemispherical imaging, to high-resolution targeted images that cover a small portion of the surface. Here we provide a set of 481 minimally processed, projected Galileo images with photogrammetrically improved locations on Europa's surface. These individual images were subsequently used as input into a set of 92 observation mosaics. These images provide users with nearly the entire Galileo Europa imaging dataset at its native resolution and with improved relative image locations. The Solid State Imager on NASA's Galileo spacecraft provided the only moderate- to high-resolution images of Jupiter's moon, Europa. Unfortunately, uncertainty in the position and pointing of the spacecraft, as well as the position and orientation of Europa, when the images were acquired resulted in significant errors in image locati[...]
Free | Publicly available
This dataset, produced by Impact Observatory, Microsoft, and Esri, displays a global map of land use and land cover (LULC) derived from ESA Sentinel-2 imagery at 10 meter resolution for the years 2017 - 2023. Each map is a composite of LULC predictions for 9 classes throughout the year in order to generate a representative snapshot of each year. This dataset was generated by Impact Observatory, which used billions of human-labeled pixels (curated by the National Geographic Society) to train a deep learning model for land classification. Each global map was produced by applying this model to the Sentinel-2 annual scene collections from the Mircosoft Planetary Computer. Each of the maps has an assessed average accuracy of over 75%. These maps have been improved from Impact Observatory’s previous release and provide a relative reduction in the amount of anomalous change between classes, particularly between “Bare” and any of the vegetative classes “Trees,” “Crops,” “Flooded V[...]
Free | Publicly available
The OHSU-CNL study offers the whole exome and RNA-sequencing on a cohort of 100 cases with rare hematologic malignancies such as Chronic neutrophilic leukemia (CNL), atypical chronic myeloid leukemia (aCML), and unclassified myelodysplastic syndrome/myeloproliferative neoplasms (MDS/MPN-U). This dataset contains open RNA-Seq Gene Expression Quantification data.
Free | Publicly available
Pre-built refgenie reference genome data assets used for aligning and analyzing DNA sequence data.
Free | Publicly available
The Singapore Nanopore Expression (SG-NEx) project is an international collaboration to generate reference transcriptomes and a comprehensive benchmark data set for long read Nanopore RNA-Seq. Transcriptome profiling is done using PCR-cDNA sequencing (PCR-cDNA), amplification-free cDNA sequencing (direct cDNA), direct sequencing of native RNA (direct RNA), and short read RNA-Seq. The SG-NEx core data includes 5 of the most commonly used cell lines and it is extended with additional cell lines and samples that cover a broad range of human tissues. All core samples are sequenced with at least 3 high quality replicates. For a subset of samples spike-in RNAs are used and matched m6A profiling data is available.
Free | Publicly available
The lunar orbiter laser altimeter (LOLA) has collected and released almost 7 billion individual laser altimeter returns from the lunar surface. This dataset includes individual altimetry returns scraped from the Planetary Data System (PDS) LOLA Reduced Data Record (RDR) Query Tool, V2.0. Data are organized in 15˚ x 15˚ (longitude/latitude) sections, compressed and encoded into the Cloud Optimized Point Cloud (COPC) file format, and collected into a Spatio-Temporal Asset Catalog (STAC) collection for query and analysis. The data are in latitude, longitude, and radius (X, Y, Z) format with the proper IAU 2015 30100 well-known text projection string. These data are in the -180 to 180, center longitude 0 domain. Users of this data set are encouraged to use the Point Data Abstract Library (PDAL) STAC driver to query at the collection level or the COPC driver to query individual COPC tiles within the dataset. Queries of these data using bounding boxes, buffered points, or other geometri[...]
Free | Publicly available
CZ CELLxGENE Discover (cellxgene.cziscience.com) is a free-to-use platform for the exploration, analysis, and retrieval of single-cell data. CZ CELLxGENE Discover hosts the largest aggregation of standardized single-cell data from the major human and mouse tissues, with modalities that include gene expression, chromatin accessibility, DNA methylation, and spatial transcriptomics. This year, CZ CELLxGENE Discover has made available all of its human and mouse RNA single-cell data through Census (https://chanzuckerberg.github.io/cellxgene-census/) – a free-to-use service with an API and data that allows for querying its single-cell data corpus directly from Python or R. The API uses a new technology, TileDB-SOMA, that allows for efficient and low-latency querying. The data are fully standardized and hosted publicly for free access, and they are composed by a count matrix of tens of millions of cells (observations) by >60 k genes (features) accomp[...]
Free | Publicly available
ParaCrawl is a set of large parallel corpora to/from English for all official EU languages by a broad web crawling effort. State-of-the-art methods are applied for the entire processing chain from identifying web sites with translated text all the way to collecting, cleaning and delivering parallel corpora that are ready as training data for CEF.AT and translation memories for DG Translation.
Free | Publicly available
An open multi-sensor dataset for autonomous driving research. This dataset comprises semantically segmented images, semantic point clouds, and 3D bounding boxes. In addition, it contains unlabelled 360 degree camera images, lidar, and bus data for three sequences. We hope this dataset will further facilitate active research and development in AI, computer vision, and robotics for autonomous driving.
showing 141 - 150