- Sold by Open Data on AWS Data Exchange
Free | Publicly available
A collection of product reviews summaries automatically generated by PASS for 32 Amazon products from the FewSum dataset
Unless specifically stated in the applicable data set documentation, data sets available through the Registry of Open Data on AWS are not provided or maintained by AWS. Data sets are provided and maintained by a variety of third parties under a variety of licenses. Please check data set licenses and related documentation to determine if a data set may be used for you application
Free | Publicly available
A collection of product reviews summaries automatically generated by PASS for 32 Amazon products from the FewSum dataset
Free | Publicly available
This dataset provides how-to articles from wikihow.com and their summaries, written as a coherent paragraph. The dataset itself is available at wikisum.zip, and contains the article, the summary, the wikihow url, and an official fold (train, val, or test). In addition, human evaluation results are available at wikisum-human-eval.zip. It consists of human evaluation of the summary of the Pegasus system, annotators response regarding the difficulty of the task, and words they marked as unknown.
Free | Publicly available
A collection of sentences extracted from customer reviews labeled with their helpfulness score.
Free | Publicly available
The Registry of Open Data on AWS contains publicly available datasets that are available for access from AWS resources. Note that datasets in this registry are available via AWS resources, but they are not provided by AWS; these datasets are owned and maintained by a variety of government organizations, researchers, businesses, and individuals. This dataset contains derived forms of the data in https://github.com/awslabs/open-data-registry that have been transformed for ease of use with machine interfaces. Currently, only the ndjson form of the registry is populated here.
Free | Publicly available
When sellers need help from Amazon, such as how to create a listing, they often reach out to Amazon seller support through email, chat or phone. For each contact, we assign an intent so that we can manage the request more easily. The data we present in this release includes 548k contacts with 118 intents from 70k sellers sampled from recent years. There are 3 columns. 1. De-identified seller id - selleridanon; 2. Noisy inter-arrival time in the unit of hour between contacts - interarrivaltimehrnoisy; 3. An integer that represents the contact intent - contactintent. Note that, to balance the need between data anonymization and usefulness, we randomly perturbed the interarrival time in an intricate way such that the temporal pattern are preserved and seller identity are anonymized to the largest extent. We also note that for each selleridanon, the interarrivaltimehrnoisy are already arranged in chonological order, the first contactintentidanon is always the origin when[...]
Free | Publicly available
See https://lowcontext-ner-gaz.s3.amazonaws.com/readme.html
Free | Publicly available
The MegaScenes Dataset is an extensive collection of around 430k scenes, featuring over 100k structure-from-motion reconstructions and over 2 million registered images. MegaScenes includes a diverse array of scenes, such as minarets, building interiors, statues, bridges, towers, religious buildings, and natural landscapes. The images of these scenes are captured under varying conditions, including different times of day, various weather and illumination, and from different devices with distinct camera intrinsics.
Free | Publicly available
The State of Vermont has partnered with Amazon's Open Data Initative to make a wide range of geospatial data available in the public domain. Vermont acquires aerial imagery and LiDAR during leaf-off conditions. The imagery typically ranges from 30-centimeter to 15-centimeter in resolution and is available from Vermont's Amazon S3 bucket in a Cloud Optimized GeoTiff (COG) format. LiDAR data has been acquired and is available as USGS Quality Level-1 (QL1) and Level-2 (QL2) compliant datasets in COG format. Geospatial datasets derived from imagery and/or lidar are also available as COGs, including high resolution landcover.
Free | Publicly available
DE-SynPUF is provided here as a 1,000 person (1k), 100,000 person (100k), and 2,300,000 persom (2.3m) data sets in the OMOP Common Data Model format. The DE-SynPUF was created with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information. The purposes of the DE-SynPUF are to: allow data entrepreneurs to develop and create software and applications that may eventually be applied to actual CMS claims data; train researchers on the use and complexity of conducting analyses with CMS claims data prior to initiating the process to obtain access to actual CMS data; and, support safe data mining innovations that may reveal unanticipated knowledge gains while preserving beneficiary privacy. The files have been designed so that programs and procedures created on the DE-SynPUF will function on CMS Limited[...]
Free | Publicly available
This both the original .tfrecords and a Parquet representation of the YouTube 8 Million dataset. YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This dataset also includes the YouTube-8M Segments data from June 2019. This dataset is 'Lakehouse Ready'. Meaning, you can query this data in-place straight out of the Registry of Open Data S3 bucket. Deploy this dataset's corresponding CloudFormation template to create the AWS Glue Catalog entries[...]
showing 21 - 30