
Overview
This dataset, the Phrase Clustering Dataset (PCD), is part of the paper "McPhraSy: Multi-Context Phrase Similarity and Clustering" by DN Cohen et al. (2022). The purpose of PCD is to evaluate the quality of semantic-based clustering of noun phrases. The phrases were collected from the Amazon Review Dataset (https://nijianmo.github.io/amazon/).
Features and programs
Open Data Sponsorship Program
Pricing
This is a publicly available data set. No subscription is required.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Open data resources
Available with or without an AWS account.
How to use: to access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
- Description: Phrase Clustering Dataset (PCD)
- Resource type: S3 bucket
- Amazon Resource Name (ARN): arn:aws:s3:::amazon-phrase-clustering
- AWS region: us-west-2
- AWS CLI access (no AWS account required): aws s3 ls --no-sign-request s3://amazon-phrase-clustering/
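For readers who prefer a script to the CLI, the listing above can be approximated in Python using only the standard library. This is a minimal sketch, not an official client: it assumes the bucket permits anonymous listing through the public S3 REST endpoint (which the `--no-sign-request` command suggests), and the helper names `list_url`, `parse_listing`, and `fetch_listing` are hypothetical. The bucket name and region are taken from the listing; no object keys are assumed.

```python
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "amazon-phrase-clustering"  # from the ARN in the listing
REGION = "us-west-2"                 # from the listing
# XML namespace used by S3 list responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket=BUCKET, region=REGION):
    """Build the unsigned ListObjectsV2 URL for a public bucket."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/?list-type=2"


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def fetch_listing():
    """Fetch the first page of the bucket listing (assumes anonymous access)."""
    with urllib.request.urlopen(list_url(), timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `fetch_listing()` requires network access and returns only the first page of results (S3 returns at most 1,000 keys per request); a bucket with more objects would need pagination via the continuation token, which this sketch omits.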
Support
Contact
Post any questions to re:Post and use the AWS Open Data tag.
How to cite
Phrase Clustering Dataset (PCD) was accessed on DATE from https://registry.opendata.aws/pcd.
License
This data is available for anyone to use under the terms of the CDLA-Permissive license.

