
Phrase Clustering Dataset (PCD)
Provided by: Amazon, part of the Open Data
This product is part of the Open Data and contains data sets that are publicly available for anyone to access and use. No subscription is required. Unless specifically stated in the applicable data set documentation, data sets available through the Open Data are not provided and maintained by AWS.
Description
This dataset accompanies the paper "McPhraSy: Multi-Context Phrase Similarity and Clustering" by DN Cohen et al. (2022). The purpose of PCD is to evaluate the quality of semantic clustering of noun phrases. The phrases were collected from the Amazon Review Dataset.
License
This data is available for anyone to use under the terms of the CDLA-Permissive license.
How to cite
Phrase Clustering Dataset (PCD) was accessed on DATE from https://registry.opendata.aws/pcd.
Update frequency
Not updated
Resources on AWS
Description
Phrase Clustering Dataset (PCD)
Resource type
S3 Bucket
Amazon Resource Name (ARN)
arn:aws:s3:::amazon-phrase-clustering
AWS Region
us-west-2
AWS CLI Access (No AWS account required)
aws s3 ls --no-sign-request s3://amazon-phrase-clustering/
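For readers without the AWS CLI, the same unauthenticated listing can probably be done over plain HTTPS. The sketch below is an illustration only, not part of the dataset documentation: it assumes the bucket also answers anonymous S3 ListObjectsV2 requests at its regional virtual-hosted endpoint, and the helper names (list_url, parse_keys, list_keys) are hypothetical. The bucket name and region are taken from the listing above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "amazon-phrase-clustering"  # from the ARN above
REGION = "us-west-2"                 # from the AWS Region field above
# XML namespace used by S3 list responses
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(prefix=""):
    """Build an unsigned ListObjectsV2 URL for the bucket (hypothetical helper)."""
    query = urllib.parse.urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_keys(xml_payload):
    """Return the object keys found in a ListObjectsV2 XML response (str or bytes)."""
    root = ET.fromstring(xml_payload)
    return [element.text for element in root.iter(f"{S3_NS}Key")]


def list_keys(prefix=""):
    """Fetch a single page of keys (S3 returns at most 1000 per request).

    Requires network access and assumes anonymous listing is permitted.
    """
    with urllib.request.urlopen(list_url(prefix)) as response:
        return parse_keys(response.read())


# Example usage (performs a network request):
#     for key in list_keys():
#         print(key)
```

This sketch reads only the first page of results; a bucket with more than 1000 objects would need the continuation token from each response passed to the next request.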
Usage examples
Publications
- McPhraSy: Multi-Context Phrase Similarity and Clustering by Amir DN Cohen, Hila Gonen, Ori Shapira, Ran Levy, and Yoav Goldberg