
Paraphrase choice based on user traits

dataset
posted on 30.11.2015, 02:40 by Daniel Preotiuc-Pietro, Wei Xu, Lyle Ungar

PPDB paraphrase pairs and clusters with their associated usage score across three user traits:
a. Gender: male or female
b. Age: <25 or >30
c. Occupational Class: low or high

Contents:
frequencies.tar.gz - contains the raw frequency statistics for all phrases and each trait
pairs.tar.gz - contains files with pairwise usage scores for each trait
clusters.tar.gz - contains files with cluster usage scores for each trait
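Because the internal layout of the archives is not described here, a reasonable first step is to list what each one contains before parsing anything. The following is a minimal sketch, assuming the three archives have been downloaded into the current working directory; the helper name `list_members` is hypothetical and not part of the dataset.

```python
import tarfile
from pathlib import Path

# Archive names as given in the Contents list above.
ARCHIVES = ("frequencies.tar.gz", "pairs.tar.gz", "clusters.tar.gz")


def list_members(archive_path):
    """Return the names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


if __name__ == "__main__":
    for name in ARCHIVES:
        path = Path(name)
        # Assumption: the archives sit in the current working directory.
        if path.exists():
            print(name, list_members(path))
        else:
            print(name, "not found in the current directory")
```

Listing the members first avoids guessing at file names or column layouts inside the archives.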

In the pairs and clusters files, negative scores mark phrases that are more strongly associated with females, the lower occupational class, and users over 30 years old.
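The sign convention can be sketched as a small lookup. This is only an illustration: the trait keys (`gender`, `age`, `occupation`) and the function name `associated_group` are hypothetical, and the mapping of positive scores to the opposite group is an inference from the description above, not something the dataset states explicitly.

```python
# Groups associated with negative scores, as stated in the description.
NEGATIVE_GROUP = {
    "gender": "female",
    "age": "over 30",
    "occupation": "low occupational class",
}

# Assumption: positive scores point to the other value of each binary trait.
POSITIVE_GROUP = {
    "gender": "male",
    "age": "under 25",
    "occupation": "high occupational class",
}


def associated_group(trait, score):
    """Map a usage score for a trait to the group it leans towards."""
    if trait not in NEGATIVE_GROUP:
        raise KeyError(f"unknown trait: {trait!r}")
    if score < 0:
        return NEGATIVE_GROUP[trait]
    if score > 0:
        return POSITIVE_GROUP[trait]
    return "neither"
```

For example, `associated_group("age", -0.8)` returns `"over 30"`; the score value is made up for illustration.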

If you use this dataset, please cite our work:

@inproceedings{paraphrase16aaai,
author = {Preo\c{t}iuc-Pietro, Daniel and Xu, Wei and Ungar, Lyle},
title = {{Discovering user attribute stylistic differences via paraphrasing}},
booktitle = {{Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence}},
series = {AAAI},
year = {2016}
}

 

 
