A distributional semantics analysis of the two English suffixes -ity and -ness
This repository contains the data and scripts used in the following paper:
Schäfer, M. (accepted with minor revisions). The role of meaning in the rivalry of -ity and -ness: evidence from distributional semantics. [to be published in English Language and Linguistics]
Python scripts are used for all distributional semantics analyses, including the t-SNE dimensionality reduction and subsequent visualization and LDA. R scripts are used for further statistical analyses and figures. More information on the files is provided in the README file.
The scripts build on the ukWaC corpus (see Baroni et al. 2009) and the pretrained vectorspaces published with Mikolov et al. (2017); see the links below.
Required corpus [Links last checked 2024-07-16]:
ukWaC: https://wacky.sslmit.unibo.it/doku.php?id=corpora
Required pretrained vectorspaces [Links last checked 2024-07-16]:
fasttext vectorspace without subword information:
File "wiki-news-300d-1M.vec.zip" from
https://fasttext.cc/docs/en/english-vectors.html
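For orientation, the unzipped file is plain text: a header line with the number of words and the number of dimensions, followed by one line per word with its space-separated vector values. The following is a minimal sketch of reading such a file and comparing two vectors, not the code used in the repository's scripts. The function names `load_vec` and `cosine`, the example word pair, and the tiny three-dimensional sample are hypothetical and only serve as illustration.

```python
import io
import math
from typing import Dict, Iterable, List, TextIO


def load_vec(handle: TextIO, wanted: Iterable[str]) -> Dict[str, List[float]]:
    """Read a fastText-style .vec text stream, keeping only the wanted words."""
    wanted_set = set(wanted)
    # Header line: "<number of words> <number of dimensions>"
    _, dim = (int(x) for x in handle.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words we do not need and lines with an unexpected length.
        if word in wanted_set and len(values) == dim:
            vectors[word] = [float(v) for v in values]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up sample in the same layout (3 dimensions instead of 300).
sample = io.StringIO(
    "3 3\n"
    "purity 0.1 0.2 0.3\n"
    "pureness 0.1 0.2 0.4\n"
    "the 0.9 0.8 0.7\n"
)
vecs = load_vec(sample, ["purity", "pureness"])
similarity = cosine(vecs["purity"], vecs["pureness"])
```

With the real data, the in-memory sample would be replaced by a file handle on the unzipped download, for example `open("wiki-news-300d-1M.vec", encoding="utf-8")`. Filtering while reading avoids holding all one million vectors in memory when only a list of -ity and -ness derivatives is needed.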
References
Baroni, M., S. Bernardini, A. Ferraresi, and E. Zanchetta (2009). The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43 (3): 209-226.
Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2017). Advances in pre-training distributed word representations. CoRR abs/1712.09405.