A distributional semantics analysis of the two English suffixes -ity and -ness

online resource
posted on 2024-07-16, 12:39 authored by Martin Schäfer

This repository contains data and the scripts used for the following paper:

Schäfer, M. (accepted with minor revisions). The role of meaning in the rivalry of -ity and -ness: evidence from distributional semantics. [to be published in English Language and Linguistics]


Python scripts are used for all distributional semantics analyses, including the t-SNE dimension reduction, the subsequent visualization, and LDA. R scripts are used for further statistical analyses and figures. More information on the files is provided in the README file.

The scripts build on the ukWaC corpus (see Baroni et al. 2009) and the pretrained vector spaces published with Mikolov et al. (2017); see the links below.

Required corpus [Links last checked 2024-07-16]:

ukWaC: https://wacky.sslmit.unibo.it/doku.php?id=corpora


Required pretrained vector spaces [Links last checked 2024-07-16]:

fastText vector space without subword information:
File "wiki-news-300d-1M.vec.zip" from
https://fasttext.cc/docs/en/english-vectors.html
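
For orientation, the following is a minimal sketch of how vectors in the text-based .vec format can be read and compared in plain Python. It is not taken from the scripts in this repository. The function names load_vec and cosine, the toy vectors, and the example words are hypothetical and serve illustration only. The sketch assumes the usual .vec layout: a header line with the vocabulary size and the dimension, followed by one word and its vector components per line.

```python
import io
import math


def load_vec(handle, wanted=None):
    """Read word vectors from an open text-format .vec file.

    handle: a text file object positioned at the start of the file.
    wanted: optional set of words to keep; None keeps every word.
    """
    # Header line: "<number of words> <vector dimension>".
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word = parts[0]
        if wanted is not None and word not in wanted:
            continue
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            continue  # skip lines that do not match the declared dimension
        vectors[word] = values
    return vectors


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-in for a .vec file (hypothetical 2-dimensional vectors).
toy_file = io.StringIO(
    "3 2\n"
    "purity 1.0 0.0\n"
    "pureness 1.0 1.0\n"
    "table 0.0 1.0\n"
)
vecs = load_vec(toy_file, wanted={"purity", "pureness"})
similarity = cosine(vecs["purity"], vecs["pureness"])
```

With the real data, the unzipped file would be opened in place of the toy input, for example with open("wiki-news-300d-1M.vec", encoding="utf-8"), assuming that is the name of the file inside the archive. Passing a set of target words via wanted avoids holding all vectors in memory.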


References

Baroni, M., S. Bernardini, A. Ferraresi, and E. Zanchetta (2009). The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43 (3): 209-226.

Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2017). Advances in pre-training distributed word representations. CoRR abs/1712.09405.



Funding

This work was partially funded by DFG project PL 151/11-1 ‘The semantics of derivational morphology’.
