Quantitative and distributional analyses of English out-prefixation
This repository contains the data and scripts used in the following paper:
Sven Kotowski & Martin Schäfer (2022). Semantic relatedness across base verbs and derivatives. Quantitative and distributional analyses of English out-prefixation
This includes corpus data and annotations used in studies 1 and 2, and the full sets of Python and R scripts used for studies 3 and 4. For those two studies, Python scripts are used to create the vectors and calculate the cosine similarities, and R scripts are used for the statistical analysis and to produce the figures.
The README file describes the individual files and provides all further details.
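The cosine-similarity step mentioned for studies 3 and 4 can be illustrated with a minimal sketch. This is not the repository's actual script: the function name and the toy vectors are hypothetical, and real vectors would come from the semantic spaces listed below.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for a base verb and its out-derivative.
base_vec = [1.0, 2.0, 0.0]
derivative_vec = [2.0, 4.0, 0.0]

similarity = cosine_similarity(base_vec, derivative_vec)
print(similarity)
```

Values close to 1 indicate that the two vectors point in nearly the same direction, and values close to 0 indicate that they are nearly orthogonal.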
Required other resources (see README for details) [Links last checked 2022-07-22]:
ukWaC: https://wacky.sslmit.unibo.it/doku.php?id=corpora
iWeb: https://corpus.byu.edu/iweb/
Baroni vector space: file "baroni.rda" from https://sites.google.com/site/fritzgntr/software-resources/semantic_spaces
References
Baroni, M., S. Bernardini, A. Ferraresi, and E. Zanchetta (2009). The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43(3), 209–226.
Baroni, M., G. Dinu, and G. Kruszewski (2014a). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference 1, 238–247.
Davies, M. (2018). The 14 billion word iWeb corpus. Available online at https://corpus.byu.edu/iweb/.