8 files

Quantitative and distributional analyses of English out-prefixation

online resource
posted on 2022-09-09, 12:56 authored by Martin Schäfer, Sven Kotowski

This repository contains data and the scripts used for the following paper:

Sven Kotowski & Martin Schäfer (2022). Semantic relatedness across base verbs and derivatives. Quantitative and distributional analyses of English out-prefixation.

This includes corpus data and annotations used in studies 1 and 2, and the full sets of Python and R scripts used for studies 3 and 4. For those two studies, Python scripts are used to create the vectors and calculate the cosine similarities, and R scripts are used for the statistical analysis and to produce the figures. 
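As a rough illustration of the cosine-similarity step mentioned above, the following minimal sketch compares two word vectors. It is not taken from the repository's scripts: the function name, the word pair, and the toy three-dimensional vectors are assumptions made only for this example, and the actual scripts (see the README) may work differently.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors for a base verb and its out-derivative.
# Real distributional vectors have hundreds of dimensions.
base_vector = [0.2, 0.7, 0.1]
derivative_vector = [0.3, 0.5, 0.4]

print(cosine_similarity(base_vector, derivative_vector))
```

Values close to 1 indicate that the two vectors point in nearly the same direction, values close to 0 that they are nearly orthogonal.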

More information on the files, and all further details, are provided in the README file.

Other required resources (see README for details) [links last checked 2022-07-22]:



baroni vectorspace: File "baroni.rda" from


Baroni, M., S. Bernardini, A. Ferraresi, and E. Zanchetta (2009). The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3), 209–226.

Baroni, M., G. Dinu, and G. Kruszewski (2014a). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference 1, 238–247.

Davies, M. (2018). The 14 billion word iWeb corpus: Available online at


This work was funded/partially funded by DFG project PL 151/11-1 ‘Semantics of derivational morphology’ (Sven Kotowski) and SFB 833 ‘The Construction of Meaning: The Dynamics and Adaptivity of Linguistic Structures’ (Martin Schäfer).