ColloCaidDataSample.zip (72.32 kB)

ColloCaid Sample Data

dataset
posted on 02.10.2020 by Ana Frankenberg-Garcia, Geraint Paul Rees, Robert Lew

COLLOCAID SAMPLE DATA

The ColloCaid Sample Data comprises approximately 2% of the ColloCaid lexical database. The sample covers 692 strong academic English collocations (LogDice > 5.0) for 16 core academic lemmas used as collocation bases (or nodes): 5 nouns, 5 verbs, and 6 adjectives. The selection aims to give an overview of the range of data included in the full dataset. It includes collocation bases classified under more than one part-of-speech tag (e.g. DEBATE, INDIVIDUAL), polysemous collocation bases that give rise to distinct collocation patterns (e.g. CODE), and collocation bases that attract either a very large or a very small number of collocations. The eight strongest lexical collocations listed for each base are enriched with three different curated example sentences adapted from corpora of expert academic English writing.
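For readers unfamiliar with the LogDice threshold mentioned above, the following is a minimal sketch of how such a score is commonly computed. It assumes the standard logDice definition, 14 + log2(2·f_xy / (f_x + f_y)); the record itself does not state how the ColloCaid scores were calculated. The function names and the frequencies in the usage note are hypothetical and are not taken from the dataset.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Standard logDice score (assumed definition, not confirmed by the record).

    f_xy: co-occurrence frequency of base and collocate
    f_x:  corpus frequency of the base
    f_y:  corpus frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def is_strong(f_xy: int, f_x: int, f_y: int, threshold: float = 5.0) -> bool:
    """True if the score exceeds the threshold (5.0, as quoted in the description)."""
    return log_dice(f_xy, f_x, f_y) > threshold
```

With illustrative frequencies, a pair that co-occurs 4 times where each word occurs 1,024 times scores 6.0 and would pass the threshold, while a pair that co-occurs only once scores 4.0 and would not.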

COLLOCAID LEXICAL DATA 1.0

The full ColloCaid lexical dataset consists of:

• 572 core academic English lemmas

• 32,655 academic collocations with the above lemmas

• 29,055 example sentences of collocations in context

Further information at http://www.collocaid.uk/


Funding

AHRC AH/P003508/1
