ColloCaid Sample Data
COLLOCAID SAMPLE DATA
The ColloCaid Sample Data comprises approximately 2% of the ColloCaid lexical database. The sample covers 692 strong academic English collocations (LogDice > 5.0) for 16 core academic lemmas used as collocation bases (or nodes): 5 nouns, 5 verbs, and 6 adjectives. The selection aims to give an overview of the range of data included in the full dataset. It includes collocation bases classified with more than one part-of-speech tag (e.g. DEBATE, INDIVIDUAL), polysemous collocation bases that give rise to distinct collocation patterns (e.g. CODE), and collocation bases that evoke either a very large or a very small number of collocations. For each base, the eight strongest lexical collocations are each enriched with three different curated example sentences adapted from corpora of expert academic English writing.
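To illustrate the LogDice > 5.0 cut-off mentioned above, here is a minimal Python sketch using the standard logDice definition, 14 + log2(2 * f_xy / (f_x + f_y)). It is an illustration only: this description does not state how ColloCaid computed its scores, and the function name, the candidate pairs, and all frequency counts below are hypothetical.

```python
import math

# Threshold quoted in the sample description (LogDice > 5.0).
THRESHOLD = 5.0


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Standard logDice score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of base and collocate
    f_x:  frequency of the base
    f_y:  frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical (base, collocate) pairs with invented counts (f_xy, f_x, f_y);
# none of these figures come from the ColloCaid data.
candidates = {
    ("debate", "ongoing"): (100, 1000, 600),
    ("debate", "thing"): (1, 1024, 1024),
}

# Keep only pairs whose score exceeds the threshold.
strong = [pair for pair, counts in candidates.items()
          if log_dice(*counts) > THRESHOLD]

for pair, counts in candidates.items():
    print(pair, round(log_dice(*counts), 2))
```

Because logDice depends only on relative frequencies, the same threshold can be compared across corpora of different sizes; the theoretical maximum is 14, reached when the two words only ever occur together.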
COLLOCAID LEXICAL DATA 1.0
The full ColloCaid lexical dataset consists of:
• 572 core academic English lemmas
• 32,655 academic collocations with the above lemmas
• 29,055 example sentences of collocations in context
Further information at http://www.collocaid.uk/