baza_engleskih_rijeci_u_hrvatskome (1).xlsx (383.9 kB)

The database of English words in Croatian.xlsx

dataset

posted on 2022-06-07, 11:14 authored by Irena Bogunović, Mario Kučić

To build a dataset for training and testing the model, 60,000 words were manually labelled according to language membership by three independent evaluators. An n-gram feature representation was combined with a linear Support Vector Machine (SVM) classifier (Smola & Schölkopf, 2004) to extract English words from the ENGRI corpus (Bogunović & Kučić, 2021; Kučić, 2021). The model achieved an F1 score of 0.9669 on the test set. The database contains 9,453 English words together with their absolute and relative frequencies.
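
As a rough illustration of two ingredients named in the description, the sketch below shows how a single word could be turned into n-gram counts and how an F1 score is computed from labelled predictions. It is not the authors' pipeline. The use of character-level n-grams, the boundary markers `<` and `>`, the n-gram range, the label names `"en"` and `"hr"`, and the function names are all assumptions made for this example. The SVM training step is left out.

```python
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Count character n-grams of a word (hypothetical helper).

    The word is lower-cased and wrapped in '<' and '>' so that
    word-initial and word-final n-grams are distinguishable.
    """
    padded = f"<{word.lower()}>"
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def f1_score(y_true, y_pred, positive="en"):
    """F1 = 2 * P * R / (P + R) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Feature counts for one word; a classifier would consume these.
    print(char_ngrams("web", 2, 2))
    # Toy labels only, not data from the database.
    print(f1_score(["en", "en", "hr", "hr"], ["en", "hr", "hr", "en"]))
```

In a full setup, the n-gram counts would be vectorised and passed to a linear SVM, and the F1 score would be computed on a held-out portion of the labelled words.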

Funding

English words in Croatian: Identification, affective-semantic norming and investigation into cognitive processing via behavioural and neuroscientific methods

Croatian Science Foundation


