Simulating Grammatical Encoding with Multi-Label Decision Tree Classifiers - EMCL Thesis 2018 - Data

dataset

posted on 2018-08-21, 07:52, authored by Atilla Atasoy

These data accompany the European Master's Thesis by Atilla Atasoy from August 2018, titled 'Simulating Grammatical Encoding with Multi-Label Decision Tree Classifiers - An Investigation into the Loci of Impairment in Turkish-Speaking Individuals with Agrammatism'.

All data are provided as comma-separated text files (txt) or comma-separated values files (csv). Eight files are provided:

1. Binary input strings (preverbal message and lemma)

2. Binary output strings (surface structure)

3. Semantic frame per verb root

4. Appended, binary data (input and output)

5. Input string template with example verb koy- 'to put'

6. Output string template with example verb koy- 'to put'

7. Raw quantitative results

8. Raw qualitative results¹

Note: Because of the large number of rows (i.e., the high number of verb forms encoded), the text files cannot be opened with regular spreadsheet programs (e.g., Microsoft Excel). Please use software designed for very large files (e.g., EmEditor).
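As an alternative to a dedicated editor, a large comma-separated file can also be inspected by streaming it row by row. The following is a minimal Python sketch using only the standard library; it is not part of the thesis materials, and the function names, the UTF-8 encoding, and the default of five preview rows are assumptions made for illustration.

```python
import csv
from itertools import islice


def preview_rows(path, n=5, encoding="utf-8"):
    """Return the first n rows of a comma-separated file.

    Only the first n rows are read, so this stays fast for very large files.
    The encoding is an assumption; adjust it if the files use another one.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return list(islice(csv.reader(handle), n))


def count_rows(path, encoding="utf-8"):
    """Count all rows (including any header) by streaming through the file.

    Rows are read one at a time, so memory use does not grow with file size.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return sum(1 for _ in csv.reader(handle))
```

Calling `preview_rows` on one of the downloaded files shows its first rows, and `count_rows` reports how many rows it contains, without loading the whole file into memory.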

¹ In the raw qualitative results, duplicate header rows need to be removed. This is easily accomplished in R, using the duplicated() or unique() function.
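For readers who do not work in R, the duplicate headers mentioned in the footnote can also be removed with a short script. The sketch below is in Python and is an illustration rather than the author's procedure. It assumes that the first row of the file is the header and that every repeated header is an exact copy of it; the function name and the UTF-8 encoding are likewise assumptions. Unlike R's unique function, it drops only rows identical to the header and leaves duplicated data rows in place.

```python
import csv


def drop_repeated_headers(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, keeping only the first header row.

    Assumes the first row is the header and that repeated headers match it
    exactly. Returns the number of repeated header rows that were dropped.
    """
    dropped = 0
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            # Empty input: nothing to copy.
            return 0
        writer.writerow(header)
        for row in reader:
            if row == header:
                # A later copy of the header row: skip it.
                dropped += 1
                continue
            writer.writerow(row)
    return dropped
```

The file is processed one row at a time, so the approach also works for files too large for a spreadsheet program. Writing to a separate output path leaves the downloaded file unchanged.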