This project proposes a computational approach based on distributional semantic models (DSM), implemented as a Vector Space Model (VSM), to explore semantic clustering and (dis)similarity among a set of Indonesian denominal verbs formed with three verbal morphological schemas (i.e., me-, me- -kan, and me- -i). The VSM approach captures the semantics of the verbs from their co-occurrence properties in texts (i.e., from the words co-occurring on either side of a given denominal verb in a large collection of texts from the Indonesian Leipzig Corpora). In particular, we use "word2vec" (developed by Tomas Mikolov and colleagues) to create the VSM from the Indonesian Leipzig Corpora. We are interested in how the verbs cluster given their distributional properties (e.g., whether we find verb clusters of a certain semantic type [e.g., PSYCH and MOTION verbs], or whether there are splits between verbs of the same root but different morphological schemas). The additional layer of morphology in the analysis is relevant to the description of the suffixes -i and -kan in an Indonesian grammar textbook (Sneddon et al., 2010). One view is that there is a set of verbs (of the same root) occurring with both -i and -kan whose semantics are clearly different; our study detects those -i and -kan verbs that are split across clusters (plotted as a dendrogram based on Hierarchical Cluster Analysis), such as membuahi vs. membuahkan, mengatai vs. mengatakan, and melangkahi vs. melangkahkan. Their semantic differences are characterised by differences in the semantic domains of their co-occurring words. Another view is that there are -i and -kan verbs of the same root that differ only in the arrangement of their arguments (rather than in the semantic domain the verbs refer to).
We found such -i and -kan verbs falling within the same cluster, reflecting their similar co-occurrence profiles (e.g., mewariskan and mewarisi cluster together; mendasar, mendasari, and mendasarkan also cluster together; a similar case is the cluster of menempati, menempatkan, and menempat). The paper we are working on (under review for a special issue of the NUSA journal since 15 March 2019) also addresses a number of issues regarding orthographic relics in the input corpora (which are raw, unannotated texts); these relics affect the automatic morphological parser (MorphInd) that provides the input verbs for the VSM. We demonstrate how the distance relations captured in the VSM (i.e., using the nearest-neighbours technique) can be used to resolve orthographic anomalies in our data. Watch this space for the data and the R Notebook for the analyses in the paper.
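The nearest-neighbour and clustering ideas described above can be illustrated with a minimal Python sketch. It is not the project's pipeline, which uses word2vec vectors and an R Notebook. The three-dimensional vectors below are invented toy values, the function names are hypothetical, and average linkage over cosine distance is an assumption, since the linkage method is not stated here.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only; they are NOT
# taken from the project's word2vec model.
toy_vectors = {
    "mewarisi":   [0.90, 0.10, 0.05],
    "mewariskan": [0.85, 0.15, 0.10],
    "membuahi":   [0.10, 0.90, 0.10],
    "membuahkan": [0.10, 0.15, 0.95],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbours(word, vectors, k=1):
    """Return the k words most similar to `word`, most similar first."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

def average_linkage_clusters(vectors, n_clusters):
    """Agglomerative clustering with average linkage (an assumed choice)
    over cosine distance, stopping once n_clusters clusters remain."""
    clusters = [[word] for word in vectors]

    def distance(c1, c2):
        # Mean pairwise cosine distance between members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - cosine_similarity(vectors[a], vectors[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

neighbour = nearest_neighbours("mewarisi", toy_vectors, k=1)[0][0]
clusters = average_linkage_clusters(toy_vectors, n_clusters=3)
print(neighbour)
print(clusters)
```

With these toy values, the pair sharing a co-occurrence profile merges first while the other two verbs stay apart, which mirrors the contrast between same-cluster pairs and split pairs only in a schematic way.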