Integrating Morphology with Multi-word Expression Processing in Turkish

posted on 01.07.2004, 00:00 by Kemal Oflazer, Ozlem Cetinoglu, Bilge Say
This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications. In addition to the fairly standard set of lexicalized collocations and multi-word expressions such as named entities, Turkish uses quite a wide range of semi-lexicalized and non-lexicalized collocations. After an overview of relevant aspects of Turkish, we present a description of the multi-word expressions we handle. We then summarize the computational setting in which we employ a series of components for tokenization, morphological analysis, and multi-word expression extraction. Finally, we present results from runs over a large corpus and a small gold-standard corpus.


Published in Second ACL Workshop on Multiword Expressions: Integrating Processing, July 2004, pp. 64-71, Barcelona, Spain



