Integrating Morphology with Multi-word Expression Processing in Turkish
journal contribution
posted on 2004-07-01, 00:00 authored by Kemal Oflazer, Ozlem Cetinoglu, Bilge Say

This paper describes a multi-word expression processor
for preprocessing Turkish text for various
language engineering applications. In addition to
the fairly standard set of lexicalized collocations
and multi-word expressions such as named entities,
Turkish uses quite a wide range of semi-lexicalized
and non-lexicalized collocations. After an overview
of relevant aspects of Turkish, we present a description
of the multi-word expressions we handle. We
then summarize the computational setting in which
we employ a series of components for tokenization,
morphological analysis, and multi-word expression
extraction. Finally, we present results from runs over
a large corpus and a small gold-standard corpus.