Transforming Standard Arabic to Colloquial Arabic
journal contribution
posted on 2012-07-08, 00:00 authored by Emad Mohamed, Behrang Mohit, Kemal Oflazer

We present a method for generating Colloquial
Egyptian Arabic (CEA) from morphologically disambiguated
Modern Standard Arabic (MSA).
When used in POS tagging, this process improves
the accuracy from 73.24% to 86.84% on unseen
CEA text, and reduces the percentage of out-of-vocabulary
words from 28.98% to 16.66%. The
process holds promise for any NLP task targeting
the dialectal varieties of Arabic; e.g., this approach
may provide a cheap way to leverage MSA data
and morphological resources to create resources
for colloquial Arabic-to-English machine translation.
It can also considerably speed up the annotation
of Arabic dialects.
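
To put the reported figures in perspective, the following is a minimal Python sketch of how an out-of-vocabulary rate can be computed and how the two improvements translate into relative terms. It is an illustration only, not the authors' evaluation code: the function names `oov_rate` and `relative_reduction` are hypothetical, and the sketch assumes the OOV percentage is counted over tokens, which the abstract does not specify. Only the four percentages are taken from the text.

```python
def oov_rate(train_vocab, test_tokens):
    """Percentage of test tokens absent from the training vocabulary.

    Assumption: the rate is token-based; the abstract does not say
    whether tokens or word types were counted.
    """
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in train_vocab)
    return 100.0 * unseen / len(test_tokens)


def relative_reduction(before, after):
    """Relative reduction, in percent, when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Figures reported in the abstract (percentages).
acc_before, acc_after = 73.24, 86.84
oov_before, oov_after = 28.98, 16.66

# Absolute accuracy gain in percentage points.
acc_gain = acc_after - acc_before

# Relative reduction in tagging error rate (error = 100 - accuracy).
err_reduction = relative_reduction(100 - acc_before, 100 - acc_after)

# Relative reduction in the out-of-vocabulary rate.
oov_reduction = relative_reduction(oov_before, oov_after)

print(f"accuracy gain: {acc_gain:.2f} points")
print(f"relative error reduction: {err_reduction:.2f}%")
print(f"relative OOV reduction: {oov_reduction:.2f}%")
```

With a real corpus, `oov_rate` would be called with the set of word forms seen in training and the list of tokens in the unseen CEA text.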