The Impact of Arabic Morphological Segmentation on Broad-coverage English-to-Arabic Statistical Machine Translation

journal contribution
posted on 2010-10-01, 00:00 authored by Hassan Al-Haj, Alon Lavie

Morphologically rich languages pose a challenge for statistical machine translation (SMT). This challenge is magnified when translating into a morphologically rich language. In this work we address this challenge in the framework of a broad-coverage English-to-Arabic phrase-based statistical machine translation (PBSMT) system. We explore the full spectrum of Arabic segmentation schemes, ranging from full word forms to fully segmented forms, and examine the effects on system performance. Our results show a difference of 2.61 BLEU points between the best and worst segmentation schemes, indicating that the choice of segmentation scheme has a significant effect on the performance of a PBSMT system in a large-data scenario. We also show that a simple segmentation scheme can perform as well as the best, more complicated segmentation scheme. Finally, we report results on a wide set of techniques for recombining the segmented Arabic output.

History

Publisher Statement

Copyright 2010 AMTA

Date

2010-10-01
