
A Human Judgment Corpus and a Metric for Arabic MT Evaluation

journal contribution
posted on 2014-10-01, authored by Houda Bouamor, Hanan Alshikhabobakr, Behrang Mohit, and Kemal Oflazer
We present a human judgments dataset and an adapted metric for the evaluation of Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) provides partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR, and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.
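
The abstract's core idea, partial credit for non-exact token matches, can be illustrated with a short sketch. The Python below is a hypothetical illustration only, not the authors' implementation: the weights, the toy affix stripper, and the greedy one-to-one matching are all assumptions, and the morphological-feature component of AL-BLEU is omitted. The paper's actual metric builds on BLEU's clipped n-gram counts and a real Arabic morphological analyzer.

# Hypothetical sketch of partial-credit token matching in the spirit of
# AL-BLEU. The weights and the toy stemmer are illustrative assumptions.

EXACT_WEIGHT = 1.0  # full credit for an exact surface match
STEM_WEIGHT = 0.8   # assumed partial credit for a stem match

def toy_stem(token: str) -> str:
    # Toy stand-in for a real Arabic morphological analyzer:
    # strip a common transliterated proclitic ("wa-" = and, "al-" = the).
    for prefix in ("wa", "al"):
        if token.startswith(prefix) and len(token) > len(prefix) + 2:
            return token[len(prefix):]
    return token

def match_score(hyp: str, ref: str) -> float:
    """Partial-credit score for one hypothesis/reference token pair."""
    if hyp == ref:
        return EXACT_WEIGHT
    if toy_stem(hyp) == toy_stem(ref):
        return STEM_WEIGHT
    return 0.0  # the paper also credits morphological-feature matches, omitted here

def partial_credit_precision(hyp_tokens, ref_tokens):
    """Unigram precision where each hypothesis token greedily takes the
    best-scoring unused reference token (one-to-one, BLEU-style clipping)."""
    available = list(ref_tokens)
    total = 0.0
    for h in hyp_tokens:
        best_i, best_s = -1, 0.0
        for i, r in enumerate(available):
            s = match_score(h, r)
            if s > best_s:
                best_i, best_s = i, s
        if best_i >= 0:
            available.pop(best_i)  # each reference token may match at most once
            total += best_s
    return total / len(hyp_tokens) if hyp_tokens else 0.0

# Example with transliterated tokens: "alkitab" stem-matches "kitab" (0.8)
# and "jadid" matches exactly (1.0), so precision = 1.8 / 2 = 0.9.
print(partial_credit_precision(["alkitab", "jadid"], ["kitab", "jadid"]))

Extending this unigram idea to higher-order n-grams and combining it with BLEU's brevity penalty is, per the abstract, the route from BLEU to the adapted score evaluated in the paper.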

History

Publisher Statement

This is the published version of Bouamor, H., Alshikhabobakr, H., Mohit, B., & Oflazer, K. (2014). A Human Judgment Corpus and a Metric for Arabic MT Evaluation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.3115/v1/d14-1026. © 2014 Association for Computational Linguistics

Date

2014-10-01
