figshare
Browse
J01-1003.pdf (1.68 MB)

Bootstrapping Morphological Analyzers by Combining Human Elicitation and Machine Learning

Download (1.68 MB)
journal contribution
posted on 2001-03-01, 00:00 authored by Kemal OflazerKemal Oflazer, Marjorie McShane, Sergei Nirenburg
This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components--elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.

History

Publisher Statement

This is the published version of Oflazer, Kemal, Sergei Nirenburg, and Marjorie Mcshane. "Bootstrapping Morphological Analyzers by Combining Human Elicitation and Machine Learning." Computational Linguistics 27, no. 1 (March 2001): 59-85. doi:10.1162/089120101300346804. © 2001 Association for Computational Linguistics

Date

2001-03-01

Usage metrics

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC