1-s2.0-S0747563215002071-main.pdf (1.35 MB)

A synergistic strategy for combining thesaurus-based and corpus-based approaches in building ontology for multilingual search engines

journal contribution
posted on 2016-06-09, 00:03 authored by Leyla Zhuhadar
In this article we illustrate a methodology for building a cross-language search engine. We propose a
synergistic approach that combines thesaurus-based and corpus-based methods. First, a bilingual
ontology thesaurus is designed for two languages, English and Spanish, as a simple bilingual listing
of terms, phrases, concepts, and subconcepts. Second, term vector translation is used: a statistical
multilingual text retrieval technique that maps statistical information about term use between
languages (ontology co-learning). This technique maps sets of tf-idf term weights from one language
to another. We also apply a query translation method to retrieve multilingual documents, with an
expansion technique for phrasal translation. Finally, we present our findings.
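
The mapping of tf-idf term weights from one language to another can be illustrated with a minimal sketch. It is not the article's implementation: the toy documents, the `lexicon` of English-to-Spanish translation probabilities, and the function names `tfidf` and `translate_vector` are all hypothetical, and the particular tf-idf formula (relative term frequency times log inverse document frequency) is an assumption, since the abstract does not specify one.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def translate_vector(vector, lexicon):
    """Map a source-language weight vector into the target language.

    Each source weight is spread over the term's candidate translations in
    proportion to the (hypothetical) translation probabilities in `lexicon`.
    Terms missing from the lexicon are dropped.
    """
    translated = {}
    for term, weight in vector.items():
        for target, prob in lexicon.get(term, {}).items():
            translated[target] = translated.get(target, 0.0) + weight * prob
    return translated


# Hypothetical toy data, for illustration only.
docs_en = [
    ["search", "engine", "ontology"],
    ["search", "query"],
    ["ontology", "concept"],
]
lexicon = {
    "search": {"búsqueda": 1.0},
    "engine": {"motor": 0.7, "máquina": 0.3},
    "ontology": {"ontología": 1.0},
    "query": {"consulta": 1.0},
    "concept": {"concepto": 1.0},
}

vectors_en = tfidf(docs_en)
vectors_es = [translate_vector(v, lexicon) for v in vectors_en]
```

Each entry of `vectors_es` is a Spanish-side weight vector that could be compared against tf-idf vectors of Spanish documents, for example by cosine similarity. In a corpus-based setting the translation probabilities would typically be estimated from aligned text rather than written by hand as they are here.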