Low-Dimensional Context-Dependent Translation Models
Deposited 2018-11-27 (GMT)
Context matters when modeling language translation, but state-of-the-art approaches predominantly model these dependencies via larger translation units. This decision leads to problems of computational efficiency (runtime and memory) and statistical efficiency (millions of sentences, but billions of translation rules), and as a result such methods stop short of conditioning on large amounts of local or global context. This thesis steps back from the current zeitgeist and posits another view: while context influences translation, its influence is inherently low-dimensional, and the problems of computational and statistical tractability can be solved with dimensionality reduction and representation learning techniques. The low-dimensional representations we recover capture the observation that the phenomena driving translation are controlled by context residing in a more compact space than lexically based (word or n-gram) "one-hot" or count-based spaces. We consider low-dimensional representations of context, recovered via a multiview canonical correlation analysis, as well as low-dimensional representations of translation units expressed (featurized) in terms of context, recovered by a rank-reduced SVD of a feature space defined over inside and outside trees in a synchronous grammar. Lastly, we test our low-dimensional hypothesis in the limit by considering a semi-supervised learning scenario in which contextual information is gleaned from large amounts of unlabeled data. All empirical setups show improvements from taking the low-dimensional hypothesis into account, indicating that this route is an effective way to boost performance while maintaining model parsimony.
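As a minimal illustration of the rank-reduced SVD idea sketched above, the following toy Python snippet compresses a sparse count-based context-feature matrix into a compact embedding. The data, dimensions, and feature construction here are hypothetical placeholders; in the thesis the features are defined over inside and outside trees of a synchronous grammar.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy count-based feature matrix: rows are translation units, columns are
# contextual features. These random Poisson counts are illustrative only;
# they stand in for the grammar-derived features described in the abstract.
n_units, n_features, k = 200, 1000, 16
X = rng.poisson(0.05, size=(n_units, n_features)).astype(float)

# Rank-reduced SVD: keep only the top-k singular directions, replacing each
# sparse high-dimensional count vector with a k-dimensional representation.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
embeddings = U[:, :k] * s[:k]   # (n_units, k) low-dimensional representations
X_k = embeddings @ Vt[:k, :]    # best rank-k approximation of X (Eckart-Young)

print(embeddings.shape)         # (200, 16)
```

The embedding is far more compact than the original count space, which is the tractability argument the abstract makes: downstream models condition on a small dense vector rather than on billions of lexical features.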