CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Saha, Tanay Kumar

doi:10.6084/m9.figshare.5414173.v3

pkdd-cons2v-presentation.pdf (235.99 kB)

Version 3 2017-09-17, 15:48

Version 2 2017-09-17, 15:48

Version 1 2017-09-17, 15:42

journal contribution

posted on 2017-09-17, 15:48 authored by Tanay Kumar Saha

We present a novel approach to learning distributed representations of sentences from unlabeled data by modeling both the content and the context of a sentence. The content model learns a sentence representation by predicting the sentence's words. The context model comprises a neighbor prediction component and a regularizer, which model the distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. We evaluate the models in a setup where contextual information is available. Experimental results on tasks involving classification, clustering, and ranking of sentences show that our model outperforms the best existing models by a wide margin across multiple datasets.
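The three components named in the abstract (content prediction, neighbor prediction, and a proximity regularizer) can be illustrated with a minimal sketch of a per-sentence joint loss. This is not the authors' implementation. The function names, the full-softmax formulation, the squared-distance form of the regularizer, and the weight `beta` are all assumptions made for illustration; a practical system would more likely use an approximation such as negative sampling and update the parameters with stochastic gradient steps.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax_score(vec, target, outputs):
    """Log-probability of row `target` under a softmax over dot(vec, row)."""
    scores = [dot(vec, row) for row in outputs]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z


def joint_loss(sent_vec, word_ids, neighbor_ids, neighbor_vecs,
               word_out, sent_out, beta=1.0):
    """Hypothetical joint loss for one sentence (illustrative assumption).

    sent_vec      -- vector of the sentence being trained
    word_ids      -- indices of the words the sentence contains
    neighbor_ids  -- indices of its neighboring sentences
    neighbor_vecs -- current vectors of those neighbors
    word_out      -- output weights for word prediction (one row per word)
    sent_out      -- output weights for neighbor prediction (one row per sentence)
    beta          -- assumed weight of the proximity regularizer
    """
    # Content: the sentence vector should predict the sentence's own words.
    content = -sum(log_softmax_score(sent_vec, w, word_out) for w in word_ids)
    # Context, distributional hypothesis: predict the neighboring sentences.
    neighbor = -sum(log_softmax_score(sent_vec, n, sent_out)
                    for n in neighbor_ids)
    # Context, proximity hypothesis: stay close to the neighbors' vectors.
    reg = 0.0
    if neighbor_vecs:
        reg = beta / len(neighbor_vecs) * sum(
            sum((a - b) ** 2 for a, b in zip(sent_vec, nv))
            for nv in neighbor_vecs)
    return content + neighbor + reg
```

In this sketch, moving a neighbor's vector away from `sent_vec` raises only the regularizer term, while the two prediction terms depend on the output weights. Summing the three terms into one loss is one simple way to train the components jointly, as the abstract describes.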