UK Twitter word embeddings (II)

2018-01-16T23:08:58Z (GMT) by Vasileios Lampos
<div><div><b>Word embeddings trained on UK Twitter content (II)</b></div><div><br>The embeddings were trained on approximately 1.1 billion tweets covering the years 2012 through 2016 inclusive.</div><div><br></div><div><b>Settings:</b> skip-gram with negative sampling (10 noise words), a context window of 9 words, 512 dimensions, and 10 epochs of training.</div><div><br></div><div>After filtering out words with fewer than 100 occurrences, a vocabulary of 470,194 unigrams was obtained (see <b>embd_voc</b>).</div><div><br></div><div>The corresponding 512-dimensional embeddings are held in <b>embd_vec.bz2</b>.</div></div>
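A minimal sketch of how the two files might be read together, assuming a layout not specified above: one token per line in <b>embd_voc</b>, and one whitespace-separated 512-dimensional vector per line in the bz2-compressed <b>embd_vec.bz2</b>, with rows aligned to the vocabulary. The tiny stand-in files (3 dimensions, 3 tokens) exist only so the sketch runs end to end; the function and file names here are illustrative, not part of the dataset's documentation.

```python
import bz2
import math
import os
import tempfile

def load_embeddings(voc_path, vec_path):
    """Map each vocabulary token to its embedding vector.

    Assumed format: voc_path has one token per line; vec_path is a
    bz2-compressed text file with one whitespace-separated vector per
    line, row-aligned with the vocabulary.
    """
    with open(voc_path, encoding="utf-8") as f:
        vocab = [line.strip() for line in f if line.strip()]
    with bz2.open(vec_path, mode="rt", encoding="utf-8") as f:
        vectors = [[float(x) for x in line.split()] for line in f if line.strip()]
    assert len(vocab) == len(vectors), "vocabulary and vectors must align row by row"
    return dict(zip(vocab, vectors))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Build tiny stand-in files (3-dimensional instead of 512) so the
# sketch is self-contained and runnable without the real dataset.
tmp = tempfile.mkdtemp()
voc_path = os.path.join(tmp, "embd_voc")
vec_path = os.path.join(tmp, "embd_vec.bz2")
with open(voc_path, "w", encoding="utf-8") as f:
    f.write("london\nrain\nsunshine\n")
with bz2.open(vec_path, "wt", encoding="utf-8") as f:
    f.write("0.1 0.2 0.3\n0.1 0.2 0.25\n-0.3 0.1 0.0\n")

emb = load_embeddings(voc_path, vec_path)
print(round(cosine(emb["london"], emb["rain"]), 3))  # → 0.996
```

With the real files, the same loader would yield a 470,194-entry dictionary of 512-dimensional vectors; cosine similarity is then the usual way to query nearest neighbours in the embedding space.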