en_nel_first300000wiki_core_web_lg-2.3.1.tar.gz (1.06 GB)

spaCy pipeline for English Named Entity Linking to Wikipedia/Wikidata

dataset
posted on 17.10.2020 by Ben Hammersley
A modified version of the standard spaCy model en_core_web_lg, which is described as an "English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, POS tags, dependency parses and named entities." This version adds an Entity Linking component trained on the first 300,000 lines of a gold_entities.jsonl file. That file was itself created from a complete dump of Wikidata and English Wikipedia (en-Wikipedia) on October 11, 2020.
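As a rough illustration of how a pipeline like this might be used, the sketch below loads the model and lists each named entity with its knowledge-base identifier. It rests on several assumptions not stated on this page: that the archive has been installed with pip, that the installed package name matches the archive name (en_nel_first300000wiki_core_web_lg), and that a compatible spaCy 2.3.x release is available. The helper names linked_entities and link_text are hypothetical and exist only for this example.

```python
from typing import Iterable, List, Tuple

# Assumption: the installed package name mirrors the archive file name.
MODEL_NAME = "en_nel_first300000wiki_core_web_lg"


def linked_entities(ents: Iterable) -> List[Tuple[str, str, str]]:
    """Return (text, entity label, knowledge-base id) for each entity span.

    Works on any objects exposing the spaCy Span attributes
    `text`, `label_` and `kb_id_`.
    """
    return [(ent.text, ent.label_, ent.kb_id_) for ent in ents]


def link_text(text: str, model_name: str = MODEL_NAME) -> List[Tuple[str, str, str]]:
    """Run the pipeline on `text` and collect its linked entities."""
    # Imported lazily so the helper above can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)  # raises OSError if the model is not installed
    doc = nlp(text)
    return linked_entities(doc.ents)
```

With the model installed, calling link_text on a sentence would return one tuple per recognised entity. The third element is expected to be a Wikidata identifier when the linker finds a match; how unlinked entities are reported should be checked against the model itself.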


History

Licence

Exports
