topicsForAllWikipediaPages.csv.gz (1.33 GB)

Topics for each Wikipedia Article across Languages

dataset
posted on 15.04.2020, 04:05 by Diego Saez-Trumper
This dataset contains the predicted topic(s) for almost every Wikipedia article, across all language editions.

Each row contains the following columns:
Qid,topic,probability,page_id,page_title,wiki_db

Where:

* Qid: Wikidata item ID
* topic: Topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic)
* probability: Predicted probability that the article belongs to the topic
* page_id: ID of the page in the given wiki
* page_title: Title of the page in the given wiki
* wiki_db: Database name of the wiki, for example enwiki for English Wikipedia

For example:
Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki
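A minimal sketch of how the rows could be read in Python, using only the standard library. The column order comes from the description above; the function names `parse_rows` and `read_dataset` are hypothetical, and the sketch assumes that the file is UTF-8 encoded, that a header row may or may not be present, and that `probability` and `page_id` are always filled in.

```python
import csv
import gzip

# Column order as listed in the dataset description.
COLUMNS = ["Qid", "topic", "probability", "page_id", "page_title", "wiki_db"]


def parse_rows(lines):
    """Yield one dict per CSV row, converting the numeric columns."""
    for fields in csv.reader(lines):
        # Skip blank lines and a header row, if the file has one (assumption).
        if not fields or fields == COLUMNS:
            continue
        row = dict(zip(COLUMNS, fields))
        row["probability"] = float(row["probability"])
        row["page_id"] = int(row["page_id"])
        yield row


def read_dataset(path):
    """Stream rows from the gzipped CSV without loading it all into memory."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from parse_rows(handle)


# The example row from the description, parsed without touching the file.
sample = "Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki"
example = next(parse_rows([sample]))
print(example["topic"], example["probability"], example["wiki_db"])
```

Because the file is 1.33 GB compressed, streaming it row by row with `read_dataset("topicsForAllWikipediaPages.csv.gz")` is likely to be more practical than loading it at once. An article with several predicted topics would presumably appear as several rows, one per topic.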
Topics are predicted using the Wikidata-Topic model developed by Isaac Johnson (https://github.com/geohci/wikidata-topic-model).
The source code used to create this dataset is available at:
https://github.com/digitalTranshumant/wikidata-topic-model
