topicsForAllWikipediaPages.csv.gz (1.33 GB)
Topics for each Wikipedia Article across Languages
This dataset contains the predicted topic(s) for almost every Wikipedia article across languages.
Each row contains the following columns:
Qid,topic,probability,page_id,page_title,wiki_db
Where:
* Qid: Wikidata item ID
* topic: Topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic)
* probability: Predicted probability that the article belongs to the topic
* page_id: Page ID of the article in its wiki
* page_title: Title of the article
* wiki_db: Database name of the wiki; for example, English Wikipedia is enwiki
For example:
Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki
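Because the file is large (1.33 GB compressed) and an article can appear in several rows, one per predicted topic, it is usually read as a stream and then grouped per article. The following is a minimal Python sketch using only the standard library. It assumes the file is UTF-8, uses standard CSV quoting (page titles may contain commas), and may or may not begin with the header line shown above. The helper names `read_topic_rows` and `topics_per_article`, the `TopicRow` type, and the 0.5 probability cutoff are hypothetical illustrations, not part of the dataset or its source code.

```python
import csv
import gzip
import io
from collections import defaultdict
from typing import IO, Dict, Iterable, Iterator, List, NamedTuple, Tuple, Union

# Column order as listed in the dataset description.
FIELDS = ["Qid", "topic", "probability", "page_id", "page_title", "wiki_db"]


class TopicRow(NamedTuple):
    """One (article, topic) prediction; hypothetical typed view of a CSV row."""
    qid: str
    topic: str
    probability: float
    page_id: int
    page_title: str
    wiki_db: str


def read_topic_rows(source: Union[str, IO[bytes]]) -> Iterator[TopicRow]:
    """Stream rows from the gzip-compressed CSV (path or binary file object)."""
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as text:
        for record in csv.reader(text):
            # Assumption: a header line may be present; skip it if so.
            if record == FIELDS:
                continue
            qid, topic, prob, page_id, title, wiki_db = record
            yield TopicRow(qid, topic, float(prob), int(page_id), title, wiki_db)


def topics_per_article(
    rows: Iterable[TopicRow], threshold: float = 0.5
) -> Dict[Tuple[str, int], List[str]]:
    """Group topics by (wiki_db, page_id), keeping those with probability >= threshold.

    The 0.5 default is an illustrative assumption, not a cutoff defined by the dataset.
    """
    grouped: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for row in rows:
        if row.probability >= threshold:
            grouped[(row.wiki_db, row.page_id)].append(row.topic)
    return dict(grouped)


# Demo on the example row from the description, compressed in memory.
sample = (
    "Qid,topic,probability,page_id,page_title,wiki_db\n"
    "Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki\n"
)
buffer = io.BytesIO(gzip.compress(sample.encode("utf-8")))
print(topics_per_article(read_topic_rows(buffer)))
# {('euwiki', 166578): ['Geography.Regions.Europe.Western_Europe']}
```

To read the actual file, pass its path, e.g. `read_topic_rows("topicsForAllWikipediaPages.csv.gz")`. Grouping the whole dataset builds one dictionary entry per article, so for a single language it is cheaper to filter on `row.wiki_db` while streaming.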
Topics are predicted using the Wikidata-Topic model developed by Isaac Johnson (https://github.com/geohci/wikidata-topic-model).
The source code to create this dataset can be found here:
https://github.com/digitalTranshumant/wikidata-topic-model