File: topicsForAllWikipediaPages.csv.gz
Topics for each Wikipedia Article across Languages
Dataset posted on 15.04.2020 by Diego Saez-Trumper
This dataset contains the predicted topic(s) for (almost) each Wikipedia article across languages.
Each row contains the following columns:
* Qid: Wikidata Item Id
* topic: Topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic)
* probability: Predicted probability that the article belongs to the topic
* page_id: ID of the page within its wiki
* page_title: Title of the page within its wiki
* wiki_db: Database name of the wiki, for example enwiki for English Wikipedia
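The column layout above can be read with standard tooling. The following is a minimal sketch, not the author's code: it assumes the file has been downloaded locally as topicsForAllWikipediaPages.csv.gz, that it is UTF-8 encoded and comma-separated, and that its first row is a header with exactly the column names listed above. The helper names read_rows and top_topic_per_page are hypothetical and exist only for illustration.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def read_rows(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from the gzip-compressed CSV as dictionaries.

    Assumes a header row naming the columns (Qid, topic, probability,
    page_id, page_title, wiki_db); this is an assumption, not something
    the dataset description guarantees.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def top_topic_per_page(
    rows: Iterable[Dict[str, str]], wiki_db: str
) -> Dict[str, Tuple[str, float]]:
    """Return the highest-probability topic for each page title in one wiki."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        # Skip rows that belong to other language editions.
        if row["wiki_db"] != wiki_db:
            continue
        probability = float(row["probability"])
        title = row["page_title"]
        # Keep the row only if it beats the best probability seen so far.
        if title not in best or probability > best[title][1]:
            best[title] = (row["topic"], probability)
    return best


# Example usage (requires the downloaded file, so it is left commented out):
# topics = top_topic_per_page(read_rows("topicsForAllWikipediaPages.csv.gz"), "enwiki")
```

Streaming the rows keeps memory use bounded by the number of pages in the selected wiki rather than by the size of the whole file, which matters because an article can appear once per predicted topic.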
Topics are predicted using the Wikidata-Topic model developed by Isaac Johnson (https://github.com/geohci/wikidata-topic-model).
The source code to create this dataset can be found here: