topicsForAllWikipediaPages.csv.gz (1.33 GB)

Topics for each Wikipedia Article across Languages

dataset
posted on 2020-04-15, 04:05 authored by Diego Saez-Trumper
This dataset contains the predicted topic(s) for almost every Wikipedia article across languages.

Each row contains the following columns:
Qid,topic,probability,page_id,page_title,wiki_db

Where:

* Qid: Wikidata item ID
* topic: Topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic)
* probability: Predicted probability that the article belongs to the topic
* page_id: ID of the page in the corresponding wiki
* page_title: Title of the page in the corresponding wiki
* wiki_db: Database name of the wiki, for example enwiki for English Wikipedia

For example:
Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki
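
The column layout above can be read with Python's standard library. The following is a minimal sketch, not part of the original dataset tooling: the names TopicRow, parse_rows and rows_for_wiki are hypothetical, and it assumes that every row has all six fields populated, that titles containing commas are quoted in the usual CSV way, and that a header row (if the file has one) matches the column names exactly.

```python
import csv
import gzip
from typing import Iterable, Iterator, NamedTuple

# Column order as given in the dataset description.
COLUMNS = ["Qid", "topic", "probability", "page_id", "page_title", "wiki_db"]


class TopicRow(NamedTuple):
    """One row of the dataset (hypothetical helper type)."""
    qid: str
    topic: str
    probability: float
    page_id: int
    page_title: str
    wiki_db: str


def parse_rows(lines: Iterable[str]) -> Iterator[TopicRow]:
    """Parse CSV lines into TopicRow objects, skipping blank lines and a header row."""
    for fields in csv.reader(lines):
        if not fields or fields == COLUMNS:
            continue
        qid, topic, probability, page_id, page_title, wiki_db = fields
        yield TopicRow(qid, topic, float(probability), int(page_id), page_title, wiki_db)


def rows_for_wiki(path: str, wiki_db: str) -> Iterator[TopicRow]:
    """Stream the gzipped file and keep only rows for one wiki (assumed usage)."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in parse_rows(handle):
            if row.wiki_db == wiki_db:
                yield row


# The example row from the description.
example = "Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki"
row = next(parse_rows([example]))
print(row.qid, row.wiki_db, row.topic.split("."))
```

Streaming row by row avoids loading the whole 1.33 GB file into memory; a call such as rows_for_wiki("topicsForAllWikipediaPages.csv.gz", "enwiki") would iterate over the English Wikipedia rows only, assuming the file has been downloaded to the working directory.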

Topics are predicted using the Wikidata-Topic model developed by Isaac Johnson (https://github.com/geohci/wikidata-topic-model).
The source code used to create this dataset is available at:
https://github.com/digitalTranshumant/wikidata-topic-model
