figshare
enwiki.labeled_wikiprojects.json (35.2 MB)

English Wikipedia labeled mid-level wikiprojects set

dataset
posted on 2017-11-28, 17:56, authored by Sumit Asthana, Aaron Halfaker
This dataset contains 93,449 observations, each providing the mid-level WikiProject category labels associated with the talk page of a Wikipedia article.
Each observation includes the talk page title, the talk page ID, the latest revision ID at the time of extraction, the associated WikiProject templates, and the mid-level WikiProject categories to which the corresponding article belongs.
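As a rough illustration of how one observation might be read, the sketch below parses a few made-up records and counts how often each mid-level category appears. The one-JSON-object-per-line layout, the field names (`talk_page_title`, `talk_page_id`, `rev_id`, `templates`, `mid_level_categories`), and the sample values are all assumptions for illustration; inspect the actual file before relying on any of them.

```python
import json
from collections import Counter

# Made-up sample records. Field names and category labels are hypothetical
# and may not match the keys used in enwiki.labeled_wikiprojects.json.
SAMPLE_LINES = [
    '{"talk_page_title": "Talk:Example A", "talk_page_id": 1, "rev_id": 100,'
    ' "templates": ["WikiProject Physics"], "mid_level_categories": ["Physics"]}',
    '{"talk_page_title": "Talk:Example B", "talk_page_id": 2, "rev_id": 200,'
    ' "templates": ["WikiProject Physics", "WikiProject Biography"],'
    ' "mid_level_categories": ["Physics", "Biography"]}',
]


def parse_observations(lines):
    """Parse one JSON object per non-empty line (assumed layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def count_categories(observations, key="mid_level_categories"):
    """Count how many observations carry each mid-level category label."""
    counts = Counter()
    for obs in observations:
        counts.update(obs.get(key, []))
    return counts


observations = parse_observations(SAMPLE_LINES)
category_counts = count_categories(observations)
```

To try this on the real file, one could pass an open file handle to `parse_observations`, provided the file does turn out to hold one JSON object per line.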
The dataset was generated with a Python script that ran MySQL queries on Wikimedia PAWS.
To keep the set balanced, the script extracts a random sample of 2,000 page IDs per mid-level category, for a total of 93,449 observations.
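The per-category sampling step can be sketched in plain Python as follows. This is not the authors' script, which queried MySQL on PAWS; it only illustrates the idea of drawing up to a fixed number of page IDs per category. The function names, the `pages_by_category` mapping, and the toy data are hypothetical.

```python
import random


def sample_balanced(pages_by_category, per_category=2000, seed=0):
    """Draw up to `per_category` distinct page IDs at random for each category."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = {}
    for category, page_ids in sorted(pages_by_category.items()):
        ids = sorted(set(page_ids))
        # A category with fewer pages than the cap contributes all of them.
        k = min(per_category, len(ids))
        sampled[category] = rng.sample(ids, k)
    return sampled


def unique_pages(sampled):
    """Union of sampled page IDs; a page drawn for two categories counts once."""
    return set().union(*sampled.values()) if sampled else set()


# Toy input: category names and page IDs are invented for this example.
toy_pages = {"A": list(range(10)), "B": [5, 6, 7]}
toy_sample = sample_balanced(toy_pages, per_category=4)
toy_unique = unique_pages(toy_sample)
```

Capping with `min` keeps small categories from raising an error, and taking the union afterwards avoids counting a page twice when it was drawn for more than one category.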
This dataset supports topic-oriented research on Wikipedia because it exposes high-level topic data associated with Wikipedia pages.
