WikiProjects Machine Readable Dataset

2017-10-16T19:23:08Z (GMT) by Sumit Asthana, Aaron Halfaker
A machine-readable version of the WikiProjects listed at https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Directory

The dataset is generated using the code at https://github.com/wiki-ai/drafttopic/

The dataset is modeled as a nested tree structure that follows the original hierarchical mappings on the WikiProjects home page and its child pages.

* Each non-leaf entry represents a sub-category with a name and some associated information, such as the level in the page at which it was parsed and the root URL of the page it was parsed from.
* Each non-leaf node has a mandatory key "topics" which leads to further sub-categories within it.
* Each leaf node is a WikiProject entry, with the actual WikiProject name and its active status.
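
The structure described above can be walked recursively. The following is a minimal sketch, not the code used to generate the dataset. Only the "topics" key comes from the description; the key names "name" and "active", the sample entries, the assumption that "topics" maps a label to a child node, and the helper `iter_leaves` are hypothetical and should be checked against the actual file.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical miniature tree. Only the "topics" key is documented above;
# "name", "active" and all entry names are illustrative assumptions.
sample_tree: Dict[str, Any] = {
    "name": "Culture",
    "topics": {
        "Arts": {
            "name": "Arts",
            "topics": {
                "Example Project A": {"name": "Example Project A", "active": True},
            },
        },
        "Example Project B": {"name": "Example Project B", "active": False},
    },
}


def iter_leaves(
    node: Dict[str, Any], path: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], Dict[str, Any]]]:
    """Yield (path, leaf) pairs, treating any node with a "topics" key as non-leaf."""
    if isinstance(node, dict) and "topics" in node:
        # Non-leaf node: descend into each nested entry, extending the path.
        for label, child in node["topics"].items():
            yield from iter_leaves(child, path + (label,))
    else:
        # Leaf node: a WikiProject entry.
        yield path, node


for leaf_path, leaf in iter_leaves(sample_tree):
    print(" / ".join(leaf_path), leaf)
```

Treating the presence of "topics" as the only test for a non-leaf node keeps the traversal independent of whatever other metadata a sub-category carries.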