Wikipedia Articles and Associated WikiProject Templates

2019-12-17T15:14:14Z (GMT) by Isaac Johnson, Aaron Halfaker
== wikiproject_to_template.halfak_20191202.yaml ==
The mapping, used to generate this dump, from the canonical name of each WikiProject to all the templates that might be used to tag an article with that WikiProject. For instance, the line 'WikiProject Trade: ["WikiProject Trade", "WikiProject trade", "Wptrade"]' indicates that WikiProject Trade (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Trade) is associated with the following templates:
* https://en.wikipedia.org/wiki/Template:WikiProject_Trade
* https://en.wikipedia.org/wiki/Template:WikiProject_trade
* https://en.wikipedia.org/wiki/Template:Wptrade
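
To look up the canonical WikiProject for a template found on a talk page, the mapping can be inverted. The following is a minimal sketch, assuming the YAML file has already been parsed into a plain dict of lists; the function names invert_mapping and template_url are illustrative and do not come from gather_wikiprojects_per_article.py.

```python
from urllib.parse import quote


def invert_mapping(project_to_templates):
    """Map every template name to its canonical WikiProject name."""
    template_to_project = {}
    for project, templates in project_to_templates.items():
        for template in templates:
            # If a template name appeared under two projects, the later one would win.
            template_to_project[template] = project
    return template_to_project


def template_url(template):
    """Build the English Wikipedia URL for a template name."""
    return "https://en.wikipedia.org/wiki/Template:" + quote(template.replace(" ", "_"))


# Entry copied from the example above; in practice the dict would come from the YAML file.
mapping = {"WikiProject Trade": ["WikiProject Trade", "WikiProject trade", "Wptrade"]}
lookup = invert_mapping(mapping)
urls = [template_url(t) for t in mapping["WikiProject Trade"]]
```

Here lookup["Wptrade"] resolves to "WikiProject Trade", and urls reproduces the three template links listed above.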

== wikiproject_taxonomy.halfak_20191202.yaml ==
A proposed mapping of WikiProjects to higher-level categories. This mapping has not been applied to the JSON dump contained here. It is based on the WikiProjects' canonical names.

== gather_wikiprojects_per_article.py ==
Python script to build the JSON dump described below.

== enwiki_wptemplates.json.bz2 ==
Each line of this bzip2-compressed JSON file corresponds to one English Wikipedia article. The file is intended for building topic models for English Wikipedia (and potentially for other languages, based on these WikiProject mappings).

The following properties are recorded:
* title: English Wikipedia title
* article_pid: Page ID corresponding with the English Wikipedia article
* article_revid: Revision ID associated with the English Wikipedia article (current version for the dump processed)
* talk_pid: Page ID corresponding with the talk page for the English Wikipedia article
* talk_revid: Revision ID associated with the talk page for the English Wikipedia article (current version for the dump processed)
* wp_templates: List of WikiProject templates found on the talk page. Where possible, the template names have been canonicalized to the default template name (based on redirects). Because templates are identified by string matching, there may be some false positives.
* qid: Wikidata ID corresponding to the English Wikipedia article
* sitelinks: Based on Wikidata, the other languages in which this article exists, each mapped to the article's title in that language.
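
The file can be streamed one article at a time without decompressing it to disk. The following is a minimal sketch using only the Python standard library, assuming each line is a standalone, valid JSON object with the properties listed above; iter_articles and count_templates are illustrative names, not part of the published script.

```python
import bz2
import json
from collections import Counter


def iter_articles(path):
    """Yield one dict per non-empty line of a bzip2-compressed, line-delimited JSON file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_templates(records):
    """Count how many articles carry each WikiProject template."""
    counts = Counter()
    for record in records:
        counts.update(record.get("wp_templates", []))
    return counts


# Example usage (the path is the file name from this section's heading):
# counts = count_templates(iter_articles("enwiki_wptemplates.json.bz2"))
# print(counts.most_common(10))
```

Streaming keeps memory use small, since only one article record is held at a time.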

This version is based on the 1 December 2019 English Wikipedia dump and the 9 December 2019 Wikidata entities dump, in which 5,926,244 articles with associated WikiProjects were found. Articles with no associated WikiProject templates are not included.

For example, the line for Agatha Christie combines information from her English Wikipedia article (https://en.wikipedia.org/wiki/Agatha_Christie), the associated talk page (https://en.wikipedia.org/wiki/Talk:Agatha_Christie), and her Wikidata item (https://www.wikidata.org/wiki/Q35064):

{
  'title': 'Agatha Christie',
  'article_pid': 984,
  'article_revid': 928518693,
  'talk_pid': 1001,
  'talk_revid': 926378515,
  'wp_templates': [
    'WikiProject Biography',
    'WikiProject Archaeology',
    'WikiProject Novels',
    "WikiProject Women's History",
    'WikiProject Women writers',
    'WikiProject Devon',
    'WikiProject Women'],
  'qid': 'Q35064',
  'sitelinks': {
    'af': 'Agatha Christie',
    'am': 'አጋጣ ክርስቲ',
    'an': 'Agatha Christie',
    'ar': 'أجاثا كريستي',
    ...
    'zh': '阿加莎·克里斯蒂',
    'zh_min_nan': 'Agatha Christie',
    'zh_yue': '阿嘉莎姬絲蒂'}
}
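
One way such a record could support topic models in other languages is to carry the English WikiProject templates over to each sitelinked title. The following is a minimal sketch of that idea, assuming a record has already been parsed into a dict; labels_by_language is a hypothetical helper, and the record is an abbreviated copy of the example above.

```python
def labels_by_language(record):
    """Return {language_code: (title, wp_templates)} for one article record."""
    templates = tuple(record.get("wp_templates", []))
    return {
        lang: (title, templates)
        for lang, title in record.get("sitelinks", {}).items()
    }


# Abbreviated record based on the Agatha Christie example above.
record = {
    "title": "Agatha Christie",
    "qid": "Q35064",
    "wp_templates": ["WikiProject Biography", "WikiProject Novels"],
    "sitelinks": {"af": "Agatha Christie", "zh": "阿加莎·克里斯蒂"},
}
labels = labels_by_language(record)
```

Each sitelinked title inherits the same templates as the English article, so labels["zh"] pairs the Chinese title with the two templates kept in this abbreviated record.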