Politicians by Country from the English-language Wikipedia

Version 6 2017-10-28, 17:49
Version 5 2017-10-21, 04:05
Version 4 2017-10-21, 04:05
Version 3 2017-10-21, 02:28
Version 2 2017-10-19, 04:06
Version 1 2017-10-19, 04:05
dataset
Posted on 2017-10-28, 17:49. Authored by Os Keyes.
This project contains data on most English-language Wikipedia articles within the category "Category:Politicians by nationality" and subcategories, along with the code used to generate that data. Both are released under the CC-BY-SA 4.0 license.

Data
The data was extracted via the Wikimedia API using the associated code. It is formatted as a CSV and saved as page_data.csv in the "data" directory. Columns are:

1. "country", containing the sanitised country name, extracted from the category name;
2. "page", containing the unsanitised page title;
3. "last_edit", containing the edit ID of the last edit to the page.
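
As a minimal sketch of how the three columns might be consumed, the Python snippet below counts pages per country. It assumes the CSV has a header row naming the columns exactly "country", "page" and "last_edit"; the function name and the sample rows are hypothetical and not part of the dataset.

```python
import csv
import io
from collections import Counter


def count_pages_by_country(csv_file):
    """Count rows per country in a file-like object laid out like page_data.csv.

    Assumes a header row with a "country" column (an assumption about the file).
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["country"] for row in reader)


# In-memory stand-in for data/page_data.csv; these rows are made up.
sample = io.StringIO(
    "country,page,last_edit\n"
    "Antigua and Barbuda,Example politician A,100\n"
    "Antigua and Barbuda,Example politician B,101\n"
    "France,Example politician C,102\n"
)
counts = count_pages_by_country(sample)
```

To run this against the real file, the same function can be given a handle opened with `open("data/page_data.csv", newline="", encoding="utf-8")`, assuming the archive has been unpacked into the working directory.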

Country names derived from Wikipedia category titles are inconsistent. Where possible, they have been modified to match the country names found in http://www.prb.org/DataFinder/Topic/Rankings.aspx?ind=14; however, the PRB dataset contains nations not found in Wikipedia, and vice versa.
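
The mismatch between the two sources can be checked with a simple set comparison. The sketch below is illustrative only: the function name and the country lists are hypothetical, and it is not the matching procedure used by the project's R code.

```python
def compare_country_names(wikipedia_names, reference_names):
    """Report which names appear in both lists and which appear in only one."""
    wiki = set(wikipedia_names)
    ref = set(reference_names)
    return {
        "matched": sorted(wiki & ref),
        "wikipedia_only": sorted(wiki - ref),
        "reference_only": sorted(ref - wiki),
    }


# Made-up inputs: "Exampleland" and "Samplestan" are placeholders.
report = compare_country_names(
    ["France", "Antigua and Barbuda", "Exampleland"],
    ["France", "Antigua and Barbuda", "Samplestan"],
)
```

Names listed under "wikipedia_only" or "reference_only" are the ones that would need manual reconciliation or would have no counterpart in the other source.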

The recursion only went 2 levels deep into the category tree. Someone listed directly as an Antiguan politician, say, is included; someone listed only in a deeper subcategory, such as Antiguan politicians who were assassinated, is not.
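
The depth limit can be illustrated with a short level-by-level traversal. This is a sketch under stated assumptions, not the project's R implementation: it assumes "2 levels" means the root category plus its direct subcategories, and `collect_pages`, `get_members` and the toy tree are hypothetical stand-ins for the real API calls.

```python
def collect_pages(root, get_members, max_levels=2):
    """Collect pages from the root category and its descendants, level by level.

    get_members(category) must return a dict with "pages" and "subcategories".
    Level 1 is the root itself; categories below max_levels are never opened.
    """
    pages = set()
    seen = {root}
    current = [root]
    for _ in range(max_levels):
        next_level = []
        for category in current:
            members = get_members(category)
            pages.update(members["pages"])
            for sub in members["subcategories"]:
                if sub not in seen:  # guard against cycles and repeats
                    seen.add(sub)
                    next_level.append(sub)
        current = next_level
    return pages


# Toy category tree (made up) mirroring the Antiguan example above.
TOY_TREE = {
    "Politicians by nationality": {
        "pages": [],
        "subcategories": ["Antiguan politicians"],
    },
    "Antiguan politicians": {
        "pages": ["Politician A"],
        "subcategories": ["Assassinated Antiguan politicians"],
    },
    "Assassinated Antiguan politicians": {
        "pages": ["Politician B"],
        "subcategories": [],
    },
}

found = collect_pages("Politicians by nationality", TOY_TREE.__getitem__)
deeper = collect_pages("Politicians by nationality", TOY_TREE.__getitem__, max_levels=3)
```

With the default limit, only "Politician A" is collected; raising `max_levels` to 3 also picks up "Politician B", which shows what the 2-level cut-off leaves out.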

Code
The code is written in the programming language R, and heavily commented; it can be found in the "code" directory, and is split into 3 files:

1. utils.R, which contains utilities for operating the code in the other files;
2. retrieve.R, which contains functions for retrieving the category and page data from Wikipedia;
3. main.R, which executes the data retrieval code and performs sanitisation before writing it to file.
