figshare
known_paid_editors.201803.tsv (64.29 kB)

Known Undisclosed Paid Editors (English Wikipedia)

dataset
posted on 2018-04-24, 14:17 authored by TonyBallioni, James Heilman, Brian Henry, Aaron Halfaker
This dataset contains a manually curated set of known undisclosed paid editor (UPE) accounts from English Wikipedia. It is not a complete set of known UPEs: an editor's absence from this set does not mean that the editor is not a paid editor.

See also https://en.wikipedia.org/wiki/Wikipedia:Paid-contribution_disclosure

The dataset contains four columns:

- user_name: The username of the UPE
- case_page_name: The page name (title) of a page describing the case through which paid editing was discovered.
- type: One of three types of UPEs (described below)
- notes: Any notes that a dataset curator chose to include with the example.
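
As a rough illustration of the column layout above, the following Python sketch reads tab-separated rows and tallies them by type. It assumes the file starts with a header row using exactly the four column names listed; that assumption, the helper names `read_upe_rows` and `count_by_type`, and the sample rows are all hypothetical and not taken from the dataset itself.

```python
import csv
import io
from collections import Counter

# Column names as listed in the description; a header row is ASSUMED.
EXPECTED_COLUMNS = ["user_name", "case_page_name", "type", "notes"]


def read_upe_rows(handle):
    """Read tab-separated rows from an open text handle into dicts."""
    # QUOTE_NONE: treat quote characters in notes as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_by_type(rows):
    """Count rows per value of the 'type' column (values kept as strings)."""
    return Counter(row["type"] for row in rows)


# Made-up rows for illustration only; these are NOT real dataset entries.
sample = (
    "user_name\tcase_page_name\ttype\tnotes\n"
    "ExampleUserA\tExample case page\t1\t\n"
    "ExampleUserB\tExample case page\t2\tplaceholder note\n"
)
demo_rows = read_upe_rows(io.StringIO(sample))
print(count_by_type(demo_rows))
```

To use it on the downloaded file, open it in text mode, for example `open("known_paid_editors.201803.tsv", newline="", encoding="utf-8")`, and pass the handle to `read_upe_rows`. How the type values are encoded in the real file is not stated in the description, so the sketch keeps them as raw strings.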


- Type 1: User makes just over 10 minor edits, is quiet for a few days while waiting for autoconfirm (user right) to kick in (takes 4 days), then creates a promotional article in one big edit, after which the account goes silent. This is the main priority. These are present in the largest numbers and are the clearest pattern. They also cause the most damage to our shared brand.
- Type 2: User is an obvious newbie. Makes lots of mistakes. Often turns out to be internal staff. Not a key priority. We already manage these cases fairly well, as they are often so obvious.
- Type 3: Undisclosed paid editor, but one who only moves on to a new account once their current account gets detected. A serious problem: these will be harder to detect, as we will have smaller numbers of these cases. A long time will also need to pass before a pattern becomes apparent.
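
The Type 1 pattern described above could be expressed as a simple rule over an account's edit history. The sketch below is only an illustration, not the curators' method: the `Edit` record, the function name `looks_like_type1`, the upper bound of 15 warm-up edits, and the 3000-byte threshold for a "big" edit are all assumptions. Only the figures of roughly 10 edits and 4 days come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Edit:
    """Hypothetical minimal record of one edit by an account."""
    timestamp: datetime
    is_minor: bool
    bytes_added: int
    creates_page: bool


def looks_like_type1(
    edits,
    min_warmup_edits=10,                 # "just over 10 minor edits"
    max_warmup_edits=15,                 # ASSUMED upper bound for "just over"
    autoconfirm_wait=timedelta(days=4),  # "takes 4 days"
    big_edit_bytes=3000,                 # ASSUMED size of "one big edit"
):
    """Return True if the full edit history resembles the Type 1 pattern."""
    edits = sorted(edits, key=lambda e: e.timestamp)
    if len(edits) < min_warmup_edits + 1:
        return False
    # The account goes silent after the article, so it must be the last edit.
    *warmup, last = edits
    if not (min_warmup_edits <= len(warmup) <= max_warmup_edits):
        return False
    if not all(e.is_minor for e in warmup):
        return False
    # The big page creation must come at least 4 days after the first edit.
    waited = last.timestamp - warmup[0].timestamp >= autoconfirm_wait
    return last.creates_page and last.bytes_added >= big_edit_bytes and waited
```

A rule like this would at best flag candidates for human review. The dataset itself is manually curated, and legitimate new editors can produce similar histories.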
