This dataset contains a manually curated set of known undisclosed paid editor (UPE) accounts from Wikipedia. It is not a complete list of known UPEs, and an editor's absence from this set does not mean that the editor is not paid.
See also https://en.wikipedia.org/wiki/Wikipedia:Paid-contribution_disclosure
The dataset contains four columns:
- user_name: The username of the UPE
- case_page_name: The page name (title) of a page describing the case through which paid editing was discovered.
- type: One of three types of UPEs (described below)
- notes: Any notes that a dataset curator chose to include with the example.
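As a rough illustration of the four columns above, the following sketch reads the data and counts accounts per type. It assumes the dataset is delimited text with a header row naming the four columns. The file format is not stated here, so the delimiter, the helper names, and the placeholder rows are all hypothetical, as is the guess that the type column holds the values 1, 2 and 3.

```python
import csv
import io
from collections import Counter

# Column names as listed in this README.
COLUMNS = ["user_name", "case_page_name", "type", "notes"]

def load_upe_rows(text, delimiter=","):
    """Parse delimited text with a header row into a list of dicts.

    Assumption: the file has a header row; the delimiter is a guess.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return [dict(row) for row in reader]

def count_by_type(rows):
    """Count how many accounts fall under each value of the type column."""
    return Counter(row["type"].strip() for row in rows)

# Placeholder rows for illustration only; these are not real accounts or cases.
SAMPLE = (
    "user_name,case_page_name,type,notes\n"
    "ExampleUserA,Example case page A,1,\n"
    "ExampleUserB,Example case page B,3,placeholder note\n"
)

rows = load_upe_rows(SAMPLE)
print(count_by_type(rows))
```

To read the real file, pass its contents as `text` and adjust `delimiter` if the data is tab-separated.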
Type 1
User makes just over 10 minor edits, then is quiet for a few days while waiting for the autoconfirmed user right to kick in (this takes 4 days). The user then creates a promotional article in one big edit, after which the account goes silent.
This is the main priority. These are present in the largest numbers and are the clearest pattern. They also cause the most damage to our shared brand.
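The Type 1 pattern can be written down as a simple rule over an account's edit history. The sketch below is only an illustration of the description above, not the method used to curate this dataset. The `Edit` record, the function name, and every threshold other than the 10 edits and 4 days mentioned above are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Edit:
    # Hypothetical record of one edit; field names are assumptions.
    timestamp: datetime
    is_minor: bool
    creates_page: bool

def looks_like_type_1(registered, edits, min_warmup=10, max_warmup=15,
                      wait=timedelta(days=4), max_followup=2):
    """Return True if an edit history matches the Type 1 description.

    max_warmup and max_followup are illustrative guesses for
    "just over 10" edits and "going silent".
    """
    edits = sorted(edits, key=lambda e: e.timestamp)
    creations = [i for i, e in enumerate(edits) if e.creates_page]
    if not creations:
        return False
    first = creations[0]
    before, after = edits[:first], edits[first + 1:]
    # A short run of minor edits comes before the article is created.
    warmed_up = (min_warmup <= len(before) <= max_warmup
                 and all(e.is_minor for e in before))
    # The article is created only once the account is at least 4 days old.
    waited = edits[first].timestamp - registered >= wait
    # Little or no activity follows the article creation.
    went_silent = len(after) <= max_followup
    return warmed_up and waited and went_silent
```

A rule like this only flags candidates for human review; legitimate new editors can produce the same history.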
Type 2
User is an obvious newbie. Makes lots of mistakes. Often turns out to be internal staff.
Not a key priority. We already manage these cases fairly well as they are often so obvious.
Type 3
An undisclosed paid editor who moves on to a new account only once the current account has been detected.
A serious problem. These cases will be harder to detect because there are fewer of them, and a long time will need to pass before a pattern becomes apparent.