figshare
4 files

Monthly Wikipedia article quality predictions

dataset
Posted on 2019-12-17, 07:58, authored by Aaron Halfaker and Amir Sarabadani
Machine-predicted quality levels of all articles in Wikipedia, produced on a monthly basis. All datasets contain the following 6 columns:

  • page_id -- The page identifier
  • page_title -- The title of the article (UTF-8, with spaces replaced by underscores)
  • rev_id -- The most recent revision ID at the time of assessment
  • timestamp -- The timestamp when the assessment was taken (YYYYMMDDHHMMSS)
  • prediction -- The predicted quality class ("Stub", "Start", "C", "B", "GA", "FA", ...)
  • weighted_sum -- The sum of the predicted class probabilities, each weighted by the class's index in the ordering ("Stub" = 0, "Start" = 1, ...)
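The sketch below shows one way to read these columns and how a weighted sum of this kind can be computed. It is illustrative only: the page does not state the file format or delimiter, so `parse_row` works on a row that has already been split into fields, and it assumes the fields appear in the order listed above. `CLASS_ORDER` covers only the six classes named in the column description (some language editions may use other classes), and any probabilities you pass in are hypothetical stand-ins for model output.

```python
from datetime import datetime

# Class ordering as described for weighted_sum: "Stub" = 0, "Start" = 1, ...
# Only the six classes named above are listed; other wikis may use different classes.
CLASS_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]


def weighted_sum(probabilities: dict[str, float]) -> float:
    """Multiply each class probability by the class's index and add them up.

    Raises ValueError if a class is not in CLASS_ORDER.
    """
    return sum(CLASS_ORDER.index(cls) * p for cls, p in probabilities.items())


def parse_timestamp(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS timestamp as described for the timestamp column."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def parse_row(fields: list[str]) -> dict:
    """Convert one already-split row into typed values.

    Assumes the six columns appear in the order documented above.
    """
    page_id, page_title, rev_id, timestamp, prediction, wsum = fields
    return {
        "page_id": int(page_id),
        "page_title": page_title,
        # Titles store spaces as underscores; swap them back for display.
        "display_title": page_title.replace("_", " "),
        "rev_id": int(rev_id),
        "timestamp": parse_timestamp(timestamp),
        "prediction": prediction,
        "weighted_sum": float(wsum),
    }
```

For example, a made-up probability distribution of 0.1, 0.2, 0.3, 0.2, 0.1, 0.1 over Stub through FA gives a weighted sum of about 2.3, which falls between "C" (2) and "B" (3). Unlike the single `prediction` label, this value changes continuously as the class probabilities shift.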
Predictions are made using the ORES "wp10" models for the relevant language edition of Wikipedia. See [1] and [2] for more information.

Funding

Wikimedia Foundation
