Monthly Wikipedia article quality predictions
dataset
Posted on 2019-12-17, 07:58. Authored by Aaron Halfaker and Amir Sarabadani.
Machine-predicted quality levels of all articles in Wikipedia on a monthly basis. All datasets contain the following 6 columns.
- page_id -- The page identifier
- page_title -- The title of the article (UTF-8, with underscores in place of spaces)
- rev_id -- The most recent revision ID at the time of assessment
- timestamp -- The timestamp when the assessment was taken (YYYYMMDDHHMMSS)
- prediction -- The predicted quality class ("Stub", "Start", "C", "B", "GA", "FA", ...)
- weighted_sum -- The sum of prediction weights assuming indexed class ordering ("Stub" = 0, "Start" = 1, ...)
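As an illustration of the column layout above, here is a minimal Python sketch that parses one row into typed values. It assumes the data is tab-separated with a header row, which this description does not state. The sample row is invented for the example and is not taken from the dataset, and `parse_row` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Assumption: tab-separated text with a header row naming the 6 columns.
# The sample row below is made up for illustration only.
SAMPLE = (
    "page_id\tpage_title\trev_id\ttimestamp\tprediction\tweighted_sum\n"
    "1\tExample_article\t100\t20191201000000\tC\t2.25\n"
)


def parse_row(row):
    """Convert one raw row (a dict of strings) into typed values."""
    return {
        "page_id": int(row["page_id"]),
        # Titles use underscores; swap them for spaces for display.
        "page_title": row["page_title"].replace("_", " "),
        "rev_id": int(row["rev_id"]),
        # Timestamps follow the YYYYMMDDHHMMSS layout described above.
        "timestamp": datetime.strptime(row["timestamp"], "%Y%m%d%H%M%S"),
        "prediction": row["prediction"],
        "weighted_sum": float(row["weighted_sum"]),
    }


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
records = [parse_row(row) for row in reader]
```

If the real files use a different delimiter or lack a header row, only the `csv.DictReader` arguments would need to change.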
Predictions are made using the ORES "wp10" models for the relevant language. See [1] and [2] for more information.
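One plausible reading of the weighted_sum column is sketched below: each class is given its index in the ordering ("Stub" = 0, "Start" = 1, ...), and the index is weighted by the probability the model assigns to that class. This assumes the "prediction weights" are per-class probabilities and uses a six-class scale. Other languages may use different classes, and the probabilities shown are invented for the example. `CLASS_ORDER` and `weighted_sum` are illustrative names, not part of the dataset or of ORES.

```python
# Assumption: a six-class scale ordered from lowest to highest quality.
CLASS_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]


def weighted_sum(probabilities, class_order=CLASS_ORDER):
    """Sum of class index times class probability.

    `probabilities` maps class label -> probability; missing labels
    are treated as 0.0.
    """
    return sum(
        index * probabilities.get(label, 0.0)
        for index, label in enumerate(class_order)
    )


# Made-up probabilities for a single article (they sum to 1.0).
example = {"Stub": 0.1, "Start": 0.2, "C": 0.4, "B": 0.2, "GA": 0.05, "FA": 0.05}
score = weighted_sum(example)
```

Under this reading the score falls between 0 (certainly "Stub") and 5 (certainly "FA"), giving a continuous quality measure that is finer-grained than the single predicted class.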
Funding
Wikimedia Foundation