WikiRank quality scores and measures for Wikipedia articles (April 2022)
Dataset posted on 2022-05-13, 14:53, authored by Wiki Rank
These datasets include lists of over 43 million Wikipedia articles in 55 languages, with quality scores computed by WikiRank (https://wikirank.net). The datasets also contain the quality measures (metrics) that directly affect these scores.
Quality measures were extracted from the Wikipedia dumps of April 2022.
All files included in these datasets are released under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/
- page_id -- The identifier of the Wikipedia article (int), e.g. 840191
- page_name -- The title of the Wikipedia article (utf-8), e.g. Sagittarius A*
- wikirank_quality -- quality score of the Wikipedia article on a scale of 0-100 (as of April 1, 2022). This is a synthetic measure calculated from the metrics below (also included in the datasets).
- norm_len -- normalized "page length"
- norm_refs -- normalized "number of references"
- norm_img -- normalized "number of images"
- norm_sec -- normalized "number of sections"
- norm_reflen -- normalized "references per length ratio"
- norm_authors -- normalized "number of authors" (excluding bots and anonymous users)
- flawtemps -- flaw templates
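As a rough sketch of how these fields might be consumed, the Python below parses rows into records and ranks articles by wikirank_quality. It assumes the files are comma-separated text with a header row that uses exactly the column names listed above, and that the norm_* columns hold plain numbers. The file format, the helper names read_articles and top_articles, and keeping flawtemps as raw text are illustrative assumptions, not details confirmed by the dataset description.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Normalized metric columns as listed in the dataset description.
NORM_FIELDS = ("norm_len", "norm_refs", "norm_img", "norm_sec",
               "norm_reflen", "norm_authors")


@dataclass
class ArticleQuality:
    page_id: int
    page_name: str
    wikirank_quality: float          # synthetic score on a 0-100 scale
    metrics: Dict[str, float]        # normalized metrics keyed by column name
    flawtemps: str                   # kept as raw text (encoding not specified; assumption)


def read_articles(lines: Iterable[str], delimiter: str = ",") -> List[ArticleQuality]:
    """Parse rows (hypothetical helper), assuming a header row with the column names above."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    records = []
    for row in reader:
        records.append(ArticleQuality(
            page_id=int(row["page_id"]),
            page_name=row["page_name"],
            wikirank_quality=float(row["wikirank_quality"]),
            metrics={name: float(row[name]) for name in NORM_FIELDS},
            flawtemps=row.get("flawtemps") or "",
        ))
    return records


def top_articles(records: Iterable[ArticleQuality], n: int = 10) -> List[ArticleQuality]:
    """Return the n articles with the highest WikiRank quality score (hypothetical helper)."""
    return sorted(records, key=lambda r: r.wikirank_quality, reverse=True)[:n]
```

For example, with a downloaded file (the name wikirank_en.csv is hypothetical), `with open("wikirank_en.csv", encoding="utf-8", newline="") as f: best = top_articles(read_articles(f), 5)` would return the five highest-scoring articles. The `delimiter` parameter is exposed because the actual separator used in the published files is not stated here.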