WikiRank quality scores and measures for Wikipedia articles (April 2022)

dataset
posted on 2022-05-13, 14:53, authored by Wiki Rank

These datasets list over 43 million Wikipedia articles in 55 languages, each with a quality score from WikiRank (https://wikirank.net). The datasets also contain the quality measures (metrics) that directly affect these scores.

Quality measures were extracted from Wikipedia dumps from April 2022.

License

All files included in these datasets are released under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/

Format

  • page_id -- The identifier of the Wikipedia article (int), e.g. 840191
  • page_name -- The title of the Wikipedia article (utf-8), e.g. Sagittarius A*
  • wikirank_quality -- The quality score of the Wikipedia article on a scale of 0-100 (as of April 1, 2022). This is a synthetic measure calculated from the metrics below (also included in the datasets).
  • norm_len -- normalized "page length"
  • norm_refs -- normalized "number of references"
  • norm_img -- normalized "number of images"
  • norm_sec -- normalized "number of sections"
  • norm_reflen -- normalized "references per length ratio"
  • norm_authors -- normalized "number of authors" (without bots and anonymous users)
  • flawtemps -- flaw templates
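
As a minimal sketch of how the fields above might be read, the Python snippet below parses rows into typed records and ranks them by wikirank_quality. It assumes each file is a comma-separated text file with a header row using the field names listed above; the record itself does not state the file layout, so check a downloaded file first. The metric values in SAMPLE are invented for illustration (only the page_id and page_name of the first row come from the examples above), and load_articles and top_by_quality are hypothetical helper names.

```python
import csv
import io

# Numeric fields from the Format list. flawtemps is left as a raw string
# because the record does not say how flaw templates are encoded.
FLOAT_COLUMNS = [
    "wikirank_quality", "norm_len", "norm_refs", "norm_img",
    "norm_sec", "norm_reflen", "norm_authors",
]

# ASSUMPTION: CSV layout with a header row. All metric values here are
# made up for illustration and are not taken from the real datasets.
SAMPLE = """page_id,page_name,wikirank_quality,norm_len,norm_refs,norm_img,norm_sec,norm_reflen,norm_authors,flawtemps
840191,Sagittarius A*,75.5,100,100,80,90,70,60,0
1,Example page,20.0,10,5,0,15,5,10,1
"""


def load_articles(handle):
    """Read rows from an open text handle and convert the numeric fields."""
    articles = []
    for row in csv.DictReader(handle):
        row["page_id"] = int(row["page_id"])
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        articles.append(row)
    return articles


def top_by_quality(articles, n):
    """Return the n articles with the highest wikirank_quality."""
    ranked = sorted(articles, key=lambda a: a["wikirank_quality"], reverse=True)
    return ranked[:n]


articles = load_articles(io.StringIO(SAMPLE))
best = top_by_quality(articles, 1)
print(best[0]["page_name"], best[0]["wikirank_quality"])
```

To read a downloaded file instead of the inline sample, pass a handle opened with open(path, newline="", encoding="utf-8"), where path is the location of the file on disk. If the files turn out to use a different delimiter, csv.DictReader accepts a delimiter argument.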
