Article-level image suggestions evaluation (ALISE) dataset

Dataset posted on 2023-06-07, 09:50, authored by Cormac Parle and Marco Fossati

Article-level image suggestions (ALIS, pronounced "Alice") is a distributed computing system that recommends images for Wikipedia articles that don't have one [1].
This publication contains roughly 3,800 human ratings made against ALIS output in multiple Wikipedia language editions.

Evaluation task

Data was collected through an evaluation tool [2], with code available at [3]. Given a language, the user is shown a random Wikipedia article and an image suggested by the system; they are then asked to rate the relevance of the image by clicking the Good, Okay, Bad, or Unsure button. The user is also asked to judge whether the image is unsuitable for any reason by clicking the It's ok, It's unsuitable, or Unsure button.

Content

The archive holds two tab-separated values (TSV) text files:

  1. evaluation_dataset.tsv contains the evaluation data;
  2. unillustrated_articles.tsv keeps track of unillustrated Wikipedia articles.

Evaluation dataset headers

  • id (integer) - identifier used for internal storage;
  • unillustratedArticleId (integer) - identifier of the unillustrated Wikipedia article;
  • resultFilePage (string) - Wikimedia Commons image file name. Prepend https://commons.wikimedia.org/wiki/ to form a valid Commons URL;
  • resultImageUrl (string) - Wikimedia Commons thumbnail URL;
  • source (string) - suggestion source. ms = MediaSearch; ima = ALIS prototype algorithm. See [4] and [5] respectively for more details;
  • confidence_class (string) - shallow degree of suggestion confidence. Either low, medium, or high;
  • rating (integer) - human image relevance rating. 1 = good; 0 = okay; -1 = bad;
  • sensitive (integer) - human image suitability rating. 0 = it's okay; 1 = it's unsuitable; -1 = unsure; 
  • viewCount (integer) - number of times the suggestion was seen by evaluators.

Example

7357    1827    File:Cuphea_cyanea_strybing.jpg https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Cuphea_cyanea_strybing.jpg/800px-Cuphea_cyanea_strybing.jpg   ima     high    1       0       1
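
The field descriptions above can be turned into a small parser. The following is a minimal Python sketch, not part of the dataset tooling: the function and constant names are hypothetical, and it assumes that columns appear in the order listed above, that a header row may or may not be present, and that a rating cell may be empty or non-numeric (for example for Unsure answers), in which case it is read as None.

```python
import csv
import io

# Prefix given in the resultFilePage description above.
COMMONS_PREFIX = "https://commons.wikimedia.org/wiki/"

# Column order as listed in "Evaluation dataset headers" (assumed file order).
EVALUATION_COLUMNS = [
    "id", "unillustratedArticleId", "resultFilePage", "resultImageUrl",
    "source", "confidence_class", "rating", "sensitive", "viewCount",
]
INTEGER_COLUMNS = ("id", "unillustratedArticleId", "rating", "sensitive", "viewCount")

# Value meanings as documented above.
RATING_LABELS = {1: "good", 0: "okay", -1: "bad"}
SENSITIVE_LABELS = {0: "it's okay", 1: "it's unsuitable", -1: "unsure"}


def to_int(value):
    """Return the cell as an int, or None if it is empty or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_evaluation_rows(tsv_text):
    """Parse evaluation TSV text into dicts with typed and derived fields."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), fieldnames=EVALUATION_COLUMNS, delimiter="\t"
    )
    rows = []
    for record in reader:
        if record["id"] == "id":  # skip a header row, if the file has one
            continue
        for name in INTEGER_COLUMNS:
            record[name] = to_int(record[name])
        # Derived fields (hypothetical names, not columns of the dataset).
        record["commonsUrl"] = COMMONS_PREFIX + record["resultFilePage"]
        record["ratingLabel"] = RATING_LABELS.get(record["rating"])
        record["sensitiveLabel"] = SENSITIVE_LABELS.get(record["sensitive"])
        rows.append(record)
    return rows


# The example row shown above, with explicit tab separators.
SAMPLE_ROW = "\t".join([
    "7357", "1827", "File:Cuphea_cyanea_strybing.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/"
    "Cuphea_cyanea_strybing.jpg/800px-Cuphea_cyanea_strybing.jpg",
    "ima", "high", "1", "0", "1",
])

if __name__ == "__main__":
    for row in parse_evaluation_rows(SAMPLE_ROW):
        print(row["commonsUrl"], row["ratingLabel"], row["sensitiveLabel"])
```

To read the real file, the same function can be given the contents of evaluation_dataset.tsv, for instance via open("evaluation_dataset.tsv", encoding="utf-8").read().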

Unillustrated articles headers

  • id (integer) - identifier used for internal storage. Maps to unillustratedArticleId in the evaluation data;
  • langCode (string) - Wikipedia language code;
  • pageTitle (string) - Wikipedia article title;
  • unsuitableArticleType (integer) - whether the Wikipedia article is suitable for receiving image suggestions. 0 = suitable; 1 = not suitable.

Example

1827 vi Cuphea_cyanea 0
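
Since id in the unillustrated articles file maps to unillustratedArticleId in the evaluation data, the two files can be joined to count ratings per language edition. The sketch below is one possible way to do so in Python, under the assumption that each file starts with a header row using the names listed above; the function names are hypothetical, and the sample data is limited to the two example rows from this description.

```python
import csv
import io
from collections import Counter


def read_tsv(tsv_text):
    """Parse TSV text whose first line is a header row (an assumption)."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def ratings_by_language(evaluation_rows, article_rows):
    """Count rating values per Wikipedia language code.

    Joins on unillustratedArticleId (evaluation data) = id (articles).
    Values stay as strings, exactly as read from the files.
    """
    lang_by_article = {row["id"]: row["langCode"] for row in article_rows}
    tallies = {}
    for row in evaluation_rows:
        lang = lang_by_article.get(row["unillustratedArticleId"])
        if lang is None:  # no matching article row; skip rather than guess
            continue
        tallies.setdefault(lang, Counter())[row["rating"]] += 1
    return tallies


# Sample data built from the two example rows in this description.
EVALUATION_SAMPLE = (
    "id\tunillustratedArticleId\tresultFilePage\tresultImageUrl\tsource\t"
    "confidence_class\trating\tsensitive\tviewCount\n"
    "7357\t1827\tFile:Cuphea_cyanea_strybing.jpg\t"
    "https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/"
    "Cuphea_cyanea_strybing.jpg/800px-Cuphea_cyanea_strybing.jpg\t"
    "ima\thigh\t1\t0\t1\n"
)
ARTICLES_SAMPLE = (
    "id\tlangCode\tpageTitle\tunsuitableArticleType\n"
    "1827\tvi\tCuphea_cyanea\t0\n"
)

if __name__ == "__main__":
    tallies = ratings_by_language(
        read_tsv(EVALUATION_SAMPLE), read_tsv(ARTICLES_SAMPLE)
    )
    for lang, counts in sorted(tallies.items()):
        print(lang, dict(counts))
```

Keeping the join in a plain dictionary avoids third-party dependencies; with a data-frame library the same result would come from a merge on the two identifier columns followed by a group-by.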

References

[1] https://www.mediawiki.org/wiki/Structured_Data_Across_Wikimedia/Image_Suggestions/Data_Pipeline

[2] https://image-recommendation-test.toolforge.org/

[3] https://github.com/cormacparle/media-search-signal-test/tree/master/public_html

[4] https://www.mediawiki.org/wiki/Help:MediaSearch

[5] https://www.mediawiki.org/wiki/Structured_Data_Across_Wikimedia/Image_Suggestions/Data_Pipeline#How_it_works
