Article-level image suggestions evaluation (ALISE) dataset
Article-level image suggestions (ALIS, pronounced "Alice") is a distributed computing system that recommends images for Wikipedia articles that lack one [1].
This publication contains roughly 3,800 human ratings of ALIS output across multiple Wikipedia language editions.
Evaluation task
Data was collected through an evaluation tool [2], with code available at [3]. Given a language, the user is shown a random Wikipedia article and an image suggested by the system, then asked to rate the relevance of the image by clicking the Good, Okay, Bad, or Unsure button. The user is also asked to judge whether the image is unsuitable for any reason via the It's ok, It's unsuitable, or Unsure button.
Content
The archive holds two tab-separated values (TSV) text files:
- evaluation_dataset.tsv contains the evaluation data;
- unillustrated_articles.tsv keeps track of unillustrated Wikipedia articles.
Evaluation dataset headers
- id (integer) - identifier used for internal storage;
- unillustratedArticleId (integer) - identifier of the unillustrated Wikipedia article;
- resultFilePage (string) - Wikimedia Commons image file name. Prepend https://commons.wikimedia.org/wiki/ to form a valid Commons URL;
- resultImageUrl (string) - Wikimedia Commons thumbnail URL;
- source (string) - suggestion source. ms = MediaSearch; ima = ALIS prototype algorithm. See [4] and [5] respectively for more details;
- confidence_class (string) - coarse level of suggestion confidence. Either low, medium, or high;
- rating (integer) - human image relevance rating. 1 = good; 0 = okay; -1 = bad;
- sensitive (integer) - human image suitability rating. 0 = it's okay; 1 = it's unsuitable; -1 = unsure;
- viewCount (integer) - number of times the suggestion was seen by evaluators.
Example
7357 1827 File:Cuphea_cyanea_strybing.jpg https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Cuphea_cyanea_strybing.jpg/800px-Cuphea_cyanea_strybing.jpg ima high 1 0 1
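The column list and value encodings above can be applied with a short parser. The following is a minimal sketch rather than an official loader: the names EVAL_COLUMNS, RATING_LABELS, SENSITIVE_LABELS and parse_evaluation_rows are made up for illustration, and the code assumes the fields are tab-separated in the order listed above. Whether the file starts with a header row is not stated here, so the sketch skips one if it is present.

```python
import csv
import io

# Column order as listed under "Evaluation dataset headers".
EVAL_COLUMNS = [
    "id", "unillustratedArticleId", "resultFilePage", "resultImageUrl",
    "source", "confidence_class", "rating", "sensitive", "viewCount",
]
INT_COLUMNS = ("id", "unillustratedArticleId", "rating", "sensitive", "viewCount")

# Value encodings as documented in the header descriptions.
RATING_LABELS = {1: "good", 0: "okay", -1: "bad"}
SENSITIVE_LABELS = {0: "it's okay", 1: "it's unsuitable", -1: "unsure"}

COMMONS_PREFIX = "https://commons.wikimedia.org/wiki/"


def parse_evaluation_rows(text):
    """Parse TSV text into dicts with integer fields and a Commons page URL."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=EVAL_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in file names literally
    )
    rows = []
    for row in reader:
        if row["id"] == "id":  # assumption: skip a header row if the file has one
            continue
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        # Prepend the Commons prefix, as described for resultFilePage.
        row["commonsUrl"] = COMMONS_PREFIX + row["resultFilePage"]
        rows.append(row)
    return rows


# The example row above, rebuilt with explicit tab separators.
sample = "\t".join([
    "7357", "1827", "File:Cuphea_cyanea_strybing.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/"
    "Cuphea_cyanea_strybing.jpg/800px-Cuphea_cyanea_strybing.jpg",
    "ima", "high", "1", "0", "1",
]) + "\n"

rows = parse_evaluation_rows(sample)
first = rows[0]
print(first["commonsUrl"])
print(RATING_LABELS[first["rating"]], "/", SENSITIVE_LABELS[first["sensitive"]])
```

To read the real file, the same function could be given the contents of evaluation_dataset.tsv, for example via open("evaluation_dataset.tsv", encoding="utf-8").read().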
Unillustrated articles headers
- id (integer) - identifier used for internal storage. Maps to unillustratedArticleId in the evaluation data;
- langCode (string) - Wikipedia language code;
- pageTitle (string) - Wikipedia article title;
- unsuitableArticleType (integer) - whether the Wikipedia article is unsuitable for receiving image suggestions. 0 = suitable; 1 = not suitable.
Example
1827 vi Cuphea_cyanea 0
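Since id in this file maps to unillustratedArticleId in the evaluation data, the two files can be joined with a simple lookup table. The sketch below is illustrative only: index_articles and article_url are hypothetical helper names, the evaluation record is reduced to the fields needed for the join, and the article URL pattern (https://<langCode>.wikipedia.org/wiki/<pageTitle>) is an assumption based on the usual Wikipedia URL scheme, not something this dataset specifies.

```python
import csv
import io
from urllib.parse import quote

# Column order as listed under "Unillustrated articles headers".
ARTICLE_COLUMNS = ["id", "langCode", "pageTitle", "unsuitableArticleType"]


def index_articles(text):
    """Build a dict keyed by article id from unillustrated-articles TSV text."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=ARTICLE_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in titles literally
    )
    articles = {}
    for row in reader:
        if row["id"] == "id":  # assumption: skip a header row if the file has one
            continue
        articles[int(row["id"])] = {
            "langCode": row["langCode"],
            "pageTitle": row["pageTitle"],
            # 0 = suitable; 1 = not suitable
            "suitable": row["unsuitableArticleType"] == "0",
        }
    return articles


def article_url(article):
    """Assumed URL pattern; the title is percent-encoded where needed."""
    return f"https://{article['langCode']}.wikipedia.org/wiki/{quote(article['pageTitle'])}"


# The example row above, with explicit tab separators.
articles = index_articles("1827\tvi\tCuphea_cyanea\t0\n")

# A pared-down evaluation record carrying only the join key and the rating.
evaluation = {"id": 7357, "unillustratedArticleId": 1827, "rating": 1}

article = articles[evaluation["unillustratedArticleId"]]
print(article_url(article), "suitable:", article["suitable"])
```

A dictionary keyed by id keeps each lookup constant-time, which is enough for a dataset of a few thousand ratings; a dataframe merge would work equally well.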
References
[1] https://www.mediawiki.org/wiki/Structured_Data_Across_Wikimedia/Image_Suggestions/Data_Pipeline
[2] https://image-recommendation-test.toolforge.org/
[3] https://github.com/cormacparle/media-search-signal-test/tree/master/public_html