%0 Generic %A Redi, Miriam %A Morgan, Jonathan %A Taraborelli, Dario %D 2019 %T Citation Reason Dataset %U https://figshare.com/articles/dataset/Citation_Reason_Dataset/7756226 %R 10.6084/m9.figshare.7756226.v1 %2 https://ndownloader.figshare.com/files/14441312 %K Wikipedia articles %K citations %K citation needed %K dataset %K crowdsourcing %K mechanical turk %K Computer-Human Interaction %K Knowledge Representation and Machine Learning %X
A dataset of ~4K statements from English Wikipedia annotated with the reason why they need a citation.

Each line of this tab-separated file contains:
* entity_id: the Wikidata ID corresponding to the page
* revision_id: the revision of the corresponding Wikipedia article
* timestamp: the timestamp of the revision
* entity_title: the title of the page/Wikidata entity
* section_id: the ID of the section containing the statement
* section: the section title
* prg_idx: the index of the paragraph in the page
* sentence_idx: the index of the statement in the paragraph
* statement: the statement text
* citations: the source cited in the statement
* vote1: first Mechanical Turk judgment
* vote2: second Mechanical Turk judgment
* vote3: third Mechanical Turk judgment
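The record layout above can be read with Python's standard `csv` module. The sketch below assumes that the columns appear in exactly the order listed, that a header row may or may not be present, and that quotation marks inside statements are literal text rather than CSV quoting. The helper name `parse_rows` is hypothetical and not part of the dataset.

```python
import csv
from typing import Dict, Iterable, List

# Column order as listed in the dataset description (assumed to match the file).
FIELDS = [
    "entity_id", "revision_id", "timestamp", "entity_title", "section_id",
    "section", "prg_idx", "sentence_idx", "statement", "citations",
    "vote1", "vote2", "vote3",
]


def parse_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Split each tab-separated line into a dict keyed by column name."""
    # QUOTE_NONE keeps quotation marks inside statements as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for values in reader:
        if not values or values[0] == "entity_id":
            continue  # skip blank lines and a header row, if one is present
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
        rows.append(dict(zip(FIELDS, values)))
    return rows
```

For a real file, pass an open handle, e.g. `parse_rows(open(path, encoding="utf-8", newline=""))`, where `path` points at the downloaded file.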

The numbers in the last three fields (vote1, vote2, vote3) correspond to the following citation reasons:
1='direct quotation'
2='statistics'
3='controversial'
4='opinion'
5='life'
6='scientific'
7='historical'
8='other'
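A minimal sketch of decoding the three vote fields with the code table above. It assumes each vote is stored as an integer string. The majority rule (a label counts only if at least two annotators chose it) is an illustrative aggregation choice, not one prescribed by the dataset, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

# Code-to-label mapping taken from the dataset description.
REASONS = {
    1: "direct quotation", 2: "statistics", 3: "controversial", 4: "opinion",
    5: "life", 6: "scientific", 7: "historical", 8: "other",
}


def decode_votes(votes: List[str]) -> List[str]:
    """Turn raw vote strings (e.g. vote1..vote3) into reason labels."""
    return [REASONS[int(v.strip())] for v in votes]


def majority_reason(votes: List[str]) -> Optional[str]:
    """Return the label chosen by at least two annotators, else None."""
    label, count = Counter(decode_votes(votes)).most_common(1)[0]
    return label if count >= 2 else None
```

With three votes, a `None` result means all three annotators disagreed, so such statements could be kept separate or discarded depending on the use case.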
%I figshare