en_wiki_subset_statements_all_citations_sample_with_labels.csv (1.84 MB)

Citation Reason Dataset

A dataset of ~4K statements from English Wikipedia, each annotated with the reason why it needs a citation.

Each line of this tab-separated file contains:
* entity_id: the Wikidata ID corresponding to the page
* revision_id: the revision of the corresponding Wikipedia article
* timestamp: the timestamp of the revision
* entity_title: the title of the page (Wikidata entity)
* section_id: the section ID where the statement is
* section: the section title
* prg_idx: the index of the paragraph in the page
* sentence_idx: the index of the statement in the paragraph
* statement: the statement text
* citations: the source cited in the statement
* vote1: first Mechanical Turk judgment
* vote2: second Mechanical Turk judgment
* vote3: third Mechanical Turk judgment
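
The field list above can be turned into a small reader. This is a minimal sketch, not an official loader: it assumes the columns appear in the order listed, that the file has no header row unless `has_header=True` is passed, and that fields are unquoted. The name `read_statements` and the sample row are made up for illustration.

```python
import csv
import io

# Column order as listed in the dataset description (assumed to match the file).
FIELDS = [
    "entity_id", "revision_id", "timestamp", "entity_title",
    "section_id", "section", "prg_idx", "sentence_idx",
    "statement", "citations", "vote1", "vote2", "vote3",
]

def read_statements(stream, has_header=False):
    """Yield one dict per well-formed row of a tab-separated stream."""
    # QUOTE_NONE: treat quote characters in Wikipedia text as literal (assumption).
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if has_header and i == 0:
            continue  # skip the header row if the file turns out to have one
        if len(row) != len(FIELDS):
            continue  # skip rows that do not have exactly 13 fields
        yield dict(zip(FIELDS, row))

# Hypothetical sample row, not taken from the dataset.
sample = (
    "Q1\t123\t2018-01-01\tExample\t0\tIntro\t0\t1\t"
    "A made-up statement.\tsome source\t2\t2\t7\n"
)
rows = list(read_statements(io.StringIO(sample)))
```

To read the real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, encoding="utf-8", newline="")`. All values come back as strings, so the vote fields need `int()` before they are compared with the numeric codes.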

The numbers in the last three fields (vote1, vote2, vote3) correspond to the following citation reasons:
1='direct quotation'
2='statistics'
3='controversial'
4='opinion'
5='life'
6='scientific'
7='historical'
8='other'
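
The code table above maps directly onto a lookup dictionary. The sketch below also combines the three votes by simple majority. That aggregation rule is an assumption made for illustration; the dataset description does not say how the judgments should be combined. The names `REASONS` and `majority_reason` are hypothetical.

```python
from collections import Counter

# Numeric codes and labels exactly as listed in the description.
REASONS = {
    1: "direct quotation",
    2: "statistics",
    3: "controversial",
    4: "opinion",
    5: "life",
    6: "scientific",
    7: "historical",
    8: "other",
}

def majority_reason(votes):
    """Return the label picked by at least two of the votes, else None.

    Majority voting is an assumed aggregation rule, not one stated
    by the dataset description.
    """
    codes = [int(v) for v in votes]  # votes may arrive as strings from the file
    code, count = Counter(codes).most_common(1)[0]
    if count < 2:
        return None  # all three annotators disagreed
    return REASONS.get(code)

label = majority_reason(["2", "2", "7"])
```

Returning `None` when all three annotators disagree keeps ambiguous statements visible to the caller instead of silently picking one vote.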
