DAST.zip (690.96 kB)

Danish stance-annotated Reddit dataset

Version 2 2019-07-13, 11:13
Version 1 2019-06-02, 09:08
dataset
posted on 2019-07-13, 11:13 authored by Anders Lillie, Emil Middelboe
This is a Danish-language Reddit dataset annotated for stance with the SDQC scheme (support, deny, query, comment), created as part of a thesis project. The dataset consists of 3,007 Reddit posts from 33 Reddit submissions across 11 different subjects, forming 1,161 individual branches.

Sixteen submissions were annotated as rumourous, i.e. the source post initiates a rumour. These comprise 220 Reddit conversations, or 596 branches, which is approximately half of the dataset.
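The "approximately half" figure can be checked from the counts quoted in this description. The following is a minimal sketch that uses only those published counts; the variable names are illustrative and it does not read the DAST.zip archive.

```python
# Counts taken from the dataset description (not computed from the files).
total_submissions = 33
total_branches = 1161
rumour_submissions = 16
rumour_branches = 596

# Share of the dataset annotated as rumourous, by submission and by branch.
submission_share = rumour_submissions / total_submissions
branch_share = rumour_branches / total_branches

print(f"Rumourous share of submissions: {submission_share:.1%}")
print(f"Rumourous share of branches: {branch_share:.1%}")
```

Both ratios come out close to one half, whether measured by submissions or by branches.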

The dataset is suitable for supervised stance classification and rumour veracity prediction.
