User Perceived Disinformation on Reddit: Manual Classification

dataset

posted on 2020-12-21, 12:43 authored by Vlad Achimescu

Manual annotation of Reddit comments as "flags" or "non-flags".

Annotations used in training set for ML model and validation set for POS matcher:

(1,200 comments in total)

classified_regex_labeled_train.csv

Annotations used in test set for ML model and POS matcher:

(300 comments in total)

classified_regex_labeled_test.csv

Codebook

(for both files)

ID number starting from 1

indx

Index from larger file containing all matches

word

Keyword matched by the comment (disinformation, fake news, misleading, unreliable, propaganda, bullshit).

subm_title

Title of Reddit post / submission.

domain

Web domain of link shared in Reddit post.

comm_body

Full text of comment.

disinformation

Matched by POS matcher as "disinformation".

fakenews

Matched by POS matcher as "fake news"

Matched by POS matcher as "bullshit"

misleading

Matched by POS matcher as "misleading/clickbait"

unreliable

Matched by POS matcher as "unreliable"

propaganda

Matched by POS matcher as "propaganda"

sample

Sample from keyword matching ("all") or sample from POS matches ("pos")

matches_POS

Matches at least one POS pattern with the POS matcher.

consensus

Manual annotation (consensus of both coders):

- f = comment was coded as "informal flag for false information"

- n = comment was coded as "NOT informal flag for false information"

- u = uncertain

- na/r = removed for being an automated message
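
As an illustration of how the consensus column might be used, the following minimal sketch reads rows with Python's standard csv module and keeps only comments coded "f" or "n", mapping them to 1 and 0. The function name load_labeled_comments and the inline sample rows are hypothetical and not part of the dataset. The sketch assumes the files are comma-delimited with a header row matching the codebook names, and it treats every other code (uncertain or removed) as excluded, since the literal stored form of "na/r" is not specified here.

```python
import csv
import io
from collections import Counter

# Codes documented in the codebook for the "consensus" column.
FLAG, NON_FLAG = "f", "n"


def load_labeled_comments(handle):
    """Return (comment text, 0/1 label) pairs plus a tally of all consensus codes.

    Hypothetical helper: rows whose code is neither "f" nor "n"
    (uncertain or removed) are counted in the tally but not returned.
    """
    tally = Counter()
    pairs = []
    for row in csv.DictReader(handle):
        code = (row.get("consensus") or "").strip().lower()
        tally[code] += 1
        if code in (FLAG, NON_FLAG):
            pairs.append((row["comm_body"], 1 if code == FLAG else 0))
    return pairs, tally


# Made-up rows standing in for the real file; only a subset of columns is shown.
sample_csv = io.StringIO(
    "indx,word,comm_body,consensus\n"
    "10,propaganda,This article is pure propaganda.,f\n"
    "11,misleading,I found the ending of the film misleading.,n\n"
    "12,fake news,Hard to tell what is meant here.,u\n"
    "13,bullshit,I am a bot and this action was performed automatically.,na/r\n"
)

pairs, tally = load_labeled_comments(sample_csv)
print(pairs)
print(dict(tally))
```

To run this against the published files, the in-memory sample could be replaced with something like open("classified_regex_labeled_train.csv", newline="", encoding="utf-8"); the encoding and delimiter are assumptions and may need adjusting.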

Keywords

Manual annotation, Natural Language Processing

Licence

CC BY 4.0
