figshare

User Perceived Disinformation on Reddit: Manual Classification

dataset
posted on 2020-12-21, 12:43 authored by Vlad Achimescu
Manual annotation of Reddit comments as "flags" or "non-flags".

Annotations used as the training set for the ML model and as the validation set for the POS matcher:
(1,200 comments in total)

classified_regex_labeled_train.csv


Annotations used as the test set for both the ML model and the POS matcher:
(300 comments in total)

classified_regex_labeled_test.csv
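Both files share the column layout described in the codebook below, so a single loader can read either one. The sketch below is a minimal illustration, not the author's tooling: it assumes the files are comma-delimited with a header row whose names match the codebook exactly, and `load_annotations` and `EXPECTED_COLUMNS` are hypothetical names introduced here.

```python
import csv
from typing import Dict, Iterable, List

# Column names from the codebook; assumed to appear verbatim in each CSV header.
EXPECTED_COLUMNS = [
    "id", "indx", "word", "subm_title", "domain", "comm_body",
    "disinformation", "fakenews", "bs", "misleading", "unreliable",
    "propaganda", "sample", "matches_POS", "consensus",
]


def load_annotations(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse one annotation file (any iterable of CSV lines) into row dicts.

    Raises ValueError if a codebook column is missing from the header.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing codebook columns: {missing}")
    return list(reader)
```

With the real files, opening `classified_regex_labeled_train.csv` with `newline=""` (and, as a further assumption, `encoding="utf-8"`) and passing the file object to `load_annotations` should give one dict per annotated comment, whose count can be checked against the 1,200 training and 300 test comments stated above.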


Codebook
(for both files)

id
ID number starting from 1
indx
Index in the larger file containing all keyword matches
word
Keyword matched by the comment (disinformation, fake news, misleading, unreliable, propaganda, bullshit)
subm_title
Title of the Reddit post (submission).
domain
Web domain of the link shared in the Reddit post.
comm_body
Full text of comment.
disinformation
Matched by POS matcher as "disinformation".
fakenews
Matched by POS matcher as "fake news"
bs
Matched by POS matcher as "bullshit"
misleading
Matched by POS matcher as "misleading/clickbait"
unreliable
Matched by POS matcher as "unreliable"
propaganda
Matched by POS matcher as "propaganda"
sample
Sample drawn from keyword matches ("all") or from POS matches ("pos").
matches_POS
Whether the comment matches at least one pattern of the POS matcher.
consensus
Manual annotation (consensus of both coders):
- f = comment was coded as "informal flag for false information"
- n = comment was coded as "NOT informal flag for false information"
- u = uncertain
- na/r = removed for being an automated message
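The `consensus` codes and the `matches_POS` column can be combined to check how well the POS matcher agrees with the manual coding. The sketch below is a hedged illustration, not necessarily the evaluation used for this dataset: it assumes rows are dicts keyed by the codebook column names (for example, as produced by Python's `csv.DictReader`), that `matches_POS` is stored as 0/1 or True/False, and that "f" is the positive class. Comments coded "u" and removed automated messages ("na/r", assumed to be stored as "na", "r" or "na/r") are excluded from scoring. All function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def consensus_to_label(code: str) -> Optional[bool]:
    """Map a consensus code to True (flag), False (non-flag) or None (excluded)."""
    code = code.strip().lower()
    if code == "f":
        return True
    if code == "n":
        return False
    # "u" (uncertain) and removed automated messages are left out of scoring.
    return None


def is_true(value: str) -> bool:
    """Interpret a 0/1 or True/False cell; the storage format is an assumption."""
    return value.strip().lower() in {"1", "true", "t", "yes"}


def score_pos_matcher(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Precision and recall of `matches_POS` against the manual `consensus` label."""
    tp = fp = fn = 0
    for row in rows:
        gold = consensus_to_label(row["consensus"])
        if gold is None:
            continue  # skip uncertain and removed comments
        pred = is_true(row["matches_POS"])
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold and not pred:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}
```

Excluding "u" rather than forcing it into one class keeps the scores tied to comments on which both coders reached a definite decision; other treatments are equally defensible.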
