User Perceived Disinformation on Reddit: Manual Classification
Manual annotation of Reddit comments as "flags" or "non-flags".
Annotations used in training set for ML model and validation set for POS matcher:
(1,200 comments in total)
classified_regex_labeled_train.csv
Annotations used in test set for ML model and POS matcher:
(300 comments in total)
classified_regex_labeled_test.csv
Codebook
(for both files)
id
ID number starting from 1
indx
Index of the comment in the larger file containing all matches.
word
Keyword matched by the comment (disinformation, fake news, misleading, unreliable, propaganda, bullshit).
subm_title
Title of Reddit post / submission.
domain
Web domain of link shared in Reddit post.
comm_body
Full text of comment.
disinformation
Matched by POS matcher as "disinformation".
fakenews
Matched by POS matcher as "fake news".
bs
Matched by POS matcher as "bullshit".
misleading
Matched by POS matcher as "misleading/clickbait".
unreliable
Matched by POS matcher as "unreliable".
propaganda
Matched by POS matcher as "propaganda".
sample
Source of the sample: keyword matching ("all") or POS matches ("pos").
matches_POS
Matches at least one POS pattern with the POS matcher.
consensus
Manual annotation (consensus of both coders):
- f = comment was coded as "informal flag for false information"
- n = comment was coded as "NOT informal flag for false information"
- u = uncertain
- na/r = removed because the comment was an automated message
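As an illustration of how the consensus codes might be turned into labels for a classifier, here is a minimal Python sketch. It assumes the files are comma-delimited with a header row using the column names from the codebook. The helper names (read_rows, label_counts, binary_examples) are hypothetical, and dropping every row not coded "f" or "n" is an assumption for the example, not necessarily the procedure the authors followed.

```python
import csv
from collections import Counter

# Label codes taken from the codebook entry for the "consensus" column.
FLAG, NON_FLAG = "f", "n"


def read_rows(handle):
    """Parse an open CSV handle into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def label_counts(rows):
    """Count how often each consensus code occurs."""
    return Counter(row["consensus"].strip().lower() for row in rows)


def binary_examples(rows):
    """Return (comment text, label) pairs, with 1 for "f" and 0 for "n".

    Assumption: rows with any other code (uncertain or removed) are skipped.
    """
    examples = []
    for row in rows:
        code = row["consensus"].strip().lower()
        if code == FLAG:
            examples.append((row["comm_body"], 1))
        elif code == NON_FLAG:
            examples.append((row["comm_body"], 0))
    return examples
```

To try it on the training file, open classified_regex_labeled_train.csv with open(path, newline="", encoding="utf-8") and pass the handle to read_rows. The encoding is also an assumption and may need adjusting.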