figshare
2 files

Twitter Death Hoaxes dataset

Version 3 2019-03-25, 18:55
Version 2 2018-03-11, 21:14
Version 1 2017-12-11, 17:26
Dataset posted on 2019-03-25, 18:55 by Arkaitz Zubiaga
This is a dataset of death reports collected from Twitter between 1 January 2012 and 31 December 2014. The tweets were collected by tracking the keyword 'RIP' and keeping those in which a person's name is mentioned next to 'RIP'. Names were matched using Wikidata as a database of names. For more details, please refer to the paper:
https://arxiv.org/abs/1801.07311
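A minimal sketch of the "name next to RIP" filter described above, assuming the tweet text is available and that "next to" means the name directly follows or precedes 'RIP'. `KNOWN_NAMES` is a hypothetical stand-in for the Wikidata name list, and the four-token name limit is an assumption; the paper describes the actual collection procedure.

```python
import re
from typing import List, Optional, Set

# Hypothetical stand-in for the name list drawn from Wikidata.
KNOWN_NAMES = {"whitney houston", "nelson mandela", "paul walker"}
MAX_NAME_TOKENS = 4  # assumption: a name spans at most four tokens


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; '#RIP' and 'RIP,' both become 'rip'."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_name_next_to_rip(text: str, names: Set[str]) -> Optional[str]:
    """Return a known name found directly after or before 'rip', else None.

    Longer candidates are tried first, so a full name beats a shorter prefix.
    """
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok != "rip":
            continue
        for n in range(MAX_NAME_TOKENS, 0, -1):
            # Candidate n-gram immediately after "rip".
            after = tokens[i + 1 : i + 1 + n]
            if len(after) == n and " ".join(after) in names:
                return " ".join(after)
            # Candidate n-gram immediately before "rip".
            before = tokens[i - n : i] if i >= n else []
            if len(before) == n and " ".join(before) in names:
                return " ".join(before)
    return None
```

For example, `match_name_next_to_rip("RIP Whitney Houston, what a voice", KNOWN_NAMES)` returns `"whitney houston"`, while "RIP to my phone battery" returns `None`. A filter like this only yields candidate death reports; labelling them as real deaths, commemorations or fake deaths is a separate step.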

The dataset contains 4,007 death reports, of which 2,301 are real deaths, 1,092 are commemorations and 614 are fake deaths.
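A quick arithmetic check of the counts above, showing how balanced the three classes are. The label keys are informal shorthand, not the field names used in the released files.

```python
# Counts as stated in the dataset description; label keys are informal shorthand.
COUNTS = {"real death": 2301, "commemoration": 1092, "fake death": 614}
TOTAL_REPORTS = 4007


def class_shares(counts: dict) -> dict:
    """Fraction of all reports that falls in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# The three classes should partition the 4,007 reports exactly.
assert sum(COUNTS.values()) == TOTAL_REPORTS
shares = class_shares(COUNTS)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

This prints 57.4%, 27.3% and 15.3% respectively, so fake deaths are the minority class at roughly one report in seven.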

The word embedding models used in the paper are also provided along with the dataset.

This dataset is released in accordance with Twitter's Terms of Service (TOS), which allow sharing tweet IDs, and is intended for non-commercial research.

Note: Twitter's developer policy does not allow sharing more than 1,500,000 tweet IDs (https://dev.twitter.com/overview/terms/policy#updated-policy) unless the author is affiliated with an academic institution (as I am) and the tweet IDs are used solely for non-commercial purposes (https://twittercommunity.com/t/policy-update-clarification-research-use-cases/87566). Hence, by downloading this dataset you agree not to use it for commercial purposes.
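Because only tweet IDs are shared, the tweet content has to be retrieved ("hydrated") through Twitter's API by each user. A minimal sketch of preparing the IDs for that, assuming a plain-text file with one ID per line (the layout of the released files is not described here). IDs are kept as strings because tweet IDs exceed 2^53, beyond which floating-point parsers such as spreadsheets or JavaScript lose precision, and they are split into batches of 100, the per-request limit of Twitter's tweet lookup endpoints. The helper names are hypothetical.

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 100  # per-request ID limit of Twitter's tweet lookup endpoints


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Parse one tweet ID per line, keeping IDs as strings (assumed file layout)."""
    ids = []
    for line in lines:
        line = line.strip()
        if line.isdigit():  # skip blank lines, headers or stray text
            ids.append(line)
    return ids


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs, preserving order."""
    for start in range(0, len(ids), size):
        yield ids[start : start + size]
```

For instance, `with open(path) as f: chunks = list(batches(read_tweet_ids(f)))` (with `path` pointing at a hypothetical ID file) gives lists that can be joined with commas to form the ID list of a single lookup request.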
