Twitter Death Hoaxes dataset
Dataset posted on 25.03.2019 by Arkaitz Zubiaga
This is a dataset of death reports collected from Twitter between 1 January 2012 and 31 December 2014. It was collected by tracking the keyword 'RIP' and keeping the tweets in which a person's name is mentioned next to 'RIP'. Names were matched against Wikidata, which served as the database of names. For more details, please refer to the paper:
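The name-matching step can be illustrated with a short sketch. This is not the authors' pipeline: `KNOWN_NAMES` is a hypothetical stand-in for the Wikidata name list, `find_rip_mentions` is an illustrative helper, and "next to RIP" is assumed to mean that the name directly precedes or follows 'RIP', separated only by whitespace or punctuation (so '#RIP' also counts), with case-insensitive matching.

```python
import re

# Hypothetical stand-in for the Wikidata-derived list of person names.
KNOWN_NAMES = ["Paul Walker", "Amy Winehouse", "Whitney Houston"]


def find_rip_mentions(tweet: str, names: list[str]) -> list[str]:
    """Return the names that appear directly next to 'RIP' in a tweet.

    Assumption: "next to" means the name immediately precedes or follows
    'RIP', separated only by non-word characters (spaces, ':', '#', ...).
    Matching is case-insensitive.
    """
    found = []
    for name in names:
        n = re.escape(name)
        # Either "RIP <name>" or "<name> RIP"; \b stops "RIPPED" from matching.
        pattern = rf"\bRIP\W+{n}\b|\b{n}\W+RIP\b"
        if re.search(pattern, tweet, flags=re.IGNORECASE):
            found.append(name)
    return found


if __name__ == "__main__":
    tweets = [
        "RIP Paul Walker, gone too soon",
        "Amy Winehouse RIP :(",
        "Listening to Whitney Houston today",
    ]
    for t in tweets:
        print(t, "->", find_rip_mentions(t, KNOWN_NAMES))
```

A real name list has hundreds of thousands of entries, so a production version would more likely extract the tokens around 'RIP' first and then look them up in a set, rather than running one regular expression per name.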
The dataset contains 4,007 death reports, of which 2,301 are real deaths, 1,092 are commemorations and 614 are fake deaths.
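The three classes are imbalanced, which matters when training or evaluating a classifier on this data. The sketch below uses only the counts stated above; the dictionary keys are illustrative labels, not necessarily the label strings used in the released files.

```python
# Counts reported in the dataset description; keys are illustrative labels.
LABEL_COUNTS = {"real": 2301, "commemoration": 1092, "fake": 614}


def label_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Return each label's share of the total number of reports."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    total = sum(LABEL_COUNTS.values())
    print(f"total reports: {total}")
    for label, share in label_proportions(LABEL_COUNTS).items():
        print(f"{label:<14} {LABEL_COUNTS[label]:>5}  {share:.1%}")
    # Accuracy of a trivial classifier that always predicts the largest class.
    majority = max(LABEL_COUNTS, key=LABEL_COUNTS.get)
    print(f"majority-class baseline ({majority}): {LABEL_COUNTS[majority] / total:.1%}")
```

The counts sum to 4,007, giving shares of roughly 57.4% real, 27.3% commemorations and 15.3% fake. A model that always answers "real" would therefore reach about 57.4% accuracy, so per-class metrics are more informative than accuracy alone.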
The word embedding models used in the paper are also provided along with the dataset.
This dataset is released in accordance with Twitter's Terms of Service (TOS), which allow sharing tweet IDs, and it is intended for non-commercial research only.
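Because only tweet IDs are shared, the tweet content has to be retrieved ("hydrated") through Twitter's API, whose tweet-lookup endpoints accepted at most 100 IDs per request. Whether that access is still available depends on the platform's current API terms. The sketch below only prepares the requests and makes no network calls; the one-ID-per-line input format and the helper names `read_tweet_ids` and `batches` are assumptions, so check the layout of the actual files.

```python
from collections.abc import Iterable, Iterator

# Assumption: tweet-lookup endpoints accept at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def read_tweet_ids(lines: Iterable[str]) -> list[str]:
    """Parse tweet IDs, one per line, skipping blank lines.

    IDs are kept as strings: they are 64-bit integers, and some tools
    (e.g. JavaScript JSON parsers) silently lose precision on them.
    """
    ids = []
    for line in lines:
        token = line.strip()
        if not token:
            continue
        if not token.isdigit():
            raise ValueError(f"not a tweet ID: {token!r}")
        ids.append(token)
    return ids


def batches(ids: list[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    # Synthetic IDs for illustration only.
    sample = [f"{280000000000000000 + i}\n" for i in range(250)]
    ids = read_tweet_ids(sample)
    for batch in batches(ids):
        # Each batch would become one lookup request, e.g. ",".join(batch).
        print(len(batch), batch[0])
```

With a real file, `read_tweet_ids` can be given the open file object directly, e.g. `with open("tweet_ids.txt") as f: ids = read_tweet_ids(f)`, where the filename is hypothetical.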
Note: Twitter's developer policy does not allow sharing more than 1,500,000 tweet IDs (https://dev.twitter.com/overview/terms/policy#updated-policy) unless the author is affiliated with an academic institution (as I am) and the tweet IDs are used solely for non-commercial purposes (https://twittercommunity.com/t/policy-update-clarification-research-use-cases/87566). Hence, by downloading this dataset you agree not to use it for commercial purposes.