Zika tweets and topics (2015-03-01 to 2016-10-31)
Dataset posted on 08.05.2019 by Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, Jordan Boyd-Graber
This collection contains the identifiers and metadata of all tweets mentioning Zika ("zika", "zica", "zikv") from March 1, 2015 through October 31, 2016. The full tweets can be retrieved from their identifiers using the statuses/lookup endpoint of the Twitter API (https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-lookup).
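As a minimal sketch of preparing identifiers for retrieval, the snippet below groups tweet IDs into comma-separated batches. It relies on the statuses/lookup endpoint accepting up to 100 IDs per request. The function name batch_ids and the placeholder IDs are hypothetical and not part of the dataset; authentication and the HTTP request itself are left out.

```python
from typing import Iterator, List

# statuses/lookup accepts up to 100 tweet IDs per request.
LOOKUP_BATCH_SIZE = 100


def batch_ids(tweet_ids: List[str], size: int = LOOKUP_BATCH_SIZE) -> Iterator[str]:
    """Yield comma-separated ID strings, each holding at most `size` IDs."""
    for start in range(0, len(tweet_ids), size):
        yield ",".join(tweet_ids[start:start + size])


# Placeholder IDs for illustration only; they are not taken from the dataset.
example_ids = [str(n) for n in range(1, 251)]
batches = list(batch_ids(example_ids))
print(len(batches), "requests needed for", len(example_ids), "IDs")
```

Each yielded string can be passed as the id parameter of one lookup request. Tweets that have since been deleted or made private will not be returned.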
The tweets are provided in two files. The first file contains tweets in English, Spanish, and Portuguese. A polylingual topic model was applied to these tweets, so this file contains the topic probabilities for each tweet. The final 50 columns correspond to the topic probabilities of the 50 topics. The second file contains tweets in all other languages, which do not have columns with topic probabilities.
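The following sketch shows one way to read the topic probabilities from a row of the first file and pick the most probable topic. It is written under several assumptions that the description does not confirm: that the file is tab-delimited, that the first of the final 50 columns corresponds to topic 0, and that any header row has already been skipped. The sample row is synthetic and the name dominant_topic is hypothetical.

```python
import csv
import io
from typing import List, Tuple

NUM_TOPICS = 50  # the final 50 columns hold the topic probabilities


def dominant_topic(row: List[str], num_topics: int = NUM_TOPICS) -> Tuple[int, float]:
    """Return (zero-indexed topic, probability) for the most probable topic."""
    probs = [float(value) for value in row[-num_topics:]]
    best = max(range(num_topics), key=probs.__getitem__)
    return best, probs[best]


# Synthetic row: two made-up metadata columns followed by 50 probabilities.
# Assumption: tab-delimited, no header row, topic 0 is the first topic column.
sample = "\t".join(
    ["123", "en"] + ["0.51" if i == 7 else "0.01" for i in range(NUM_TOPICS)]
)
row = next(csv.reader(io.StringIO(sample), delimiter="\t"))
topic, prob = dominant_topic(row)
print(topic, prob)
```

Slicing from the end of the row keeps the sketch independent of how many metadata columns precede the topic columns.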
In addition to the tweet files, topic_topwords.txt shows the words associated with each of the 50 topics, output by MALLET. The first line is the topic identifier (zero-indexed), which you can link to the topic columns in the tweet files.