Files (3):
zika_tweets_en-es-pt.csv.gz (363.04 MB)
zika_tweets_other.csv.gz (36.39 MB)
topic_topwords.txt (24.54 kB)

Zika tweets and topics (2015-03-01 to 2016-10-31)

dataset
posted on 2019-05-08, 23:07, authored by Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, Jordan Boyd-Graber
This collection contains the identifiers and metadata of all tweets mentioning Zika ("zika", "zica", "zikv") from March 1, 2015 through October 31, 2016. The tweets can be retrieved using the Twitter API (https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-lookup).
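
As a rough illustration of the retrieval step, the sketch below groups tweet identifiers into batches and builds one lookup request URL per batch. It assumes the v1.1 statuses/lookup endpoint, which accepts up to 100 comma-separated IDs per request; the helper names (batch_ids, lookup_url) are hypothetical, authentication is omitted, and whether the endpoint is still reachable depends on current API access.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode

# Assumed endpoint for the lookup call described at the documentation URL above.
LOOKUP_URL = "https://api.twitter.com/1.1/statuses/lookup.json"
# statuses/lookup accepts at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def batch_ids(tweet_ids: Iterable, size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield lists of at most `size` tweet IDs, converted to strings."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(str(tweet_id))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def lookup_url(batch: List[str]) -> str:
    """Build the request URL for one batch (no authentication included)."""
    return LOOKUP_URL + "?" + urlencode({"id": ",".join(batch)})
```

Each URL would still need to be sent with valid credentials, and tweets that have since been deleted or made private will simply be missing from the response.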

The tweets are provided in two files. The first file contains tweets in English, Spanish, and Portuguese. A polylingual topic model was applied to these tweets, so this file contains the topic probabilities for each tweet. The final 50 columns correspond to the topic probabilities of the 50 topics. The second file contains tweets in all other languages, which do not have columns with topic probabilities.
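
A minimal sketch of reading the first file and separating the topic probabilities from the other columns is shown below. It relies only on the statement that the final 50 columns are topic probabilities; whether the file has a header row, and what the remaining columns are called, are assumptions, and the function names are hypothetical.

```python
import csv
import gzip
from typing import Iterator, List, Tuple

N_TOPICS = 50  # number of topic columns at the end of each row


def split_row(row: List[str], n_topics: int = N_TOPICS) -> Tuple[List[str], List[float]]:
    """Split one CSV row into (metadata columns, topic probabilities)."""
    metadata = row[:-n_topics]
    topic_probs = [float(value) for value in row[-n_topics:]]
    return metadata, topic_probs


def dominant_topic(topic_probs: List[float]) -> int:
    """Return the zero-based position of the highest-probability topic."""
    return max(range(len(topic_probs)), key=topic_probs.__getitem__)


def read_rows(path: str, has_header: bool = True) -> Iterator[Tuple[List[str], List[float]]]:
    """Stream rows from the gzipped CSV without loading it all into memory.

    `has_header` is an assumption about the file layout; set it to False
    if the first line turns out to be data.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)
        for row in reader:
            yield split_row(row)
```

Streaming row by row keeps memory use low, which matters for a compressed file of several hundred megabytes.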

In addition to the tweet files, topic_topwords.txt shows the words associated with each of the 50 topics, output by MALLET. The first line is the topic identifier (zero-indexed), which you can link to the topic columns in the tweet files.
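
To link a topic identifier from topic_topwords.txt to a column in the tweet file, one possible mapping is sketched below. It assumes the 50 topic columns appear in topic order, so that topic 0 is the first of the final 50 columns; the description does not state this explicitly, and topic_column_index is a hypothetical helper.

```python
def topic_column_index(topic_id: int, n_columns: int, n_topics: int = 50) -> int:
    """Map a zero-indexed topic identifier to a zero-based column index.

    Assumes the topic columns are the final `n_topics` columns of the row
    and are ordered by topic identifier.
    """
    if n_columns < n_topics:
        raise ValueError("row has fewer columns than topics")
    if not 0 <= topic_id < n_topics:
        raise ValueError("topic_id out of range")
    return n_columns - n_topics + topic_id
```

For example, in a row with 55 columns, topic 0 would map to column index 5 and topic 49 to column index 54.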
