Posted on 2018-12-29, 20:29. Authored by Michael Paul, Xiaolei Huang, Michael C. Smith, David Broniatowski, Sandra C. Quinn, Amelia Jamison, Mark Dredze
This project contains two data files in tab-separated values format.
annotations.tsv - The first column is the tweet ID, and the second column is the annotation ('yes' or 'no') for whether the tweet indicates that someone received or intended to receive a flu vaccine.
data.tsv - This file covers over 1.2 million tweets, each containing at least one influenza-related term and one vaccination-related term. The second and third columns are the classifier probabilities that the tweet is positive for vaccine intent (from a logistic regression classifier and a convolutional neural network, respectively). The remaining columns are the metadata we extracted or inferred (time, location, and gender). Tweets corresponding to the IDs can be downloaded using the Twitter API.
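As a rough illustration of the layout described above, the following Python sketch reads both files with the standard csv module. It is not part of the dataset. It assumes that the first column of data.tsv holds the tweet ID (the description only states what the second and third columns are), that any header row can be skipped when it fails to parse, and that the function names are hypothetical helpers chosen for this example.

```python
import csv
import io


def read_annotations(handle):
    """Return {tweet_id: bool} from a two-column TSV of (ID, 'yes'/'no')."""
    labels = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # blank or truncated line
        tweet_id = row[0].strip()
        answer = row[1].strip().lower()
        # Rows whose second field is not 'yes'/'no' (e.g. a header) are skipped.
        if answer in ("yes", "no"):
            labels[tweet_id] = answer == "yes"
    return labels


def read_scores(handle):
    """Return {tweet_id: (lr_prob, cnn_prob)} from the first three columns.

    Assumption: column 1 is the tweet ID; columns 2 and 3 are the logistic
    regression and CNN probabilities, in that order. Remaining metadata
    columns are ignored here.
    """
    scores = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue
        try:
            scores[row[0].strip()] = (float(row[1]), float(row[2]))
        except ValueError:
            continue  # header row or malformed probabilities
    return scores


# Typical use with the real files (paths are assumptions):
# with open("annotations.tsv", newline="", encoding="utf-8") as f:
#     labels = read_annotations(f)
# with open("data.tsv", newline="", encoding="utf-8") as f:
#     scores = read_scores(f)
```

Tweet IDs are kept as strings so that large IDs are never altered by numeric conversion. Whether the annotated tweets also appear in data.tsv is not stated in the description, so any join between the two dictionaries should check for missing IDs.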
Funding
National Institute of General Medical Sciences R01GM114771