3 files

Twitterstorm data: the Katie Hinde Target t-shirt saga 2017-06-11

Posted on 06.04.2018, 08:41, authored by Randi Griffin
This repository contains data from the Twitterstorm that occurred from June 11 to 13, 2017, following a controversial tweet by scientist and public figure Katie Hinde (@Mammals_Suck on Twitter). It is a good dataset for practicing the analysis of text and/or social network data in R.

1. tweets_raw.Rds contains raw text data for all 33261 tweets mentioning @Mammals_Suck in the 36 hours following the original tweet. This file can be read into R using the 'readRDS' function.

2. tweets_clean.csv contains clean data on all 4843 quote and reply tweets responding to the original tweet over a 36-hour period, including: tweet text, user name, tweet time, # of favorites, # of retweets, # of friends of the user, # of followers of the user, user self-description, user location, and type (quote or reply).

3. social_network.rds contains a social network for users in the Twitterstorm as an 'igraph' object. Vertex names correspond to Twitter users, and edge weights are based on co-followers (i.e., the number of accounts that both users follow, which is a proxy for overlap in the interests of two users). Additional vertex attributes can be added to the graph using information about users from the 'tweets_clean.csv' file, such as the time they entered the Twitterstorm, their geographic location, or the text content of their tweets. This file can be read into R using the 'readRDS' function.
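
To make the co-follower edge weight concrete, here is a minimal sketch in Python (the analysis code for this dataset is in R; this is only an illustration of the idea, not the code used to build social_network.rds). The function name `co_follower_edges`, the user names, and the followed accounts are all hypothetical toy values. It assumes the weight between two users is simply the count of accounts that both of them follow.

```python
from itertools import combinations


def co_follower_edges(following):
    """Build a weighted edge list from a {user: set of followed accounts} mapping.

    The weight of an edge is the number of accounts both users follow.
    Pairs with no followed accounts in common get no edge.
    """
    edges = {}
    for user_a, user_b in combinations(sorted(following), 2):
        weight = len(following[user_a] & following[user_b])
        if weight > 0:
            edges[(user_a, user_b)] = weight
    return edges


# Toy data invented for illustration; not taken from the dataset.
following = {
    "user_a": {"acct_1", "acct_2", "acct_3"},
    "user_b": {"acct_2", "acct_3", "acct_4"},
    "user_c": {"acct_5"},
}

edges = co_follower_edges(following)
print(edges)
```

In this toy example only user_a and user_b share followed accounts, so they are the only pair connected by an edge. In the real graph, the corresponding weights are already stored on the edges of the 'igraph' object.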

For more information, check out the blog posts written by Katie Hinde and me. Mine focuses on data analysis, while hers focuses on her experience and understanding of the events.

My blog post: https://rgriff23.github.io/2017/06/29/Katie-Hinde-Twitterstorm.html

Katie Hinde's blog post: https://mammalssuck.blogspot.co.uk/2017/06/portrait-of-unexpected-twitter-storm.html

The R code I used to compile and analyze this data can be found in this GitHub repository: https://github.com/rgriff23/Katie_Hinde_Twitter_storm_text_analysis

Note that the data in the GitHub repo does not exactly match the data included in this figshare repo. This is because the data provided here has been reduced to information collected from Twitter: I removed data columns that were produced by subsequent analysis, such as tweet classifications based on sentiment analysis or social network analysis.