Virality Measures of "Data Tweets"
Dataset posted on 05.03.2020, 12:32 by Leslie Carr, Elena Simperl
This dataset consists of two files in TSV format derived from 16,754,250 tweets that were identified as containing some form of "numeric data". The tweets come from an extended collection drawn from Twitter's 1% public sample over 11 months from September 2018.
Both files have a key column labelled "TweetID", which is the Twitter API ID that can be used to retrieve the full tweet data (retrieval via twarc is recommended).
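As a minimal sketch of preparing IDs for retrieval, the following collects the distinct values of the first column from a TSV stream. It assumes the file starts with a header row and that the ID is the first column. The function name `unique_tweet_ids` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

def unique_tweet_ids(tsv_file, has_header=True):
    """Return the distinct first-column values of a TSV stream, in file order."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: the first row holds column labels
    seen = {}
    for row in reader:
        if row and row[0]:
            seen.setdefault(row[0], None)  # dict keeps insertion order
    return list(seen)

# Made-up sample rows for illustration only.
sample = io.StringIO(
    "TweetID\tNumericDataString\tNumericType\n"
    "1001\t500 billion\t[cardinal]\n"
    "1001\t24 years\t[time]\n"
    "1002\t3 million\t[cardinal]\n"
)
ids = unique_tweet_ids(sample)

# One ID per line, the plain-text layout that ID-hydration tools usually expect.
id_list_text = "\n".join(ids) + "\n"
```

Writing `id_list_text` to a file (say `ids.txt`, a hypothetical name) gives a one-ID-per-line list. With twarc 1.x, such a file can be passed to `twarc hydrate ids.txt > tweets.jsonl`. Other twarc versions use a different command form, so check the documentation for the installed version.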
The file "datatweet-numeric-occurrences.txt" consists of three columns:
1 TweetID - the Twitter API ID of the tweet in which the numeric data was found
2 NumericDataString - the actual substring from the tweet which was recognised as numeric e.g. "500 billion" or "24 years"
3 NumericType - one of a set of identified numeric types e.g. "[cardinal]" or "[time]".
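The column layout above can be read with a plain TSV parser. The sketch below tallies how often each NumericType occurs. It assumes a header row is present and that fields are not quoted. `count_numeric_types` is a hypothetical helper, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

def count_numeric_types(tsv_file, has_header=True):
    """Count rows per NumericType (third column) in a TSV stream."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: the first row holds column labels
    counts = Counter()
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        counts[row[2]] += 1
    return counts

# Made-up sample rows, not taken from the dataset.
sample = io.StringIO(
    "TweetID\tNumericDataString\tNumericType\n"
    "1001\t500 billion\t[cardinal]\n"
    "1001\t24 years\t[time]\n"
    "1002\t3 million\t[cardinal]\n"
)
type_counts = count_numeric_types(sample)
```

For the real file, the same function can be called on `open("datatweet-numeric-occurrences.txt", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption.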
The "virality" associated with the tweets in which the numeric data has been found is given in the file "datatweet-virality.txt".
Its columns are as follows:
1 id of the tweet
2 retweet count (see the NB below)
4 followers_count (of the user who made the tweet)
If this tweet is a retweet of another (original) tweet, the following columns are non-empty:
5 id of the original tweet
6 favourite_count of the original tweet
7 followers_count of the original tweet's author
NB: if col 2 is 0, then cols 5-7 will be blank.
If col 2 > 0, then it contains the number of retweets of the original tweet, not the number of times that this retweet has itself been retweeted.
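A minimal sketch of reading this file by column position and separating retweets from original tweets follows. It assumes a header row and seven tab-separated columns. The header labels other than "TweetID" are placeholders, the dictionary keys are hypothetical names, and the sample values are invented. Column 3 is not described above, so the sketch leaves it uninterpreted.

```python
import csv
import io

def _to_int(text):
    """Convert a field to int, treating an empty field as missing."""
    return int(text) if text else None

def read_virality(tsv_file, has_header=True):
    """Yield one dict per row of the virality TSV, keyed by hypothetical names."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: the first row holds column labels
    for row in reader:
        if not row:
            continue  # skip blank lines
        row = row + [""] * (7 - len(row))  # pad rows with missing trailing fields
        yield {
            "tweet_id": row[0],                          # col 1
            "col2_retweets": _to_int(row[1]),            # col 2, see the NB above
            "followers_count": _to_int(row[3]),          # col 4
            "is_retweet": row[4] != "",                  # cols 5-7 filled only for retweets
            "original_id": row[4] or None,               # col 5
            "original_favourite_count": _to_int(row[5]),  # col 6
            "original_followers_count": _to_int(row[6]),  # col 7
        }

# Made-up sample rows; header labels after "TweetID" are placeholders.
sample = io.StringIO(
    "TweetID\tc2\tc3\tc4\tc5\tc6\tc7\n"
    "2001\t0\t0\t150\t\t\t\n"
    "2002\t35\t0\t80\t1900\t12\t5000\n"
)
rows = list(read_virality(sample))
retweets = [r for r in rows if r["is_retweet"]]
```

Because col 2 of a retweet row reports the retweet count of the original tweet, grouping by `original_id` before aggregating avoids counting the same original tweet's retweets several times.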