%0 Generic %A Poghosyan, Gevorg %D 2019 %T Insight4news Irish news related hashtagged tweet collection 15.07.2015-24.05.2017 %U https://figshare.com/articles/dataset/Insight4news_Irish_news_related_tweet_collection_15_07_2015-24_05_2017/7932422 %R 10.6084/m9.figshare.7932422.v4 %2 https://ndownloader.figshare.com/files/14760554 %K Ireland %K Twitter %K tweets %K news %K Hashtagger %K Insight4news %K Social and Community Informatics %X
The 1.3 GB .tar.gz archive contains a .txt file (3.6 GB uncompressed) with 198,725,860 rows, one tweet ID per row.
These tweets were collected between 15.07.2015 and 24.05.2017 with the Hashtagger platform (presented by Shi et al. in https://doi.org/10.1145/2872427.2882982), which judged them relevant to the monitored stream of news from Irish sources (The Irish Times, Irish Examiner, etc.).

All 198,725,860 tweets are in English ('en' in the 'lang' field of the JSON objects provided by GNIP) and contain at least one hashtag.



To retrieve the full tweets, hydrate the tweet IDs with Twarc (https://github.com/edsu/twarc) and write the output to a file. Twarc needs a set of Twitter API keys to do this.
    twarc.py --hydrate tweet_ids.txt > tweets.json
Hydrating all the tweets in one go is probably not a good idea; it may be better to split the file into chunks and hydrate the tweets chunk by chunk.
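
The chunking step could be sketched as follows. This is a minimal illustration, not part of the dataset or of Twarc: the function name split_id_file, the output file naming, and the default chunk size of 100,000 IDs are all assumptions chosen for the example. It only splits the ID file and does not call Twarc itself.

```python
from itertools import islice
from pathlib import Path


def split_id_file(id_file, out_dir, chunk_size=100_000):
    """Split a one-ID-per-line file into numbered chunk files.

    Hypothetical helper: name, file naming, and chunk_size are assumptions.
    Returns the list of chunk file paths in order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(id_file, "r", encoding="utf-8") as src:
        while True:
            # Read at most chunk_size lines, so the 3.6 GB file
            # is never loaded into memory as a whole.
            lines = list(islice(src, chunk_size))
            if not lines:
                break
            path = out / f"tweet_ids_{len(chunk_paths):05d}.txt"
            with open(path, "w", encoding="utf-8") as dst:
                dst.writelines(lines)
            chunk_paths.append(path)
    return chunk_paths
```

Each resulting chunk file can then be passed to the hydrate command above in place of tweet_ids.txt, with the output written to a separate file per chunk. With the assumed chunk size, the 198,725,860 IDs would be spread over 1,988 files.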


When using the dataset, please cite the following paper, for which this dataset was generated.

SocialTree: Socially Augmented Structured Summaries of News Stories
Gevorg Poghosyan, Georgiana Ifrim
Proceedings of the 30th ACM Conference on Hypertext & Social Media (HT ’19), 2019
https://doi.org/10.1145/3342220.3343668
%I figshare