Insight4news Irish news related hashtagged tweet collection 15.07.2015-24.05.2017

2019-04-04T14:42:11Z (GMT) by Gevorg Poghosyan
<pre>The 1.3GB tar.gz file contains a 3.6GB (uncompressed) .txt file with 198,725,860 rows, each of which is a tweet ID. The tweets were collected between 15.07.2015 and 24.05.2017 with the <i>Hashtagger</i> platform (presented by Shi et al. in https://doi.org/10.1145/2872427.2882982), which considered them relevant to the monitored stream of news from Irish sources (The Irish Times, Irish Examiner, etc.).</pre>
<pre>All 198,725,860 tweets contain at least one hashtag.</pre>
<pre>Hydrate the tweet IDs with Twarc (https://github.com/edsu/twarc) and write the result to a file. You will need to provide Twarc with a set of Twitter API keys.</pre>
<pre><i>twarc.py --hydrate tweet_ids.txt > tweets.json</i></pre>
<pre>It is probably not a good idea to hydrate all the tweets in one go; it may be better to split the file into chunks and hydrate them chunk by chunk.</pre>
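One way to do the chunking is sketched below in Python. This is a minimal illustration and not part of the dataset or of Twarc: the function name `split_ids`, the output file naming pattern, and the default chunk size of 100,000 IDs are all assumptions chosen for the example. It reads the ID file lazily, so the full 3.6GB file is never loaded into memory.

```python
from itertools import islice
from pathlib import Path


def split_ids(src, out_dir, chunk_size=100_000):
    """Split a one-ID-per-line file into numbered chunk files.

    Hypothetical helper: the name, the chunk size default and the
    output naming scheme are illustrative assumptions.
    Returns the list of chunk file paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(src, encoding="utf-8") as fh:
        # Generator: strips whitespace and skips blank lines lazily.
        ids = (line.strip() for line in fh if line.strip())
        while True:
            # Pull at most chunk_size IDs from the stream.
            batch = list(islice(ids, chunk_size))
            if not batch:
                break
            path = out_dir / f"tweet_ids_{len(chunks):05d}.txt"
            path.write_text("\n".join(batch) + "\n", encoding="utf-8")
            chunks.append(path)
    return chunks
```

Calling `split_ids("tweet_ids.txt", "chunks")` would write files such as `chunks/tweet_ids_00000.txt`, each of which can then be hydrated separately with the same command shown above, for example `twarc.py --hydrate chunks/tweet_ids_00000.txt > tweets_00000.json`. Keeping one output file per chunk means an interrupted run can resume from the last unfinished chunk.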