Tweet geolocation 5m

<div>Tweet-geolocation-5m is a dataset of more than 5 million geolocated tweets. Each tweet is annotated with fine-grained location information collected from OpenStreetMap [1] using the reverse geocoding feature of Nominatim [2]. The dataset was originally created for country-level classification of tweets, but the finer-grained location information included with it also supports finer-grained classification. Country codes follow the ISO 3166-1 alpha-2 standard [3].</div><div><br></div><div>The dataset was collected over two separate week-long periods: TC2014, in October 2014, and TC2015, in October 2015.</div><div><br></div><div>Two files are provided here:</div><div>* tweet-geolocation-5m.tar.bz2, the dataset itself, which provides the tweet IDs and ground-truth country IDs needed to conduct further experiments.</div><div>* vectors-and-folds.tar.bz2, provided for reproducibility. The information in this file should allow you to reproduce the results presented in the paper.</div>
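<div><br></div><div>Both files are bzip2-compressed tar archives. As a minimal sketch, assuming the archives have been downloaded to the current working directory, the following Python snippet lists their contents without extracting them. The helper name list_archive_members is hypothetical, and no assumption is made about the names or formats of the files inside the archives.</div>

```python
import os
import tarfile


def list_archive_members(archive_path):
    """Return the member names of a bzip2-compressed tar archive."""
    # "r:bz2" opens the archive for reading with bzip2 decompression.
    with tarfile.open(archive_path, "r:bz2") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # File names as given in the dataset description; the paths are assumed
    # to be relative to the current working directory.
    for name in ("tweet-geolocation-5m.tar.bz2", "vectors-and-folds.tar.bz2"):
        if os.path.exists(name):
            for member in list_archive_members(name):
                print(member)
        else:
            print(f"{name} not found; download it first")
```

<div>Listing the members first is a cheap way to see how the archive is organised before extracting several million records to disk.</div>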