data.tar.gz (2.09 GB)
Term frequency (tf) values, word frequency values for computing inverse document frequency (idf) values, and evaluation data for the paper submitted to PLOS ONE titled "Identifying Topics in Microblogs Using Wikipedia"
This dataset provides the topics identified by our approach, BOUN-TI, on data collected from Twitter during the 2012 U.S. presidential debates.
The dataset also provides the tf values of words in a Wikipedia snapshot and the values required to compute the idf values of words. In addition, it provides the word frequency distribution of English tweets from the Twitter public stream over a time interval.
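A minimal sketch of how idf and tf-idf weights could be derived from such values follows. It assumes that the idf-related values are per-word document frequencies df(w) together with the total number of documents N, and it uses the common variant idf(w) = ln(N / df(w)). This description does not specify the file layout or the exact idf formula used in the paper, so treat both as assumptions. The function names and the toy counts are hypothetical and are not taken from the dataset.

```python
import math


def idf_from_counts(num_documents: int, doc_freq: dict[str, int]) -> dict[str, float]:
    """Return idf(w) = ln(N / df(w)) for every word with a nonzero document frequency.

    Assumption: N (total documents) and df(w) (documents containing w) are
    available; the idf variant used in the paper may differ.
    """
    return {
        word: math.log(num_documents / df)
        for word, df in doc_freq.items()
        if df > 0
    }


def tf_idf(tf: dict[str, float], idf: dict[str, float]) -> dict[str, float]:
    """Weight each term frequency by its idf; words without an idf value are skipped."""
    return {word: freq * idf[word] for word, freq in tf.items() if word in idf}


if __name__ == "__main__":
    # Hypothetical toy counts, not taken from the dataset.
    idf_values = idf_from_counts(1000, {"debate": 10, "the": 1000})
    weights = tf_idf({"debate": 3, "the": 20}, idf_values)
    print(weights)
```

In this toy example, a word that appears in every document ("the") gets an idf of 0, so its tf-idf weight vanishes regardless of how often it occurs. A rarer word ("debate") keeps a high weight.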