data.zip (84.7 MB)

Twitter corpus of Resource-Scarce Languages for Sentiment Analysis and Multilingual Emoji Prediction

Version 6 2018-06-12, 06:29
Version 5 2018-06-12, 06:14
Version 4 2018-06-12, 06:14
Version 3 2018-06-11, 11:35
Version 2 2018-06-11, 11:35
Version 1 2018-06-11, 11:05
dataset
posted on 2018-06-12, 06:29 authored by Rajat Singh, Nurendra Choudhary
This dataset was created by leveraging social media platforms such as Twitter to build corpora across multiple languages. The corpus creation methodology is applicable to resource-scarce languages, provided that speakers of the language are active users of social media platforms. We present an approach to extracting social media microblogs such as tweets (Twitter). We created a corpus for multilingual sentiment analysis and emoji prediction in Hindi, Bengali and Telugu. Further, we perform and analyze multiple NLP tasks on the corpus and report our observations.
