Russian university student comments in VKontakte online communities according to their migration status
The dataset was collected from student online communities on VKontakte between September 2021 and July 2022. It covers posts and comments from 297 groups in which students from 208 Russian universities were active. Initially, 10,099 unique comments were collected; after bots and fake profiles were filtered out, 1,595 relevant comments remained.
Each comment is accompanied by metadata: the number of likes, shares, and views, and the commenter's user ID, hometown, and current place of residence. Users were verified by analyzing their publication activity, their reactions to content (likes and shares), and the time they spent on the social network, and by validating their stated location against their subscriptions to online groups and regional communities.
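The verification step can be pictured as a set of checks that a profile must pass before its comments are kept. The sketch below is illustrative only. The field names, the thresholds (`MIN_POSTS`, `MIN_REACTIONS`, `MIN_DAYS_ACTIVE`), and the rule "the hometown appears among the user's regional subscriptions" are hypothetical, because the description names the criteria but not how they were measured or weighted.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: the dataset description does not publish actual values.
MIN_POSTS = 5
MIN_REACTIONS = 10
MIN_DAYS_ACTIVE = 30


@dataclass
class UserProfile:
    user_id: int
    posts: int          # publication activity
    reactions: int      # likes and shares the user gave to content
    days_active: int    # hypothetical proxy for time spent on the network
    hometown: str
    # Cities of the regional communities the user subscribes to (hypothetical field).
    regional_groups: list[str] = field(default_factory=list)


def location_is_plausible(user: UserProfile) -> bool:
    # Treat the stated hometown as validated if the user follows
    # at least one regional community tied to that city.
    return user.hometown in user.regional_groups


def is_genuine(user: UserProfile) -> bool:
    """Return True if the profile passes every (hypothetical) check."""
    return (
        user.posts >= MIN_POSTS
        and user.reactions >= MIN_REACTIONS
        and user.days_active >= MIN_DAYS_ACTIVE
        and location_is_plausible(user)
    )


users = [
    UserProfile(1, posts=20, reactions=50, days_active=400,
                hometown="Kazan", regional_groups=["Kazan", "Moscow"]),
    UserProfile(2, posts=0, reactions=0, days_active=2, hometown="Tomsk"),
]
kept_ids = [u.user_id for u in users if is_genuine(u)]
print(kept_ids)  # [1]
```

Combining the checks with `and` means a single failed criterion removes a profile; a scoring scheme with weights would be an equally plausible alternative.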
The comments were classified into six key themes using text classification tools. Because some comments relate to more than one theme, a separate row was created for each comment-theme pair, so that every theme a comment addresses is represented. As a result, the dataset contains 2,351 rows for the 1,595 comments.
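The one-row-per-theme layout can be reproduced by "exploding" a list of theme labels into separate rows. This is a minimal sketch in plain Python; the comment records and theme names (`housing`, `finances`, and so on) are hypothetical placeholders, since the six actual themes are not named in this description.

```python
from collections import Counter

# Hypothetical comments and theme labels, for illustration only.
comments = [
    {"comment_id": 1, "text": "comment A", "themes": ["housing"]},
    {"comment_id": 2, "text": "comment B", "themes": ["housing", "finances"]},
    {"comment_id": 3, "text": "comment C", "themes": ["studies", "leisure", "finances"]},
]


def explode_by_theme(rows):
    """Yield one output row per (comment, theme) pair."""
    for row in rows:
        for theme in row["themes"]:
            out = {k: v for k, v in row.items() if k != "themes"}
            out["theme"] = theme
            yield out


long_rows = list(explode_by_theme(comments))
print(len(comments), "comments ->", len(long_rows), "rows")  # 3 comments -> 6 rows

# Count distinct comments, not rows, when reporting comment totals.
print(len({r["comment_id"] for r in long_rows}))  # 3
```

Because a multi-theme comment appears in several rows, row counts and comment counts diverge (in the dataset, 2,351 rows versus 1,595 comments); totals per comment should therefore be computed over distinct `comment_id` values.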
In addition to the thematic labels, the dataset records the migration status of each commenting student. If a student's hometown matched the city of their university, the comment was labeled as coming from a local student (374 comments, 23%); if the two cities differed, it was labeled as coming from a visiting student (1,221 comments, 77%).
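The labeling rule amounts to a single comparison between two city fields. The sketch below assumes the hometown and university city are available as plain strings and normalizes case and surrounding whitespace before comparing; that normalization is an assumption, not a documented step. It also checks that the reported counts give the stated shares.

```python
def migration_status(hometown: str, university_city: str) -> str:
    """Label a comment 'local' if the author's hometown is the university city."""
    # Case/whitespace normalization is an assumption; the description
    # only states that the two cities are compared.
    same = hometown.strip().casefold() == university_city.strip().casefold()
    return "local" if same else "visiting"


def shares(local: int, visiting: int) -> tuple[float, float]:
    """Return the local and visiting fractions of all comments."""
    total = local + visiting
    return local / total, visiting / total


print(migration_status("Kazan", "Kazan"))   # local
print(migration_status("Tomsk", "Moscow"))  # visiting

loc, vis = shares(374, 1221)
print(f"{loc:.0%} local, {vis:.0%} visiting")  # 23% local, 77% visiting
```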
The texts are in Russian. Please note that they have not been censored, so posts and comments may contain inappropriate language.