PersonalityTypes.csv (1.64 MB)

Self-Reported Myers-Briggs Personality Types on Twitter

dataset

posted on 2023-07-03, 21:32 authored by Joshua Watt

We collected the data for our analysis using the academic Twitter API (v2). The four-letter acronyms of the Myers-Briggs Type Indicator (MBTI) give people a short categorisation of their personality that is easy to self-report on social media and easy to match with a fixed text pattern. As a result, people are much more likely to self-report their categorical MBTI type than other personality classifications. The four-letter MBTI acronyms are also unique to the Myers-Briggs questionnaire, so they can be queried directly through the Twitter API and are not easily confused with any other acronym or word, which reduces the likelihood that we incorrectly classify a user. When we initially explored Twitter, we found that some users self-reported their personality type in their biography, while others did so in their tweets. We therefore formulated two methods for querying and labelling the Myers-Briggs personality type of accounts, described below:

Firstly, we used Tweepy's 'search_users' method to obtain the set of users who currently self-report their MBTI type in their username or biography. Due to the rate limits associated with the underlying endpoint, we could obtain no more than 1000 users for each unique search query.
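The paging logic behind this first method can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the dataset: `api` is assumed to be an authenticated `tweepy.API` object, results are assumed to arrive 20 per page, and the helper name `collect_users` is hypothetical. The loop stops at the 1000-user cap or as soon as a page contributes no new accounts.

```python
def collect_users(api, query, max_users=1000, page_size=20):
    """Page through user-search results for a single query.

    `api` is assumed to be an authenticated tweepy.API instance (or any
    object exposing a compatible `search_users(q=, page=, count=)` method).
    """
    seen = {}  # user id -> user object, insertion-ordered
    page = 1
    while len(seen) < max_users:
        users = api.search_users(q=query, page=page, count=page_size)
        new_users = [u for u in users if u.id not in seen]
        if not new_users:
            # Empty page, or a page of repeats only: treat the search as exhausted.
            break
        for user in new_users:
            seen[user.id] = user
        page += 1
    return list(seen.values())[:max_users]
```

Calling `collect_users(api, "INTJ")` once per personality type would give one capped result set per query; de-duplicating by user ID guards against pages that repeat earlier results.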

Secondly, we used the Twitter API's 'full_archive_search' endpoint to obtain the set of users who self-reported their Myers-Briggs personality type in a Tweet at any time since Twitter's creation (March 26, 2006). We searched for users who tweeted any of three phrases followed by their personality type: 'I am...', 'I am a...' or 'I am an...'. We searched only original Tweets and excluded Retweets, Quotes and Replies from the query, because these carry a much higher risk of labelling an account incorrectly. We were bound by a rate limit of 300 requests per 15-minute window, but there was no hard bound on the number of tweets or users we could obtain, so we ran this query for each personality type until the search was exhausted.
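To make the second method concrete, the sketch below builds one search query per personality type. The exact query strings used for the dataset are not given here, so this is only one plausible form: it assumes the standard Twitter API v2 operators for exact phrases (double quotes), `OR`, and the negations `-is:retweet`, `-is:quote` and `-is:reply`. The names `build_query` and `MIN_SECONDS_BETWEEN_REQUESTS` are hypothetical.

```python
from itertools import product

# The 16 MBTI types: one letter from each of the four dichotomies.
MBTI_TYPES = ["".join(letters) for letters in product("EI", "SN", "TF", "JP")]

# The three self-report phrases described in the text.
PREFIXES = ("I am", "I am a", "I am an")


def build_query(mbti_type):
    """Return a v2-style search query for one personality type (assumed form)."""
    phrases = " OR ".join(f'"{prefix} {mbti_type}"' for prefix in PREFIXES)
    # Keep original Tweets only: drop Retweets, Quotes and Replies.
    return f"({phrases}) -is:retweet -is:quote -is:reply"


# 300 requests per 15-minute window works out to one request every 3 seconds.
MIN_SECONDS_BETWEEN_REQUESTS = (15 * 60) / 300

queries = {mbti_type: build_query(mbti_type) for mbti_type in MBTI_TYPES}
```

Each of the 16 queries would then be paged until no further results are returned, pausing between requests to stay within the rate limit.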

Note that in both cases, the queries were not case-sensitive.
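As an illustration of case-insensitive labelling, the following sketch extracts a personality type from a piece of text after it has been retrieved. It is an assumption about how such matching could be done locally, not a description of the original pipeline; the pattern `SELF_REPORT` and the helper `extract_type` are hypothetical names.

```python
import re
from itertools import product

# The 16 MBTI types: one letter from each of the four dichotomies.
MBTI_TYPES = ["".join(letters) for letters in product("EI", "SN", "TF", "JP")]

# Matches "I am <type>", "I am a <type>" or "I am an <type>", ignoring case.
SELF_REPORT = re.compile(
    r"\bI am (?:an? )?(" + "|".join(MBTI_TYPES) + r")\b",
    re.IGNORECASE,
)


def extract_type(text):
    """Return the self-reported MBTI type in upper case, or None if absent."""
    match = SELF_REPORT.search(text)
    return match.group(1).upper() if match else None
```

For example, `extract_type("i am an infp, apparently")` returns `"INFP"`, while text with no matching self-report returns `None`. The trailing word boundary prevents a match when the four letters are only the start of a longer word.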

In the attached dataset, we provide both the Twitter User IDs and the Myers-Briggs Personality Types associated with the 68,958 users obtained using the two methods discussed above. We provide this dataset prior to any preprocessing steps performed in our paper.