All Climate-Related Posts on Reddit from 2005-2021 and Derived Data
To analyze public discourse on climate change and global warming in Reddit posts from 2005 to 2021, a multi-stage filtering process was used to isolate climate-related discussions. Starting from more than 11.5 billion posts, a series of regular expressions was applied to identify posts that explicitly mention key terms and phrases associated with climate change. These included "climate change," "global warming," "carbon emissions," and references to major international agreements such as the Paris Accord and the Kyoto Protocol. The expressions were designed to capture a broad range of relevant discussions while excluding posts that used "climate" in non-environmental senses, such as "political climate" or "economic climate." This exclusion step kept the analysis focused on discussions of global environmental change.
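A filter of this kind can be sketched as one case-insensitive Python regular expression that combines the key terms named above and uses negative lookbehinds to skip "political climate" and "economic climate". This is a minimal illustration only: the pattern, the exact term list, and the function names `is_climate_related` and `filter_climate_posts` are hypothetical, and the study's actual expressions are not reproduced here.

```python
import re
from typing import Iterable, List

# Hypothetical pattern built from the terms named in the text; the study's
# actual regular expressions are not published in this section.
CLIMATE_PATTERN = re.compile(
    # "climate" on its own, unless directly preceded by "political " or
    # "economic " (Python lookbehinds must be fixed-width, hence one each)
    r"(?<!political\s)(?<!economic\s)\bclimate\b"
    r"|\bglobal[\s-]+warming\b"
    r"|\bcarbon[\s-]+emissions?\b"
    r"|\bparis\s+(?:climate\s+)?(?:accord|agreement)\b"
    r"|\bkyoto\s+protocol\b",
    re.IGNORECASE,
)


def is_climate_related(text: str) -> bool:
    """True if the text contains at least one climate-related mention."""
    return CLIMATE_PATTERN.search(text) is not None


def filter_climate_posts(posts: Iterable[str]) -> List[str]:
    """Keep only the posts that match the climate pattern."""
    return [post for post in posts if is_climate_related(post)]


sample = [
    "Climate change is accelerating",
    "The political climate is tense",
    "Global warming and the Kyoto Protocol",
]
print(filter_climate_posts(sample))
# -> ['Climate change is accelerating', 'Global warming and the Kyoto Protocol']
```

Because the exclusion is applied per mention, a post such as "the political climate around climate policy" is still kept through its second, unqualified "climate". Each lookbehind assumes a single whitespace character between the words, so a production filter would need to handle more spelling and spacing variants.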
After filtering, approximately 15.3 million posts remained, just 0.134% of the original dataset. To restrict the data to English, language detection was performed with two independent libraries, Polyglot and LangDetect. Retaining only the posts that both libraries identified as English yielded a final dataset of approximately 1.5 million posts.
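The dual-library check can be sketched as a consensus filter: a post is kept only if every detector returns `"en"`. This sketch assumes that the dual verification required both libraries to agree. The adapter functions `langdetect_code` and `polyglot_code` and the filter `english_by_consensus` are hypothetical names. The adapters import their libraries lazily, so the filter itself can be exercised with stand-in detectors when LangDetect or Polyglot is not installed.

```python
from typing import Callable, Iterable, List, Optional

# A detector maps text to an ISO 639-1 code such as "en", or None if undecided.
LanguageDetector = Callable[[str], Optional[str]]


def langdetect_code(text: str) -> Optional[str]:
    """Hypothetical adapter around LangDetect (pip install langdetect)."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # LangDetect is non-deterministic without a fixed seed
    try:
        return detect(text)
    except LangDetectException:  # raised e.g. for text with no usable features
        return None


def polyglot_code(text: str) -> Optional[str]:
    """Hypothetical adapter around Polyglot (needs polyglot, pycld2, PyICU)."""
    from polyglot.detect import Detector

    try:
        return Detector(text, quiet=True).language.code
    except Exception:  # Polyglot/pycld2 can raise on malformed input
        return None


def english_by_consensus(
    posts: Iterable[str], detectors: List[LanguageDetector]
) -> List[str]:
    """Keep a post only if every detector labels it "en".

    all() short-circuits, so later detectors run only on posts that the
    earlier ones already accepted.
    """
    return [post for post in posts if all(d(post) == "en" for d in detectors)]


# Typical wiring (requires both libraries to be installed):
# english_posts = english_by_consensus(climate_posts, [langdetect_code, polyglot_code])
```

Requiring agreement trades recall for precision. Short or mixed-language posts on which the libraries disagree are dropped rather than guessed.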
The curated dataset was then analyzed for sentiment, polarity and subjectivity, and readability. Restricting the analysis to this subset made it possible to examine how climate change and global warming are discussed across Reddit communities, including trends in sentiment, in language complexity, and in the terminology used in these discussions over time.
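Two of these derived measures can be illustrated with the Python standard library alone. The sketch below assumes the Flesch Reading Ease formula as the readability metric, paired with a crude vowel-group syllable counter; the text does not say which readability formula or tooling the study used. It also counts, per year, how many posts mention "climate change" versus "global warming", one simple way to track shifting terminology. All function names and term patterns here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_syllables(word: str) -> int:
    """Crude heuristic: vowel groups, minus a silent final 'e' (not '-le')."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return math.nan  # undefined for empty text
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Hypothetical term patterns for tracking terminology over time.
TERMS = {
    "climate change": re.compile(r"\bclimate[\s-]+change\b", re.IGNORECASE),
    "global warming": re.compile(r"\bglobal[\s-]+warming\b", re.IGNORECASE),
}


def term_counts_by_year(posts: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count, per year, how many posts mention each term (at most once per post)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in posts:
        for term, pattern in TERMS.items():
            if pattern.search(text):
                counts[year][term] += 1
    return dict(counts)


print(round(flesch_reading_ease("The cat sat."), 2))  # -> 119.19
```

The vowel-group heuristic miscounts many English words, so scores from this sketch are approximate; a dictionary-based syllable counter would be more accurate.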