Wikipedia Talk Corpus

We provide a corpus of discussion comments from English Wikipedia talk pages. Comments are grouped into files by year. Each comment is extracted by computing diffs over the full revision history and taking the content added in each revision. See our <a href="https://meta.wikimedia.org/wiki/Research:Detox/Data_Release">wiki</a> for documentation of the schema and our <a href="https://arxiv.org/abs/1610.08914">research paper</a> for details of the data collection and processing methodology.
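
To illustrate the idea of extracting the content added in each revision, here is a minimal sketch in Python. It is not the pipeline used to build this corpus: the function names `added_lines` and `extract_comments` are hypothetical, the revisions are assumed to be plain strings in chronological order, and the line-level diff from the standard-library `difflib` module is an assumption made for simplicity. The research paper describes the actual processing steps.

```python
import difflib


def added_lines(old_text, new_text):
    """Return the lines of new_text that a line-level diff marks as inserted or replaced."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce new lines on the new side.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added


def extract_comments(revisions):
    """Yield (revision_index, added_text) for every revision that adds content.

    `revisions` is assumed to be an iterable of page texts, oldest first.
    """
    previous = ""
    for index, text in enumerate(revisions):
        added = added_lines(previous, text)
        if added:
            yield index, "\n".join(added)
        previous = text


if __name__ == "__main__":
    # Toy revision history of a talk page (illustrative data only).
    history = [
        "Hello",
        "Hello\n:Reply one",
        "Hello\n:Reply one\n::Reply two",
    ]
    for index, comment in extract_comments(history):
        print(index, repr(comment))
```

In this sketch a revision that only deletes text yields no comment, and an edited line counts as added content. Whether that matches the corpus exactly depends on the methodology documented in the paper.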