Wikipedia Talk Corpus
datasetposted on 17.01.2017 by Ellery Wulczyn, Nithum Thain, Lucas Dixon
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
We provide a corpus of discussion comments from English Wikipedia talk pages. Comments are grouped into different files by year. Comments are generated by computing diffs over the full revision history and extracting the content added for each revision. See our wiki for documentation of the schema and our research paper for documentation on the data collection and processing methodology.