30 files

Wikipedia Talk Corpus

dataset
posted on 17.01.2017 by Ellery Wulczyn, Nithum Thain, Lucas Dixon
We provide a corpus of discussion comments from English Wikipedia talk pages. Comments are grouped into separate files by year. Comments are generated by computing a diff for each revision in the full revision history and extracting the content added in that revision. See our wiki for documentation of the schema, and our research paper for the data collection and processing methodology.
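As a rough illustration of the "diff each revision, keep what was added, group by year" idea, here is a minimal sketch using Python's standard `difflib`. It is not the authors' pipeline; their actual methodology is described in the research paper. It assumes the full text of every revision of one talk page has already been fetched, and the names `Revision`, `added_content` and `comments_by_year` are hypothetical. It also uses a simple line-level diff, which is an assumption: the real processing may diff at a different granularity and apply further cleaning.

```python
import difflib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Revision:
    # Hypothetical minimal revision record: timestamp plus full page text.
    timestamp: datetime
    text: str


def added_content(old_text: str, new_text: str) -> str:
    """Return the lines of new_text that a line-level diff marks as added."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' opcodes point at new lines on the right side.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return "\n".join(added)


def comments_by_year(revisions: list[Revision]) -> dict[int, list[str]]:
    """Diff each revision against its predecessor and bucket additions by year."""
    by_year: dict[int, list[str]] = defaultdict(list)
    previous = ""
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        content = added_content(previous, rev.text)
        if content.strip():
            by_year[rev.timestamp.year].append(content)
        previous = rev.text
    return dict(by_year)


# Usage with a made-up two-revision history (illustrative data only).
history = [
    Revision(datetime(2015, 3, 1), "== Topic ==\nFirst comment."),
    Revision(datetime(2016, 7, 9), "== Topic ==\nFirst comment.\n:Reply to first comment."),
]
print(comments_by_year(history))
```

Each revision is compared only with its immediate predecessor, so text that merely survives from earlier revisions is not counted again. That is what makes per-revision "added content" a reasonable proxy for a new comment.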
