Wikipedia Talk Corpus

Version 3 2017-01-17, 21:50

Version 2 2016-12-13, 20:27

Version 1 2016-12-03, 17:58

dataset

posted on 2017-01-17, 21:50, authored by Ellery Wulczyn, Nithum Thain, Lucas Dixon

We provide a corpus of discussion comments from English Wikipedia talk pages. Comments are grouped into separate files by year. Comments are generated by computing diffs over the full revision history and extracting the content added in each revision. See our wiki for documentation of the schema and our research paper for details of the data collection and processing methodology.
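
To illustrate the idea of extracting the content added in each revision, here is a minimal Python sketch. It is not the pipeline used to build this corpus (see the research paper for that); it assumes each revision is available as the full plain text of a talk page, ordered oldest to newest, and it uses a simple line-level diff from the standard library. The function names `added_content` and `comments_from_history` are hypothetical.

```python
import difflib


def added_content(prev_text: str, curr_text: str) -> list[str]:
    """Return the lines of curr_text that a line-level diff marks as new."""
    prev_lines = prev_text.splitlines()
    curr_lines = curr_text.splitlines()
    matcher = difflib.SequenceMatcher(a=prev_lines, b=curr_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" covers brand-new lines; "replace" covers edited lines.
        if tag in ("insert", "replace"):
            added.extend(curr_lines[j1:j2])
    return added


def comments_from_history(revisions: list[str]) -> list[str]:
    """Treat the text added by each revision as one candidate comment.

    Assumption: `revisions` holds full page texts, oldest first.
    """
    comments = []
    prev = ""
    for text in revisions:
        added = added_content(prev, text)
        if added:
            comments.append("\n".join(added))
        prev = text
    return comments


if __name__ == "__main__":
    history = [
        "Hello, is this right?",
        "Hello, is this right?\n:Yes, I think so.",
        "Hello, is this right?\n:Yes, I think so.\n::Thanks!",
    ]
    for comment in comments_from_history(history):
        print(repr(comment))
```

A real pipeline would also need to handle wiki markup, reverts, and edits that modify existing text rather than add a new comment; this sketch ignores all of those.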