comments_article_2001.tar.gz (1.73 MB)
comments_article_2002.tar.gz (10.35 MB)
comments_article_2003.tar.gz (22.73 MB)
comments_user_2001.tar.gz (60.89 kB)
comments_user_2002.tar.gz (1.81 MB)
comments_user_2003.tar.gz (10.47 MB)
comments_user_2004.tar.gz (44.12 MB)
comments_article_2004.tar.gz (84.24 MB)
comments_article_2005.tar.gz (267.08 MB)
comments_article_2006.tar.gz (677.74 MB)
comments_article_2007.tar.gz (754.4 MB)
comments_article_2008.tar.gz (669.86 MB)
comments_article_2009.tar.gz (573.56 MB)
comments_article_2010.tar.gz (502.83 MB)
comments_user_2005.tar.gz (165.23 MB)
comments_user_2006.tar.gz (734.34 MB)
comments_user_2007.tar.gz (1.24 GB)
comments_user_2008.tar.gz (1.39 GB)
comments_user_2009.tar.gz (1.25 GB)
comments_user_2010.tar.gz (1.17 GB)
Wikipedia Talk Corpus
Version 3 2017-01-17, 21:50
Version 2 2016-12-13, 20:27
Version 1 2016-12-03, 17:58
dataset
Posted on 2017-01-17, 21:50. Authored by Ellery Wulczyn, Nithum Thain, and Lucas Dixon.
We provide a corpus of discussion comments from English Wikipedia talk pages. Comments are grouped into different files by year. Comments are generated by computing diffs over the full revision history and extracting the content added for each revision. See our wiki for documentation of the schema and our research paper for documentation on the data collection and processing methodology.
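To make the diff-based extraction concrete, here is a minimal sketch of the general idea: walk a page's revisions in order and keep only the lines that each revision adds. It is not the authors' pipeline (the research paper documents that). The function names `added_content` and `comments_from_history` are hypothetical, the line-level diff via Python's `difflib` is an assumption, and the revision texts are made up for illustration.

```python
import difflib


def added_content(old_text: str, new_text: str) -> list[str]:
    """Return the lines that appear in new_text but not in old_text.

    Assumption: a line-level diff is a good enough stand-in for the
    diffing method used to build the corpus.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" is brand-new text; "replace" is rewritten text.
        # In both cases the new side of the diff is kept.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added


def comments_from_history(revisions: list[str]) -> list[list[str]]:
    """Diff each revision against its predecessor, oldest first."""
    previous = ""
    extracted = []
    for text in revisions:
        extracted.append(added_content(previous, text))
        previous = text
    return extracted


# Hypothetical three-revision history of one talk page.
history = [
    "Hello",
    "Hello\nI disagree.",
    "Hello\nI disagree.\nWhy?",
]
per_revision = comments_from_history(history)
```

Each entry of `per_revision` holds the text contributed by one revision, which is the unit the description calls a comment. Edits that only delete text yield an empty entry.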