This data set includes over 100k labeled discussion comments from English
Wikipedia. Each comment was labeled by multiple annotators via CrowdFlower
on whether it contains a personal attack. We also include some demographic
data for each crowd-worker. See our wiki for documentation of the schema of
each file and our research paper for documentation on the data collection
and modeling methodology. For a quick demo of how to use the data for model
building and analysis, check out this IPython notebook.
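
Because each comment carries labels from several annotators, a common first
step before model building is to collapse those labels into one label per
comment. The sketch below shows one way to do that with a simple majority
vote. The row layout, the field names (comment_id, worker_id, is_attack) and
the sample values are hypothetical and are not taken from the data set's
schema, and the majority rule is an assumption rather than the method used
in the research paper; see the wiki for the actual file formats.

```python
from collections import defaultdict

# Hypothetical annotation rows: (comment_id, worker_id, is_attack).
# Names and values are illustrative only, not the data set's real schema.
annotations = [
    (1, "w1", 0), (1, "w2", 0), (1, "w3", 1),
    (2, "w1", 1), (2, "w2", 1), (2, "w4", 1),
    (3, "w2", 0), (3, "w3", 1),
]


def majority_labels(rows):
    """Return {comment_id: bool}, True when more than half of the
    annotators for that comment marked it as a personal attack."""
    votes = defaultdict(list)
    for comment_id, _worker_id, is_attack in rows:
        votes[comment_id].append(is_attack)
    # Strict majority: an even split counts as "not an attack".
    return {cid: sum(v) / len(v) > 0.5 for cid, v in votes.items()}


labels = majority_labels(annotations)
for comment_id, is_attack in sorted(labels.items()):
    print(comment_id, is_attack)
```

Treating an even split as "not an attack" is a design choice in this sketch;
keeping the raw fraction of annotators who flagged the comment is an
alternative when a soft label is more useful than a binary one.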