Pedagogical Roles of Natural Language Processing Documents

<div><b>Description</b></div><div>To allow a computational exploration of the learning utility ("pedagogical value") of a document for a learner, we introduce the notion of "pedagogical roles" of documents as an intermediary component. This dataset is a novel annotated corpus of the pedagogical roles of documents from an expanded ACL Anthology corpus.</div><div><br></div><div>The current version includes the following pedagogical roles:</div><div><br></div><div>- Survey: Is this document a broad survey? A broad survey examines or compares across a broad concept.</div><div>- Tutorial: Is this document a tutorial? Tutorials describe a coherent process about how to use tools or understand a concept, and teach by example.</div><div>- Resource: Does this document describe the authors' implementation of a system, corpus, or other resource that has been distributed (e.g. public data sets or tools that have been released under an open-source license or are commercially available)?</div><div>- Reference Work: Is this document a collection of authoritative facts intended for others to refer to? Reports of novel, experimental results are not authoritative facts; the statement "grass is green" is. Reference Works describe different subtopics within a concept.</div><div>- Empirical Results: Does this document describe results of the authors' experiments?</div><div>- Software Manual: Is this document a manual describing how to use different components of a piece of software?</div><div>- Other: Other role (This includes theoretical papers, papers that present a rebuttal for a claim, thought experiments, etc.)</div><div><br></div><div><b>Files</b></div><div>- annotations_raw_average.tsv: Averaged raw annotations. Each pedagogical role score is an average over all annotations of the role for the document.</div><div>- annotations_bin.tsv: Binarized version of the annotations. 
A document belongs to a pedagogical role if a majority of the annotators agree.</div><div>- pedagogical_roles.bib: Metadata of the documents in the annotated corpus. The documents with a source of "web-supplementary" are supplementary documents that were annotated internally.</div><div><br></div><div><b>Papers</b></div><div>If you use this dataset, please cite the following paper. It presents the annotation guidelines, analysis, and initial baseline classification results.</div><div><br></div><div>@InProceedings{ShengEtAl2017, </div><div> author = {Emily Sheng and Prem Natarajan and Jonathan Gordon and Gully Burns}, </div><div> year = {2017}, </div><div> title = {An Investigation into the Pedagogical Features of Documents}, </div><div> booktitle = {Proceedings of the 12th Workshop on Innovative Use of NLP for</div><div> Building Educational Applications} </div><div>}</div><div><br></div><div>Associated work that makes use of this corpus:</div><div><br></div><div>@InProceedings{GordonEtAl2017, </div><div> author = {Jonathan Gordon and Stephen Aguilar and Emily Sheng and Gully Burns}, </div><div> year = {2017}, </div><div> title = {Structured Generation of Technical Reading Lists}, </div><div> booktitle = {Proceedings of the 12th Workshop on Innovative Use of NLP for</div><div> Building Educational Applications} </div><div>}</div><div><br></div><div><b>Acknowledgements</b></div><div>This research is based upon work supported in part by the Office of</div><div>the Director of National Intelligence (ODNI), Intelligence Advanced</div><div>Research Projects Activity (IARPA), via Air Force Research Laboratory</div><div>(AFRL). The views and conclusions contained herein are those of the</div><div>authors and should not be interpreted as necessarily representing the</div><div>official policies or endorsements, either expressed or implied, of</div><div>ODNI, IARPA, AFRL, or the U.S. Government. The U.S. 
Government is</div><div>authorized to reproduce and distribute reprints for Governmental</div><div>purposes notwithstanding any copyright annotation thereon.</div>
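<div><br></div><div><b>Example</b></div><div>The relationship between the two annotation files described under Files can be illustrated with a minimal sketch. It assumes that each raw annotation is a 0/1 vote and that "majority" means strictly more than half of the annotators; neither detail is stated in this description. The function names and the sample votes are hypothetical and are not taken from the dataset.</div>

```python
def average_score(votes):
    """Average of the annotator votes for one (document, role) pair."""
    return sum(votes) / len(votes)


def majority_label(votes):
    """Return 1 if strictly more than half of the votes are 1, else 0.

    Assumption: votes are 0/1 and ties do not count as a majority.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical votes from three annotators for a single document.
votes_by_role = {
    "Survey": [1, 1, 0],
    "Tutorial": [0, 0, 1],
    "Empirical Results": [1, 1, 1],
}

# Analogous to one row of the averaged file and one row of the binarized file.
averaged = {role: average_score(v) for role, v in votes_by_role.items()}
binarized = {role: majority_label(v) for role, v in votes_by_role.items()}
```

<div>Under these assumptions, a document can carry several roles at once, since each role is binarized independently of the others.</div>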