Pedagogical Roles of Natural Language Processing Documents

dataset

posted on 2017-07-14, 18:17 authored by Emily Sheng, pnataraj@isi.edu, Jonathan Gordon, burns@isi.edu

Description

To allow a computational exploration of the learning utility ("pedagogical value") of a document for a learner, we introduce the notion of "pedagogical roles" of documents as an intermediary component. This dataset is a novel annotated corpus of the pedagogical roles of documents from an expanded ACL Anthology corpus.

The current version includes the following pedagogical roles:

- Survey: Is this document a broad survey? A broad survey examines or compares across a broad concept.

- Tutorial: Is this document a tutorial? Tutorials describe a coherent process for using tools or understanding a concept, and teach by example.

- Resource: Does this document describe the authors' implementation of a system, corpus, or other resource that has been distributed (e.g. public data sets or tools that have been released under an open-source license or are commercially available)?

- Reference Work: Is this document a collection of authoritative facts intended for others to refer to? Reports of novel, experimental results are not authoritative facts; the statement "grass is green" is. Reference Works describe different subtopics within a concept.

- Empirical Results: Does this document describe results of the authors' experiments?

- Software Manual: Is this document a manual describing how to use different components of a software package?

- Other: Any other role. This includes theoretical papers, papers that present a rebuttal of a claim, thought experiments, etc.

Files

- annotations_raw_average.tsv: Averaged raw annotations. Each pedagogical role score is an average over all annotations of the role for the document.

- annotations_bin.tsv: Binarized version of the annotations. A document belongs to a pedagogical role if a majority of the annotators agree.

- pedagogical_roles.bib: Metadata for the documents in the annotated corpus. Documents with a source of "web-supplementary" are supplementary documents that were annotated internally.
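
The relationship between the averaged and binarized files can be illustrated with a minimal Python sketch. It assumes each annotation is a 0/1 vote for one (document, role) pair and that "majority" means strictly more than half of the annotators; the function names are hypothetical, and the actual vote encoding and tie handling used to build the released files are not stated here.

```python
def average_score(votes):
    """Mean of the per-annotator votes for one (document, role) pair.

    Assumption: each vote is 0 (role does not apply) or 1 (role applies).
    """
    return sum(votes) / len(votes)


def majority_label(votes):
    """True if strictly more than half of the annotators voted 1.

    Assumption: ties count as "no majority"; the dataset may resolve
    ties differently.
    """
    return 2 * sum(votes) > len(votes)


# Hypothetical votes from three annotators for a single role.
votes = [1, 1, 0]
print(average_score(votes))
print(majority_label(votes))
```

Under these assumptions, a document with two of three positive votes would get an averaged score of about 0.67 and a positive binarized label for that role.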

Papers

If you use this dataset, please cite the following paper. We present annotation guidelines, analysis, and initial baseline classification results.

@InProceedings{ShengEtAl2017,
  author = {Emily Sheng and Prem Natarajan and Jonathan Gordon and Gully Burns},
  year = {2017},
  title = {An Investigation into the Pedagogical Features of Documents},
  booktitle = {Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications}
}

Associated work that makes use of this corpus:

@InProceedings{GordonEtAl2017,
  author = {Jonathan Gordon and Stephen Aguilar and Emily Sheng and Gully Burns},
  year = {2017},
  title = {Structured Generation of Technical Reading Lists},
  booktitle = {Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications}
}

Acknowledgements

This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via Air Force Research Laboratory (AFRL). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

Funding

FA8650-15-C-9102

Keywords

pedagogical roles, pedagogical value, scientific documents, reading list generation, Natural Language Processing, Education systems not elsewhere classified

Licence

CC BY 4.0
