Materials Science Named Entity Recognition: train/development/test sets

Dataset posted on 2019-06-04, 17:45, authored by Leigh Weston
Training, development, and test sets for supervised named entity recognition in materials science. The data is labelled using the IOB (inside-outside-beginning) annotation scheme. There are seven entity tags: material (MAT), sample descriptor (DSC), symmetry/phase label (SPL), property (PRO), application (APL), synthesis method (SMT), and characterization method (CMT), along with the outside tag (O).
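To illustrate how IOB labels map to entities, the following is a minimal sketch that groups a sequence of tags into labelled spans. It assumes tags are written as `B-MAT`, `I-MAT`, `O`, and so on; the exact file layout and tag spelling of this dataset are not stated in the description, and the example sentence and the function name `iob_to_spans` are hypothetical.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB tags into (label, start, end) spans, with end exclusive.

    Assumes tags look like "B-MAT", "I-MAT", or "O" (an assumption about
    the tag spelling, not confirmed by the dataset description).
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, name = "O", None
        else:
            prefix, _, name = tag.partition("-")
        # Close the open span unless this tag continues it.
        if label is not None and not (prefix == "I" and name == label):
            spans.append((label, start, i))
            start, label = None, None
        # Open a new span on B-, or on a stray I- with no open span.
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, name
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical example sentence, not taken from the dataset.
tokens = ["Anatase", "TiO2", "thin", "films", "were",
          "grown", "by", "sol-gel", "synthesis"]
tags = ["B-SPL", "B-MAT", "B-DSC", "I-DSC", "O",
        "O", "O", "B-SMT", "I-SMT"]

for label, start, end in iob_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

A stray `I-` tag with no preceding `B-` is treated here as the start of a new span, which is one common lenient convention; a stricter reader might reject such sequences instead.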

The data consists of 800 hand-labelled materials science abstracts. The data has an 80-10-10 split, giving 640 abstracts in the training set, 80 in the development set, and 80 in the test set.
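The split sizes follow directly from the percentages. The sketch below reproduces the arithmetic on 800 placeholder abstract IDs. The dataset ships with its splits already fixed, so this is only an illustration: the seeded shuffle and the function name `split_80_10_10` are assumptions, not the author's procedure.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and cut them into 80% / 10% / 10% partitions.

    The seeded shuffle is an illustrative assumption; how the published
    splits were actually drawn is not described.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


# Placeholder IDs standing in for the 800 abstracts.
train, dev, test = split_80_10_10(range(800))
print(len(train), len(dev), len(test))  # 640 80 80
```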

Funding

This work was supported by Toyota Research Institute through the Accelerated Materials Design and Discovery program.
