Training, development, and test sets for supervised named entity recognition (NER) in materials science. The data is labelled using the IOB (inside-outside-beginning) annotation scheme. There are seven entity tags: material (MAT), sample descriptor (DSC), symmetry/phase label (SPL), property (PRO), application (APL), synthesis method (SMT), and characterization method (CMT), along with the outside tag (O).
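As a rough illustration of how IOB labels group tokens into entities, the following minimal sketch collects labelled spans from parallel token and tag lists. It assumes tags are written as `B-<TYPE>`, `I-<TYPE>`, and `O` (for example `B-MAT`); the on-disk file format of this dataset is not described here, so loading is left out. The function name `iob_to_spans` and the example sentence are hypothetical and are not taken from the data.

```python
# Entity types listed in the description above.
ENTITY_TYPES = {"MAT", "DSC", "SPL", "PRO", "APL", "SMT", "CMT"}


def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (entity_type, text) spans.

    Assumes tags look like "B-MAT", "I-MAT", or "O" (an assumption about
    the tag spelling, not a documented property of the files).
    """
    spans = []
    current_tokens, current_type = [], None

    def flush():
        # Close the span that is currently open, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # A "B" tag, or an "I" tag that does not continue the open
            # span, starts a new entity.
            flush()
            current_tokens, current_type = [token], etype
        elif prefix == "I":
            # Continuation of the open entity.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return spans


# Invented example sentence, for illustration only.
tokens = ["Thin", "films", "of", "TiO2", "were", "grown", "by",
          "pulsed", "laser", "deposition"]
tags = ["B-DSC", "I-DSC", "O", "B-MAT", "O", "O", "O",
        "B-SMT", "I-SMT", "I-SMT"]
spans = iob_to_spans(tokens, tags)
print(spans)
```

Treating a stray `I-` tag as the start of a new span is one lenient choice among several; a stricter reader could raise an error instead.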
The data consists of 800 hand-labelled materials science abstracts, split 80-10-10: 640 abstracts in the training set, 80 in the development set, and 80 in the test set.
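The split sizes follow directly from the percentages. The sketch below reproduces the arithmetic and shows one possible way to partition 800 abstract indices. The shuffling, the seed, and the name `split_indices` are assumptions made for illustration; how abstracts were actually assigned to each set is not stated here.

```python
import random


def split_indices(n_total, train_pct=80, dev_pct=10, seed=0):
    """Partition range(n_total) into train/dev/test index lists.

    Integer arithmetic is used so the sizes are exact. The shuffle and
    seed are illustrative assumptions, not the dataset's actual method.
    """
    n_train = n_total * train_pct // 100
    n_dev = n_total * dev_pct // 100
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # whatever remains
    return train, dev, test


train, dev, test = split_indices(800)
print(len(train), len(dev), len(test))
```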
Funding
This work was supported by Toyota Research Institute through the Accelerated Materials Design and Discovery program.