Entity Normalization

dataset
posted on 2019-06-04, 17:48, authored by Leigh Weston
These JSON documents contain mappings for materials science entity normalization. Each entity is mapped onto its most frequently occurring synonym that is not an acronym.

We provide entity normalization for materials science properties (pro), applications (apl), sample descriptors (dsc), symmetry/phase labels (spl), synthesis methods (smt), and characterization methods (cmt).

Each term has a "most common" entity to which it can be mapped. Sub-entities, which have also been normalized, are included as well.

*Please note: entities that occur infrequently in our corpus are unlikely to be normalized (and, when they are, less likely to be normalized correctly). In line with Zipf's law, infrequently occurring entities make up the largest portion of unique entities in the corpus, so a large fraction of the entities in these JSON files are not normalized. However, frequently occurring terms like "XRD" are very likely to be normalized, and normalized correctly.
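As a rough illustration of how such a mapping might be used, the Python sketch below loads one JSON file and looks up a term, falling back to the original term when no normalization is available. The file layout is an assumption: the sketch treats each file as a flat object of the form {term: normalized term}, which may not match the actual structure (in particular, it ignores sub-entities). The function names, the file path argument, and the example entries are hypothetical and are not taken from the dataset.

```python
import json


def load_mapping(path):
    """Load a normalization mapping from a JSON file.

    Assumption: the file is a flat JSON object {term: normalized_term}.
    The real files may be nested differently.
    """
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def normalize(term, mapping):
    """Return the normalized form of `term`, or `term` itself if unmapped.

    Infrequent entities are often not normalized, so the fallback
    keeps the original string instead of raising an error.
    """
    return mapping.get(term, term)


# Hypothetical entries for illustration only (not copied from the dataset).
example_mapping = {
    "XRD": "x-ray diffraction",
    "x-ray diffraction": "x-ray diffraction",
}

normalized_known = normalize("XRD", example_mapping)
normalized_unknown = normalize("some rare technique", example_mapping)
```

Returning the input unchanged for unmapped terms is one possible design choice; a caller who needs to distinguish normalized from unnormalized entities could check `term in mapping` first.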

Funding

This work was supported by Toyota Research Institute through the Accelerated Materials Design and Discovery program.
