1471-2105-6-88-3.jpg (45.67 kB)

The dictionary was constructed by considering the classification results of a particular term in different articles

figure
posted on 2011-12-30, 23:43 authored by Lei Shi, Fabien Campagne

Copyright information:

Taken from "Building a protein name dictionary from full text: a machine learning term extraction approach"

BMC Bioinformatics 2005;6:88.

Published online 7 Apr 2005

PMCID:PMC1090555.

Copyright © 2005 Shi and Campagne; licensee BioMed Central Ltd.

Step 1: we filtered out terms that were predicted to be protein names in less than 75% of the articles where a prediction was made. For example, if term A appears in 4 articles and is classified as a protein name in 3 of them (exactly 75%), term A is accepted into the dictionary. This step collected 61,312 terms.
Step 2: we removed terms of two characters or fewer.
Step 3: to remove ambiguity with protein names that are also common nouns, we filtered the dictionary against Webster's Revised Unabridged Dictionary (G & C. Merriam Co., 1913, edited by Noah Porter, provided by Patrick Cassidy of MICRA, Inc, and retrieved from ). We estimate that this edition contains about 80 common protein names (e.g., amylase).
Step 4: we filtered the dictionary against species names from the NCBI taxonomy database [30].
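
The four filtering steps can be sketched in a few lines of Python. This is only an illustration of the logic described in the legend, not the authors' implementation: the function name `build_dictionary`, the input layout (a mapping from each term to its per-article classification results), and the case-insensitive matching against the word lists are all assumptions made for the example.

```python
def build_dictionary(predictions, common_words, species_names,
                     min_fraction=0.75, min_length=3):
    """Filter candidate terms into a protein name dictionary.

    predictions   -- hypothetical layout: term -> list of booleans, one per
                     article where a prediction was made (True = protein name)
    common_words  -- lowercase set of common English words (assumed input)
    species_names -- lowercase set of species names (assumed input)
    """
    accepted = set()
    for term, votes in predictions.items():
        if not votes:
            continue  # no prediction was made for this term
        # Step 1: keep terms classified as a protein name in at least 75%
        # of the articles where a prediction was made.
        if sum(votes) / len(votes) < min_fraction:
            continue
        # Step 2: drop terms of two characters or fewer.
        if len(term) < min_length:
            continue
        # Step 3: drop terms that are also common English words.
        # (Lowercasing before lookup is an assumption of this sketch.)
        if term.lower() in common_words:
            continue
        # Step 4: drop terms that are species names.
        if term.lower() in species_names:
            continue
        accepted.add(term)
    return accepted


# Toy data: "term A" is a protein name in 3 of 4 articles, so it passes step 1.
example = {
    "term A": [True, True, True, False],
    "p53": [True, True],
    "amylase": [True],
    "ab": [True],
    "human": [True, True],
    "foo": [True, False],
}
result = build_dictionary(example, common_words={"amylase"},
                          species_names={"human"})
```

With this toy input, only "term A" and "p53" survive. The steps are applied per term here rather than as four passes over the whole dictionary; the resulting set is the same, but intermediate counts such as the 61,312 terms reported after step 1 would need a separate pass to reproduce.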
