Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining
journal contribution
posted on 2013-03-17, 19:49 authored by Antony Williams, Kristina Hettne, Erik M van Mulligen, Jos Kleinjans, Valery Tkachenko, Jan Kors
Previously, we developed Chemlist, a combined dictionary built from a number of publicly available databases for identifying small molecules and drugs in text, and tested it on an annotated corpus. To achieve acceptable recall and precision, we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships.