Automated Identification and Conversion of Chemical Names to Structure Searchable Information
Chemistry-related information is communicated via both print and electronic media, and chemical entities can appear as structure depictions or, more commonly, as systematic names (usually either IUPAC or CAS names), as trade names, or as one of a plethora of registry numbers (CAS, EINECS/EC number or others). For a chemist, the preferred form of communication is a depiction of the chemical structure with an electronic molecular connection table as its basis. Electronic representations of chemical structures are one of the informatics underpinnings of any organization operating in the domain of chemistry or biology, and they enable the creation of structure- and substructure-searchable databases of chemical structures and associated data and knowledge. An enormous wealth of information is embedded in both print and electronic documents in the form of chemical names, and finding a means to convert those alphanumeric text descriptors into a richer chemical structure representation has long been the mission of a large group of investigators. The challenges and hurdles to success are profound. We review the present state of this research and the efforts underway to recover the value of information textually trapped in publications, patents, databases and Internet pages across the multiple domains of chemistry.