Distributed Representation of Chemical Fragments

Chakravarti, Suman K.

Posted on 2018-03-08 - 19:16

This article describes an unsupervised machine learning method for computing distributed vector representations of molecular fragments. These vectors encode fragment features in a continuous high-dimensional space and enable similarity computation between individual fragments, even for small fragments with only two heavy atoms. The method is based on a word embedding algorithm borrowed from the field of natural language processing, and approximately 6 million unlabeled PubChem chemicals were used for training. The resulting dense fragment vectors contrast with the traditional sparse “one-hot” fragment representation and capture rich relational structure in the fragment space. The vectors of small linear fragments were averaged to yield distributed vectors of bigger fragments and molecules, which were used for different tasks, e.g., clustering, ligand recall, and quantitative structure–activity relationship modeling. The distributed vectors were found to be better than standard binary fingerprints at clustering ring systems and at recalling kinase ligands. This work demonstrates unsupervised learning of fragment chemistry from large sets of unlabeled chemical structures and its subsequent application to supervised training on relatively small data sets of labeled chemicals.
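The averaging step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the fragment labels, the three-dimensional toy vectors, the helper names `average_vectors` and `cosine_similarity`, and the choice of cosine similarity as the comparison measure. None of these come from the article, and the trained vectors it describes would have many more dimensions.

```python
import math

# Toy fragment vectors. The labels and values are invented for illustration;
# they are NOT taken from the trained model described in the article.
FRAGMENT_VECTORS = {
    "C-C": [1.0, 0.0, 0.0],
    "C-O": [0.0, 1.0, 0.0],
    "C=O": [0.0, 1.0, 1.0],
}


def average_vectors(fragments, table):
    """Return the element-wise mean of the vectors of the given fragments."""
    vectors = [table[name] for name in fragments]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Represent two hypothetical molecules as the mean of their fragment vectors.
mol_a = average_vectors(["C-C", "C-O"], FRAGMENT_VECTORS)
mol_b = average_vectors(["C-C", "C=O"], FRAGMENT_VECTORS)

similarity = cosine_similarity(mol_a, mol_b)
print(f"similarity = {similarity:.4f}")
```

Because the averaged vectors are dense, two molecules that share no identical fragment can still score as similar when their fragments lie close together in the vector space, which a sparse one-hot representation cannot express.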

CITE THIS COLLECTION

Chakravarti, Suman K. (2018). Distributed Representation of Chemical Fragments. ACS Publications. Collection. https://doi.org/10.1021/acsomega.7b02045
