Similarity Boosted Quantitative Structure–Activity RelationshipA Systematic Study of Enhancing Structural Descriptors by Molecular Similarity

The concept of molecular similarity is one of the most central in the fields of predictive toxicology and quantitative structure–activity relationship (QSAR) research. Many toxicological responses result from a multimechanistic process and, consequently, structural diversity among the active compounds is likely. Combining this knowledge, we introduce similarity boosted QSAR modeling, where we calculate molecular descriptors using similarities with respect to representative reference compounds to aid a statistical learning algorithm in distinguishing between different structural classes. We present three approaches for the selection of reference compounds, one by literature search and two by clustering. Our experimental evaluation on seven publicly available data sets shows that the similarity descriptors used on their own perform quite well compared to structural descriptors. We show that the combination of similarity and structural descriptors enhances the performance and that a simple stacking approach is able to use the complementary information encoded by the different descriptor sets to further improve predictive results. All software necessary for our experiments is available within the cheminformatics software framework AZOrange.