ci700351y_si_003.pdf (79.71 kB)
Gradual in Silico Filtering for Druglike Substances
journal contribution
posted on 2008-03-24, 00:00 authored by Nadine Schneider, Christine Jäckels, Claudia Andres, Michael C. Hutter
The suitability of decision trees in comparison to support vector machines for the classification of chemical
compounds into drugs and nondrugs was investigated. To account for the requirements of screening
virtual compound libraries, schemes for successive filtering steps with gradually increasing computational
cost are outlined. The obtained prediction accuracy was similar between decision trees and support vector
machine approaches for the applied compound data sets. By using rapidly computable variables such as
druglikeness indices, XlogP, and the molar refractivity, at least 39% of the nondrugs can be filtered out,
while retaining more than 83% of the actual drugs. Computationally more demanding descriptors such as
specific substructure queries and quantum chemically derived variables can be postponed to subsequent
classification schemes for the reduced set of compounds, whereby up to 92% of the nondrugs can be sorted
out without losing considerably more drugs. Using all available computed descriptors simultaneously in
the first step did not yield significantly better results. Furthermore, the generated decision trees are used to
derive guidelines for the design of druglike substances. The numerical margins found at the branching points
suggest several criteria that separate drugs from nondrugs: a molecular weight higher than 230, a molar
refractivity higher than 40, and the presence of one or more rings as well as one or more functional groups.
Also reported are the additional parameters required to compute XlogP, SlogP, and the molar
refractivity of boron- and silicon-containing compounds.
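
The successive filtering idea and the reported branching-point margins can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Compound` record, the function names, and the descriptor values are hypothetical, descriptors are assumed to be precomputed elsewhere, and combining the four criteria with a logical AND is an assumption made only for illustration. The second stage is a placeholder for a costlier classifier (for example, one using substructure queries or quantum chemically derived variables).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record of precomputed descriptors for one compound."""
    name: str
    mol_weight: float
    molar_refractivity: float
    ring_count: int
    functional_group_count: int


def passes_guidelines(c: Compound) -> bool:
    """Cheap first stage using the margins quoted in the abstract.

    Assumption: all four criteria must hold at once.
    """
    return (
        c.mol_weight > 230
        and c.molar_refractivity > 40
        and c.ring_count >= 1
        and c.functional_group_count >= 1
    )


def gradual_filter(
    compounds: Iterable[Compound],
    stages: List[Callable[[Compound], bool]],
) -> List[Compound]:
    """Apply stages in order so later (costlier) stages see fewer compounds."""
    survivors = list(compounds)
    for stage in stages:
        survivors = [c for c in survivors if stage(c)]
    return survivors


# Placeholder for an expensive second stage; it records which compounds
# it was asked about so the cost saving is visible.
expensive_calls: List[str] = []


def expensive_check(c: Compound) -> bool:
    expensive_calls.append(c.name)
    return True  # stand-in: a real classifier would decide here


# Illustrative values only, not measured data.
library = [
    Compound("cmpd_A", 310.0, 85.0, 2, 3),
    Compound("cmpd_B", 120.0, 30.0, 0, 1),
    Compound("cmpd_C", 250.0, 38.0, 1, 2),
]

kept = gradual_filter(library, [passes_guidelines, expensive_check])
print([c.name for c in kept])
print(expensive_calls)
```

Ordering the stages from cheapest to most expensive means the placeholder second stage is evaluated only for compounds that survive the first one, which is the point of postponing demanding descriptors to the reduced set.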