ci800406y_si_001.xls (633.5 kB)
Aqueous Solubility Prediction Based on Weighted Atom Type Counts and Solvent Accessible Surface Areas
dataset
posted on 2009-03-23, 00:00 authored by Junmei Wang, Tingjun Hou, Xiaojie Xu
In this work, four reliable aqueous solubility models, ASM-ATC
(aqueous solubility model based on atom type counts), ASM-ATC-LOGP
(aqueous solubility model based on atom type counts and ClogP as an additional descriptor), ASM-SAS (aqueous solubility model
based on solvent accessible surface areas), and ASM-SAS-LOGP (aqueous
solubility model based on solvent accessible surface areas and ClogP as an additional descriptor), have been developed
for a diverse data set of 3664 compounds. All four models were extensively
validated by various cross-validation tests, and encouraging predictability
was achieved. ASM-ATC-LOGP, the best model, achieves a leave-one-out
squared correlation coefficient (q²) and
root-mean-square error (RMSE) of 0.832 and 0.840
log units, respectively. In an 85/15 cross-validation
test repeated 10,000 times, this model achieves a mean q² and RMSE of 0.832 and 0.841 log units,
respectively. We believe that these robust models can serve as an
important rule in drug-likeness analysis and an efficient filter in
prioritizing compound libraries prior to high-throughput screening
(HTS).
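
The two validation protocols named in the abstract (leave-one-out and repeated 85/15 splits) can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptors and solubility values are synthetic, ordinary least squares stands in for the published models, and q² is assumed to follow the common definition 1 - PRESS/SS_total, which the abstract does not spell out. The function names and the default of 100 repeats (rather than 10,000) are illustrative choices.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (a stand-in model, an assumption)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def q2_and_rmse(y_true, y_pred):
    """Assumed definition: q2 = 1 - PRESS / SS_total; RMSE is in the units of y."""
    press = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - press / ss_total, np.sqrt(press / len(y_true))

def leave_one_out(X, y):
    """Predict each compound from a model fitted on all the others."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        preds[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    return q2_and_rmse(y, preds)

def repeated_split(X, y, n_repeats=100, test_fraction=0.15, seed=0):
    """Average q2 and RMSE over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(y))))
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))
        test, train = order[:n_test], order[n_test:]
        preds = fit_predict(X[train], y[train], X[test])
        scores.append(q2_and_rmse(y[test], preds))
    return np.mean(scores, axis=0)

# Synthetic stand-in data: 200 "compounds" with 5 hypothetical atom type counts.
rng = np.random.default_rng(42)
X = rng.integers(0, 6, size=(200, 5)).astype(float)
true_weights = np.array([0.5, -1.0, 0.3, -0.2, 0.8])  # made-up weights
y = X @ true_weights + rng.normal(scale=0.5, size=200)

loo_q2, loo_rmse = leave_one_out(X, y)
mean_q2, mean_rmse = repeated_split(X, y)
print(f"LOO: q2={loo_q2:.3f}, RMSE={loo_rmse:.3f}")
print(f"85/15 x100: mean q2={mean_q2:.3f}, mean RMSE={mean_rmse:.3f}")
```

Close agreement between the leave-one-out and repeated-split figures, as reported for ASM-ATC-LOGP, suggests that the fit does not depend strongly on which compounds are held out.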