DOI: 10.1021/ci800406y.s001
Authors: Junmei Wang, Tingjun Hou, Xiaojie Xu
Title: Aqueous Solubility Prediction Based on Weighted Atom Type Counts and Solvent Accessible Surface Areas
Publisher: American Chemical Society
Published: 2009-03-23
Type: Dataset
Keywords: prioritizing compound libraries; Aqueous Solubility Prediction; solubility model; HTS; RMSE; Weighted Atom Type Counts; 0.841 logarithm unit; Solvent Accessible Surface Areas; 0.840 logarithm unit; atom type counts
URL: https://acs.figshare.com/articles/dataset/Aqueous_Solubility_Prediction_Based_on_Weighted_Atom_Type_Counts_and_Solvent_Accessible_Surface_Areas/2869495

In this work, four reliable aqueous solubility models have been developed for a diverse data set of 3664 compounds: ASM-ATC (aqueous solubility model based on atom type counts), ASM-ATC-LOGP (aqueous solubility model based on atom type counts, with <i>ClogP</i> as an additional descriptor), ASM-SAS (aqueous solubility model based on solvent accessible surface areas), and ASM-SAS-LOGP (aqueous solubility model based on solvent accessible surface areas, with <i>ClogP</i> as an additional descriptor). All four models were extensively validated by various cross-validation tests, and encouraging predictability was achieved. ASM-ATC-LOGP, the best model, achieves a leave-one-out squared correlation coefficient (<i>q</i><sup>2</sup>) of 0.832 and a root-mean-square error (<i>RMSE</i>) of 0.840 logarithm units. In an 85/15 cross-validation test repeated 10,000 times, this model achieves a mean <i>q</i><sup>2</sup> of 0.832 and a mean <i>RMSE</i> of 0.841 logarithm units. We believe that these robust models can serve as an important rule in druglikeness analysis and as an efficient filter for prioritizing compound libraries prior to high-throughput screening (HTS).
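
The two validation protocols named in the abstract, leave-one-out cross-validation and a repeated 85/15 split, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the model is a plain least-squares fit on count-like descriptors standing in for the ASM models, the data are synthetic rather than the 3664-compound set, <i>q</i><sup>2</sup> is taken as 1 - PRESS/SS with SS measured around the mean of the held-out values, and the function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in model, an assumption)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted intercept and weights to new descriptor rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def q2_and_rmse(y_true, y_pred):
    """Return (q2, RMSE), with q2 = 1 - PRESS / SS around the mean of y_true."""
    press = np.sum((y_true - y_pred) ** 2)
    ss = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - press / ss, float(np.sqrt(press / len(y_true)))


def leave_one_out(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_linear(X[keep], y[keep])
        preds[i] = predict_linear(coef, X[i:i + 1])[0]
    return q2_and_rmse(y, preds)


def repeated_split(X, y, n_repeats=200, test_fraction=0.15, seed=0):
    """Average q2 and RMSE over random train/test splits (85/15 by default)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    q2_values, rmse_values = [], []
    for _ in range(n_repeats):
        order = rng.permutation(n)
        test, train = order[:n_test], order[n_test:]
        coef = fit_linear(X[train], y[train])
        q2, rmse = q2_and_rmse(y[test], predict_linear(coef, X[test]))
        q2_values.append(q2)
        rmse_values.append(rmse)
    return float(np.mean(q2_values)), float(np.mean(rmse_values))


def make_synthetic_data(n=200, seed=0):
    """Synthetic count-like descriptors and a noisy linear response (not real data)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 10, size=(n, 5)).astype(float)
    weights = np.array([0.5, -0.3, 0.2, -0.7, 0.1])
    y = X @ weights + rng.normal(0.0, 0.5, size=n)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print("leave-one-out (q2, RMSE):", leave_one_out(X, y))
    print("repeated 85/15 (mean q2, mean RMSE):", repeated_split(X, y))
```

The abstract reports 10,000 repetitions of the 85/15 split; the default of 200 here only keeps the sketch quick to run, and `n_repeats` can be raised to match.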