Posted on 2019-08-27, 15:07. Authored by Zi-Yi Yang, Zhi-Jiang Yang, Jie Dong, Liang-Liang Wang, Liu-Xia Zhang, Jun-Jie Ding, Xiao-Qin Ding, Ai-Ping Lu, Ting-Jun Hou, Dong-Sheng Cao
Colloidal aggregation poses a major challenge in drug discovery.
Current computational approaches that filter out aggregating molecules
based on their similarity to known aggregators, such as Aggregator
Advisor, have low prediction accuracy, so reliable in silico models
for detecting aggregators are highly desirable.
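To make the similarity-based idea concrete, here is a minimal sketch that flags a query molecule when its Tanimoto similarity to any known aggregator reaches a cutoff. It is an illustration under stated assumptions, not Aggregator Advisor's implementation. Molecules are represented as hypothetical sets of "on" fingerprint bit indices rather than real chemical fingerprints, and the 0.8 cutoff is an arbitrary illustrative value.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def flag_by_similarity(query: Set[int],
                       known_aggregators: Iterable[Set[int]],
                       cutoff: float = 0.8) -> bool:
    """Return True if the query is at least `cutoff` similar to any known aggregator.

    The default cutoff is a hypothetical value chosen only for illustration.
    """
    return any(tanimoto(query, ref) >= cutoff for ref in known_aggregators)


# Hypothetical fingerprints: each set lists the indices of the bits that are set.
known = [{1, 4, 7, 9, 12}, {2, 3, 5, 8}]
close_analog = {1, 4, 7, 9, 12, 15}  # shares 5 of 6 bits with the first reference
unrelated = {20, 21, 22}             # shares no bits with either reference

print(flag_by_similarity(close_analog, known))  # True (similarity 5/6)
print(flag_by_similarity(unrelated, known))     # False
```

A filter of this kind can only recognize close analogs of aggregators it already knows, which is one plausible reason a learned classifier could generalize better.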
In this study, we built a data set of 12 119 aggregators and 24 172
drugs or drug candidates and then developed a set of classification
models by combining two ensemble learning approaches with five types of
molecular representations. The best model achieved an accuracy of 0.950
and an area under the curve (AUC) of 0.987 on the training set, and an
accuracy of 0.937 and an AUC of 0.976 on the test set. It also made
reliable predictions on an external validation set of 5681 aggregators,
with 80% of these molecules predicted to be aggregators with a prediction
probability higher than 0.9. More importantly, we explored the relationship
between colloidal aggregation and molecular features and derived a set of
simple rules for detecting aggregators. Molecular features, such
as log D, the number of hydroxyl groups, the number of aromatic carbons
attached to a hydrogen atom, and the number of sulfur atoms in aromatic
heterocycles, can help distinguish aggregators from nonaggregators.
A comparison with numerous existing
druglikeness and aggregation filtering rules and models used in virtual
screening verified the high reliability of the model and rules proposed
in this study. We also used the model to screen several curated chemical
databases; almost 20% of the molecules in these databases were predicted
to be aggregators, highlighting the potentially high risk of aggregation
in screening. Finally, we developed ChemAGG
(http://admet.scbdd.com/ChemAGG/index), a freely available online Web
server for detecting aggregators.
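The accuracy, AUC, and high-confidence fraction reported above can be computed from a classifier's predicted probabilities roughly as follows. This is a minimal standard-library sketch, not the evaluation code used in the study. It assumes that AUC means the area under the ROC curve (computed here with the rank-based Mann-Whitney formulation) and that accuracy uses a 0.5 decision threshold. All labels and probabilities are made up for illustration and are not data from this work.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of molecules whose thresholded prediction matches the label (1 = aggregator)."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U divided by n_pos * n_neg).
    The O(n_pos * n_neg) double loop is fine for a sketch but slow on large sets.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def high_confidence_fraction(probs: Sequence[float], cutoff: float = 0.9) -> float:
    """Fraction of predictions whose aggregator probability is strictly above `cutoff`."""
    return sum(p > cutoff for p in probs) / len(probs)


# Made-up labeled example: 1 = aggregator, 0 = nonaggregator.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.95, 0.80, 0.40, 0.30, 0.60, 0.10]
print(round(accuracy(labels, probs), 3))  # 0.667
print(round(roc_auc(labels, probs), 3))   # 0.889

# Made-up probabilities for a validation set containing only aggregators.
external_probs = [0.97, 0.93, 0.88, 0.99]
print(high_confidence_fraction(external_probs))  # 0.75
```

For real evaluations, a library routine such as scikit-learn's `roc_auc_score` computes the same ROC AUC far more efficiently.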