Angiotensin-converting enzyme inhibitors (ACEIs) play a crucial role
in treating conditions such as hypertension, heart failure, and
kidney disease. However, the ACEIs currently available on the market
are associated with a variety of adverse effects, including renal
insufficiency, that restrict their use. There is thus an urgent need
to optimize the currently available ACEIs. This study presents a
structure–activity relationship (SAR) investigation of ACEIs that
employs machine learning to analyze data sets sourced from the ChEMBL
database.
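The IC50 cut-offs used to sort ChEMBL compounds into active, intermediate, and inactive groups are not given here. The sketch below assumes a convention common in ChEMBL-based tutorials: IC50 ≤ 1,000 nM is active, IC50 ≥ 10,000 nM is inactive, and values in between are intermediate. The thresholds and the helper names `pic50` and `bioactivity_class` are illustrative assumptions, not the study's documented procedure.

```python
import math

# ASSUMED cut-offs in nM (a common ChEMBL-tutorial convention, not confirmed by this study).
ACTIVE_MAX_NM = 1_000.0
INACTIVE_MIN_NM = 10_000.0


def pic50(ic50_nm: float) -> float:
    """Convert IC50 in nanomolar to pIC50 = -log10(IC50 in molar) = 9 - log10(IC50 in nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def bioactivity_class(ic50_nm: float) -> str:
    """Assign a three-level bioactivity label using the assumed cut-offs above."""
    if ic50_nm <= ACTIVE_MAX_NM:
        return "active"
    if ic50_nm >= INACTIVE_MIN_NM:
        return "inactive"
    return "intermediate"
```

For example, an IC50 of 1,000 nM maps to a pIC50 of 6.0 and, under these assumed cut-offs, to the active group. Working in pIC50 puts potencies on a log scale, which is the usual basis for later comparisons between compounds.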
Exploratory data analysis was performed to visualize the physicochemical
properties of the compounds and to examine their distributions, patterns,
and statistically significant differences among the bioactivity groups.
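The statistical test used to compare bioactivity groups is not named here. One frequent choice for comparing a descriptor distribution (for example, molecular weight or LogP) between two groups is the Mann–Whitney U test. The stdlib-only sketch below is an assumption and not necessarily the study's method. It ranks the pooled values, averaging the ranks of tied values, computes U for the first sample, and derives a two-sided p-value from the normal approximation without tie or continuity correction.

```python
import math
from statistics import NormalDist


def _average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie/continuity correction).

    Returns (U statistic for sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

Calling `mann_whitney_u(active_mw, inactive_mw)` on two lists of descriptor values would test whether their distributions differ. For real analyses, `scipy.stats.mannwhitneyu` also offers exact and tie-corrected variants.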
Scaffold analysis further identified 9 representative Murcko scaffolds
with frequencies of at least 10. Scaffold diversity analysis revealed
that active ACEIs had greater scaffold diversity than their intermediate
and inactive counterparts, indicating the value of performing lead
optimization on the scaffolds of active ACEIs. Scaffolds 1, 3, 6, and 8
are unfavorable in comparison with scaffolds 2, 4, 5, 7, and 9. A QSAR
investigation of the compiled data set of 549 compounds identified the
combination of Mordred descriptors and the random forest algorithm as
the best model, which afforded robust performance (accuracy: 0.981,
0.77, and 0.745; MCC: 0.972, 0.658, and 0.617 for the training set,
10-fold cross-validation set, and testing set, respectively).
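The models are evaluated on three bioactivity classes with accuracy and the Matthews correlation coefficient (MCC). When there are more than two classes, MCC is usually computed with the generalized (Gorodkin) form over the confusion matrix, which is what scikit-learn's `matthews_corrcoef` implements. The stdlib-only sketch below shows that calculation. The function names are illustrative, and it is not claimed to be the study's evaluation code.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Generalized MCC: (c*s - sum p_k*t_k) / sqrt((s^2 - sum p_k^2) * (s^2 - sum t_k^2)).

    s = number of samples, c = number of correct predictions,
    t_k / p_k = true / predicted count of class k. Returns 0.0 if undefined.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = Counter(y_true)  # true count per class
    p_k = Counter(y_pred)  # predicted count per class (missing keys count as 0)
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0
```

With two classes this reduces to the familiar binary formula (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Unlike accuracy, MCC stays informative when the active, intermediate, and inactive groups are imbalanced.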
To enhance the model’s robustness and predictive power, we reduced the
chemical diversity of the input by restricting it to the 168 compounds
matching the 9 most prevalent Murcko scaffolds and then built a second
QSAR model using Mordred descriptors and the extreme gradient boosting
(XGBoost) algorithm (accuracy: 0.973, 0.849, and 0.823; MCC: 0.959,
0.786, and 0.742 for the training set, 10-fold cross-validation set,
and testing set, respectively). Structure–activity landscape index
(SALI) plots further illustrated the structure–activity relationship
and enabled the identification of compound clusters that form activity
cliffs. These findings contribute to the advancement of drug discovery
and the optimization of ACEIs.
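SALI, introduced by Guha and Van Drie, scores a compound pair as |A_i − A_j| / (1 − sim(i, j)). Pairs that are structurally similar but differ strongly in activity therefore receive large values and mark activity cliffs. The sketch below makes several assumptions: activities are pIC50 values, each fingerprint is given as a set of on-bit indices (for example, precomputed Morgan bits), and similarity is Tanimoto. Pairs with identical fingerprints are skipped because the score is undefined there. Compound IDs and values are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints treated as identical


def sali_pairs(activities: dict, fingerprints: dict):
    """Return (id_i, id_j, SALI) for every compound pair, sorted highest first.

    Assumes activities are on a log scale (e.g. pIC50) and fingerprints are on-bit sets.
    Pairs with similarity 1.0 are skipped because SALI is undefined there.
    """
    results = []
    for i, j in combinations(sorted(activities), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        if sim >= 1.0:
            continue
        score = abs(activities[i] - activities[j]) / (1.0 - sim)
        results.append((i, j, score))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Hypothetical example data, not taken from the study.
acts = {"cpd_a": 8.5, "cpd_b": 5.0, "cpd_c": 5.2}
fps = {"cpd_a": {1, 2, 3, 4}, "cpd_b": {1, 2, 3, 5}, "cpd_c": {10, 11, 12}}
ranked = sali_pairs(acts, fps)
```

Here `cpd_a` and `cpd_b` share 3 of 5 bits (similarity 0.6) but differ by 3.5 pIC50 units, so they rank first with SALI ≈ 8.75. That is the signature of an activity cliff. Plotting these scores over all pairs gives the kind of SALI plot described above.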