A combined in silico approaches of 2D-QSAR, molecular docking, molecular dynamics and ADMET prediction of anti-cancer inhibitor activity for actinonin derivatives

Abstract Inhibition of human mitochondrial peptide deformylase (HsPDF) plays a major role in reducing the growth, proliferation and survival of cancer cells. In this work, the anticancer activity of a series of 32 actinonin derivatives as HsPDF (PDB: 3G5K) inhibitors was computationally analyzed for the first time, using an in silico study combining 2D-QSAR modeling and molecular docking, validated by molecular dynamics and ADMET properties. The results of the multilinear regression (MLR) and artificial neural network (ANN) statistical analyses reveal a good correlation between pIC50 activity and the seven (7) descriptors. The developed models were found to be highly significant according to cross-validation, the Y-randomization test and their applicability domain. In addition, across all data sets considered, the AC30 compound exhibits the best binding affinity (docking score = −212.074 kcal/mol and H-bonding energy = −15.879 kcal/mol). Furthermore, molecular dynamics simulations were performed for 500 ns, confirming the stability of the studied complexes under physiological conditions and validating the molecular docking results. Five selected actinonin derivatives (AC1, AC8, AC15, AC18 and AC30), exhibiting the best docking scores, were rationalized as potential leads for HsPDF inhibition, in good agreement with experimental outcomes. Furthermore, based on the in silico study, six new molecules (AC32, AC33, AC34, AC35, AC36 and AC37) were suggested as HsPDF inhibitor candidates, which should be combined with in vitro and in vivo studies for prospective validation of their anticancer activity. Indeed, the ADMET predictions indicate that these six new ligands have a fairly good drug-likeness profile.


Introduction
Cancer is the second-leading cause of death after heart disease in middle age. In 2020, more than 10 million new cancer deaths and 19 million new cancer cases were identified. There is no cure for cancer, which is at the origin of 17% of deaths worldwide, with treatment costs being high (Anderson et al., 2021; Sung et al., 2021). Breast and colorectal cancer are the most commonly diagnosed cancers worldwide, and their burden has increased over the last few decades. Recent research examines and describes the global burden of breast cancer in 2020 and projections for 2040, which are predicted to increase to over 3 million new cases and 1 million deaths per year (Arnold et al., 2022). Subsequent studies revealed that human peptide deformylase (HsPDF) proteins were excessively expressed in a variety of cancerous cells, such as breast, lung and colon cancer (Randhawa et al., 2013).
HsPDF was discovered in 2003 by Lee et al. (2003) and Serero et al. (2003), who concluded that HsPDF occurs in the mitochondria and is the main cause of the modification of 13 proteins encoded by mitochondrial DNA (mtDNA) (Escobar-Alvarez et al., 2010). Inhibition of HsPDF by natural products, for example actinonin, or the reduction of HsPDF expression by siRNA, leads to depolarization of the mitochondrial membrane and loss of ATP, and has been reported to result in decreased proliferation of cancer cells (Sheth et al., 2014). These results indicate that the HsPDF protein plays an important role in maintaining mitochondrial function. This could be crucial for cancer cell proliferation and could make HsPDF an alternative target for cancer therapies.
Furthermore, human mitochondrial PDF has been proposed as a novel target for cancer therapies because of its critical role in the maturation of mitochondrial proteins and its specific requirement for cancer cell proliferation. Indeed, using SAR (structure-activity relationship) analysis, Hu et al. (2020) confirmed the oncogenic potential of mitochondrial PDF in the development of new drugs to treat cancer by creating potent HsPDF inhibitors. As reported by these authors, actinonin derivatives were demonstrated to inhibit the proliferation of a broad spectrum of human cancer cells in vitro (Hu et al., 2020).
According to the World Health Organization (WHO) 2020 report (Huang et al., 2022), despite significant advances in recent decades, speeding up the anticancer drug discovery process is critical. Notably, in silico methods can substantially widen this search by providing tools able to predict the best drug-target binding affinities and rationalize the process by reducing duration, cost and attrition rate (Anbuselvam et al., 2021; Brogi et al., 2020; Gagic et al., 2020; Hassan Baig et al., 2016; Nagamalla & Kumar, 2021; Pirhadi et al., 2016; Shukla et al., 2020; Sidorov et al., 2019; Thafar et al., 2019; Wu et al., 2020; Khamouli et al., 2022; Nour et al., 2022). Furthermore, recent advances in computer hardware and software, such as the analysis of cellular 3D images and the development of novel routes for the synthesis of bioactive molecules, have had a significant impact on drug discovery (Cong et al., 2021; Walters & Barzilay, 2021). Indeed, computer-aided drug design (CADD) and cheminformatics emerged as efficient toolkits in the design and development of potent bioactive molecules as possible drugs for a wide range of diseases (Aminpour et al., 2019; Jorgensen, 2004; Karelson et al., 1996; Schneider, 2012; Zoete et al., 2009).
In the present study, a series of 32 actinonin derivatives (AC0-AC31) with observed biological activity (IC50, nM) (Hu et al., 2020) was computationally investigated for the first time for HsPDF inhibitory activity using in silico methods. Our in silico study was motivated by the improved observed inhibition bioactivity IC50 (nM) of the 32 studied actinonin derivatives (AC0-AC31) towards the HsPDF target. In selected cancer cell lines, these compounds have been shown to have potent HsPDF inhibition and significantly better antiproliferation activity than natural actinonin (Hu et al., 2020).
Finally, the best selected compounds were evaluated and rationalized for their pharmacological potency using Lipinski's rule of five and the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties (Norinder & Bergström, 2006). The resulting in silico model was used to predict six new HsPDF inhibitor molecules, which are derived from the AC30 ligand. Our in silico study should pave the way for the prospective development of new potent HsPDF inhibitors via in vitro and in vivo assays.

QSAR analysis
QSAR is defined as a statistical method that leads to the development of a robust mathematical model with the goal of establishing a relationship between a molecule's chemical structure, encoded as a set of structural and/or physicochemical features (descriptors), and its biological activity on a target (Karelson et al., 1996; Vázquez et al., 2020). A dataset of 32 molecules (AC0-AC31) was selected from structurally similar actinonin derivatives previously synthesized and described as HsPDF anticancer inhibitors in the literature (Hu et al., 2020). For this dataset, all experimentally measured activity values IC50 (nM) (the half-maximal inhibitory concentration) were converted to the negative logarithmic scale:

$$\mathrm{pIC_{50}} = -\log_{10}\left(\mathrm{IC_{50}} \times 10^{-9}\right) = 9 - \log_{10}\left(\mathrm{IC_{50}}\right)$$

The pIC50, expressed in molar concentration (Divya et al., 2019; Shukla et al., 2020; Er-Rajy et al., 2022), was used as the dependent variable for the 2D-QSAR study. The corresponding pIC50 activity values range from 7.50 to 8.56, which fits well with previously reported good anticancer inhibition activity (Shukla et al., 2020; Gagic et al., 2020; Huang et al., 2022). The selected structures were grouped into three categories (A, B and C), which were combined into a training set to create the quantitative models and a test set to assess their predictive ability.
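The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, assuming IC50 values are supplied in nM; the function name and the example values are hypothetical and are not taken from Table 1.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nM to pIC50 on the molar scale."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # IC50 [M] = IC50 [nM] * 1e-9, hence pIC50 = 9 - log10(IC50 [nM])
    return 9.0 - math.log10(ic50_nm)


# Hypothetical IC50 values in nM (illustration only)
for ic50 in (1.0, 10.0, 100.0):
    print(ic50, round(pic50_from_ic50_nm(ic50), 2))
```

On this scale a lower IC50 gives a higher pIC50, so more potent inhibitors appear at the top of the activity range.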
Figure 1 shows the 2D structures of the actinonin derivatives in the dataset, with their (R1, R2, R3) substituents. The corresponding chemical structures are reported in Table 1, together with the observed activities IC50 (nM) and their converted pIC50 values.

Molecular descriptors calculations
All molecular structures were generated using the Avogadro software (Hanwell et al., 2012) and fully optimized using density functional theory (DFT) (Becke, 1992; Lee et al., 1988) by means of the AMS2022.107 release software (Baerends et al., 2014). The hybrid exchange-correlation functional B3LYP (Clark et al., 1983; Francl et al., 1982; Gordon et al., 1982; Hariharan & Pople, 1973), combined with the TZ2P basis set, was used in vacuum with tight convergence criteria and without imposing any symmetry restrictions. Subsequently, harmonic frequency calculations were carried out at the same level of theory (DFT/B3LYP/TZ2P) to confirm the structures as true minima. Several types of descriptors were calculated using the AMS/ADF outputs, including quantum descriptors, natural bond orbital (NBO) analysis and reactivity indexes computed from DFT calculations. The pharmacokinetic factors were generated according to the Lipinski rule, and other topological classes of descriptors were obtained with the SwissADME web server (available at http://www.swissadme.ch; Daina et al., 2017; Pires et al., 2015) and the DRAGON software (Tetko et al., 2005).

Statistical analysis, model development and validation
The multiple linear regression (MLR) and artificial neural network (ANN) techniques provide useful statistical tools that quantify the relationship between dependent and independent variables. The software R (Team, 2020) was used to obtain the 2D-QSAR models.

Multilinear regression (MLR) analysis
Multilinear regression fits a linear model of the form (Papa et al., 2007):

$$Y = b + \sum_{i=1}^{n} a_i X_i$$

where $Y$ is the dependent variable (pIC50), $a_i$ and $b$ are the regression coefficients, and $X_i$ are the independent variables (molecular descriptors).

Artificial neural network (ANN)
The ANN model is based on the sigmoid function and consists of a neural network with three layers of neurons (Winkler, 2004). These are called the input layer (descriptors), the hidden layer and the output layer (pIC50), as shown in Figure 2 (Luque, 2018).
To confirm the stability of the predictive models (MLR and ANN) and to test the influence of each element of the training set on the final model, the 'leave-1/3-out' cross-validation (cv-LOO) technique (Guendouzi & Mekelleche, 2012), the applicability domain and Y-randomization were applied to determine the model robustness and to rule out the possibility of any chance correlation (Baerends et al., 2014; Becke, 1992; Clark et al., 1983; Daina et al., 2017; Francl et al., 1982; Gordon et al., 1982; Gramatica, 2007; Hanwell et al., 2012; Hariharan & Pople, 1973; Lee et al., 1988; Norinder & Bergström, 2006; Pires et al., 2015; Rücker et al., 2007; Team, 2020; Tetko et al., 2005).
Indeed, to evaluate the stability of the 2D-QSAR model, statistical parameters from both external and internal validation should be checked. Moreover, the MLR/ANN validation can be carried out using the coefficient of determination ($R^2$), the adjusted coefficient of determination ($R^2_{\mathrm{adj}}$), the mean square error of the model (MSE), the coefficient of determination of the test set ($R^2_{\mathrm{test}}$) and the Y-randomization parameters to ensure the robustness of a model (Rücker et al., 2007).
Receiver Operating Characteristic (ROC) analysis is a statistical method that is commonly used in the field of quantitative structure-activity relationship (QSAR) modeling. In QSAR, the goal is to develop mathematical models that can predict the biological activity of a set of chemical compounds based on their molecular structure. ROC analysis is used to evaluate the performance of these predictive models and to assess their ability to discriminate between active and inactive compounds (Fawcett, 2006).

Molecular docking
The molecular docking technique (Cavasotto & Aucar, 2020) is also widely used in drug design to explore the mode of interaction between the target (protein or enzyme) and its inhibitors (ligands) and to find the most stable configuration, similar to that of the bioactive reference ligand within the binding sites. Moreover, molecular docking is a key tool used to understand and predict molecular recognition in drug discovery (Schneider, 2012). The 3D crystal structure of HsPDF in complex with its inhibitor (PDB: 3G5K) was retrieved from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (Escobar-Alvarez et al., 2009), with a good resolution of 1.7 Å.
All molecular docking simulations, including the 2D and 3D interactions between the ligands and the protein, were performed using the Molegro Virtual Docker (MVD 2019.7.0 release) software (Bitencourt-Ferreira & de Azevedo, 2019). First, the protein structure was preprocessed and refined by removing the water molecules and ions. The cavities or active sites (pockets) were identified by the cavity prediction algorithm implemented in the MVD software. The three-dimensional coordinates of the pocket (x = 15.37 Å, y = −24.26 Å, z = −23.82 Å) and its surface of 1818 Å² are shown in Figure 3.
The MolDock scoring function and search algorithm were optimized for the number of runs, with a maximum of 10,000 iterations, a population size of 90 and an energy threshold of 100. All compounds were docked into the binding pocket using the same parameters. To validate the docking protocol, the root mean square deviation (RMSD) of the re-docked pose must not exceed 2 Å (Westermaier et al., 2015). The AC0 ligand of the 3G5K target was used as the reference to define the docking binding pocket region.

Molecular dynamics
The initial conformations for the dynamics simulations were taken from molecular docking, using the best models selected on the basis of pIC50 and the highest docking score. The CHARMM-36-2019 release force field implemented in the Gromacs package (Páll et al., 2020; Vanommeslaeghe & MacKerell, 2012) was used to assess the best docked AC0 and AC30 ligands. The dynamics simulations for the 3G5K-AC0 and 3G5K-AC30 complexes were performed at 1 atm and 310.15 K in a cubic box with 6 nm edges. The protein complexes were solvated using the TIP3P water model (transferable intermolecular potential 3P) (Mark & Nilsson, 2001). The system was neutralized by adding four chloride anions (Cl⁻). Energy minimization was carried out with the steepest descent (SD) algorithm for 50,000 steps, followed by 5 ns of equilibration in each of the canonical (NVT) and isothermal-isobaric (NPT) ensembles. The molecular dynamics simulation was performed for 500 ns (250,000,000 steps with a time step of 2 fs). The resulting trajectories were analyzed by root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg) and protein-ligand interactions. The MVD and PyMOL (Humphrey et al., 1996; Schrodinger, 2010) software packages were used for visualization of the structures.
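As a quick consistency check on the simulation lengths quoted above, the number of integration steps follows directly from the duration and the time step. The helper below is a hypothetical sketch, not part of any Gromacs tooling.

```python
def md_steps(duration_ns: float, timestep_fs: float) -> int:
    """Number of integration steps needed to cover duration_ns."""
    # 1 ns = 1,000,000 fs
    return round(duration_ns * 1_000_000 / timestep_fs)


production_steps = md_steps(500, 2)     # 500 ns production run at 2 fs
equilibration_steps = md_steps(5, 2)    # each 5 ns NVT or NPT stage at 2 fs
print(production_steps, equilibration_steps)
```

With a 2 fs time step, the 500 ns production run corresponds to the 250,000,000 steps stated in the text, and each 5 ns equilibration stage to 2,500,000 steps (assuming the same time step was used for equilibration, which the text does not state).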

2D-QSAR studies
The 2D-QSAR model was developed using the experimental values of the anticancer activity (pIC50) of the 32 selected molecules and the values of the seven descriptors (independent variables), defined as follows:
D1 = D/Dr06: distance/detour ring index of order 6.
D2 = T(N..S): sum of topological distances between N..S.
D3 = Mv: mean atomic van der Waals volume.
D4 = nC: number of carbon atoms.
D5 = IDDE: mean information content on the distance degree of equality.
D6 = Jhetp: Balaban-type index from polarizability.
D7 = Vindex: Balaban V index.
The built 2D-QSAR model is based on the correlation matrix between the seven molecular descriptors (D1-D7) and the pIC50 values for inhibitory activity, and the results are reported in Table 2. The correlation matrix is a commonly used tool in QSAR modeling, which shows the pairwise linear relationships between variables in a dataset. It can also be used to identify descriptor pairs that are highly correlated with each other and might therefore be redundant, which can affect the performance of a QSAR model. By removing these redundant descriptors, the accuracy and interpretability of the model can be improved.
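The redundancy check described above can be illustrated with a short Python sketch that computes pairwise Pearson correlations and flags descriptor pairs above a cutoff. The cutoff of 0.9, the descriptor names and the values are assumptions chosen for illustration; they are not the values of Table 2.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.9):
    """Return descriptor pairs whose |r| reaches the (assumed) threshold."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Hypothetical descriptor columns (five compounds, illustration only)
toy = {
    "D_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "D_b": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly proportional to D_a
    "D_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = redundant_pairs(toy)
print([(a, b, round(r, 3)) for a, b, r in pairs])
```

In such a screen, one member of each flagged pair would typically be dropped before fitting the regression.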
The results in Table 2 show that the pairwise relationship between each of the 7 descriptors and the observed pIC50 activity (Table 1) appears poor, with low correlation coefficients (Al-Sha'er et al., 2019). To better evaluate the QSAR model and to validate the relationship between the 7 descriptors and bioactivity (pIC50), it is therefore necessary to apply statistical techniques such as MLR, ANN or ROC analysis, which can then be used to predict relevant properties of chemical compounds. Moreover, the relationships between descriptors and activities can be studied with various non-specialized statistical software packages to predict structurally improved biological properties.
The 32 active molecules were then randomly divided into three categories, A, B and C, of 11, 11 and 10 compounds, respectively, to build the training and test sets used with the MLR, ANN and ROC techniques.

MLR validation
For 2D-QSAR validation, we used the MLR method because of its simplicity and reproducibility, based on three criteria: the coefficient of determination ($R^2$) and its adjusted value ($R^2_{\mathrm{adj}}$), the root mean square error (RMSE) and the Fisher ratio value (F).
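For reference, the criteria listed above can be computed from observed and predicted activities as in the following sketch. It assumes an ordinary least-squares model with an intercept and k descriptors, for which the F ratio can be written in terms of R²; the function name and the toy values are hypothetical.

```python
import math


def regression_statistics(y_obs, y_pred, n_descriptors):
    """Return R2, adjusted R2, RMSE and F for a fitted linear model."""
    n = len(y_obs)
    k = n_descriptors
    mean_y = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_y) ** 2 for o in y_obs)
    r2 = 1.0 - ss_res / ss_tot
    # The adjusted R2 penalizes the number of descriptors
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    rmse = math.sqrt(ss_res / n)
    # Fisher ratio: explained variance per descriptor over residual variance
    f_ratio = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return {"R2": r2, "R2_adj": r2_adj, "RMSE": rmse, "F": f_ratio}


# Toy observed/predicted values (illustration only, one descriptor)
stats = regression_statistics(
    [1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], n_descriptors=1
)
print({name: round(value, 4) for name, value in stats.items()})
```

Note that some authors divide the residual sum of squares by n − k − 1 rather than n when reporting the RMSE; the choice made here is an assumption.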
To validate the 2D-QSAR model, we used a cross-validation approach with the following steps:
(1) All data points were sorted in ascending order of activity value.
(2) The parent set of 32 data points was divided into three subsets (A, B and C): the first, fourth, seventh, etc. data points form the first subset (11 compounds for A); the second, fifth, eighth, etc. form the second subset (11 compounds for B); and the third, sixth, ninth, etc. form the third subset (10 compounds for C).
(3) Three new datasets were built using all combinations of the binary sums: A + B, A + C and B + C.
(4) The standard QSAR modeling procedure, including the MLR method, was applied to the three datasets obtained in step 3; that is, for each training set the correlation equation was derived with the same descriptors as the model.
(5) The general model was again validated using a classical internal cross-validation procedure: leave-many-out.
The procedure described above was applied to the complete data set of 32 points. Each of the three training subsets contains 21 or 22 compounds, and the remaining 11 or 10 compounds were used as the external validation set.
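The subset construction in the steps above can be sketched as follows. This is an illustrative reading of the procedure, with hypothetical function names and placeholder activities rather than the values of Table 1.

```python
def interleaved_subsets(activities):
    """Sort compounds by activity and deal them into subsets A, B and C."""
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    # 1st, 4th, 7th... -> A; 2nd, 5th, 8th... -> B; 3rd, 6th, 9th... -> C
    return {"A": order[0::3], "B": order[1::3], "C": order[2::3]}


def binary_sum_splits(subsets):
    """Build the (training, test) pairs A+B/C, A+C/B and B+C/A."""
    splits = []
    for held_out in ("C", "B", "A"):
        training = sorted(
            i for name, members in subsets.items()
            if name != held_out for i in members
        )
        splits.append((training, sorted(subsets[held_out])))
    return splits


# Placeholder pIC50 values for 32 compounds (illustration only)
activities = [7.50 + 0.03 * i for i in range(32)]
subsets = interleaved_subsets(activities)
splits = binary_sum_splits(subsets)
print({name: len(members) for name, members in subsets.items()})
print([(len(train), len(test)) for train, test in splits])
```

Because the compounds are dealt out after sorting, each subset spans the whole activity range, which is the usual motivation for this kind of interleaved split.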
The 2D-QSAR model obtained for the training set expresses the inhibitory activity (pIC50) as a linear function of the seven descriptors (D1-D7). From the training set equation, it appears that the seven descriptors are linearly correlated with the anticancer biological activity values (pIC50). The best-fitting pIC50 model accounted for 74.6% of the variance in the experimental values. The training set has a small error value (MSE = 0.013) and a Fisher value of 88.1, indicating that the 2D-QSAR model is statistically acceptable. Moreover, the predictive ability of the 2D-QSAR model was verified by internal validation, which gave a coefficient of determination of $R^2$ = 0.74. For internal validation, the cv-LOO method was used, leading to average $R^2$ coefficients of the training and test sets equal to 0.736 and 0.744, respectively.
The validation of the 2D-QSAR MLR model by the cv-LOO and Y-randomization tests is reported in Table 3. The pIC50 values predicted by MLR are compared with their experimental values in Figure 4.
Figure 4 shows that the observed and predicted pIC50 values are significantly correlated. The reliability, robustness and stability of the built 2D-QSAR model were further confirmed by the Y-randomization test ($R^2_{\mathrm{rand}}$ = 0.059, $R^2_{\mathrm{adj}}$ = 0.028, F = 1.899). The MLR model was found to give satisfactory results for both the training and test sets.
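The logic of the Y-randomization test can be sketched in Python as follows: the activities are shuffled, the model is refitted, and the resulting R² is compared with that of the real model. This is a minimal illustration on synthetic data with two hypothetical descriptors, using a plain least-squares fit via the normal equations; it is not the R workflow used in this study.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_r2(rows, y):
    """Fit y = b + sum(a_i * x_i) by least squares and return R2."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def mean_randomized_r2(rows, y, n_trials=200, seed=1):
    """Average R2 over models refitted on shuffled activities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        shuffled = list(y)
        rng.shuffle(shuffled)
        total += fit_r2(rows, shuffled)
    return total / n_trials


# Synthetic data (hypothetical): 30 compounds, 2 descriptors
data_rng = random.Random(0)
rows = [[data_rng.random(), data_rng.random()] for _ in range(30)]
y = [8.0 + 0.5 * x1 - 0.3 * x2 + data_rng.gauss(0, 0.02) for x1, x2 in rows]
real_r2 = fit_r2(rows, y)
rand_r2 = mean_randomized_r2(rows, y)
print(round(real_r2, 3), round(rand_r2, 3))
```

A randomized R² far below the real one suggests that the original correlation is unlikely to be due to chance, which is the interpretation given to the low $R^2_{\mathrm{rand}}$ reported above.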

Artificial neural networks (ANN) validation
To improve the relationship between the predicted and observed activities, we built a nonlinear predictive model for the observed pIC50 anticancer activity using the ANN architecture (Winkler, 2004). The seven molecular descriptors selected by the MLR method were used as the input layer, with a single hidden layer, and the pIC50 inhibitory activity as the output layer. The 7-4-1 ANN architecture, where 4 is the number of neurons in the hidden layer, was therefore adopted to construct the ANN model, which is based on the cv-LOO parameters ($R^2$ = 0.776, ρ = 0.701 and MSE = 0.017), where ρ is Spearman's rank correlation coefficient. The results are reported in Table 4.
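To make the 7-4-1 architecture concrete, the sketch below implements a single forward pass in Python. The logistic form of the sigmoid, the linear output unit and the random weights are assumptions made for illustration; the trained weights of the actual model are not given in the text.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation (assumed form)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(descriptors, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 7 inputs -> 4 sigmoid hidden neurons -> 1 linear output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, descriptors)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical, untrained weights for a 7-4-1 network
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(4)]
b_hidden = [rng.uniform(-1, 1) for _ in range(4)]
w_out = [rng.uniform(-1, 1) for _ in range(4)]
b_out = 0.0

# Hypothetical scaled descriptor vector D1..D7 for one compound
example = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3]
print(forward(example, w_hidden, b_hidden, w_out, b_out))
```

In practice the weights would be obtained by training on the 22-compound training sets, and the descriptors would usually be scaled to a common range beforehand.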
The obtained results (Table 3) show that the cv-LOO approach for internal validation of the ANN technique provides a better correlation than the MLR one. The Y-randomization test ($R^2_{\mathrm{rand}}$ = 0.230, ρ = 0.051 and MSE = 0.05) gives values well below those of the real model, while the ANN model provides a slightly higher coefficient of determination $R^2$ than the MLR approach (0.776 vs. 0.746). These results show that the 2D-QSAR model developed by the ANN technique is in better agreement with the observed inhibitory activity (pIC50). The results of the predicted ANN pIC50 approach are depicted in Figure 5.
Nonetheless, the results (Figures 4 and 5) show that both the MLR and ANN approaches provided adequate results, supporting the performance of the statistical 2D-QSAR model obtained.

ROC validation
In ROC analysis, a model is trained on a set of known active and inactive compounds, and the model's predictions are then compared to the actual activity values (Al-Sha'er & Taha, 2021; Al-Sha'er et al., 2019; Al-Sha'er et al., 2023). The ROC curve is generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various thresholds. The area under the ROC curve (AUC) is used to summarize the overall performance of the model, with a value of 1 indicating perfect discrimination and a value of 0.5 indicating no discrimination.
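The construction of the ROC curve and its AUC can be sketched as follows. The example assumes that compounds have already been labeled as active (1) or inactive (0), for instance by applying a pIC50 cutoff, and that the model provides a score for each; the labels and scores shown are hypothetical.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every compound sharing this score so ties form one point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels (1 = active) and model scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(labels, scores)
print(points)
print(round(auc_trapezoid(points), 3))
```

The AUC obtained this way equals the fraction of active/inactive pairs in which the active compound receives the higher score, which is why 0.5 corresponds to random ranking.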
Figure 6 shows the ROC analysis of the MLR 2D-QSAR model. The area under the ROC curve (AUC) is 0.88, which means that the binary classification model has a relatively good performance. An AUC close to 1 means excellent performance, while an AUC close to 0.5 means performance similar to a random choice.
However, it is important to note that interpreting the AUC alone does not provide a complete picture of model performance. It is also important to consider the shape of the ROC curve and to compare sensitivity and specificity as the decision threshold is changed. It may also be useful to consider other performance metrics such as precision, recall and F1-score.

Domain of applicability
The applicability domain (AD) analysis seeks compounds that do not fall within the AD of the constructed QSAR model. The Williams plot is constructed by taking the standardized residual values on the y-axis and the leverage values on the x-axis, as depicted in Figure 7. The number of descriptors is k = 7, the number of compounds in the training set is n = 32, and the standardized residual is limited to ±3, so the critical leverage value is h* = 3(k + 1)/n = 0.75. As shown in Figure 7, all the training and test set compounds are within the domain of applicability because their leverage values are less than the critical leverage (h* = 0.75) and their standardized residuals are within ±3.
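The critical leverage of 0.75 quoted above is consistent with the commonly used convention h* = 3(k + 1)/n for k descriptors and n training compounds. The sketch below, with hypothetical function names and example values, shows this calculation and the corresponding in-domain test for a single compound.

```python
def critical_leverage(n_descriptors: int, n_compounds: int) -> float:
    """Warning leverage h* = 3(k + 1)/n (common convention, assumed here)."""
    return 3 * (n_descriptors + 1) / n_compounds


def in_domain(leverage, std_residual, h_star, residual_limit=3.0):
    """A compound is inside the AD if both Williams-plot criteria hold."""
    return leverage < h_star and abs(std_residual) <= residual_limit


h_star = critical_leverage(7, 32)
print(h_star)
# Hypothetical (leverage, standardized residual) pairs
print(in_domain(0.20, 1.5, h_star), in_domain(0.80, 0.5, h_star))
```

Compounds with leverage above h* are structurally far from the training set, whereas those with large standardized residuals are response outliers; the two criteria flag different kinds of unreliable predictions.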

Molecular docking simulations
The molecular docking analysis was carried out on the 32 compounds. The first step was to re-dock the reference ligand (AC0) into the active site of the 3G5K protein (PDB ID: 3G5K), as shown in Figure 8. The best pose obtained gave a root mean square deviation (RMSD) value of 0.9 Å. The docking technique is considered satisfactory if the RMSD does not exceed 2 Å (Westermaier et al., 2015).
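The re-docking criterion above relies on the RMSD between the docked pose and the reference pose. A minimal sketch of this calculation is given below, assuming matched atom ordering and no additional superposition (the two poses share the receptor frame); the coordinates are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD (same length unit as the input) between two matched atom lists."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


# Hypothetical heavy-atom coordinates in Å
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(0.3, 0.1, 0.0), (1.6, 0.4, 0.2), (1.2, 1.7, 0.1)]
value = rmsd(reference, redocked)
print(round(value, 2), value <= 2.0)
```

A value at or below the 2 Å cutoff would be read as a successful reproduction of the reference binding mode.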
After identifying the active site, molecular docking simulations were performed using MVD, and the five best-scoring compounds (AC30, AC23, AC27, AC28 and AC11) were retained. Table 5 reports the best docking score (kcal/mol), hydrogen bond energy (kcal/mol) and amino acid steric interactions for these ligands. All results for the 32 complexes are reported in the supplementary table (SI).
The steric and hydrogen bonding 2D interactions stabilizing the complex, between the HsPDF protein and the AC30 ligand, are shown in Figure 9.
It is noteworthy that the binding energies of protein-ligand interactions are important in describing how the drug binds to the target protein and provide insights into the stability of the ligands. Indeed, the more negative the binding energy, the stronger the predicted binding of the drug candidate to the protein. Analysis of the docking results presented in Table 4 revealed that the five best ligands interacted well with the target protein, in the following order of docking score (kcal/mol): AC30 (−212.074) > AC23 (−203.332) > AC27 (−195.414) > AC28 (−195.089) > AC11 (−193.939).
Notably, the ligand AC30, which exhibits higher inhibitory activity, has a relatively better binding profile with the target protein, with a total energy value of −212.074 kcal/mol, including hydrogen bonding of −15.8791 kcal/mol. The predicted best 2D interactions of ligand AC30 with the target 3G5K receptor (Figure 9) show that this ligand penetrates well into the target site cavity and contacts a good number of amino acid residues, e.g., Glu115-157, Gln57, Cys114 and Gly113-52, including conventional hydrogen bonds with the amino acids Trp149 and Cys114 at bond distances of 3.14 and 3.22 Å, respectively. Thus, the two hydrogen bonds have a very important effect on the stability of ligand AC30.

Molecular dynamics simulations
The 3G5K-AC0 and 3G5K-AC30 complexes with the highest binding affinity obtained from the molecular docking were subjected to 500 ns molecular dynamics (MD) simulations. The system (protein, ligand, solvent and ions) was equilibrated for 5 ns in each of the NVT and NPT ensembles (1 atm, 310 K). The stability of the complexes was examined during the MD simulations by RMSD, RMSF and Rg analysis, as shown in Figures 10, 11 and 12, respectively.
The RMSD analysis (Figure 10) of the protein-AC0 and protein-AC30 complexes shows some variability in the early part of the simulation, after which the complexes stabilize over the 500 ns trajectory with average values of 0.18 and 0.38 nm for the AC0 and AC30 complexes, respectively. In addition, Figure 10 shows that the RMSD values are mainly below 0.5 nm, which indicates satisfactory stability during the simulation.
This fluctuation of the RMSD values for the complexes is not unexpected. In docking, a pose is often considered satisfactory if its RMSD does not exceed 2 Å (Westermaier et al., 2015), and the same 2 Å cutoff can be used to select representative structures in molecular dynamics simulations. However, the interpretation of the RMSD/RMSF/Rg mean values is context-dependent and should be used alongside other analyses and experimental data to understand a molecular system (Divya et al., 2019; Çevik et al., 2022; Suresh et al., 2023). For example, in simulations of well-folded globular proteins, an RMSD of 1-2 Å (0.1-0.2 nm) is often considered a good indicator of stability. For simulations of intrinsically disordered or partially unfolded proteins, higher RMSDs can be observed, typically 2-5 Å (0.2-0.5 nm) or more (Suresh et al., 2023; Garkusha et al., 2023).
Moreover, these results show that the 3G5K-AC0/AC30 compounds have significant affinity for residues present in the active site, as indicated by the docking simulation.
The root-mean-square fluctuation (RMSF) analysis (Figure 11) further shows the average deviation of each atom from a reference position during the simulation.
The Rg analysis indicates the stability, structural dimensions and compactness of the molecule (Tiwari et al., 2022). It measures the mass-weighted root-mean-square distance of the protein atoms from their center of mass at a given time step. In general, a stably folded protein tends to maintain a relatively small variation in its Rg value, which reflects its dynamic stability. In the present study, the Rg value varies between 1.6 and 1.7 nm (Figure 12), which shows that the compactness of the 3G5K-AC0/AC30 complexes is relatively stable (Garkusha et al., 2023).
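For a single frame, the radius of gyration in its usual mass-weighted form can be computed as in the sketch below. The coordinates and masses are hypothetical; in practice the values in Figure 12 would come from the trajectory analysis tools of the MD package.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of atoms from their center of mass."""
    total_mass = sum(masses)
    center = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Hypothetical four-atom frame (coordinates in nm, masses in amu)
coords = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.2, 0.2, 0.0), (0.0, 0.2, 0.0)]
masses = [12.0, 12.0, 12.0, 12.0]
print(round(radius_of_gyration(coords, masses), 4))
```

Evaluating this quantity frame by frame gives the Rg time series, whose narrow range is what is interpreted as stable compactness.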

Prediction of new inhibitors
The constructed MLR 2D-QSAR model was used to design new actinonin derivatives based on AC30, the ligand with the best docking score. The molecular structures of the six new ligands (AC32, AC33, AC34, AC35, AC36 and AC37) are shown in Figure 13, with the substitution at the same site highlighted in red, and the docking results are reported in Table 5.
The docking results (Table 5) show that the docking and binding scores (kcal/mol) obtained for the six newly designed structures (AC32-AC37) are slightly lower than those obtained with the best selected AC30 ligand (Dock Score = −212.07; H-bond = −15.87 kcal/mol; Table 4). However, the predicted pIC50 values are in good agreement with the experimental bioactivity reported for the existing actinonin derivatives (Hu et al., 2020). Notably, these new compounds have a significant affinity for the HsPDF enzyme receptor, forming H-bond interactions with amino acid residues at the target site. These results also suggest that these compounds might have the ability to inhibit HsPDF.

ADMET and drug likeness analysis
According to Lipinski's rule of five (Lipinski et al., 2012), absorption or permeation is more likely when the molecular weight does not exceed 500 g/mol, the value of log P is not greater than 5, and the molecule has no more than 5 H-bond donors and 10 H-bond acceptors. The Ghose filter (Ghose et al., 1999) defines drug-likeness constraints according to the following criteria: calculated log P between −0.4 and 5.6, MW between 160 and 480, molar refractivity between 40 and 130, and total number of atoms between 20 and 70. The Veber rule (Veber et al., 2002) defines drug-likeness constraints as a rotatable bond count ≤ 10 and a polar surface area PSA ≤ 140 Å².
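The three filters described above can be expressed as simple threshold checks. The sketch below is illustrative only: the class, the function names and the property values are hypothetical, and the common convention that at most one Lipinski violation is tolerated is assumed.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float          # g/mol
    log_p: float
    h_donors: int
    h_acceptors: int
    molar_refractivity: float
    n_atoms: int
    rotatable_bonds: int
    tpsa: float                # Å²


def lipinski_violations(p: Properties) -> int:
    """Count violations of the four rule-of-five criteria."""
    return sum([
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_lipinski(p: Properties) -> bool:
    # Assumed convention: no more than one violation is allowed
    return lipinski_violations(p) <= 1


def passes_ghose(p: Properties) -> bool:
    return (
        160 <= p.mol_weight <= 480
        and -0.4 <= p.log_p <= 5.6
        and 40 <= p.molar_refractivity <= 130
        and 20 <= p.n_atoms <= 70
    )


def passes_veber(p: Properties) -> bool:
    return p.rotatable_bonds <= 10 and p.tpsa <= 140


# Hypothetical compounds (values invented for illustration)
light = Properties(450.0, 2.0, 3, 8, 115.0, 60, 9, 120.0)
heavy = Properties(524.0, 2.1, 3, 10, 125.0, 68, 10, 130.0)
for name, mol in (("light", light), ("heavy", heavy)):
    print(name, lipinski_violations(mol), passes_lipinski(mol),
          passes_ghose(mol), passes_veber(mol))
```

The second hypothetical compound shows how a molecular weight slightly above 500 g/mol counts as a single Lipinski violation while still failing the stricter molecular weight window of the Ghose filter.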
Although the best-docked AC30 ligand and the predicted ligands (AC32-AC37) exceed the MW limit by about 24 and ca. 49 g/mol, respectively, they all meet Lipinski's rule of five, Veber's rule and Ghose's rule and have a bioavailability score of 55%, with a maximum of 3 H-bond donors (HBD) and 10 H-bond acceptors (HBA) and a logP value of less than 6.
Indeed, physicochemical properties such as the solubility coefficient (logS) and the lipophilicity or partition coefficient (logP) play a major role in whether a compound can progress to become a successful drug candidate (Kwong, 2017). The logP and logS values of most of the designed compounds indicate that they have reasonable absorption and are moderately water soluble, given the acceptable ranges of lipophilicity (−0.7 < logP < 5) and solubility (−6 < logS < 0) (Khamouli et al., 2022). Furthermore, the ADMET results (Table 6) show acceptable human gastrointestinal (GI) absorption, suggesting that these compounds are expected to exhibit good oral bioavailability.
Furthermore, according to the Ghose filter and Veber rule, which define drug-likeness constraints, the compounds have at most 10 rotatable bonds and a fairly good polar surface area (TPSA ≤ 140), except for AC11, indicating that the newly designed compounds are potential HsPDF inhibitors.
In addition, P-glycoprotein (P-gp) is a drug transporter that aids in the uptake and efflux of a variety of drugs. Based on the ADMET results, all compounds are substrates for P-gp; however, according to the BBB criteria, all the selected compounds appear to have poor BBB permeability.
For metabolism, all compounds except AC11 were predicted to be substrates for CYP450 3A4, which means that they might be metabolized by CYP3A4. In addition, the compounds, including the six new predicted ligands, are predicted not to inhibit the CYP450 1A2, CYP2C19 and CYP2C9 isoforms, but they might inhibit the CYP2D6 and CYP3A4 isoforms; the experimental AC11 ligand is the exception.
Finally, the results predicted above (Table 6) indicate that these compounds exhibited a good ADMET profile and good drug-likeness.

Conclusion
In this work, an in silico study combining 2D-QSAR, molecular docking, dynamics simulations and ADMET prediction has been performed to analyze for the first time the anticancer inhibitory potential of actinonin derivatives (AC0-AC31) against human mitochondrial peptide deformylase (HsPDF). The quantitative analysis of the anticancer structure-activity relationship (2D-QSAR) for the 32 actinonin derivatives, combining multiple linear regression (MLR), artificial neural network (ANN) and Receiver Operating Characteristic (ROC) techniques, showed good internal and external validation abilities and led to better insights for the design of new HsPDF (PDB: 3G5K) inhibitors. In addition to identifying the actinonin derivatives with the best docking scores (AC1, AC8, AC15, AC18 and AC30) at the target receptor, the built 2D-QSAR model was used to design six novel actinonin derivative candidates (AC32-AC37) targeting the HsPDF protein, whose biological activity (pIC50) predicted by the MLR model is in good agreement with experimental values. The molecular docking results of the six newly designed compounds revealed binding affinity scores relatively close to those obtained with the best selected AC30 ligand (Dock Score = −212.07; H-bond = −15.87 kcal/mol), which suggests that these compounds might have the ability to inhibit HsPDF. Furthermore, molecular dynamics simulations, analyzed using the RMSD, RMSF and Rg parameters, showed that the protein-actinonin complexes remained stable during the simulation time from 0 to 500 ns. Moreover, the ADMET prediction of the pharmacokinetic properties for the actinonin derivatives with the best docking scores shows relatively good oral bioavailability but poor BBB permeability. The toxicity evaluation showed that the newly designed compounds exhibit low toxicity with respect to the reference ligand (AC30) and may be effective against HsPDF.
This in silico study will pave the way for researchers to discover anticancer drugs with high HsPDF inhibitory potency in the future in combination with in vitro and in vivo analysis.

Figure 2. Schematic diagram of the 1-N-M neural network used to derive a 2D-QSAR model for pIC50 activity prediction.

Figure 3. The active site generated by MVD software.

Figure 5. ANN correlation between the observed pIC50 values and predicted activities.

Figure 6. ROC analysis of the MLR QSAR model.

Figure 7. Williams plot of the standardized residuals vs. leverage for the MLR model.

Table 1. Chemical structures of the studied actinonin derivatives with their observed activities IC50 (nM) and their pIC50 expressed in molar concentration.

Table 3. Internal validation of the MLR model using cv-LOO.

Table 4. Internal validation of the ANN model using cv-LOO.

Table 5. Docking results of the five compounds with Dock Score (kcal/mol), amino acid hydrogen bond interactions and amino acid steric interactions.

Table 5. Dock Score (kcal/mol) results of the new compounds AC32-AC37, with H-bond energy, amino acid H-bond interactions, amino acid steric interactions and their predicted pIC50.