Molecular modelling of antiproliferative inhibitors based on SMILES descriptors using Monte-Carlo method, docking, MD simulations and ADME/Tox studies

ABSTRACT Cancer is one of the greatest challenges that worry the minds of scientists and threatens human life. Despite the presence of several drugs on the market, their effectiveness remains limited by its resistance. In this research, the Monte Carlo approach was used for QSAR modelling applying the representation of the molecular structure by the SMILES and optimal molecular descriptors. Correlation Ideality (IIC) and Correlation Contradiction Index (CCI)) were introduced as validation parameters to further estimate the predictive ability of the developed models. The statistical quality of the model developed with (IIC) was good compared to those without (IIC). The best QSAR model of the following statistical parameters: (R²train= 0.816, R²valid = 0.825) was selected to generate the activity-increasing and decreasing promoters studied, and these are the basis for the design of seven new candidates as an antiproliferative inhibitory agent. Additionally, the newly designed molecules, the most active compound in the dataset, erlotinib and melphalan as control compounds were docked in the EGFR receptor binding site. The docking results discovered that the proposed candidates had the highest potential and energy affinity. Besides, Lipinski’s Rule of Five, Synthetic Accessibility and ADME/Tox were performed to assess bioavailability and drug-likeness of proposed compounds. In addition, MD simulation accompanied by MM-PBSA analysis discovered that compound A1 and the screened compounds were stable and did not show significant fluctuations throughout the simulation time. Generally, this research showed that the selected model well explains the antiproliferative activity and also that the proposed compounds have high activity, good binding affinity and stable conformation with the reported target protein.


Introduction
According to NICPR (National Institute for Cancer Prevention and Research), more than 2.25 million people in India suffer from cancer and more than one million and 157,000 new cancer patients are registered every year. In 2018, more than 78,000 people died from cancer. The main problem with cancer is that the expression of DNA is disrupted and its growth causes the displacement of normal tissue. Resistance to anti-cancer drugs remains a problem that makes it difficult to contain cancer. For this reason, there is a need for the development of new candidates as anti-cancer agents. Drug development is a very complex, time-consuming and expensive process, where it requires a large amount of material and financial resources. Especially, in light of the decline of practical animal experiments. The epidermal growth factor receptor (EGFR) is a protein located on the surface of cells. Among his roles is helping cells to grow and divide. As a result inhibition of EGFR can induce blockage of cancerous cell growth [1]. More EGFR inhibitors have been developed, targeting the intracellular tyrosine kinase binding domain with small molecules, or the extracellular receptor domain with antibodies [2]. Consequently, anti-EGFR agents are recommended by international guidelines as the first-line treatment in patients with advanced EGFR mutations, due to the higher efficacy and safety of these agents rather than the standard chemotherapy [3,4]. Compounds containing 1,3,4-thiadiazole have been reported for various pharmacological activities such as anti-microbial [5,6], anti-cancer [7], anti-oxidant [8], antiinflammatory [9], anti-tuberculosis [10][11][12] anti-leishmania [13], anti-diabetic [14].
The use of computational approaches for drug design called Rational Drug Design [15], has created a fertile field for certain in silico methods including QSAR, ADME/Tox, molecular docking and molecular dynamics simulations to become a target and very interesting in the design of new drugs and also in the estimation of their biological profile. This strategy can reduce the cost in case of failure of a drug candidate at higher (clinical) degrees by filtering combinatorial libraries, rejecting these molecules with expected toxic effect and disfavouring pharmacokinetic profiles, thus decreasing the number of experiments. The quantitative structure-activity relationship (QSAR) is one of the most cost-effective and predictive rational drug design techniques that mathematically correlate the structural characteristics of a molecule with its activity [16,17]. The literature has shown that SMILES-based QSAR studies have incredible predictive ability [18][19][20][21][22][23][24]. The scenario of the ongoing research is to explain the reliable predictive relationship of the antiproliferative bioactivity of thiazolidine-based molecules using the CORAL software based on the Monte Carlo algorithm has been widely used for the design of QSAR models [25][26][27][28][29][30], which applies the SMILES code of molecules to calculate the correlation weight descriptor (DCW) and QSAR models are developed using the Monte Carlo algorithm [31][32][33]. This process is applied by different research groups for a large number of varied, medicinal parameters and biochemical and physico-chemical [27,28]. Predictability is the most important criterion for QSAR model development process. To determine this criterion, many statistical methods have been reported in the literature. However, none of them are able to estimate the predictive ability of the QSAR model individually and all of them are associated with each other. A new predictability parameter, the Correlation Ideality Index (CII), has recently been suggested, which is based on correlation coefficient and mean absolute error [26][27][28]. Further, in 2019, the same authors introduced a new prediction parameter, namely the correlation contradiction index (CCI) [29]. It has been shown that there is a good correlation between the validation correlation coefficient and the ICC. In this work, the comparison of IIC and ICC was performed to select the best model, thus to study the additional effect of IIC on ICC.
The molecular docking technique is widely used to discover the conformation adopted by means of ligands inside the binding pocket(s) of macromolecular targets and hence to explain the mechanism of binding between the ligand and the receptor. Molecular dynamic simulation is a useful tool to obtain more precise information and understand the dynamic behaviour of ligand-receptor interaction [34] and estimate protein stability [35]. The MM-GBSA method can calculate the binding free energy of the protein-ligand complex and obtain the energy contribution of different interactions to the total binding free energy [36]. Poor ADME/Tox properties of drugs are the main reason for failed drug development; ADME/Tox parameters have been predicted to determine the pharmacokinetics and bioavailability of proposed candidates for estimating the disposition of a pharmaceutical compound in an organism such as a drug, which can provide guidance for molecule design and minimise the drug development failure rate [37].
Thus, in the present research, the detailed analysis of antiproliferative thiazolidine compounds were studied, by the integrated Monte Carlo method of CORAL software was used to build predictive QSAR models. Finally, promoter inhibitory structural fragments were also extracted and applied to propose new, more potent antiproliferative candidates. For enriched QSAR study of the molecular docking procedures, molecular dynamics simulations and MM-GBSA studies were performed. Further, the results obtained, have been compared by two commercial chemotherapeutic drugs are erlotinib [38,39] and melphalan [40,41] (Figure 1). In clinical practice, Melphalan enters into the treatment of different cancers especially multiple myeloma [42]. Erlotinib, an EGFR tyrosine kinase inhibitor, it has been approved to be used with gemcitabine chemotherapy to treat advanced pancreatic cancer [43]. All the results obtained above would help us to understand the relationship between the activity and the structural characteristics of the compounds, and further to develop new highly potent compounds targeting EGFR.

Data set source
A data set of 65 1,3,4-thiadiazole derivatives as antiproliferative was accessed from the literature report [44][45][46][47]. The molecular structures of the examined compounds were converted into canonical SMILES operating the Open Babel GUI programme [48]. In vitro biological activities IC 50 (nM) were also converted to pIC 50 (M). Then, it was used as a dependent variable for the construction of QSAR models. The QSAR models were created for four arbitrary splits, for identifying the identity level of these four splits, the acknowledged procedure was used [49]. The values seen in Table S1 verify the non-identical character of splits. Each of the four splits was again subdivided into four sets, including training (+), invisible training (−), calibration (#) and validation (*), each set included 25% compounds Table S1 (see supporting information file).

Model for QSAR model development
The Monte Carlo approach is founded on the idea that appropriate numerical values referred to as correlation weights are created from each optimal descriptor (DCW) that relates connects the structure of compounds to certain properties/activities [50]. The structure descriptor is computed using in CORAL (Correlations and Logic) [51][52][53]. The pIC 50 antiproliferative agent model after all DCWs are calculated is suggested here in the following formula Equation (1) [20]: where C 0 et C 1 are constant regression coefficients.
The Monte Carlo method obtains satisfactory result by combining two parameters which are the number of epochs (N epoch ) and the threshold (T). The latter (ie T) is utilised to classify SMILES derived molecular attributes into active and rare structural attributes (SA k ). If the value of the rare structural attributes is less than T or zero, they are excluded from the modelling process for the training set, as they create disturbing noise in the QSAR model [54]. The optimal number of Monte Carlo optimisation epochs is represented by N epoch [55]. The equation affected for the computation of SMILES based descriptors is as follows Equation (2): The local and global SMILES attributes are computed for the molecular SMILES representation. Table 1 includes an exhaustive description of the global SMILES attributes.

Monte Carlo optimisation
Each CW weight correlation (A k ) corresponds to a different attribute of aptamer A k . The numerical value on the CW was calculated operating Monte Carlo optimisation (A k ). Target functions TF 1 and TF 2 are applied in CORAL 2019 software to construct the QSAR models, which are used to get the best results for the target functions when compared to their results [56]. To avoid overtraining, the correlation balance method, which requires a great correlation between the training and validation sets, has been suggested.
The R AT and R PT are the coefficient of correlation between experimental and calculated endpoint for the active training set and passive training set, respectively. IIC CLB is determined utilising calibration data (CLB) as follows equation Equation (5) [57]: MAE is an abbreviation for mean absolute error, which can be calculated employing the following equations: The following formula can be used to estimate the prediction quality (Δk) for a single element in a set: The endpoint of Antiproliferative inhibitors was quantified using SMILES based descriptors. Values for T and N epoch ranged from 1 to 10 and 0-15, respectively, in order to achieve the most predictive arrangement of T and N epoch for all splits. The IIC weight, dR weight and D limit were all set to 0.1. For TF 2 , a numerical value of IIC was chosen to be 0.3. The D start step in optimisation was 0.5*CW (SA). In this case, CW(SA) is the correlation weight of the structural attribute (SA) at the start.

Validation of QSAR models
The aim of any QSAR model is to reliably predict the activity of a novel compound. Three validation methods were used to evaluate the performance and reliability of the created QSAR models. First, The Leave-One-Out Cross-Validation (LOO) operating compounds from the training set, second, external validation (Q 2 ext ) operating the validation set, and third, test-Y randomisation [35]. The R² and Q² LOO values were estimated to be >0.5.
Furthermore, other statistical indicators including R 2 , CCC, Q 2 , Q 2 F 1 , Q 2 F 2 , Q 2 F 3 , s, MAE, F, randomisation test Y and C Rp2 were applied to examine the predictive potential of the QSAR models constructed [57].
The mathematical equation of different statistical parameters is following Equations (9-17): Additionally, when the test set is small, R 2 is unreliable and insufficient for judging model validity. In this situation, we introduce a new statistical metric (r 2 m ) that is mandatory to clarify the true predictive potential of QSAR models [58].

Applicability domain
The domain of applicability (AD), an important validation tool in building a QSAR model, where plays a critical role in assessing the uncertainty of an individual compound based on its analogy to compounds of the dataset used to train the model. AD is determined in the generated QSAR model based on the arrangement of the SMILES characteristics in the training and calibration sets. A compound is designated as an outlier if it is outside the domain of applicability and so cannot be reliably predicted in this strategy, a compound is considered to be aberrant if it is outside the AD limit, i.e. if: where Defect SMILES = Active k SA defect .

Molecular docking study
AutoDock Vina software was used to perform molecular docking after the compounds were drawn using SYBYL X2.0 software [59], to clarify the different sorts of binding mode between the docked ligand and binding site target. The PDB: 1M17 X-ray crystallographic structure with a resolution of 2.60 Å has been selected in the Protein Data Bank [60]. The docking analysis was started by removing all water molecules and the reference from the protein structure using the Discovery Studio Visualizer. Next, the hydrogen atoms were added and the Gasteiger force field was applied using the AutoDock programme. The grid box was created at 24 in all three dimensions (X, Y, Z), with a grid space size of 0.75 Å. The central grid area is (21.107, 0.713 and 54.06) by the ligand placed in the protein. The docking protocol was validated by re-docking the reference ligand (erlotinib), then calculating the RMSD factor which should not exceed 2.

Dynamic molecular simulations
Protein-ligand-docked complexes have been subjected to molecular dynamics (MD) simulations using Gromacs 2019 [61][62][63][64]. The OPLS force field was applied to generate ligand and protein parameters. The protein-ligand complex system was created inside a dodecahedron box, then, it was solvated with a simple water model (SPC). Na+ ions were added to electrically neutralise the system. Restricted particle Number, Volume and Temperature (NVT) overall balancing was performed for 20 ps at 300 K with Berendsen Thermostat Temperature Coupling [65]. Also, a constant number of particles, pressure and temperature (NPT) was set balancing 1 ns. The Parrinello-Rahman barostat was utilised for pressure coupling at 1 bar [66]. The simulation dynamics time was performed at 200 ns.

Free energy calculation
MM-GBSA (The molecular mechanics combined with the generalised Born surface area) approach was applied to compute the binding free energies of the ligand-protein complexes. The MM-GBSA binding free energy was estimated using programme g_mmpbsa [67,68]. Binding energy includes vacuum molecular mechanics (MM) potential energy of bound and unbound interactions, polar (G polar ) and non-polar (G nonpolar ) solvation energy. Free energy was characterised by the expression provided below Equation (20): The polar solvation energy was computed by solving the Poisson-Boltzmann (PB) equation. The non-polar solvation energy was determined based on the solvent-accessible surface area Area model (SASA). Energy components (E MM , G polar and G nonpolar ) were compiled from 0 ns to 200 ns of the MD trajectory.

In silico pharmacokinetics ADME/Tox study
The practically ADME/Tox examination (absorption, distribution, metabolism, excretion and toxicity) is a long and expensive process. Still cannot meet the requirements of drug testing. Today, there are many online tools and offline software available that help us predict this drug candidate behaviour with rapidity. In this review, we focused on ADME/toxicity prediction to measure pharmacokinetic properties and parameters in silico [69]. For this, we used the pkCSM online tool (http://biosig.unimelb.edu.au/pkcsm/) and SwissADMET online server (http://www.swissadme.ch/), to predict the ADME/Tox properties of new proposed compounds and the compound N40 [70].

QSAR results
QSAR models for the two sorts of target functions TF 1 (without IIC) TF 2 (with IIC) were constructed using Equations (3) and (4) respectively, in order to select coherent statistical performance. The results of the statistical indicators obtained for the generated QSAR models of antiproliferative inhibitors based on the SMILES descriptors for four random divisions are recorded in Table S2 (see supporting information file). The results reveal that the model (TF 1 ) shows statistical instability throughout four splits carried out (Equations (21)-(24)). On the other hand, we can point out that adding the IIC (TF 2 ) on weight to the model improves its predictive capacity pIC 50 (Equations (25)-(28)).
The threshold T and the number of epochs N were chosen to generate the optimal statistical indicators for the calibration set. In the case of TF 1 which eliminates the influence of IIC on the activity (pIC 50 ), the optimal values (T, N epoch ) for splits 1, 2, 3 and 4 were found to be (6,15), (9,15), (8.15), and (1.15), respectively. On the other hand, for TF 2 considering the influence of IIC on the activity, the best (T, N epoch ) were (7,15), (9,15), (4,15) and (3,15)  The good predictability and reproducibility of developed QSAR models had been proved by the values of R 2 , Q 2 , CCC, Q²F 1 , Q²F 2 , Q²F 3 and metric R 2 m Table S2. The QSAR models developed by TF 2 have statistically significant indicators at the calibration and validation set level, for all splits of the dataset. The results of these statistical test clearly demonstrate that the models developed by TF 2 have better prediction performance than the model developed by TF 1 The numerical value of the coefficient of determination calculated by TF 2 for four splits are reliable, split-1: Q² = 0.825, split-2: Q² = 0.655, split-3 = 0.645, split-4: Q² = 0.670, with an advantage of split-1, because it has the highest Q² (Q² = 0.825), so it is designated as the optimal model ( Table 2).
As shown in Table 2, all compounds in the database are within the defined AD and all have typical behaviour for splits 1-3 and do not contain Outliers, while split-4 defines three aberrant molecules. Hence, the established QSAR models for splits 1, 2 and 3 were self illustrative in terms of their Applicability domain.

Validation externe
The best models established four each split determined by the target function TF 2 calculated with Equation (4) were also tested according to the validation criteria of Golbraykh and Tropsha [71]. The result of metric R²m, R*m² and Average Rm 2 shows that the developed QSAR models have high reliability and consistency for all splits. Generally, all these selected models have successfully passed all the validation criteria, as shown in (Table 3), it should be point out that split-1 remains the best, and shows optimal values for all calculated metrics.

Randomisation test
A randomisation test with 10 iterations for the training set, invisible training set, and calibration set was used to evaluate the selected QSAR models' robustness Table S3 (see supporting information file). The Y randomisation test reveals that the value of C Rp 2 was more than 0.5, indicating that the created models were not subjected to random variation [72] ( Table 4).
The graphics of experimental pIC 50 and the calculated pIC 50 and residual evolutions for four models created by the target function TF 2 Equation (4) is presented in Figures 2  and 3.

Mechanistic interpretation of QSAR models
According to OECD principles, the established QSAR model must be interpretable mechanistic, which implies that structural molecular information can be extracted from it. This principle is respected by the Monte Carlo method, which has revealed optimal molecular descriptors linked directly by molecular fragments. After recording the numerical data collected from three Monte Carlo optimisation runs, it can be emphasised that features extracted from SMILES with positive correlation weights in all runs promote an increase in the endpoint, whereas those with negative correlation weights promote a decrease in the endpoint in all analyses. Tables 5 and 6 give a list of all the SA k , along with the correlation weight values for three Monte Carlo optimisation runs of the constructed QSAR model.
As seen in the data presented in Tables 5 and 6, the SA k s which have a positive value can therefore be classified as promoters of the increase in the pIC 50 (Table 5).
On the other hand, the SA k with negative values can therefore be classified as promoters of the decrease in the value of pIC 50 Table 6). The SA k analysis presented in Table 5 can be used to guide us in proposing new potent inhibitors of 1,3,4-thiadiazole derivatives with improved pIC 50 values.

Designed molecular docking results and analysis
Taking into account the appropriate molecular structural characteristics obtained from the SMILES notation based on optimal molecular descriptors (SA k ), we have designed by molecular modelling seven potent compounds derived from thiazolidine as inhibitors of EGFR, as shown in Table 8. The most active molecule N40 of data set was considered as a model molecule. When the selected model is applied for the calculation of DCW for the model molecule, the final obtained value of pIC 50 is 6.083. But when the reference molecule is replaced by the fragments 'O … … … … … … ' and '++++ O -S ===' which are both classified as promoters of increased pIC 50 . The molecules TA4 and A7 were designed with higher pIC 50 values than that of the molecule model N40 were respectively 6.411 and 6.889. On the other hand 'C … . (… .' and 'N … … … … ' are classified as promoters of the decrease in pIC 50 , when they were added to the reference molecule we obtained molecules B1 and B2 with lower values for pIC 50 compared to model molecule N40, were 5.957 and 5.778, respectively ( Table 7).
The improvement range of the calculated pIC 50 values for all newly designed compounds is from 6.449 to 9.904. Compounds A1 and A2 remain the best-designed compound with values of 9990 and 9789 respectively. We also applied Lipinski's Rule of Five to the newly designed molecules as shown in (Table 8). These candidates may be beneficial for medicinal chemistry to develop a more potent antiproliferative inhibitor.

In silico molecular docking analysis
To understand the mechanism of the binding interactions of the newly designed molecules, the N40 molecule, the control compounds, a molecular docking study was performed. The binding site used to dock all the molecules was identified as erlotinib binding site. melphalan and erlotinib are used as control compounds and compare their mode of binding with respect to the proposed molecules in the receptor binding site. In addition, his reference ligand will also be used to calculate the RMSD of the docking protocol to validate the docking procedure. Autodock vina generated nine conformations, the most suitable confirmation for the study were selected according to their binding affinity to the active site. A low binding mode affinity and a binding mode similar to that of the reference ligand reflect a good biological effect. Figures 4 collate the 2D and 3D interactions between the compounds screened and the residues of the binding site of the receptor. Compound A1 exhibited a hydrogen bond to the oxygen atom of the amide group and the residue Arg-817. The additional aromatic ring showed additional interactions πσ and π -π stacked with Val-702 and Leu-694, respectively. In the case of the compound, A2 showed a hydrogen bonding with the oxygen of the methoxy group and Gly-697, the added phenyl ring forms three π -σ interactions with Ala-719, Leu-694, and Leu-820. Amide group oxygen and methoxy group oxygen of compound A3 formed hydrogen bonding interactions with Met-769 and Lys-721, the added aromatic group formed π -sulphur interaction with Met-742. In the A4 molecule, the methoxy oxygen group forms a hydrogen bonding interaction with Cys-751, the addition of the SO 3 H group allows two additional hydrogen bonding type interactions to be formed with Met-769 and Gln-767. In addition, molecules A5 and A7 show no hydrogen bond interactions but the insertion of the Pyrrole group gave rise to two π -π stacked interactions type with Ala-719 in compound A5, and the addition of OH group formed a pi-Alkyl type interaction with Val-702 in A7 compound. In the A6 compound, two hydrogentype interactions have appeared with the nitrogen atom of the thiazolidine and Lys-721, and the nitrogen atom of amide groups and Asp-813. In contrast, the seven generally designed molecules showed a very strong affinity energy    in the case of compound A3. In the case of melphalan, two hydrogen bonds are observed with Asp-831 and Thr-766. At the level of binding affinity energy, erlotinib and melphalan presented affinity values lower than the proposed molecules as given values −7.8 kcal/mol, −6.2 Kcal/mol, respectively (Table 9). Generally, these results clearly show that the molecules proposed to show a strong affinity in the site of binding of erlotinib compared to two control compounds. The validation of the docking technique used was carried out by the redocking method the value of RMSD obtained is less than 2 (1.283 Å) which confirms the validation of the docking applied ( Figure 5).

ADME/Tox prediction of new molecules
Drug development cannot be studied without taking into account the estimate of ADME/toxicity. The bioavailability of the newly designed molecules and the most potent inhibitor in the data set N40 was estimated, to ensure that the proposed candidates are viable drugs and have synthetic accessibility, we estimated certain pharmacokinetic properties (ADME/Tox).  Table 7. SMILES notation and pIC 50 predict of compounds designed with the application of Promoter of endpoint decrease of QSAR model obtained using the target function calculated by Equation (25). As seen in Table 10, All of the designed compounds are within the ranges of acceptable water solubility and intestinal adsorption, where they exhibit a high intestinal absorption percentage of the oral drug (>90%), with the exception of compound A4, 30.46%, these results indicating that these drugs could have a possibility in an oral formulation. When the drug is absorbed into the body, it passes through several membranous barriers such as hepatocyte membrane, gastrointestinal epithelial cells, blood-brain barrier. When the drug is absorbed into the body, it passes through several membranous barriers such as hepatocyte membrane, gastrointestinal epithelial cells, blood-brain barrier. The permeability to the brain was estimated in terms of the permeability of the blood-brain barrier (BBB and CNS permeability). The six compounds except A4 have located at the values LogBB < −0.5 and Log PS <−3, which means that they are poorly distributed and difficult to move in the CNS. Therefore, these compounds were unlikely to cross the blood-brain barrier.
Drug metabolism has been estimated by cytochrome P450, in particular CYPs (1A2, 2C9/19, 2D6 and 3A4) being responsible for the biotransformation of more than 90% of drugs in phase I metabolism. The six newly designed compounds except A4 are inhibitors of CYP1A2, CYP 2C9/2C19 and, for CYP3A4. These compounds are inhibitors and substrates of CYP3A4.
In terms of toxicity, the evaluation was carried out by seven parameters as shown in (Table 11). The AMES test showed that all of the newly designed candidates were non-mutagenic and non-carcinogenic. Furthermore, the acute toxicity (LD50) of the compounds ranged from 1.919 to 2.813 mol/kg, while the chronic toxicity (LOAEL) of the compounds ranged from 0.557 to 2.502 mg/kg body weight/day. these results generally obtained, indicate that all the candidates do not present any potential toxicity. The synthetic accessibility values were assessed, as can be seen in Table 11. The new compounds and compound N40 display close together and encouraging values.

MD simulations results
In order to follow the conformational evolution of stable binding and to confirm the docking results, we performed MD simulations on the newly designed compound A1, compound N40 and the two control compounds (erlotinib and Melphalan) in complex with the site of protein binding for 200 ns.
The quantitative parameters RMSD and RMSF are employed to estimate the structural stability of the proteinligand complex. The RMSD values of the atoms of the protein backbone compared to the initial structure were estimated to evaluate the structural stability of the proteins during the simulation period.
The evolution of the RMSD values of the atoms of the protein backbone compared to the initial structure was estimated for the four complexes in Figure 6. The complex relating to compounds A1 reaches an equilibrium level around 0.16 nm quickly after 10 ns, whereas, the RMSD of the complex corresponding to compounds N40 and erlotinib did not fluctuate significantly and showed that the RMSD values of the three systems converged to 0. 23   In addition, the compactness of the proteins was expressed in radius of gyration Rg (nm), this parameter was monitored over a simulation time of 200 ns as shown in Figure 8. The calculated Rg values do not show strong variation during the simulation time, and are located between 2.2 and 2.03 nm for the four systems which confirms the stability of the four complexes.

Free binding energy calculations
To quantify the binding affinity between the protein and the selected ligands, the MM-GBSA method was applied to calculate the energy and determine the energy breakdown. The binding free energy was calculated by kinetic conformation derived from 0 ns −200 ns. The binding free energy and energy components of compound A1, Compound N40, erlotinib and Melphalan are depicted in Table 12. Free energy calculations based on MM-PBSA were performed for the newly designed compounds A1, the more active N40 and the two control compounds. The contributions of van der Waals energy, electrostatic energy, polar and non-polar energy to the total binding energy have been collected for all four complexes in Table 12 From the results of the free energy calculations, it was observed that van der Waals energie presented the main contributions to the total binding energies were −101.337, −49.51, −190.602 and −77.9 kJ/mol for compounds A1, N°40, erlotinib and Melphalan, respectively. We can also notice the high value of polar solvation especially in the witness compound erlotinib and compound A1 which shows the interesting role of solvation energy in the stability of these complexes. The   lower value of free energy confirms the good stability of Compound A1 designed by our model and therefore these results are in agreement with the results of molecular docking.

Conclusion
In this manuscript, the antiproliferative activity of a data set of 65 compounds has been clarified using optimal descriptors based on SMILES notation. The 2D-QSAR models developed are robust, reliable in predictive terms. From the interpretation of the best QSAR model, various structural attributes (SA k ) were selected as promoters to increase antiproliferative inhibition. Thus according to the principles of the OECD, the QSAR model developed remains applicable to design new antiproliferative inhibitors.
The results obtained from the best model lead us to the design of seven new inhibitors with high activity. In addition, molecular docking studies of the proposed candidates revealed ligand binding interactions with EGFR receptor binding site residues. Erlotinib and Melphalan were recalled as control compounds to determine the interactions with the active site. In addition, the ADME/Tox properties of the newly designed compounds and the N40 compound were analyzed and shows good results compared to the N40 compound. The newly designed candidate A1, the compound N40 and the two compounds control were chosen to assess dynamic stability in the protein sites over a simulation time of 200 ns. The results show that the four complexes have good overall stability, with the advantage of a newly designed complex. The calculation of the free energy by the MM-GBSA method confirms the system's stability   with a higher stability of A1, validating the molecular docking results.