Efficient estimation of MMGBSA-based BEs for DNA and aromatic furan amidino derivatives

Molecular mechanics with Generalized Born surface area (MMGBSA) binding energies (BEs) derived from molecular dynamics (MD) trajectories are highly reliable and extensively used standards for estimating the strength of interactions between ligands and their receptors. MD simulations (5 ns) for 30 aromatic furan amidino derivatives (anti-Pneumocystis carinii agents) were carried out using the Amber program, and BEs were calculated using the Generalized Born (GB) method. Based on the generated data, we present a simple and effective method for approximating BEs without performing MD simulations and MMGBSA calculations. Quantum chemical (density functional theory based) and geometrical descriptors are used to predict the BE values. All the developed models are statistically significant, with high correlation and cross-validation coefficients. The predictive ability of the models was tested by dividing the data-set into four different training and test sets; the average error was only 4–7% (1.56–2.61 kcal/mol) of the actual BEs.


Introduction
Dicationic aromatic amidine molecules bind noncovalently to the AT-rich region of the DNA minor groove (Francesconi et al., 1999; Hopkins et al., 1998). These molecules are effective agents against various organisms such as Pneumocystis carinii, Giardia lamblia, and Cryptosporidium parvum (Bell et al., 1993; Boykin et al., 1995, 1998; Brendle et al., 2002; Francesconi et al., 1999; Hopkins et al., 1998; Lombardy et al., 1996; Tidwell et al., 1993). It has been shown that the activity of these molecules is directly proportional to their DNA binding affinity. Numerous attempts have been made to design 2,5-bis[4-(N-alkylamidino)phenyl]furans, 2,4-bis(4-amidinophenyl)furans, and extended aromatic furan amidino derivatives with enhanced efficacy and DNA minor groove binding affinity (Francesconi et al., 1999; Hopkins et al., 1998). Several factors were found to be important for enhanced DNA binding affinity, such as electrostatic interactions, hydrogen bonding, van der Waals interactions, and the radius of curvature of the molecule (Francesconi et al., 1999). In general, a larger aromatic region leads to higher binding affinity, probably owing to the increased potential for various kinds of noncovalent interactions (Vijay et al., 2008). Other important factors that determine drug activity include cell membrane permeability and nonspecific binding (Francesconi et al., 1999).
The DNA minor groove is the target of a large number of drugs, such as the antiviral agents netropsin and distamycin (Lown et al., 1989; Zimmer, 1975), the anti-Pneumocystis carinii drugs pentamidine and furamidines (de-Souza et al., 2004; Stuedli et al., 2011), the antiseptic and disinfectant propamidine (McDonnell et al., 1999), the trypanocidal drug berenil (Othman et al., 2004), the Hoechst dyes, which are experimental antitumor agents and important DNA stains (Gravatt et al., 1994; Purschke et al., 2010), and various antitumor compounds (Braithwaite et al., 1980). Several minor groove binders showed potent antitumor activity in preclinical studies, and some of them reached advanced stages of clinical trials. These include the duocarmycin derivatives adozelesin (Burris et al., 1997; Foster et al., 1996), carzelesin (Houghton et al., 1996; Li et al., 1992), bizelesin (Carter et al., 1996; Walker et al., 1994), and KW-2189 (Kobayashi et al., 1994), and the distamycin A derivative tallimustine (Pezzoni et al., 1991). The sequence-selective DNA minor groove binding agent SJG-136 was in phase I clinical trials against advanced solid tumors (Hochhauser et al., 2009; Puzanov et al., 2011). Ecteinascidin 743 displayed potent antiproliferative activity against a variety of tumor cells and entered phase II clinical trials (García-Nieto et al., 2000). The DNA minor groove binder brostallicin (…196, an α-bromoacrylamido tetrapyrrole derivative) showed high cytotoxic potency and antitumor activity (Lorusso et al., 2009). Several groups study the interaction of small molecules with the DNA minor groove. For example, Dervan's group developed the biomimetic approach to DNA recognition that underpins the design of cell-permeable molecules for regulating gene expression in vivo, and studied the interactions of small molecules (imidazopyridines, benzimidazoles, polyamides, etc.) with DNA (Chenoweth et al., 2009; Renneberg et al., 2003; Warren et al., 2005).
The influence of the solvent on the BEs of various nonintercalating antibiotics (netropsin, distamycin, berenil, stilbamidine, etc.) to DNA was estimated by combining the effect of the first hydration shell with that of bulk water (Zakrzewska et al., 1984). The nonintercalative binding of DAPI to the minor groove of double-stranded DNA oligomers was explored by computing the corresponding intermolecular interaction energies (Gresh, 1985). The interaction energy of the diarylamidines berenil and stilbamidine with DNA (the A-T-rich region of the minor groove) was computed using an additive procedure (Gresh et al., 1984). The technique of partially restrained molecular mechanics enthalpy minimization, which allows the sequence dependence of the DNA binding of a nonintercalating ligand to be examined, was developed and applied to analyze the binding of berenil to the DNA minor groove (Laughton et al., 1990). Stacking is also an important interaction between minor groove binders and DNA.
Molecular docking is a simplified computational representation of DNA-ligand interactions and usually fails to predict binding affinities accurately (Ferrara et al., 2004; Warren et al., 2006). Although various docking protocols include protein flexibility, the predicted binding affinity is usually based on a single protein-ligand complex structure (Badrinarayan et al., 2011; Srivani et al., 2007). Postprocessing methods overcome the weaknesses of the simple scoring functions used in popular docking protocols by including dynamic information in the energy calculation, which gives a comparatively more accurate estimate of the free energy of binding. These postprocessing methods generate trajectories with molecular dynamics (MD) or Monte Carlo (MC) simulations and then estimate the free energy from those trajectories using free energy perturbation (FEP) (Zwanzig et al., 1954), thermodynamic integration (TI) (Kirkwood et al., 1935), linear interaction energy (LIE) analysis (Aqvist et al., 1994), or molecular mechanics Poisson-Boltzmann or Generalized Born surface area (MMPBSA/GBSA) calculations (Kollman et al., 2000; Srinivasan et al., 1998). MMGBSA and LIE are recent, computationally efficient methods suitable for free energy estimation over a diverse set of ligands, and MMGBSA can be applied to any drug-ligand system without additional regression (Thompsan et al., 2011).
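For orientation, the MM/GBSA estimate referred to above is commonly written as the decomposition below. This is the standard textbook form (after Kollman et al., 2000), not an equation reproduced from this study. In particular, the study does not state whether a solute entropy term −TS was included in the reported BEs, so treat that term as optional here.

$$
\Delta G_{\text{bind}} = \langle G_{\text{complex}} \rangle - \langle G_{\text{DNA}} \rangle - \langle G_{\text{ligand}} \rangle
$$

$$
G = E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - TS, \qquad
E_{\text{MM}} = E_{\text{bonded}} + E_{\text{ele}} + E_{\text{vdW}}, \qquad
G_{\text{SA}} = \gamma\,\text{SASA} + b
$$

Here ⟨·⟩ denotes an average over trajectory snapshots, G_GB is the Generalized Born polar solvation energy, and G_SA is the nonpolar solvation term from the solvent-accessible surface area with empirical constants γ and b. When the DNA and ligand geometries are taken from the complex snapshots (the single-trajectory approach), the bonded terms cancel in ΔG_bind.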
Although the combination of MD and MMGBSA is a good approach for studying drug-ligand interactions, it is computationally expensive, which limits its routine use solely for estimating DNA binding affinities. Considering this, we propose a simple protocol for predicting DNA binding affinity. For this purpose, thirty aromatic furan amidino derivatives with experimental DNA binding data were collected (Francesconi et al., 1999; Hopkins et al., 1998). MD simulations (5 ns) were performed on all 30 derivatives to generate trajectories, followed by MMGBSA calculations to estimate the BEs. The estimated BE values were then used as the data-set of actual BE values for generating density functional theory (DFT) based quantitative structure-activity relationship (QSAR) models to predict them. QSAR is an important and widely accepted tool for understanding many aspects of chemical and biological interactions (Katritzky et al., 2001; Leach, 2011; McKinney et al., 2000) and has been applied to a variety of systems (Bohari et al., 2011; de Jonge et al., 2005; Janardhan et al., 2006; Pasha et al., 2009; Ravindra et al., 2008; Schultz et al., 2003; Srivastava, 2009; Srivastava et al., 2009). These methods are based on the axiom that the variance in the activities, toxicities, or physicochemical properties of chemical compounds is determined by the variance in their molecular structures. Quantum chemical descriptors have been used frequently in QSAR studies, especially in recent years, because many of these theoretical descriptors encode a large amount of well-defined physical information. After successfully predicting the BE values, we extended the study to predict the ΔTm values of these molecules (the thermal stabilization of the DNA-ligand complex, defined as the difference between the melting temperature of the DNA-ligand complex and that of DNA alone) using quantum chemical, topological, and docking fitness score-based descriptors.
The ΔTm prediction models were generated on 28 aromatic furan amidino derivatives and validated by generating four different training and test sets. The current study provides reliable and effective predictions of MMGBSA-based BE values and also presents a valuable model for predicting ΔTm values.

Materials and methods
Twelve derivatives of 2,5-bis[4-(N-alkylamidino)phenyl]furan, nine derivatives of 2,4-bis(4-amidinophenyl)furan (Francesconi et al., 1999), and nine derivatives of extended aromatic furan (Hopkins et al., 1998) were collected with experimental ΔTm values. All the derivatives were fully optimized by DFT, using the hybrid three-parameter Becke-Lee-Yang-Parr (B3LYP) functional (Becke, 1993; Lee et al., 1988) with the 6-31G(d) basis set (Ditchfield et al., 1971) in the Gaussian 03 program package (Frisch et al., 2003). Frequency calculations confirmed that all the obtained stationary points are minima on the potential energy surface. Unless otherwise stated, descriptors were calculated with the Codessa (Comprehensive Descriptors for Structural and Statistical Analysis) program (Katritzky et al., 1994, 1995), and regression analysis was carried out with the Project Leader program of the Scigress Explorer package (Scigress, 2008). Codessa is a QSAR/QSPR approach that computes hundreds of structural parameters, including constitutional, topological, geometrical, electrostatic, and quantum chemical descriptors. Molecular docking of these compounds was carried out with the GOLD (Genetic Optimization for Ligand Docking) program (Jones et al., 1997). The DNA duplex d(CGCGAATTCGCG) was used as the receptor in all docking calculations because the experimental values were measured on this DNA sequence. MD simulations with explicit solvent (~10,000 water molecules) were run for 5 ns for all the derivatives using the Amber 8.0 program (Jakalian et al., 2000; Wang et al., 2004), and binding energies were estimated using MMGBSA calculations.

Docking
The Sybyl 6.9.2 program (Tripos Inc., St Louis, MO, USA) was used for input preparation (Sybyl, 2004). All the derivatives were minimized to a root-mean-square gradient of .001 kcal mol⁻¹ Å⁻¹ using the MMFF94 force field and point charges. All the considered compounds were optimized at the B3LYP/6-31G(d) level, and the final geometries were used as input for minimization in Sybyl 6.9.2. The DNA duplex d(CGCGAATTCGCG) was extracted from Protein Data Bank entry 1VZK (Mallena et al., 2004), explicit hydrogens were added, and the hydrogens were energy-minimized to a root-mean-square gradient of .01 kcal mol⁻¹ Å⁻¹. GOLD docking has recently been shown to be the best docking protocol for reliably reproducing the crystallographic poses of DNA-ligand complexes and has been used to study various DNA-ligand complexes (Kamal et al., 2009, 2010a, 2010b, 2010c). Thus, in the present study, we performed the docking calculations with the GOLD 3.2 program, which uses a genetic algorithm (GA). This method allows partial flexibility of the receptor and full flexibility of the ligand during docking. For each of the 10 independent GA runs, a maximum of 100,000 operations was performed with a population size of 100 individuals. Default values of niche size (2) and selection pressure (1.1) were used, and the operator weights for crossover, mutation, and migration were set to 95, 95, and 10, respectively. Default cut-off values of 2.5 Å for hydrogen bonds and 4.0 Å for van der Waals distances were employed. The 10 best conformations were generated for each derivative, and the pose with the best score was selected for MD simulations.

MD simulations
The Amber 8.0 program was used for MD simulations of the selected docked poses. The 'leaprc.gaff' (general AMBER force field) was used to prepare the ligands, while 'leaprc.ff03' was used for the DNA. The 'addions' command in 'xleap' of AMBER 8.0 was used to add Na⁺ ions explicitly to neutralize the system. Each system was placed in a rectangular box of TIP3P water using the 'SolvateOct' command, with the minimum distance between any solute atom and the box boundary set to 10 Å. The solvated complex was equilibrated by a short minimization (500 steps each of steepest descent and conjugate gradient), 50 ps of heating, and 50 ps of density equilibration with weak restraints on the complex, followed by 500 ps of constant-pressure equilibration at 300 K. Adequate cut-off sizes are necessary for reliable MD results; therefore, a cut-off of 12.0 Å was used. Long-range electrostatics were included by means of the particle mesh Ewald (PME) method (Darden et al., 1993). All bonds involving hydrogen atoms were constrained with the SHAKE method, and simulations were performed with a 2 fs time step and Langevin dynamics for temperature control. The production run used the same conditions as the final phase of equilibration, and coordinates were recorded every 10 ps. Periodic boundary conditions (PBC) were used during the MD simulations. Before starting the production run, we verified that the system was equilibrated. Five hundred equally spaced snapshots of the complex (every 10 ps) were taken from the MD trajectories, and all water molecules and counterions were removed before the MMGBSA calculations. Coordinates were extracted using the 'extract_coords.mmpbsa' script, and the BE values were calculated using the 'binding_energy.mmpbsa' script.
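The snapshot bookkeeping above (5 ns sampled every 10 ps gives 500 frames, whose per-frame BEs are averaged) can be sketched as follows. This is a minimal illustration, not the Amber scripts themselves: the function names are hypothetical, the per-frame energies in the usage block are synthetic, and the block-averaged standard error is an added assumption (the study reports only the mean), included because consecutive MD frames are correlated.

```python
"""Sketch: average per-snapshot MM/GBSA binding energies (hypothetical data)."""
import math
import random
import statistics


def n_snapshots(production_ns: float, interval_ps: float) -> int:
    """Number of equally spaced frames saved from a production run."""
    return round(production_ns * 1000.0 / interval_ps)


def average_binding_energy(per_frame: list[float], n_blocks: int = 5) -> tuple[float, float]:
    """Return the mean binding energy and a block-averaged standard error.

    Consecutive MD frames are correlated, so the standard error is taken
    from the spread of block means rather than from individual frames.
    """
    mean = statistics.fmean(per_frame)
    block_len = len(per_frame) // n_blocks
    block_means = [
        statistics.fmean(per_frame[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    sem = statistics.stdev(block_means) / math.sqrt(n_blocks)
    return mean, sem


if __name__ == "__main__":
    frames = n_snapshots(5.0, 10.0)  # 5 ns sampled every 10 ps
    rng = random.Random(0)
    # Hypothetical per-snapshot values in kcal/mol, NOT data from this study.
    dg = [rng.gauss(-40.0, 3.0) for _ in range(frames)]
    mean, sem = average_binding_energy(dg)
    print(f"{frames} snapshots: <dG_bind> = {mean:.2f} +/- {sem:.2f} kcal/mol")
```

With five blocks, each block covers 100 frames (1 ns), which keeps block means roughly independent for a 5 ns run.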
The reported BE values, obtained with the GB method (a semi-analytical approximation to continuum electrostatics; Edinger et al., 1997), are averages over all 500 snapshots. Root-mean-square deviation (RMSD) plots were constructed separately for the complexes, DNA, and ligands using the Xmgrace program, and images of snapshots at various time points were generated using the VMD program.
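The RMSD-versus-time analysis can be illustrated with a generic Kabsch superposition. This is a sketch under stated assumptions: the study does not say which atoms were used for fitting, and the function names are illustrative. Here each frame is fitted and measured on the same atom set; for a ligand inside a complex, a common alternative is to fit on the DNA atoms and measure the ligand without refitting.

```python
"""Sketch: RMSD after optimal superposition (Kabsch algorithm) with NumPy."""
import numpy as np


def kabsch_rmsd(coords: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after removing translation and rotation."""
    p = np.asarray(coords, dtype=float)
    q = np.asarray(ref, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsd_series(frames, ref) -> list[float]:
    """RMSD of every frame against a reference structure (e.g. the first frame)."""
    return [kabsch_rmsd(f, ref) for f in frames]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(20, 3))  # hypothetical coordinates in Å
    drifted = [ref + s * rng.normal(size=ref.shape) for s in (0.0, 0.2, 0.5)]
    print(rmsd_series(drifted, ref))
```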
Using the finite difference approximation and Koopmans' theorem, the above derivatives can be estimated as follows.
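A minimal sketch of the usual finite-difference/Koopmans estimates is given below: I ≈ −E_HOMO, A ≈ −E_LUMO, μ ≈ −(I + A)/2, η ≈ (I − A)/2, and ω = μ²/2η. These are the common textbook forms, assumed here rather than taken from the study's own equations. Conventions differ: some authors define η = I − A, which halves ω. The function name and the conversion constant are illustrative.

```python
"""Sketch: global conceptual-DFT descriptors from frontier orbital energies."""

HARTREE_TO_EV = 27.211386  # 1 hartree in eV (CODATA value, rounded)


def conceptual_dft(e_homo_ev: float, e_lumo_ev: float) -> dict[str, float]:
    """Finite-difference/Koopmans estimates of I, A, mu, eta and omega (all in eV).

    Assumes eta = (I - A) / 2; with the alternative eta = I - A,
    the electrophilicity index omega comes out half as large.
    """
    ionization = -e_homo_ev  # Koopmans: I ~ -E_HOMO
    affinity = -e_lumo_ev  # Koopmans: A ~ -E_LUMO
    mu = -(ionization + affinity) / 2.0  # electronic chemical potential
    eta = (ionization - affinity) / 2.0  # chemical hardness
    omega = mu ** 2 / (2.0 * eta)  # electrophilicity index
    return {"I": ionization, "A": affinity, "mu": mu, "eta": eta, "omega": omega}


if __name__ == "__main__":
    # Hypothetical B3LYP orbital energies in hartree (Gaussian prints hartree).
    print(conceptual_dft(-0.2205 * HARTREE_TO_EV, -0.0735 * HARTREE_TO_EV))
```

Because ω is reported in eV in this study, the orbital energies are converted from hartree before the descriptors are formed.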
The heuristic method (a statistical technique for descriptor selection and correlation available in the Codessa program), which follows a sensible and intuitive pathway for eliminating variables from consideration, was used for descriptor optimization. The intercorrelation between descriptors was tested, and highly intercorrelated descriptors were eliminated to avoid chance correlation. The uniqueness of a molecule and its total chemical information cannot be described by very few descriptors, while a large number of descriptors reduces the interpretability and statistical robustness of the model. Therefore, models were built with the number of descriptors increased from 1 to 10, and three-descriptor models were found to be optimal. Almost all possible combinations of descriptors were tested to select the final descriptors. The Project Leader program of the Scigress Explorer package (Scigress, 2008) was used for multiple linear regression (MLR) analysis. The entire data-set was divided into four different training and test sets to rigorously validate the predictive ability of the QSAR models. The current set of derivatives was collected from three references (Francesconi et al., 1999; Hopkins et al., 1998). The first test set contains all 2,5-bis[4-(N-alkylamidino)phenyl]furan derivatives, the second all 2,4-bis(4-amidinophenyl)furan derivatives, and the third all extended aromatic furan amidino derivatives. The fourth test set was constructed by randomly selecting 13 compounds from all three references.
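The descriptor screening and MLR steps above can be sketched as follows. This does not reproduce Codessa's heuristic method or Scigress Project Leader. It substitutes a simple greedy intercorrelation filter (the |r| threshold of 0.8 is an assumption; the study does not give one) and an ordinary least-squares fit, together with a series-based split that places one whole chemical series in the test set. All names and the series labels are hypothetical.

```python
"""Sketch: intercorrelation filter + MLR fit on a series-based train/test split."""
import numpy as np


def drop_intercorrelated(x: np.ndarray, names: list[str], r_max: float = 0.8):
    """Keep descriptors greedily, skipping any with |r| >= r_max to one already kept."""
    keep: list[int] = []
    for j in range(x.shape[1]):
        if all(abs(np.corrcoef(x[:, j], x[:, k])[0, 1]) < r_max for k in keep):
            keep.append(j)
    return x[:, keep], [names[j] for j in keep]


def fit_mlr(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [b_1, ..., b_p, intercept]."""
    design = np.column_stack([x, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply a fitted MLR model (last coefficient is the intercept)."""
    return np.column_stack([x, np.ones(len(x))]) @ coef


def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Coefficient of determination R^2."""
    ss_res = float(((y - y_hat) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def series_split(series: np.ndarray, test_series: str):
    """Boolean masks putting every compound of one chemical series in the test set."""
    test = series == test_series
    return ~test, test


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(30, 3))  # hypothetical descriptor matrix
    y = x @ np.array([-1.2, -0.4, 11.0]) + 31.6 + rng.normal(scale=1.5, size=30)
    series = np.array(["A"] * 12 + ["B"] * 9 + ["C"] * 9)  # hypothetical series labels
    train, test = series_split(series, "A")
    coef = fit_mlr(x[train], y[train])
    print("train R2:", r_squared(y[train], predict(coef, x[train])))
    print("test  R2:", r_squared(y[test], predict(coef, x[test])))
```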
Leave-one-out, leave-two-out, leave-five-out, and leave-ten-out validation methods were used for cross-validation of the results. A similar methodology was adopted for generating predictive models for ΔTm values; however, conceptual DFT descriptors were not used in this case. A three-descriptor model (MVN, GS, and AIC2) was found to be optimal for predicting ΔTm values.

Results and discussion
Thirty aromatic furan amidino derivatives with experimental ΔTm values were collected from the literature and are presented in Figure 1. These molecules act by binding noncovalently in the AT-rich region of the DNA minor groove (Brovarets et al., 2012; Yurenko et al., 2011). Thus, molecular docking calculations were performed using the GOLD program (see the methodology section for details of the docking protocol). All the derivatives docked in the AT-rich region of the DNA with reasonably good docking fitness scores. The pose with the best docking fitness score was selected for MD simulations (see the methodology section for details of the MD protocol). To monitor the systematic deviation of the docked complexes during the MD simulations, the RMSD was plotted as a function of time. RMSD plots of the complex, the DNA, and the ligand are presented separately in Figure 2 for a few selected derivatives (see Supporting information for the plots of all the derivatives). Figure 2 shows that compounds with higher ΔTm values have lower RMSD deviations, and vice versa. Figure 3 shows snapshots at various time points for the selected derivatives. Figure 4 depicts the hydrogen-bonding pattern of the most potent derivative (4) in the DNA minor groove. This derivative forms six hydrogen bonds with adenine (A) and thymine (T) bases of the DNA (one hydrogen bond each with A-6, A-18, T-7, and T-20, and two with A-17). Next, MMGBSA calculations were performed on the MD trajectories to obtain BE values. BE values are a very reliable measure of the strength of interaction between ligands and their receptor. Based on the calculated BE values, we present an effective way to predict these values. Various kinds of descriptors (constitutional, geometrical, topological, electrostatic, quantum chemical, conceptual DFT based, etc.) were calculated for each derivative, and comparative QSAR studies were performed.
A conceptual DFT descriptor (electrophilicity index) and two geometrical descriptors (moment of inertia B and YZ shadow) were selected for the final model, and their values are presented in Table 1 (see the methodology section). … mobile molecules (Hovorun et al., 1999; Nikolaienko et al., 2011). YZ shadow is the area of the molecule's shadow projected onto the YZ plane, with the molecule oriented along its principal axes of inertia. The electrophilicity index measures the energy lowering of a ligand due to maximal electron flow between donor and acceptor (Parthasarathi et al., 2004; Pearson et al., 1997). Electrophilicity is a reactivity descriptor that allows a quantitative classification of the global electrophilic nature of a molecule on a relative scale, and it has been found important for describing the biological activity and toxicity of various inhibitors (Chattaraj et al., 2006; Parr et al., 1999; Parthasarathi et al., 2004; Pasha et al., 2005; Pearson et al., 1997). Initially, a QSAR model was generated for the whole data-set (using the combination of the MI_B, YZS, and ω descriptors), and the results were significant, with R² = .87. However, to rigorously validate the prediction model, we divided the data-set into four different training and test sets. The first test set was constructed by placing all the 2,5-bis[4-(N-alkylamidino)phenyl]furan derivatives in the test set. The resulting prediction model was statistically significant, with R² = .93 and R²cv = .83, Equation (1). The average residual (AE, the average of the absolute 'actual − predicted' BE values) was 1.3 kcal/mol for the training set and 1.9 kcal/mol for the test set, indicating the good predictive behavior of model-1. Figure 5a shows the plot of actual versus predicted BE values and indicates the excellent performance of the model.
$$\text{BE-P}^{a} = -1.17008\,\mathrm{MI}_B - 0.395716\,\mathrm{YZS} + 11.0375\,\omega + 31.5984 \qquad (1)$$

The second test set was constructed by placing all the 2,4-bis(4-amidinophenyl)furan derivatives in the test set. The QSAR model obtained with this test set was very good; however, its statistical parameters were slightly lower (R² = .81; R²cv = .68) than those of model-1 (R² = .93; R²cv = .83). Equation (2) gives the QSAR equation and statistical parameters for model-2. The AE for this model was 1.6 kcal/mol for the training set and 2.6 kcal/mol for the test set. Figure 5(b) …

The third test set was constructed by placing all the extended aromatic furan amidino derivatives in the test set. Good correlation coefficients were also obtained for this model (R² = .84; R²cv = .82), Equation (3). The AE of model-3 (1.5 kcal/mol for the training set and 2.6 kcal/mol for the test set) was comparatively higher for the test set. However, this model can still be used for prediction, as its percent error was only around 7%. Figure 5(c) … (4). The AE for the test set of model-4 (1.6 kcal/mol) was the lowest of all the models. Interestingly, the AE values for the test (1.6 kcal/mol) and training (1.6 kcal/mol) sets were essentially identical, which shows the wide applicability of this model. Figure 5d shows the plot of actual versus predicted BE values and indicates a good performance of the model.

The docking fitness score was above 80.0 for compounds with experimental ΔTm values greater than 12.0 °C, with one exception. However, no strong qualitative or quantitative relationship (R² = .34) between docking fitness scores and ΔTm values was observed. Thus, we also present predictive models for ΔTm values. The quantum chemical descriptor 'maximum valency of a N atom,' the 'GOLD fitness score,' and the topological descriptor 'average information content (order 2)' were used to generate the ΔTm prediction models, Table 2.
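To make the model-1 relation and the error metrics concrete, Equation (1) and the AE/percent-error scoring can be sketched as below. The coefficients are the reported ones. The percent-error normalization (AE divided by the mean absolute actual BE) is an assumption, since the exact definition behind the quoted 4–7% is not stated, and the function names are hypothetical. The example inputs are placeholders, not Table 1 values.

```python
"""Sketch: apply the reported Equation (1) and score predictions by AE and % error."""


def predict_be_model1(mi_b: float, yzs: float, omega: float) -> float:
    """Equation (1): BE-P^a = -1.17008*MI_B - 0.395716*YZS + 11.0375*omega + 31.5984.

    Inputs must be in the same units as Table 1 (omega in eV); result in kcal/mol.
    """
    return -1.17008 * mi_b - 0.395716 * yzs + 11.0375 * omega + 31.5984


def average_error(actual, predicted) -> float:
    """Mean absolute residual |actual - predicted| (kcal/mol)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_error(actual, predicted) -> float:
    """AE as a percentage of the mean |actual| BE (assumed normalization)."""
    mean_abs = sum(abs(a) for a in actual) / len(actual)
    return 100.0 * average_error(actual, predicted) / mean_abs


if __name__ == "__main__":
    # Placeholder descriptor values and BEs, NOT data from this study.
    descriptors = [(40.0, 60.0, 2.5), (45.0, 65.0, 2.7)]
    actual = [-38.0, -42.0]
    predicted = [predict_be_model1(*d) for d in descriptors]
    print(predicted, average_error(actual, predicted), percent_error(actual, predicted))
```

Taking the absolute residual matters for the training set: with an intercept in the least-squares fit, the signed residuals average to zero.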
The generated models were validated by dividing the data-set into four different training and test sets, Equations (5)–(8). All the models show similar performance for the prediction of ΔTm values (the AE for the training and test sets of all the models was around 1.00 °C). However, model-4 was statistically the best predictive model (R² = .82 and R²cv = .80). Figure 6 shows the plots of experimental versus predicted ΔTm values for all four sets.

Notes to Table 1: MI_B is a geometrical descriptor named moment of inertia B; YZS is a geometrical descriptor named YZ shadow; ω is a conceptual DFT descriptor named electrophilicity index, with values given in eV; BE-C is the binding energy calculated by the GB method after 5 ns MD simulations; BE-P^a is the binding energy predicted by QSAR Equation (1) (all the molecules from reference-1, i.e. 19–30, were used as the test set); BE-P^b is the binding energy predicted by QSAR Equation (2) (all the molecules from reference-2, i.e. 10–18, were used as the test set); BE-P^c is the binding energy predicted by QSAR Equation (3) (all the molecules from reference-3, i.e. 1–9, were used as the test set); BE-P^d is the binding energy predicted by QSAR Equation (4) (test set molecules were selected randomly, i.e. five from reference-1, four from reference-2, and four from reference-3); Res. is the residual (actual − predicted) of binding energies; and * marks the molecules in the test set. All the MI_B values are in kg·m². BE values and residuals are in kcal/mol.

Figure 5. Actual versus predicted BE values with (a) all the derivatives of 2,5-bis[4-(N-alkylamidino)phenyl]furan (19–30) in the test set, (b) all the derivatives of 2,4-bis(4-amidinophenyl)furan (10–18) in the test set, (c) all extended aromatic furan amidino derivatives (1–9) in the test set, and (d) a test set constructed by randomly taking 13 compounds from all three types of derivatives. BE-C is the actual binding energy obtained from MMGBSA calculations using 5 ns MD trajectories; BE-P is the binding energy predicted using Equations (1)–(4). Circles show the molecules in the test set.

Typically, an outlier is defined as any derivative for which the experimentally observed value differs substantially from the computed value. The outlier concept is employed when removing one or more data points from the data-set dramatically improves the statistical quality of a given QSAR model. In the current study there was no such problem, and the outlier concept was not used when generating QSAR models for binding energy prediction. Leave-one-out and leave-n-out techniques (both cross-validation methods) estimate the validity and applicability of a model by testing its predictive ability (Kiralj & Ferreira, 2009). Leave-one-out is performed by excluding each derivative of the data-set once, constructing a new model from the remaining derivatives, and predicting the activity of the excluded derivative. Leave-n-out, in contrast, excludes a set of n derivatives each time the model is constructed. We also examined the effect of this validation on the correlation coefficient values, a practice that may reduce the possibility of chance correlation. The generated QSAR equations were statistically validated using leave-one-out, leave-two-out, leave-five-out, and leave-ten-out techniques. Table 3 shows the statistical coefficient values for leave-n-out validation and indicates the wide applicability of the generated models.
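The leave-one-out and leave-n-out procedures described above can be sketched as a cross-validated R² (R²cv = 1 − PRESS/SS_tot) for an MLR model. How the study formed the groups of n excluded derivatives is not stated; this sketch assumes a random partition into consecutive groups of n_out compounds, so only the leave-one-out case is independent of that choice. The function name is hypothetical, and SS_tot is taken about the full-data mean, which is one common convention.

```python
"""Sketch: leave-n-out cross-validated R^2 (R2cv / Q^2) for an MLR model."""
import numpy as np


def leave_n_out_r2cv(x: np.ndarray, y: np.ndarray, n_out: int, seed: int = 0) -> float:
    """Hold out groups of n_out compounds, predicting each from a model fit on the rest.

    Returns R2cv = 1 - PRESS / SS_tot. With n_out = 1 this is leave-one-out,
    and the random grouping has no effect.
    """
    order = np.random.default_rng(seed).permutation(len(y))
    press = 0.0
    for start in range(0, len(y), n_out):
        test = order[start:start + n_out]
        train = np.setdiff1d(order, test)
        design = np.column_stack([x[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        y_hat = np.column_stack([x[test], np.ones(len(test))]) @ coef
        press += float(((y[test] - y_hat) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(30, 3))  # hypothetical three-descriptor data-set
    y = x @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.5, size=30)
    for n in (1, 2, 5, 10):
        print(f"leave-{n}-out R2cv = {leave_n_out_r2cv(x, y, n):.3f}")
```

Larger held-out groups leave fewer compounds for each refit, so R²cv typically drops as n grows. A model whose R²cv stays close to R² across n = 1, 2, 5, and 10 is less likely to reflect chance correlation.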

Conclusions
A computational approach that combines quantum mechanics, docking, molecular dynamics, and statistical analysis to model the energetic patterns of the interactions between DNA and its ligands is presented. It has been demonstrated that a conceptual DFT descriptor (electrophilicity index, ω), together with two geometrical descriptors (moment of inertia B and YZ shadow), can predict MMGBSA-based binding energies with statistically significant values (R² = .93, R²cv = .83, and AE = 1.32 kcal/mol for the best model). The predictive ability of the model was also validated by generating four different training and test sets (average R² = .87, average R²cv = .80) and by leave-one-out and leave-n-out (n = 2, 5, and 10) methods. The proposed methodology can be used to screen selected derivatives from a large pool of molecules before carrying out highly expensive MD simulations. We believe that the current approach may be helpful for the development of a general methodology. The RMSD plots show that the ligands remain bound to the DNA duplex near the preferential binding position and do not experience substantial fluctuations with respect to their initial placements in the DNA minor groove; however, higher fluctuations are observed for less active compounds, and vice versa. Because the study is restricted to aromatic furan amidino derivatives, further validation is required before generalizing this approach to other classes of compounds.

Notes to Table 2: MVN is a quantum chemical descriptor named maximum valency of a N atom; GS is the GOLD fitness score; AIC2 is a topological descriptor named average information content (order 2); ΔTm is the experimental thermal stability of the DNA-ligand complexes; ΔTm-P^a is the thermal stability of the complex predicted by QSAR Equation (5) (all the molecules from reference-1, i.e. 19–30, were used as the test set); ΔTm-P^b is the thermal stability of the complex predicted by QSAR Equation (6) (all the molecules from reference-2, i.e. 10–18, were used as the test set); ΔTm-P^c is the thermal stability of the complex predicted by QSAR Equation (7) (all the molecules from reference-3, i.e. 1–9, were used as the test set); ΔTm-P^d is the thermal stability of the complex predicted by QSAR Equation (8) (test set molecules were selected randomly, i.e. five from reference-1, four from reference-2, and four from reference-3); Res. is the residual (experimental − predicted) of ΔTm values; and * marks the molecules in the test set. All ΔTm values are given in °C.

Acknowledgments
HKS and GNS thank the Department of Science and Technology (DST), New Delhi for the financial assistance through Fast-Track (SR/FT/CS-031/2009) and Swarnajayanti projects respectively.

Supplementary Information
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2012.703071.

MI_B: geometrical descriptor named 'moment of inertia B'
YZS: geometrical descriptor named 'YZ shadow'
ω: conceptual DFT descriptor named 'electrophilicity index'
MVN: quantum chemical descriptor named 'maximum valency of a nitrogen atom'
GS: GOLD fitness score obtained from docking calculations using the GOLD program
AIC2: topological descriptor named 'average information content (order 2)'

Figure 6. Experimental versus predicted ΔTm values with (a) all the derivatives of 2,5-bis[4-(N-alkylamidino)phenyl]furan (19–30) in the test set, (b) all the derivatives of 2,4-bis(4-amidinophenyl)furan (10–18) in the test set, (c) all extended aromatic furan amidino derivatives (1–9) in the test set, and (d) a test set constructed by randomly taking 13 compounds from all three types of derivatives. ΔTm is the experimental thermal stability of the DNA-ligand complexes; ΔTm-P is the thermal stability of the DNA-ligand complexes predicted using Equations (5)–(8). Circles show the molecules in the test set.

Table 3. Effect of leave-one-out and leave-n-out validation methods on coefficient values (Leave-one-out, Leave-two-out, Leave-five-out, …). Out is the number of derivatives kept out when generating the models; R² is the correlation coefficient; and R²cv is the cross-validation coefficient.