How good are ensembles in improving QSAR models? The case with eCoRIA

A conceptually new idea in quantitative structure–activity relationships (QSAR) which makes use of ensembles from molecular dynamics (MD) trajectories and information retrieved from enzyme–inhibitor binding thermodynamics is presented in this study. This new methodology, termed ensemble comparative residue interaction analysis (eCoRIA), attempts to overcome the current one chemical–one structure–one parameter value dogma in computational chemistry by modeling the biological activity as a function of molecular descriptors derived from an ensemble of conformers of enzyme–inhibitor complexes. The approach is distinctly different from the standard QSAR methodology which uses a single low-energy conformation or the properties averaged over a set of conformers to correlate with the activity. Each conformational ensemble derived from MD simulations is analyzed for the distribution of the non-bonded interaction energies (steric, electrostatic, and hydrophobic) along with solvation, strain, and entropic energy of the inhibitors with the individual amino acid residues in the receptor and these are correlated to the activity through a QSAR model. The scope of the new method is demonstrated with three diverse enzyme–inhibitor data-sets – glycogen phosphorylase b, human immunodeficiency virus-1 protease and cyclin-dependent kinase 2. The QSAR equations derived from the methodology have revealed all the structure activity relationships previously reported for these classes of molecules as well as uncovered some features that were hitherto unknown and may have a hidden role in driving the ligand–receptor-binding process. Impressive improvements in the predictions of affinity have been achieved compared to other QSAR formalisms namely CoMFA, CoMSIA (receptor-independent QSAR techniques), and CoRIA (a receptor-dependent QSAR technique). eCoRIA could provide an understanding of the thermodynamic properties influencing the ligand–receptor binding over a time scale as sampled by the MD simulation. The advantage of analyzing enzyme–inhibitor interaction energies in a statistical domain is that the noise due to inaccuracies in the potential energy functions can be reduced and mechanistically important interaction terms related to protein–ligand binding specificity can be identified which can assist the medicinal chemists in designing new molecules and biologists in studying the influence of position-specific mutations in the receptor on ligand binding.


Introduction
A quantitative knowledge of the energetics (physical forces) involved in the binding of small organic molecules to a protein is of fundamental importance within the interface of chemistry and biology to link the chemical interactions with biological effects associated with ligand-protein binding. This association is remarkably sensitive to even small measure of the forces which contributes to the overall binding process. A wide variety of computational methods exist for the calculation of such binding affinities; one of these is the free energy perturbation (FEP) (Brandsdal et al., 2003;Lu & Kofke, 2001;Reddy & Erion, 2001) or thermodynamic integration method (Mitchell & McCammon, 1991;Straatsma & Berendsen, 1988) that involves non-physical stepwise conversion between pairs of rather similar ligands in the free and bound states. However, computational demands and issues related to sampling and convergence prevents them from being widely and routinely used in structurebased drug design. Also these techniques are usually "noisy" due to limited conformational sampling, inaccuracies in the modeled structures, and/or inadequacies in current potential energy functions. The quantitative structure-activity relationships (QSAR) approach on the other hand, can balance the different interaction energy contributions in such a way that the "signal" (characterized by a correlation with the biological endpoint) is increased whereas the "noise" (associated with calculation of interaction energies) is reduced. The QSAR model thus establishes a relationship between the differences in activity or binding affinity among members of a series and the differences in the calculated energy contributions. More recently, it was realized that the predictive ability of a molecular mechanics receptor-dependent (RD) QSAR model can be considerably improved if the total interaction energy for each protein-ligand complex is partitioned into a number of fragment-based contributions and a subset of the energy variables that account for most of the variance within the series are selected. Considering this fact, a fragment-based approach termed COmparative BINding Energy (COMBINE) was developed, in which the ligands are divided into fragments chosen according to their spatial location in the proteinbinding site (Lozano et al., 2000;Ortiz, Pisabarro, Gago, & Wade, 1995). An improvement of this methodology is Comparative Residue Interaction Analysis (CoRIA) which is based on all elements that define the complete thermodynamic events involved in ligand-receptor association (Datar, Khedkar, Malde, & Coutinho, 2006).
Although these techniques have been successful to a large extent, a conceptual limitation in these approaches is that the methods are based on the current "one chemical-one structure-one parameter value" doctrine. Here, a single conformer characterized by discrete point values of its parameters is used to represent a chemical under study, while all other conformational states are ignored. This assumption runs counter to a growing number of studies indicating that most organic molecules of non-trivial size are not just three-dimensional but are "four-dimensional" because they exist as an ensemble of three-dimensional conformations interchanging over time (or equivalently, consist of a distribution of different microscopic states) at room temperature Pissurlenkar, Khedkar, Iyer, & Coutinho, 2011). Their properties and corresponding reactivities depend intimately on this ensemble. Therefore, knowing the "structure" of a molecule requires knowing the structures in this ensemble. The same is true for the corresponding enzyme structure as well. Structural information derived from a single low-energy conformer may be insufficient for a complete description of a chemical and could be the prime reason for failure in QSAR analysis. Ultimately, to predict accurate binding affinities, it is essential to go beyond predicting a single "dominant lowest energy" conformation of the ligand complexed with the protein.
The observation that relatively subtle energy differences between the various conformational states of a molecule can yield significant variation in biological endpoint underscores the necessity to represent the molecular parameter values for each molecule as finite ranges, instead of single point values corresponding to the lowest energy conformer. Hence, the application of traditional logic in the investigation of chemical structures requires the "one chemical-one structure-one parameter value" dogma to be modified to a "one chemical-finite set of structures-range of parameter values" principle. The ability to account for the conformational dynamics associated with enzyme-inhibitor complexes is of specific importance for those methods such as QSAR that quantitatively predict some measurable real-life property of a molecular system. With very few exceptions, researchers (Andrade, Pasqualoto, Ferreira, & Hopfinger, 2010;Araújo et al., 2011;Fontes Ferreira da Cunha, Albuquerque, Ceva Antunes, & Bicca de Alencastro, 2005;Hopfinger et al., 1997;Oliveira, Ramalho, & da Cunha, 2009;Pissurlenkar et al., 2011;Vedani & Dobler, 2002a, 2002bVedani, Briem, Dobler, Dollinger, & McMasters, 2000;Wildman & Crippen, 2002) have tended to neglect the investigation of conformational ensembles in the design of 3D-QSAR techniques. The reason being the basic dogma of traditional 3D-QSAR that only those parameters contribute to good QSAR models which are derived from a specific active conformation generating the molecular interaction field (MIF) (Goodford, 1985) responsible for the ligand-receptor binding. However, it is noteworthy that the success of such MIF-based 3D-QSAR studies does not depend on finding the real receptor-bound/bioactive conformation, but rather on a highly self-consistent molecular alignment. Therefore, almost all the available 3D-QSAR methods require an assumed bioactive conformation and a good molecular alignment as an input Doweyko, 1991;Gancia, Bravi, Mascagni, & Zaliani, 2000;Klebe, Abraham, & Mietzner, 1994;Pastor, Cruciani, McLay, Pickett, & Clementi, 2000;Robinson, Winn, Lyne, & Richards, 1999;Silverman & Platt, 1996), and their ability to predict the real active conformation is consequently very limited. The way out began with the consideration of higher dimensions in QSAR where the effects of conformational flexibility, solvation, and the induced fit were taken into account in a holistic way. Hopfinger et al. (1997) and the group of  have developed QSAR methods beyond the 3rd dimension by considering the conformational flexibility as the fourth dimension. As an evolution of molecular shape analysis (Hopfinger, 1980), Hopfinger et al. have developed a 4D-QSAR formalism which incorporates ligand conformational flexibility and multiple alignment by ensemble averaging the spatial features i.e. marking the fourth "dimension" of QSAR. Besides this, an exhaustive evaluation of ligand-embedded pharmacophore groupings is also adopted in the QSAR model-building process. Construction of the QSAR model and identification of possible pharmacophore sites is achieved by a three-step statistical analysis, which comprises a genetic algorithm (GFA) optimization followed by backward elimination multidimensional regression and ending with another round of GFA optimization. The methodology has proved useful in the construction of quantitative 3D pharmacophore models for a set of ligands for which the geometry of the corresponding receptor is not known (Receptor Independent: RI-4D-QSAR) as well as for those cases where it is known (Andrade et al., 2010;Duca & Hopfinger, 2001;Hopfinger et al., 1997;Pan, Tseng, & Hopfinger, 2003;Pan, Liu, Senese, Hopfinger, & Tseng, 2004;Senese & Hopfinger, 2003;Venkatarangan & Hopfinger, 1999). The most recent evolution in QSAR has been the incorporation of the induced fit protocol as the 5th dimension allowing for a multiple representation of the topology of the quasi-atomistic receptor surrogate and simultaneous evaluation of various solvation models both implicitly as well as explicitly as the 6th dimension, additionally incorporating contributions from the solvent and entropy into the analysis (Oliveira et al., 2009;Vedani & Dobler, 2002a, 2002bVedani, Dobler, & Lill, 2005).
A significant amount of structural and activity data is still missing due to the lack of experimental and theoretical studies on the relationship of conformational change and associated variation in parameter values. Hence, an approach where biological activity is modeled as a function of molecular descriptors, derived from specifically selected conformations of the enzyme-inhibitor complex rather than as a property derived from the lowest energy gas phase conformer, has the potential to augment the quality and predictive power of a 3D-QSAR model. In our recently published methodologyensemble QSARthe applicability of the conformational ensembles in QSAR has been discussed. However, the approach was tested successfully only on peptide data-sets (ligandbased approach) and its extension into receptor-based setting was thought to be the subject of an ongoing endeavor (Pissurlenkar et al., 2011). With this objective, we address herein the issue of the need for an ensemble of protein-ligand conformers in the QSAR domain by a "dynamic" receptor-based QSAR approach termed ensemble Comparative Residue-Interaction Analysis (eCoRIA), which samples all thermodynamic contributions to ligand-receptor binding from the molecular dynamic (MD) trajectories and uses them as descriptors in the QSAR analysis. The term "ensemble" is used to describe attempts at mimicking the infinite conformational space of a protein-ligand complex by sampling a set of discrete isomers from the conformational trajectory. The method first establishes a collection of reasonable protein-ligand conformations that captures the bioactive conformation as one of diverse, energetically accessible conformations, from which the thermodynamic descriptors are extracted. These thermodynamic properties governing ligand binding are then correlated with the biological activity by a chemometric technique. The eCoRIA formalism has been tested on three diverse enzyme-inhibitor data-setsglycogen phosphorylase b (GPb), human immunodeficiency virus-1 protease (HIV-PR), and cyclin-dependent kinase (CDK2).
Some features of eCoRIA formalism may overlap with our preceding QSAR techniques, which were adopted directly; some were slightly modified, while others were ignored altogether (Datar et al., 2006;Pissurlenkar et al., 2011;Verma et al., 2008). Therefore, eCoRIA may seem to be an unwieldy mixture of techniques and principles but it is firmly based on our in-house QSAR technique -eQSAR.

Materials and methods
The operational steps involved in the eCoRIA analysis are schematically illustrated in Figure 1.

Theory
A literature survey of current 3D-QSAR techniques indicate, perhaps not too surprisingly, that investigating the thermodynamics of ligand-receptor binding interaction is of primary focus. Nevertheless, when the receptor geometry is available, one can theoretically evaluate all the thermodynamic properties that govern the binding of the ligand to its receptor.
The biological activity is related to the free energy of binding ΔG bind . The later is equated to the inhibition constant (K i ) as: where T is the absolute temperature and R the gas constant.
The in vitro measures of biological activity are representative of the relative ligand binding strength (ΔG bind ) and can be used as the dependent variable for constructing QSAR models.
The free energy of binding, ΔG bind , can be expressed empirically as a sum of the relative free energies of various thermodynamic events as follows: where ΔG solv is the change in solvation energy, ΔG conf reflects the conformational changes in the ligand as well as receptor, ΔG inter represents the specific intermolecular interaction energies i.e. van der Waals and electrostatics, ΔG hydrophobic is the hydrophobic interaction between the receptor and ligand, and ΔG entropy is change in entropy. Each one of the terms in Equation (2) can be evaluated with a molecular mechanics forcefield. However, since no force field is exact, the energy terms may be analyzed as descriptors using a QSAR paradigm in order to scale them appropriately.

Workflow
We discuss the sequence of steps in eCoRIA asmodeling molecules and their complexes; generation of protein-ligand ensembles; measurement of thermodynamic components involving ligand-receptor binding; and finally the chemometric analysis to derive the regression equations.

Modeling enzyme-inhibitor complexes
Protein-ligand complexes were sourced from the Protein Data Bank (Bernstein et al., 1977(Bernstein et al., , 1978(Bernstein et al., , 2000. In the absence of an X-ray structure, a model of the ligand obtained by docking into a receptor of known geometry can also serve as a suitable starting point. Every enzyme-inhibitor complex is first corrected for any crystallographic errors using the Protein Preparation Wizard workflow in the Schrödinger molecular modeling suite (Schrödinger LLC, New York). Structures with missing residues are repaired and correct bond orders and charges assigned to the atoms. The protein is capped with the N-methylamino group at the C-terminal and the acetyl (ACE) group at the N-terminal to maintain a backbone like structure. Beside the crystallographic water molecules, each ligand-protein complex is solvated with approximately a 5 Å thick shell of TIP3P water molecules. Counterions are added to satisfy the electroneutrality condition in the system. A distance-dependent dielectric is used to account for the electrostatic shielding effects of aqueous solvent. Periodic boundary conditions (NPT ensemble) are applied to the system. The system is then subjected to energy minimization using the OPLS 2005 forcefield, so as to remove aberrant van der Waals contacts while preserving most of the original structure information, using a cycle of steepest descents followed by conjugate gradients until the gradient converges of .01 kcal/mol/Å. The minimized structure is then used in the molecular dynamics simulation (MDS) to generate the ensemble of conformations.

Generation of conformational ensemble
The task of generating conformational ensembles can be accomplished by two approaches: deterministic (MD) and stochastic (MC). We have employed MD simulation to generate the conformational ensemble profile (CEP) of the enzyme inhibitor. The geometry of the energy minimized enzyme-inhibitor complex obtained in the first step is used as the starting structure for the MDS. The MD simulation is run with the MD program contained in Desmond (version 2.4, DE Shaw Research, NY, USA). The simulation is run for a period of 5 ns under the NPT (constant number of particles, pressure, and temperature) ensemble. The temperature of the system is maintained at 300 K by coupling to an external bath based on the Berendsen algorithm (Berendsen, Postma, Van Gunsteren, DiNola, & Haak, 1984). An isotropic condition with pressure restrained to 1 bar is achieved with the Berendsen barostat. High-frequency vibrations are eliminated by applying the SHAKE algorithm, constraining all bonds to their equilibrium values. Initial velocities are generated randomly by a Maxwell distribution at 300 K in accordance with the masses assigned to the atoms. The Particle Mesh Ewald (Essmann et al., 1995) method is used for calculating long-range electrostatics. The MD trajectory and corresponding energies are sampled every 5 ps to construct the CEP of each ligand-receptor complex. The MD approach generates ensemble of structures representative of those found in thermal equilibrium.

Measurement of thermodynamic properties
As discussed earlier in Equation (2), the complete thermodynamics of enzyme-inhibitor binding leading to a biological response can be described empirically as a linear sum of energy components that describe physical changes in the protein-ligand system caused by binding.
An overview of the energy components known to contribute significantly to protein-ligand association and which are adopted in the eCoRIA formalism is as follows.
The specific non-bonded interactions between the ligand and receptor are key elements in the binding thermodynamics. The components of the non-bonded interaction energyvan der Waals (E vdW ) and Coulombic (E Coul ) are calculated between the inhibitor and each amino acid residue in the protein environment. To account for the strong distance dependence of the electrostatics we used an off-grid calculation to include longrange electrostatic interactions. These interactions are calculated with the OPLS 2005 (Jorgensen & Tirado-Rives, 1988;Jorgensen, Maxwell, & Tirado-Rives, 1996) force field using the Glide module incorporated in the Schrödinger molecular modeling suite (Friesner et al., 2004(Friesner et al., , 2006.
The functional form of the non-bonded interaction energies between the ligand and the receptor is expressed as where i and j are atoms of protein and the ligand; A ij and C ij are the repulsive and attractive coefficients for the interacting atoms i and j, respectively; r ij is the distance between atoms i and j; q i and q j are the charges of atoms i and j, respectively; while e is the dielectric constant.
Accounting for hydrogen bonding separately from the electrostatics is crucial in modeling the specificity of drug-receptor interaction. Besides participation in receptor-drug complexation, H-bonding stabilizes the drug in the conformation required for proper binding to the receptor. The free energy of protonation defined as the difference between the total energies of the protonated and neutral forms of the molecule is considered as a good measure of the strength of hydrogen bonds i.e. the higher the energy, the stronger is the bond. The quantified values for the hydrogen bonding interactions between the ligand and receptor were calculated using the Glide module in the Schrödinger molecular modeling package.
Hydrophobic interaction has long been recognized to play an important role in the thermodynamics of ligandreceptor binding. It is an entropy-driven process resulting primarily from the change in the orientation of solvent molecules in the solvation shell wrapping the solute molecules and also from the bulk form of solvent molecules. The HINT (Kellogg, Semus, & Abraham, 1991) program incorporated in the Sybyl suite (version 7.1, Tripos Inc., USA) was used to calculate the hydrophobic interactions between all atom pairs in a ligand-protein system. The functional form of the HINT score is: where, B is the total interaction between the molecules, i and j are atoms of the protein and ligand, respectively, b ij is the micro-interaction constant/binding score that represents the attraction/ interaction between atoms i and j in the protein-ligand complex, a i is the hydrophobic atom constant for atom i based on the CLOGP method of Hansch and Leo, S i is the solvent accessible surface area for atom i, R and r are the distance functions for the interaction between atoms i and j, T ij is a descriptor function, which is essentially a sign change function parameterized to maintain the signs of interactions consistent with the HINT convention that favorable interactions have positive HINT scores, while unfavorable interactions have negative HINT scores. Another major parameter contributing to the thermodynamics of ligand-protein binding is the conformational entropy. The entropy loss in both the protein and the ligand accounts for the loss of torsional, vibrational, rotational, and translational free energies upon binding. When two molecules bind, there is a loss of three rotational and three translational degrees of freedom due to constraints about single bonds on complex formation. The loss of entropy due to reduced conformational flexibility upon receptor binding was estimated using Discovery Studio 2.5 (version 2.5, Accelrys Inc., San Diego, CA, USA).
eCoRIA also keeps an account of the energetic contribution from changes in the conformation of the ligand upon binding with the protein and vice versa. The conformational change associated with the ligand upon binding to the protein is much more significant compared to that for the protein and is represented as the strain energy. It is computed using a molecular mechanics potential function as the energy associated with changes in bond lengths, angles, torsions, and non-bonded interactions. The ligands extracted from their complexes are minimized using the smart minimizer in Discovery Studio until an energy gradient of .001 kcal/mol/Å is reached. The difference in energy of the ligands in the complex and the conformation minimized in vacuo is taken as the conformational strain energy.
The last term contributing to the free energy of binding is the solvation energy at physiological conditions (hydration free energy). The ligand-protein interactions compete with water interactions due to the fact that both the protein and the ligand are solvated before complexation. The energy required to strip the solvent molecules off the ligand while moving from an aqueous environment to a hydrophobic protein cavity constitutes the solvation energy. Because of the relatively high surface area of the protein compared to the ligand, the net solvation free energy for the protein is negligible as compared to that of the ligand. For the ligand, the electrostatic contribution to the solvation free energy is calculated using the Jaguar module in the Schrödinger molecular modeling package.
Every term in the energy function described above contributes more or less to the enzyme-inhibitor binding process. Through these energy functions one can explore various details of binding specificity within enzymeinhibitor systems. More details about these energy functions may be found in our earlier articles (Datar et al., 2006;Verma et al., 2008). It should be noted that in eCoRIA the energy terms are derived from an ensemble of energetically reasonable conformations, a feature unique to eCoRIA and not adopted in earlier techniques.
A final column containing biological activities is then added to the study table. In vitro measures of biological activity, such as K i and IC 50 , are taken to reflect relative ligand-binding strength and used as the dependent variable in the eCoRIA study.
It is hoped that the energy decomposition scheme adopted herein must, at some point, meet the two opposite propensities of the Scylla of considering enough energy terms that the elements responsible for the activity differences can be isolated and the Charybdis of including excessively large number of terms that the signal-to-noise ratio is so low that the subsequent analysis to derive a meaningful statistical model fails (Wade, Ortiz, & Gago, 1998).

Chemometric analysis
Static structure analyses by averaging the energy calculations over ensemble of structures have been the norm in understanding protein-ligand interactions. However, it may not reflect the distribution of the properties over the conformational states sampled by the molecule. It is the shape of the energy landscape that is crucial to understand real-time protein conformational fluctuations and dynamics. In the eCoRIA formalism presented here, we have applied moments of distribution as a quantitative measure of the shape of the unseen energy landscape. Best known among these is the first moment, M 1 , which describes the mean of the distribution around which central clustering occurs and is calculated as where x j is the measured thermodynamic property and N is the number of samples in the ensemble (1000 in this case).
It is a good approximation of the central tendency for unimodal symmetric distributions. However, it can be misleading in a skewed or multimodal distribution of property. Therefore, it is essential to consider additional measures for quantifying skewed distributions. After having characterized a distribution's central value, one has to know the spread of the distribution i.e. its "width" or "variability" around that value. The spread of a distribution may be estimated using various parameters, of which variance is the most common one, considered to be the second moment of distribution, M 2 .
It is measured as the sum of the squared deviations from the mean divided by the number of samples minus 1 (N − 1). The third moment of distribution, M 3 , is the skewness coefficient, which characterizes the degree of asymmetry of a distribution around its mean. In contrast to the mean which is a dimensional quantity having the same unit as the measured quantities, the skewness is defined in such a way as to make it non-dimensional.
It is a pure number characterizing only the shape of the distribution for a property over the ensemble. The kurtosis coefficient related to the excess (kurtosis) of the distribution is the fourth moment of distribution, M 4 .
It measures the relative flatness of a distribution (as against the normal distribution showing a kurtosis coefficient of zero). A positive kurtosis coefficient signifies a tapering distribution, while a negative kurtosis coefficient indicates a relatively flat distribution of the property over the ensemble of conformations. The four principle moments of distributionmean, variance, skewness, and kurtosis are calculated for each free energy component over the ensemble of conformations sampled by the enzyme-inhibitor complex. This approach is distinctly different from selecting a single conformation of an enzyme-inhibitor complex representing the ensemble by a single average value. The methodology explicitly incorporates the features and properties of all conformations of a molecule to represent the distribution of these properties over the conformational states available to a molecule.
As a result of the large number of terms the genetic version of partial least squares (PLS) i.e. G/PLS (Genetic function approximation in conjunction with PLS) (Dunn III & Rogers, 1996;Datar et al., 2006;Khedkar, Malde, & Coutinho, 2007;Pissurlenkar et al., 2011;Verma et al., 2008; is used as the statistical tool to systematically separate the "energetic signals" from the "background noise" in order to obtain a correlation between the activity and a subset of weighted energy terms. In order to preserve the information contained in the descriptor data-set, the pretreatment of the QSAR table based on correlation matrix was avoided. However, the descriptors (moments of distribution over each property) were scaled according to their mean and standard deviation so that all the columns are placed on the same platform and have equal weight (importance), while deriving the QSAR equations.
To derive statistically significant QSAR models, the data-set was split into training and test sets on the basis of structural, chemical, and biological diversity using similarity search techniques viz. D-Optimal design, Tanimoto similarity coefficient, and the Euclidian distance matrix criteria defined in Cerius2 (version 4.8, Accelrys Inc., San Diego, CA, USA). The selection technique searches for diverse molecules by comparing their chemical nature in 2D space. All QSAR equations were generated with the G/PLS method as implemented in the Cerius2 program. The models were developed with linear terms and the optimal number of components was selected based on the highest cross-validated r 2 (i.e. q 2 ) (Stone, 1974;Deep, 2006). The length of the equations was varied from 6 to 10 terms; with a smoothness value of 1.0 (the smoothness function controls the bias in the scoring factor between equations with different number of terms). The number of generations was limited to 10,000 and population size to 500. Crossover and mutation probabilities of 50% (default settings) were used. The models were rigorously validated internally using randomization at 99% confidence interval (Rucker, Rucker, & Meringer, 2007), Leave-One-Out (LOO), Leave-Five-Out (L5O), q 2 adj (Romeiro, Albuquerque, de Alencastro, Ravi, & Hopfinger, 2005), and by bootstrap procedures (Richard, Cramer III, Bunce, Patterson, & Frank, 1988). For external validation, the QSAR models were used to predict the biological activity of compounds not included in the training set (validation set). The models were further tested for their statistical significance by calculating validation parameters devised by Roy et al. (r 2 m and r 2 p ) (Pratim Roy, Paul, Mitra, & Roy, 2009). While r 2 m is a measure of the external predictivity of a model, r 2 p which is based on the randomization test indicates the statistical significance of the relationship between activity and the descriptors.
Compared with classical molecular mechanics calculations of binding energies, the advantage of calculating ligand-protein interaction energies over an ensemble and then subjecting them to statistical analysis is that the noise due to inaccuracies in the potential energy functions and molecular models can be reduced and mechanistically important interaction terms can be identified.
The best statistical models derived using the eCoRIA approach were then compared with CoRIA and the standard 3D-QSAR techniques -CoMFA and CoMSIA using the same training and test set molecules.

Results
The objective behind eCoRIA formalism was to develop a technique that accounts for the distribution of thermodynamic properties over an ensemble of enzyme-inhibitor conformers and incorporates their properties in the QSAR framework. The outcome of this technique for three different ligand-receptor systems is discussed comprehensively below. There are several literature reports of QSAR models obtained by various methods on different series of inhibitors acting on these targets. However, as the data-set compiled in this study and those used in the previous reports are not exactly identical, a direct comparison of the statistical outcomes with those obtained by eCoRIA is not fair. However, interpretation of the models in terms of chemistry appears to be a more meaningful alternative.

Data-set 1: GPb
The first application of the eCoRIA formalism is on a data-set of GPb inhibitors, an enzyme that catalyzes the first (or the rate-limiting) step in the phosphorolysis of glycogen to glucose-1-phosphate in the liver. Inhibition of hepatic glycogen phosphorylase has been proposed as a promising strategy for attenuating hyperglycemia in type 2 diabetes. Since glucose production in the liver has been observed to increase in type 2 diabetes, inhibiting the release of glucose from the liver's glycogen stores appears to be a valid approach. Administration of glycogen phosphorylase inhibitors prevents the conversion of glycogen into glucose and, thus, helps attenuating hyperglycemia in type 2 diabetes by shifting the balance between glycogen breakdown and synthesis in favor of the latter. The enzyme glycogen phosphorylase exists in two interconvertible forms (an inactive b and an active a form); the proportion of each form is regulated by phosphorylation. The less active b (GPb) form is transformed by phosphorylation (glycogen phosphorylase kinase a) to the more active a (GPa) form. The monomer of glycogen phosphorylase in the muscle is a large protein composed of 842 amino acids. The principle reason for selecting this target is that it has been extensively explored by Hopfinger et al. for evaluating their 4D-QSAR formalism. Therefore, a comparison with the outcome of previous results permits an assessment of the accuracy and information content of the eCoRIA models.
A set of 45 enzyme-inhibitor crystal complexes of GPb were compiled from the protein data bank. The experimentally determined inhibition constants were transformed into pK i values and are spread over a satisfactorily large range of 5.09 units. The PDB IDs of these molecules are summarized in Table 1S together with their experimentally determined activities. Although only the optimal eCoRIA equations in terms of statistics are shown in Table 1, other regression equations are also found to be nearly as significant and with slightly different descriptors. Statistically significant models were obtained with correlation coefficient (r 2 ) greater than .90. Cross-validation by the LOO and L5O procedure also produced statistically significant q 2 values (>.70). The bootstrap values further advocated the robustness of these models. The predictive r 2 i.e. r 2 pred calculated on the test set is also found to be greater than .60, indicating a good predictive power of the models for molecules outside the training set. A value of greater than .50 obtained for r 2 m and r 2 p also establish the power of predictivity for the models and eliminate the possibility of chance correlation. The residuals [experimental activity (y exp ) − predicted activity (y cal )] for each compound in the training as well as test set hardly exceed 1 unit. Comparison with the CoRIA, CoMFA, and CoMSIA approaches reveals that the models derived using eCoRIA method are significantly better in terms of r 2 and predictive capability (r 2 pred ) ( Table 2). The correlation plots of experimental vs. predicted pK i values of the molecules in the training and test sets for eCoRIA model (Equation (1) in Table 2) are shown in Figure 1S.
All 500 equations associated with the best eCoRIA model were analyzed for the frequency with which a particular descriptor appears in the population of equations. The plots of the most frequently appearing descriptors in different equations are shown in Figure 2. The frequency of appearance of different terms is shown on the X-axis, whereas the sign of the coefficients of the terms in the equations are shown on the Y-axis. Terms with positive coefficients in the equations are plotted on the positive Y-axis, whereas those with negative coefficients are on the negative Y-axis. The greater the frequency of appearance for a descriptor in the population of equations, the greater is its role in explaining the variance in the biological activity. A detailed analysis of the models obtained by the eCoRIA approach is described below.
While interpreting the equations derived from the eCoRIA methodology one should remember that (1) More negative the value of the interaction energies, stronger is the interaction between the ligand and the respective amino acid in the target enzyme. Similarly, positive values of these interaction energies signify weaker interaction between the enzyme and the ligand. "vdW" refers to the van der Waals interaction, "Coul" refers to the Coulombic interactions, HINT refers to hydrophobic interactions, and Hbond is the hydrogen-bonding interactions.
(2) In case of hydrophobic interactions, positive values imply favorable interactions while unfavorable interactions are associated with a negative coefficient.
(3) It is the sign (+/−) of the coefficient of the equation in the QSAR equation that will eventually guide the need to strengthen/enhance (i.e. shift the interaction energy value towards more negative) or weaken/reduce (i.e. shift the interaction energy value towards more positive) the interaction, in order to favor binding.
An examination of the frequency plot of descriptors appearing in the eCoRIA model ( Figure 2) indicates that the Coulombic (/electrostatics), van der Waals (/steric), hydrogen bonding interactions, and entropy are the major forces driving the binding of the ligand with the GPb enzyme. The terms describing steric interaction of the ligand with residues Tyr:280, His:377, and Glu:287 in GPb appear with a negative coefficient in the equations. This indicates that an increase in the biological activity can be achieved by strengthening the steric interaction of the ligand with these residues. On the other hand, the steric interaction with residues Asn:284 and Ser:674 has a positive coefficient in the equations suggesting that this interact needs to be toned down to improve the biological activity. The Coulombic interaction energy of residue Gly:135 has a negative coefficient in the equation, indi-  Note: Where r 2 : correlation coefficient; q 2 LOO ; and q 2 L5O are the cross-validation correlation coefficient by LOO and L5O procedure, respectively; r 2 pred is the predictive correlation coefficient; r 2 bs ; and r 2 rand are the correlation coefficient obtained by bootstrapping and y-randomization, r 2 m and r 2 p are the validation parameters calculated as per Roy et al. cating that an overall negative value of the Coulombic interaction energy of this residue with the ligand will improve binding. On the contrary, the Coulombic interaction energy with residues Arg:292 and Asp:283 have a positive coefficient in the equations suggesting that an overall positive value of the Coulombic interaction energy of these residues with the ligand will favor binding. The term describing the entropy of the system appears with a positive coefficient, which suggests that we need to increase the entropy of the system in order to favor the binding process. It also indicates that the flexibility of the molecules needs to be increased. Also, the hydrogen bonding interaction of the ligand with the residue His:377 should be strengthened to ensure tighter binding due to the negative coefficient of this interaction energy term in the equations. Although the terms describing hydrophobic interaction appear transiently during the evolution, they gradually disappear as the genetic function progresses. However, these terms do appear in the equations when the number of terms in the equation is increased to 10 or more. It is important to mention that no bias was placed on any of the descriptors and all had an equal weightage during development of the eCoRIA equations. A schematic representation of the active site of GPb in complex with an inhibitor (the most active molecules in the data-set-2prj) is shown in Figure 3. Interestingly, the eCoRIA models could successfully pick out the residues involved in the catalytic activity of the enzyme. Such observations have also been described in the literature.
The ultimate goal of eCoRIA is to capture the shape of the energy landscape (i.e. the distribution of variables over the ensemble), which is crucial and may not be observable when a single lowest energy static structure is used for QSAR studies. eCoRIA analysis could successfully capture this distribution of properties as is manifested in the terms that appear in the QSAR equations, e.g. skewness (M 3 ) of the steric interaction of the ligand with Tyr:280 is shown have a strong correlation with the activity. This indicates that the skewness of the steric interaction energy of the ligands with Tyr:280 is one factor for the variation in the biological activity of the ligands. That the asymmetry of the interaction of ligands with a specific residue may be responsible for the variation in their activity is evident from this result. Likewise, the variance (M 2 ) around the mean reflects the These results indicate that the contribution of each microscopic state to the variation in activity cannot always be weighted by the average value. Thus static structure analyses from ensemble averaged measurements at equilibrium which has been the norm in understanding protein structure and functions needs to be replaced with true ensemble representations to reflect the distribution of the thermodynamic properties over the energy landscape.

Comparison with other 3D-QSAR methodologies
The eCoRIA methodology has been compared with our standard CoRIA (receptor-based QSAR), CoMFA, and CoMSIA (receptor-independent QSAR) methods for the same training and test sets. Although these formalisms embody some information contained in the eCoRIA methodology, in many ways the eCoRIA formalism is significantly different. One important point of difference is that these methodologies rely on static structure analysis, while eCoRIA models the biological activity as a function of molecular descriptors derived from an ensemble of discrete microscopic states. The statistics of all these approaches are summarized in Table 2. It is observed that the eCoRIA models exhibit significantly better statistics over the other three methodologies as manifested by the basic statistical parameters (r 2 , q 2 LOO and q 2 L5O ). Also, the eCoRIA models perform unvaryingly better than other approaches in the external prediction (r 2 pred ). The metrics devised by Roy et al. further establish the statistical significance of eCoRIA models over the other approaches. It is noteworthy that though the standard CoRIA models are derived using the same thermodynamic properties incorporated in the eCoRIA formalism, an ensemble representation of these properties appears to be a superior mode of describing the activity. The CoMFA and CoMSIA techniques calculate intermolecular interaction energies indirectly by considering probes at each grid point around an aligned static conformation of the molecules, which does not directly reflect the dynamics occurring in the ligand-receptor interactions.
There are few QSAR studies reported for GPb inhibitors in the literature. A data-set of GPb inhibitors, which are analogs of glucose, were investigated by Hopfinger et al. using receptor-independent (RI) 4D-QSAR analysis (Venkatarangan & Hopfinger, 1999). The method uses grid cell occupancy descriptors for defining the thermodynamic averaged spatial pharmacophore. The QSAR model derived using this approach produced a q 2 of .81-.83 and an r 2 of .86-.87. In comparison to these statistics, the eCoRIA models have comparable q 2 (.81-.83) but relatively better r 2 (.92-.93) values. In another ligand-based QSAR approach termed LISA (Verma, Malde, Khedkar, Iyer, & Coutinho, 2009), a data-set of GPb inhibitors were investigated wherein the global molecular similarity is broken up as local similarity at each grid point surrounding the molecules and is used as a QSAR descriptor. It is observed that the eCoRIA statistics surpass those derived by LISA in terms of the internal cross-validation (q 2 ) as well as in the external prediction (r 2 pred ). In yet another report, a RD 4D-QSAR analysis was performed by Hopfinger et al. (Pan et al., 2003) on the same data-set of GPb inhibitors as used in the RI 4D-QSAR analysis. A pruned ligand-receptor model was used to estimate ligand-receptor thermodynamics in the functional region of the protein and explicit water molecules were excluded in this study to simplify the analysis of the ligand-binding process. The statistics of the eCoRIA model was found to be comparable to Hopfinger's RD 4D-QSAR (q 2 ¼ :82, r 2 ¼ :85).

Data-set 2: human immunodeficiency virus-1 protease (HIV-1 PR)
HIV-1 protease is an obligatory enzyme in the replication of the HIV. It cleaves the polyproteins transcribed from the gag and pol genes into enzymes and structural proteins essential for the assembly and maturation of infectious virions. The finding that inactivation of this viral encoded aspartyl protease produces a progeny of virions that are immature and non-infectious, elicited intense efforts in the development of specific and potent inhibitors targeted against this enzyme, as a novel therapy for AIDS. HIV-1 protease has been the subject of interest for a number of molecular modeling endeavors, which prompted us to test the eCoRIA methodology against this target as well. A data-set of 72 enzyme-inhibitor complexes of HIV-1 protease were compiled from the RCSB protein data bank and used as a test bed for the eCoRIA methodology. It was then segregated into a training set for model generation and a test set for model validation on the basis of chemical and biological diversity using similarity search techniques. The PDB IDs of these molecules together with their experimentally determined activities are summarized in Table 2S. The QSAR equations developed using eCoRIA and those for the standard CoRIA approach are compared in Table 3. Cross-validation using LOO returned q 2 > :70, while the L5O procedure produced a slightly lower yet significant value. When the eCoRIA equation was used to predict the activity of the test set, the predictive correlation co-efficient was found to be greater than .60 demonstrating good external predictability. The correlation plots of experimental vs. predicted pK i values of the molecules in the training and test sets for the best eCoRIA model are shown in Figure 2S. Also an r 2 bs value, close to the  r 2 for the training set, obtained on 100 bootstrap runs suggests a good internal consistency within the underlying data-set. Overfitting and chance correlation between the activity and descriptors were gauged by employing y-randomization. Poor regression models with low r 2 (<.20) were obtained indicating that the models are stable and the results are not based on chance correlation. The statistical parameters of the best QSAR models are summarized in Table 4. Comparison with the standard CoRIA, CoMFA, and CoMSIA approaches reveal that the QSAR models derived using eCoRIA formalism are significantly better in terms of r 2 and predictive capability (r 2 pred ). In this study, in addition to building a sound QSAR model with good predictive power, identifying the imperative ligand-receptor interactions were also of high interest. To facilitate realization of these objectives, all 500 equations associated with the best QSAR model were examined for the frequency with which a descriptor occurs in the population of equations (Figure 4). The more frequently a descriptor is used in generating new models, the greater is its relative significance in explaining variance in the biological activity. The observations are summarized in the Table 5. A schematic representation of the active site of HIV-PR in complex with an inhibitor is shown in Figure 3S. The eCoRIA models could capture the residues involved in the catalytic activity (e.g. Asp:B25, Pro:B9, Leu:A90, Thr:A80) of the enzyme.
In a computational study by Holloway and co-workers (Holloway et al., 1995), it was found that the intermolecular van der Waals and electrostatic interactions between HIV-1 protease and a series of inhibitors could explain the variance in the observed binding affinities of these ligands. The eCoRIA approach could capture in addition to these properties, the contribution from other thermodynamic properties as well (e.g. hydrophobic interaction). Our previous report on molecular docking and 3D-QSAR studies on a series of cyclic urea analogs with HIV-PR inhibitory activities has some parallel observations to the present eCoRIA model. Thus, crucial interactions with Asp:B25, Pro:B9, Leu:A90, and Thr: A80 residues that influence the biological activity have been captured in both studies . In another report, COMBINE analysis was performed on a data-set of HIV-protease inhibitor complexes, wherein the ligand-receptor interaction energies were broken down into the individual van der Waals and electrostatic contributions (Perez, Pastor, Ortiz, & Gago, 1998). The statistical outcomes of eCoRIA model are found to be comparable with this analysis, where a q 2 and r 2 of .72 and .90 were obtained for the COMBINE model, the best eCoRIA model produced values of .75 and .82, respectively.
Data-set 3: cyclin-dependent kinase 2 Cyclin-dependent kinase 2 (CDK2) also known as cell division protein kinase 2 is a member of the cyclindependent kinase family of Ser/Thr protein kinases, which regulate the cell cycle progression in proliferating eukaryotic cells. Aberrant CDK control and consequent loss of cell cycle checkpoint function have been directly correlated to the molecular pathology of cancer. In view of the strong link between CDKs and cancer, CDKs are being investigated as a potential target for anticancer medication. Studies with first generation CDK inhibitors support a role for the use of these agents as both single agents or in combination with cytotoxic drugs. A dataset of 81 enzyme-inhibitor complexes of CDK2 were compiled from the protein data bank for testing the eCoRIA approach. The PDB IDs for the molecules are summarized in Table 3S together with their experimentally determined activities. The regression equations and statistical analysis of the eCoRIA models and its comparison with CoRIA, CoMFA, and CoMSIA approaches are provided in Tables 6 and 7, respectively. The correlation plot of experimental vs. predicted pIC 50 values of the training and test set molecules for the best eCoRIA model is shown in Figure 4S. Further, the derived eCoRIA models have good predictive ability and provide insight into the features governing the enzyme inhibition. Visual inspection of the energy values emanating from each protein residue highlights not only the amino acids whose interaction with the ligand varies from complex to complex, but also those parts of the enzyme that give rise to unfavorable interaction energies upon binding of the ligand. Examination of the eCoRIA equations and frequency analysis ( Figure 5) of the descriptors associated with them reveals that the terms describing van der Waals, Coulombic, hydrophobic, and hydrogen bonding interactions have a greater influence on the biological activity (Table 8). Figure 5S shows a schematic representation of the active site of CDK2 in complex with an inhibitor showing the key residues extracted by the eCoRIA models.
There are several reports of QSAR studies on CDK2 inhibitors. The CoMFA and CoMSIA models developed by Lan et al. for a series of pyrazolo[4,3-h]quinazoline-3-carboxamides have q 2 values of .64 and .50, r 2 values of .95 with predictive correlation coefficients of .50 and .46 respectively (Lan, Chen, Xiao, Sun, & Chen, 2010). Samanta et al. have reported a 2D-QSAR study on a data-set of 3-aminopyrazoles analogs to explore the pharmacophoric requirement to exhibit CDK2 inhibitory activity (Samanta et al., 2006). The best models were obtained using topological parameters, indicator variables, and atomic charges have r 2 of .88-.91 and r 2 pred of   -pyrazolo[3,4-d] pyrimidine derivatives, which produced models with r 2 of .79-.89 and q 2 of .59-.64 (Fernandez, Tundidor-Camba, & Caballero, 2005). The statistics derived for the eCoRIA models are significantly better than these studies both in terms of internal validation (q 2 [ :8, r 2 [ :9) as well as external predictivity (r 2 pred [ :65). Jans et al. have investigated the CDK2 inhibition potential of N2-and O6-substituted guanine derivatives using molecular docking and the hybrid ONIOM method. Using the best ligand poses obtained from docking as an input, a correlation model was derived between the ONIOM-derived quantum chemical descriptor "H-bond interaction energy" and the experimental biological activity, with a r value of .80 (Alzate-Morales, Caballero, Vergara Jague, & GonzaÌlez Nilo, 2009). They concluded that the residues at the entrance of the binding site (Lys:89 and Asp:86) were the main contributors to the total interaction energy between the protein and the ligand. The eCoRIA models shed more light on these interactions which recommends that increasing the van der Waals interactions of the ligands with Lys:89, while reducing the hydrophobic interaction with Asp:86 will improve binding. In another report, the same authors have used CoMFA to describe the activities of guanine analogs. The best CoMFA model derived with steric and electrostatic fields had q 2  (Mazanetz, Withers, Laughton, & Fischer, 2009;Withers, Mazanetz, Wang, Fischer, & Laughton, 2008). These ASP-QSAR models produced a non-cross-validated correlation coefficient (r 2 ) of .95-.99 and a cross-validated correlation coefficient (q 2 ) between .927 and .929. Comparison with these reported QSAR studies indicates that the eCoRIA models have an equivalent or in some cases better statistics in terms of r 2 , q 2 , and r 2 pred values. Also, it is noteworthy that the r 2 pred reported for eCoRIA models is derived from a larger and structurally diverse test set, while the above quoted studies were limited to a single chemical class.
A close appraisal of all eCoRIA models over the three data-sets reveals that some of the interactions shown to be decisive in these studies are distant from the ligand (~10 Å), which indicates that along with the direct interactions, indirect long-range interactions also contribute significantly towards the stability of the ligand-receptor complexes.

Discussion
A myriad of techniques for analyzing the energetics of enzyme-inhibitor interactions have been reported in the literature and some of them have been able to derive meaningful structure activity relationships for specific systems. However, it is observed that many of these modeling techniques are based on two approximations.
The first involves the approximations necessary to compute the free energies rapidly which usually means ignoring some components of the binding energy such as solvation and configurational entropy (Ajay & Murcko, 1995;Dominy & Brooks, 1999). It has been shown that solvation and desolvation effects inherent to ligand as  well as receptor can have significant impact on the strength of binding and the corresponding measurement of the resultant biological activity. Also, negligence of explicit water molecules during the MDS can lead to the generation and sampling of artifact states relative to the actual binding mode. The loss of configurational entropy upon binding also contributes significantly to the binding affinity. The restriction of a flexible molecule's motion upon binding to a protein causes a loss of conformational entropy. Some energy models used in computeraided molecular design neglect this entropic penalty in the interest of time. However, such a contribution could be especially important in specific systems such as for HIV protease, where binding is associated with a marked reduction in the mobility of the active site flaps. It is possible that other high-affinity protein-ligand systems may undergo similarly large losses of conformational entropy on binding. The eCoRIA methodology presented here tries to account for both these terms along with the typical energy components used in other RD QSAR approaches. Interestingly, the eCoRIA models derived for the GPb data-set could capture conformational entropy as an important thermodynamic property governing the ligand-receptor binding, while the term explaining solvation energy was observed in the eCoRIA equations obtained for the data-set of HIV-PR inhibitors.
Other properties like strain energy also appear transiently during the evolution but gradually disappear as the genetic function progress i.e. their frequency of occurrence is too low to be considered as important relative to other interaction terms for ligand binding. One explanation for the absence of these terms in the final equations could be due to the fact that for the data-sets evaluated in the present study, these terms may not be a major determinant in the overall binding process, so that they can be picked up more frequently by the statistical tool.
Most of the 3D-QSAR techniques reported till date have only probed and extracted information about a specific interaction involving the ligand (e.g. CoMFA considers only electrostatic and vdW interactions) and this is mostly restricted to its binding site (i.e. receptor pruning), while neglecting critical elements of the ligandbinding process. The receptor pruning approach has been adopted mainly to reduce the time and computational resources, assuming that the enzyme-inhibitor interactions are relatively short range. However, in order to gain a meaningful thermodynamic profile of proteinligand systems in QSAR, it is necessary to adopt a nongrid approach to incorporate the long-range interactions.
Computing the interactions in free space as opposed to using the truncated receptor (as in 4D-QSAR) for the analysis is well able to account for long-range electrostatic interactions. Interestingly, as discussed in the results sections, many of the interactions shown to be decisive in these studies were distant from the ligand, indicating that along with the direct interactions, indirect long-range interactions also contribute significantly towards the stability of the ligand-receptor complexes. The second type of approximation includes those involved in treating the enzyme-inhibitor complex as a static structure (Dominy & Brooks, 1999;Lightstone, Zheng, & Bruice, 1998). This assumption runs contrary to a growing amount of research indicating that the lowest energy conformation is least likely to represent the binding conformation with solvation and binding interactions capable of compensating for energy differences among the conformers of a molecule. The crystal structure is an average static representation of the dynamic behavior of a ligand-receptor complex and therefore contains limited information on possible conformational changes and corresponding coupled motions occurring in the inhibitor-receptor-binding event. The eCoRIA formalism does away with this approximation by eliminating the assumption of a static ligand-receptor complex and measures the energies over ensembles of structures, which is a major component to the "fourth" dimension in QSAR and hence eCoRIA may be regarded as a 4D-QSAR approach. Analysis of the statistical distribution of each energy component over an ensemble of conformations provides a more robust picture of binding than can be achieved by a single-structure representation. The width of the energy distributions is crucial in correlating with the biological result as evidenced in this study. Moments of distribution computed for the thermodynamic properties gives an account of the spread of these properties over the entire conformational space. This is the first time, as far we know, that a QSAR methodology extracts a feature associated with an ensemble of conformations and packages the dynamic information to establish a correlation equation.
The goal of eCoRIA analysis was to accurately translate the most significant thermodynamic properties of a set of ligand-receptor complexes into a matrix of informative variables to derive a correlation between these properties and the biological activity. The partitioning scheme adopted in eCoRIA analysis allows the categorization of the thermodynamics of binding into many individual energetic contributions and therefore provides a wealth of information that fully account for the biological activity. The data matrix derived from this fragmentation approach could capture the essence of all the structural changes in the molecules and the resulting quantitative model highlights those variables that are crucial for modulating the biological activity.

Conclusion
Although the concept of induced fit in enzyme-inhibitor binding has long been recognized, only a few studies have been published that address the nature and the energetics of the conformational changes that the small molecules undergo upon binding. In this paper, we have presented a methodology for analysis of protein-ligand association by modifying the current dogma in computational chemistry of "one chemical-one structure-one parameter value" to a "one chemical-finite set of structures-range of descriptors values" principle. The basic idea was derived from the fact that relatively small differences between conformers of a chemical are known to yield significant variation in the properties of the molecule. We have treated the enzyme-inhibitor complex as an ensemble of energetically reasonable conformations and examined the influence of ligand-receptor binding thermodynamics on activity by analyzing the problem as a QSAR calculation. This approach is distinctly different from methods that select a single lowest energy conformation of a molecule or those that average the descriptors over a set of molecular conformations.
Development of mechanistically sound SARs for receptor-mediated events requires one to account for the dynamic behavior of the complex as well as the selection of appropriate descriptors to represent the molecule under investigation. The scan of the potential energy surface by MDSs has validated that this MD-based QSAR approach provides a solution to the uncertainty in deducing the bioactive conformation of small molecules in QSAR. Moments of distribution computed for the thermodynamic events representing ligand-receptor interactions could provide insight into the influence of the potential energy landscape on the activity that is not captured in a single crystal structure.
Each application of eCoRIA analysis reported here has resulted in QSAR models with good statistics. The QSAR equations have revealed all structure activity relationships previously reported for these molecules as well as uncovered some features that were hitherto unknown and may have a hidden role in driving the ligand-receptor binding process. The predictive ability of eCoRIA analysis compares very favorably with that of other methods which make more conventional use of molecular mechanics interaction energies, such as COMBINE and FEFF 4D-QSAR (Santos-Filho, Mishra, & Hopfinger, 2001;.
Decomposition of the interaction energies into van der Waals, electrostatics, hydrogen bonding, hydrophobic, solvation, entropy, and strain energy terms enabled us to examine protein-ligand association in even greater detail. The former interaction energies are associated directly with a specific residue in the protein's active site and can advantageously be used to extract the residues that are important for binding and also serve as a guide for carrying out mutation studies directed towards understanding ligand binding. This adds to the attempts made by few others to include the influence of conformational entropy, solvation, and strain energy altogether along with the non-bonded interactions on biological activity.
In summary, by combining MD-based conformation analysis with quantitative structure-activity relationship analysis, eCoRIA promises to be a valuable extension to the classical nD-QSAR methodologies, both conceptually and also with respect to reliability in predicting the affinities of new chemical entities.