The effect of various atomic partial charge schemes to elucidate consensus activity-correlating molecular regions: a test case of diverse QSAR models

The estimation of atomic partial charges of the small molecules to calculate molecular interaction fields (MIFs) is an important process in field-based quantitative structure–activity relationship (QSAR). Several studies showed the influence of partial charge schemes that drastically affects the prediction accuracy of the QSAR model and focused on the selection of appropriate charge models that provide highest cross-validated correlation coefficient ( or q2) to explain the variation in chemical structures against biological endpoints. This study shift this focus in a direction to understand the molecular regions deemed to explain SAR in various charge models and recognize a consensus picture of activity-correlating molecular regions. We selected eleven diverse dataset and developed MIF-based QSAR models using various charge schemes including Gasteiger–Marsili, Del Re, Merck Molecular Force Field, Hückel, Gasteiger–Hückel, and Pullman. The generalized resultant QSAR models were then compared with Open3DQSAR model to interpret the MIF descriptors decisively. We suggest the regions of activity contribution or optimization can be effectively determined by studying various charge-based models to understand SAR precisely.


Introduction
Three-dimensional quantitative structure-activity relationship (3D-QSAR) is an important computational approach in ligand-based drug discovery and design (Lill, 2007). It explores the relationship between properties of chemical substances and their biological endpoints to develop a statistical model, and it is based on the fundamental principle that the difference in structural properties is responsible for observed variations in the biological response of the molecules (Verma, Khedkar, & Coutinho, 2010). The concept that molecules interact non-covalently through their electronic properties such as electrostatic and van der Waals forces, revolutionized the perspectives of medicinal chemistry arena and essentially made way to pioneered research on molecular interaction fields (MIFs) (Cramer, Patterson, & Bunce, 1988;Wermuth, 1996). A MIF explains the spatial variation of the interaction energy between a molecular target (a macromolecule or a chemical substance) and a chosen probe (Wade, 1993). Particularly, the progress in the MIFs studies was attributed to the development of three popular QSAR methods, Comparative Molecular Field Analysis (CoMFA) (Cruciani, 2006), Comparative Molecular Similarity Indices Analysis (CoMSIA) (Klebe, Abraham, & Mietzner, 1994), and the GRID force field (Goodford, 1985). The terminology MIF has been used synonymously with the terms, molecular field analysis (MFA), field-based QSAR and GRID-based QSAR regardless of the QSAR softwares, in the scientific community (Consonni & Todeschini, 2010).
A brief introduction of various molecular field technologies is given here. The standard CoMFA method involves the placement of aligned small molecules centrally in a grid box or lattice grid with a regular spacing or interval. Further, the steric and electrostatic potential energies were computed using a probe at all intersections of the lattice and subsequently, utilized as independent variables to correlate with biological endpoints using partial least square (PLS) statistical technique (Cramer et al., 1988;Cruciani, 2006). Various breakthroughs in CoMFA method have been witnessed since decades and discussed elaborately in the literature (Consonni & Todeschini, 2010;Verma et al., 2010;Wade, 1993). To overcome some of the limitations in CoMFA method, CoMSIA was developed which calculates molecular similarity indices from Steric and Electrostatic ALignment (SEAL) similarity fields (Kubinyi, Hamprecht, & Mietzner, 1998), to generate steric, electrostatic, hydrophobic, and H bonding descriptors. The implementation of bell-shaped Guassian functions of SEAL fields that provide good approximations of Lennard-Jones and Coulomb potentials in contrast to its steep nature in CoMFA, made CoMSIA a famous QSAR tool (Klebe, Mietzner, & Weber, 1994). The GRID program based on GRID force field emerged as an alternative to the CoMFA method for estimating interaction fields and possesses an advantage of employing a comparatively smoother 6-4 Lennard-Jones potential than 6-12 form in CoMFA (Goodford, 1985). Moreover, this program applied a new concept of variable selection and reduction called GOLPE (Generating Optimal Linear PLS Estimations) approach to recognize highly informative variables from the descriptor pool, and hence, this technique is collectively called GRID/GOLPE approach (Baroni et al., 1993). The application of molecular field extrema as descriptors for QSAR modeling was introduced in FieldAlign program to overcome grid restrictions and sample large numbers of descriptors (Cheeseright, Mackey, Rose, & Vinter, 2006). These descriptors were derived from the eXtended Electron Distribution force field which treats the electronic distribution as multipole in accordance with the quantum orbital descriptors rather than conventional atom-centered charges and utilizes Gaussian functions (Vinter, 1994).
The assumption of a nonlinear relationship between the dependent and independent variables in k-nearest neighbor (kNN) method benefitted with other stochastic variable selection methods such as simulated annealing and evolutionary algorithms, allowed the development of QSAR models without the need to obtain a single linear equation. In this method, kNNs of any molecule are acquired by computing the distance between the descriptors chosen from numerous variable selection methods (Ajmani, Jadhav, & Kulkarni, 2006). Field-based QSAR (also known as force field for CoMFA-like models and Gaussian for CoMSIA-like models) is a tool from Schrödinger, LLC which works on the basic principles of CoMFA/CoMSIA and utilizes a specific set of parameters including the derivation of steric potentials from the OPLS (Optimized Potentials for Liquid Simulations)-2005 force field, hydrophobic properties from CLOGP fragmental methods, and H bond acceptor (HBA) and donor fields from its own Phase pharmacophore module (Field-Based QSAR (version 2013-3), 2013). A free, open-source chemometric tool designed to explore pharmacophore by high-throughput MIFs analysis and equipped with automations through scriptable interface, flexibility and interoperability with external programs as well as algorithm parallelization for variable selection and model validation to achieve high performance, make Open3DQSAR a powerful tool in pharmacophore assessment and ligand-based drug design (Tosco & Balle, 2011).
Literature survey indicates that majority of the fieldbased QSAR models were largely affected by the variable selection methods and molecular treatment such as the force field parameters for steric field descriptors and atomic partial charge calculation methods for electrostatic fields, besides the primary limitations of appropriate 'bioactive' conformational sampling and alignment (Kubinyi, 1997a(Kubinyi, , 1997b. The choice of suitable and accurate electrostatic potentials assignment scheme is a crucial task in QSAR work as there are no strict guidelines for the selection of charge models (Tsai et al., 2010). The most commonly used molecular mechanics (MM) force fields (empirical and semiempirical) to calculate MIFs facilitates the development of robust QSAR models. However, the high-throughput assignment of atomic partial charge to molecules via quantum mechanics (QM) approach (ab initio methods) are time consuming and computationally demanding (Cruciani, 2006;Tsai et al., 2010). Most of the CoMFA-and CoMSIA-based QSAR studies that focused on reporting the most appropriate charge calculation schemes based on statistical measures (e.g., r 2 , q 2 ) do not provide consensus guidelines rather than the recommendation of superior quality of QSAR models derived from QM and/or semiempirical approaches in comparison with MM methods (Choo, Choi, Jung, Koh, & Pae, 2003;Kroemer, Hecht, & Liedl, 1996;Madhavan, Gadhe, Kothandan, Lee, & Cho, 2012;Tsai et al., 2010). Contrastingly, we were interested in the study of activity-correlating molecular regions by various electrostatic schemes in an understanding that the selection of suitable atomic charge models facilitates the mapping of crucial molecular regions that may be missed or remain unexplored if we rely on single charge model. The latest progress in computational strategies for medicinal plant-based drug discovery, structural bioinformatics, and related methods has proved extremely useful (Kumar, George, Jasrai, & Pandya, 2014;Kumar, Jasrai, Mehta, & Pandya, 2014;Kumar, Pandya, Desai, & Jasrai, 2014). There are quite many charge models, and methods have been used for assigning charges include AM1, AM1-BCC, CFF, Del Re, Gasteiger, Gasteiger-Hückel, Hückel, Merck Molecular Force Field (MMFF), PRODRG, Pullman, and VC2003. Each of them has its own advantages and disadvantages. Knowing the maximum diversity in the target molecules and method of assigning charges, we have selected most commonly used six of them namely, Gasteiger and Marsili (1980), Del Re, Pullman, and Yonezawa (1963), and MMFF (1996, Hückel (1931), Gasteiger and Saller (1985), and Berthod, Giessner-Prettre, and Pullman (1967) available in VLife MDS (2008) and SYBYL suites (2010), to map activitycorrelating molecular regions and to obtain a consensus impact of descriptors from charge scheme-based QSAR models, were the primary objectives of this study. The resultant generalized QSAR model was then compared with the Open3DQSAR (2011) model to judge the predictions.

Dataset partitioning and alignment
The selection of training and test sets along with the compound number scheme was adopted from respective studies. The training set constituted diverse molecules spanning various chemical classes and encompassed an activity range of 4-5 orders of magnitude. For example, the COX-2 dataset was divided into 24 training and 22 test sets based on Chopra et al. (2008) COX-2 pharmacophore study (Chopra et al., 2008). The structures of the molecules were drawn and 3D optimized using Marvin Sketch 5.11.4 program (ChemAxon LLC) (Marvin (version 5.11.4), 2012) and exported in structure data format (SDF). YASARA Structure (academic license) software (Krieger, Darden, Nabuurs, Finkelstein, & Vriend, 2004) was employed to align molecules using atom-to-atom matching technique as the dataset constituted diverse chemical classes of molecules and do not contain a common maximal common substructure to act as template to guide template-based alignment. Prior to alignment, conformational search was executed using conjugate gradient energy minimization technique (atom-typed using MMFF force field, 10,000 iterations, root mean square gradient = 0.001 kcal/mol) to obtain feasible conformers of molecules as input for optimal molecular alignment and returned as alignment file in single SDF format. This alignment file was procured as input to calculate molecular

Calculation of molecular field descriptors and commonalities in VLife MDS, SYBYL, and Open3DQSAR softwares
The procedure for developing molecular field-based QSAR follows the standard prototype of CoMFA method. The list of recommendations to enable reproduction of CoMFA results, given by Hugo Kubinyi was strictly followed (Kubinyi, 2008). We used VLife MDS (2008), SYBYL (2010), and Open3DQSAR (2011) softwares to build field-based QSAR and constituted the following commonalities (otherwise mentioned in the next section) in its respective calculations. A sp 3 hybridized carbon probe of charge +1 was used to calculate the interaction energies between probe and ligand atoms in a grid box with the customized grid settings in x (For example, COX-2 dataset: start: -11.0000; end: 17.0000), y (start: -23.0000; end: 19.0000), and z (start: -13.0000; end:15.0000) dimensions using a grid spacing of 2 Å. The distance-dependent dielectric function was invoked with a dielectric constant of 1.0. The electrostatic and steric fields interaction energies were refined by the specification of cutoffs for electrostatic field at 10.0 kcal/mol and steric field at 30.0 kcal/mol, respectively, for inclusion as field descriptors. The steric field was computed between n atoms of ligand and sp 3 carbon probe; based on Lennard-Jones' 6-12 potential using AMBER FF99 (Wang, Cieplak, & Kollman, 2000) van der Waals parameters (Open3DQSAR) and Tripos force field (2006) (VLife MDS and SYBYL CoMFA), respectively.
The Lennard-Jones potential can be expressed in the following equation (Cramer et al., 1988;Wade, 1993), Here, r ij is the distance between atom i of the molecule and the grid point j where the probe atom is located. A ij and B ij are the constants associated with the van der Waals radii of the corresponding atoms.
Additionally, the electrostatic field terms are calculated based on Coulomb potential function and can be generalized (Cramer et al., 1988;Wade, 1993) as follows; Here, r ij is the distance between atom i of the molecule and the grid point j where the probe atom is located. D is the dielectric constant, q i is the atomic partial charge of the ligand atom i, and q j is the charge of the probe j.
Besides the commonalities in the field treatment of molecules in these softwares, PLS modeling was ensured to develop QSAR models.

Specifications in VLife MDS, SYBYL, and Open3DQSAR softwares
We described here the complete details of selection of atomic partial charge schemes, variables sampling, and visualization settings implemented in VLife MDS (2008), SYBYL (2010), and Open3DQSAR (2011) softwares. As we compared the various atomic partial charge-based QSAR models developed from VLife MDS and SYBYL, default specifications were given to Open3-DQSAR software to construct conventional QSAR model. This step ensured the mapping of variables that explain activity-correlating regions by an external program (here, Open3DQSAR, 2011) and provided explanation of MM-based force fields toward the robustness and predictivity of field-based QSAR models. Thus, Open3-DQSAR acted as standalone QSAR tool and rely on the point charge model of Coulomb potentials to gauge electrostatic field descriptors, and also on the Lennard-Jones' 6-12 potential atom-typed using AMBER FF99 (Wang et al., 2000) van der Waals parameters to compute steric field descriptors.
Various electrostatic potentials including Gasteiger and Marsili (1980), Del Re et al. (1963), and MMFF (1996 were assigned using 'Charge type' option in 'Field Compute' settings of VLife MDS. After the calculation of field descriptors, the 'Remove invariant column' command was given to exclude statistically insignificant descriptors and chosen stepwise variable selection method. This stepwise procedure begins by developing a trial model step by step with a single field variable and to each step, and one field variable is added and subsequently inspect the fit of the model. This step is iterative until no more significant variables remaining outside the model are left. The selection of variables during this process was obliged with the following conditions. The cross-correlation limit between variables was set to 0.5, the number of maximum variables of 10, the term selection criteria of q 2 , variance cutoff of 0.0, selection of autoscaling procedure, the specification of Fischer (F) statistic to evaluate the significance of the selected variables. The highly informative lattice descriptors were represented as spherical balls at the lattice intersection in the VLife MDS workspace, to map the activity-correlating molecular regions and its chemical nature (electropositivity, electronegativity, steric favoring and unfavoring regions) which was studied by its arithmetic sign (+ or -) assigned to descriptors in the final PLS equation.
CoMFA models were built in SYBYL suite (2010) using Hückel (1931), Gasteiger and Saller (1985), and Berthod et al. (1967) charge schemes. The alignment generated from YASARA Structure was procured as a molecular database file in SYBYL suite. CoMFA fields were calculated with 'Tripos Standard' field class option.
No smoothing of field variables was applied and dropped electrostatic columns within steric cut-off for each molecule. Using spreadsheet dialog, PLS models were developed with cross-validation (q 2 ) to recognize the optimal number of principal components (PCs) used to develop model. Field variables whose energy variance are less than 2 kcal/mol were omitted using 'column filtering' option. Subsequently, PLS models were produced without cross-validation and specified the optimal PCs to obtain r 2 value. The CoMFA contour maps generated by accounting the aligned chemical groups in the dataset were used to project over selected molecules (most active and inactive) in the integrated SYBYL molecular viewer.
Subsequent to field descriptor calculations in Open3-DQSAR, several variable selection procedures were employed. In the pre-treatment operation phase, zeroing of grid values that was close to 0 (level = 0.05), standard deviation (SD) cutoff that removed variables having a SD less than 2 to improve the signal-to-noise ratio, Nlevel variable elimination which discarded variables containing different energy values across the different objects to avoid bias in the model, were performed to determine the poorly informative variables and eliminate them. We used block unweighted scaling procedure to normalize multiple sets of field descriptors securing different magnitudes (Kastenholz, Pastor, Cruciani, Haaksma, & Fox, 2000). To develop optimal PLS models, some advanced variable selection methods were also applied which include smart region definition (SRD) (Pastor, Cruciani, & Clementi, 1997) and fractional factorial design (FFD) (Baroni et al., 1993). SRD attempted to group variables based on the mapping of descriptors in 3D space and reduced redundancy that originates from multiple nearby descriptors which encode similar information (critical and collapse distance of 1.0 and 2.0 Å were set). FFD was used to select group of variables (type = leave-many-out; 20 runs) that have largest effect on model predictivity from a pool of descriptors refined by earlier SRD run. The PLS pseudocoefficients contour map depicting the activity-correlating molecular regions were visualized using PyMOL graphical package (academic license) (PyMOL, 2009). The Open3DQSAR and CoMFA contour maps were produced as scalar products of coefficients and SD related with each column.

Evaluation of the QSAR models
The PLS method was used to develop QSAR models in its respective software applications. We build PLS models considering the field-based descriptors as independent variables and pIC 50 as dependent variable. Several statistical measures were studied in PLS analyses for evaluating QSAR model's quality and predictivity. These include coefficient of determination (r 2 ), leave-one-out (LOO) cross-validation (q 2 ), F test, and number of principal components (PCs). We selected the same set of criteria used by Tsai et al. (2010) to compare different electrostatic potentials on prediction accuracy in CoMFA and CoMSIA studies. A q 2 value of at least 0.5 was chosen as the criterion for QSAR robustness. Further, the internal prediction accuracy was assessed by r 2 ≥ 0.5 and q 2 ≥ 0.5, for training and test sets, respectively (Tsai et al., 2010).

Assessment of best scoring QSAR models
The statistical measures r 2 and q 2 have been conventionally used to assess the best scoring QSAR models (Verma et al., 2010;Wermuth, 1996) wherein PLS method is used to develop relationship between biological endpoints (here, pIC 50 of selected dataset) and the molecular field descriptors chosen by various variable selection methods of respective QSAR programs (Table 1 and Supplementary Table 1). The r 2 value (variance explained by the PLS model) of QSAR models based on Gasteiger-Marsili, Del Re, MMFF, Hückel, Gasteiger-Hückel, and Pullman charge schemes indicated that QSAR models based on Hückel and Gasteiger-Hückel charge schemes outperformed in most of the cases when compared against respective charge-based QSAR models. The internal predictivity and robustness of the QSAR models were examined by q 2 parameter, and the best scoring PLS models were selected by the highest q 2 value. No consensus was obtained to show the superiority of charge models. Further, the difference between r 2 and q 2 values of various QSAR models was at least~4.0. Noticeably, the difference was only 2.90 for Hückel-based charge models. This suggests that charge scheme may be directly related to the molecular dataset and thereby requires examination of all available charge schemes. Although the best scoring QSAR models were selected based on the compliance of guideline related to prediction accuracy (i.e., QSAR model scoring r 2 ≥ 0.5 and q 2 ≥ 0.5 (Tsai et al., 2010), all the presented PLS models were statistically significant describing the importance of descriptors in relation to its observed pIC 50 .
The objective of exploring various charge models to recognize activity-correlating molecular regions is comprehensively detailed in the COX-2 inhibitors dataset. The observed and predicted activities of the selected COX-2 inhibitor dataset are listed in Table 2 and graphically plotted in Figure 1. The robustness of this method was studied using ten diverse datasets and briefly explained in the later sections. The contribution of various field-based descriptors studied from VLife MDS and SYBYL suites to build PLS relationship in the selected charge-based QSAR models is given in Supplementary   Supplementary Figures 3 and 4, respectively. The observed and predicted activities for the ten datasets are listed in Supplementary Table 2 and graphically shown in Supplementary Figure 5, respectively.

COX-2 inhibitory activity-correlating molecular regions
The COX-2 inhibitory activity-correlating molecular regions based on the selected 3D alignment of dataset were studied by the respective mapped field descriptors from the selected charge-based QSAR models. The molecular regions were annotated based on the chemical structure of the most active COX-2 inhibitor (4-[4-fluoro-5-phenyl-3-(trifluoromethyl)-1H-pyrazol-1-yl]benzene-1-sulfonamide, 1, IC 50 = 1.7 nM) and were divided into three regions, viz. region around (i) trifluoromethane group, (ii) sulfonamide moiety, and (iii) functional groups connected to pyrazole core structure other than regions (i) and (ii), respectively, to facilitate the mapping of consensus field descriptors (Figures 2 and 3).

Set of various charge-based MIF descriptors mapped in the region around trifluoromethane group
One pair of consensus electronegative field descriptors was mapped by the Del Re (E_1998)-and MMFF (E_1751)-based charge models around trifluoromethane group with a unique electronegative (E_1986 of Gasteiger-Marsili model) and two unique steric (S_1726 and S_2517 of Del Re model) descriptors. The consensus descriptors, E_1998 and E_1751, signified the importance of electronegative groups in this region and mapped the electropositive groups present in the COX-2 moderately active molecules. E_1998 showed the electropositive groups connected with heterocycles which should be avoided to develop active COX-2 inhibitors. This descriptor mapped CH 3 (7), neutral charge of cyclohexane (8) and fused aromatic groups with S=O (13) and -O-((CH 2 ) 14 ) (24), respectively. Similar to E_1998 of Del Re model, E_1751 descriptor of MMFF model projected above the electropositive moieties vide the electropositive cloud above methylthiophene (7), neutral charge of cyclohexane (8), the electropositive region above substituted indole group (11), the neutral system of thiochromene (13) connected with fluorine and methoxy groups (adjacent electronegative groups) at 6th and 7th positions, respectively. Conclusively, Del Re and MMFF partial charge schemes pointed out the avoidance of electropositive groups that are present in the moderately active COX-2 inhibitors. The unique electronegative E_1986 descriptor of Gasteiger-Marsili model showed the electronegativity conferred by CF 3 group is beneficial for COX-2 inhibitory activities. This group is present in the most active dataset of COX-2 inhibitors. The contour plots developed from Hückel, Gasteiger-Hückel, and Pullman charge-based QSAR models revealed an electropositive cloud above the heterocycles in the trifluoromethane region as noted by E_1998 (Del Re) and E_1751 (MMFF) descriptors. However, an extensive electropositive (blue) contour patch was observed in Hückel model compared to Gasteiger-Hückel and Pullman CoMFA models, respectively. Similar to the presence of electronegative descriptor pair (E_1998 and E_1751), the projection of fluorine atoms from trifluoromethane group in the aligned space generated two corresponding electronegative (red) small contours in the entire CoMFA models. The steric descriptor S_1726 depicted the preference of less bulky groups such as CF 3 in this region. Indirectly, S_2517 of Del Re model represented the bulky substituents such as methylthiophene (7) and fluorine and methoxy groups (6th and 7th positions) attached with thiochromene (13), are the less favored functional groups. Varied contour magnitude (size and shape) for the steric disfavored region (yellow) was graphically illustrated for chemical connectivities around trifluoromethane group with commonality in steric favorness (green) for CF 3 group in all the CoMFA models. Collectively, the MIF descriptors mapped in this region around trifluoromethane group demonstrated the preference of less bulky electronegative groups. In addition, the electropositive and neutral heterocycles that are bulky groups observed in the moderately active COX-2 inhibitors were highlighted.

Set of various charge-based MIF descriptors mapped in the region around sulfonamide moiety
The aligned chemical groups around sulfonamide moiety constituted a unique steric and two pairs of consensus electrostatic MIF descriptors. The consensus electropositive descriptor pair was contributed from Gasteiger-Marsili (E_2034)-and MMFF (E_2281)-based charge models, whereas the consensus electronegative descriptor pair was derived from Gasteiger-Marsili (E_1254)-and Del Re (E_2306)-based charge models. The E_2034 descriptor mapped to NH 2 group (an electropositive) present in the sulfonamide moiety from the most active COX-2 dataset, which enhances activity. This corresponded to H bond donor (HBD) feature of Chopra et al. (2008) pharmacophore hypothesis (Chopra et al., 2008). Similar information was encoded by E_2281 descriptor of MMFF model. The Gasteiger-Hückel-and Pullman-CoMFA models exhibited an electropositive contour above NH 2 group whereas Hückel-charge model did not highlight this requirement. This is one among the important points that classical Hückel charges may not comprehensively account the electronic distribution of the molecular system where extended Hückel semiempirical models can be advocated. This observation reiterates the examination of more empirical charge schemes when interpreting SAR data and development of QSAR models over the preference of convenience and speed in calculations. Conclusively, the NH 2 group with electropositive property influences COX-2 inhibitory activity. In pharmacophore terms, HBD is preferred in this region. Further, the E_1254 from consensus descriptor pair projected above the S=O (an electronegative) functional group of sulfonamide moiety from most active COX-2 molecules. This chemical group is encoded as HBA feature in the pharmacophore model of Chopra et al. (2008). Similarly, this information was mapped by E_2306 descriptor in its Del Re model. Hence, the S=O group possessing an electronegative character increases the COX-2 inhibitory activity. In pharmacophore terms, HBA is preferred in this region. The CoMFA contour maps from Hückel, Gasteiger-Hückel, and Pullman models represented the steric favorness of sulfonamide moiety in COX-2 inhibitors. Steric disfavorness was noticed in molecules that expand beyond the sulfonamide group into the aligned space majorly from the adjacent groups connected to pyrazole scaffold. Noticeably, the pharmacophore model of Chopra et al. (2008) failed to account both of these features (HBA and HBD) in a single model due to the artifact in the interfeature spacing parameter in Catalyst/HypoGen program (Catalyst, 2008;Chopra et al., 2008). Additionally, the unique steric descriptor S_2553 of MMFF model described the bulkiness of sulfonamide group as an important sub-structure for enhanced COX-2 inhibitory activities. Moreover, the most active molecules from training and test sets of COX-2 inhibitors especially contained this group. Collectively, the combination of electropositive and electronegative properties of sulfonamide moiety with steric bulkiness influences COX-2 inhibitory activities.

Set of various charge-based MIF descriptors mapped in the region connected to pyrazole core structure excluding regions (i) and (ii)
This region contained a triplet of consensus steric descriptor derived from the various best scoring charge models, along with three unique electrostatic descriptors, respectively. The consensus triplet descriptors were S_1472 (Gasteiger-Marsili model), S_942 (Del Re model), and S_942 (MMFF model) with negative coefficients. The S_1472 descriptor indicated that the chemical complexities due to substitutions at phenyl group (1) attached to pyrazole core structure can decrease activity. Similar to S_1472 descriptor of Gasteiger-Marsili model, the Del Re S_942 descriptor mapped the substitutions at heterocycles connected to pyrazole group, which include benzyloxy (14), benzodioxole (18), toluene (21), and ethoxybenzene (22) groups from moderately active and inactive COX-2 inhibitor dataset. This descriptor information was also encoded by S_942 of MMFF model. The CoMFA models showed variations in steric requirements of the COX-2 inhibitors. Steric disfavorness over chemical groups that were combinatorially explored in place of the phenyl system (compared to compounds 1 and 2) was exclusively depicted in the Hückel chargebased model. The Gasteiger-Hückel and Pullman indicated steric favorness of the aromatic group with fewer complexities. Conclusively, these consensus descriptors outlined the chemical complexities connected to pyrazole core structure through substitutions, which should be avoided to increase COX-2 inhibitory activity. The unique electronegative E_1200 descriptor of Gasteiger-Marsili model showed the preference of electronegative group at the lattice point 1200, which mapped the neutral benzyloxy group connected to thiophene core group (14) that decreases activity. The E_1202 of Del Re model demonstrated that the relatively less electronegative groups compared to phenyl moiety of compound 1 are preferred. This descriptor mapped phenyl moiety (1, 2 and 4), difluoro-isothiochromene group fused with pyrazole system (3) and 3-bromo-2,5-dihydrothiophene moiety (5) that conferred electropositive or neutral regions from most active set. Relatively less electronegative substitutions are preferred in the sulfur-containing pyrazole fused heterocycles (3 and 6) as mapped by the E_1190 of MMFF model. Similar to earlier CoMFA models discussion, the electronegativity from the most active set is emphasized in Hückel model whereas electropositivity is highlighted in the moderately active set by Gasteiger-Hückel model. Pullman-based charge model did not suggest any role of electropositivity and/or electronegativity via contour map depiction. Collectively, the complex substitutions connected to pyrazole core group must be avoided and alternatively, less electronegative groups can be substituted as connector to pyrazole core structure.

Division of Open3DQSAR contour map to analyze descriptors from various charge models
The contour plot obtained from developed field-based QSAR model using Open3DQSAR program was divided into three regions as above mentioned. The descriptors mapped by various charge scheme-based models were compared with Open3DQSAR contour map to judge the predictions. The activity influencing electropositive and electronegative contours were shown in yellow and green colors, respectively. The steric favoring and unfavoring regions were depicted in red and blue colors (Figure 4).

Interpretation of contour plot in the region around trifluoromethane group
Compounds with bulky substituent should possess electropositive functional groups such as benzene (9) and heptane (12) toward COX-2 activity enhancement. A small yellow patch corresponded to electropositive surface above pyrazole core group. This contour mapped at carbon atoms of pyrazole system excluding the electronegative center of N atoms and these features are common in most of the COX-2 inhibitors. This contour information agrees with the electronegative field descriptors, viz. E_1998 (Del Re), E_1751 (MMFF) and red contours (Hückel, Gasteiger-Hückel, and Pullman), respectively. Further, the position of CF 3 on pyrazole core structure is influential in COX-2 activity enhancement as steric positive (red) contour field. The S_1726 of Del Re model and CoMFA contours mapped similar descriptor information. In addition, adjacent substitutions to CF 3 on pyrazole reduce the activity as shown by steric negative (blue) contour field.

Interpretation of contour plot in the region around sulfonamide moiety
Similar to E_2034 (Gasteiger-Marsili model) and E_2281 (MMFF model), the electropositive contour mapped the NH 2 group of sulfonamide moiety. Further, the electronegativity conferred by S=O was shown in green contour which matched the descriptor information of E_1254 (Gasteiger-Marsili model) and E_2306 (Del Re model). The bulkiness of sulfonamide moiety mapped by S_2553 (MMFF model) is influential in the most active COX-2 inhibitors. The CoMFA models depicted similar steric and electrostatic descriptors information.
3.9. Interpretation of contour plot in the region connected to pyrazole core structure excluding regions (i) and (ii) The electropositivity conferred by heterocycles is shown in yellow contour, which correlate with the E_1202 descriptor of Del Re model.

Application of this approach to ten selected datasets
We selected an additional dataset of ten diverse datasets with support of available pharmacophore and QSAR studies to evaluate the robustness of this approach and briefly explained in this section. The VLife MDS, SYBYL CoMFA, and Open3DQSAR coefficient plots are shown in Figures 5-9, respectively. 3.10.1. Apoptosis inducing 4-aryl-4H-chromenes Sciabola et al. (2007) utilized triplets of pharmacophoric points (TOPP) descriptors to study SAR of 4-aryl-4Hchromenes having apoptosis inducing effect. The combination of pharmacophoric points, viz. dry-dry-HBA, dry-HB donor-HB acceptor and dry-HB acceptor-HB acceptor, is most prevalent in potent molecules and showed the importance of substitutions at 3rd and 5th positions of phenyl and pyridyl rings (Sciabola et al., 2007). The steric favoring descriptors S_944 (Del Re model) described the importance of bulkiness at 7-amine position of chromene scaffold. The electropositivity conferred by chromene scaffold (Gasteiger-Hückel and Pullman models), whereas the electrostatic negative potential of chromene oxygen (Open3DQSAR) is crucial for apoptosis inducing behavior. Additionally, the 3rd position of benzodiozole is an important determinant where bulkiness (S_484 -Del Re and MMFF models) and electronegativity (Open3DQSAR) are dominant. Adjacently, steric affecting descriptor (S_295all VLife models) indicated bulkiness beyond methyl group (2nd position of benzodiozole) drastically reduces the activity. Bulky electropositive (S_235 -Del Re and MMFF; CoMFA models) along with intermediary electronegative oxygen (E_156 -Gasteiger-Marsili, CoMFA, and Open3-DQSAR) is activity governing molecular regions.

Azaaurones with anti-malarial activity
In strong association with the pharmacophore model proposed by Sharma et al. (2013), the electron donating groups depicted by E_632, a unique descriptor of Gasteiger-Marsili model, plays a crucial role in governing reactivity of the azaaurones series due to its less steric bulkiness (CoMFA models) characteristics (Sharma et al., 2013). The oxygen atom in the aurone scaffold     was mapped by entire CoMFA models. Further, the steric favorness by small lipophilic electron-withdrawing groups was demonstrated by CoMFA and identical electrostatic descriptors of VLife MDS models (E_1099 from Del Re and E_1100 from Gasteiger-Marsili and MMFF models). The electronegative bulkiness (E_979 from Del Re model, entire CoMFA; Open3DQSAR) at 3rd position of benzene connected azaaurone scaffold where functional groups extending beyond propyl and cyclic groups decreases activity. The less steric region at 5th position of benzene should be exploited with diverse chemical groups as designated by identical descriptor S_664 of entire VLife MDS and Gasteiger-Hückel charge models. Additionally, less electronegative substitutions as developed by H and CH 3 groups (E_524 from MMFF and entire CoMFA models) are convenient groups to obtain increased anti-malarial potency. Talele et al. (1999) developed CoMFA models on azole dataset using Austin model 1 (AM1) charge scheme (Talele et al., 1999). The activity trend in the chosen dataset indicates a linearly connected heterocyclic (up to 4) groups in the molecules correlate with antifungal activity where the terminal groups such as imidazole and piperazine contribute to bulkiness (S_2308 from Del Re; S_1411 from entire VLife MDS models) with electrostatic polarizations, viz. E_1775 (Del Re model) and Open3DQSAR model marking N atoms and aromatic system. Such long lipophilic moieties represent correlation with major hydrophobic pocket of cytochrome P-45,0 14αDM (Talele, Hariprasad, & Kulkarni, 1998). The most important insight is the observation of identical steric disfavoring descriptor from all QSAR models projected above the phenyl, thienyl, benzothiophene, and naphthyl rings present in the least active molecules. Similar to mapping of azole nucleus by electronegative patches (Open3DQSAR), the unique electrostatic descriptor E_1436 from Gasteiger-Marsili model points out this requirement.

Azoles with antifungal activity
3.10.4. CDK2 inhibition based on pyridines and pyrimidines scaffold QSAR study using diverse series of pyridines and pyrimidines scaffolds for CDK2 inhibition showed that electron-withdrawing properties on this scaffold with HBD and steric attributes may help to design potent CDK2 inhibitors with special emphasis on the reduction of HBAs and degree of branching emanating from molecule's center (Babu et al., 2008). Relatively less electronegative substitutions are preferred in relation to oxygen and sulfur containing groups above pyridine and pyrimidine core structures enhance CDK2 activity which represents bulkiness (S_2716 from MMFF model) with electronegative characteristics (CoMFA models). Descriptors that map scaffold represent the electronegative properties of its N atom (E_827 from Del Re; E_989 from MMFF models) where the hydrophobic substituents (S_140 from Del Re; CoMFA and Open3DQSAR models) may hinder CDK2 potency. The Open3DQSAR model in this case fails to demonstrate prominent regions possibly due to field variable cutoff.

CYP2B6 inhibitors set
The pharmacophore hypothesis developed by Roy and Roy (2010a) showed that an electron-rich pyridine group and its connected aromatic groups such as phenyl and naphthalene that are encoded as the HBA and ring features to study CYP2B6 inhibition (Roy & Roy, 2010a). Electropositive descriptors such as E_786 from Del Re, E_809 from Gasteiger-Marsili, E_883 from MMFF models, and CoMFA contour region projected above N containing heterocycles dictates relative less electronegative groups are preferred, whereas its terminal must be explored further by bulky functional groups as indicated by S_1015 of Del Re, CoMFA, and Open3DQSAR models, respectively. This steric requirement holds similar notion on both of the terminal regions with variable substitution sites are preferable with at least two heterocycles connected core groups.
3.10.6. CYP19 inhibitors Roy and Roy (2010b) performed docking and 3D-QSAR studies on a diverse dataset of aromatase (CYP19) inhibitors and concluded the role of one or two HBAs with optimal hydrophobicity as a primary activity governing properties (Roy & Roy, 2010b). The bulkiness around imidazole scaffold is negatively influencing CYP19 activity as mapped by identical pairs of steric descriptors, S_1852 (Del Re and Gasteiger-Marsili models) and Open3DQSAR coefficient maps. Further, the electronwithdrawing properties of the substitution sites are favorable as demonstrated by E_826 and E_1096 (Gasteiger-Marsili model) and E_780 (MMFF model), respectively. Similar details were also mapped by Open3DQSAR model. Scaffold such as imidazole and indole groups largely contributes to electronegative potential. The preference of NO 2 and CN are also represented by electronegative descriptor clusters (electrostatic descriptor placed nearby in VLife MDS models), whereas the Open3DQSAR model provided hints on more speculative regions with small polyhedra contour region.
3.10.7. Anti-HIV-1 RT activity of 2-amino-6arylsulfonylbenzonitriles and its related thio-and sulfinyl congeners Roy and Leonard (2004) studied SAR of 2-amino-6-arylsulfonylbenzonitriles and its thio-and sulfinyl congeners having anti-HIV-1 reverse transcriptase (HIV-1 RT) activity tested on MT-4 cell lines (Roy & Leonard, 2004). Similar to trifluoromethane's detrimental property as noted in the SAR study, the electronegativity (E_643 from both Del Re and Gasteiger-Marsili; CoMFA and Open3DQSAR models) conferred along with its bulkiness (S_525 from Del Re; CoMFA and Open3DQSAR model) reduces activity. The steric (S_532 from entire VLife MDS models) and electronegative (E_643 from Del Re and Gasteiger-Marsili models; entire CoMFA models) properties were shown by the sulfone moiety in the scaffold which possesses HIV-1 RT activity enhancing characteristics. A unique descriptor S_176 from Del Re model showed the prominent role of ortho-methoxy in HIV-1 RT activity. These Open3DQSAR models revealed the effect of substitutions at ortho position and presence of sulfone moiety with distribution of CF 3 group can increase HIV-1 RT potency.

LpxC inhibitors
The cell wall biosynthetic enzyme LpxC is an attractive target for Pseudomonas aeruginosa therapeutic applications (Kadam & Roy, 2006). A unique steric incremental descriptor S_1190 from all VLife and CoMFA models showed fluorophenyl group with electronegative properties based on associated halogen atoms (e.g., Cl) plays important core functionality in LpxC inhibitors. Additionally, hydrophobic carbon chains attached to this core group indicates steric disfavorness (Open3DQSAR). Further, another unique steric descriptor, S_1347 (Del Re and Gasteiger-Marsili), S_1344 (MMFF), and CoMFA contour patches represents extensive branching of O and N atoms drastically reduces activity where halogenation at either C2 or C3 positions enhances LpxC potency. These observations may be related to the utmost requirement of less electron density at ligand-receptor site (Kadam & Roy, 2006).
3.10.9. PfPK7-imidazopyridazine-based inhibitors Sahu et al. (2011) performed P. falciparum protein kinase 7 (PfPK7) QSAR studies using VLife MDS software (Sahu et al., 2011). Similar to the electropositivity incremental role at 7th position of imidazopyridazine scaffold, the 4-cyanophenyl group correlate with most active molecules of the dataset as shown by E_609 (Del Re), E_572 (Gasteiger-Marsili), E_572 (MMFF) and electropositive contours of CoMFA models. The steric bulkiness produced by fluoropyridine and morpholine connected with phenyl group correlates with decreased PfPK7 inhibitory activity as noted by unique descriptor S_849 (MMFF model), CoMFA and Open3DQSAR models. Electropositivity conferred by aromatic carbons (exclusive of N atoms) was also highlighted in CoMFA models. Additionally, steric groups other than CN such as carboxamide, cyanomethide, hydroxyl, and imidazole deteriorate PfPK7 activity.

Thienol [3,2-d] pyrimidines with phosphodiesterase IV inhibitory activity
We noticed that the QSAR observation follows the report of CoMFA model developed by Chakraborti et al. (2003). The nitrogen containing pyrimidines that confer electronegative property was mapped by E_844 (Del Re), E_720 (Gasteiger-Marsili and MMFF models) and Open3DQSAR models. Further, the electropositive character of aromatic rings (E_876 from Del Re, CoMFA and Open3DQSAR models) such as 4-phenoxy butyl is detrimental to the activity. In addition, the steric hinder provided by such groups was predicted by S_862 (Gasteiger-Marsili model) and Open3DQSAR also suggests the selection of appropriate electronegative chemical groups with less bulkiness. Further, the thienol scaffold should be explored further as S_456 (Del Re), S_314 (Gasteiger-Marsili and MMFF model) and Open3-DQSAR coefficient patch projected above this group.

Limitations and future directions
Compared to empirical charge schemes, quantum mechanical (QM) and quantum chemical (QC) descriptors offer a more precise and comprehensive description of electronic effects of the molecules derived from wave function and can be directly applied to study QSAR (Franke, 1984). Even though the calculations are hardware intensive and time consuming, these methods have gained wide acceptance in correlating biological response (Karelson, Lobanov, & Katritzky, 1996). The most popular semiempirical methods are extended Hückel theory (Hoffmann, 1963), complete neglect of differential overlap (CNDO) (Pople, Santry, & Segal, 1965), intermediate neglect of differential overlap (INDO) (Pople, Beveridge, & Dobosh, 1967), modified INDO (MINDO) (Bingham, Dewar, & Lo, 1975), modified neglect of diatomic overlap (MNDO) (Dewar & Thiel, 1977), Austin model 1 (AM1) (Dewar, Zoebisch, Healy, & Stewart, 1985), and parametric model 3 (PM3) (Stewart, 1989). The major strength of quantum-based methods is its accountability to directly characterize the effect of chemical fragments or groups based on the structure and aids in the study of possible mechanisms of action in terms of chemical reactivity. There are inherent errors associated with the mathematical rigourness and its inability to address bulk effects (Karelson et al., 1996).
Applications of quantum-based charge schemes only revealed the superior performance of QSAR models by yielding high q 2 value. For example, Merz-Singh-Kollman (MK)-electrostatic potential (ESP)-derived charges on moracin analogs (Madhavan et al., 2012), consistent force field (CFF) (Tsai et al., 2010) on ten selected datasets, Austin Model 1-Bond Charge Correction (AM1-BCC) on adamantyl derivatives of P-glycoprotein inhibitors (Gadhe et al., 2011), Mulliken charges on paullones class of glycogen synthase kinase 3β inhibitors (Osolodkin et al., 2010), and electrostatic potential-derived charges on benzodiazepine derivatives (Kroemer et al., 1996). Therefore, it seems that the selection of appropriate charge models in QSAR studies is simply based on the selected molecular dataset and currently cannot be prioritized one charge model over another. On the other hand, MM-based methods provide fast descriptors calculations and model development, but may not cover comprehensively the chemical space (Gadhe et al., 2011).
There is a tendency to select QSAR models based on the highest r 2 and q 2 parameters rather than the comparison of QSAR models obtained from various atomic charges and/or force fields pretreatments. A consensus analysis of various descriptors mapped by the top scoring QSAR models will facilitate the crucial understanding of molecular properties necessary to develop potent molecules. It is well known that the assignment of partial charge schemes directly affects the CoMFA QSAR prediction accuracy (Mittal, Harris, McKinnon, & Sorich, 2009). Thus, the generalized QSAR model derived from various charge models will map essential structural properties that may be missed or remain unexplored when relied on single QSAR model.
The current scenario of domination by commercial closed-source programs in the field of 3D-QSAR modeling has been gradually replaced by the open-source packages (Tosco & Balle, 2011). QSAR models developed from VLife MDS provide a single PLS model with information on optimal PCs used to derive model. However, the details regarding the prediction accuracy (r 2 or q 2 ) during the trend in the selection of optimal PCs (a facility provided by Open3DQSAR PLS engine) will be helpful to recognize over-fitting of data points. Further, the manual specification of PCs in the cross-validation PLS model should be automated in SYBYL CoMFA technique. It is an advantage to represent the field-based descriptors as highly informative grid points rather than the contour maps that are directly affected by isosurface field rendering command. A nonlinear modeling equipped with stochastic variable selection methods implemented in commercial QSAR packages drastically reduces the collinearity of variables and over-fitting of the resultant models (Ajmani et al., 2006) (a feature absent in open-source packages). There is a need to implement various advanced grid-based selections in addition to stochastic methods to obtain optimal QSAR models. It should be noted that there are alternative ways (e.g., alignment methods, different probes, chemometric softwares) to develop optimal QSAR models and selection of such models may offer varied interpretations.

Conclusions
We studied the consensus activity-correlating molecular regions which were derived from optimal MIF descriptors of the various (best scoring) atomic charge-based QSAR models. The interpretation of MIF descriptors obtained from charge-based QSAR models was carried out by analyzing its highly informative lattice descriptors and subsequently divided into molecular regions. The electrostatic and steric descriptors were gathered regionwise from six atomic charge schemes such as Del Re, Gasteiger-Marsili, MMFF, Hückel, Gasteiger-Hückel, and Pullman, to interpret the requirements of chemical groups in relation to activity. The essential molecular properties mapped by respective MIF descriptors were compared with conventional field-based QSAR model obtained from Open3DQSAR program to validate the predictions. The better performance of Hückel-and Gasteiger-Hückel-based QSAR models was noted in terms of correlation coefficient (r 2 ), and the difference between correlation coefficient and its cross-validated terms of 0.4 suggests that the performance of atomic charge schemes to obtain optimal QSAR models may vary across datasets and therefore requires examination of all available MM-based models to arrive at conclusions. The procedure associated with variable selection methods to develop a statistically significant QSAR model precludes the selection of optimal field-based features in a single model. Moreover, the fact that atomic partial charge assignment schemes significantly affects the prediction accuracy of a QSAR model necessitates the study of various charge-based models to retrieve prominent activity governing molecular properties. From this study, it is quite evident that accuracy of charge assignment has a major role to play in developing QSAR models. In such situation, we would like to suggest that one must be careful in selecting the MM-based charge assignment model before taking any conclusion based on the developed QSAR model when quantum mechanical calculations related resources are limited to the user.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2015. 1044474.