A computational study on role of 6-(hydroxymethyl)-3-[3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxyoxane-2,4,5-triol in the regulation of blood glucose level

6-(hydroxymethyl)-3-[3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxyoxane-2,4,5-triol (SID 242078875) was isolated from the fruits of Syzygium densiflorum Wall. ex Wight & Arn (Myrtaceae), which has been traditionally used in the treatment of diabetes by the tribes of The Nilgiris, Tamil Nadu, India. In this study, reverse pharmacophore mapping approach and text-based database search identified the dipeptidyl peptidase-IV, protein-tyrosine phosphatase 1B, phosphoenolpyruvate carboxykinase, glycogen synthase kinase-3β and glucokinase as potential targets of SID 242078875 in diabetes management. Further, molecular docking was performed to predict the binding pose of SID 242078875 in the active site region of the target protein. In addition, dynamic behaviour and stability of protein–ligand complexes were observed for a period of 50 ns through molecular dynamics simulation.


Introduction
Diabetes mellitus is one of the largest chronic endocrine metabolic disorder worldwide; it is debilitating and often life threatening with an increasing prevalence rate of 8.3% (IDF, 2014). Investigation of medicinal plants is an emerging research area due to their lesser side effects (Osadebe, Odoh, & Uzor, 2014). Recently, natural sources have been used for the treatment of diabetes throughout the world. A large number of medicinal plants that possess highly diversified chemical compounds with hypoglycaemic activity (Onkaramurthy et al., 2013). A wide variety of natural and naturally derived compounds are in the clinical phase. Among the FDA-approved diabetes medicines, 15 anti-diabetic drugsvoglibose, acarbose, miglitol, exetenatide, triproamylin acetate, liraglutide, nateglinide, tolrestat, epalrestat, troglitazone, rosiglitazone maleate, sitagliptin, vildagliptin, saxagliptin and alogliptin benzoate − were categorized under natural products. These include natural products, naturally derived compounds that mimic natural products, and synthetic compounds with natural product pharmacophore (Newman & Cragg, 2012). However, use of natural products in drug discovery has decreased due to technical barriers such as high-throughput screening assays against disease targets (Harvey, Edrada-Ebel, & Quinn, 2015).
Diabetes target proteins have been identified by various approaches such as chemical proteomics, comprehensive phenotypic screening, computational inference methods, direct biochemical methods, expression profiling, gene knockouts studies, genetic interactions analysis, genome-wide genetic assays, high-throughput functional genomics, integrated network-based approach, microarray gene expression analysis, PageRank algorithm and pathway analysis (Emig et al., 2013;Grolmusz, 2015;Gu, Chen, Yuan, & Xu, 2013;Haanstra & Bakker, 2015;Schenone, Dančík, Wagner, & Clemons, 2013;Walke et al., 2001;Wang, Zhao, Shang, & Xia, 2014). Various online resources have been updating new novel protein targets and offer target information to the scientific community. At present, following resources provide information about the proteins associated with diabetes: Potential drug target database (Gao et al., 2008), T2D-Db (Agrawal et al., 2008), T1Dbase (Burren et al., 2011), T2D@ZJU (Yang et al., 2013), The Human Diabetes Proteome Project (Schvartz et al., 2015;Topf et al., 2013) and diabetes associated proteins database (DAPD) (Gopinath, Jayakumararaj, & Karthikeyan, 2015). Identification of specific molecular targets of the drug is crucial for drug innovation, personalized medicine and rational use of drugs. Also, it has a significant impact on clinical trials and treatment with reduction in side effects (França, 2015;Liu, Constantinides, & Li, 2014). However, target identification from thousands of molecules is still a laborious, time-consuming and costintensive process (Kore, Mutha, & Antre, 2012). Over the past decades, computer-aided drug design approaches have accelerated the early-stage pharmaceutical research in target identification and validation. Computational tools have been reported as an alternative to experimental methods in predicting target protein for small molecules/plant-derived molecules (Bashar, Khalipha, & Chowdhury, 2012;Zheng et al., 2013). Computational tools employ ligand and structure-based approaches for target prediction. Reverse/inverse docking uses a structures-based approach to identify the target protein, where a query compound is docked against a panel of protein targets or receptors using scoring functions (McPhillie, Cain, Narramore, Fishwick, & Simmons, 2015). Various online sources − INVDOCK (Chen & Ung, 2001), Target Fishing Dock (Li et al., 2006) and idTarget (Wang, Chu, Chen, & Lin, 2012)have employed reverse/inverse docking approaches to identify the target protein (McPhillie et al., 2015). The abovementioned server uses docking program and ligandprotein interaction energy to assess the docking results (Zheng, Chen, & Lu, 2011). DOCK scoring, ITScore and robust AutoDock4 scoring functions are in use for ranking the target protein (Kharkar, Warrier, & Gaud, 2014). Besides, parallel strategies have been used to improve the execution of inverse docking (McPhillie et al., 2015;Vasseur et al., 2013). The recently developed VinaMPI program decreases the time required for completing the screening process (Kharkar et al., 2014). In addition, ligand-based methods use the concept of shape similarity, where chemical compounds of similar shape tend to have similar biological activities (McPhillie et al., 2015;Vasseur et al., 2013). Ligand-based strategies utilize 2D fingerprints, 3D similarity search and reverse pharmacophore mapping for target prediction. Reverse pharmacophore and reverse docking have proved to be efficient and cost-effective target identification approaches that increase the chances of success in the drug discovery process (Liu et al., 2010).
Structure-based pharmacophore determines the chemical features with their spatial relationships in the ligandbinding site. Chemical features reflect the interactions that play a key role in the modulation of protein function by interacting with crucial residues (Wolber, Seidel, Bendix, & Langer, 2008). PharmMapper server identifies the target protein by reverse pharmacophore screening against more than 1500 drug targets from PharmTar-getDB (Liu et al., 2010). Also, this server provides function and disease-relevant information for the identified target proteins. The efficiency of drug discovery relies on the knowledge of protein function and regulation in the pathogenesis of a disease or metabolic disorder (Burbaum & Tobal, 2002). Thus, additional information on signalling and metabolic pathway(s) associated with diseases would provide a basic understanding towards the role of cellular proteins in the pathogenesis of the disease.
The present investigation was aimed at identifying the diabetes target for 6-(hydroxymethyl)-3- [3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxyo xane-2,4,5-triol (SID 242078875) (Figure 1). The SID 242078875 was isolated from the ethanolic extract of Syzygium densiflorum Wall. ex Wight & Arn (Myrtaceae) fruits that exhibited antidiabetic property (Gopinath, David Raj, Nagarajan, & Karthikeyan, 2015). Further, it is necessary to determine the therapeutic role of the isolated compound to correlate with the antidiabetic effect of ethanolic extract of S. densiflorum fruits. In this work, we used PharmMapper server to obtain the human target hit for SID 242078875. Subsequently, the identified targets were screened manually against DAPD  to identify the obtained diabetes targets along with their pathway information. Further, molecular docking, binding free energy calculation, quantum mechanics/molecular mechanics (QM/MM) calculations and molecular dynamics (MD) simulation were performed to understand the mechanism of SID 242078875 in regulation of the blood glucose level.

Screening of potential diabetes targets for SID 242078875
Target identification is one of the main steps in the determination of the therapeutic role of a new molecule (Hughes, Rees, Kalindjian, & Philpott, 2011). The current investigation was aimed at understanding the Figure 1. Structure of 6-(hydroxymethyl)-3- [3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxyoxane-2, 4,5-triol (SID 242078875). antidiabetic effect of SID 242078875 by identifying the target protein. In this study, PharmMapper server was used to predict the disease targets from PharmTargetDB. PharmTargetDB has pharmacophore models, which describes the binding modes of known drug at the binding sites of protein-ligand complexes that were extracted from various databases. PharmMapper flexibly aligns the submitted small molecule onto each pharmacophore model of proteins in the target list and calculated the fit values between the small molecule and the pharmacophores. Further, it presents the aligned pose with the corresponding pharmacophore model and lists the candidate targets based on the fit values. Finally, PharmMapper provides the top N hits of the ranking list, from which the user may select potential target candidates (Liu et al., 2010;Xie, Kinnings, Xie, & Bourne, 2012).
In the present screening, "conformational generation" and "human protein targets" options were selected. Conformational generation allows the PharmMapper server to generate conformers for query compound by in-house multi-objective evolution algorithm-based conformational analysis program (Liu et al., 2009). Further, "Perform GA Match" with default parameters was selected from the "Advanced Options" menu. Genetic algorithm (GA) performs post-optimization of the alignment in order to maximize the fit values. As a result, PharmMapper server listed human targets that have fit value score higher than the cut-off (2.0) and the screened targets were used for further investigation. In addition, the text-based search was employed to screen the obtained human targets against DAPD (www.mkarthikeyan.bioinfoau.org/dapd). The screened diabetes target molecules were further investigated by molecular docking and MD simulation methods.

Protein and ligand preparation
The crystal structure of screened target proteins were obtained from Protein Data Bank (Berman et al., 2000). The structures were prepared and optimized by the addition of hydrogens, rectifying the missing atoms in side chain and loop region. Furthermore, the protein(s) was refined by fixing overlapping hydrogens through H-bond optimization and terminal flips, or by minimization. The restrained minimization was employed using Optimized Potentials for Liquid Simulations (OPLS)-2005 force field (Jorgensen, Maxwell, & Tirado-Rives, 1996), Minimization was performed until the average root mean square deviation (RMSD) of the non-hydrogen atoms reached 0.3 Å (Tripathi & Singh, 2014). The overall protein preparation was performed using glide protein preparation wizard in Maestro (Maestro, version 10.1, Schrödinger, LLC, New York, NY, 2015). The SID 242078875 was prepared by LigPrep (LigPrep, version 3.3, Schrödinger, LLC, New York, NY, 2015.), which fixed the bond orders, generated ionization states and tautomers. The possible tautomeric and ionization states of compound were generated at the pH range of 7.0 ± 2.0 and have been rigorously adjusted via Epik program rather than separate ionizer and tautomerizer treatments. The optimized structures were minimized with the OPLS-2005 force field.

Molecular docking analysis
Molecular docking simulation was performed in order to find the probable binding mode of SID 242078875. The prepared target proteins were subjected to grid generation to ensure that possible active sites were not missed (Schrödinger, 2015). The co-crystallized ligand was considered as the reference molecule and a grid-enclosing box was centred at the co-crystallized ligand. The grid box was generated around the ligand-binding site of the screened targets. The position of grid box is set as XYZ axis with radius 2.0 Å and the van der Waals radii of receptor atoms as 1.0 Å with a partial charge cut-off 0.25 Å to soften the potential for non-polar part of receptor. The SID 242078875 was docked onto the ligand-binding site of the screened targets using glide extra-precision (XP) docking (Glide, version 6.6, Schrödinger, LLC, New York, NY, 2015). Glide score (a modified and extended version of the empirically base function), Glide energy (modified Coulomb-van der Waals interaction energy), hydrogen bond interaction and hydrophobic interactions were considered to investigate the therapeutic effect of SID 242078875.

QM-polarized ligand docking
The lowest-energy docked complex was further subjected to Quantum Polarized Ligand Docking (QPLD). The QPLD protocol was employed in docking calculations rather than usual fixed charges assigned by the OPLS-AA force field. It improves the partial charges on the ligand atoms by replacing them with charges derived from quantum mechanical calculations. In this hypothesis, the protein is considered as MM region and ligand as QM region (Selvaraj, Singh, & Singh, 2014). When there are covalent connections between the QM and MM regions, it uses frozen localized molecular orbitals along the covalent bonds to construct an interface between the two regions that improves the binding pose and accuracy (Singh & Muthusamy, 2013). The atomic charge was calculated from the electrostatic potential energy surface of the ligand, which is generated from a single-point calculation using density functional theory (DFT) employing 6-31G*/LACVP* basis set, B3LYP density functional and "Ultrafine" SCF accuracy level (iacc = 1, iacscf = 2). (Cho, Guallar, Berne, & Friesner, 2005). Glide then re-docks each ligand with updated charges and returns the most energetically favourable pose. The QPLD protocol utilizes Glide (Glide, version 6.6, Schrödinger, LLC, New York, NY, 2015), Jaguar (Jaguar, version 8.7, Schrödinger, LLC, New York, NY, 2015) and QSite (QSite, version 6.6, Schrödinger, LLC, New York, NY, 2015). Jaguar is a high-performance ab initio quantum chemical program for both gas and solution phase simulations and specializes in fast electronic structure predictions for molecular systems. It performs DFT calculations and computes molecular orbitals, electron density and electrostatic potential for the molecules (Bochevarov et al., 2013). QSite is a QM/MM program that applies QM to the reactive centre of a protein active site and MM to the rest of the system. Its accuracy allows detailed understanding of reactions involving proteins, making it a powerful tool for lead optimization .

E-pharmacophore generation
The pharmacophore overlay of SID 242078875, cocrystallized "experimental" pose, and most active compounds for each target was generated from the protein-ligand complex with Phase (Phase, version 4.2, Schrödinger, LLC, New York, NY, 2015). Structure-based pharmacophore or energy-optimized pharmacophores (E-pharmacophore) approach is based on mapping of the energetic terms from the Glide XP score function onto atom centres. Structure-based pharmacophore combines the speed of pharmacophore screening with the energetic binding terms from Glide XP (Loving, Salam, & Sherman, 2009;Salam, Nuti, & Sherman, 2009).

Prime MM-GBSA free energy calculations
Prime/MM-GBSA approach (Prime, version 3.9, Schrödinger, LLC, New York, NY, 2015) was used to calculate the binding free energy. The docked poses obtained from QPLD were minimized using the local optimization feature in Prime. OPLS-AA 2005 force field was used to calculate the protein-ligand complex energies in GB/SA continuum solvent model (Du et al., 2011;Lyne, Lamb, & Saeh, 2006). Prime uses a surface generalized Born model employing a Gaussian surface instead of a van der Waals surface for better representation of the solvent-accessible surface area (SASA). The following equation was used to calculate the binding free energy where E complex , E protein and E ligand are the minimized energies of protein-inhibitor complex, protein and inhibitor, respectively.
where G SA (complex) , G SA (protein) and G SA (ligand) are the surface area energies for protein-inhibitor complex, protein and inhibitor, respectively.

QM/MM calculations
The QPLD complexes were subjected to QM/MM calculations with QSite (QSite, version 6.6, Schrödinger, LLC, New York, NY, 2015) employing hybrid DFT with B3LYP/6-31G++** basis set (Becke, 1993;Binkley, Pople, & Hehre, 1980;Lee, Yang, & Parr, 1988;Pietro et al., 1982). The use of B3LYP hybrid functional is most popular in the calculation of properties and reactions of organic molecules (McCarren, 2009;Yanai, Tew, & Handy, 2004). It provides good results with more accuracy with respect to geometrical parameters and energy calculation (Hopmann & Himo, 2010). 6-31G++** basis set gives additional polarization functions and flexibility to electronic density. The diffuse functions in the basis set increases the spatial extent of electronic density (Jensen, 2006). The molecular electrostatic potential map, frontier molecular orbitals (i.e. Highest Occupied Molecular Orbitals and Lowest Unoccupied Molecular Orbitals) and HOMO-LUMO energy gap (HLG) were calculated by QM/MM calculations to understand the electronic molecular features of the SID 242078875.

MD simulation
MD simulation was performed to ensure the stability of SID 242078875 in the binding cavity of the screened target proteins. MD simulation was carried out using GROMACS 4.6.3 package with the GROMOS96 (53a6) force field. The GROMOS96 (53a6) was used here to prepare the protein topology (Oostenbrink, Soares, Van der Vegt, & Van Gunsteren, 2005;Oostenbrink, Villa, Mark, & Van Gunsteren, 2004). The SID 242078875 topology was obtained from PRODRG server (Schüttelkopf & Van Aalten, 2004). The cubic unit cell was defined and solvated with explicit water by adding simple point charge. Further, each system was neutralized by adding either Na + or Cl − counter ions. The whole system was subjected to energy minimization with the help of 50,000 minimization steps and a tolerance of 1000 kJ/mol/nm. Electrostatic and van der Waals (vdW) interactions were applied based on the particle mesh Ewald method with a cut-off distance of 1.0 nm for short-range neighbour list and 1.0 nm for Coulomb cutoff and vdW interactions (Kawata & Nagashima, 2001). The LINCS (A linear constraint solver for molecular simulations) and SETTLE (An analytical version of the SHAKE and RATTLE algorithm for rigid water models) algorithms were used to constrain the bond angles and geometry of the water molecules (Hess, Bekker, Berendsen, & Fraaije, 1997;Miyamoto & Kollman, 1992). V-rescale coupling was used to regulate the temperature (weak coupling method). The Parrinello-Rahman coupling was employed to maintain a constant temperature of 300 K and a constant semi-isotropic pressure of 1 bar with a coupling time of 2 fs and the coordinates were saved (Martoňák, Laio, & Parrinello, 2003). The system was equilibrated with NVT and NPT simulations. The dynamic behaviour of protein-ligand complex was observed in an equilibrated system for 50 ns MD run with a time step of 2 fs.

Results and discussion
Screening of diabetes targets for SID 242078875 In this study, potential disease targets for SID 242078875 were predicted using PharmMapper server. PharmMapper automatically finds the best mapping poses of SID 242078875 against all the pharmacophore models in PharmTargetDB through a "reverse" pharmacophore mapping approach. It listed out 897 best mapping poses that belong to 319 human disease targets. Of the 319 human targets retrieved from PharmMapper, 5 targets such as dipeptidyl peptidase-IV (DPP-IV), protein-tyrosine phosphatase 1B (PTP1B), phosphoenolpyruvate carboxykinase (PEPCK), glycogen synthase kinase-3β (GSK-3B) and glucokinase (GK) were recognized as potential diabetes targets while performing textbased manual search of 319 targets in DAPD (Figure 2). Modulating these protein functions balance the blood glucose level and thus help in diabetes treatment ( Figure 3, see supplementary data). Pathways associated with the screened targets are depicted in Table 1. Further, pharmacophore features for SID 242078875 were generated with Phase module (Phase version 4.2, Schrödinger, LLC, New York, NY, 2015) and the alignment between the molecule (SID 242078875) and receptor-based pharmacophore models of potential targets are shown in Figure 4. The receptor-based pharmacophore feature of complex crystal structures is shown in Table 2. The alignment (Figure 4(b)-(f)) shows the selected pharmacophore features for the interaction of the drug molecule with its respective potential target.

Molecular docking analysis
X-ray crystal structures of DPP-IV (PDB id: 4PNZ), PTP1B (PDB id: 4I8N), PEPCK (PDB id: 1M51), GSK-3B (PDB id: 4AFJ) and GK (PDB id: 3FR0) were obtained from Protein Data Bank and prepared using protein preparation wizard. The active sites were determined to all the five targets by generating the grid with reference to its co-crystal ligands that has been reported as a potent inhibitor/activator to the respective target. Extra precision glide docking was performed to predict the binding pose of SID 242078875 with screened targets. The predicted binding pose was obtained through molecular docking. Glide score, glide energy, hydrogen bond and hydrophobic interactions were considered for evaluating the binding affinities and the molecular basis of the interaction with SID 242078875 (Table 3). The docking scores ranged from −14.88 to −10.337 kcal/mol and Glide energies ranged from −53.857 to −42.572 kcal/mol, revealing better binding affinities for SID 242078875 with the screened targets. The root mean square deviation (RMSD) between predicted binding pose of SID 242078875 and the crystallographic coordinates of DPP-IV, PTP1B, PEPCK, GSK-3B and GK was found to be .37, .79, .47, .72 and .57 Å, respectively. An RMSD of 2 Å is considered as the cut-off for correct docking. Herein, the observed RMSD values suggest the reliability of the Glide XP docking mode in reproducing the experimentally observed binding mode (Tripathi & Singh, 2014). In addition, hydrogen bond interaction with the active residues supports potent antagonist activity of SID 242078875 with DPP-IV, PTP1B, PEPCK, and GSK-3B; agonist activity for GK ( Figure S1).

QM/MM docking analysis
Accurate binding pose of SID 242078875 with the screened targets were predicted using QPLD and the obtained glide score, glide energy, hydrogen bond and hydrophobic interactions were tabulated (Table 4). The electronic charges play a key role in the protein-ligand interaction. XP docking obtains the charges from OPLS MM force field, while QPLD uses mixed QM/MM methods to compute the ligand charge distribution that increase the accuracy of docking. The fixed charges of ligand obtained from the OPLS force field was replaced by QM force field in the protein atmosphere. Hence, polarization effects in the quantum region significantly contribute to accurate interactions of protein-ligand complexes (Cho et al., 2005). In this study, variation in the glide score, hydrogen bond number and distances were observed between the XP docking and QPLD (Tables 3 and 4). Protein-ligand complexes obtained by XP-docking and QPLD were superimposed to check the binding mode of the SID 242078875 in the binding cavity of screened targets. Superimposed complexes clearly show that SID 242078875 is fit into the binding cavity through the interaction(s) with active site residues ( Figure 5). Moreover, correlation between Glide XP and QPLD scores for isolated compound yielded a statistically significant correlation coefficient of .702. Glide score of SID 242078875-DPP-IV and SID 242078875-GK are found to be comparatively less in the QPLD; however, these complexes showed more hydrogen bond interactions than the XP docking complexes ( Figure S2). Further, proteinligand interaction analysis provides more insights into the mechanism of SID 242078875 to determine its therapeutic effect.

Molecular recognition analysis
Molecular recognition plays a key role in promoting fundamental biomolecular events such as drug-protein, enzyme-substrate and drug-nucleic acid interactions. The detailed understanding of drug interactions with their target protein/nucleic acid may provide a conceptual framework for determining the desired potency and therapeutic role of the drug (Mohan, Gibbs, Cummings, Jaeger, & DesJarlais, 2005). The therapeutic effect of drug molecule depends on the ability to interact with a particular binding site on target protein to exert a desired biologic effect. Ligands that share favourable interactions will exert the similar therapeutic effect on the target protein (Kortagere & Ekins, 2010). Therefore, the therapeutic role of novel compounds can be determined by analysing the favourable interaction at the binding site of the target protein. The common residues that form hydrogen bond interaction in the protein-ligand complexes of XP docking and QPLD were noted for molecular recognition analysis.
It is reported that PEPCK-binding pocket consists of Phe517, Phe525 and Phe530 that form the walls of binding pocket (Dunten et al., 2002). The residues Phe530 and Asn533 are found to have a role in the catalytic mechanism of PEPCK (Carlson & Holyoak, 2009). The reported potent inhibitors are found to be placed between the Phe517 and Phe530 residues of the enzyme (Table S3). The SID 242078875-PEPCK complex showed interactions with Ala287, Asn292, Trp527, Pro528, Phe530 and Asn533 (Figure 5(c)). Herein, the interaction with Phe530 and Asn533 theoretically supports the inhibitory activity of SID 242078875 on PEPCK.
The small molecule GK activators bind to the allosteric site and stabilize the high-glucose affinity confirmations and increase the affinity of GK for glucose (Ratcliffe, 2011). This region contains hydrophobic amino acids (Val62, Ile159, Val452 and Val255) and polar amino acid residue (Arg63), which forms the hydrophobic pocket for GK activators (Kumari & Li, 2008). All the reported potent GK agonists showed interaction with Arg63 of GK (Table S5). Similarly, the SID 242078875-GK complex has also showed interaction with Arg63 ( Figure 5(e)) and thus, theoretically supports glucokinase as the agonist activity of SID 242078875.
In addition to interaction analysis, co-crystal ligands were re-docked with the respective target protein to check the docking score and interactions. Re-docking analysis shows that SID 242078875 and co-crystal ligands shares most similar interactions (Tables S1-S5). Docking scores of SID 242078875 and co-crystal ligands were compared and represented as a bar graph ( Figure S3). As shown in Figure S3, SID 242078875 showed better score compared to co-crystal ligands of DPP-IV inhibitors ( Figure S3(a)), PTP1B inhibitors ( Figure S3(b)), PEPCK inhibitors ( Figure S3(c)), GSK-3B inhibitors ( Figure S3(d)) and GK agonists ( Figure S3(e)). However, a few of the PTP1B inhibitors and GK agonists showed better score than the SID 242078875.
Pharmacophore overlay of SID 242078875, cocrystallized "experimental" pose, and most active compounds was generated for DPP-IV, PTP1B, PEPCK, GSK-3B and GK (Figure 6(a)-(c)). This pharmacophore overlay clearly represents pharmacophore features such as hydrogen bond donors and acceptors largely contribute in the binding of SID 242078875 and most active compounds in the active site region of its respective target.

Binding energy calculations
The obtained QPLD posture was minimized through local optimization feature in Prime, and the complex energies are calculated in OPLS-AA force field and GBSA continuum solvent medium. MM/PBSA combines an explicit molecular mechanical model with a continuum method for the solvation free energy. This method is more efficient than the traditional free energy method (Cozzini et al., 2004). The molecular interaction mechanism of protein and ligand is based on the theory of second law of thermodynamics. Thus, it is desirable to perform lead optimization on the basis of binding free energy, which in turn provides information regarding thermodynamic behaviour of the ligand necessary for drug design (Kitamura et al., 2014). The energy values Figure 6. Pharmacophore overlay of SID 242078875 (a), co-crystallized "experimental" pose (b), and most active compounds (c) with DPP-IV, PTP1B, PEPCK, GSK-3B, and GK.  , and PEPCK (e). The map was calculated using density functional theorem at B3LYP/6-31G++**. The rainbow colours in the map denote the electrostatic potential energy around the molecules. Electrostatic potential of the SID 242078875 is increases in the order of Red < Yellow < Green < Cyan < Blue < Purple. The map represents that electrophilic and nucleophilic attacks favours the hydrogen bonding interactions. The red region denotes the preferred site for electrophilic attack indications and purple/blue represents the preferred site for nucleophilic attack. obtained through QPLD are represented in Table 5. Calculated energy component shows that Coulomb energy (ΔG Coulomb ) and covalent energy (ΔG covalent ) are the most favourable contributors to ligand binding, whereas van der Waals energy (ΔG vdW ), lipophilic energy (ΔG lipo) and hydrogen bonding energy (ΔG Hbond ) had also shown effective contribution to ligand binding but comparatively less than the Coulomb energy and covalent energy. The ΔG bind values ranged from −75.87 to −42.86 kcal/mol and show better binding affinities for the SID 242078875 in the active site of screened targets.

QM/MM calculations
The molecular electrostatic potential, HOMO, LUMO, HOMO-LUMO energy gap (HLG) and QM/MM energy were calculated to study the stereoelectronic complementarities of SID 242078875 and the most active compounds by B3LYP/6-31G++** basic set using jaguar module. The QM/MM energy showed that the ligand is effective in discerning correct binding poses, which is calculated as the Coulomb-van der Waals force of the complex and estimated from the electrostatic potential energy of the ligand. The value of the electrostatic potential was mapped onto an electron density isosurface corresponding to the overall molecular size of SID 242078875. The predicted electron density map shows the electron-rich and -poor regions that present the value of the electrostatic potential in addition to structure. Generally, a negative potential surface is a delineated region in the ligand for electrophilic attack, while a positive surface area defines nucleophilic attack. The electrostatic potential of SID 242078875 at a particular surface area was represented with rainbow colours to designate the electrostatic potential value (Figure 7). Herein, red represents a large negative potential value, while purple represents a large positive potential value and orange, yellow and green represent intermediate values of the electrostatic potential. The electrostatic potential map of GET 49 with DPP-IV (Figure 7(a)), PTP1B (Figure 7(b)), PEPCK (Figure 7(e)), GSK-3B (Figure 7(d)) and GK (Figure 7(c)) clearly shows the electron donor and acceptor regions of SID 242078875. The hydrogen bond interactions were formed between the SID 242078875 and targets due to electrophilic and nucleophilic attacks.
The molecular orbital energies at ground state (HOMO) and first excited state (LUMO) were calculated to characterize the reactive sites and substituent influence on the electronic structure of the compounds. The HOMO value of the SID 242078875 represents the ability to donate the electron and the LUMO value represents the ability to accept the electron (Figure 8). The energy difference of HOMO-LUMO was calculated as the band gap energy. The calculated values were compared with the most active compounds. The HOMO-LUMO energy gap measures the electron conductivity. Interestingly, the HOMO-LUMO energy gap of SID 242078875 was found to be lower than the most active compounds (Table 6). Decrease in the energy gap corresponds to the stability and possibility of a compound to form the interaction between the ligand and receptor (Reddy & Singh, 2015).

MD simulation
The flexibility and binding affinity of SID 242078875 with receptors (DPP-IV, PTP1B, PEPCK, GSK-3B and GK) for a period of 50 ns specific time period were studied through MD simulation. The total energy, potential energy, pressure and temperature ( Figures S4-S7) of the equilibrated system have been illustrated in the supplementary contents. The observed trajectories show that the MD simulation system was stable during the whole simulation period. The average values of total energy, potential energy, pressure and temperature were depicted in Table 7. The deviation in the system values was represented as standard deviation that shows the system was stable during the whole simulation period. The total energy and potential energy varies among the complex because of the variation in size and residues. The RMSD, root mean square fluctuations (RMSF), SASA, radius of gyration (Rg) and number of hydrogen bonds (H bond) were calculated to evaluate the stability and dynamic behaviour of protein-ligand complex.
The backbone RMSD of five docked complexes with respect to the initial conformation was calculated as a function of time. The conformational stability of the protein-ligand complex during the simulation period was assessed by the calculated backbone RMSD (Figure 9(a)). The SID 242078875 has been bound with five different receptors that differ in size and secondary structure, hence the initial run to obtain the equilibration phase shows variation for each complex and later it maintains the stable protein-ligand complex in equilibrated condition. The RMSD plot of five complexes show that the trajectories of SID 242078875-DPP-IV are more stable from 20 to 50 ns with slight decrease in RMSD during 40-45 ns; SID 242078875-PTP1B was stable from 32 to 50 ns; SID 242078875-PEPCK was   stable from 5 to 50 ns. SID 242078875-GSK-3B was stable from 23 to 50 ns; the SID 242078875-GK was stable during 19 to 25 ns and it has increased without much deviation, further it maintained the stability from 45 ns. The average RMSD of all the five complexes are shown in Figure 10, which explains the RMSD mean variation in the entire period of MD simulation and the last 10 ns (40-50 ns) of MD simulation. The fluctuation in the RMSD was represented as the standard deviation (SD). The overall SD of all complexes show the deviation in the initial run, while RMSD mean variation during the last 10 ns (40-50 ns) is much less than the overall RMSD mean variation. This confirmed the ligand stability in the last 10 ns, and it indicates the stability of the SID 242078875 with the target proteins (Figure 9(b)).
The flexibility of each residue in protein-ligand complex was evaluated by RMSF. The RMSF of hydrogen bond interacting residues of DPP-IV, PTP1B, PEPCK, GSK-3B and GK were observed to understand the mobility of residues during the whole simulation period. In SID 242078875-DPP-IV, the residues Glu205, Glu206 and Tyr666 were found to be the most important for DPP-IV inhibitory activity. RMSF of Glu205, Glu206 and Tyr666 are .17, .18 and .13 nm, respectively. Nearby residues in the binding cavity also showed very less fluctuation around~1.5 nm (Figure 11(a)). The RMSF of residues Arg24, Asp48, Ser216, Gly220 and Arg221 in SID 242078875-PTP1B were observed at .65, .32, .11, .1 and .12 nm, respectively; it is found that RMSF for Arg24 and Asp48 are comparatively increased, because these residues lie next to the loop region (Arg24) and in the middle of the loop region (Asp48) (Figure 11(b)). In SID 242078875-PEPCK, the RMSF values for Ala287, Asn292, Trp527, Pro528, Phe530 and Asn533 were found to be . 25, .16, .18, .19, .45 and .21 nm, respectively. All the interacting residues showed less fluctuation except for the loop residue Phe530 (Figure 11(c)). The SID 242078875-GSK-3B showed very less RMSF for both interacting residues Val135 (.14 nm) and Asp200 (.11 nm) (Figure 11(d)). In SID 242078875-GK, the interacting residues Tyr61, Arg63, Asp158 and Lys459 exhibit .19, .21, .24 and .3 nm, respectively. A slight increase in the RMSF is due to its position in the loop region (Arg63, Asp158) and terminal region (Lys459) (Figure 11(e)). Overall, RMSF of all five complexes were showed deviation only in the loop and terminal regions, which excludes the active site residues. However, very few residues were observed in the loop region, which has strong hydrogen bond with the SID 242078875. The RMSF of active site residues was found to be less and it confirms the protein-ligand stability in active sites of all screened targets; however, less fluctuation in the active site residues depict the atomic fluctuations of SID 242078875 and activeness inside the binding site pocket.
Total SASA of the five complexes were calculated to evaluate the overall change in the shape of receptors during the simulation (Figure 12). Further, average total SASA was calculated to estimate the SASA during simulation period and calculated values were expressed as  a mean ± standard deviation. The standard deviation represents the variation in the total SASA. The average total SASA for SID 242078875-DPP-IV, SID 242078875-PTP1B, SID 242078875-PEPCK, SID 242078875-GSK-3B, and SID 242078875-GK complexes were 344. 04 ± 4.55, 170.17 ± 3.29, 295.96 ± 3.89, 194.03 ± 3.56 and 231.67 ± 5.3. Herein, all the five receptors belong to various families and differ in sequence length and size of the drug-binding cavity. Thus, the average total SASA values vary according to their protein size. However, all the complexes have shown very less deviation during MD simulation, which indicate that the ligand was bound tightly in the binding pocket and did not induce any significant change in the total SASA upon binding of SID 242078875 to the target protein.
The compactness of protein-ligand complex was measured by radius of gyration (Rg) of protein and was plotted against simulation time (Figure 13), which provides the structural changes in the protein-ligand complexes during simulation time. The Rg values of SID 242078875-DPP-IV, SID 242078875-PTP1B, SID    242078875-PEPCK, SID 242078875-GSK-3B and SID 242078875-GK showed fluctuation between 2.6 and 2.76 nm with mean of 2.69 ± .0, 1.9-1.99 with mean of 1.96 ± .01, 2.43-2.53 with mean of 2.5 ± .01, 2.11-2.2 with mean of 2.15 ± .01 and 2.23-2.45 with the mean of 2.32 ± .04, respectively. Overall, the Rg values of SID 242078875-DPP-IV and SID 242078875-GK showed slightly higher fluctuations with standard deviation of .03 and .04 nm, respectively, it maintained the compactness after 45 ns, while the other three complexes maintain stability from the initial run of the MD simulation. The number of hydrogen bonds between receptor and ligand was calculated during simulation and plotted against time ( Figure 14); it perceives that the SID 242078875 has high potential towards the acceptor-donor relations for strong interaction between protein and ligand. These acceptor-donor relation results with hydrogen bond interactions throughout the 50 ns of timescale confirmed the presence of SID 242078875 in the binding cavity of DPP-IV, PTP1B, PEPCK, GSK-3B and GK. Overall, the RMSD, RMSFs, SASA, radius of gyration and number of hydrogen bonds confirm the stability of SID 242078875 in the binding cavity of DPP-4, PTP1B, PEPCK, GSK-3B and GK.

Conclusion
Target identification is one of the main steps in determining the therapeutic role of a new molecule. In this study, we have used reverse pharmacophore mapping and databases search approaches to identify the protein targets for SID 242078875. The reverse pharmacophore screening retrieved a sum of 897 pharmacophore poses of 325 human protein targets for the query SID 242078875. Subsequent text-based search with DAPD identified five diabetes targets such as dipeptidyl peptidase-4, protein-tyrosine phosphatase 1B, phosphoenolpyruvate carboxykinase, glycogen synthase kinase-3β and glucokinase. Further, it was examined with molecular docking, molecular recognition analysis, binding free energy calculation, QM/MM calculations and MD. Overall, the series of computational tools explore that SID 242078875 has the potential to modulate the function of DPP-IV, PTP1B, PEPCK, GSK-3B and GK for balancing the blood glucose level. Thus, the SID 242078875 may be tested as a potential lead molecule for the treatment of diabetes.

Supplemental material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2015. 1124289.