Molecular insights of benzodipyrazole as CDK2 inhibitors: combined molecular docking, molecular dynamics, and 3D QSAR studies.

Abstract Benzodipyrazoles have been previously evaluated for their in vitro CDK2 inhibitory activity. In the current investigation, we identified a six-feature common pharmacophore model (AADDRR.33) which is predicted to be responsible for CDK2 inhibition. An efficient 3D QSAR (r2 = 0.98 and q2 = 0.82) model was also constructed by employing PLS regression analysis. From the molecular docking studies, we examined the binding patterns of compound 7aa with the target protein and also calculated the binding energy using MM-GBSA calculations. Three hydrogen bonds with Lys 33, Glu 81, and Leu 83 are conserved even after 1000 ps run in a molecular dynamics simulation. We identified the slight displacement in bond lengths and the conformational changes occurred during the dynamics. The results also elucidated the protein residue–ligand interaction fractions which clearly explained the involvement of non-H-bond interactions.


Introduction
Cyclin-dependent kinases (CDKs), of family serine/threonine kinases, regulate the complex processes of the cell division cycle (1,2). Specific CDK-cyclin complexes are responsible for various events known to take place during cell cycle in a sequential and orderly fashion. CDK2 is required to complete G1 and to trigger the S-phase, and CDK4 is required to integrate extracellular signals and directs the cell cycle (3,4). Further explicating, CDKs are inactive as monomers and activation requires binding to the corresponding regulatory proteins and phosphorylation by CDK-activating kinases (CAK) on a specific threonine residue (5). A great deal of effort is now revolved around the CDK2 regulation due to its physiological role in cell division and gene transcription, which rationally precedes cancer development and has benefits as an attractive target for discovery of new antitumor agents (6)(7)(8).
From the recent discoveries, benzodipyrazole moieties are affirmed as a potent CDK2 inhibitors based on the preevaluated in vitro analysis (9). Aiding the conventional drug discovery, we computationally simulated and profoundly evaluated these compounds for their structural conformations with relevance to the CDK protein by assessing the pharmacophore model and QSAR studies. Molecular docking and dynamics simulations helped in understanding the binding mechanism of a CDK2 inhibitor, which are also supported by MM-GBSA calculations and analysis of several trajectories with respective to protein ligand interactions.

Data set and ligand optimization
Precise data set (Table 1) of 60 compounds with benzodipyrazole scaffold was primitively selected (9). Ligands from the data set were structurally produced, using the Chemsketcher tool of Schrodinger suite (Schrödinger, New York, NY). These were later refined, initially by optimizing geometrically and energy minimization of the structures 2D to 3D was performed by using the Ligprep tool of Schrodinger suite. This includes generation of possible ionization states at target biological pH through EPIK program, and also conformers and tautomers by applying optimized potentials for liquid simulations (OPLS -2005) force field. Later these geometrically optimized structures were conditionally chosen to generate pharmacophore hypothesis and built the 3D QSAR model (10). Compound selection and experimental activity (IC 50 ) against CDK2 were all originated from the same laboratory. All the geometrically optimized and energy minimized ligands were arranged as a 3D library for further usage. Generation of common pharmacophore hypotheses By the allocation of activity threshold range, the whole data set was fractioned into active (pIC 50 47), inactive (pIC 50 56), and intermediate (pIC 50 : 6-7). These ligands were internally validated through PHASE (a fine-grained conformational sampling and a range of scoring technique tool) to identify a common pharmacophore hypothesis, accompanied by a set of aligned conformations with the relativity of bind affinity (11). A 3D pharmacophore model was developed using a set of pharmacophore features to generate sites for all the compounds, concurring with various chemical features that may make easy non-covalent binding among the ligand and its binding pocket. Distinctive six pharmacophore features are hydrogen bond acceptor (A), hydrogen bond donor (D), hydrophobic group (H), negatively ionizable (N), positively ionizable (P), and aromatic ring (R). A maximum of six and a minimum of five sites were selected in order to obtain an efficient pharmacophore model. Hypotheses were generated by a systematic variation of the number of sites (n sites ) and the number of matching active compounds (n act ). With n act ¼ n actÀtot . Initially (n actÀtot ) is the total number of active compounds in the training set, n sites (12). The scoring protocol provides a ranking of different hypotheses to choose the most appropriate for further investigation.

3D QSAR studies
For QSAR development, from the outcome of the scoring protocol, the best hypothesis was chosen (AADDRR.33). Pharmacophore-based QSAR models were generated for hypothesis using 48 training set ligands (80% of dataset were selected randomly) and a grid spacing of 1.0 Å . By implementing regression analysis corresponding with PLS factors, QSAR models containing one to nine PLS factors were generated. These models were validated by predicting the activity of test set ligands. The accuracy of the models increases with increasing number of PLS factors until over-fitting starts to occur (13). The predictive value of the models was evaluated by leave one-out (LOO) and leave-half-out (LHO) cross-validation. The cross-validated coefficient, R 2 cv , was calculated using the following equation: where Y predicted , Y observed , and Y mean are the predicted, observed, and the mean values of the target property (pIC 50 ), respectively. (Y observed ÀY mean ) 2 is the predictive residual sum of squares (PRESS). The predictive correlation coefficient (r 2 pred ), based on the molecules of the test set, is defined as where SD is the sum of the squared deviation between the biological activities of the test set and mean activities of the training set molecules, PRESS is the sum of squared deviation between predicted and actual activity values for every molecule in the test set. According to the literature (14), 3D-QSAR models were accepted if

Molecular docking and binding energy calculations
In silico binding interactions of CDK2 and benzodipyrazole derivatives were performed by using the Glide XP algorithm of Schrodinger Suite. Prior to the docking studies, CDK2 protein was retrieved from the PDB website (PDB ID: 2BKZ) and was processed by removing non-protein components and adding missing amino acids and hydrogens to satisfy the valence and be optimized by using the OPLS-2005 force field. A grid was generated by the glide module at the autophosphorylation site of kinase domain of CDK2 and compound 7aa was docked into the prepared grid by employing flexible protein and ligand parameters (15). MM/GBSA calculations were performed for docked protein ligand complex to determine the binding energy (16). The total free energy of binding is then expressed in the form mentioned below in the following equation: where DG bind is ligand binding energy In other terms where gas phase molecular mechanical energy is DE gas (i.e. Contributions from the van der Waals energy, electrostatic energy, and internal energy), the solvation free energy are DG solvation (polar and non-polar contributions) and TDS is the entropy (17).

Molecular dynamics
Docking complex of the most active compound of the data set molecules (compound 7aa) was selected for molecular dynamic studies using Desmond by D.E. Shaw research (18). The docked complex is also used for trajectory analysis of 1000 ps (equals to 1 ns) molecular dynamics simulations.
A solvent system was built around the docked complex with the TIP4PEW water solvent model and neutralizing Sodium ions were added. The system was maintained at 0.15 M salt concentration (Na + Cl À ) which is similar to the physiological conditions. Later the soaked simulated system is introduced into the molecular dynamics workspace. Dynamics simulations were performed at a constant number of atoms, at constant pressure (1.01352 bar) and constant temperature (300 K), i.e., NPT ensemble for 1000 ps. For ensembling, we used the Nose-Hoover chain thermostat with a relaxation time of 1 ps and Martyna-Tobias-Klein barostat with 2 ps relaxation time and isocratic cooling style and Coulombic interactions were cutoff at 9 Å radius (19).

Results and discussion
Pharmacophore model development hypotheses: AADDRR, AAADDR, and ADDRRR. Survival scores of best selected common pharmacophore hypotheses are tabulated in Table 2. Survival scores and activities of AADDRR.33 and AAADDR.46 are almost similar and it might be due to the presence of an H-bond acceptor and loss of an aromatic ring in AAADDR.46. It is very clear that the selected scaffold benzodipyrazole consists of three aromatic rings and it is the reason for selection of AADDRR.33 as best fit CPH and its 3D spatial arrangement with inter-pharmacophoric feature distances are shown in Figure 1. Overall multi-ligand alignment score for a given reference pharmacophore (AADDRR.33) is the average score from the best individual alignments. Alignment of 3D structures over the pharmacophore AADDRR.33 is shown in Figure 2.

QSAR statistics and model validation
We built a pharmacophore-based QSAR model by segregating dataset into different training sets and test sets, with varying the grid space, and visualizing the model results. Contour mapping of the reference ligand, interpreted as the dipyrazole units attached to the benzene ring, the carbonyl linkage and alkyl groups attached to the pyrazole nucleus, and the phenyl moiety attached to the nitrogen of pyrazole, are all shown to be favorable conditions for the increase in activity. From regression analysis of the dataset ligands over AADDRR.33 hypothesis (Table 3) of the total nine factors, PLS-7 was considered to be the best as the regression coefficients r 2 was 0.98 (for training set) and q 2 was 0.82 (for test set). PLS 8 and 9 have shown to be better regression coefficients but they are nearer to 1 and this condition is called over fitting which should not be considered. Predicted activity of all the dataset ligands obtained from QSAR studies considering PLS 7 is listed in Table 1. Data were plotted by Figure 1. 3D spatial arrangement of AADDRR.33 common pharmacophore and inter site distances of hypothesis. extrapolating and correlating results with the actual (experimental) and predicted activities obtained from the QSAR based on the PLS regression analysis. The efficiency of the system was based on the slope of the line and the alignment of the ligands around the line (the slope of the line passes nearer to the origin of the plot, if the regression coefficient is nearer to one). The plots of actual activity versus activity predicted using PHASE of all dataset ligands and training set ligands are shown in Figure 3.

Molecular docking and binding energy calculations
Least energy conformation state of compound 7aa was considered for elucidation of binding phenomenon with the kinase domain of the CDK2 receptor (PDB ID: 2BKZ). Molecular docking studies confirm that the compound 7aa form three hydrogen bonds each with Lys 33 (2.06 Å ), Glu 81 (1.86 Å ), and Leu 83 (2.03 Å ) which is shown in Figure 4, and compound 7aa has a dock score of À9.988 (XP Gscore). One of the two pyrazole rings is contributing with two H-bonds with Glu 81 (hydrogen attached to nitrogen is acting as an H-bond donor) and Leu 83 (nitrogen is acting as an H-bond acceptor). The third H-bond is present between ester C ¼ O and Lys 33 where oxygen of ester is acting as an H-bond acceptor. Dock scores of all benzodipyrazole derivatives were tabulated in Supplementary Table 1.
To elicit the energy system of docked complex of 2BKZ and compound 7aa, we calculated its binding energy (DG bind ) using the MM-GBSA method which is simply a molecular mechanics which is based on the Generalized Born model that is improved with the hydrophobic solvent accessible surface area. GB/SA is one of the most commonly used implicit solvent models. From the calculation, we found that compound 7aa is having DG bind of À67.09 kcal/Mol. A detailed analysis shows that there is a remarkable contribution of columbic (À19.12 kcal/Mol) and van der Waal's (À37.66 kcal/Mol) terms, whereas the contribution of covalent term is negligible (À0.25 kcal/Mol).

Molecular dynamics
Molecular dynamics simulations were performed to identify the conformational changes of ligand and receptor pocket when exposed to a solvent model (water) and at a constant temperature of 300 K (normal body temperature). We introduced previously docked complex of 2BKZ and compound 7aa into a solvent model and the energy of the whole system was minimized in order to attain stable conformation in insolvent environment. With the energy minimization, compound 7aa changed its conformation and rearranged its H-bonds with receptor pocket and surrounding solvent system ( Figure 5). Compound 7aa formed a new H-bond with one of the surrounding water molecules and also the remaining three previously formed H-bonds were rearranged, such as bond distance between ligand and Lys 33 was reduced to 1.544 Å , bond distance between ligand and Glu 81 was reduced to 1.628 Å , and dramatically a slight increase was observed in the bond distance with Leu 83 (2.237 Å ). The water molecule that bound to ligand also formed an H-bond with Asp 86 amino acid with larger bond length. From the 2D illustrations of the protein ligand complexes (Figure 5), it is clear that there is a conformational change during energy minimization with solvent model which is absent during docking studies.
There is a shift of amino acids or compound structure towards amino acids like Ile 10, Val 18, and Phe 80 after energy minimization.
After 1000 ps run, H-bonds with previous three amino acids remain intact with slight difference in their bond distance, i.e., Lys 33 (1.794 Å ), Glu 81 (1.477 Å ), and Leu 83 (2.047 Å ) which is shown in Figure 6. There was a complete displacement of H-bond with water molecules. Before molecular dynamics Asp 86 was bound to water, the later Asp 86 was replaced with Ile 10. We observed that the total flip of the conformation of ligand after 1000 ps run Ile 10 amino acid following solvation energy minimization was near to the terminal ethoxy group, whereas after molecular dynamics, Ile 10 reached nearer to the ester group and formed an H-bond with water molecules.
During in depth analysis of the bonding pattern between residues and ligand, we observed a significant role of hydrophobic interactions, and water bridges along with Hbonds. Gila 81 is the only residue showing 100% H-bond interaction fraction with ligand, whereas Leu 83 with 98% while the remaining 2% was contributed by water bridge. Dramatically, during the molecular dynamics simulation, Hbond interaction fraction of Lys 33 was reduced to 36% and also an ionic bond was formed by contributing 55.5% of the total interaction fraction. Leu 134, Tyr 15, Ile 10, Ala 31, Val 18, and Phe 80 formed hydrophobic interactions with ligand in which Leu 143 had the highest hydrophobic interaction fraction (79.5%). An interaction feature of various residues involved in binding is shown in Figure 7. Formation of water bridges is significant during 450-500 ps and 800-950 ps time intervals. Binding pattern of protein residues over a time period of 1000 ps was illustrated in the protein-ligand interaction diagram in Figure 8.
Root mean square fluctuation (RMSF) plots are useful for understanding the local changes in the protein chain during  the molecular dynamic simulations. RMSF of protein side chains and alpha carbons are depicted in Figure 9. With respect to the secondary structure of the protein, the RMSF plot was categorized into three regions as alpha helical (pink), beta-strand (blue), loops in white, and green lines, representing that the amino acids interacted with ligand. A major portion of the interactions was confined between 85 and 150 residues and also all the interactions were formed from the amino acids present in the beta-strands. Very few residues near 50 and 240 in protein backbone Ca exceeded 1.5 Å and most of the remaining amino acids lied below 1 Å . Fluctuations are relatively higher in the case of the side chain structure of protein. Higher fluctuations were observed in loop regions. Monitoring the RMSD of the simulated protein can give insights into its structural conformation throughout the simulation. Small protein changes of the order of 1-3 Å are perfectly acceptable and changes much larger than that indicate that the protein is undergoing a large conformational change during the simulation. Protein RMSD data of C-alphas and backbone structures in Figure 10 revealed that the conformational changes of protein over 1000 ps simulation were acceptable and also most of the protein deviations were confined between 0.9 Å and 1.20 Å . The ligand RMSD over right Y-axis indicates the ligand stability with respect to the protein and its binding pocket. The RMSD plot confirmed that the ligand remained intact  with the binding site of protein as at the end the plot of ligand was no larger than that of the protein (backbone and C-alphas). If the ligand RMSD were found to be significantly larger than that of the protein, ligand will be diffused from the initial binding site.
We also examined the molecular properties like ligand RMSD, molecular surface area (MolSA), solvent accessible surface area (SASA), and polar surface area (PSA) of the ligand during the course of molecular dynamics ( Figure 11). RMSD of ligand is almost consistent between 0.7 and 0.8 Å till 450 ps and later fluctuations in RMSD were significant. There is only 8 Å change in the final molecular surface of ligand at the end of 1000 ps. Similarly, initial and final polar surface areas also showed no significant change. However, a remarkable change was observed in the SASA during the simulation. Initial SASA of ligand 7aa was about 48 Å 2 and, after 1000 ps simulation, it increased to 76 Å 2 . It might be the reason for the formation of an H-bond with a water molecule (water bridge) during the last stage of the simulation. This large increase of the surface area might be the reason for the remarkable change in the conformation of the ligand.

Conclusion
In the current investigation, we have elucidated the common pharmacophore model (AADDRR.33) responsible for CDK2 inhibition from a series of Benzodipyrazoles. The identified pharmacophore model consists of two hydrogen bond acceptors, two hydrogen bond donors, and two aromatic rings. An efficient 3D QSAR model was established for the given set and we proposed that this model provides a possible structural basis for further drug investigations for CDK2 inhibition. Molecular docking studies revealed the binding possibilities of benzodipyrazoles in the kinase domain. Figure 10. Root mean square deviation of simulated structures at 1000 ps. RMSD evolution of a protein on left Y-axis and RMSD evolution of a ligand on right Y-axis. Figure 11. Fluctuations in the ligand properties during 1000 ps molecular dynamics study. RMSD, root mean square deviation of a ligand with respect to the reference conformation. MolSA, molecular surface area. SASA, surface area of a molecule accessible by a water molecule. PSA, polar surface area, i.e., solvent accessible surface area in a molecule contributed only by oxygen and nitrogen atoms.
Further molecular dynamic simulations clearly elucidated the conformational changes in the protein ligand complex during a certain period of time.