KAISO inhibition: an atomic insight

In today’s world, the pursuit of a novel anti-cancer agent remains top priority because of the fact that the global burden of this malady is continuously increasing. Our work is no different from others in searching for new therapeutic solutions. To achieve this, we are looking into Epigenetics, the phenomenon governed by hypermethylation and hypomethylation of tumor suppressor genes and oncogenes, respectively. Our target for this study is an important intermediary methyl-CpG binding protein named kaiso. In our study, we have used the X-ray crystallographic structure of Kaiso for virtual screening and molecular dynamics simulations to study the binding modes of possible inhibitors. The C2H2 domain comprising LYS539 was used for screening the inter bio screen Database having 48,531 natural compounds. Our approach of using computer-aided drug designing methods helped us to remove the execrable compounds and narrowed our focus on a selected few for molecular simulation studies. The top ranked compound (chem. ID 28127) exhibited the highest binding affinity and was also found to be stable throughout the 20 ns timeframe. This compound is therefore a good starting point for developing strong inhibitors.


Introduction
According to GLOBOCAN in 2008, there were 12.7 million cancer cases, 7.6 million cancer deaths, and 28.8 million people living with it, indicating that the global burden of this malady is continuously increasing and yet no practical and general effective drugs or methods of control are available (Galal et al., 2011). Thus, the pursuit of novel anti-cancer agents remains top priority (Eckhardt, 2002).
Kaiso, also known as ZBTB33 (zinc-binding protein) is one of the MBPs implicated in tumorogenesis (van Roy & McCrea, 2005); it not only recognizes the consecutive pair of methyl-CpG sequence (Prokhortchouk et al., 2001;Prokhortchouk & Hendrich, 2002), but also has the binding specificity toward non-methylated sequence CTGCNA (Daniel, Spring, Crawford, Reynolds, & Baig, 2002). Like other MBPs, Kaiso also shows DNA methylation-dependent transcription repressor activity (Daniel et al., 2002;Prokhortchouk et al., 2001) which largely depends on its association with histone deacetylase-containing corepressor complex (N-CoR) (Yoon, Chan, Reynolds, Qin, & Wong, 2003). Further, the interaction between Kaiso and p120 catenin (p120ctn), a member of the armadillo family that owns β-catenin, is known (Daniel & Reynolds, 1999); p120ctn and Kaiso not only modulates the target genes of β-catenin, but also has an effect on canonical Wnt pathways (Daniel et al., 2002). These modulations indicate the importance of Kaiso in the development of cancer. Further, matrilysin, c-myc, cyclin D1, S100A4, CDKN2A (p16/INK4A), MTA2 and Wnt11 are the genes transcriptionally regulated by Kaiso, and the role of these genes in tumorogenesis is well established (Kim et al., 2004;Spring et al., 2005;Vinken et al., 2006). The elevated expression of Kaiso in mouse intestinal tumors and series of human colorectal tumors indicate the role of Kaiso in the etiology of tumor in the colon (Prokhortchouk et al., 2006). It was observed by Prokhortchouk et al. (2006) that in Kaiso-deficient mice, there is detectably altered expression of the above-mentioned genes and Kaiso-deficient mice also show resistance to intestinal tumorogenesis when bred onto an Apc Min/+ genetic background, indicating a role of Kaiso in tumor development and a potential target for therapeutic intervention (Cofre, 2012;Prokhortchouk et al., 2006). Further, its implication in colon cancer (Figure 1) development by regulating CDKN2A (p16/INK4A) has been reported by Lopes et al. (2008); they also showed that inhibiting Kaiso by siRNA treatment affects the expression of p16 and proposed that Kaiso inactivation can improve the efficacy of current treatment regimes.
Computer-aided drug designing (CADD) has become an indispensible tool in the field of novel drug discovery (Alvarez, 2004;Chikan, Bhavaniprasad, Anbarasu, Shabir, & Patel, 2013). The main aim of our study is to explore novel Kaiso inhibitor compounds of natural origin. To achieve the discovery of new therapeutic solutions we use CADD, in which we performed a multistep structure-based virtual screening (VS) of more than 48,000 natural compounds obtained from the interbioscreen (IBS) database, using the crystallographic structure of Kaiso. In the present study, we are using the crystal structure of Kaiso, available in the protein data bank (PDB), bearing accession code 4F6 N (Buck-Koehntop et al., 2012) and considering certain C2H2 domain amino acids as a key residue for docking studies.
VS has become a promising tool for identifying active lead/active compounds and has combined with the pipeline of drug discovery in most pharmaceutical companies. The absorption, distribution, metabolism, elimination (ADME) and toxicology (Tox) properties of a drug are conventionally viewed as part of drug development (Hodgson, 2001). Structures with unfavorable ADME/Tox have been identified as the major cause of failure of candidate molecules in drug development. In order to overcome the late stage failure in drug development, we used a web-based application called PreADME/T (Lee et al., 2003) which has been developed in response to a need for rapid prediction of drug-likeness and ADME/Tox data (van de Waterbeemd & Gifford, 2003).

Material and methods
The crystal structure of the catalytic site of human Kaiso protein was retrieved (PDB ID: 4F6 N) from the PDB. The retrieved structure was energy-minimized using Figure 1. The pictorial illustration of Kaiso activity at the promoter region of p16. In the left figure, Kaiso's C2H2 domain recognizes the hyper-methylated promoter of p16 leading to its downregulation and turning on of the cell cycle, a possible factor in the etiology of colon cancer. The figure on the right shows inhibited C2H2 domain not able to bind to the methylated promoter region leading to the upregulation of p16 and turning off of the cell cycle a factor that can have a therapeutic effect on colon cancer. SPDB viewer at the default cut off root mean square deviation (RMSD) value of 0.30 Å using OPLS 2001 force field.

Virtual screening
VS has become a promising tool for identifying active lead/active compounds and has combined with the pipeline of drug discovery in most pharmaceutical companies. AutoDock Vina, is used to perform the VS of the ligand data-set. Autodock Vina which uses an iterative local search algorithm, improves the average accuracy of binding mode prediction. The Vina plug-in (Seeliger & de Groot, 2010) for PYMOL (Delano, 2009) was used for the screening. Natural compounds from the IBS database, having 48,531 natural compounds were used for the VS of Kaiso inhibitors. From the initial data-set, a total of 50 compounds were selected based on their calculated binding affinity with C2H2 of Kaiso for further conformational analysis.

Molecular docking
AutoDock 4.2 was used for molecular docking. Auto-Dock uses binding free energy evaluation to find the best binding mode. AutoDock energy values were calculated by the characterization of intermolecular energy (consisting of van der Waals energy, hydrogen bonding energy, desolvation energy, and electrostatic energy), internal energy of ligand, and torsional free energy. The docking energy is obtained from the van der Waals energy and hydrogen bonding energy together, while the binding energy is built up from van der Waals energy and desolvation energy. The binding strength and the location of ligand in most of the cases can be decided by the electrostatic interaction between ligands and receptor. The hydrophobic interaction obtained from the docking can affect the agonistic activity to a larger extent.

Molecular dynamics simulations
Molecular dynamics (MD) simulations were carried out using GROMACS 4.5.5 (Berendsen, van der Spoel, & van Drunen, 1995). It is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions. The components of the simulation system were protein, ligand, and water. The GROMOS 43a1 force field was preferred for the simulation. Water molecules were represented using a simple point charge (SPC216) model. Six Cl − counter-ions were added by replacing water molecules to ensure the overall charge neutrality of the simulated system. The minimized conformation of Kaiso as well as the docked ligand was employed in the MD simulation process. GROMOS 43a1 forcefield was used for complex MD simulations (Hess, Bekker, Berendsen, & Fraaije, 1997). The force field parameters of the ligand were obtained from PRODRG web server (Schüttelkopf & van Aalten Schuttelkopf & Van Aalten, 2004). The aim of the MD simulation was to get more precise ligand-receptor models in a state close to the natural conditions and to explore the binding modes of the ligands further. Although molecular docking offers reasonable binding structures for investigated ligands, the MD simulation can account for even the smallest variances. Energy minimization process and position restraint procedure were performed in association with NVT and NPT ensembles. An NVT ensemble was adopted at constant temperature of 300 K, with a coupling constant of 0.1 ps and time duration of 100 ps. After stabilization of the temperature, an NPT ensemble was performed. In this phase, a constant pressure of 1 bar was employed with a coupling constant of 5.0 ps with a time duration of 1 ns. NPT ensemble was finished after pressure stabilization. The coupling scheme of Berendsen was employed in both NVT and NPT ensembles The Particle mesh Ewald method for long-range electrostatics, a 14 Å cutoff for van der Waals interactions, a 12 Å cutoff for Coulomb interaction with updates every 10 steps, and the SHAKE algorithm to constrain bond lengths were used (Darden, York, & Pedersen, 1993;Kholmurodov, Smith, Yasuoka, Darden, & Ebisuzaki, 2000;Hess, Kutzner, van der Spoel, & Lindahl, 2008). A final MD run was performed for 20 ns.
RMSD matrices and cluster analysis RMSD matrices had been computed on the concatenated trajectory by least square fitting on backbone atoms for each of the complexes. The RMSD matrices had been processed using GROMOS algorithm to extract clusters of similar conformations. A cluster analysis for the MD trajectories was performed to determine the preferred conformation and relative population of all three forms. g_cluster, a built-in analysis tool in GROMACS was used for the analysis of the trajectories. An RMSD cutoff of 0.25 was assigned for all the structures and all of them within the cutoff range were assigned as neighbors; the structure having the most number of neighbors was identified as the center of the cluster and separated from the data-set of structures (Vipperla, Febin Prabhu Dass, & Jayanthi, 2013). The process was continued till all the structures were categorized into each cluster.

Principal component analysis (PCA)
In order to identify the dominant motions in the protein lead complexes, we used PCA technique which takes the trajectory of a MD simulation and extracts the principal modes involved in the motion of the molecule (Amadei, Linssen, & Berendsen, 1993). A covariance matrix was put together using a simple linear transformation in Cartesian coordinate space and the diagonalization of the covariance matrix was carried out, that yields a set of eigenvectors which gives a vectorial depiction of every single component of the motion indicative of the direction of motion. Each eigenvector has a corresponding eigenvalue which describes the energetic involvement of each component to the motion (Vipperla et al., 2013). The protein regions which are accountable for the most significant collective motions can be acknowledged using this analysis. The GROMACS inbuilt tools g_covar & g_anaeig were used for performing PCA.

Molecular visualization & MD analysis
All the visualizations were carried out using Pymol, Ligplus, VMD (Humphrey, Dalke, & Schulten, 1996) and graphs were plotted using Grace Program (Turner, 2005). The trajectories were analyzed using the inbuilt tool in the GROMACS distribution.

Result and discussion
VS and in silico ADME/T The approach of VS and in silico ADME/Tox analysis using the crystal structure of DNMT1 was successful in identifying novel Kaiso inhibitors. From the 48,531 natural products selected for the study, subsets of lead-like natural products were selected using AutoDock Vina on the basis of their binding energy. In order to focus on natural products that could be promising for further development, ADME/Tox analysis was done using the PreADME/Tox tool. In this analysis, apart from checking their mutagenic and carcinogenic property, we shortlisted the compound based on their drug likeness (Lipinski, Lombardo, Dominy, & Feeney, 2012) and their  compliance with ADME and related properties of typical drugs (van de Waterbeemd, Camenisch, Folkers, Chretien, & Raevsky, 1998;Zhao et al., 2001). For drug likeness, we looked for molecular weight (MW), calculated octanol/water partition coefficient (logP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), and total polar surface area (tpsa). In these, the later five fall under Lipinski's rule of five (RO5). Table 1 shows the above described data of the top six compounds. The above-mentioned properties play a pivotal role in deciding the fate of the compound and analyzing them in silico not only saves time but also is economically less challenging. The permissible ranges of RO5 are MW ≤ 500 D, clog p ≤ 4.5, HBD ≤ 5, HBA ≤ 10.These ranges are associated with 90% of orally active drugs that have achieved phase II clinical status (Lipinski, 2004). The RO5 properties of our explored lead natural products are compliant with the value expected from typical drugs.

Molecular docking
The data-set containing 48,531 natural products were selected for VS. Based on the activity of the compounds on the key residue and their binding energy, they were categorized into further subsets of lead-like natural products using AutoDock Vina. Further screening was performed based on the mutagenic and carcinogenic properties of the compounds, drug likeness and their compliance with ADME/T. The top 50 compounds obtained from VS were further studied using AutoDock Vina. An optimized binding site obtained from the literature was used for performing the docking studies. Among the top 50 compounds, only six compounds were found to block the C2H2 domain by non-covalent interactions.
The six final shortlisted natural products were docked using the AutoDock 4.2 tool into the optimized catalytic binding site (Figure 2) of the protein, and the results were analyzed using Discovery Studio 3.5 client software. In Table 2, we have shown the results generated. Three of the natural products were found to form hydrogen bonds with the active site.
The complex 1 obtained with the lead 1 compound IB screen ID: 28,127 exhibited lowest binding energy of −5.28 kcal/mol and also showed a greater number of H-bonds with one of the key residues LYS539. The other five compounds exhibited a single H-bond with this key residue. The top two complexes (leads 1 and 2) were selected for performing MD simulations to understand the vital molecular changes that occur in the complexes.
The ligand-receptor interaction (covalent or non covalent) and the number of hydrogen bonds formed has Table 2. Autodock analysis of six Lead natural products. been displayed in Figure 3. Lead 1 (chem. ID 28,127) has a ΔG of −5.28 kcal/mol and forms seven hydrogen bonds with TYR550, TYR562, GLN563, and LYS 539 at the enzyme's binding pocket. The binding pocket in this case is of the following amino acids THR538, ILE542, TYR550, GLU535, TYR562, ASN561, GLN563, and LYS539. The O 11 and O 6 positions of the ligand are interacting with LYS539 and the bond formed between them has a distance of 2.56 and 1.90 Å, respectively. Figure 4(a) shows the mapping of lead 1 against the hits (TYR550, TYR562, GLN563, and LYS 539).
The second best interaction with the targeted site of Kaiso is lead 2 (chem. ID 51111). Its binding pocket comprises ASN561, TYR550, ILE560, ILE542, HIS543, LYS539, TYR536, PHE531, and HIS540 It has a ΔG of −7.75 kcal/mol and forms three hydrogen bonds. The interaction of our interest here is that of LYS539 and ligands at the H 28 position. The distance between the two is 2.25 Å. Figure 4(b) shows the mapping of hydrogen bonds with lead 2.

MD simulations
The top two lead complexes were simulated for a period of 20 ns in explicit water to understand the behavior of the complexes. The RMSD of the protein was calculated based on the backbone atoms. The trajectories of both the complexes converged within the first few nanoseconds ( Figure 5). The ligand RMSD was very much stable throughout the simulation in lead 1, but small fluctuations were observed in the lead complex at various time frames throughout the simulation that might be due to the changes occurring in the position of the ligand in the binding pocket, which was also evident from the cluster analysis output. The Root mean square fluctuation (RMSF) and Rg and intramolecular hydrogen bond analysis are described in Supplementary Information. The intermolecular H-bonds were relatively stable throughout the simulation in both complexes, but higher number of H-bonds were witnessed in the lead 1 complex. The average number of H-bonds per time frame was 3.5 in lead 1, whereas it was considerably lower in Lead 2 which has an average of 1.5 H-bond per time frame for the entire simulation period. The total energy of the two lead complexes was found to be stable. The interaction energy was found to be −64.84 kcal/mol for lead 1, which was significantly higher than the lead 2 which has −49.75 kcal/mol, suggesting much better stability.

RMSD matrices and cluster analysis
A separate ndx file was generated for both the trajectories consisting of both the protein and ligand groups for performing cluster analysis. An RMSD matrix generated over the backbone atoms for both the trajectories was used as input for the clustering analysis. A total of 2001 snapshots were generated for clustering for each trajectory based on the RMSD. The lead 1 trajectory was distributed into six clusters, with the most populated cluster representing 82.2% of the total conformations placed between the time frame of 2 and 20 ns, whereas the lead   2 trajectory was allocated into 14 clusters, with the most populated cluster representing 53.5% of the total conformations mostly between 3 and 16 ns time period.
The middle structure of each cluster was selected as the representative of the cluster and used for the analysis (Figure 6). The analysis of these representative structures has shown that both the lead compounds displayed minor changes in the position but were relatively stable in the binding region. The major difference was observed in the positioning of the third helix below the binding region.
From the clustering analysis, it was evident that both the lead compounds were relatively stable in the binding pocket and a small change in the overall structure of the protein was witnessed between the two complexes. To monitor and compare the overall dominant motions  between the two lead complexes, PCA was performed on both the trajectories. The covariance matrices were built, which predicts the correlated or anti-correlated motion between a pair of residues. The overall flexibility was calculated by the trace of the diagonalized covariance matrix. The PCA was performed on both the trajectories to monitor the overall strenuous motions of this protein.
The diagonal covariance matrices were built over the Cα atoms of the protein for each trajectory and was used to capture captures the degree of collinearity in atomic positions for 120 residues within the Kaiso structure for each pair of atoms. The covariance matrices predict whether the pair of residues have a correlated or anti-correlated motion for all native and mutant forms. In Figure 7, the dark red color clusters represent the residues moving together during the course of the simulation, whereas the light red color clusters indicates moderately correlated motions and the blue color clusters correspond to the residues that move opposite to each other. The diagonal represents the RMSF plot obtained from the earlier analysis. The overall flexibility was calculated by the trace of the diagonalized covariance matrix. The trace values for both the complexes was found to be 5.10942 (nm 2 ) for complex 1 and 12.8766 (nm 2 ) for complex 2. Among these values, complex 2 showed high values suggesting an overall increase in the flexibility than complex 1.
The eigenvalues obtained through the diagonalization of the covariance matrix elucidates the atomic contribution on motion. Similarly, the eigenvectors explains a collective motion accomplished by the particles. A total of 360 eigenvectors were generated for the entire trajectory, indicating that the first seven eigenvectors accounted for more than 85% of the overall system motion for both the complexes. Within the top eigenvectors, the first two accounted for a significant amount of overall motion in each case. After the assessment of the magnitude of the configurational space explored by the simulations, it was possible to attempt a consistent description of the free energy landscapes. This calculation of the free energy landscapes was based on the first two eigenvectors and their free energy profiles of each trajectory. The free energy profiles in the three-dimensional projections were similar, and each featured two main energy basins differing by about 3 kJ/mol. Besides, the first complex trajectory featured well-defined and lower energy basins, whereas the second complex trajectory exhibited more flexibility and spanned a larger configurational space. The two basins referred to as basin A and B can be seen in the Figure 8.

Conclusion
The interaction of Kaiso with methylated DNA in cancer and its implication in the tumorogenesis of intestinal neoplasia in mice makes it a potential therapeutic target, and our study here looks into the prospects of possible drugs from natural compounds. The rational approach of drug designing here gives us two novel drug-like compounds for further in vitro and in vivo studies. The MDS approach in this study was used to check the stability of the complexes shortlisted for further analysis. The PCA and covariance matrix analyses gave us an insight into the atomic movements over time and were pivotal in plotting the free energy land scape of both the complexes. The generated data has shown that they have immense potential in blocking the proposed C2H2 domain. The interaction of our interest in particular is that with LYS539; this particular amino acid in the C2H2 domain of Kaiso is selective toward methylated DNA (Buck-Koehntop et al., 2012). The blockage of this amino acid along with others is changing the paradigm of the DNA binding region; thus, we believe that inhibiting Kaiso can affect the expression of tumor suppressor genes without change in epimutation in their promoter region. This feature, we suppose, gives it an advantage over inhibiting DNMTs as the global hypomethylation achieved by blocking DNMTs might also induce tumorogenesis by activation of oncogenes, induction of chromosomal instability, and mutagenesis.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2014.974072.