Molecular docking uncovers TSPY binds more efficiently with eEF1A2 compared to eEF1A1

Testis-specific protein, Y-encoded (TSPY) binds through its SET/NAP domain to eukaryotic translation elongation factor 1 alpha (eEF1A), which is essential for elongation during protein synthesis and is implicated in normal spermatogenesis. eEF1A exists in two forms, eEF1A1 (alpha 1) and eEF1A2 (alpha 2), encoded by separate loci. Despite the critical interplay of the TSPY and eEF1A proteins, the literature has remained silent on the residues that play significant roles in these interactions. We deduced 3D structures of TSPY and the eEF1A variants by comparative modeling (Modeller 9.13) and assessed protein–protein interactions by HADDOCK docking. Pairwise alignment of the eEF1A1 and eEF1A2 proteins using EMBOSS Needle revealed a high degree (~92%) of sequence identity. TSPY bound more efficiently to eEF1A2 than to eEF1A1, in spite of the significant structural similarity between the two variants. We also detected strong interactions of domain III, followed by domains II and I, of both eEF1A variants with TSPY. In the process, seven interacting residues of TSPY's NAP domain, namely Asp 175, Glu 176, Asp 179, Tyr 183, Asp 240, Glu 244, and Tyr 246, were found to be common to both eEF1A variants. Additionally, six interacting lysine residues observed in eEF1A2 suggest a possible role in formation of the TSPY–eEF1A2 complex, which is essential for germ cell development and spermatogenesis. Thus, the more efficient binding of TSPY to eEF1A2 than to eEF1A1 supports autonomous functioning of these two variants. Studies on mutated proteins following a similar approach would uncover the causative obstruction between the interacting partners, leading to a deeper understanding of the structure–function relationship.


Introduction
Protein-protein interactions (PPI) are governed by the organizational complexity of proteins and maintain biological functions such as signal transduction, gene expression, and cell cycle regulation. Understanding the 3D structural topology and domain organization helps in deciphering the functional domains of a protein.
Despite the availability of structural information on protein-protein complexes in databases, predicting PPI by computational methods remains one of the most challenging and resource-intensive problems (Elcock, Sept, & McCammon, 2001). We opted to focus on a Y-linked gene known to play a significant role in spermatogenesis. The Y chromosome accounts for less than 2% of the human genome and is one of its smallest chromosomes (Vogt, 2005). This chromosome has an essential role in male sex development, spermatogenesis, and regulation of fertility. Thus far, a total of 27 protein-encoding genes have been mapped to it (Skaletsky et al., 2003). One such protein is human testis-specific protein, Y-encoded (TSPY), which consists of 308 amino acids (33 kDa) and is encoded proximally on the Y chromosome (Yp11.2), close to the putative gonadoblastoma locus (GBY) (Salo et al., 1995; Schnieders et al., 1996). TSPY is a member of the proto-oncogene SET and NAP superfamily (Vogel et al., 1998). Members of this family harbor a highly conserved SET/NAP domain that participates in various cellular functions, thereby regulating gene expression and cell cycle progression (Li & Lau, 2008; Oram, Liu, Lee, Chan, & Lau, 2006). In particular, proteins with a SET/NAP domain have been demonstrated to be histone chaperones and interaction partners of various transcription factors. TSPY, specifically, plays a significant role in controlling spermatogonial renewal, accurate transition of meiotic prophase during spermatogenesis, and cell proliferation, abbreviating the cell cycle checkpoints and expediting a rapid transition through the G2/M stages even under unfavorable conditions (Honecker et al., 2004; Oram et al., 2006; Schnieders et al., 1996). In humans, TSPY is mainly expressed in fetal and adult testis, with limited expression in gonocytes and prespermatogonia during early developmental stages (Honecker et al., 2004; Lau, Li, & Kido, 2011).
TSPY is upregulated in testicular carcinoma (Lau, Chou, Iezzoni, Alonzo, & Komuves, 2000), seminomas, and melanomas (Gallagher et al., 2005) but shows reduced expression in prostate cancer (Lau, Lau, & Komuves, 2003) and hepatocellular carcinoma (Yin et al., 2005).
TSPY interacts with eukaryotic translation elongation factor 1 alpha (eEF1A), a regulatory protein monitoring cell growth and the expression of oncogenes (Kido & Lau, 2008). eEF1A is a GTP-binding protein and one of the most abundant protein synthesis factors. It exists in two variants, eEF1A1 and eEF1A2, encoded by two separate genes located on human chromosomes 6q14 and 20q13.3, respectively (Lund, Knudsen, Vissing, Clark, & Tommerup, 1996). These variants have similar functions during protein synthesis (Tomlinson et al., 2005). However, small differences in the amino acids between the two eEF1A variants may affect their functions during protein synthesis. Deregulation or misexpression of the translational machinery involving eEF1A may lead to the development of several types of cancer in which translational control is seriously compromised. eEF1A1 and eEF1A2 are oncogenes expressed in breast and lung adenocarcinomas and germ cell tumors (Chen & Madura, 2005; Kulkarni et al., 2007; Tomlinson et al., 2005). While the role of eEF1A1 in cancer development is less clear, eEF1A2 is an established oncogene (Anand et al., 2002). Previous studies revealed various post-translational modifications in both variants that alter their potential for oncogenic transformation (Dever, Costello, Owens, Rosenberry, & Merrick, 1989; Rosenberry et al., 1989).
A high-resolution crystal structure of Saccharomyces cerevisiae eEF1A revealed various functionally important sequences spread across three structural domains of the protein (Andersen et al., 2000). Three-dimensional (3D) structures of human eEF1A1 and eEF1A2, modeled using the yeast counterpart as template, have shown the highly conserved nature of the residues linked to the translation-related functions of eEF1A (Soares, Barlow, Newbery, Porteous, & Abbott, 2009). Three consensus GTP-binding motifs were uncovered in domain I, whereas domains II and III are implicated in delivering aminoacylated-tRNAs to the ribosome besides acting as an initiator (Dever, Glynias, & Merrick, 1987; Slobin, 1980). Both eEF1A variants bind to TSPY's C-terminal domain and regulate protein synthesis (Kido & Lau, 2008). However, no reports are available on the structural analysis of TSPY or on the binding residues involved in these clinically relevant interactions.
In the present study, by employing bioinformatic tools, we deduced 3D structures of TSPY and the eEF1A variants and uncovered protein-protein interactions between the two proteins, along with the binding sites that play significant roles in germ cell development and spermatogenesis. These baseline data are envisaged to be useful for comparing mutant elongation factors with normal ones, elucidating precisely how changed amino acids affect the 3D structure.

Materials and methods
Target sequences, secondary structures, and domain analysis
The amino acid sequence of the Homo sapiens TSPY protein (Accession No: AAB51693.1) was retrieved from the NCBI protein database. The sequences of the Homo sapiens eEF1A1 (SwissProt Accession No: P68104) and eEF1A2 (SwissProt Accession No: Q05639) proteins were obtained from the Swiss-Prot database. To identify regions of similarity and variation between eEF1A1 and eEF1A2, pairwise sequence alignment was performed with EMBOSS Needle (Rice, Longden, & Bleasby, 2000). SOPMA (http://npsa-pbil.ibcp.fr) was used to calculate the secondary structural features of the protein sequences (Geourjon & Deleage, 1995). Domain analysis was carried out using the Conserved Domain Database (CDD) (Marchler-Bauer et al., 2011).
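The identity figure reported from a pairwise alignment can be illustrated with a minimal sketch. It assumes two already-aligned, equal-length strings with `-` marking gaps, and it divides by the full alignment length, which we take to be the convention Needle uses for its identity line. The function name `alignment_stats` and the toy fragments are hypothetical and are not the real eEF1A sequences.

```python
def alignment_stats(aligned_a: str, aligned_b: str) -> dict:
    """Count identical, mismatched, and gapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = mismatched = gapped = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gapped += 1          # column contains a gap in either sequence
        elif a == b:
            identical += 1       # same residue in both sequences
        else:
            mismatched += 1      # substitution (variant position)
    length = len(aligned_a)
    return {
        "length": length,
        "identical": identical,
        "mismatched": mismatched,
        "gapped": gapped,
        # Assumption: identity is expressed over the alignment length.
        "percent_identity": 100.0 * identical / length if length else 0.0,
    }


# Toy aligned fragments (hypothetical, for illustration only).
stats = alignment_stats("MGKEKTHINIV", "MGKEKTHVN-V")
print(stats)
```

Counting mismatched columns separately gives the number of variant positions directly, which is how a figure such as "variations at 35 positions" would be read off an alignment.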
Three-dimensional structure modeling and evaluation
PSI-BLAST against the Protein Data Bank (PDB) was carried out to identify homologous structures based on maximum identity with a high score and low e-value (Altschul et al., 1997). A template with 35% similarity to TSPY (PDB ID: 2e50, crystal structure of SET/TAF-1beta/IN-HAT chain A, Resolution: 3.30 Å) and the yeast elongation factor (PDB ID: 1F60, crystal structure of the yeast elongation factor complex eEF1A:eEF1BA chain A, Resolution: 1.67 Å), with 80% similarity to both eEF1A1 and eEF1A2, were selected to build 3D structures using Modeller 9.13 (Sali & Blundell, 1993). Models showing the lowest Cα RMSD (root-mean-square deviation) with respect to their templates upon superposition were selected for structural analysis and docking studies. Stereochemical properties were assessed by analyzing the Ramachandran plot using PROCHECK through the SAVES server (http://nihserver.mbi.ucla.edu/SAVES/) (Laskowski, Macarthur, Moss, & Thornton, 1993). The coarse packing quality of the models was confirmed using the WHATIF server (http://swift.cmbi.ru.nl/servers/html/index.html) (Vriend, 1990). ProSA was used to check the models against protein structures determined by X-ray analysis, NMR spectroscopy, and theoretical calculations (Wiederstein & Sippl, 2007). The modeled structures were further confirmed with the Protein Quality Predictor server (ProQ) (Wallner & Elofsson, 2006).
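The Cα RMSD criterion used for model selection can be sketched as follows. This is only an illustration under stated assumptions: the two coordinate lists are taken to be already superposed and matched residue by residue, the superposition step itself is not shown, and the function name `ca_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the units of the input, e.g. Å) between two matched,
    already-superposed lists of (x, y, z) C-alpha coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical three-residue example, not taken from any real model.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]
rmsd = ca_rmsd(model, template)
print(f"C-alpha RMSD: {rmsd:.3f}")
```

Among several candidate models built on the same template, the one with the smallest value from such a calculation would be the one retained.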

Protein docking
Binding site residues were retrieved from the CASTp server (Dundas et al., 2006), and selected residues were used as constraints for protein docking through the HADDOCK web server (Dominguez, Boelens, & Bonvin, 2003). HADDOCK incorporates information from mutagenesis and NMR chemical shift data, converted into a series of ambiguous interaction restraints (AIRs), to drive the docking. The HADDOCK run resulted in 200 final docked complexes, which were then analyzed in the context of RMSD to the native structure. The HADDOCK score was calculated from the van der Waals energy (E_vdw), electrostatic energy (E_elec), desolvation energy, energy from restraint violations (E_AIR), and the buried surface area (HADDOCK Score = E_vdw + E_elec + E_AIR). After simulation and water refinement, the HADDOCK score was calculated again as HADDOCK Score = 1.0 E_vdw + 0.2 E_elec + 0.1 E_dist + 1.0 E_AIR, where E_dist is the distance restraints energy contribution, which includes both unambiguous interaction restraints and AIRs. The best docked complexes, those with the lowest intermolecular energies, were selected for further studies (Dominguez et al., 2003; van Dijk, Boelens, & Bonvin, 2005; van Dijk & Bonvin, 2006). The final docked complexes, TSPY-eEF1A1 and TSPY-eEF1A2, were analyzed using Protein Interactions Calculator (PIC) to confirm the interacting residues within 5 Å (Tina, Bhadra, & Srinivasan, 2007). All structures and docked complexes were visualized using PyMOL (http://www.pymol.org; DeLano Scientific, San Carlos, CA, USA).
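The weighted sum quoted for the water-refined stage can be written out as a short sketch. It uses the weights exactly as given in the text above (1.0, 0.2, 0.1, 1.0); the class `EnergyTerms`, the function `haddock_score`, and the energy values are hypothetical and serve only to show the arithmetic, not to reproduce HADDOCK itself.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    e_vdw: float   # van der Waals energy
    e_elec: float  # electrostatic energy
    e_dist: float  # distance restraints energy
    e_air: float   # ambiguous interaction restraints energy


# Weights as stated in the text for the water-refined score.
WEIGHTS = {"e_vdw": 1.0, "e_elec": 0.2, "e_dist": 0.1, "e_air": 1.0}


def haddock_score(terms: EnergyTerms, weights=WEIGHTS) -> float:
    """Weighted sum of the energy terms; lower (more negative) is better."""
    return (
        weights["e_vdw"] * terms.e_vdw
        + weights["e_elec"] * terms.e_elec
        + weights["e_dist"] * terms.e_dist
        + weights["e_air"] * terms.e_air
    )


# Hypothetical energy values, for illustration only.
example = EnergyTerms(e_vdw=-50.0, e_elec=-300.0, e_dist=20.0, e_air=5.0)
score = haddock_score(example)
print(f"HADDOCK score: {score:.1f}")
```

Because the electrostatic term is scaled by 0.2, a large electrostatic energy contributes less to the final score than a van der Waals energy of the same magnitude.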

Results
Amino acid variations and secondary structure
Human eEF1A1 and eEF1A2 showed 92% sequence identity and 98% similarity at the amino acid level. Pairwise alignment identified amino acid variations at 35 positions between the eEF1A1 and eEF1A2 proteins (Figure 1). TSPY was found to have 46.10, 10.71, 2.92, and 40.26% of α helices, extended strands, β turns, and random coils, respectively, whereas eEF1A1 and eEF1A2 encompass 32.25 and 28.08% α helices, 21.21 and 22.25% extended strands, 8.01 and 9.07% β turns, and 40.26 and 38.53% random coils, respectively. A stretch of 181 residues (positions 108-288) pertaining to the nucleosome assembly protein (NAP) domain was predicted in the TSPY protein. Both eEF1A variants encompass three domains. Domain I (residues 9-238), domain II (residues 242-331), and domain III (residues 335-438) are made up almost entirely of beta-strands; each domain contains two beta-sheets forming a beta-barrel (Figure 2).
Figure 1. Pairwise sequence alignment between human eEF1A1 and eEF1A2. The two sequences showed 92% identity and 98% similarity at the amino acid level. Identical residues are shown on a gray background and variant amino acids in yellow. One dot denotes semi-conservative substitutions and two dots denote conservative substitutions.

Quality of the predicted 3D structure
Models with RMSD of .39, .13, and .11 Å for TSPY, eEF1A1, and eEF1A2, respectively, were selected for structural analysis and docking studies (Figure 3). The structural localization of the amino acid variations between eEF1A1 and eEF1A2 is shown in Figure 4. Ramachandran dihedral statistics for TSPY revealed 88.1, 8.1, 2.7, and 1.1% of residues in the most favored, additionally allowed, generously allowed, and disallowed regions, respectively. eEF1A1 and eEF1A2, in turn, were found to have 95.2 and 93.9% of residues in the most favored regions, 3.7 and 4.8% in the additionally allowed regions, .5 and 1.1% in the generously allowed regions, and .5 and .3% in the disallowed regions, respectively. The Ramachandran plots show the excellent geometry of the models (SI Figure 1). The overall G-scores were −.13, −.06, and −.05 for TSPY, eEF1A1, and eEF1A2, respectively, indicating that the predicted models were acceptable. The coarse packing quality of atoms in the models gave scores of −1.009 (TSPY), −.775 (eEF1A1), and −.769 (eEF1A2), suggesting that the models generated were of good quality. The ProSA-predicted Z-scores were −6.21, −10.07, and −9.91 for TSPY, eEF1A1, and eEF1A2, respectively, evidencing highly reliable structures. Additionally, the energy plots showed the local model quality by plotting energies as a function of amino acid sequence position (SI Figure 2).

Essence of protein docking
The predicted binding residues obtained by CASTp for the TSPY, eEF1A1, and eEF1A2 proteins that had more than 50% solvent-accessible area were further analyzed by docking (Tables 1(A) and 1(B)). The ten best clusters generated by docking for each complex (TSPY-eEF1A1 and TSPY-eEF1A2) are shown in Table 2(A) and (B). Clusters with low HADDOCK scores and low RMSD values were considered the best docked complexes. A model with a HADDOCK score of −129.5 ± 8.6 kcal/mol and an RMSD value of 1.2 ± .7 Å was selected for the TSPY-eEF1A1 complex (Figure 5(A)). The complex chosen for TSPY-eEF1A2 had a HADDOCK score of −114.4 ± 25.6 kcal/mol and an RMSD value of .9 ± .6 Å (Figure 6(A)).
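The selection rule stated above (low HADDOCK score and low RMSD) can be sketched in a few lines. The text does not say how the two criteria were combined, so this sketch assumes the simplest reading: rank by HADDOCK score first and break ties by RMSD. The cluster names and values below are hypothetical and are not the entries of Table 2.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    name: str
    haddock_score: float  # lower (more negative) is better
    rmsd: float           # in Å, lower is better


def best_cluster(clusters):
    """Return the cluster with the lowest HADDOCK score, ties broken by RMSD.
    Assumption: lexicographic ordering on (score, RMSD)."""
    if not clusters:
        raise ValueError("no clusters to rank")
    return min(clusters, key=lambda c: (c.haddock_score, c.rmsd))


# Hypothetical clusters, for illustration only.
clusters = [
    Cluster("cluster_1", -98.4, 2.1),
    Cluster("cluster_2", -120.7, 0.8),
    Cluster("cluster_3", -120.7, 1.5),
]
best = best_cluster(clusters)
print(best.name)
```

Other ways of weighing the two criteria are possible, for example discarding clusters above an RMSD threshold before ranking by score; the sketch does not claim to reproduce the procedure actually followed.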

Docked complexes and protein interactions
Analysis of the docked complexes (TSPY-eEF1A1/eEF1A2) using the PIC server revealed interacting residues engaged in extensive H-bonding and an interacting interface rich in polar amino acid residues (Tables 3 and 4). TSPY was shown to interact with eEF1A1 with a buried surface area of 1861.2 ± 17.6 Å² (Figure 5(B)). Higher affinity was seen in the TSPY-eEF1A2 complex, with a buried surface area of 2324.4 ± 280.7 Å² (Figure 6(B)). Furthermore, TSPY's NAP domain was found to mediate interactions between the two proteins. Fourteen amino acid residues of eEF1A1 (2 were part of domain II and 12 belonged to domain III) showed interaction with 12 amino acid residues of TSPY's NAP domain (

Discussion
TSPY protein harbors a conserved SET/NAP domain and serves diverse cellular functions, including regulation of transcription, cell cycle, and chromatin assembly/remodeling (Honecker et al., 2004; Lau et al., 2000; Slobin, 1980). Currently, the factors mediating the main physiological functions of TSPY are not known. Interactions between TSPY and eEF1A could potentially enhance protein synthesis associated with cell proliferation and germ cell tumorigenesis (Anand et al., 2002). The present study is the first attempt at a computational analysis of the interaction of TSPY with eukaryotic elongation factor 1 alpha (eEF1A) employing a protein-protein docking approach. As crystal structures were not available in databases, 3D models were predicted by homology modeling to ascertain their functional relevance. eEF1A1 and eEF1A2 are 92% identical and predicted to have similar tertiary structures. Our data extend the earlier work showing amino acid variation at 35 positions between the two variants of eEF1A. eEF1A1 is reported to be highly conserved, indicating its mandatory role in critical biological functions. The substantial level of variation within eEF1A2 suggests its adaptability during the course of evolution across species (Soares & Abbott, 2013). Our in silico results demonstrate TSPY interaction with both eEF1A variants, eEF1A1 and eEF1A2. The interacting amino acid residues detected could be the ones regulating the structure-function relationship of the TSPY-eEF1A1/A2 complexes, thereby affecting protein synthesis. Our work substantiates the earlier studies demonstrating stronger binding in the TSPY-eEF1A2 complex than in the TSPY-eEF1A1 complex (Kido & Lau, 2008). Additionally, domain III of the eEF1A variants is not the sole domain interacting with TSPY: domain II of eEF1A1 and domain I of eEF1A2 also participate actively in complex formation.
The lysine residues of eEF1A2 are reported to be involved in acetylation, ubiquitination, and methylation events occurring during post-translational modification (Soares & Abbott, 2013). Therefore, the occurrence of six lysine residues in eEF1A2 interacting with TSPY suggests their possible roles in enhanced protein synthesis and cancer development through post-translational modifications (Lee, Ann, & Wang, 1994; Soares & Abbott, 2013). A vast repertoire of phosphorylation events at serine, threonine, and tyrosine residues has been reported for eEF1A, but with a greater affinity towards the eEF1A2 than the eEF1A1 variant (Panasyuk, Nemazanyy, Filonenko, Negrutskii, & El'skaya, 2008). While the tumorigenicity status of eEF1A1 is less evident, eEF1A2 has often been implicated as an oncogene (Anand et al., 2002). eEF1A is found to be in excess over the other translation elongation components. However, it is not yet clear whether tumors with upregulated eEF1A2 would have a greater protein synthesis capacity (Tomlinson et al., 2007). Regardless of the high degree of amino acid identity between the eEF1A variants, the two forms might perform completely distinct non-canonical functions, as reported earlier, suggesting independent oncogenic roles of eEF1A2 (Anand et al., 2002; Kulkarni et al., 2007).
A deeper insight into the interaction of the eEF1A variants with TSPY and their binding residues would open up newer avenues for understanding the mechanisms of germ cell development, spermatogenesis, and tumorigenesis, highlighting the biological and clinical relevance of this interaction. Taken together, bioinformatic approaches coupled with molecular biology prove to be more reliable tools for addressing important events such as germ cell development and spermatogenesis.

Conclusions
We have developed homology models and carried out protein docking to probe the interaction of TSPY with the eEF1A variants. Despite their significant sequence homology, eEF1A1 and eEF1A2 displayed differential binding affinity for TSPY, a testis-specific protein, and probably undergo different regulatory pathways. TSPY appears to regulate protein synthesis by binding specifically to eEF1A2 during spermatogenesis and germ cell tumor growth. Thus, the present study has uncovered a novel TSPY-eEF1A2 mediated pathway of protein synthesis in normal and pathological conditions that may have therapeutic implications.

List of abbreviations
TSPY    Testis-specific protein, Y-encoded
GBY     Gonadoblastoma locus on the Y chromosome
eEF1A   Eukaryotic translation elongation factor 1 alpha
PPI     Protein-protein interactions
RMSD    Root-mean-square deviation

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2014.952664.