Studies on the interactions of SAP-1 (an N-terminal truncated form of cystatin S) with its binding partners by CD-spectroscopic and molecular docking methods

SAP-1 is a 113 amino acid long single-chain protein which belongs to the type 2 cystatin gene family. In our previous study, we have purified SAP-1 from human seminal plasma and observed its cross-class inhibitory property. At this time, we report the interaction of SAP-1 with diverse proteases and its binding partners by CD-spectroscopic and molecular docking methods. The circular dichroism (CD) spectroscopic studies demonstrate that the conformation of SAP-1 is changed after its complexation with proteases, and the alterations in protein secondary structure are quantitatively calculated with increase of α-helices and reduction of β-strand content. To get insight into the interactions between SAP-1 and proteases, we make an effort to model the three-dimensional structure of SAP-1 by molecular modeling and verify its stability and viability through molecular dynamics simulations and finally complexed with different proteases using ClusPro 2.0 Server. A high degree of shape complementarity is examined within the complexes, stabilized by a number of hydrogen bonds (HBs) and hydrophobic interactions. Using HB analyses in different protein complexes, we have identified a series of key residues that may be involved in the interactions between SAP-1 and proteases. These findings will assist to understand the mechanism of inhibition of SAP-1 for different proteases and provide intimation for further research.


Introduction
Cystatins are a family of cysteine proteases inhibitors, present in wide range of organisms with highly conserved structural folds. Despite more than 90% sequence similarity, the members of this group show highly dissimilar binding affinity/binding mechanism towards their binding partners. Cystatins are well known for their abundance in saliva and semen (Abrahamson, Barrett, Salvesen, & Grubb, 1986). Cystatin S is a potent inhibitor of papain-like cysteine proteases and considerably expressed in many body fluids (Turk, Stoka, & Turk, 2008). The specific role in these fluids is blurred but antibacterial and antiviral properties are present, consistent with a protective function (Turk et al., 2008). Several three-dimensional (3D) structures of other human cystatins are available in protein data bank (PDB) in native form and complexed with cysteine proteases (Alvarez-Fernandez, Liang, Abrahamson, & Su, 2005;Jenko et al., 2003;Kolodziejczyk et al., 2010;Renko, Požgan, Majera, & Turk, 2010), elucidating their inhibitory property. Despite these quite broad structural data, detailed understanding of what determines the specificity profiles of different cystatins is missing. The structures of cystatin S and their complexes with papain or other binding partner are still unrevealed. SAP-1 is an N-terminally truncated form of cystatin S in which initial eight residues are missing (Isemura, Saitoh, & Sanada, 1984). Previously, we have reported SAP-1 acts as a cross-class protease inhibitor in in vitro studies, and the interactions with different protease were examined through surface plasmon resonance (Yadav et al., 2013). In the present study, we employed computational methods to get insight into the interactions between SAP-1 and its binding partners. The 3D structure of the SAP-1 is generated by homology modeling, refined with molecular dynamics (MDs) simulation, validated through numerous web servers and finally complexed with papain, pepsin, trypsin, chymotrypsin, prostate specific antigen (PSA), and lactoferrin (LF) using ClusPro 2.0 server.
In addition, we also examined the interaction-derived changes in SAP-1 secondary structure, using circular dichroism (CD) spectra, and secondary structure elements were scrutinized. The paper discusses the critical binding sites and potential mode of action of SAP-1 with different binding partners. These findings might improve our understanding regarding fundamental inhibitory mechanisms of cystatins and offer hint for advance research.

Sequence retrieval
The protein sequence of cystatin S was retrieved from the UniProt Knowledgebase (UniProtKB) (Boutet, Lieberherr, Tognolli, Schneider, & Bairoch, 2007). For getting the sequence of SAP-1 for model building, the initial eight residues were removed from the sequence of cystatin S after detaching the signal peptide.
Template selection NCBI position specific iteration-basic local alignment (PSI-BLAST) (Altschul, Gish, Miller, Myers, & Lipman, 1990) was performed against PDB proteins with default parameters to select the template protein and in turn the template was downloaded from PDB.

Model building
Now to model the sequence on the basis of this template structure, we used Modeller 9v7 program (Eswar, Eramian, Webb, Shen, & Sali, 2008) and generated 20 homology-based models. The best scoring model was selected on the basis of lowest discrete optimized protein energy score. The model was further refined through multiple steps of loop modeling for the disordered loop regions using ModLoop server (Fiser & Sali, 2003). Finally, the potential energy of the model was minimized using steepest descent followed by conjugate gradient method after adding the CHARMm force field (Brooks et al., 1983) and MMFF94 partial charge (Halgren, 1996) in Discovery Studio 2.5.

MDs simulation
Following the homology modeling, MDs simulations of SAP-1were carried out with the Gromacs-4.0.5 suite of programs using the gromacs force field (Berendsen, van der Spoel, & van Drunen, 1995;van der Spoel et al., 2005). The protein was embedded in a cubic box containing the single point charge water model, which was extended to at least 9 Å between the protein structure and the edge of the box. The sodium (Na + ) counter ions were added to neutralize the system. The dimension of the cubic box was set to .8 nm. The system contains 8 Na + ions and 10,825 water molecules. Before the dynamics simulation, internal constraints were relaxed by energy minimization. The 3D structure was first energy minimized with 15,000 steps of the steepest descent method, and a 20 ps position restraining simulation was carried out, restraining the protein by 1000 kJ/mol. A harmonic constraint was applied to relieve close contacts before the actual simulation. Finally, a 5 ns MD simulation was performed. During the MD run, the LINCS algorithm (Hess, Bekker, Berendsen, & Fraaije, 1997) was used to constraint the length of bonds and the waters were restrained using the SETTLE algorithm (Miyamoto & Kollman, 1992). The time step for simulation was 2 fs. Using the leapfrog algorithm in the NPT ensemble, each of the components, namely, protein, water, and ion was separately coupled. The Berendsen temperature coupling and Berendsen pressure coupling (Berendsen, Postma, van Gunsteren, DiNola, & Haak, 1984) (the coupling constants were set to .1 and .5, respectively) were used to keep the system in a stable environment (300 K for temperature and 1 bar for pressure). During these steps, the particle-mesh Ewald method for long-range electrostatics, a 14 Å cut-off for van der Waals interaction, and a 10 Å cut-off for Coulomb interactions were used. Discovery Studio 2.5 was used for visual inspection and analysis.
The GROMACS suite of tools along with a secondary structure recognition algorithm (DSSP) (Kabsch & Sander, 1983) was used for all types of MD simulation studies. Electrostatic potential, molecular surface (Spassov & Yan, 2008), and structural alignments were done in DS suite. Microsoft Excel program was used for preparation of the graph and for structure visualization, DS Visualizer was employed.

Structure validation
To evaluate the model quality, we have checked several parameters like Ramachandran plot to assess the stereochemical quality of the protein structure (Laskowski, MacArthur, Moss, & Thornton, 1993), ERRAT score to analyze the statistics of non-bonded interactions between different atom types (Colovos & Yeates, 1993), Verify3D score to determine the compatibility of an atomic model (3D) with its own amino acid sequence (1D) (Lüthy, Bowie, & Eisenberg, 1992) and PROVE to calculate a statistical Z-score deviation for the model from highly resolved and refined PDB-deposited structures (Pontius, Richelle, & Wodak, 1996).
The proteases and SAP-1 structures were submitted as receptor and ligand, respectively, to ClusPro proteinprotein docking server (Kozakov et al., 2010) for complex prediction. The analysis was conducted using default parameters. The FireDock server (Mashiach, Schneidman-Duhovny, Andrusier, Nussinov, & Wolfson, 2008) was used for further refinement and scoring for global energy values of top 10 complexes and the model with the lowest energy was selected for further analysis. The FireDock web server performs high-throughput flexible refinement and scoring of protein-protein docking solutions. The method simultaneously targets the problem of flexibility and scoring of solutions produced by fast rigid-body docking algorithms.

CD analysis
The protein (SAP-1) was purified as described previously (Yadav et al., 2013) and the proteases of analytical grade were obtained from a local supplier. CD measurements were carried out with a Jasco spectropolarimeter (J-815) equipped with a Jasco Peltier-type temperature controller (PTC-424S/15). The instrument was calibrated with d-10-camphorsulphonic acid. Spectra were collected in a cell of 1 mm path length for far-UV CD, with scan speed of 100 nm/min and response time of 1 s. Each spectrum was the average of two scans. The CD spectra were recorded at a protein concentration of .5 μM over the wavelength range 190-250 nm. The CD results were expressed as mean residue ellipticity (MRE) in deg cm 2 d mol −1 which is defined as: where θ obs is the CD in millidegree, n is the number of amino acid residues, l is the path length of the cell, and C p is the mole fraction.

Results and discussion
Template selection and homology modeling In PSI-BLAST, all the queries are recognized as members of cystatin superfamily, and PDB entries from cystatin D and cystatin C show maximum similarity with the target protein. Among these, the PDB ID 1ROA (Alvarez-Fernandez et al., 2005) is chosen as template with 1.80 Å resolution, 60% sequence identity, and 98% sequence coverage (Table 1). An alignment between template and target protein is presented in Figure 1.

Stability of the model
To assess the stability of the modeled structure, we have analyzed multiple structural parameters in GROMACS, like DSSP profile, then root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) analysis, and then radius of gyration (R g ) studies. DSSP studies shows that the main cystatin fold of five anti-parallel β-sheets wrapped around a α-helix at the center as well as L1 (46-49) and L2 loops (95-98) remains conserved throughout the simulation period and regular fluctuations are observed in pre-(6-10) and post-(28-31) α-helix regions near N-terminal and AS loops (65-84) (Figure 2). The RMSD profile of the model gets stabilized with the time and the standard deviation lies around 1.5 Å (Figure 3(A)). R g almost follows the same pattern as RMSD (Figure 3(B)). From the RMSF analysis, it is evident that the β-sheets are the most stable secondary structures and loops, and terminal regions fluctuate more than others (Figure 3(C)). All these different analysis proves the reliability of the modeled structure.

Structure validation
The stereochemical qualities of the model are evaluated with Ramachandran's plot calculated using PROCHECK ( Figure S1, Supplementary material). The geometry of the overall structure is acceptable as there are no outliers in the Ramachandran plot and the stereochemical parameters for main-chain and side-chains are well defined in  the modeled structure. Ninety nine percent of residues fall into allowed regions. Verify3D examines the compatibility of an atomic model (3D) with its own amino acid sequence by assigning a structural class (i.e. a 3D profile) to each residue based on its position and environment (α, β, loop, polar, non-polar, etc.). As per Verify3D score, 81.82% of the residues are having average 3D-1D score >.2, which is above acceptable threshold, indicates towards the good quality of models. The non-bonded interactions between   different atom types in the second β sheet region mainly from residue number 35-39 consists of a twisted conformation that does not satisfy the error value threshold of 95% and so the overall quality score comes down to 93.069 ( Figure S2, Supplementary material). The PROVE score measures the average magnitude of the volume irregularities in the structure and calculates a statistical Z-score deviation for the model from highly resolved (2.0 Å or better) and refined (R-factor of .2 or better) PDB-deposited structures; here, we got average Z score .009 which is very close to the acceptable range ( Figure S3, Supplementary material). All these results together prove the reliability of the structure.

Protein-protein interactions
To further define the binding sites of SAP-1 for different protease, molecule docking was employed. The crystal structures of proteases were taken from the PDB. The stereoview of the docked poses of SAP-1 within proteases is provided in Figure 4. The interacting surface of the SAP-1 model is surrounded by negatively charged residues consisting of ASP (8, 10, 13, 101) and GLU (14,47,93,100). Thus, we can conclude that SAP-1 is able to interact well with electropositive surface of its binding partner. The docking results showed the existence of numerous hydrogen, hydrophobic, polar, and other forces between SAP-1 and proteases. To identify the residues involved in interaction in the protein-protein interface, we have analyzed the hydrogen bonding (Table S1, Supplementary material). The conserved regions around L1, L2 loop of the model are recognized by CASTp, cons-PPISP, and the InterProSurf server as binding pockets. Additionally, CASTp detects some binding cavities around AS loop and cons-PPISP, and the InterProSurf credits some N-terminal residues as protein-protein interaction sites. These results are in agreement with previous crystallographic and mutagenic studies (Bode et al., 1988;Kordiš & Turk, 2009;Stubbs et al., 1990). In docked complexes, we can observe that the SAP-1 does not directly interact with catalytic residues of proteases; rather it interacts with the residues in their neighborhood more frequently. Four definite zones can be located around the interactive site of SAP-1 that take part in interaction with proteases, namely (i) residues 1, 2 (I, I), (ii) residues 7-15 (Y-D-A-D-L-N-D-E-W), (iii) residues 44-55 (R-A-R-E-Q-T-F-G-G-V-N-Y), and (iv) residues 93-102 (E-I-Y-E-V-P-W-E-D-R). Among these, residues ILE2, ASP8, ASP10, ASP13, GLU14, GLU47, THR49, ASN54, GLU93, GLU100, ASP101, and ARG102 come common in most of the complexes (Table S1, Supplementary material).
Critical residues involved in the interaction between SAP-1 and different proteases were observed through docking studies (Table 2). ClusPro 2.0 server was used for protein-protein interaction study and it was found that the negatively charged domain of SAP-1 consists of anti-parallel β-strands interacting with positively charged domain of various binding partner. SAP-1 does not directly interact with catalytic residues of papain (CYS25, HIS159, and ASN175), pepsin (ASP32 and ASP215), trypsin (HIS63, ASP107, and SER200), and chymotrypsin (HIS57, ASP102, and SER195), but interacts in such a way that it almost completely blocks their active sites (Figure 4). Further, binding mode suggests that SAP-1 fits into the cavity formed by catalytic residues of the proteases. The residues of SAP-1 interacting with other proteases include GLU, ASP, and THR. Major interacting residues in the interaction with papain are GLU93 and GLU100 along with other interacting residues ILE2, THR49, and ASN54 ( Figure 5(A)); with pepsin major residues are GLU47 and ASP242 plus other interacting residues THR49, THR77, and THR218 ( Figure 5(B)); with trypsin major residues are ASP10, GLU14, and GLU47 including other residues GLY5, ASP8, THR49, TYR59, and ARG62 (Figure 5(C)); with chymotrypsin major residues are GLU10, GLU47, THR49, and THR174 with other residues ILE1, ASN54, GLU93, and THR222 ( Figure 5(D)).
The catalytic triad of PSA is composed of HIS57, ASP102, and SER195, with the kallikrein loop spanning Figure 6. 3D structure of SAP-1 docked with PSA. The residues involved in hydrogen-bonded network are shown in ball-and-stick representation. SAP-1 is shown in golden color and PSA in sky blue in the complex structure. the stretch 95-101, including an 11-residue insertion (95A-95K). The crystal structure of PSA with its substrate (KGISSQY) (Ménez et al., 2008) shows that residues TYR94, LEU95I, GLN170, GLN174, SER195, SER214, GLY216, SER217, GLU218, and ARG224 make hydrogen bond (HB) contacts with substrate. Our docked complex of SAP-1 with PSA predicts that ARG36, ARG38, ARG60, PHE95H, GLU147, THR151, SER192, GLY193, and GLU218 of PSA make H-bond contacts with SAP-1 (Table S1, Supplementary material). Similarly, here also, SAP-1 does not interact directly with the catalytic triad of PSA but does interact with surrounding residues. SAP-1 binds PSA in such a way that it almost completely blocks the substrate binding site of PSA ( Figure 6). LF is folded into two homologous globular domains, viz. N-and C-lobes and each lobe consists of two sub-domains, N1, N2 and C1, C2, respectively, that enclose a conserved iron binding site. SAP-1 binds to LF surface cavity formed between its two lobes and forms intermolecular HBs (Figure 7). The amino acid residues which participate in HB formation are ARG2, ARG3, ARG4, ARG30, and ASP265 of LF (Table S1, Supplementary material).

Far-UV CD studies
Far-UV CD spectroscopy is one of the most sensitive physical techniques for analyzing secondary structure and monitoring structural changes occurring in proteins (Venyaminov & Vassilenko, 1994;Yang, Wu, & Martinez, 1986). The proteases-induced unfolded states contain significant amount of secondary structure as judged by far-UV CD measurements. SAP-1 exhibits a large negative band at around 202 nm and a lower positive one at around 230-235 nm as depicted in Figure 8. The secondary structure content of the native SAP-1, estimated from the CD spectra obtained (190-250 nm), was 7.93% α-helix and 29.12 β-strands. This reflects the typical secondary conformation of the protein, and if the α-helix or β-strands contents changes, the spectra will change accordingly (Sanei, Asoodeh, Hamedakbari-Tusi, & Chamani, 2011;Sreerama & Figure 7. 3D structure of SAP-1 docked with LF. The residues involved in hydrogen-bonded network are shown in ball-and-stick representation. SAP-1 is shown in golden color and LF in sky blue in the complex structure. Woody, 2004). All these results of secondary structure content are in accordance with our modeled 3D structure of SAP-1.
To obtain insight into the structural changes; we followed the complexation changes of SAP-1 using far-UV CD spectroscopy. The CD spectra of SAP-1 with different proteases were measured in the far-UV region. The characteristic peaks were found to change by means of complexation with proteases. To analyze the structural changes of SAP-1 quantitatively, the raw CD spectra of SAP-1 with various proteases were scanned ( Figure 8) and secondary structure components computed using K2D (http://www.ogic.ca/projects/k2d2/) are listed in Table 3 (Perez-Iratxeta & Andrade-Navarro, 2008). The secondary structure content mainly shows a significant increase in α-helix structure content and decrease in β-strands in almost all complexes (Table 3). The secondary structure content mainly shows a significant decrease in β-structure content, with increase in disordered conformation (Bolotina, 1987). The results of CD analysis indicate that the free form of SAP-1 is more stable when compared to protease-bound form (Mansouri, Pirouzi, Saberi, Ghaderabad, & Chamani, 2013;Omidvar et al., 2011;Sattar et al., 2012). There was a major shift in secondary structure contents after complexation with papain, pepsin, and LF but in case of trypsin, chymotrypsin, and PSA small changes were observed. These results suggest the occurrence of conformational changes at the secondary structural level in the reaction between SAP-1 and protease.

Conclusion
In this work, we have explored the inhibitory interaction between SAP-1 and different proteases. CD data revealed a major shift in secondary structure contents after complexation with papain, pepsin, and LF but in case of trypsin, chymotrypsin, and PSA small changes were observed. By molecular modeling, MD simulation, and protein docking studies, we have predicted and validated the structures of the SAP-1 as well as their complexes with papain, pepsin, trypsin, chymotrypsin, PSA, and LF. The docked complexes elucidate structural details of interaction and indicate that some key residues are involved in the interactions. We can conclude that the more electropositive residues are present in proteases surface, whereas more electronegative residues are