Modelling family 2 cystatins and their interaction with papain

Cystatins are extensively studied cysteine protease inhibitors, found in wide range of organisms with highly conserved structural folds. S-type of cystatins is well known for their abundance in saliva, high selectivity and poorer activity towards host cysteine proteases in comparison to their immediate ancestor cystatin C. Despite more than 90% sequence similarity, the members of this group show highly dissimilar binding affinity towards papain. Cystatin M/E is a potent inhibitor of legumain and papain like cysteine proteases and recognized for its involvement in skin barrier formation and potential role as a tumor suppressor gene. However, the structures of these proteins and their complexes with papain or legumain are still unknown. In the present study, we have employed computational methods to get insight into the interactions between papain and cystatins. Three-dimensional structures of the cystatins are generated by homology modelling, refined with molecular dynamics simulation, validated through numerous web servers and finally complexed with papain using ZDOCK algorithm in Discovery Studio. A high degree of shape complementarity is observed within the complexes, stabilized by numerous hydrogen bonds (HB) and hydrophobic interactions. Using interaction energy, HB and solvent accessible surface area analyses, we have identified a series of key residues that may be involved in papain–cystatin interaction. Differential approaches of cystatins towards papain are also noticed which are possibly responsible for diverse inhibitory activity within the group. These findings will improve our understanding of fundamental inhibitory mechanisms of cystatin and provide clues for further research.

Mammalian type 2 cystatins show extensive diversity due to existence of multigene families, genetic polymorphism of coding sequences and occasional posttranslational modifications (Dickinson, Thiesse, & Hicks, 2002). S-type cystatins (cystatin S, SA and SN), member of family 2 cystatins, are secreted proteins (Ryan et al., 2010) and mainly expressed in submandibular and sublingual saliva as well as in tears, urine and seminal fluid but not in parotid saliva , Ryan et al., 2010. Although, S-like cystatins exhibit more than 90% sequence similarity, yet cystatin S is a significantly poorer inhibitor than cystatin SA or SN for papain family of cysteine proteases (Baron, DeCarlo, & Featherstone, 1999;Dickinson, 2002) and all three show no activity against legumain due to absence of Asn residue as seen at 40th position of Figure 1 (Dickinson, 2002). S-type cystatins have evolved in primates to inhibit a very narrow, specialized subset of host cysteine proteases in contrast to other salivary cystatins such as cystatin C, cystatin D and mainly protects mucosa and their secretions from dietary and environmental cysteine proteases (Blankenvoorde, Henskens, van't Hof, Veerman, & Nieuw Amerongen, 1996;Dickinson, 2002;Ruzindana-Umunyana & Weber, 2001).
Cystatin M/E is also a secreted glycoprotein distantly related to other human type 2 cystatins, and reveal a five-residue insertion in the α-helix 2/loop ( Figure 1). The expression pattern of human cystatin M/E is unique among cystatins, as high expression levels are largely confined to cutaneous epithelia (epidermis, hair follicles, sebaceous glands and sweat glands), whereas other cystatins are mainly ubiquitously expressed (Cheng et al., 2008;Zeeuwen et al., 2001). Cystatin M/E controls skin barrier formation by the regulation of both cross-linking and desquamation of the stratum corneum (Elias, 2005;Zeeuwen, 2004;Zeeuwen, Cheng, & Schalkwijk, 2009). It directly regulates the activity of cathepsin V, cathepsin L and legumain, thereby controls the processing of transglutaminases (Cheng et al., 2006). Cystatin M/E is also reported as a novel candidate tumor suppressor gene for breast cancer (Zhang et al., 2004).
Several three-dimensional structures of other human cystatins are available in Protein Data Bank (PDB) in both monomeric and dimeric form and some in complex with cysteine proteases elucidating their inhibitory activity. Some of these cystatins are active mostly in monomeric state and loses their inhibitory property on dimerization by domain swapping (Abrahamson & Grubb, 1994). Here, we make an effort to model four human cystatins, namely M/E, S, SA and SN by comparative modelling and verify their stability and viability through molecular dynamics (MDs) simulations and docking studies.

Methodology
The amino acid sequences (121 amino acids in each case) excluding signal peptide region of cystatin M/E, S, Figure 1. Sequence alignment of template and target proteins. Conserved regions L1 (Q-X-V-X-G) and L2 (P-W) and cysteine residues are marked. SA and SN (UNIPROT Accession No. Q15828, P01036, P09228 and P01037, respectively) and their basic informations like sequence length, molecular mass, signal peptide, reactive sites, amino acid modifications and disulphide bond positions were collected from the most publicly available domain UNIPROT (http://www.uniprot.org/). The obtained protein sequences were also characterized by Expasy-ProtParam tool (Gasteiger et al., 2005) and XtalPred server (Slabinski et al., 2007).
Template selection NCBI Position Specific Iteration-Basic Local Alignment (PSI-BLAST) (Altschul, Gish, Miller, Myers, & Lipman, 1990) was performed against PDB proteins with default parameters to select the template protein and in turn the template was downloaded from PDB. The templatetarget relationship was reconfirmed by multiple sequence alignment between template protein and query sequences via Clustal X (Thompson, Gibson, Plewniak, Jeanmougin, & Higgins, 1997) with the default parameters. Utopia CINEMA (Pettifer, Sinnott, & Attwood, 2004) was used to edit the alignment.

Model building
Refined three-dimensional homology models of all the four proteins were generated using a comparative modelling program MODELLER (Eswar, Eramian, Webb, Shen, & Sali, 2008) of Discovery Studio, based on given sequence alignment and selected template. MODELLER determined probability density functions (Sali & Blundell, 1993) using statistical and classical mechanics with reference to a database of known protein structures as a guide for spatial restraints. Then distance and angle constraints were derived from template strand together with the energy terms that governed the relative spatial arrangement of atoms within the models. Later, it was optimized in Cartesian space with conjugate gradients and MD methods. Five structures were generated per protein and model with lowest value of the normalized discrete optimized protein energy (Sali & Blundell, 1993;Sali, Potterton, Yuan, van Vlijmen, & Karplus, 1995) was chosen as the best model and consequently subjected to MDs simulation.

PDB file preparation
The second identical chain B and crystal waters were removed from the template protein structure, conformers were deleted on basis of occupancies and further the occupancies were corrected. Then template protein and the four raw evaluated structures of DS MODELLER were repaired by Swiss PDB Viewer (Guex & Peitsch, 1997) and further by XLEAP of AMBER tool 1.5 (Case et al., 2010) mainly to clear the nomenclature issues.

MDs simulation
In total, five systems (one template & four models) were prepared for simulation. All MD simulations were performed using GROMACS (GROningen MAchine for Chemical Simulations) 4.5.4 (Hess, Kutzner, van der Spoel, & Lindahl, 2008) simulation packages employing Charmm27 (Chemistry at Harvard Macromolecular Mechanics) force field (MacKerell et al., 1998). The systems were solvated by explicit SPC/E (extended simple point charge model) water model (Berendsen, Grigera, & Straatsma, 1987) in cubic boxes maintaining a minimum 10 Å distance from cube edge. Counter ions (Cl À and Na + ) were added by randomly replacing water molecules to achieve a neutral simulation cell. Each system was then minimized using a steepest descents integrator (Apol et al., 2011) in order to relieve from steric clashes or inappropriate geometry.
A 100 ps NVT equilibration was performed at 300 K with position restraints applied to all of the backbone atoms in order to relieve any bad contacts at the side chain solvent interface. The velocity rescale thermostat (Bussi, Donadio, & Parrinello, 2007) was used with a temperature coupling time constant (τT) of 0.1 ps. All bond lengths were constrained using the linear constraint solver (LINCS) (Hess, Bekker, Berendsen, & Fraaije, 1997) algorithm, which allowed for a 2 fs time step. Long-range electrostatic interactions were approximated using the particle mesh Ewald (PME) method (Darden, York, & Pedersen, 1993) with a fourth-order spline interpolation and a 0.15 nm Fourier grid spacing. The short range non-bonded interactions were defined as van der Waals (VDW) and electrostatic interactions between particles within 10 Å. Then position restraints were withdrawn and a 100 ps NPT simulation was conducted, relaxing the system into an isotropic Parrinello-Rahman barostat (Parrinello & Rahman, 1981) set to 1.0 bar of pressure in all directions and a pressure coupling time constant (τP) of 2.0 ps. The production MD that followed was performed with the velocity rescale thermostat and Parrinello-Rahman barostat, as well as the LINCS and PME treatments as described. Snapshots of the trajectory were taken in every 2 ps.
The GROMACS suite of tools along with a secondary structure recognition algorithm (DSSP) (Kabsch & Sander, 1983) was used for all types of MD simulation studies. Electrostatic potential, molecular surface (Spassov & Yan, 2008) and structural alignments were done in DS suite. Microsoft Excel program was used for preparation of the graph and for structure visualization, DS Visualizer and Chimera (Pettersen et al., 2004) was employed.

Binding site study: molecular docking
Computed Atlas of Surface Topology of Proteins (CASTp) (Dundas et al., 2006), cons-PPISP (Chen & Zhou, 2005) and the InterProSurf server (Negi, Schein, Oezquen, Power, & Braun, 2007) were utilized in order to locate the binding-site amino acids. To validate the binding sites and investigate the mode of binding, each of the four optimized models was docked with papain (PDB ID 1KHQ) (Janowski, Kozak, Jankowska, Grzonka, & Jaskólski, 2004) using ZDOCK program (Chen & Weng, 2002) in Discovery Studio 2.5. ZDOCK is a rigid-body docking algorithm that uses Fast Fourier Transform (FFT) to perform an exhaustive six-dimensional grid-based search in the translational and rotational space between the two molecules. ZDOCK searches orientational space by rotating the ligand around its geometric centre keeping the receptor protein fixed in space. For each sampled angle, the algorithm rapidly scans translational space using FFT based on a 3D grid with step size of 1.2 Å and only the ligand translation corresponding to the best geometric match between the two proteins is retained. The binding sites of papain (CYS 25, HIS 159 and ASN 175) (Drenth, Jansonius, Koekoek, Swen, & Wolthers, 1968;Kamphuis, Kalk, Swarte, & Drenth, 1984) and the possible inhibitory sites of cystatins (Bode et al., 1988;Kordiš & Turk, 2009;Stubbs et al., 1990) were defined to screen the potential docking configurations. A total of 2000 docked poses were generated for each of the cystatin-papain complex models. All the predicted complexes were ranked based on a scoring function  combining shape complementarity, desolvation energy and electrostatics, and also re-ranked by ZRANK program (Pierce & Weng, 2007) which uses a more detailed weighted energy function with VDW, electrostatics and desolvation terms to obtain the best docked pose. For further refinement through RDOCK  program, five high scoring complexes resulting from ZRANK ranking were selected on basis of known papain-cystatin conformations by binding site root-meansquare deviation (RMSD) calculation. RDOCK employs a three-stage energy minimization scheme, followed by the evaluation of electrostatic and desolvation energies using CHARMM polar H force field. Ionic side chains were kept neutral in the first two stages of minimization, and reverted to their full charge states in the last stage of brief minimization. The best docking pose with the lowest RDOCK energy was considered.
Position optimizations of all the four complexes were carried out by using all-atom CHARMM27 (MacKerell et al., 1998) force field with Adopted Basis Newton-Raphson minimization algorithm (Chu, Trout, & Brooks, 2003) in DS 2.5. In order to identify the amino acids at the protein-protein interface, hydrogen bond (HB) analysis was performed and interaction energies were computed in vacuum using implicit distance-dependent dielectrics for each of the minimized complexes in DS 2.5. Further, relative percentage of solvent accessibility was considered by NACCESS program (Pollastri, Baldi, Fariselli, & Casadio, 2002). Binding free energies between the papain and cystatins were determined at physiological ionic strength with the Molecular Mechanics Poisson-Boltzmann non-polar Surface Area (MM-PBSA) method (Grandis, Bizzarri, & Cannistraro, 2007;Srinivasan, Cheatham, Cieplak, Kollman, & Case, 1998) in DS 2.5. The procedure is based on a combination of molecular mechanics and continuum solvent approaches to give estimation for the binding free energy of a protein complex.

System specification
All the work was done using windows operating system except simulation, which was performed by Fedora 10 OS in IBM D10 workstations with Xenon quad core processor.

Results and discussion
All the proteins are hydrophilic in nature and except cystatin M/E, all including the template cystatin C are declared as unstable in both monomeric and dimeric forms by Expasy-ProtParam tool and XtalPred server.

Detection of disordered regions in protein
All the intrinsic disorder predicting servers detect consistently, the initial N-terminal region and the final two or three C-terminal residue as disordered region. Some other portions such as regions 76-83 for Cystatin M/E and 80-90 (approximately) for the rest of the cystatins, come common in RONN and DisEMBL which can be elucidated as AS loop from later studies (Figure 4). DisEMBL also marks the post-helix region of all proteins.

Template selection & homology modelling
In PSI-BLAST, all the queries are recognized as member of cystatin superfamily and PDB entries from cystatin C and cystatin D show maximum similarity with the target proteins. Among these the higher resolution (1.70 Å) structure, PDB ID 3GAX (Kolodziejczyk et al., 2010) is chosen as template despite the presence of some missing residues and mutations at L47C and G69C, as the rest of the structures are either in domain swapped dimeric form or an inhibitory site mutant. The mutations of 3GAX are reversed; the missing residues are fabricated in Discovery Studio Biopolymer module and subjected to 2000 step energy minimization with smart minimizer of Discovery Studio before using for modelling.
The identities of PSI-BLAST search and an alignment between template and target proteins are presented in Table 1 and Figure 1, respectively. It is evident that the identities are in safe zone and the queries can be homology modelled with this template (Rost, 1999). Secondary structure prediction of all five template-target proteins also shows a good correlation with template, which also recommends 3GAX as a template for these cystatins (Table SI, Supplementary Material).
Further, conserved regions and cysteine residues responsible for forming disulphide bonds in the target proteins are perfectly aligned with template ( Figure 1). It is also clear from alignment as the N-terminal region in the template is truncated; the models will be abnegated from the proper structure for the mentioned region. Hence, the target sequences fed to MODELLER starts from the 13th column of the alignment (Figure 1) for all target proteins.

Stability of the models
The stability of the modelled structures is first assessed with DSSP profile (Figure 2), then with RMSD, rootmean-square fluctuation (RMSF) analysis and radius of gyration (R g ) studies (Figure 3). It is evident from DSSP studies that the main cystatin fold of five anti-parallel β-sheets wrapped around a α-helix at the centre as well as L1 (45-50) and L2 loops (97-101) remains conserved throughout the simulation period in all the modelled structures (Figure 2). At the same time, regular fluctuations are observed in pre-(5-10) and post-(25-30) α-helix regions near N-terminal and AS loops (65-75) (Figures 2 and 3(B)). It might also be noted that the AS loop and post-helix regions are designated as disordered regions by the intrinsic disorder prediction servers.
The RMSD of the models gets stabilized with the time, shows a maximum standard deviation of 0.2 Å for cystatin M/E (Figure 3(A)) in comparison to other three modelled structures. A RMSF analysis reveals β-sheets are the most stable secondary structures in the models and loops; terminal regions fluctuate more than others (Figure 3(B)). Radius of gyration (R g ) almost simply replicates the RMSD pattern (Figure 3(C)) and records a highest standard deviation of 0.24 Å for Cystatin S among all four models. All these observations altogether prove the reliability of the structures.
In human cystatin C, under mild denaturing condition (lowering of pH or increase in temperature or both) or due to disruption of the hydrophobic core in presence of mutation (L68Q for human cystatin C, 69th column in Figure 1) or in presence of both, the L1 loop transforms into an extended β-strand and participates in domain swapping during dimerization (Janowski et al., 2001). In this study, no notable irregularity is observed in the specific site (61st position for cystatin M/E and 57th for the rest in DSSP Profile Figure 2) although cystatin M/E has a more hydrophilic methionine (M) residue at the same position (69th column in Figure 1) compared to valine (V) of others. But, L1 loop (45-50 in Figure 2(B)) of cystatin S shows somewhat a different nature from others (Figures 2(B) and 3(B)) which may be an indication towards dimerization.
Apart from these generalized notions, some specific features about these modelled structures can also be noted. Cystatin M/E, although showing the most near to perfect DSSP as well as RMSD profile (Figures 2 and  3(A)) among the four models, the AS loop region differs markedly from others in the length and by the absence of second α-helix immediately after the AS loop perhaps due to the presence of R-V-T-G-D portion as seen in the template-target alignment (Figure 1). This portion is also noted as disordered region from earlier study. In Cystatin S, the L1 loop shows some flip-flop between bend and turn after 2 ns run and regains its bend status towards the final stages of simulation ( Figure 2).
The residues 60-65 (the end portion of 3rd β-sheet) demonstrate a fluctuation of secondary structure after 14 ns ( Figure 2) and it contribute significantly in the RMSD plot (Figure 3(A)). Cystatin S also exhibits highest ebb and flow around AS loop (Figure 3(B)). More- over, cystatin S is the only model which shows a consistent increase in the radius of gyration over the time of simulation. All these features perhaps point towards its propensity towards dimerization by domain swapping.
C-terminal residues (105-109) of cystatin SA (114-121) are transformed into coil conformation after 12 ns and remain so (Figure 2(C)). Residues in between 75-80 shift towards turn from helix conformation after 17 ns (Figure 2(C)) and both of these regions (residue 72-109) are also noted as disordered region. These abnormalities as well have their part in RMSD (Figure 3(A)).
Finally for cystatin SN, DSSP detects a consistent βbridge instead of β-sheet present in the N-terminal region of the rest of the models (Figure 2(D)).
As an additional elucidation of stability, two disulphide bonds characteristic to family 2 cystatins can also be taken into account. Both, report a bond length of 2.03 ± 0.004 Å in all four modelled structures during 20 ns of simulation. The 3GAX monomer also shows good threading with all four models. The main-chain atom-based RMSD values of cystatin M/E, cystatin S, cystatin SA and cystatin SN with reference to template structure are calculated to be 2.286, 1.717, 1.359, and 1.302 Å, respectively (Figure 4). The functional domains of all models show good alignment with the template. All these results approve the stability of the modelled structure in contrast to Expasy-ProtParam tool and Xtal-Pred server prediction.
The electrostatic molecular surfaces of the final models differ from each other at physiological pH ( Figure 5) mainly due to their difference in isoelectric pH (pI) and variation in sequence (Figure 1). This gives an indication that binding strategies of these models towards receptor may possibly differ from each other.

Structure validation
The stereochemical qualities of the models are evaluated with Ramachandran's plot (Ramachandran & Sasisekharan, 1968) calculated using PROCHECK ( Figure S1, Supplementary Material). Backbone Psi and Phi dihedral angles of the refined structures reveal that at least 95% residues of each protein are in most favored regions and none is in the disallowed regions. Moreover, the overall G-factor of each model is greater than the recommended value (À0.50) which approves the correctness of the models (Table 2).
WHAT_CHECK provides an overall summary of the quality of the structure in comparison to a set of reliable structures. Structural Z-scores offer a number of constraint-independent quality indicators whereas RMS Zscores give an impression of how well the model conforms to common refinement constraint values. The detailed results were given in Table 2.
Verify3D examines the compatibility of an atomic model (3D) with its own amino acid sequence by assigning a structural class (i.e. a 3D profile) to each residue based on its location and environment (alpha, beta, loop, polar, non-polar etc.). The server reports more than 80% residues have a score > 0.2 for each refined model (Table 2 and Figure S2, Supplementary Material) which indicate towards the good quality of models.
ERRAT by analyzing the statistics of non-bonded atom-atom interactions in reference to a database of reliable high-resolution structures identifies a marked improvement in each structure after simulation (Table 2 and Figure S3, Supplementary Material).
Z-score RMS (volume Z-score RMSD) values, computed by PORVE program, measures the average magnitude of the volume irregularities in the structure, also gives near acceptable values (≈1) for all of the final structure (Table 2).
Another well-known validation server ProSA determines Z-scores as a measure of overall model quality by calculating the deviation in total energy of the structure with respect to an energy distribution derived from random conformations. The negative Z-scores for all models stand for the correctness of the models (Table 2).
Further, the packing quality of the model is assessed by ANOLEA, which uses a very sensitive and accurate atomic mean force potential to calculate the non-local energy profile (NL-profile) of a proteins structure. All negative non-local normalized energy Z-score establishes the accuracy of the models (Table 2 and Figure S4, Supplementary Material).
Finally, Molprobity server also does not report the presence of any bad angles or bonds in the modelled structures. All these results altogether prove the reliability of the structures. Table 2 summarizes all these results.

Binding site identification
The conserved regions around L1, L2 loop of each of the model are recognized by CASTp, cons-PPISP and the InterProSurf server as binding pockets. Additionally, CASTp detects some binding cavities around AS loop and cons-PPISP and the InterProSurf credits some N-terminal residues as protein-protein interaction sites. These results are in agreement with previous crystallographic and mutagenic studies (Bode et al., 1988;Kordiš & Turk, 2009;Stubbs et al., 1990).
Some physical properties of protein-protein interface of these four docked complexes (Figure 6) are analyzed and listed in Table 3. The interface accessible surface area (ASA) represents the difference in the water ASA between the complex and the single proteins; provides information on the protein-protein geometric fit. A higher ASA is, in fact, an indication of a higher shape complementary between the molecules. ASA values between 1000 and 2000 Å 2 are a feature of stable enzyme-inhibitor complexes (Nooren & Thornton, 2003) and all the docked complexes are in harmony with the observation. We can also note that the complexes mainly characterized by polar interfaces, which approve the nonobligatory nature of enzyme-inhibitor complexes and suggesting the presence of number of HBs. To identify the residues involved in interaction in the protein-protein interface, interaction energy (IE) and relative solvent accessible surface areas (rSASA) are calculated and HB analysis is performed (Table 4). The IE is the sum of the VDW and electrostatic (E) energy and the value is broken down per-residue for each residue located at the interface (Table SII, Supplementary Material). rSASA areas (%) of each residue in the complex (rSASc) and individual proteins (rSASi) are calculated in NACCESS and higher ΔSAS values (rSASc$ rSASi > 30) are listed in Table 4 and SIII, Supplementary Material. NACCESS uses tripeptide model to normalize the surface area to eliminate the effects of different residue types.
Residues exhibiting high IE mostly involved either in hydrogen bonding or in hydrophobic interactions or in both (Table 4 and SII, SIII, Supplementary Material). Short strong hydrogen bonds (SSHB)<2.7 Å are observed within some residues, (  (Table 4 and SV, Supplementary Material), which may strengthen the H-bond strength and sometimes show the catalytic effect in certain biosystems (Kim, Kim, Lee, Tarakeshwar, & Oh, 2002;Kim, Oh, & Lee, 2000). Moreover, all SSHBs forms triadic HBs (marked in Table 4 and SV, Supplementary Material) and we can safely predict that these residues play a very important role in the interaction.
Here, we can notice that the catalytic residues (CYS 25, HIS 159 and ASN 175) of papain (Drenth et al., 1968;Kamphuis et al., 1984) do not directly interact with cystatins, rather the residues in their neighbourhood interact with cystatins more frequently. Four definite zones can be located around the active site of papain that take part in interaction with cystatins, namely (i) residues 20-25 (G-S-C-G-S-C), (ii) residues 64, 67 (N, Y), (iii) residues 139, 142 (K, Q) and (iv) residues 156-159 (K-V-D-H). Among these residues SER21, GLY23, ASN64, LYS139, LYS156, VAL157 and ASP158 come common in most of the complexes and ranked highly in IE, HB and SASA analysis (Table 4 and SII-SIV, Supplementary  Material) whereas 156LYS, 139LYS and 142GLN come up as important triadic HB centres (Table 4 and SV, Supplementary Material). It is evident from the IE and ASA analysis (Tables SII and SIII, Supplementary Material) that cystatin M/E interacts differently with papain in comparison to S-type cystatins which shares a somewhat common pattern within them. For S-type cystatins, the conserved L1 (QXVXG) and L2 (PW) loop are mainly responsible for inhibition of papain but in case of cystatin M/E the Nterminal region contributes equally as seen from IE, HB and SASA analysis (Tables SII-SIV Material). In the L1 region residue 46, 47 and in L2 region the highly conserved Trp99/95 are key residues of binding in all four complexes and also take part in triadic HB formation (Table 4 and SII-SIV, Supplementary Material). From IE analysis (Table SII, Supplementary Material) we can say, cystatin M/E and cystatin SN interact much more strongly with papain than the others, which is also obvious from binding free energy results (Table 3). Cystatin S shows least number of hydrogen bonding (Table 3 and SIV, Supplementary Material) and interaction centres (Table SV, Supplementary Material), which goes well with its physiologically inactive and highly specific nature (Dickinson, 2002;Dickinson et al., 2002). Absence of N-terminal as well as C-terminal interactions and presence of less hydrophobic PHE46 instead VAL in L1 loop of cystatin S possibly make it a poorest inhibitor among S-type cystatins.

Conclusion
In this work we have explored the inhibitory interaction between papain and human S-type cystatins and cystatin M/E. By comparative modelling, MD simulation and protein docking studies, we have predicted and validated the structures of the cystatins as well as their complexes with papain, with a noted propensity of cystatin S towards dimerization. The docked complexes elucidate structural details of interaction and indicate that some key residues are involved in interaction. The protein interfaces are characterized with ASA values distinctive for most stable complexes and numerous HBs, hydrophobic contacts that add specificity to the interaction. Four zones are identified in papain mainly responsible for cystatin binding and dissimilar approaches for papain binding between cystatin M/E and S-type cystatins are also recognized. Docked complexes exhibit diverse spec-