Molecular dynamics simulations of the free and inhibitor-bound cruzain systems in aqueous solvent: insights on the inhibition mechanism in acidic pH

The major cysteine protease of Trypanosoma cruzi, cruzain (CRZ), has been described as a therapeutic target for Chagas’ disease, which affects millions of people worldwide. Thus, a series of CRZ inhibitors has been studied, including a new competitive inhibitor, Nequimed176 (NEQ176). Nevertheless, the structural and dynamic basis for CRZ inhibition remains unclear. Hoping to contribute to this ever-growing understanding of timescale dynamics in the CRZ inhibition mechanism, we have performed the first study using 100 ns of molecular dynamics (MD) simulations of two CRZ systems in an aqueous solvent under pH 5.5: CRZ in the apo form (ligand free) and CRZ complexed to NEQ176. According to the MD simulations, the enzyme adopts an open conformation in the apo form and a closed conformation in the NEQ176–CRZ complex. We also suggest that this closed conformation is related to the hydrogen-bonding interactions between NEQ176 and CRZ, which occurs through key residues, mainly Gly66, Met68, Asn69, and Leu160. In addition, the cross-correlation analysis shows evidence of the correlated motions among Ala110–Asp140, Leu160–Gly189, and Glu190–Gly215 subdomains, as well as, the movements related to Ala1–Thr59 and Asp60–Pro90 regions seem to be crucial for CRZ activity.

In humans, the blood stream trypomastigote form can invade a large number of cell types (Clayton, 2010;Sajid et al., 2011). Once within a cell, the parasite transforms into the non-flagellar replicative amastigote form. Infection of visceral organs causes a number of mega syndromes, notably in the heart, colon, and esophagus, but heart failure is more common in infected patients (Clayton, 2010;Sajid et al., 2011). Before to host cell rupture, the amastigote form transform back into trypomastigote that can either re-invade a new host cell or be taken up via a blood meal by the insect vector (Bhattacharya, Hall, Li, & Vaidehi, 2008;Calderano et al., 2014;Cazzulo, Stoka, & Turk, 2001;Clayton, 2010;Sajid et al., 2011;Sayé, Miranda, di Girolamo, Camara, & Pereira, 2014).
The CRZ genes encode a signal peptide (18 residues), a propeptide (104 residues), and the mature enzyme (345 residues), composed by the catalytic domain and a carboxy-terminal extension (C-T, 130 residues) (Duschak & Couto, 2009). Native CRZ has a mixture of isoforms that prevented determination of its 3D structure by X-ray diffraction, but the 3D structures of the catalytic domain of the recombinant enzyme (without the C-T extension) in complex with diverse inhibitors (peptide and non-peptide; irreversible and reversible) have been determined (Duschak & Couto, 2009).
CRZ is very active in epimastigote form and is found in acidic lysosome-like organelles called reservosomes (Duschak & Couto, 2009). Because of its essential role into the parasite cell, CRZ has been described as a therapeutic target for Chagas' disease (Ferreira et al., 2014;Rogers et al., 2012;Wiggers et al., 2013).
In general, these observations are in line with recent experimental and theoretical results emphasizing the role of internal dynamics in cysteine protease function (Gaspari et al., 2010;Henzler-Wildman et al., 2007), and are in apparent contradiction with the rigid lock-and-key model widely accepted for protease inhibition, including cysteine proteases (Laskowski & Qasim, 2000).
Thus, hoping to contribute to this ever-growing understanding of the CRZ dynamic behavior before and after binding to the inhibitor (i.e. in the free and inhibitor-bound enzyme), we have performed MD simulations using two systems in aqueous solvent under pH 5.5 (acid): the free CRZ (i.e. the apo form, CRZ APO ) and CRZ bound to the NEQ176 inhibitor (i.e. the NEQ176-CRZ complex, CRZ NEQ ). This work represents the first study using MD simulation in a timescale of 100 ns to investigate the conformational change involved in the inhibition mechanism of CRZ, an important therapeutic target for Chagas' disease (Capriles & Dardenne, 2007;Durrant et al., 2010;Rogers et al., 2012).

Setup and MD simulation
As the starting structure of the MD study, we used the X-ray crystal structure of the catalytic domain (215 residues) of cruzain (CRZ) in complex with the Nequimed176 (NEQ176) inhibitor (Resolution = 2.62 Å), which is available in the Protein Data Bank under PDB ID 4KLB (Wiggers et al., 2013). The free (CRZ APO ) and bound (CRZ NEQ ) aqueous systems under pH 5.5 (i.e. the pH of the reservosome where the CRZ enzyme is located) were prepared deleting all crystallographic water and ion atoms, and, subsequently, adding hydrogen atoms to the CRZ crystal structure (PDB ID: 4KLB) (Wiggers et al., 2013), using PDB2PQR server (Dolinsky, Nielsen, McCammon, & Baker, 2004;Rostkowski, Olsson, Sondergaard, & Jensen, 2011). Thus, all the amino acids were modeled in the protonation state they exhibit as free amino acids in water under pH 7.0, with the exception of Asp50 and Glu151, which were modeled in their non-dissociated forms, and all histidine residues, which were considered as protonated.
The MD simulations were carried out using the GROMACS 4.6.5 package (Hess, Kutzner, van der Spoel, & Lindahl, 2008). The CRZ enzyme was solvated within a cubic box (box length 74.5 Å) containing 12,823 TIP4P water molecules (Jorgensen, Chandrasekhar, Madura, Impey, & Klein, 1983), in which 8 of them were replaced by 8 sodium counterions to neutralize the system. Then, the OPLS-AA parameter set (Kaminski, Friesner, Tirado-Rives, & Jorgensen, 2001) was employed for the CRZ, TIP4P water model (Jorgensen et al., 1983) and counterions. All simulations were carried out in the NpT ensemble, considering periodic boundary conditions. The temperature was maintained at 300 K, using the Nosé-Hoover thermostat coupling (Nose & Klein, 1983), and isotropic pressure coupling was applied, using the Parrinello-Rahman barostat algorithm (Parrinello & Rahman, 1981). Electrostatic interactions were treated with the Particle-Mesh Ewald method, using a cut-off of 1.0 nm, and Lennard-Jones interactions were also cut off at 1.0 nm (Hess et al., 2008). The LINCS (Hess, Bekker, Berendsen, & Fraaije, 1997) and SET-TLE (Miyamoto & Kollman, 1992) algorithms were used to constrain all bonds and water molecules, respectively. The time step for integration was set to 2.0 fs. Initial water relaxation dynamics were carried out during 2.0 ns for system equilibration, keeping the NEQ176-CRZ complex position restrained. Subsequently, production run was performed during 100 ns, without position restraints.

Analyses of the protein structure and dynamics in the free and bound systems
The protein structure and dynamics in the CRZ APO and CRZ NEQ aqueous systems were evaluated by the analyses of the secondary structure (SS) elements, root mean square deviations (RMSD) and fluctuations (RMSF), and radius of gyration (Rg) using the DO_DSSP, G_RMS, G_RMSF, and G_GYRATE programs, respectively, of GROMACS 4.6.5 package (Hess et al., 2008). With the exception of RMSF, calculated for the last 70 ns of simulation time, the other parameters (SS elements, RMSD, and Rg) were calculated for the entire simulation time (100 ns). All the graphs were plotted using XMGRACE 5.1.19 program (Turner, 2005).

Intermolecular hydrogen bonds analysis in the protein-bound system
The number of distinct hydrogen bonds (H-bond) formed by the NEQ176 inhibitor to the amino acids of the enzyme during the 100 ns of MD simulation of the CRZ NEQ system was calculated using the G_HBOND program of GROMACS 4.6.5 package (Hess et al., 2008).

Analysis of the correlated movements in the free and bound protein systems
The residue-by-residue cross-correlation in the CRZ APO and CRZ NEQ systems were calculated using the generalized cross-correlation approach applied to the atomic coordinates of the Cα-atoms (backbone), during the last 70 ns of MD simulation in aqueous solvent. This approach is based on the mutual information method developed by the Grubmüller group using the G_COR-RELATION program (Lange & Grubmuller, 2006) available in the GROMACS 3.3.3 package (Lindahl, Hess, & van der Spoel, 2001).

Principal component analysis (PCA) of the protein motion in the free and bound systems
The calculation of the covariance matrix and its diagonalization to get the associated principal components describing the direction and amplitude of the enzyme motions in the CRZ APO and CRZ NEQ systems, during the last 70 ns of MD simulation, was performed using the G_COVAR program of GROMACS 4.6.5 package (Hess et al., 2008).
The projection of the simulated structures was conducted on individual trajectories using the Cα-atoms. The G_ANAEIG program of GROMACS package 4.6.5 (Hess et al., 2008) was used for the projection. To avoid general translation and rotation of the enzyme structure, the trajectories were aligned based on the Cα-atoms. Thus, the porcupine plot was built with the MODVEC-TOR.py script based on the extreme conformations derived from the first principal component, using the G_ANAEIG module. The images of porcupine projections were generated using PYMOL 1.5.0.3 software (Schrödinger, 2013).

Results and discussion
3.1. Structure and dynamics of CRZ under pH 5.5 in the free and bound systems In order to shed some light on the difference in the dynamic behavior of the enzyme before and after binding to the inhibitor (i.e. free and inhibitor-bound enzyme), we have performed the first study using 100 ns of MD simulations of two CRZ aqueous systems under pH 5.5 (acid): CRZ APO (free CRZ) and CRZ NEQ (NEQ176-CRZ). In our MD study, the X-ray crystal structure of the CRZ catalytic domain in complex with the NEQ176 inhibitor (PDB ID: 4KLB) (Wiggers et al., 2013) was used as the starting structure. The CRZ catalytic domain contains 215 amino acids, including the catalytic triad residues (Cys25, His162, and Asn182) (Powers et al., 2002;Sajid et al., 2011).
The analysis of CRZ SS elements for each system (CRZ APO and CRZ NEQ ) shows there is no significant variation in the stability during the entire simulation time (100 ns), since the inhibitor binding does not change the protein fold (see Supplementary Figures S1 and S2).
By comparing the root mean square deviation values of all CRZ Cα-atoms (Cα-RMSD) relative to the starting structures (Figure 2(A)), we have observed that both systems, CRZ APO (black line) and CRZ NEQ (red line), achieve the stability at about 30 ns, presenting average RMSD values of 1.5 and 2.5 Å, respectively, in the last 25 ns. In the initial 20 ns of the simulation time, the RMSD values for both systems are practically identical, probably because the two systems (CRZ APO and CRZ NEQ ) were assembled from the same initial 3D structure (NEQ176-CRZ, PDB ID: 4KLB).
Then, to analyze the structure stability in each segment of the protein, we have calculated the root mean square fluctuations of all CRZ Cα-atoms (Cα-RMSF) for both systems, CRZ APO (black line) and CRZ NEQ (red line), over the last 70 ns of simulation (Figure 2(B)). In general, the Cα-RMSF plot shows that there is no major variation in the structure fluctuation with the exception of the loop3 (L3, region between Cys56 and Leu67) and the loop4 (L4, region between Asp87 and Thr107). Interestingly, the presence of the inhibitor increases the L3 and decreases the L4 fluctuation at approximately 1 Å.
In addition, the analysis of the radius of gyration (Rg) (Figure 2(C)) shows only a slight increase of .25 Å for CRZ NEQ system (red line) in comparison with CRZ APO system (black line), during the last 40 ns of simulation. Again, as in the RMSD analysis, Rg values for both systems are practically identical in the first half of the simulation time, probably due to the same reason previously discussed.

Analysis of hydrogen bonding interactions in CRZ NEQ system
The NEQ176 inhibitor moved away from its initial position, approximately 2.0 Å ( Figure S3), through 100 ns of simulation. During the last 70 ns of the CRZ NEQ simulation, the binding mode of NEQ176 was restricted by hydrogen bonds (H-bonds) mainly with Gly66, Met68, Asn69, and Leu160 residues (Figure 3). Interestingly, among the H-bonds described by Wiggers and co-workers for the NEQ176-CRZ complex, only the interaction between the inhibitor and Gly66 seems to be stable (Figure 1(A)-(C)) (PDB ID: 4KLB) (Wiggers et al., 2013).
In the CRZ NEQ system (Figure 3), the N1 atom of NEQ176 establishes an H-bond with the backbone carbonyl group of Gly66 (lifetime = 39.68 ns), which is the most persistent H-bond in this system, while N2 interacts with the backbone carbonyl group of Leu160 (lifetime = 27.60 ns). In addition, the inhibitor N4 and N3 atoms make two H-bonds with the backbone amino group of Met68 (lifetimes = 3.10 and 5.45 ns, respectively), where N4 of NEQ176 also interacts with the backbone amino group of Asn69 (lifetime = 19.60 ns) (Figure 3).
Other H-bonds are also observed in the NEQ176-CRZ complex, but during only a few picoseconds of the simulation time (Supplementary Figure S4). These interactions include all the nitrogen atoms of the inhibitor. Thus, N1 forms H-bonds with the Gly66 backbone amino group (lifetime = 3.49 ns), Gln159 side chain NH group (lifetime = .28 ns), and the backbone carbonyl groups of Ser64 (lifetime = .66 ns), Leu160 (lifetime = 1.14 ns), and Asp 161 (lifetime = 1.74 ns). N2 interacts with the Glu208 carboxylate group (lifetime = .004 ns). N3 makes H-bonds with the NH and OH groups of Ser61 (lifetimes = 1.0 and .08 ns, respectively), and with the side chain NH groups of Asn69 (lifetime = .02 ns) and Gln159 (lifetime = .82 ns). N4 establishes interactions with the OH and NH groups of Ser61 (lifetimes = .08 and .65 ns, respectively), the NH group of Gly66 (lifetime = 4.02 ns), and the side chain NH groups of Asn69 and Asn70 (lifetimes = .07 and .002 ns, respectively). Finally, N5 forms H-bonds with the side chain carbonyl groups of Asn69 (lifetime = .002 ns) and Gln159 (lifetime = .038 ns), the side chain NH group of Gln159 (lifetime = .01 ns), the carbonyl group of Leu160 (lifetime = .006 ns), and the carboxylate groups of Glu207 and Glu208 (lifetimes = 1.47 and .35 ns, respectively).
Moreover, the O1 atom of NEQ176 interacts with the backbone amino group of Gly66 (lifetime = .14 ns) and the side chain NH group of Gln159 (lifetime = .04 ns), while the O2 atom forms interactions with the NH group of Met68 (lifetime = .69 ns), the side chain NH group of Gln159 (lifetime = .02 ns), and the carboxylate group of Glu208 (lifetime = .83 ns) (Supplementary Figure S3).

Cross-correlation map analysis in the free and bound protein systems
To gain further insight into the conformational changes of the enzyme before and after binding to the inhibitor, we have investigated the correlation between the motions of the Cα-atom residues by performing cross-correlation maps analysis for both CRZ APO (Figure 4(A)) and CRZ NEQ (Figure 4(B)) systems, where these coefficients provide information about the correlation between the fluctuations of the positions of the Cα-atom residues.
Overall, we suggest that the binding of the inhibitor NEQ176 is capable of disrupting the correlated motions of the CRZ structure, which probably be associated with the activity of catalytic triad and, therefore, enzymatic function.

PCA in the free and bound protein systems
To better characterize the nature of collective motions observed in the cross-correlation maps of the CRZ APO (Figure 4(A)) and CRZ NEQ (Figure 4(B)) systems, PCA were also carried out on the last 70 ns MD simulation trajectories for both aqueous systems. PCA focuses on these dominant motions, which are represented as porcupine projections based on the ensemble of both CRZ APO ( Figure 5(A)) and CRZ NEQ (Figure 5(B)) systems onto the plane defined by the top two eigenvectors. These projections show the CRZ backbone movements, where the vectors indicate the direction and extent of the motion.
In both systems, the analysis of porcupine projections point out that the movements between the L4 and CRZ core are opposites and present different amplitudes. In the CRZ APO system ( Figure 5(A)), the movements provide an opening of the CRZ structure, exposing its active site (open conformation). However, for CRZ NEQ system ( Figure 5(B)), the enzyme promotes a closing motion of the active site (closed conformation), where the movements of helix 2 (H2) are restricted. Additionally, these results are in line with the radio of gyration analysis, where CRZ in CRZ APO system shows higher values (see Figure 2(C)).
The movements between the enzyme core and L4, related to the open conformation, may be necessary for the biological function of CRZ ( Figure 5(A)). Thus, the . The H-bond with Gly66 is also found in the NEQ176-CRZ complex crystal (PDB ID: 4KLB) (Wiggers et al., 2013). NEQ176 and CRZ residues are in stick model and colored by atom (carbon, cyan or white; nitrogen, blue; oxygen, red), and hydrogen atoms were omitted for clarity. NEQ176 binding, mainly by H-bonds, seems to induce conformational changes in CRZ structure, that are not typically sampled in the CRZ APO system, and the protein conformational population shifts toward those that can best accommodate the inhibitor interaction ( Figure 5(B)).

Conclusions
In this work, we have carried out 100 ns of MD simulation to study the structure and dynamic behavior of cruzain (CRZ) in a complex with a non-covalent inhibitor, Nequimed176 (NEQ176). First, by comparing the conformational changes between the aqueous CRZ APO (free enzyme) and CRZ NEQ systems during the simulation, we have found that, in the apo system, the protein adopts an open conformation, which may represents an active state of CRZ. However, for the CRZ NEQ system, the inhibitor binding changes the protein conformation to a closed conformation, probably representing an inactive state. Additionally, the inhibitor binding at the CRZ through the H-bond interactions with key residues (mainly Gly66, Met68, Asn69, and Leu160) is able to stabilize the closed conformation by blocking the enzymatic active site. Then, cross-correlation analysis show evidence of the correlated motions mainly among the Ala110-Asp140, Leu160-Gly189, and Glu190-Gly215 subdomains, as well as the movements related to Ala1-Thr59 and Asp60-Pro90 regions, seems to be crucial for CRZ activity. Therefore, we believe that these findings will significantly facilitate our understanding of the conformational dynamics in the course of CRZ inhibition, where the molecules that stabilize the closed conformation represent drug candidates to CRZ inhibitor and, consequently, an alternative treatment for Chagas' disease.