Computational analysis of protein conformational heterogeneity

Abstract In this paper, we applied the molecular dynamics (MD) simulations and used thermolysin as the system to study the overall protein dynamics and side chain dihedral angles across the Arrhenius break. Simulations were performed at a high temperature of 36 °C which is above the previously observed Arrhenius break, and the lower temperature of 20 °C which is below the Arrhenius break. We observed different protein dynamics and conformational heterogeneity of side chain dihedral angles of thermolysin at the two temperatures. Our results indicated that certain regions of thermolysin have a higher level of fluctuation at lower temperature. A temperature dependent dihedral angles were also observed at the two temperatures. The changes observed in the protein indicated key areas of temperature sensitivity within the protein. Communicated by Ramaswamy H. Sarma


Introduction
Protein dynamics allow significant movement while maintaining biological structure and function, which are governed by atomic interactions in macromolecules, e.g. hydrogen bonding, electrostatic, and van der Waals interactions. Previously, various experimental approaches, e.g. X-ray crystallography, Kinetic Isotope Effect (KIE), Hydrogen Deuteron Exchange Mass Spec (HDX-MS), and neutron scattering have been applied to study the relationship of protein conformational heterogeneity and their temperature dependence (Dong et al., 2020;Doster et al., 1989;Hu et al., 2014;Keedy, 2015;Keedy et al., 2015;Liang et al., 2004;Liu & Konermann, 2008;Petsko & Ringe, 1984;R eat et al., 1998;Tilton et al., 1992;Zaccai, 2000). Interestingly, B-factors which reflect the amplitudes of thermal fluctuation and crystal disorder in protein crystallography showed a biphasic transition below its optimal temperature (Keedy, 2015;Tilton et al., 1992). Remarkably, studies utilizing HDX-MS strongly suggests a direct correlation between the time scale of conformational fluctuations and the turnover number of the enzyme (Liu & Konermann, 2008). Myoglobin, studied with neutron scattering showed extra mobility above the 180 K (Doster et al., 1989). Notably, a biphasic transition was observed in the Arrhenius plot of Alcohol Dehydrogenase (ADH), where above the break a temperature dependent KIE and a higher activation energy were observed (Nagel et al., 2011(Nagel et al., , 2012. Similarly, another thermophilic enzyme, thermolysin from Bacillus thermoproteolyticus rokko showed an Arrhenius break at a temperature range of 27-30 C which is much lower than its optimal temperature of 70 C (Dong et al., 2020).
Besides in vitro experimental methods, computational methods have also been used to study the protein conformational ensembles and the dynamics of proteins within a range of temperatures (Bowman & Geissler, 2014;Childers & Daggett, 2018;Glass et al., 2010;Jephthah et al., 2019;Merkley et al., 2010;Michetti et al., 2017;Sang et al., 2020;Zhang & Bruice, 2007). In the molecular dynamics simulation studies on B. Stearothermophilus alcohol dehydrogenase (ht-ADH) ES complex, a temperature dependent conformational change was observed and it was concluded that the temperature dependence of active site conformations is caused by both long-range and short-range motions of the ES complex (Zhang & Bruice, 2007). It was also shown that at higher temperature another thermophilic enzymes, nitroreductase and thermophilic serine protease which retain their native conformations and have a lower global flexibility to a much greater degree than their respective mesophilic enzymes, which indicated a higher stability that benefits their functions at higher temperature (Sang et al., 2020).
Previously, we used enzyme Thermolysin from Bacillus thermoproteolyticus rokko, a heat-stable zinc metalloproteinase (Fujii et al., 1983) as our system and studied the temperature dependence of protein structure and function via ambient temperature crystallography analyzed by Ringer and kinetics (Dong et al., 2020;Lang et al., 2010). Thermolysin hydrolyzes peptide bonds on the amino side of bulky hydrophobic residues such as Leu, Ile, Val and Phe (Fujii et al., 1983;Inouye et al., 1996). From the kinetics results, the activation energy of thermolysin was shown to be lower at temperatures above the Arrhenius break than below the break, indicating a better optimized protein conformational sampling (Dong et al., 2020). Interestingly, there were temperature dependent conformational changes of side chains which were uncovered by Ringer analysis (Lang et al., 2010). More specifically, there were temperature dependent dihedral angle distributions above and below the Arrhenius break which suggested a temperature dependent alternate conformations of side chains at various sites, e.g. Glu143 at the active site, Met120 close to the active site, Asn227 on the surface of the protein, that leads to various populations of functional conformational ensembles of thermolysin above and below the Arrhenius break ( Figure 1) (Dong et al., 2020). Within these three residues, the Glu143 is the general base that takes the proton from the active water molecule which initiates the nucleophilic attack on the carbonyl carbon of the peptide bond of the substrate. The Met120 has been shown to indirectly affect the catalytic efficiency (Menach et al., 2012). The Asn227 mutation N227D showed a 5-fold decrease in the catalytic efficiency (de Kreij et al., 2002).
In this paper, we applied the molecular dynamics (MD) simulations, studied this thermophile enzyme and compared the protein conformational heterogeneity at higher temperature of 36 C and lower temperature of 20 C, which are above and below the Arrhenius break respectively (Dong et al., 2020). Our results showed that overall thermolysin structure appeared with a quicker stabilization and regional lower fluctuations at the higher temperature. Alternate dihedral angles were also observed in the side chains of the targeted three residues.

MD simulation
Molecular dynamics simulations were performed using GROMACS (Abraham et al., 2016). The protein structures are obtained from the RCSB data bank (PDB:5T9I) the zinc was placed in the configuration before simulation. These were then converted into a GROMACS .gro file. The simulation space is then defined with the molecule centered in a decahedron shaped cell with a minimum distance of 1.5 nm from the box edge. For stability simulations all systems were solvated in 100 mM NaCl to establish the baseline behavior. Minimization of the molecular system was performed using the steepest descent method, and the potential energy is analyzed to make sure that the system reached its most stable configuration. This configuration is then constrained and equilibrated using a canonical ensemble (NVT) followed by an isothermal-isobaric (NPT) ensemble. The NVT equilibration is performed with the temperatures (20 C and 36 C) coupled by a Velocity Rescale thermostat, which is a modified Berendsen thermostat specific to GROMACS. The NPT thermo-dynamical ensemble was performed with the same velocity-rescale temperature coupling in addition to the Parrinello-Rahmen pressure coupling. As volume is no longer constrained, the Parrinello-Rahmen barostat allows the simulation cell box to change its shape accordingly (Parrinello & Rahman, 1981). Both NVT and NPT equilibrations were performed for a time duration of 1 ns. The fully equilibrated system is used as the starting configuration for the MD dynamic analysis/production simulation. The mdp file has temperature coupling groups was done as 'Protein Non-Protein'. All MD simulations were modeled for 185 ns.
Post-processing analysis methods and tools are applied to visualize and quantify the configurations of the proteinaptamer systems studied using Visual Molecular Dynamics (VMD) software and GROMACS (Humphrey et al., 1996). Visual analysis, though used throughout the simulation process, provided the first look into conformational changes, anomalies and binding occurrences that will provide guidance into further quantitative analysis of the dynamics simulation results of molecular level changes. The Radius of Gyration (R g ) is a measurement of the compactness of a structure (de Gennes, 1979). The R g decreases and increases as atoms in a molecule move together and apart. When the molecule is stable, the R g stabilizes at a finite value with little fluctuation indicating the minimal atomistic movements within the stabilized structure. The root mean square deviation (RMSD) was calculated to determine the changes in the overall structure over time. The root mean square fluctuations (RMSF) the of atomic positions in the trajectory are fit to the initial reference structure to determine the changes in specific amino acids over time (Abraham et al., 2016). The RMSF was calculated from the last 50 ns of simulation. Further comparison to in vitro experiments (Dong et al., 2020) was completed using v angle analysis via the gmx chi function in GROMACS.

Results and discussion
The Root-Mean-Square Deviation (RMSD) and Radius of Gyration (Rg) plots were used as indicators to investigate the overall changes of protein structures. The RMSD of the protein backbone from the lower temperature of 20 C showed more fluctuations at 50 ns, 100 ns, 135 ns and 150 ns, while at the higher temperature of 36 C it showed a gradual increase at a range of 25-50 ns before the system plateaus with little change (Figure 2, Table S1). In the gyration plot, the gyration of simulation 1 and 3 at the low temperature of 20 C changed in the first 25 ns before steadily increasing with a notable spike at 135 ns while at 36 C the system increases in the first 15 ns before plateauing quickly ( Figure S1, Table S1). At both the high and the low temperatures the averaged structure from the last 50 ns simulation showed minor changes of the overall structures ( Figure 3). However, the averaged structure at lower temperature of 20 C showed a higher flexibility in the loop regions including residues 124-131 and residues 187-202 (Figure 3). This concurs with the observations in the RMSD and Rg plots. Repeated simulations Rg values in both temperatures showed variability less than 0.2% among the low temperature and 0.9% with the high temperatures which show that changes in the overall structure were minimal (Table S1, Figure S1). The continuous changes of RMSD and Rg of the protein structure at lower temperature indicated that the overall protein structure has more conformational heterogeneity and remains 'unsettled' during the 185 ns period of simulation, where at higher   temperature the overall protein structure reaches the conformational homogeneity quickly and becomes 'settled' very quickly.
The Root-Mean-Square Fluctuations (RMSF) of the side chains showed similar fluctuations at higher and lower temperatures ( Figure 4A). However, there are residues in the loop regions, e.g. resi 32-38, 92-99, which have notably higher fluctuations (a cutoff of 1.5-fold higher fluctuation was used) at the lower temperature ( Figure 4B and 4C). Repeated simulations (sim 2 and sim 3) showed across all simulations general regions of peaks were consistent and the intensity of the fluctuations varied marginally ( Figure S2). RMSF reflects B-factor or temperature factor for simulation systems, which are used to identify the flexibility of side chains of the protein. The B-factor from the crystal structure was also mapped and used as a reference ( Figure S3), where the regions of resi 32-38 and 92-99 did not though show an obvious higher temperature factor, which could be due to the static disorder and scattering factors (Caldararu et al., 2019). On the other hand, the more flexible regions of thermolysin at the lower temperature of 20 C agrees with the previous HDX-MS study on thermolysin that was performed at 22 C (Liu & Konermann, 2008). Contrarily, when the temperature is higher at 36 C the fluctuations of these residues stay less temperature sensitive. Thus, the higher fluctuation of these residues at the lower temperature indicated an elevated temperature sensitive conformational sampling in these regions at the low temperature.
To look more closely at the temperature influences on the side chain conformational samplings, we analyzed the dihedral angles of amino acids from various regions of the protein at high and low temperatures across the proposed Arrhenius break temperature (Dong et al., 2020). Residue Asn227 which is on the surface of the protein has its v1 angle distributed at two angles across the Arrhenius break. Both at lower temperature and higher temperature, the v1 angle of Asn227 is distributed at À80 and À180 ( Figure 5A and 5D, Figure S4). How this alternate rotamer of Asn227 contributes to the lower activation energy and lower value of A factor at higher temperature is still unclear. However previous studies on thermolysin mutation N227D has shown a 5-fold less catalytic efficiency compared to the wild type, where the proximity of amine group of Asn227 to the active center was compromised (de Kreij et al., 2002). From our results, the Asn227 has two v1 angles of À80 and À180 , where the À80 is likely the more catalytically favorable based on its proximity to the active site. Residue Met120 which is close to the active site also has two dihedral angles distributed across the Arrhenius break ( Figure 5B and 5E, Figure S5). Interestingly, at higher temperature, the v1 angle of Met120 is mainly distributed at À180 and minorly distributed at À80 through the simulation. Contrarily, at lower The root mean square fluctuation (RMSF) plot of residues of thermolysin at temperatures above (colored in light orange) and below (colored in blue) the Arrhenius break temperature. (B) The residues of thermolysin which have 1.5-fold higher flexibility at low temperature compared to higher temperature are mapped on the structure (PDB used: 5T9I) in color light orange. (C) The comparative RMSF of the residues that have 1.5-fold higher flexibility at lower temperature than higher temperature. (This is from the average of three runs and RMSF of thermolysin was calculated from the last 30% (last 50 ns) of the simulation at higher temperature (light orange) and lower temperature (blue). temperature, the v1 angle of Met120 are mainly distributed at angles of À80 and À180 first, and then the À80 distribution almost disappeared after 75 ns. The À80 from our results could be more catalytically favorable which will be consistent with previous studies on mutation L144S, where the mutation stabilized the v1 angle À80 of Met120 via hydrogen bond, showed a 10 times higher catalytic efficiency than wild type (Menach et al., 2012). Residue Glu143 which is the active residue that initiates the nucleophilic attack on water molecule in the thermolysin catalysis mechanism, also has two distributed populations of dihedral angles of À90 and À180 across the Arrhenius break ( Figure 5C and 5E, Figure S6). At higher temperature, two repeated simulation runs showed the v1 angle of Glu143 is mainly distributed at À90 . On the other hand, at lower temperature, the v1 angle of Glu143 in the repeated runs is mainly distributed at both À180 and À90 ( Figure S6). As we know, when at À90 the Glu143 side chain is closer to the catalytic zinc, it favors the catalysis process. This is consistent with the previously observed lower activation energy of thermolysin at the higher temperature (Dong et al., 2020). Furthermore, our results agree with the previous research which showed that residues populate alternate rotamers (Bowman & Geissler, 2014;Fraser et al., 2009;Lang et al., 2010).
We know that at high-temperature environments, thermophilic proteins possess enhanced thermal stability in order to maintain their normal biological functions. For example, previous studies on the temperature dependent heterogeneity of thermophilic enzyme and their mesophilic counterpart showed that the thermophilic enzyme adopts a more compact conformation and lower global flexibility at the higher temperature (Merkley et al., 2010;Sang et al., 2020). The work from this paper probed a thermophilic enzyme's protein conformational heterogeneity with temperature via MD simulation method and analyzed the functional related dihedral angles distributions of the residues above and below the previously proposed Arrhenius break. The results showed that a thermophilic enzyme stays more compact and regional more stable in its optimized functional temperature range. This paper also showed that certain side chain rotamers have alternate conformations with a temperature dependence, which could affect the catalytic efficiency of thermolysin. For future studies, simulations with substrate analogues need to be included to further analyze the interactions of substrate analogues and the enzyme thermolysin.

Acknowledgements
Computational facilities of the Computational Molecular Engineering Lab and the ViCAR Center at North Carolina A & T State University.

Disclosure statement
No potential conflict of interest was reported by the authors.

Ming Dong
http://orcid.org/0000-0002-9121-6020 Figure 5. (A) From the 185 ns simulation, the X1 angle of N227 had two angles across the Arrhenius break. At both higher temperature (orange) and lower temperatures (blue), the X1 angle of N227 is distributed at À80 and À180 C from the simulation. (B) The X1 angle of M120 had two angles across the Arrhenius break of higher temperature (orange) and lower temperature (blue). At higher temperature, the X1 angle of M120 is mainly distributed at À180 and minorly distributed at À80 through the simulation. (C) The X1 angle of E143 differentiated at higher temperature (orange) and lower temperature (blue) at À180 and À90 across the Arrhenius break. (D) The close-up structure of the N227 at À180 (blue) and À80 (orange). (E) The close-up structure of the E143 at À90 (orange) and À180 (blue), and the M120 at À80 (orange) and À180 (blue).