GB1 hairpin kinetics: capturing the folding pathway with molecular dynamics, replica exchange and optimal dimensionality reduction

Abstract We have performed molecular dynamics (MD) and replica-exchange (REMD) simulations of folding of the GB1 hairpin peptide in aqueous solution. REMD results were consistent with a cooperative zipper folding model. 120 MD trajectories at 320 K yielded relaxation times of 1.8 and 100 ns, with the slower assigned to global folding. The MD folding/unfolding transitions also followed the cooperative zipper model, specifying nucleation at the central turn followed by consecutive hydrogen bond formation. Formation of hydrogen bonds and hydrophobic contacts were highly correlated. Coarse-grained kinetic models constructed with the Optimal Dimensionality Reduction (ODR) approach found a folding time of 3.3 and unfolding time of 4.0 Additionally, relaxation times in the 130–170 ns range could be assigned to formation of the transition state and off-path intermediates. The unfolded state was the most highly populated and, significantly, most heterogenous, assembling the largest number of microstates, primarily composed of extended and turn structures. The folded state was also heterogenous, but a to a lesser degree, involving the fully folded and partially folded in-register hairpins at early stages of the zipper pathway. The transition state corresponded to the nucleated hairpin, with central turn and first beta-sheet hydrogen bond, while the off-path intermediates were off-register partial hairpins. Our simulation results were in excellent agreement with experimental data on folded fraction, relaxation time and folding mechanism. The new findings from this work suggest a highly cooperative zipper folding mechanism, nascent hairpin transition state and ∼100 ns relaxation related to intermediate formation. Communicated by Ramaswamy H. Sarma


Introduction
The formation of secondary structure is a biochemical process of central interest, related to structure-function properties of proteins in the physiological environment.It plays a crucial role in protein folding and also is biomedically important by providing insights into diseases related to protein misfolding, such as Parkinson's, Huntington's and Alzheimer's (Howlett, 2003).Over the last decades, multiple studies have undertaken the analysis of secondary structure properties, including 3 D structure, thermodynamics and kinetics.The feature that is most significant and also most difficult to study remains the detailed mechanism of folding.Here combinations of experimental and modeling approaches have produced important insights, though some details remain undetermined (Eaton, 2021;Jas et al., 2001Jas et al., , 2014a)).
This study focuses on a computational analysis of the atomic-level details of the folding pathway of a model b-hairpin, the GB1 hairpin peptide, GEWTYDDATKTFTVTE, derived from residues 41-56 of the B1 domain of protein G. GB1 has been the subject of a large number of experimental studies (Eaton, 2021;Jas et al., 2014a;Lewandowska et al., 2010).Initially, the existence of the GB1 hairpin structure was determined by Blanco et al, using NMR (Blanco et al., 1994) and its stability was confirmed by multiple methods (Jas et al., 2014a), predicting a 30-50% folded population at room temperature.Folding relaxation times of 6 ls at 300 K and 2 ls at 320 K were found in kinetic experiments (Munoz et al., 1997).Analysis of available experimental data tend to support the broken zipper mechanism for the GB1 hairpin folding process (Lewandowska et al., 2010).Also, the importance of the turn for hairpin initiation has been widely recognized (Du et al., 2004;Eaton, 2021;Soranno et al., 2018).While the GB1 hairpin is a model system, providing an example of stable beta structure, the full B1 domain of protein G, consisting of 56 residues that form two hairpins and a helix, has been shown to form amyloid fibrils (Louis et al., 2005).
The b-hairpins have also been the subject of many fruitful computational studies.Earlier work has been reviewed in (Jas et al., 2014a;Lwin & Luo, 2006).Recently, longer molecular dynamics (MD) and replica-exchange MD (REMD) simulations have been performed, enabling more extensive exploration of the GB1 hairpin free energy landscape, folding paths and intermediates (Ahalawat & Mondal, 2018;Best & Mittal, 2011;Daidone et al., 2015;De Sancho et al., 2013;Hazel et al., 2018;Thukral et al., 2011).Four main mechanisms have been proposed for GB1 hairpin folding (Cai & Han, 2022;Lewandowska et al., 2010;Wei et al., 2004).The hydrophobic collapse model assumes the native hydrophobic core forming before the hairpin hydrogen bonds.In the zipper pathway folding starts at the central turn, followed by formation of the inter-strand hydrogen bonds and finally the hydrophobic contacts.In the broken zipper model, the turn appears first, followed by establishment of loose hydrophobic contacts and finally hairpin hydrogen bonds and a native hydrophobic core.Finally, the reptation mechanism assumes folding is initiated by out-of-register inter-strand hydrogen bonding.Analysis of available experimental data tends to support the broken zipper mechanism for the GB1 folding process (Lewandowska et al., 2010).Interestingly, the folding path of the full B1 domain of protein G has also been explored with multiscale modeling (Kmiecik & Kolinski, 2008).
Herein, we performed 1.5 ls of replica-exchange molecular dynamics (REMD) simulations to sample the full conformational space, generate a melting curve and describe a statistical folding pathway of the GB1 hairpin.Additionally, we generated a total of 120 ls of molecular dynamics (MD) trajectories, which were used to directly follow folding and unfolding events.Finally, we used the MD trajectories to build coarse-grained kinetic models of the GB1 free energy landscape with the Optimal Dimensionality Reduction (ODR) approach (Jas & Kuczera, 2018a).Our simulations predict a folded b-hairpin content of ca.45% at room temperature and a folding relaxation time of ca.1.8 ls at 320 K, in excellent agreement with experimental data.The REMD and MD results indicate that the hairpin folds following a cooperative zipper model, with hydrogen bonds and hydrophobic core forming together.The kinetic modeling predicted that a structure involving the central turn and first beta-sheet hydrogen bond was the transition state for folding, and that partial out-of-register hairpins were visited off the main folding path.Compared to previous work, our simulations have several novel features, including direct and ODR kinetic model analysis of transitions, longer simulation times, and use of the updated CHARMM36m protein force field.Our results agree with available experimental data and many previous simulations, and also predict new and interesting details of GB1 folding, as described below.

Methods
The studied system was the GB1 peptide, GEWTYDDATKTFTVTE, derived from residues 41-56 of domain B1 of protein G (Blanco et al., 1994).For this peptide we performed REMD and MD simulations, as described below.

REMD
An unfolded conformation was used to start the REMD simulation.This was solvated in a cubic box, then subjected to brief energy minimization, MD with restrained peptide and 1 ns NPT equilibration at 300 K and 1 bar.All replicas were initiated from this state.Overall, 1500 ns of REMD were performed, with 40 replicas in the temperature range of 300-500 K.The CHARMM36m (Huang et al., 2017) protein parameters and TIP3P (Jorgensen et al., 1983) water model were employed.Simulations were performed with GROMACS 5.1.4(Kutzner et al., 2015;Pronk et al., 2013) under NVT conditions.The simulated system included one GB1 peptide with standard termini (16 residues, 247 atoms), 2882 TIP3 waters, 11 Na þ and 8 Cl-ions.There were 8912 atoms in total, in a cube of 4.444 nm side.Exchanges were attempted and coordinates saved every 5 ps.

MD
Six independent 20 ls MD trajectories (120 ls in total) were generated for the 16-residue GB1 hairpin peptide, using CHARMM36m protein force field and TIP3P water model, using GROMACS 5.1.4.The starting structures were taken from the REMD simulation described above.CA atom RMSD clustering with 0.5 nm radius was performed for the 300 K replica and the central structures from the six clusters with highest populations were used to start the MD trajectories.The system composition and box size were the same as for the REMD simulations described above.MD was performed under NVT conditions, with a constant temperature of 320 K, with structures saved every 2 ps.In both MD and REMD the integration time step was 2 fs, all bonds involving hydrogen atoms were constrained.Direct non-bonded interactions were truncated at 1.2 nm, with a force switch smoothing for van der Waals forces and a long-range Particle-Mesh Ewald (Darden et al., 1999) correction included for electrostatics.
The simulation analysis included calculation of CA atom RMSD from to the reference experimental structure (residues 41-56 of PDB ID 2GB1 (Gronenborn et al., 1991)) presence of native hydrogen bonds, presence of hydrophobic sidechain contacts, solvent accessible surface area of the Trp sidechain, the end-to-end distance, clustering of conformers and finally ODR kinetic modeling (Figure 1).Backbone hydrogen bonds were counted as present when the distance between the C ¼ O oxygen and N-H nitrogen of the interacting peptide groups was below 3.6 Å.In calculation of the folded fraction based on presence of backbone hydrogen bonds, we used both populations of all eight hairpin hydrogen bonds (including both the beta sheet and the two bonds in the turn, HB1-HB8, NHB ¼ 8), and also only the six beta sheet hydrogen bonds, HB1-HB6, (NHB ¼ 6).

REMD results
The main results of the REMD simulations are presented in Figures 2-4.The folded fraction, measured by number of formed hairpin hydrogen bonds, reached a stable range after about 400 ns of the 1500 ns simulation (data not shown).
Figure 2 shows the calculated melting curves, as a function of temperature.The folded fraction is 45-48% at 300 K and 39-41% at 320 K.This is in excellent agreement with the experimental estimates of 30-50% at room temperature (Lewandowska et al., 2010).As is typical for simulations with TIP3P water, the melting curves are shallow and there remains some residual structure (0.5-1.7%) even at 500 K (Hegefeld et al., 2010).
The REMD data were used to generate a statistical description of the hairpin folding pathway, see in Figure 3, showing the populations of the individual native hydrogen bonds, HB1-HB8, calculated over subsets of structures with varying number of total hydrogens bonds formed, NHBT ¼ 0-8.This data indicates that the peptide follows the zipper model of folding.For structures with total hydrogen count NHBT ¼ 1,2 the turn hydrogen bonds HB7 and HB8 in the turn region clearly have the highest populations, nucleating the hairpin.At increasing values of NHBT the beta-sheet structure forms from turn to the ends, with consecutive formation of HB6, HB5, HB4, HB3, HB2 and HB1.This is in accord with some previous simulations and the zipper or broken zipper models (Lewandowska et al., 2010).Also, one of the turn hydrogen bonds, HB7, appears to be less stable and is not fully formed at intermediate stages of folding.
Further insights into the statistical folding pathway are provided in Figure 4. Figure 4(a) presents the free energy profile for unfolding along the hydrogen bond reaction coordinate.This shows a barrier of 3.0 kcal/mol between the unfolded and folded state.This barrier corresponds to formation of the first native sheet hydrogen bond.Comparison with Figure 3 and SI shows this hydrogen bond is HB6, and that the transition state involves formation of the hairpin and the first sheet hydrogen bond.Based on Figure 4(a) we can also see that the folded basin includes both the fully folded hairpin and a partial hairpin with at least one broken hydrogen bond.Comparison with Figure 3   hairpin.A slightly different free energy profile for folding has been proposed based on theoretical analysis (Eaton, 2021).Figure 4(b) shows the average end-to-end distance of the peptide as a function of temperature.There is a monotonic increase of distance with temperature, consistent with previous FRET measurements (Jas et al., 2014a) and with a zippertype unfolding mechanism.
Clustering of the 300 K replica structures by CA atom RMSD with a radius of 0.50 nm yielded 12 clusters.The central structures of the six most highly populated clusters are shown in Figure 5.
More details on the REMD simulations are presented in the Supplementary Information.

MD Results
In our six 20 ls MD trajectories at 320 K we found multiple examples of transitions between folded states, characterized by low RMSD from hairpin and �8 hydrogen bonds, and unfolded states, with high RMSD from hairpin and �0 hydrogen bonds (see SI for details).We used the trajectories to analyze the time scales of dominant motions, measure the cooperativity of the peptide folding reaction, follow the detailed folding/unfolding paths and build coarse-grained kinetic models of the process.
Analysis of hydrogen bond populations yielded an average folded hairpin fraction of 37-39% in the MD trajectories at 320 K, in reasonable agreement with the REMD calculations and experimental data.Secondary structure analysis with the DSSP algorithm was also performed on the MD trajectories, using GROMACS.The results showed populations of 52% coil, 23% beta-sheet, 14% turn, 1% alpha-helix and 1% 3 10 -helix.The beta-sheet fraction found this way is lower than found in our hydrogen bond analysis, due to algorithmic differences-DSSP does not count isolated beta-bridges as parts of sheet, while our averages include contributions from formed native hydrogen bonds in full and partial hairpins.Time scales of motions were estimated from bi-exponential fits to autocorrelation functions (ACFs) for wide range of quantities (for details see SI).Average ACFs for RMSD from folded form, number of hydrogen bonds and Trp3 solvent   (Hazel et al., 2018).
To describe the cooperativity in the hairpin folding, we calculated correlation coefficients, r, between length fluctuations between the eight hydrogen bonds, between the three hydrophobic contacts and between the contacts and hydrogen bonds.The results are presented in Figure 7, with more details in the Supplementary Information.The individual hydrogen bond fluctuations are all highly correlated, with individual r values of 0.72-0.98(SI).The average nearest neighbor correlation is 0.96, and there is only a small weakening with distance, with r ¼ 0.8 for fifth, sixth and seventhnearest neighbors (Figure 7).These results show a high level of cooperativity in the structural fluctuations -all hydrogen bonds tend to be elongated and all tend to be shortened at the same time.The three hydrophobic contacts also exhibit high correlation coefficients, of 0.85-0.91,and correlations between the three contacts and the eight hydrogen bonds are of comparable magnitude, 0.75-0.95(SI).Thus, the backbone hydrogen bonds and sidechain contacts tend to form and break concurrently.This suggests a folding mechanism involving a cooperative zipper, combining hydrophobic collapse for sidechains and consecutive hydrogen bond formation for backbone.The hairpin folding mechanism is thus both concerted and highly cooperative.This is quite different than alpha-helix folding, in which cooperativity spans the distance of 2-4 residues and the folding path may be described as a random walk through folded/unfolded conformations of individual residues (Jas et al., 2014b(Jas et al., , 2021a;;Kuczera et al., 2021).This difference has also been observed experimentally in the effects of viscosity on folding relaxation, where rates were inversely proportional to solvent viscosity for a hairpin and followed a more complicated fractional power law for a helix (Eaton, 2021;Jas et al., 2001).
The detailed pathway of the GB1 hairpin folding was analyzed by following the time course of all eight hydrogen bonds in the course of the six 20 ls MD trajectories.Overall, we found four unfolding and five folding transitions, involving changes between states with 0 and 6-8 hydrogen bonds.Separate plots focusing on the transition periods are provided in the SI.One example is shown in Figure 8.A majority of the transitions follow the zipper mechanism.In the   unfolding direction HB1 and HB2 break first, followed by HB3 and HB4, then HB5 and HB6 and finally the central turn HB7 and HB8 (see Figure 1).In the folding direction, the opposite order is followed.In one case an unfolding transition was initiated by breaking hydrogen bonds HB6-HB8, followed by unzipping HB1-HB5 (SI).There were also several incomplete transitions, involving interconversions between partially folded and partially unfolded structures.
We also generated coarse-grained kinetic models of the GB1 dynamics based on the MD trajectories, using the Optimal Dimensionality Reduction (ODR) approach (Hummer & Szabo, 2015;Jas et al., 2021b;Jas & Kuczera, 2018a, 2018b).This method has been described in detail previously; an outline of the steps is also provided in the SI.Briefly, the MD trajectory structures were clustered by CA atom RMSD to generate a set of discrete microstates (Jas et al., 2021a;Jas & Kuczera, 2018a).Transition counts and transition rates between microstates were then found using the lifetime-based approach (Buchete & Hummer, 2008).Finally, kinetic coarse graining was performed with the PCCA þ algorithm (Kube & Weber, 2007) as implemented in the EMMA package (Senne et al., 2012), where sets of microstates are grouped together into a small number of coarse-grained aggregate states.Finally an optimal reduced-dimensionality rate matrix is determined for the interconversions between the aggregates with ODR (Hummer & Szabo, 2015).About 24 models with different number of starting microstates/clusters and different number of coarse-grained dimensions were generated, presented in detail in the SI.Examples of four models, with number of clusters N c ¼54 and dimensionality N ¼ 2-5, are presented below.
In our kinetic modeling approach, we employ a free parameter, the cutoff radius R c , to perform trajectory discretization.A transition between microstates i and j is counted only when the trajectory moves from a position with CA RMSD R < R c from cluster i center to a position within R < R c from cluster j center.The value of R c is adjusted so that the slowest relaxation time in the full microstate kinetic matrix K agrees with the ACF decay analysis for the simulation.For N c ¼54 clusters the optimal value of the cutoff was R c ¼0. 20 nm.Table 1 shows the correspondence of the relaxation times from diagonalization of the full kinetic matrix K (54 � 54) and the reduced rate matrices R (NxN), N ¼ 2-5.The agreement is excellent, showing that the ODR approach is able to capture the slowest timescales of motion very well.Further, these data indicate a clear time separation between the slowest relaxation, at ca. 1800 ns, and the remaining ones, at 170 ns and below.We also note that while the 1800 ns value is the result of R c adjustment, there is also agreement between the second-slowest ODR time scale (170 ns) and ACF bi-exponential fitting results (50-140 ns).
The structural analysis of the ODR kinetic models is performed in two ways.To roughly categorize the coarsegrained states, we use the representative structures, shown next to the arrows representing transitions.These are the central structures of the most populated microstate/cluster that is part of each aggregate, coarse-grained state.To illustrate the structural heterogeneity of each aggregate state, we also provide the structural ensemble, encompassing the central structures of all clusters in a given aggregate, shown below the kinetic scheme.In the discussion below we will consider the fully folded hairpin, which exhibits the same set of hydrogen bonds as the experimental structure, and two kinds of partially formed hairpins: in-register, involving a subset of hydrogen bonds between the same residue pairs as in the experimental structure, and off-register, with hydrogen bonds between different residue pairs.
The two-state kinetic model (N ¼ 2) of GB1 folding with N c ¼ 54 is presented in Figure 9.The two states are classified as the unfolded, or 'coil', and the folded, or 'hairpin', based on their representative structures.The 'coil' is the most populated (DG ¼ 0), with a lifetime of 4.0 ls, and the 'hairpin' is slightly less stable (DG ¼ 0:1 kcal/mol), with a lifetime of 3.3 ls: The folding time is thus 3.3 ls and the unfolding time 4.0 ls, and the folding relaxation 1.8 ls: As expected, the 'coil' aggregate state is the most heterogenous, made up of 50 individual clusters/microstates, while the 'hairpin' aggregate state includes four clusters.Interestingly, the folded, or hairpin state involves not only the fully formed hairpin, but also two partial in-register hairpins and a turn.This agrees with the REMD folding free energy profile in Figure 4(a).The hairpin fraction and folding and unfolding times from the kinetic model are essentially identical to those determined from the MD trajectory analysis above.
The three-state kinetic model (N ¼ 3) of GB1 folding with N c ¼ 54 is presented in Figure 10.Besides the 'coil' and 'hairpin' states, there also appears an intermediate (I1) with a lifetime of 170 ns and DG ¼ 3:5 kcal/mol above the coil.The intermediate state is made up of a single cluster, forming a partial hairpin with two off-register hydrogen bonds, and located off the folding pathway, representing a subset of unfolded states.
The four-state kinetic model (N ¼ 4) involves the coil, hairpin and two intermediates, shown in Figure 11.Intermediate I1 is the same as for N ¼ 3. The new intermediate, I2, also consists of a single cluster, has a lifetime of 130 ns and DG ¼ 3:7 kcal/mol.It forms a nascent hairpin, with formed central turn and one hairpin hydrogen bond, HB6.Transition Path Theory analysis performed with EMMA (Senne et al., 2012) yields a committor value of q ¼ 0.504 for this state, indicating that this is the transition state for GB1 hairpin folding (Dellago et al., 2002;Vanden-Eijnden, 2014;Weinan & Vanden-Eijnden, 2010).The value of DG ¼ 3:7 kcal/mol for the transition state agrees quite well with the 3 kcal/mol found in the free energy profile based on REMD results (Figure 4(a)).The location of the transition state, early in the folding process, also agrees with Figure 4(a).The kinetic scheme obtained here is quite similar to the four-state kinetic model developed by Ahalawat and Mondal using an AMBER potential (Ahalawat & Mondal, 2018).The barrier height is comparable to that found by Gumbart and coworkers with older CHARMM potentials (Hazel et al., 2018).
The five-state kinetic model (N ¼ 5) is analogous to the N ¼ 4 and is not shown in detail.This model includes the coil, hairpin and three intermediates, I1-I3.States I1 and I2 are the same as for N ¼ 4. The new I3 intermediate consists of a single cluster, with a lifetime of 130 ns and DG ¼ 3:7 kcal/mol.It involves a turn with a single off-register hydrogen bond, and lies off the folding path.Thus, I1 and I3 have similar properties, though different structures.The higher dimensionality models, N ¼ 4-5, suggest that folding occurs  over multiple paths.Also, the direct coil-hairpin transition competes with the path through the I2 transition state.This suggests that that our trajectory discretization and coarse graining procedure leads to loss of some finer details of the transition ensemble.Analysis of kinetic models obtained with various levels of clustering yields similar results to those presented above (see SI for details).We find a highly heterogenous 'coil' metastable state, including extended structures, turns, off-register partial hairpins, as well as a small population of alpha-and pi-helices in the central region.As mentioned above, the 'hairpin' is also heterogenous, though to a much lesser degree.Its contributing structures include the fully folded state and partial in-register hairpins, corresponding to early states along the zipper folding pathway.Across the different models, the lifetimes of the coil and hairpin, as well as forward and reverse transition times remain quite similar to the data shown in Figures 9-11.The transition times from ODR agree with the ACF calculations directly from MD data.The kinetic model is consistent with the zipper model of folding.A nascent hairpin with formed turn and first beta-sheet hydrogen bond is identified as a transition state for GB1 folding.

Conclusions
We have performed extensive replica-exchange (REMD) and molecular dynamics (MD) simulations of folding of the 16residue GB1 hairpin peptide in aqueous solution with the CHARMM36m protein force field and TIP3P water model.The REMD results predict a folded hairpin fraction of 45-48% at 300 K and 39-41% at 320 K. Analysis of hydrogen patterns allows the determination of a statistical folding pathway, which is consistent with a zipper model.
Structures sampled in REMD were used to start six independent 20 ls MD trajectories at 320 K, which were used to analyze the GB1 folding pathway in more detail.The two slowest relaxation times identified in the simulations were 1600-2000 ns (ca.1.8 ls) and 50-140 ns (ca. 100 ns), with the slower one assigned to global folding and the faster to more localized fluctuations involving formation of intermediate structures.A number of folding and unfolding transitions were directly identified in the trajectories, mostly following the zipper model, i.e., nucleation at the central turn followed by consecutive hydrogen bond formation.The transitions were highly cooperative, with all hydrogen bonds exhibiting correlations above 0.80.The folding of the backbone and formation of the four-residue hydrophobic cluster on one face of the hairpin were highly correlated, indicating that our results are best described by a combination of cooperative hydrogen bond zipper and hydrophobic collapse.Our results suggest that the out-of-register partial hairpins are off-pathway intermediates that do not contribute significantly to folding but rather represent structural fluctuations within the unfolded ensemble.
Further insights into the folding pathway were obtained by constructing coarse-grained kinetic models with the ODR approach.These models showed a folding time of 3.3 ls and unfolding time of 4.0 ls corresponding to the roughly 1.8 ls global folding relaxation.Additionally, relaxation times in the 130-170 ns range could be assigned to the formation/decay of the transition state and off-path intermediates.The 'coil' state was the most highly populated and also most heterogenous, consisting of the largest number of microstates, representing primarily extended and turn structures.The 'hairpin' state was also heterogenous, but to a lesser degree, involving the fully folded and partially folded in-register hairpins along the zipper pathway.The transition state corresponded to the nucleated hairpin, with formed turn and first sheet hydrogen bond, while the off-path intermediates were off-register partial hairpins.The coil and hairpin states were long lived, at 3-4 ls: The transition state and intermediates had low populations, with DG ¼ 3:5 À 3:7 kcal/mol above the coil, and much shorter lifetimes, 130-170 ns.
Overall, our simulations indicate GB1 folding is highly cooperative, following a pathway of combined hydrophobic collapse of sidechains and concerted zipper formation by backbone hydrogen bonds.The folding pathway, 40% folded fraction and 1.8 ls relaxation time at 320 K, are all in excellent agreement with available experimental data.Slow interconversions between unfolded conformations of hairpins, including off-register hydrogen bonded states, were also identified in a recent extensive experimental analysis (Soranno et al., 2018).This agreement with observations for GB1, as well as our previous work with various helices (Kuczera et al., 2021), indicates that the CHARMM36m protein force field combined with the TIP3P water model is able to realistically model complex biological processes.Our results are also in qualitative agreement with several previous studies of GB1, which mostly found a stable hairpin, microsecond relaxations, and presence of intermediates (Ahalawat & Mondal, 2018;Cai & Han, 2022;Daidone et al., 2015;Lwin & Luo, 2006).
By applying REMD to obtain statistics, MD to uncover dynamics, and coarse-grained kinetic modeling to generate insight, we were able improve our understanding of the microscopic mechanism of folding for the GB1 hairpin.This involved the highly cooperative nature of the folding, identification of a nascent hairpin as the transition state and assigning a faster relaxation time on a 100 ns time scale to formation of intermediates and transition state.

Figure 1 .
Figure 1.(a) GB1 hairpin backbone with marked hydrogen bonds.The first six hydrogen bonds form the beta-sheet structure of GB1: HB1 and HB2 are between E2 and T15, HB3 and HB4 are between T4 and T13, HB5 and HB6 are between D6 and T11.Two additional hydrogen bonds are in the central turn: HB8 between D6 and K10, and HB7 between D6 and T9.(b) Ribbon model showing the hairpin structure and the hydrophobic cluster involving sidechains of W3, Y5, F12 and V14.Shown are residues 41-56 of full B1 domain of protein G, PDB ID 2GB1 (Gronenborn et al., 1991).

Figure 2 .
Figure 2. Melting curves of GB1 hairpin from REMD simulations.The folded fraction is estimated.based on the fraction of formed hydrogen bonds out of six betasheet h-bonds (NHB ¼ 6) and out of all possible hydrogen bonds, i.e., including two in the turn region (NHB ¼ 8).Errors of the data are about 10%.Plot with error bars is included in SI.

Figure 3 .
Figure 3. Statistical folding pathway of GB1 hairpin, using all backbone hydrogen bonds (NHB ¼ 8).Average populations are color coded for individual hydrogen bonds 1-8 (horizontal axis, labelled as shown in Figure 1(a)) over a subset of REMD structures with a fixed number of total hydrogen bonds (vertical axis).Plot using only the beta-sheet hydrogen bonds (NHB ¼ 6) is given inn the SI.

Figure 4 .
Figure 4. (a) Free energy profile along the reaction coordinate measuring the number of formed native beta sheet hydrogen bonds (NHB ¼ 6).(b) Average endto-end distance as a function of temperature from the GB1 REMD simulations are in qualitative agreement with FRET measurements (Figure 6c of Jas et al., 2014a).

Figure 5 .
Figure 5. Central structures from clustering of 300 K REMD trajectory by CA RMSD with 0.5 nm radius.The six structures shown in panels (a-f) correspond to centers of clusters with populations of 59, 34, 3, 1, 1 and 0.5%, respectively (altogether 98% of total).

Figure 6 .
Figure 6.Autocorrelation functions of the time series of CA RMSD from experimental folded structure (black), number of formed hydrogen bonds (out of all eight, including the central turn, red) and solvent accessible surface area of Trp3 sidechain.The data shown are averages over separate ACFs calculated for the six MD trajectories.The results. of bi-exponential fits yield timescales of 70 and 1900 ns for RMSD, 140 and 1600 ns for hydrogen bond count and 11 and 1800 ns for Trp SASA.

Figure 8 .
Figure 8. Example of a detailed time course of a hairpin folding transition, occurring at ca. 19.1 ls in trajectory 4, following the zipper mechanism.Plot shows lengths of the eight hydrogen bonds, labelled as in Figure 1(a), as a function of time, sampled every 100 ps.More examples are in SI.

Figure 9 .
Figure 9. Two-state (N ¼ 2) ODR coarse-grained kinetic model of GB1 folding based on N c ¼54 clusters.(a) Kinetic scheme with description of coarse-grained aggregate states and representative structures.(b) Structural ensembles showing overlaid central structures for all clusters within each coarse-grained state.The state lifetimes are calculated as s i ¼ À 1=R ii and j !i transition times as s ij ¼ 1=R ij , where R is the reduced dimensionality rate matrix.The free energy differences are calculated as DG ¼ À RTln p pmax � �, with R the gas constant, T ¼ 320 K the temperature, p the population of the current aggregate state and p max the maximum

Figure 10 .
Figure 10.Three-state (N ¼ 3) ODR coarse-grained kinetic model of GB1 folding based on N c ¼54 clusters.Kinetic scheme with description of coarse-grained aggregate states and representative structures.The intermediate I1 is primarily involved in interconversions with the coil, indicating its location off the folding path.

Figure 11 .
Figure 11.Four-state (N ¼ 4) ODR coarse-grained kinetic model of GB1 folding based on N c ¼54 clusters.Kinetic scheme with description of coarse-grained aggregate states and representative structures.Intermediate I1 is the same as in Figure 9.The intermediate I2 converts to both the coil and hairpin state with a timescale of 130 ns, while being formed on a much slower time scale, denoted by the dashed arrows.Transition Path Theory analysis indicates that I2 is a transition state for hairpin folding.

Table 1 .
Comparison of slowest relaxation times from full kinetic matrix K (dimension N c ¼ 54) and reduced dimensionality models with dimensions N ¼ 2-5.Relaxation times, ns.