Evolutionary conservation of protein dynamics: insights from all-atom molecular dynamics simulations of ‘peptidase’ domain of Spt16

Abstract Protein function is encoded in its sequence, manifested in its three-dimensional structure, and facilitated by its dynamics. Studies have suggested that protein structures with higher sequence similarity could have more similar patterns of dynamics. However, such studies of protein dynamics within and across protein families typically rely on coarse-grained models, or approximate metrics like crystallographic B-factors. This study uses µs scale molecular dynamics (MD) simulations to explore the conservation of dynamics among homologs of ∼50 kDa N-terminal module of Spt16 (Spt16N). Spt16N from Saccharomyces cerevisiae (Sc-Spt16N) and three of its homologs with 30–40% sequence identities were available in the PDB. To make our data-set more comprehensive, the crystal structure of an additional homolog (62% sequence identity with Sc-Spt16N) was solved at 1.7 Å resolution. Cumulative MD simulations of 6 µs were carried out on these Spt16N structures and on two additional protein structures with varying degrees of similarity to it. The simulations revealed that correlation in patterns of backbone fluctuations vary linearly with sequence identity. This trend could not be inferred using crystallographic B-factors. Further, normal mode analysis suggested a similar pattern of inter-domain (inter-lobe) motions not only among Spt16N homologs, but also in the M24 peptidase structure. On the other hand, MD simulation results highlighted conserved motions that were found unique for Spt16N protein, this along with electrostatics trends shed light on functional aspects of Spt16N. Communicated by Ramaswamy H. Sarma.


Introduction
Both three-dimensional structure and dynamic features are consequential in dictating protein function. Dynamics are not only important for enzyme catalysis (Benkovic & Hammes-Schiffer, 2003;Yang & Bahar, 2005), but also for non-enzymatic proteins with protein-or DNA-binding functions (Etheve et al., 2016;Furini et al., 2013;Kalodimos et al., 2004). While advances in macromolecular X-ray crystallography have resulted in rapid determination of three-dimensional structures of several protein families, conformational dynamics of proteins remain much more difficult to probe experimentally. Hence, it is not surprising that while evolutionary conservation of protein sequence and structure have been extensively studied, similar studies at the level of protein dynamics are sparse (Maguid et al., 2006(Maguid et al., , 2008M€ unz et al., 2010). Even though modest efforts have been made to probe the evolutionary features of protein dynamics with methodologies like comparative studies of crystallographic B-factors and normal mode analysis using elastic network models of static structures (Kalaivani et al., 2016;Maguid et al., 2006;Micheletti, 2013;Tama & Sanejouand, 2001) and these studies are performed on a large number of protein structures, such studies are lacking in that they are usually performed on single domain proteins and use various approximations. In recent years with advancements in algorithms, software, and computer hardware, molecular dynamics (MD) simulations have become a desirable tool for exploring the dynamics of protein molecules (Karplus & Kuriyan, 2005;Klepeis et al., 2009;Narwani et al., 2020;Orozco, 2014;van der Kamp et al., 2010;Ziada et al., 2018). It is noteworthy that there are reports on delineation of the conserved dynamical features responsible for ligand binding among proteins with similar architectures using MD simulations (Pang et al., 2005). However, these studies reflect inadequacies due to limited simulation time (maximum of 20 ns). Therefore, longer (ms scale) MD simulations on multidomain proteins would not only provide high quality data to study backbone fluctuations, it would also provide an opportunity to study patterns of interdomain motions. Such studies could facilitate the fundamental understanding of structure-dynamics-function relationship in these proteins. Applying such methods on several related proteins could lend deeper insights into evolutionary conservation of dynamics in protein structures.
In this study, we investigate if backbone fluctuations (high frequency motions) and interdomain (inter lobe) motions (low frequency motions) are conserved in a two-domain protein structure involved in protein-protein interaction, using MD simulations of 500 ns to 1 ms. For this purpose, we choose homologous structures of the 'peptidase' module ($50 kDa) of Spt16 protein (Spt16N), which is a part of the chromatin remodeling FACT complex. Spt16N is known to bind histones H3/H4 in vitro Marcian o & Huang, 2016;Stuwe et al., 2008), however, the exact biological function of this module has remained elusive (Liu et al., 2020). Prior to this study, structures of Spt16N from four organisms Saccharomyces cerevisiae (Sc-Spt16N) (VanDemark et al., 2008), Schizosaccharomyces pombe (Sp-Spt16N) (Stuwe et al., 2008), Homo sapiens (Hs-Spt16N) (Marcian o & Huang, 2016), and Cicerarietinum (Ca-Spt16N)  were known. Sc-Spt16N was the first Spt16N structure to be reported and has about 30-40% sequence identity with the other three structures. To make our dataset more comprehensive, we determined the crystal structure (1.7 Å resolution) of Spt16N from Ashbyagossypii (Ag-Spt16N) which has a sequence identity of 62% with Sc-Spt16N. With the addition of this structure, we had proteins with a wider range of sequence variation available for the analysis (Supporting material Figure S1). Interestingly, the structural fold of Spt16N is the same as that of M24 peptidases. Therefore a M24 peptidase, that has 15% sequence identity with Sc-Spt16N (M24 peptidase), (PDB: 5GIU) was also included for this study. Further, another peptidase similar in molecular size to that of Spt16N from an unrelated structural fold sharing only 3% sequence identity with Sc-Spt16N (M1 peptidase) (Agrawal et al., 2019) was used as a negative control (Supporting material Figure S2). Our study reveals that the backbone fluctuations in Spt16N homologs are conserved. To the best of our knowledge, we are the first to report a linear trend between sequence identity and similarity in patterns of backbone fluctuations, as determined by MD simulations. Furthermore, we discuss some functional insights that can be gleaned from interdomain dynamics and electrostatics trends revealed in this work.

Cloning and protein purification
Boundaries of the Spt16N domain (Ag-Spt16N; N-terminal 461 residues) in the putative Spt16 polypeptide of Ashbyagossypii (GenBank accession, NP_9852116) were identified based on sequence comparison with other yeast Spt16N proteins with known structures. The identified DNA sequence was PCR amplified and cloned into pST50Tr, a T7-promoter based expression plasmid (Tan et al., 2005) to form in-frame translational fusion with streptavidin-His6-TEVtag (STRHISTEV, 3.8 kDa). The cloning and expression hosts used in the present studies were Escherichia coli strains, XL-1 blue and BL21-Codon Plus-RIL, respectively. The Spt16N encoding region of the construct was verified by DNA sequencing. For protein expression, the transformed BL21 cells were grown in 3 L of 2xTY broth at 37 C till OD 600 of $0.3 and then shifted to 28 C for further growth. The culture was induced at OD 600 of $0.7 by addition of 0.2 mM isopropyl b-D-1-thiogalactopyranoside. After growing the induced culture for 16 h, the cells were harvested by centrifugation and the cell pellet was suspended in P300 buffer (50 mM sodium phosphate pH 7.0, 300 mM sodium chloride). The soluble fraction of the cell lysate containing desired protein (STRHISTEV-Ag-Spt16N) was subjected to metal ion affinity chromatography by batch method at 4 C. The lysate was mixed with $4 mL of equilibrated Ni-sepharose resin (GE Healthcare). The resin was washed with P300 buffer supplemented with 20 mM imidazole. The bound protein on the resin was eluted with P300 buffer supplemented with 500 mM imidazole. The eluted protein (STRHISTEV-Ag-Spt16N) was dialyzed against T200 buffer (20 mM Tris-HCl pH 8.0, 200 mM sodium chloride) at 294 K while being subjected to digestion by Tobacco Etch Virus (TEV) protease. The purified protein was concentrated to 15 mg/ml and supplemented with 20% (v/v) glycerol followed by flash freezing using liquid nitrogen. The sample was stored at -70 C until further use.

Determination of crystal structure
For the crystallization experiment, the required amount of stored protein was purified through size-exclusion chromatography with T200 buffer on Superdex 200, 10/300 GL column on AKTA purifier system (GE Healthcare). The purified protein was concentrated to 19 mg/ml and crystallization was carried out at 21 C employing under oil microbatch method (D'Arcy et al., 1996). For this purpose, 1 mL of protein solution was mixed with 1 mL of crystallization solution and overlaid with 50 mL of Al's oil (silicon fluid: mineral oil, 1:1,v/ v) in 96-well U bottom plates. Initial crystals were observed in 12-15 different crystallization conditions which were further optimized. The crystals with the best diffraction resolution were grown using 0.2 M potassium nitrate pH 6.9, 20% polyethylene glycol 3350. These were cryoprotected using paratone oil (Hampton research) and diffracted at the protein crystallography (PX-BL21) beamline of Indus-2 synchrotron, India . The diffraction images were collected on a MAR345 image plate with an oscillation of 1 using X-rays of wavelength 0.9794 Å. The data were indexed and integrated using XDS (Kabsch, 2010) and scaled using AIMLESS (Evans & Murshudov, 2013). The molecular-  Figure S1). replacement search model was prepared from Cicerarietinum Spt16N (PDB: 5CE6)  using the program CHAINSAW (Stein, 2008). The molecular replacement solution was obtained using PHASER suite (McCoy et al., 2007) on the CCP4 platform (Winn et al., 2011). Automated model building was performed using PHENIX AutoBuild (Terwilliger et al., 2008). The model was completed by iterative rounds of manual model building using COOT (Emsley & Cowtan, 2004) and refined using phenix.refine (Adams et al., 2010).The quality of the model was analyzed with MolProbity . The intensity data processing and refinement statistics are summarized in Table 1.

Sequence analyses
Sequence identity scores between all the studied structures were calculated using structure-based sequence alignments generated by superpose from CCP4i package (Winn et al., 2011).
For calculating conservation scores, first, Uniprot database was searched for sequences annotated as Spt16 and the sequences were clustered using 90% sequence identity cutoff. This resulted in 215 clusters and one representative sequence from each cluster was downloaded. The downloaded sequences along with sequences of Sc-Spt16N corresponding to PDB file 3BIP were aligned using Clustal Omega server (Madeira et al., 2019). Sequences lacking query cover (fragmented or irrelevant sequences) along with sequences having long ambiguous sections were discarded. The resulting 200 sequences and Sc-Spt16N sequence were re-aligned to obtain the final multiple sequence alignment (MSA). This MSA along with the 3BIP structure were uploaded to ConSurf2016 server (Ashkenazy et al., 2016) for calculating sequence conservation scores corresponding to each position on the 3BIP structure. Four positions of the MSA, corresponding to residues 154, 155, 424, and 425 of Sc-Spt16N, were not assigned any conservation score by the server. This was due to low occupancy at these positions in the MSA.

Structural analyses
The solvent accessible surface area (SASA) of Sc-Spt16N structure was calculated by Areaimol software of CCP4i package (Lee and Richards, 1971) using 1.4 Å probe radius. Residues with >30% SASA were labeled as surface residues and those with <30% SASA were labeled as core residues. For studying structural conservation, the Ag-Spt16N, Sp-Spt16N, Hs-Spt16N, and Ca-Spt16N structures were compared to that of Sc-Spt16N using the superpose tool of CCP4i package (Winn et al., 2011). In order to determine the structure conservation in Spt16N, this inter-lobe motion must be ignored. Hence, the A-Lobe and B-Lobe were aligned independently for calculating per-residue Ca deviations while the hinge region was split into two and appended with either of the lobes. The extended A-Lobe used for this analysis corresponds to residues 4-166 of Sc-Spt16N and the extended B-Lobe used for this analysis corresponds to residues 167-447 of Sc-Spt16N.

Correlation coefficients
Similarities in residue mobility data trends (obtained from crystallographic B-factors, normal mode analysis, or molecular dynamics simulations) among the different structures were quantified using Spearman rank order correlation coefficient (qB) (Spearman & Spearman, 1987). The residue mobility data of the Spt16N structures, M24 peptidase, and M1 peptidase were aligned to that of Sc-Spt16N based on pairwise sequence alignments. Each amino acid position within a structure was assigned a rank based on the magnitude of its B-factor such that the position with highest B-factor was ranked 1. Then, the correlation coefficient (qB) for each pairwise alignment was calculated using the following Equation (1) (Spearman & Spearman, 1987): where d i is the difference in rank between aligned positions of two structures and n is the total number of positions in the alignment. Further to compare the strength of linear relationship between the measure of similarity calculated using B-Factors, RMSF and Ca deviation between structures with sequence identity as well as that of RMSD, Pearson product-moment correlation coefficient (PMCC) was calculated using an online server (Statistics Calculator: Correlation Coefficient. Correlation Coefficient Calculator, 2021) for each data distribution using the following Equation (2) (Lee et al., 1988): where xi and yi are the two corresponding data points from the sample set of size n that need to be tested for the strength of linear relationship.

Normalizing crystallographic B-factors
B-factors of different crystal structure files are not always directly comparable owing to different resolutions in the diffraction data and use of different refinement strategies. To bring the b-factors from different structures on the same scale, two methods of normalizations are frequently employed, viz., z-score based method and unitary based method of normalization (Schneider et al., 2014). While in unitary based method the b-factor values are scaled between a given minimum (B MIN ) and a maximum (B MAX ) value, the z-score based method scales the values of the b-factors such that the distribution has zero mean and unit variance. The zscore normalization distributes the b-factors on either side of zero. To keep the values of the scaled b-factors positive, we used the unitary method of normalization with B MIN ¼ 1 and B MAX ¼ 100. The normalized b-factors (B x-scaled ) are given by Equation (3): where B x is the b-factor of the atom x in the structure, and B min and B max are the minimum and maximum b-factors, respectively in the structure file.

Normal mode analysis
Normal mode analyses of the Sc-Spt16N (PDB: 3BIP), Ag-Spt16N (PDB: 6A8M), Hs-Spt16N (PDB: 5E5B), Sp-Spt16N (PDB: 3CB6), and Ca-Spt16N (PDB: 5CE6) structures were performed using iMODS (L opez- Blanco et al., 2014) with Amber94 force field (Cornell et al., 1995). Residue mobility data of the Spt16N structures were downloaded, aligned according to MSA of the 5 protein sequences, and visualized on overlaid graphs. Normal mode analysis of M24 peptidase was also performed using iMODS. For calculation of correlation coefficients (qB), the residue mobility data of the Spt16N structures and M24 peptidase were aligned to that of Sc-Spt16N based on pairwise sequence alignments. The top 3 predicted normal modes of each structure were visualized with 2 clusters, each corresponding to one of the domains/lobes in the protein.

Molecular dynamics simulations
The molecular dynamics (MD) systems for 5 available crystallographic structures of Spt16N (Ag-Spt16N, Hs-Spt16N, Sp-Spt16N, Sc-Spt16N, and Ca-Spt16N) and those of M24 peptidase and M1 peptidase were prepared with GROMACS 5.1 (Abraham et al., 2015) and the calculations were performed using Optimized Potentials for Liquid Simulation all atoms (OPLS-AA) force field (Tieleman et al., 2006). The solvated systems were prepared in cubic boxes with one side of 10 Å where the protein was surrounded with $30,000 spc216 water models (Berendsen et al., 1981) providing it with a solvated environment similar process was replicated for each system. Depending upon the overall charge on the protein, an appropriate number of Na þ or Clions were added to neutralize the system. Additionally, 0.1 M concentration of NaCl was achieved by replacing water molecules. This was followed by energy minimization using the steepest descent algorithm with 1000 kJ/mol/nm tolerance. The temperature and pressure equilibration runs were performed using Berendsen thermostat (Lemak & Balabaev, 1994) and Parrinello-Rahman (Parrinello & Rahman, 1980) pressure coupling algorithms respectively. Both the runs were performed using LINCS holonomic constraints on the protein for 10 ns with step size of 2 fs, while using 1 nm cut off distances PME electrostatics was used for performing neighbor search while temperature and pressure equilibration and MD run were performed using the leap-frog integrator (Van Gunsteren & Berendsen, 1988). Final MD run for each system except M1 peptidase was performed in duplicates of 500 ns. For M1 peptidase, only one MD run was performed. The root mean square deviation (RMSD) values for the MD runs of 500 ns each were found to be stable in the later parts of all the simulation (Supporting material Figure S6) with fairly stable protein structure (Supporting material Figure S7).
Root mean square fluctuation (RMSF) calculations for the five Spt16N structures and that of M24 peptidase were performed independently on the A-Lobe and B-Lobe; A-Lobe comprising of residues corresponding to 4-166 of Sc-Spt16N and B-Lobe corresponding to residues 167-447 of Sc-Spt16N. RMSF values from the systems with more than one MD runs were averaged. These values from Spt16N structures were aligned based on an MSA of the five sequences. Positions that were absent in Sc-Spt16N were omitted.
Correlated motions of overall protein dynamics were visualized using principal component analysis (PCA). A covariance matrix C was constructed using deviation in coordinates of Ca atoms from MD trajectory (David & Jacobs, 2014). Further, this symmetric matrix C was diagonalized using an orthogonal transformation matrix, R. The columns of matrix R are the eigenvectors, each of which describes a single correlated displacement of a cluster of atoms in multidimensional space. The first few principal modes often describe collective, global motions in the system. In case of Spt16N, these seem to reflect interdomain motions which can be resolved into three basic motions -clamshell, sideways, and twist for twodomain proteins (Jones et al., 2012). Top three eigen modes for each run were visually annotated in any of the three categories along with an oblique mode which is a combination of clamshell and twist motions. The modes were further deconvoluted into the three basic interdomain motion types by modified form of analysis reported previously (Jones et al., 2012) with the help of DynDom3D server (Lee et al., 2003;Qi et al., 2005) (details in the supporting material). Briefly, three most pure forms of clamshell, sideways, and twist motions were searched among all the studied modes. Mode 3 from run 1 of Ag-Spt16N was taken as reference for sideways motion, mode 2 from run 1 of Ca-Spt16N was taken as reference for twist motion, and mode 3 from run 1 of Ca-Spt16N was taken as reference for clamshell motion. The axis of overall interdomain motion of each PCA mode was determined using DynDom3D server (Lee et al., 2003). Angular deviation between the rotational axis of an eigen motion and that of the axis of each basic motion were measured to represent the eigen mode as a combination of the three basic domain movements. Further, to determine overall change in surface electrostatics during MD simulation for all the proteins in the current study, we generated representative structures from the MD trajectories based on clustering of structures with RMSD threshold of 0.1 units. This allowed us to generate 10-15 structures for each protein representing major conformations throughout the MD trajectory. Following this we generated a PQR file and later performed APBS calculation using the PDB2PQR server (Dolinsky et al., 2004) using the CHARMM27 force field (MacKerell et al., 2000) and mapped the same on PyMOL (Rigsby & Parker, 2016).

Results
Through this study we wanted to correlate sequence, structure and dynamics of the N terminal domain of the FACT complex (Spt16N). To achieve this, structures of Spt16N were downloaded from PDB. These included Sc-Spt16N, Sp-Spt16N, Hs-Spt16N and Ca-Spt16N. To make our data more diverse we determined the crystal structure of Ag-Spt16N (Supporting material Table S1). Although Spt16N are known to be involved in protein-protein interactions with histones, a distant structural homolog of this domain (M24 peptidase) has a known enzymatic function. This enzyme was also included in the present study. In terms of structure, Spt16N has two conserved helical lobes (A-lobe & B-lobe) connected by a helical hinge region (Figure 1). Sequence analysis of the structures from the dataset (Spt16N variants and M24 peptidase) showed pairwise sequence identity scores to be in the range of 15-62% (Table 1). When the sequence conservation was mapped on the structure, it was evident that the sequence of B-lobe is relatively more conserved than that of A-lobe. In case of M24 peptidase, the active site is located in a region that is structurally equivalent to B-lobe (more conserved lobe), which reaffirms the fact that functionally important residues are evolutionarily conserved thereby hinting a prominent role of B-lobe in Spt16N function. Hence, it is surprising that the reported H4-tail binding site of Spt16N resides in A-lobe (Figure 1), which is relatively less conserved. Apart from structure, the other dimension associated with functionality of proteins is their dynamics. Therefore, we asked that given the conserved fold and function of Spt16N proteins, how divergent are these proteins in terms of dynamics. Protein dynamics can be classified into high frequency backbone fluctuations and low frequency domain dynamics (Henzler-Wildman et al., 2007). First, we investigated if patterns of backbone fluctuations were correlated with sequence identity scores.

Comparing crystallographic B-Factors
Several previous reports have used normalized values (Equation (3)) of crystallographic B-Factors to study protein backbone fluctuations (Maguid et al., 2006). The correlations between patterns of B-Factors for pairs of structures were deduced using Spearman rank order correlation coefficient (q BB ) (Spearman & Spearman, 1987) (Table 1). For Spt16N structures, the q BB and respective sequence identities appear to be uncorrelated (Figure 2(b)) and it became evident that the distribution lacked well defined correlation (Figure 3(b), correlation coefficient: 0.3). Since crystallographic B-Factors not only depend on the structure of the proteins but also on various experimental parameters involved in their structure  (Table 1) (as M24 peptidase is not a homolog of SPT16N, hence it was not included here). The q B vs % sequence identity graphs for MD RMSF (A), crystallographic B-factors (B), and NMA mobility scores (C) are shown. The Pearson correlation coefficients calculated between q B and % sequence identity scores are reported. Results of linear regression analysis for the RMSF data are also shown (A).
determination (Fenwick et al., 2014;Reichert et al., 2012) this observation does not necessarily imply the lack of correlation between backbone fluctuations and sequence identity scores.

Comparing normal mode mobility
Normal mode analysis (NMA) is another elegant method for comparing protein flexibility in terms of their vibrational modes. The methodology can be directly applied on PDB structures and predicts mobilities as well as domain motions that could be exhibited by the molecule. It is considered as a coarse-grained method of calculating protein dynamics as it overlooks other forces such as solvent damping and local chemical couplings that modulate the vibrational modes (Alexandrov et al., 2005;Kim et al., 2015;Wako & Endo, 2017).However, being a computationally inexpensive tool, NMA provides a useful dynamic parameter of the structures Therefore, we performed NMA calculations on the crystallographic structures using iMODS server, which provides the top normal modes along with mobility scores for each residue(supporting material Figure S5(a-d) and Figure 2(c)).A mobility score represents total movement of that particular residue in motions described by all top 20 normal modes; hence this parameter represents backbone fluctuation values to the NMA analysis. To compare NMA mobilities between the structures, again spearman rank order correlation (q BM ) (Equation (1)) was calculated for the mobility scores. Even in this instance the mobilities between structures, in terms of q BM , show no correlation with their sequence identity (Table 1, Figure 3(c), correlation value: 0.1). Absence of this type of correlation, again, could be due to the limitations associated with NMA in capturing the dynamics of the proteins.

Comparing molecular dynamics RMSF
As mentioned earlier, NMA is a simplistic model to calculate dynamics of the protein and ignores important parameters like effects of protein solvent and intra-protein atomic interactions. Hence, to add rigor to our analysis, we carried out 500 ns MD simulations on all the 7 structures. The root mean square fluctuations (RMSF) values between each pair of structures were compared using the spearman rank order correlation coefficient, q BR . The pattern of RMSF values in the present study indicate a very strong conservation of dynamics amongst Spt16N homologs with almost identical regions of higher and lower RMSF (Figure 2(D)). This is supported by spearman correlation coefficient, which lies in the range of 0.78-0.53 for the four structures belonging to Sc-Spt16N (Table 1). The RMSF of Sc-Spt16N had significantly lower q BR correlation coefficients with those of M24 peptidase (q BR ¼ 0.31) and M1 peptidase (q BR ¼0.12). Furthermore, the RMSF trends of M24 peptidase and M1 peptidase were visibly different from those in Spt16N structures (supporting material   Figure S8(b,c)). The similarity of per residue RMSF scores of the Spt16N homologs (q BR ) were found to be significantly correlated with sequence identity scores (Figure 3(a), correlation coefficient: 0.86). Further, a linear trend was observed in this data. It was noted that the per residue RMSF scores of the Spt16N homologs were more correlated to variation in sequence (sequence identity scores) than that of variation in structure (Ca RMSD of crystal structures) (Figure 4(a), correlation coefficient: -0.73).
From the comparison of q BR with sequence identities of Spt16N homologs it is very clear that the patterns of backbone fluctuations are well conserved among homologs and follows a linear trend when compared across a range of sequence identity scores. These observations suggest that the similarity in the backbone fluctuation patterns is likely to be essential for the conservation of function across diverse members of a protein family. However, the subtle differences in these patterns for homologs with lower sequence identity scores might be required for the proteins to evolve divergent functions while retaining similar structural folds. Therefore, RMSF values can not only tell us about the sequence, structure and dynamics correlation but could also be used to highlight regions with conserved dynamics which may be functionally important thereby shedding some light on the lesser known mode of interactions between the N-terminal domain of FACT complex (Spt-16N) with histones.

Comparing protein lobe dynamics to elucidate functionally important regions
Next, we asked whether the comparison of protein dynamics would be helpful in the identification of functionally important regions of the proteins. It has been shown that low frequency motions in proteins are important in critically accessing larger domain movements and have been shown to be conserved in enzymes that perform similar functions (Bhabha et al., 2015;Marcos et al., 2010).
In our previous analysis we used the mobility scores from NMA while here; to understand low frequency motions we are using the top normal modes found during the NMA analysis. They are widely used in studying inter domain motions (Maguid et al., 2008;Skjaerven et al., 2014). In the present study, the first 3 low frequency NMA modes for the structure of each Spt16N homolog were analyzed. Overall, four types of motions were observed i.e.-clamshell, twist, sideways, and oblique ( Figure 5, supporting material Figure S5). The first 3 low frequency NMA modes from each analysis were found to be displaying similar domain (lobe) movements for all the Spt16N structures and also for M24 peptidase (supporting material Table S3). Amongst these the lowest frequency mode was either twist or oblique i.e.: combination of twist and clamshell; while the other two were always found to be either clamshell or sideways motions. Hence, it can be said that the inter-lobe motion in Spt16N proteins as mapped by NMA was found to be well conserved across all the structures referred to here. It is to be noted that the normal modes of a functionally different protein with a similar fold to that of Spt16N (M24 peptidase) were found to be similar to those of Spt16N, thereby associating the inter-lobe motion as a property of the structural fold; making it difficult to draw any conclusion about functionally important lobe motions in Spt16N.

Comparing principal modes from MD trajectories
As discussed in the previous section, NMA analysis is highly dependent on the protein structure and may fail to take in account various parameters that govern protein dynamics. Hence, to improve our conformation sampling and to take into account most of the parameters that contribute to the protein dynamics, we performed principal component analysis (PCA) on the cumulative 1 ls MD trajectory (duplicates of 500 ns) for all the 6 Spt16N homologs in the present study. Visual analyses of the top three principal components for each of the six Spt16N trajectories revealed combinations of clamshell, twist, sideways, and oblique motions ( Figure 5). However, the order and extent of the contribution of these motions to the overall dynamics were found to vary amongst homologs and even in duplicates of the same run (supporting material Table S4). This is not surprising as interdomain motions are slow and the conformational space sampled is relatively small in a 500 ns MD run. Interestingly, there was at least one instance of clamshell motion among the top three eigen modes of each Spt16N structure, however, neither MD run of the M24 peptidase showed any instance of the clamshell motion in the top three eigen modes.
In order to explicitly understand the hidden patterns in the PCA data, we performed a detailed analysis on the lobe motions. We deconvoluted modes into three orthogonal interdomain motion types (clamshell, twist, and sideways) by using a modified form of analysis reported previously (Jones et al., 2012). This method reveals the percent contribution of each of the 3 orthogonal domain (lobe) motion towards every principal mode.
When the contribution of three orthogonal interdomain motions from the top eigen modes of each of the studied trajectories were obtained, it was observed that our earlier visual assignments (Supporting material Table S4) are consistent with the results of deconvoluted interdomain (inter lobe) motion types (Supporting material Figure S9). Further, we found overall percent contribution of each of the orthogonal motions towards the dynamics for individual protein by considering data from both runs ( Figure 6). In contrast to NMA, this analysis reveals a major contribution (31-56%) of clamshell motion in dynamics of most of the Spt16N structures. On the other hand, the dynamics of M24 peptidase has only 25% contribution from the clamshell interdomain motion. Moreover, dynamics of M24 peptidase has a large contribution from the sideways interdomain motion (46%) while the dynamics of most of the Spt16N structures has a lesser contribution (13-39%) from this motion-type ( Figure 6).
Hence, based on the above analysis we can infer that the clamshell interdomain motion was found to be conserved in Spt16N homologs.

Discussion
Dynamics is a crucial aspect of protein function yet very few studies have been performed to understand its evolutionary conservations. Previous studies that have attempted to study protein dynamics with respect to homologs suggest a direct relation between the sequence identity of two proteins and their dynamics (Bhabha et al., 2015;Maguid et al., 2006;Marcos et al., 2010;Wako & Endo, 2017). Detailed studies focused on specific families have also found significant correlation between dynamics of the proteins from the same family (Liu & Bahar, 2014;Skjaerven et al., 2014). Additionally even the studies that involved thousands of protein structures/sequences from various families have concluded that dynamics is more correlated at family level as compared to super family. While dynamics at the super family level is again more correlated then that from unrelated proteins (Maguid et al., 2008).Yet the fact remains that almost all previous studies often rely on approximate metrics like B-Factors and NMA experiments (Maguid et al., 2006(Maguid et al., , 2008M€ unz et al., 2010) whereas in the majority of cases any detailed comparison of dynamics is typically restricted to a couple of homologs at a time.
In this study, we have employed all-atom MD simulations for studying protein dynamics, which is more reliable than Bfactors and NMA, while also using more than a handful of proteins with varying sequence identity scores in order to sample the diversity of the protein family.

Evolutionary conservation patterns in backbone fluctuations
To the best of our knowledge, this is the first study to report the correlation between backbone fluctuations and sequence identity of homologs using all-atom MD simulations. The model system for this study was chosen very carefully to include multiple homologs with varying degrees of sequence identity scores (15-62%). Moreover, while five of the homologs are involved in protein-protein interactions, the more distant homolog is an enzyme (M24 peptidase). Further, we added a non-homologous structure (sequence identity 3%) of similar size from a different fold as a control data point for our analysis.
This study conclusively shows that, for Spt16N family, the pattern of backbone fluctuations as determined by MD simulations has a high correlation (Figure 3(a), (q BR ) correlation coefficient: 0.86) with sequence identity among homologs and that this data follows a linear trend. Data from crystallographic B-factors and mobility scores from normal mode analyses do not display any such trend (Figure 3(b,c)). Our studies also show that the pattern of backbone fluctuations, as estimated by MD simulations, are correlated (Figure 4(a)) with Ca deviations (structural conservation) among structural homologs. However, such trends were not observed in the analyses conducted using B-factors (Figure 4(b)) or in NMA mobility scores (Figure 4(c)).It is worth noting that B-factors are not highlighting the correlation between the homologs as indicated by RMSF and hinted by NMA mobilities; as Bfactors are the result of a number of factors it is difficult to speculate the reason for the same. Hence, our study corroborates previous reports by showing that backbone fluctuations are evolutionarily conserved (Francis et al., 2013;Halle, 2002;Liu & Bahar, 2012;Maguid et al., 2005Maguid et al., , 2006Maguid et al., , 2008Micheletti et al., 2004). We also show that similarity in backbone fluctuations as estimated from MD simulations can be linearly correlated with sequence identity scores and RMSD values. It would be interesting to see in future studies if this trend is observed in other protein families.

Evolutionary conservation patterns in inter domain motions
Low frequency motions like loop motions and interdomain motions can be studied using NMA of static structures or PCA of MD simulations. The MD runs include larger conformational sampling for predicting conformational dynamics and hence it is expected that PCA results will have a better accuracy. Interestingly, previous NMA studies of a large number of single domains have suggested that the overall protein dynamics tend to be evolutionarily conserved (Maguid et al., 2008). Further, reports that quantitatively compare NMA and MD based methods suggest that the deformation space obtained by both approaches is quite similar to each other (Rueda et al., 2007). When it comes to the functional importance, reports on dynamics of two domain proteins suggest display of stretching, clamshell, twist, and sideways motions between the domains, and highlight that differences in these orthogonal interdomain motions could be of help in the identification of functional regions of the protein (Marcos et al., 2010).
In the two-domain (two-lobe) Spt16N system of our current study, the low frequency modes have significant contribution from interdomain motions. The top normal modes calculated from iMODS server were found to be consistent not only amongst the Spt16N homologs with DNA binding function, but also with M24 peptidase which has a catalytic function. Top modes of PCA analysis, on the other hand, were more diverse and it was difficult to find any pattern in them via visual inspection. These motions when deconvoluted into three orthogonal interdomain motions showed that interdomain (inter lobe) motions of the Spt16N structures had a higher contribution from the clamshell motion whereas that of M24 peptidase had a higher contribution from the sideways motion. This indicates a difference in interdomain motions between two structurally similar yet functionally distinct proteins. Hence we can conclude that PCA of long MD simulations reveal subtle differences in dynamics there by enabling it to differentiate between SPT16N molecules and M24 peptidase whereas the NMA of coarse-grained model fails to bring out these modulations and classifies all the structural homologs (SPT16N's and M24 peptidase) in similar domain motions.

Conserved protein dynamics provide functional insights
Multiple studies have shown that Spt16N of FACT complex binds to histones H3 and H4 Marcian o & Huang, 2016;Stuwe et al., 2008), however, very little is known about the binding interface. The structure of the FACT complex bound to the nucleosome was reported recently (Liu et al., 2020), however, Spt16N was not observed in the reported structure. A previous experimental study has shown that residues Ser83 and Lys86 of Sp-Spt16N, corresponding to Ser77 and Lys80 of Sc-Spt16N, contribute towards binding of H4 tail (Stuwe et al., 2008). These residues were shown to be 100% conserved in an alignment of six Spt16N sequences (supporting information Figure S1). Furthermore, the same study showed that other than the H3 tail the globular domains of H3-H4 also bind to Spt16N, however, mutational analysis failed to identify the binding region.
In our current study, we have made an MSA of 200 Spt16N sequences. Our sequence conservation data alongside RMSF dynamics provide insights on the residues that could be critical for histone binding. Ser and Lys residues implicated in H4-tail binding were found to be conserved (ConSurf score of 9) in this alignment as well. The position corresponding to Ser77 of Sc-Spt16N (supporting information Figure S10) is occupied by Ser in over 65% of the sequences and Thr in over 25% of the sequences. Hence, the hydroxyl group of the sidechain at that position is conserved in over 90% of the sequences. Also, the positive charge on the position corresponding to Lys80 is highly conserved with Lys occupying the position in over 95% of the sequences. Further, results of our Consurf study shows that Lys108, which is close to Ser77 and Lys80 in structure, has high conservation scores despite their dynamical nature ( Figure 2) and surface-exposed location (Figure 1 (raw data not shown)). So, on the basis of the above data we propose that these residues could also be involved in binding of H4 tail.
It has been shown previously that the electrostatic surface potentials of the Spt16N structures from various origins have some conserved patterns . It was speculated that conserved acidic patches on the Spt16N surface might be important for binding the overall positively charged histones. The Ag-Spt16N structure reported in this study also has a similar surface electrostatic potential pattern where the two acidic and two basic patches identified in the previous study  are conserved (supporting information Figure S11). Moreover, we have shown that these patterns, specifically acidic patches that were speculated to be of importance, are conserved even during lobe motions over the course of 500 ns MD simulations (supporting information Figure S12). The first acidic patch (patch 1) is located primarily on the B-Lobe (residues 253 to 255) with some contribution from the A-Lobe (residues 40 to 43). Interestingly, this region of the A-Lobe has high structural conservation (Ca dev 0.5 Å) and also high mobility (average RMSF $2.3 Å). The second acidic patch (patch 3) is located at the C-terminal region of Spt16N (residues 282 to 284, 308, and 447) (supporting information Figure S13a). The consistent negative potential at the C-terminal region supports a recently published report (Jiang et al., 2019) where the extended region from the C-terminal was proposed as a probable histonebinding site. Residue 284 of this second patch is highly conserved, being occupied by Gly in 80% of the Spt16N sequences. Surprisingly, despite being occupied by a residue known for its flexibility, this region was stable during the MD simulations with average RMSF $0.7 Å, indicating functional importance of the residues in Spt16N.
Observing from inter domain (inter lobe) motions we inferred prominence of clamshell movement in Spt16N structures. Since a peptidase with similar structure displayed sideways motion as its prominent low frequency movement, we speculate that the clamshell motion might be of importance to the function of Spt16N. Mapping all the residues perceived to be important for histone-binding on the Spt16N structure suggests that the concave face of the protein could be responsible for the binding function of Spt16N (Figure 7) while the A-lobe is most likely to be involved in H4 tail binding.

Associated content
Supporting Figures S1 to S13; Tables S1 to S3.