Deciphering the structural and functional impact of missense mutations in Egr1-DNA interacting interface: an integrative computational approach

Abstract Early growth response-1 (Egr1) is a zinc-finger transcription factor that plays a critical role in controlling cell growth, proliferation, differentiation, angiogenesis, and apoptosis. Egr1 is induced by many growth factors, cytokines, and stress signals and is also known to be involved in several pathological conditions like cancer, neurological and ocular disorders. The DNA binding domain of Egr1 is a highly conserved Cys2His2 (C2H2) zinc finger (ZNF) domain which specifically binds to GC-rich consensus sequence GcG (G/T) GGGCG and activates transcription. As the C2H2 domain specifically recognizes its DNA target, the mutations spanning this region shall perturb DNA recognition and may hinder transcription of target genes. Therefore, in this study, the missense mutations occurring specifically at the DNA binding domain (DBD) of Egr1 were probed by computational approaches involving in silico screening of pathogenic and functional mutants coupled with extensive molecular dynamics simulations, to determine the mutants that affect its structural stability and interactions with DNA. From the pathogenicity analysis of 38 missense mutations spanning Egr1-DBD, 17 were predicted as pathogenic, and 7 amongst these were found to have functional impact on Egr1. On combined analysis of molecular dynamics simulation, Residue interaction analysis and Egr1-DNA interaction analysis results, the mutants R371C and R375C showed least impact, whilst, H382R tend to increase the structural stability, whereas R360H, H390R, E393V, and H414Y conferred greater impact by altering the structural stability and DNA interactions. Hence, this study exposes the prospects of considering these 4 deleterious mutations for clinical significance, but needs further experimental validation. Highlights Egr1’s DNA binding domain is a highly conserved Cys2His2 (C2H2) zinc finger domain that specifically recognizes its DNA target. Mutations spanning in the DNA binding domain shall perturb DNA recognition and may hinder transcription. Among the missense mutations, mutants R360H, H390R, E393V, and H414Y were inferred to have a greater impact on Egr1 by altering the structural stability and DNA interactions.


Introduction
Zinc-finger-containing proteins constitute the most abundant protein superfamily and are known to be involved in a variety of cellular activities such as development, differentiation, and tumor suppression (Abbehausen, 2019;Cassandri et al., 2017;Ravasi et al., 2003). These proteins also act as transcriptional regulators due to their association with DNA-binding motifs (Bird et al., 2003;Emerson & Thomas, 2009). Early growth response-1 (Egr1) also known as NGFI-A, Zif268, Krox24, and Tis8, is a serum-inducible zinc finger transcription factor that plays a critical role in controlling cell growth, proliferation, differentiation, angiogenesis, and apoptosis (Gashler & Sukhatme, 1995;O'Donovan et al., 1999). Egr1 is induced by many growth factors, cytokines, and stress signals. Egr1 is also known to be involved in a number of pathological conditions such as cancer, neurological disorders, and ocular disorders (Duclot & Kabbaj, 2017;Pagel & Deindl, 2012). Moreover, Egr1 is known to comprise a wellconserved DNA binding domain composed of three zinc fingers that bind to specific GC-rich consensus sequence GcG(G/T) GGGCG (Hamilton et al., 1998;Russo et al., 1993). It also activates transcription by binding to DNA as a monomer (Carman & Monroe, 1995;Crosby et al., 1991).
The DNA binding domain of Egr1 is a highly conserved Cys2His2 zinc finger domain. In particular, the Cys2His2 zinc finger (C2H2-ZNF) domain is a highly conserved sequence specific DNA-binding region that play a crucial role in gene regulation. The C2H2 domains are also well confirmed on the key role of the amino acids at these positions for the specific binding to DNA (Fedotova et al., 2017;Namuswe & Berg, 2012;Wolfe et al., 2000). The sequence specificities of C2H2-ZNF proteins are highly variable, with many human C2H2-ZNF proteins having a unique set of DNA-contacting specificity residues (Eom et al., 2016;Garton et al., 2015). C2H2-ZNF proteins also often harbor one or more of a small number of effector domains. An individual C2H2-ZNF domain contains a well-conserved DNA-binding structural interface and specifically recognizes its DNA target via amino acids occupying four key 'canonical' positions of an alpha-helix (Kluska et al., 2018;Yella et al., 2018). Therefore, mutations in amino acids spanning this region may interfere with DNA binding and could alter the functionality of the Zinc-finger proteins (ZNFs), which highly aids in regulation of wide range of cellular processes. Thereby, the alterations in ZNFs are known to be involved in cancer progression and metastasis, neurodegeneration, etc (V egran et al., 2013). Hence, it is essential to probe the mutations occurring at the DNA binding domain of Egr1 that affects its structural stability and its interaction with DNA.
In the human genome, variations caused by single base pair changes known as SNPs, amongst these, missense SNPs(in specific, non-synonymous SNPs (nsSNPs)) contribute to towards significant functional and structural variations . Thus, missense SNPs might be involved in various genetic disorders as they result in structurally and functionally deformed proteins. Moreover, various earlier studies have provided insights on the impact of the mutations occurring specifically at the DNA binding domain in different proteins Nixon et al., 2004;Xiao et al., 2020;Yoneda et al., 2012). Hence, this study is aimed to investigate the nsSNPs of Egr1 specifically at the DNA binding domain (DBD), and to probe their effects on its structure and function. Therefore, an extensive in silico analysis of nsSNPs was implemented in prioritizing the mutants based on their pathogenicity, functionality impact, and conservation ( Figure  1). Moreover, advanced computational methods, such as essential molecular dynamics and structural impact analysis were also performed to decipher the impact of DNA binding due to Egr1's (DBD) structural changes upon mutation.

Screening of deleterious nsSNPs
The shortlisted Egr1's DBD nsSNPs were initially examined for their deleterious or pathogenic effect with sequence and evolutionary-based and machine learning-based approaches . Thereby the servers namely, SIFT, Polyphen2, PANTHER-PSEP, PMut, MutPred2, were used to screen the pathogenic mutants that might have a structural or functional impact on Egr1's DBD. Among the servers used, SIFT (Sorting Intolerant from Tolerant) prediction is based on a presumption that certain amino acid positions in protein sequences are conserved through evolution. Consequently, a substitution in the conserved regions that has an impact on protein functionality is considered deleterious (Sim et al., 2012). The SIFT output score ranges from 0 to 1 where substitutions 0.05 indicate a deleterious effect and those variants greater than 0.05 are tolerant. While, PANTHER-PSEP (Position-Specific Evolutionary Preservation) predicts potentially pathogenic or deleterious nsSNPs based on the measure of evolutionary preservation of each amino acid of the target, where the amino acids that are preserved for a long in the evolutionary timescale were considered deleterious (Tang & Thomas, 2016). PolyPhen-2 (Polymorphism Phenotyping v2) utilizes BLAST to identify homologues of the input missense variant. A matrix is then constructed from the aligned sequences to calculate the PSIC (Position-Specific Independent Count) scores ranging from 0 to 1, from which phylogenetic information is derived (Adzhubei et al., 2013). With sequence alignments, phylogenetic and structural data, the effect of a particular point mutation in the protein is predicted by a naïve Bayesian classifier, which classifies the nsSNPs as 'Possibly Damaging' and 'Probably Damaging' (PSIC > 0.5) or 'Benign' (PSIC < 0.5). It also includes structure-based characteristics such as solvent accessibility, changes in accessibility for the buried residues, and B-factor. PMut (Prediction of pathological Mutation on proteins) prediction is based on the neural network to predict the pathogenicity of amino acid substitutions. The scores are accompanied by a reliability index ranging from 0 to 10, where 0 is the most unreliable and 10 is the most reliable (L opez-Ferrando et al., 2017). MutPred (Mutation Prediction) is a random-forest classifier that analyses the changes caused by amino acid substitutions on a molecular level. It provides a 'g' score which signifies the probability for the extent to which a variant is deleterious. MutPred classifies nsSNPs as deleterious (p > 0.75), disease-associated (0.5 < p < 0.75), or neutral (p < 0.5) by taking into account structural and functional aspects, conservation of the protein sequence, and the conformational dynamics of the molecule. Finally, the mutants above the threshold scores across all the servers were considered deleterious and were further probed for their impact on functionality (Li et al., 2009).

Functionality analysis of deleterious mutants
The nsSNPs have a higher impact on both structural and functional aspects of proteins. Therefore, the resultant deleterious mutants from the earlier screening process were prioritized further based on their functionality impact using the Mutpred2 server. P-score of MutPred (Mutation Prediction) (Li et al., 2009) represents the impact on the structural and/ or functional properties such as gain or loss in helical structures, intrinsic disorder, and phosphorylation, metalbinding disorder, loss and gain of Post-translational modifications, etc. Since the loss of ZNF features has a greater impact on DNA binding which in turn leads to its functionality related disorders. Based on the P-score and the mutants that have a higher impact on the metal-binding site and loss of ZNF helices were considered for their destabilization analysis by MD simulation.

Structural stability analysis of WT and mutant forms by molecular dynamics simulation
The WT-Egr1 DBD in complex with DNA (PDB ID: 4R2A) (Hashimoto et al., 2014) (Kemme et al., 2019;Li et al., 2008), whereas cysteine residues coordinates metal ions in their deprotonated form (Godwin et al., 2017;Pace & Weerapana, 2014). Thereby both the WT and mutants were pre-processed and minimized using Prepwizard module of Maestro with appropriate protonation states of Histidines and cysteines. For both WT and mutant forms, the MD simulations were performed using the Gromacs 5.1.4 simulation package with Amber99sb-ParmBSC0 (P erez et al., 2007) as a force field. Initially, the complexes were centered in a cubic environment of 1 nm and were solvated using the TIP3P water model. Further, the solvated systems were then neutralized using sodium counter ions and energy minimized using the steepest descent algorithm for 50,000 steps. During these steps, the coulombs and Vander Waals interactions were maintained at a cut-off of 0.9 nm and 1.0 nm, respectively. Later, the system was equilibrated under NVT and NPT ensemble using Leapfrog integrator for 100 picoseconds (ps) while simultaneously maintaining Berendsen temperature coupling (Berendsen et al., 1984) and pressure coupling at 300 K and 1 bar, respectively. Also, the long-range electrostatics were maintained by Particle-Mesh Ewald (PME) and short-range electrostatics maintained by coulombs and Vander Waals at a cut off 1.0 nm each at a time-step of 2 femtoseconds (fs) (Darden et al., 1993). LINCS algorithm was used to maintain the geometry of molecules (Hess et al., 1997). A well-equilibrated system was then subjected to MD simulations for 100 nanoseconds (ns) each at 300 K without any constraints, where the conformations were sampled at an interval of 2 ps. Finally, the trajectories were analyzed using GROMACS utilities to probe the conformational changes, fluctuations, compactness, and stability of the protein-DNA complexes. Moreover, to estimate the global concerted movement which represents the functional behavior of the protein, Principal Component Analysis (PCA) and free energy landscape (FEL) analysis were performed (Maisuradze et al., 2010;Nagarajan & Vetrivel, 2018. Further, from the resulting minimum energy cluster of the proteins, representative stable near-native conformations were used for further studies. The schematic diagrams of protein-DNA interactions in the pre-and post-MD simulated complex were deduced using Nucplot (https://www.ebi.ac.uk/thornton-srv/ software/ NUCPLOT/) (Luscombe et al., 1997). Further, the pivotal interactions of protein-DNA complexes were also analyzed by PLIP (Protein-Ligand Interaction Profiler) (Salentin et al., 2015). In addition, the electrostatic potential calculations of conformations were carried out using PBEQ Solver module of CHARMM-GUI server (Im et al., 1998;Jo et al., 2008).

Residue interaction network analysis
Residue Interaction Network (RIN) analysis provide the graphical network of the inter-residue interactions of a protein, where the residues are considered as nodes and their corresponding physicochemical interactions between them such as hydrogen bond, van der Waals, Ionic, p-cation, and p-p stack interactions as edges to determine the connectivity of the residues. In the case of mutations in protein, the topological changes based on RIN infer the loss of covalent and non-covalent bonds that may alter the protein network. In this study, RIN was generated by Residue Interaction Network Generator (RING) v2.0.1 (Piovesan et al., 2016) for the conformations generated for every 200 frames of Egr1's DBD, while the DNA and the hetero molecules were not considered. The resulting XML files were imported into Cytoscape v3.8.2 (Shannon et al., 2003), the plugin RINalyzer (Doncheva et al., 2011) was used to visualize the residue interaction network. To visualize the three-dimensional structure of proteins, PDB files were imported to UCSF Chimera v1.13 (Pettersen et al., 2004).

Retrieval of SNPs
Among 838 SNPs of Egr1, 532 were found to be missense mutations. However, only 38 missense mutations out of 532 were found to span the DNA binding domain (DBD) of Egr1. Therefore, these 38 mutations of Egr1's DBD region were considered for further analysis to probe the deleterious effects and their intensity.

Screening of pathogenic mutants
Amongst the 38 mutations occurring at Egr1's DBD region, SIFT reported only 28 mutations to be highly deleterious out of 38 mutations (73.7%). While Polyphen showed 25 deleterious mutations (65.8%) and Mutpred predicted 26 deleterious mutations (68.4%) among the 38 missense SNPs. The other tools namely, Panther_subspec and pMut showed 24 mutations (63.1%) and 21 mutations (55.3%), respectively. On overall consensus analysis of deleterious mutations among all the servers, only 17 mutations (Table S1) were found to be deleterious or pathogenic and were further considered for functionality analysis.

Functionality impact analysis of pathogenic mutants
On analyzing the 17 shortlisted pathogenic mutations (Table 1), only 7 mutations were found to be involved in altered metal binding (zinc-binding), while the remaining 10 were found to lead to altered disorder interface which doesn't have a major impact on the DNA binding. Therefore, the 7 mutations which showed a higher probability of altering the metal binding at the ZNF regions were considered, as the distortion of metal coordination may lead to the improper conformation of the ZNF residues, thus leading to the lack of proper recognition of DNA.

Deciphering the effect on conformational changes and compactness of native and mutants of Egr1
The Wild type ( Figure 2) and the resultant 7 mutations ( Figure 3) that were found to be pathogenic and has major impact on Egr1's functionality were considered for molecular dynamics simulation studies for 100 ns in order to infer its structural destabilization effect. From the RMSD analysis ( Figure 4a and Table S2), WT was observed to have least deviations of about $0.45 nm till 80 ns yet has attained major deviations ($ 0.6 nm) in the last 20 ns. While amongst the mutants, the mutants R360H, H390R, R375C have shown deviations of about $0.55-0.6 nm than the WT and have tried to maintain stability after 40 ns till the end of MD production run. In the case mutant E393V, unstable deviations were observed within 40-80 ns, whereas mutants, H382R and H414Y were observed to have the least deviations than that of the WT. Whereas, the RMSF plot (Figures 4b and S1) infers that except for the N-terminal and C-terminal residues, E343 ($0.6 nm), R371 ($0.4 nm), were found to be highly fluctuating in mutants H390R, H382R, R375C, respectively. While D399 ($0.4 nm) was observed to show higher fluctuation in both WT and all mutants except for the mutants (H382R, R371C). From the radius of the gyration plot (Rg) (Figure 4c and Table S2), it was inferred that except for mutant R375C, all other mutants and the WT were found to have the least compactness.

Analyzing the distortion of zinc coordinates
On analysing the distance of zinc coordinating residues (Figures 5, S3 and S4) along each Zinc finger regions, it was observed that among the Zinc fingers (ZNFs), Zinc finger3 (ZNF3) [Phe396-His418] was inferred to have higher deviations and deteriorations of zinc coordinates in all the mutants than the WT. This drastic deviation at ZNF3 was due to the loss of a helix of ZNF3 which was transformed as loops. In Zinc finger1 (ZNF1) [Tyr338-His362], it was observed that only His362 was shown to have higher deviations than the WT in almost all mutants except for R371C. In the case of Zinc finger2 (ZNF2) [Phe368-His390], the residue His390 was found to be highly deviating in all mutants, while Cys373 in H382R, H390R, and E393V, respectively. Among the ZNF3 Zn 2þ coordinating residues, His414 and His418 were found to be completely deteriorated in all the mutants, while the Cys398 and Cys401 showed drastic deviations in all mutants except for mutants R371C and R375C than WT.

Exploring the secondary structural modifications of Egr1
The secondary structural analysis ( Figure 6) showed that the b À turn À b (Gly364-Arg379) in the mutants R360H, R371C, R375C, and a helix (Ser408-Lys416) to be completely lost in all mutants. These loops transformed secondary structures are known to play a crucial role in maintaining the stability of the zinc coordinating residues of the ZNFs. Thereby the loss of the secondary structures, on being transformed as loops attains the higher flexibility, which in turn contributes to the disorientation of the zinc coordinating residues leading to the deteriorations of ZNF.

Structural rearrangement triggered by the mutation
The interaction between the Arg360 and Gln369 aids in close contact of ZNF1 and ZNF2 and also maintains its intra stability and hydrogen bond interactions between Cys370 and Arg375 which contributes to the stabilization of b sheets that were observed to be lost in mutants R360H and R375C, respectively. Thereby loss of these interactions had a greater impact on the secondary structural loss and also the displacement of the ZNFs as shown in Figures 6 and 7. From the distance plots of ZNFs (Figure 7 and Table S2), it was inferred that the distance of ZNF1 with ZNF2 is found to be deviating by about 0.4 nm in mutant R371C and R360H while the distance of ZNF3 with ZNF1 and ZNF2 was observed to have deviations in almost all mutants specifically in H390R, H414Y mutants. The distal placement of the ZNF3 may be due to the secondary structure loss and higher flexibility of the a-helix-3 (H3) that has been transformed as loops.
Besides, it was noted that the conserved hydrophobic residues, namely Cys370, His386 which interacts with each other, and Phe377, Leu383, Cys373 on interacting with His390 to form a strong hydrophobic core that maintains the intramolecular stability and orients the zinc coordinating residues The changes in zinc fingers metal-binding sites are given as bold.   were found to be lost in H390R mutant, thereby distorting the hydrophobic environment. Moreover, except for the mutants H414Y and R371C, the other mutants were observed to have lesser intra hydrogen bonds than the WT, yet have maintained about $50 hydrogen bonds throughout the simulation ( Figure S2b and Table S2). Thus, this infers that these mutations alter the hydrophobic core and indirectly affects the structural stability by increasing the flexibility and decreasing the intramolecular interactions, leading to the varying conformational states of the mutants than the WT.  From the Residue interaction network analysis (Figures 8,  9, and S5-S12; Table S3), it was inferred that all mutants to have fewer intra hydrogen bonds, whereas the van der Waals interactions were found to be highly increased in the mutants than the WT. It was also inferred that only one mutant R371C has attained the p-cation interaction, while ionic interactions have increased in mutant H382R whereas the other mutants except for R371C and R375C has lacked to form ionic interactions when compared to the WT. Apart from that, it was also observed that p-p stacking interactions to be maintained only in mutants R371C, R375C as that of WT, while the other mutants have decreased p-p stacking interactions. Overall from the residue-based network interaction analysis, it was inferred that apart from R371C and R375C, other mutants have a major impact in decreasing the structural stability. Moreover, on comparative structural analysis of the initial and near-native complexes ( Figure S13) of wild and mutants ( Figures S14 and S15), it was observed that the major conformational changes be in ZNF2 and ZNF3 in all the mutants except for the WT. In the case of ZNF2, the loss of b sheets was found to alter the orientation of the zinc coordinating residues namely His390, Cys373, while in ZNF3 the flexibility of the loop transformed a helix and the sparsely deteriorated b sheets have led to the drastic orientation changes of the Zn 2þ coordinating residues that may impact the stability of the ZNFs.

Egr1-DNA interaction analysis
On analysing the intermolecular hydrogen bond interactions ( Figure S2a) between the Egr1's DBD and DNA, it was observed that the WT showed a lesser H-bonds ($27) initially, yet has tried to increase the stability of the DNA interaction by gradually increasing the H-bonds to about $41 after 30 ns, however, the WT has maintained an average of $35 H bonds throughout the MD simulation run. On comparative analysis of the intermolecular hydrogen bonds ( Figure S2a and Table S2), it was inferred that the mutants R371C and R375C to have maintained $30 H-bonds, while other mutants to have maintained lesser H bonds than that of WT of nearly $25-26 throughout the simulation. This indicates that the mutants R371C and R375C doesn't have any major impact in altering the DNA binding and interactions with Egr1's DBD. From the 2 D interaction analysis plot of Egr1's DBD and DNA complexes (Figures S16-S19 and Table 2), it was observed that the residues that are known to form interactions with DNA base pairs namely, ZNF1:R351, D353, R357; ZNF2: R379, D381, H382; ZNF3: R407, D409, R413, were found to be interacting in the WT and mutant complexes, except for mutant H382R. Moreover, the interactions of R413 with the base pair were found to be observed in the near-native complexes than the pre-simulated complexes. The higher incidence of the pi-cation interactions was observed only in the pre-simulated complexes, contributed highly by the H382 (ZNF2), R407 (ZNF3), whereas, in the case of mutant H382R, the pi-cation interaction of H382 is completely lost. In post simulated complexes, the pi-cation interactions were completely lost, however, the stability of the complexes was noted to be maintained by the increased salt bridge interactions than that of the pre simulated complexes. Among these residues, R357 (ZNF1), R413 (ZNF3), and H382 (ZNF2) that are known to form stable interactions with the guanine moiety were also observed to have stable interactions in both the pre-and post-simulated complexes, whilst in the case of R413 the interactions were observed only in the post simulated complexes. In addition, even the electrostatic interaction calculations of Egr1-DNA (Table S4) were found to corroborate well with the interaction analysis, except for the mutants R371C and R375C, all the other mutants were found to have drastic changes in the electrostatic interaction energies when compared to the WT.

Conclusion
Early growth response-1 (Egr1) is a zinc-finger transcription factor that plays a critical role in many pathological conditions such as cancer, neurological disorders, and ocular disorders. As mutations occurring at the DNA binding domain of the Egr1 region deteriorates DNA recognition and may hinder transcription, this study was implemented to determine the pathogenic and functionality affecting mutants that may potentially alter the structural stability and interactions with DNA. Overall, from the 38 missense SNPs, the resulting 7 SNPs were known to have an impact on structural stability and functionality loss. It was inferred from the MD analysis and the residue interaction network analysis only the mutant R371C and R375C have the least impact in altering the where the p-p stacking (Red) and Ionic interactions (Blue) as sticks, while the hydrogen bonds (Cyan) and Van der Waals (brown) as lines. structural stability than the other mutants compared to WT. Moreover, it was also inferred that the mutants R360H, H382R, H390R, E393V, and H414Y to have a greater impact in altering the structural stability and DNA interactions. Since these mutations were observed to cause the secondary structure loss, they are transformed as loops with increased flexibility and may lead to the dis-orientation of Zinc coordinating residues and also deterioration of the crucial interactions that stabilize the intra and inter stability of Egr1-DNA complexes. However, the insights from this study needs to be confirmed by further validation using biophysical methods like NMR and Cryo-EM techniques in order to confirm the applicability.

Disclosure statement
No potential conflict of interest was reported by the authors.