In silico and structure-based assessment to classify VUS identified in the α-helical domain of BRCA2

Abstract Breast cancer type 2 susceptibility (BRCA2) protein plays a crucial role in DNA double-strand breaks repair mechanism by homologous recombination. Pathogenic mutations in the BRCA2 gene confer an increased risk of hereditary breast and ovarian cancer (HBOC). Different missense mutations are identified from a larger cohort of patient populations in the BRCA2. However, most missense mutations are classified as ‘Variants of Uncertain Significance’ (VUS) due to a lack of data from structural, functional, and clinical assessments. Therefore, this study focused on assessing VUS identified in the α-helical domain of h-BRCA2 using different in silico tools and structure-based molecular dynamics simulation. A total of 286 identified VUS were evaluated using Align-GVGD, PROVEAN and PANTHER servers and 18 variants were predicted to be pathogenic. Further, out of 18 variants analyzed using the ConSurf server, 16 variants were found to be evolutionary conserved. These 16 conserved variants were submitted to PremPS and Dynamut server to assess the effect of the mutation at the protein structure level; 12 mutations were predicted to have a destabilizing effect on the native protein structure. Finally, molecular dynamics simulations revealed 5 variants BRCA2 Cys2646Tyr, Asp2665Val, Trp2619Arg, Trp2619Ser and Tyr2660Cys can alter the folding pattern and need further validation using in vitro, structural and in vivo studies to classify as pathogenic. Communicated by Ramaswamy H. Sarma


Introduction
Breast cancer type 2 susceptibility (BRCA2) protein plays an essential role in maintaining genomic integrity via homologous recombination, cell-cycle control and replication fork protection (Venkitaraman, 2014;Holloman, 2011).Pathogenic mutations in the BRCA2 gene predispose to hereditary breast and ovarian cancer (HBOC) and also cancers of the pancreas, prostate and other organs (Breast Cancer Linkage Consortium, 1999).Since the association of BRCA2 mutations with cancer predisposition and genetic counselling, extensive efforts have been made to determine the mutation spectrum of BRCA2 variants identified from a larger cohort of high-risk families (Frank et al., 2002).With the advancement in genetic testing and awareness of genetic counselling, different unique BRCA2 (MIM: 600815) variants have been identified in individuals with a family history of breast and/or ovarian cancers.The identified variants are classified using a widely accepted five-classification system which includes Benign, Likely Benign, Variant of Uncertain Significance (VUS), Likely Pathogenic and Pathogenic (Richards et al., 2015).The variants classified as Benign, Likely Benign, Pathogenic and Likely pathogenic are of clinical significance, while the variants classified as VUS are of utmost concern for scientists and clinicians to evaluate their clinical significance.The clinical relevance of most VUS could not be determined because the correlation between functional and structural assessment and data from the family pedigree is still not much explored.More than 6400 missense BRCA2 variants are deposited in the ClinVar database (Landrum et al., 2018), where �85% of these reported variants are classified as VUS, and � 10% of variants are categorized as unclassified due to conflicting interpretations of pathogenicity.Therefore, a large number of VUS present a serious obstacle in the clinical management of BRCA2 related cancers.
The pathogenicity of reported BRCA2 variants can be evaluated by analyzing its segregation, co-occurrence, and availability of family pedigree (Tavtigian et al., 2008;Eggington et al., 2014;Agata et al., 2020).Homology-directed DNA break repair (HDR) assay is the most extensively used functional assay to assess the pathogenicity of variants independently or in combination with computational approaches (Guidugli et al., 2013).However, HDR assay is a tedious process, and a sheer number of identified VUS makes it challenging to annotate all VUS by HDR or other functional assays (Guidugli et al., 2014;Toland & Andreassen, 2017).The combination of allele frequency, sequence conservation, and physiochemical nature of amino acid substitutions has been reported in different studies to evaluate the pathogenicity of variants (Tavtigian et al., 2006;Abkevich et al., 2004).However, these methods were found to be less successful for BRCA2, VUS (Lee et al., 2008); therefore, multi-model approaches are required for substantial conclusions.
Alternatively, analyzing the effect of mutations on the protein structure is also considered an effective approach to evaluating the structural based functional aspects of VUS (Williams & Glover, 2003;Studer et al., 2013).We recently published a biophysical and molecular dynamics-based approach to characterize an unlcasifed BRCA2 variants, Arg2502Cys located in the C-termianl domain (Khan et al., 2022).The C-terminal DNA binding domain of BRCA2 is well conserved among mammalian species such as monkey, dog, mouse, and rat and is also a hotspot for missense mutations.In the crystal structure of mouse and rat -BRCA2, the C-terminal domain comprises an a-helical domain, three oligonucleotide binding domains and a tower domain (Yang et al., 2002) (Supplementary Figure 1a).In human-BRCA2, the a-helical domain from 2478 to 2670 residues has 88% similarity and 80% identity with the crystal structure of Rat-BRCA2 (PDB ID: 1IYJ).The a-helical domain is also known to interact with DSS1 (deleted in spilt hand/spilt foot) protein which is required for the structural stability of BRCA2 (Li et al., 2006).
Therefore, considering the structural and functional importance of the a-helical domain, the human-BRCA2 (2478-2670) was modelled, and 286 VUS reported were retrieved for the a-helical domain.Furthermore, structurebased in silico prediction tools such as Align-GVGD (Tavtigian et al., 2006), PROVEAN (Choi et al., 2012;Choi & Chan, 2015), PANTHER (Tang & Thomas, 2016), Consurf (Berezin et al., 2004;Ashkenazy et al., 2016), PremPS (Chen et al., 2020) and Dynamut (Rodrigues et al., 2018) were explored to assess the pathogenicity and evolutionary conservation of variants.VUS predicted to be pathogenic using in silico tools were further simulated by GROMACS to determine the effect of mutation on the structure and dynamics of the wild-type protein (Figure 1).

Evaluation of pathogenicity of VUS
To assess the pathogenicity of the identified VUS, a combination of the following sequence and structure-based in silico tools were utilized.
Align-GVGD (http://agvgd.hci.utah.edu) is a web-based program that predicts missense substitution to be deleterious or neutral based on the combination of biophysical characteristics of the amino acid and multiple sequence alignment (Tavtigian et al., 2006).
PROVEAN (http://provean.jcvi.org) is a web-based server that predicts the functional effect of variants using a protein sequence-alignment-based score.Protein sequence and missense mutations were submitted to the PROVEAN server.The server performs a BLAST search to collect homologous sequences, and then the alignment-based score for each mutation is calculated.The PROVEAN scores were set to À 2.5 to discriminate deleterious substitutions with neutral variants (Choi et al., 2012;Choi & Chan, 2015).

PANTHER
(http://www.pantherdb.org/tools/csnpScoreForm.jsp)(Protein ANalysis THrough Evolutionary Relationships) is a curative database of protein families, subfamilies, and functions.It measures PSEP (position-specific evolutionary preservation) of input missense variants which concern protein sequences to calculate the probability of deleterious effect (Pdel).Based on the Pdel scores, the variants are classified as probably benign, probably damaging and possibly damaging (Tang & Thomas, 2016).
ConSurf (https://consurf.tau.ac.il) server estimates evolutionarily conserved amino acid positions in a protein molecule based on the phylogenetic relations among homologous sequences.Conserve positions of amino acids are essential for protein structure and function.ConSurf server calculates the conservation scores on a scale of nine different grades.The positions with grade 9 indicate the most conserved sites, whereas positions with grade 1 indicate the most variable sites.
PremPS (https://lilab.jysw.suda.edu.cn/research/PremPS)server predicts the effect of single amino acid substitution on the stability of the protein structure by calculating the changes in unfolding Gibbs free energy (DDG).The point mutations with DDG > 0 show a destabilizing effect on the structure, while the mutations having DDG < 0 show stabilizing effect on the protein structure (Chen et al., 2020).Dynamut (http://biosig.unimelb.edu.au/dynamut/)web server predicts the impact of point mutation on protein stability using the Normal Mode Analysis (NMA) approach.It combines the scores of change in protein stability, dynamics from vibrational entropy between the wild-type and mutant protein (Rodrigues et al., 2018).

Molecular Dynamics simulation
Molecular Dynamics Simulation (MDS) studies investigate the structural stability and dynamics of wild-type and mutant protein structures (Karplus & McCammon, 2002).MDS was carried out using GROMACS 2018.1 with the implementation of OPLS-AA/L force field (Hess et al., 2008;George et al., 1976).The system was solvated using the TIP3P water model in a cubic box with periodic boundary conditions, and counter-ions Na þ & Cl -were added to neutralize the system.Energy minimization was carried out using the steepest descent algorithm with a tolerance of 1000 kJ/mol/nm.The system was equilibrated by applying positional restraints on the structure using NVT (constant Number of particles, Volume, and Temperature) followed by NPT (constant Number of particles, Pressure, and Temperature) ensemble for 100 ps each.The temperature of 300 K was coupled by the Berendsen thermostat (Ryckaert et al., 1977) with the pressure of one bar.The equilibrated system was subjected to 100 ns of the production run with time-step integration of 2 fs.The trajectories were saved at every 2 ps and analyzed using Gromacs 2020.4.Furthermore, RMSD, RMSF, radius of gyration (Rg), SASA (Eisenhaber et al., 1995) and DSSP (Kabsch & Sander, 1983) were analyzed.

Principal component analysis
Principal components analysis (PCA) (Amadei et al., 1993) or essential dynamics (ED) observes the global collective motion of protein structure using molecular dynamics simulations.The movement of protein structure in a multidimensional space is identified by the most vital eigenvectors projection in Cartesian trajectory coordinates.The rotational and translational movements are removed from the simulation trajectory by constructing a covariance matrix of backbone C-a atoms.Cross-correlation for PCA was performed using gmx covar and gmx anaeig function.

Screening of deleterious VUS using evolutionary conservation of sequences
Evolutionary information about a protein is essential to understanding the variations which could be deleterious to its function.The evolutionary conservation of an amino acid highly affects the protein's overall structure and function.Therefore, the ConSurf server was utilized to test the potential pathogenicity of selected missense VUS by observing the degree of evolutionary conservation.ConSurf server calculates the conservation scores on a different scale from one to nine grades.The conservation scale in the range of 1 to 3 is considered variable, 4 to 6 is considered average, and the scale from 7 to 9 is considered conserved.Based on the ConsSurf results, VUS having a conservation score of 6 or above were selected for characterization (Figure 2).Out of 18, 16 VUS tested for highly evolutionary conservation Pro2532His, Asp2583Val, Pro2589Leu, Gly2596Val, Cys2605Tyr, Pro2608Leu, Trp2619Arg, Trp2619Ser, His2623Leu, Trp2626Arg, Trp2629Cys, Pro2639Phe, Cys2646Tyr, Cys2646Trp, Tyr2660Cys, and Asp2665Val (Table 2).These VUS are present in the highly exposed or buried positions, responsible for the structural and functional properties associated with the native protein.

Screening of potentially deleterious VUS based on the stability analysis
The 16 potentially pathogenic VUS obtained from 286 VUS were further screened to study structural stability using PremPS (Chen et al., 2020) and Dynamut (Rodrigues et al., 2018).The structural coordinates of modelled BRCA2, a-helical domain (2478-2670 amino acids) were submitted as an input and the list of 16 VUS.The PremPS server predicted 14 VUS out of 16 as deleterious with unfolding Gibbs free energy change (DDG > 0 kcal/mol).The Dynamut server predicted 13 VUS to be potentially destabilizing based on variation in Gibbs free energy (DDG < 0.0 kcal/mol).Hence, both the servers predicted 12 common missense VUS having the potential to destabilize the structure of native protein and, therefore, could be deleterious (Table 2).These 12 VUS Asp2583Val, Pro2589Leu, Cys2605Tyr, Pro2608Leu, Trp2619Arg, Trp2619Ser, Trp2629Cys, Pro2639Phe, Cys2646Tyr, Cys2646Trp, Tyr2660Cys, and Asp2665Val were further selected for molecular dynamics studies to understand the structural impacts of a missense mutation in the native protein structure.

Molecular dynamics simulations
In order to better understand the dynamic characteristics of the deleterious mutations, molecular dynamics (MD) simulations were explored to analyze the differences between native and mutant protein structures.BRCA2, a-helical domain (2478-2670 amino acids) and all the 12 VUS were simulated for 100 ns by Gromacs.The trajectory files were generated after the MD simulation, root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent-accessible surface area (SASA), and PCA analysis for the native and the mutant structures were carried out.MD simulations for five reported pathogenic variants Leu2510Pro, Leu2653Pro, Arg2659Gly, Ser2670Leu and Tyr2660Asp) and five reported benign variants Ile2490Thr, Arg2502His, Thr2515Ile, Leu2512Phe and Met2634Ile were also carried out (Supplementary Figure 3).

Structural stability analysis
To assess the conformational stability of the wild-type and missense VUS, the RMSD value of the protein backbone was calculated (Supplementary Figure 4).Notable differences were observed between the RMSD profile of the wild-type and 8 VUS (Pro2589Leu, Pro2608Leu, Trp2619Arg, Trp2619Ser, Cys2646Tyr, Cys2646Trp, Tyr2660Cys, and Asp2665Val) (Figure 3).The backbone RMSD profile of the wild-type a-helical domain showed two spikes at 12 and 22 ns and the structure attained stability after 35 ns of simulation.On the other hand, no stabilisation was observed in the RMSD profile of Trp2619Arg, Trp2619Ser, Cys2646Tyr, Tyr2660Cys and Asp2665Val.Higher fluctuations in the RMSD profile of missense variants indicate lesser stability of the mutants' protein structure conformations.Therefore, it can be concluded that the selected VUS are more flexible and conformationally less stable as compared to the wild-type.

Structural flexibility analysis
RMSF values were determined to evaluate the effect of VUS on the dynamic characteristics of the protein (Supplementary Figure 5).As observed from the RMSF profile, out of 12 VUS, 7 variants (Pro2608Leu, Trp2619Arg, Trp2619Ser, Cys2646Tyr, Cys2646Trp, Tyr2660Cys, and Asp2665Val) exhibited higher RMSF values compared to the wild-type (Figure 4).Comparative residual RMSF value calculated over trajectories of wild-type and VUS showed that the fluctuations were mostly present in the region of BRCA2 (2510-2550) and BRCA2 (2633-2648).Together, these observations suggest that the selected missense VUS affects the flexibility of the whole protein.

Structural compactness analysis
The radius of gyration (Rg) is defined as the mass-weighted root mean square distance of collected atoms from their centre of mass.Rg values determine the compactness of protein structure during simulation.Rg value was plotted for wild-type and all missense VUS (Supplementary Figure 6).The Rg value for the wild-type was found to be decreasing till 40 ns and thereafter, a spike was overserved at 66 ns followed by stabilization of Rg value at 1.78 nm.Out of 12 VUS, 7 variants BRCA2 Pro2608Leu, Trp2619Arg, Trp2619Ser, Trp2629Cys, Cys2646Tyr,  Tyr2660Cys, and Asp2665Val were found to have higher Rg value than the wild-type protein structure (Figure 5).Also, the time-averaged Rg values for these 7 variants were higher than the wild-type (Table 3).Rg profile clearly explains that the mutant proteins structures are more dynamic and less compact than the wild-type structure.

Sasa analysis
Solvent Accessible Surface Area (SASA) measures the solventexposed area in a protein structure.A lower SASA value indicates a compact structure, whereas the higher values of SASA reflect an open or diffused structure.SASA values were plotted for wild-type and all missense VUS (Supplementary Figure 7).Out of 12 VUS, 7 variants (Asp2583Val, Trp2619Arg, Trp2619Ser, Pro2639Phe, Cys2646Tyr, Tyr2660Cys, and Asp2665Val) were found to have higher SASA values than the wild-type (Figure 6).The time average SASA values for these VUS were also higher than the wild-type (Table 3).Higher SASA values of VUS indicate that the mutation has destabilized the mutant protein's hydrophobic core; therefore, mutant proteins are less compact.Furthermore, these simulations were repeated three time and the variants showing high RMSD, RMSF, Rg and SASA values were chosen for principal component analysis.Out of 12, only 5 variants, Trp2619Arg, Trp2619Ser, Cys2646Tyr, Tyr2660Cys and Asp2665Val, were predicted to have compromised structural integrity and conformational dynamics by altering the overall compactness of the BRCA2 a-helical domain (2478-2670 residues) (Supplementary Figure 8 and Supplementary Table 2).

Principal component analysis
Principal component analysis (PCA) is an extensively used method to predict the dynamic characteristics of proteins during molecular dynamics simulations.PCA was performed on simulation generated trajectories to identify large-scale collective motion of the BRCA2, a-helical domain and five variants Trp2619Arg, Trp2619Ser, Cys2646Tyr, Tyr2660Cys and Asp2665Val.The eigenvalues of the wild-type and potentially pathogenic variants were plotted against the corresponding eigenvector index for the first ten modes of motion (Figure 7a).Only the first few eigenvectors had larger values that played a significant role in the overall motion of wild-type and mutant protein structures.
Furthermore, projections of trajectories of eigenvectors 1 and 2 in phase space have been analyzed (Figure 7b-f).In all projection plots, the wild-type protein covered a smaller  region of phase space as compared to the Trp2619Arg, Trp2619Ser, Cys2646Tyr, Tyr2660Cys and Asp2665Val variants.
The larger region of phase space indicated that the mutants have more structural flexibility than the wild-type.Therefore, to our conclusion, the results of molecular dynamics simulations further aid to narrow down the variants which could be tested under in vitro and in vivo conditions.

Discussion
Here, we have correlated the findings observed from publicly available in silico tools along with molecular dynamics simulations.It can be stated here that in silico tools alone often predict an unacceptable proportion with erroneous results (Ernst et al., 2018).Therefore, we validated our workflow by  subjecting 13 known pathogenic and 5 known benign missense mutations to in silico servers.Known pathogenicmutations were s predicted to be pathogenic by 3 or more in silico servers (Supplementary Table 3).Furthermore, to make selection criteria more stringent, six dissimilar in silico tools having different algorithms to predict the pathogenicity associated with the variants were applied.Using multiple in silico tools to characterize VUS may provide some confidence to the prediction.However, the in silico tools are still only considered a single piece of evidence and require clinical correlation.Most of the VUS do not have any clinical data or family history.Therefore, we applied MD based approach to evaluate the structural dynamics.Recently, MD simulationbased approaches have shown to be helpful for the characterization of BRAC1 and BRCA2 VUS (Sinha & Wang, 2020;Sinha et al., 2022).We have further performed MD simulations thrice and compared the results obtained (Supplementary Figure 8-9).Based on multi-modal analysis, it can be concluded that five VUS missense substitutions in BRCA2 Cys2646Tyr, Asp2665Val, Trp2619Arg, Trp2619Ser and Tyr2660Cys could be considered for in vitro, structural and in vivo analysis to classify likely pathogenic/pathogenic.The potentially pathogenic Trp2619Ser variant has also been previously reported to be non-functional in homology-directed repair (HDR) assay (Richardson et al., 2021).The Asp2665 is reported to be one of the DSS1 binding residues, and therefore, its mutation to Valine could result in partial or complete loss of DSS1 interaction (Yang et al., 2002).Comparative in vitro and structural studies such as differences in secondary and tertiary structure of protein could further stablish the role of these potentially pathogenic mutations.Functional studies such as protein-protein interactions and HRD assay can further aid in exploring the mechanistic impact of these mutations.Overall, the study reported here will provide a scientifically rigorous and fast alternative to identify pathogenic missense variants associated with BRCA1/2.The amalgamation of in silico tools, advanced computational power, and protein structure information could prove to be a fast and reliable alternate for missense VUS analysis.The computational tools could identify potentially pathogenic variants, but their results should be correlated with clinical finding.Nevertheless, such analysis could help to understand the mechanistic insights of protein for basic as well as translational research.

Figure 1 .
Figure 1.Workflow to assess the pathogenicity of missense VUS identified in the h-BRCA2 a-helical domain.

Figure 2 .
Figure 2. Evolutionarily conserved regions of BRCA2 a-helical domain predicted by the ConSurf server.

Figure 7 .
Figure 7. Principal Component Analysis for wild-type BRCA2 a-helical domain (WT) and potentially pathogenic variants.(a) The eigenvalue for the first ten modes of motion of WT and variants, (b-f) Comparative Eigenvector projection profile of WT and potentially pathogenic variants, Trp2619Arg, Trp2619Ser, Cys2646Tyr, Tyr2660Cys and Asp2665Val respectively.

Table 2 .
Evolutionary conservation and mutation-induced changes in protein stability of BRCA2 VUS as predicted by ConSurf, PremPS and Dynamut servers.Variants highlighted in red were not chosen for further analysis.

Table 3 .
Time-averaged structural properties calculated for BRCA2 a-helical domain (WT) and missense variants.Variants highlighted in bold have RMSD, RMSF, Rg and SASA values above reported benign variants and hence, considered pathogenic.