Dynamics of the personalities of PCSK9 on missense variants (rs505151 and rs562556) from elderly cohort studies in Brazil

Abstract The Proprotein Convertase Subtilisin/Kexin Type 9 (PCSK9) promotes the degradation of the low-density lipoprotein receptors (LDLR). Gain-of-function (GOF) variants of PCSK9 significantly affects lipid metabolism leading to coronary artery disease (CAD), owing to the raising the plasma low-density lipoprotein (LDL). Considering the public health matter, large-scale genomic studies have been conducted worldwide to provide the genetic architecture of populations for the implementation of precision medicine actions. Nevertheless, despite the advances in genomic studies, non-European populations are still underrepresented in public genomic data banks. Despite this, we found two high-frequency variants (rs505151 and rs562556) in the ABraOM databank (Brazilian genomic variants) from a cohort SABE study conducted in the largest city of Brazil, São Paulo. Here, we assessed the structural and dynamical features of these variants against WT through a molecular dynamics study. We sought fundamental dynamical interdomain relations through Perturb Response Scanning (PRS) and we found an interesting change of dynamical relation between prodomain and Cysteine-Histidine-Rich-Domain (CHRD) in the variants. The results highlight the pivotal role of prodomain in the PCSK9 dynamic and the implications for the development of new drugs depending on patient group genotype.


Introduction
The Proprotein Convertase Subtilisin/Kexin Type 9 (PCSK9) negatively regulates the low-density lipoprotein receptors (LDLR) through its lysosomal degradation (Cunningham et al., 2007).Variants that promote gain-of-function (GOF) of PCSK9 affects significantly lipid metabolism by raising the plasma low-density lipoprotein (LDL), which entails the development of coronary artery disease (CAD) (Seidah et al., 2014).The GOF phenotype is linked to familial hypercholesterolemia (FH), a genetically inherited disease that is responsible for early CAD events (Berberich & Hegele, 2019).Meanwhile, the PCSK9 variants have gained clinical interest due to the discovery of loss-in-function (LOF) variants that were protective against CAD (Berge et al., 2019).As well, that discovery started a wide field for targeting PCSK9 as lipid-lowering therapy for FH patients (Costet et al., 2008).Hence, the PCSK9-variants assessment through whole-genome sequencing (WGS) has pivotal importance for FH understanding and control (Berberich & Hegele, 2019).The populational variants represented by whole-genome sequencing (WGS) orientate functional studies, drug discovery strategies, and precision medicine (Vicini et al., 2016).
Large-scale genomic studies have been conducted worldwide to provide the genetic architecture of populations for the implementation of precision medicine actions (Rocha et al., 2020).Regardless of the noteworthy advances in genomic studies, non-European populations were underrepresented in public genomic data banks (Popejoy & Fullerton, 2016).In the Brazilian context, the ABraOM (Brazilian genomic variants) repository contains WGS data from the SABE study (Naslavsky et al., 2017), a sample of 1171 elder individuals from São Paulo, Brazil's largest city.The occurring variants of this study may represent the Brazilian main frame of variants (Naslavsky et al., 2022) and might contribute to the understanding of more prevalent PCSK9 variants in São Paulo city and Brazil.Additionally, the composition of populational variants might affect the drug binding mode and impair therapy efficacy (Wan et al., 2021).Thereby, the continuous evaluation of variant structural features is required, either to overcome these possible issues or unravel the complex structural features of proteins through their variants.
The PCSK9 presents a self-inhibited proconvertase structure (Hampton et al., 2007), which compromises a prodomain (PD: 31-152) containing a segment 31-50 that is a transient alpha helix (THA), a catalytic domain (CD: 153-404), and a 3-module C-terminal domain (CM1: 457-530, CM2: 534-601, and CM3: 608-692) linked by an exposed hinge region (405-452) (Cunningham et al., 2007).The catalytic domain is responsible for binding the epidermal growth factor A (EGF-A) from LDLR (Sundararaman et al., 2021).Some of the critical mutations that lead to the FH occur in this domain, promoting increased affinity and posterior degradation in the late lysosome due to the lower pH (Cunningham et al., 2007).On the other hand, the C-terminal does not interact directly with EGF-A, but a set of variants in that domain would increase LDLR degradation (S� anchez-Hern� andez et al., 2019) and interfere with the PCSK9 secretion (Saavedra et al., 2012).There are mainly two clusters of variants occurring in the C-terminal structure: The GOF variants that lie on the CM1 loops and beta-sheets (R469W, R499H (S� anchez-Hern� andez et al., 2019), E482G, A514T, F515L, and A522T); The LOF CM3 variants (P616L, S668R, and C679X) (Saavedra et al., 2012).Nonetheless, the CM2 is unique among the C-terminal modules that are required for PCSK9 extracellular activity (Saavedra et al., 2012).Connecting the CD to the Cterminal, the hinge domain plays a critical role in PCSK9 folding and secretion (Deng et al., 2020).Previously reported, the hinge domain variant R434W results in a misfolded PCSK9 with �70% decreased extracellular pathway activity (Dubuc et al., 2010).An extended observation showed that R424W leads to a lack of intracellular activity on the LDLR (Saavedra et al., 2012), suggesting that the hinge domain is critical for PCSK9 active fold both intracellular and extracellular.
As mentioned before, the public genomic databanks underrepresent the Brazilian population due to the singular admixed characteristics (Rodrigues de Moura et al., 2015).In this way, the assessment of the tangle PCSK9 mechanism by its domain synergies is the paramount importance to getting insights into the genetic disease and guiding new high-cover drug discovery approaches.Finally, we aimed to select the more frequent PCSK9 variants from the ABraOM databank and explore their structural behavior through molecular dynamics simulations, precisely, the molecular collective motions, domain interchange, and interactions between PCSK9 and EGF-A domain.

Variant selection
The search for the most prevalent PCSK9 variants was done on the web server ABraOM (https://abraom.ib.usp.br/).We searched for the key gene 'PCSK9' in the Genome reference SABE-WGS-1171 (hg38) using a quality filter for GATK pass and Exonic or Splicing/CDS variants only.The frequencies were obtained as a standard output of the ABraOM request, representing the ratio between allelic number and allelic count.Therefore, the higher frequency variants were selected for molecular dynamics analysis.

Structure preparation
The variants rs505151 and rs562556 were created by Maestro edition tool.The WT and variants structures were prepared by adding hydrogen atoms and fixing missing side chains using the Biopolymer tool of Sybyl-X 2.0 (TRIPOS Inc.) and PROPKA (Bas et al., 2008) code (Github code: https://github.com/jensengroup/propka).

Molecular dynamics simulations
The prepared PCSK9 variants and WT structures were simulated through Molecular Dynamics (MD).The engine used was the software Desmond (Bowers et al., 2006) with the OPLS-2005 force-field (Banks et al., 2005).The system was constructed with a protein-ligand complex, a predefined solvent water model (TIP3P (Jorgensen et al., 1983)), charge neutralizing counterions (Na þ or Cl -), and periodic boundary conditions (PBC).The system box was constructed considering at least the distance of 13 Å from the protein atoms to the box edges.Short-range coulombic interactions were set at a cut-off value of 9.0 Å and the integration time step calculated every 2 fs, while long-range coulombic interactions were estimated using the Smooth Particle Mesh Ewald (PME) method (Darden et al., 1993).Each system was subjected to at least 1 ls and all trajectory and interaction data are available on the Zenodo repository (DOI: 10.5281/zenodo.6612263).
Atomic interactions, distances, and secondary structure elements were determined using the Simulation Event Analysis pipeline as implemented in Maestro 2019v.4(Schr€ odinger LCC).The criteria for protein-ligand H-bond are 2.5 A˚ distance between the donor and acceptor atoms (D-H���A); �120 � angle between the donor-hydrogen-acceptor atoms (D-H���A); and �90 � angle between the hydrogen-acceptorbonded atoms (H���A-X).The protein-water and water-ligand H-bonds were considered when 2.8 A˚ (D-H���A); �110 � (D-H���A); and �90 � (H���A-X).Non-specific hydrophobic interactions are defined by the presence of a hydrophobic side chain within 3.6 A˚ of the ligand's aromatic or aliphatic carbons.p-p interactions are recorded when two aromatic groups are stacked face-to-face or face-to-edge and within 4.5 A˚ of distance.The MD trajectories were visualized in PyMol v.2.2.3 (Schr€ odinger LCC, New York, NY, USA).The distance and angles analyses were used to assess the movement of structures.The distance of selected regions was calculated by the center of mass distance using the script (trj_asl_distance.py)available on Schrodinger package 2019v.4.

Principal component analysis and essential dynamics
The Principal Component Analysis (PCA) was used to study the main features of WT PCSK9 against its mutations.The backbone of domains was extracted and centered to the initial frame of WT simulation (trj_selection_dl.py and trj_center.py)from Schrodinger package 2019v.4.The simulations were aligned and merged using gmx_trjconv from GROMACS [22] 2022.The covariance matrix was calculated by gmx_ covar and the eigenvectors were analyzed through gmx_ anaeig.

Perturb response scanning (PRS) analysis
The PRS analysis was carried out by the Python package ProDy v.2.0 (Zhang et al., 2021).The PRS matrix was generated by the function calcPerturbResponse (Atilgan & Atilgan, 2009).The function was fed using the covariance matrix output from essential dynamics analysis.The PRS analysis was generated to PCSK9 WT and variants individually.
Given the importance of EGF-A affinity in the GOF phenotype (Bottomley et al., 2009), we investigated the interactions between PCSK9 and EGF(A) domain in our simulations.Our results show that PCSK9 interacts with the EGF(A) domain more frequently in variant missense simulations than in WT simulations (Figure S3A).We also recorded the total number of hydrogen bonds for each frame (Figure S3B) and observed that the variants had higher fluctuations in the number of hydrogen bonds compared to WT.We further investigated these fluctuations by analyzing the center of mass distance between EGF(A) and PCSK9 (Figure S3C) and the Root Mean Square Fluctuation of EGF(A) domain (Figure S3D).
PCA revealed that PC1 on prodomain captured 58.16% of the total variance of simulations (Figure 2A).This amount of variation suggested an extensive conformational variation of the WT prodomain than p.(V474I) and p.(G670E) (Figure 2A.cyan and purple, respectively).Further, PC2 showed a conformational space overlap among simulations represented by 20.95% of total variation (Figure 2B).
We found that the segments 31-50 from the WT prodomain showed more than 70% secondary structure prevalence when compared to the mutated PCSK9 (Figure 2C).Interestingly, the TAH segment was more labile in the WT simulations (Figure 2D).To explore the structural features involved, representative frames of the simulations were selected from PCA clusters (Figure 2E-G).In all those frames we notice interdomain contacts (prodomain-CHRD).The most important interaction was seen for R469 in WT and R476 in both variants (Figure 2H-J).Hence, the panel of interactions between prodomain and CHRD was shown for hydrogen bonding (Figure 2K) and water-bridge bonding (Figure 2L) across the systems.The variants showed interdomain interactions in higher extension when compared to WT.

PRS analysis showed the effectiveness of mutated residues and the overall residues
To address the dynamical interchange of prodomain and CHRD, a multidomain essential dynamics analysis was generated to perform the PRS analysis.The PRS is based on a linear response theory where the perturb forces applied via covariance matrix could determine the effectors (by effectiveness) in allosteric signal transduction (Dutta et al., 2015).Therefore, the PRS is useful to determine how single residues or regions can affect protein conformational changes.Here, we applied that analysis to assess qualitatively how the variant residues interfere with the overall dynamics of PCSK9.Firstly, we sought the PRS effectiveness caused by residues 474 and 670 among the systems (Figure 3A-D).Given this, the perturb of WT residue position 474 (WT: V474) acts as a prodomain THA segment effector (Figure 3A) while the perturb of variant 474 (Variant: I474) does not (Figure 3B).Furthermore, the perturb of WT residue position 670 (WT: G670) (Figure 3C) presented more effectiveness on prodomain residues than variant 670 (Variant: E670) (Figure 3D).These two variants showed less perturb in an important region of prodomain when compared to WT residues.Further, the overall PRS analysis of the prodomain (chain P) against the catalytic and CHR domains (chain A) identified a chain A region that acts as a direct effector for the prodomain (Figure 3E).The effectiveness of these regions was stronger in the WT simulations, suggesting that these CHRD mutations could disperse and diminish the allosteric control of the prodomain through chainA-chainP interface regions.
The crucial regions of chain A were identified as: I. Two helices (a6, a7) and two loop segments (206)(207)(208)(209)(210)(211)(212)(315)(316) from the catalytic domain (Figure 3E and F), and II.The beta sheets from CM1 (Figure 3E and G).Region I of WT showed high effectiveness to the prodomain residues mainly in the segment 50-152, while region I of variants showed lower effectiveness.Region II of WT affected mainly the TAH segment, while region II of variants presented a low effect on TAH residues.Despite this, region II from variants presented a diffuse effectiveness behavior across prodomain residues, indicating that the variants could perturb the allosteric regulation of CHRD.

Unraveling the affinity of EGF(a), interdomain interactions, and dynamics: Insights into the relationship between variants and pathogenicity
The ABraOM databank variants V474I and G670E were reported previously among other populations (Evans & Beil, 2006;Ferreira et al., 2020;Gai et al., 2021;Li et al., 2020;Mohamed et al., 2021).The V474I variant is associated with raised circulating LDL, blood glucose, body mass index (BMI), mean platelet volume (MPV), and red blood cell distribution width (RDW), suggesting pleiotropic effects more than affecting lipid metabolism (Gai et al., 2021).Therefore, the p.(V474I) variant is in the second beta-sheet of CM1, which is connected to the hinge loop (Figure 1).The CM1 is a module that was suggested to control allosterically the hinge domain (Deng et al., 2020).As mentioned before, the CM1 region compromises a well-described set of GOF variants.
A meta-analysis study has revealed that the G670E variant is associated with increased circulating levels of triglycerides and LDL, as well as a higher risk of cardiovascular disease (Qiu et al., 2017).However, in East Asian populations, the G670E variant is associated with extremely low levels of LDL-C, even among individuals with common variants in both APOB and PCSK9 genes (Lee et al., 2017).This suggests that variants in APOB may offer better protection than PCSK9 variants in this scenario.Additionally, in elderly populations in  Brazil, certain characteristics related to miscegenation may have been misunderstood compared to other populations around the world.
An increased affinity to the LDLR's EGF(A) is pointed to as the main factor of GOF phenotype such as seen in D374Y classic mutation (Bottomley et al., 2009;Mousavi et al., 2011;Pandit et al., 2008).However, the frequencies of interactions observed in the N-terminal region of the EGF(A) domain in variant systems alone were insufficient to account for GOF phenotype exhibited by these variants.In summary, the results indicate an increase in the flexibility of the EGF(A) Nterminal region, which has resulted in changes in the distribution of hydrogen bonds, rather than an increase in the number of interactions.
We found an essential dynamics difference between WT and mutated proteins, mainly related to the prodomain and CHRD interchange.The segment THA from the prodomain could be the key to understanding the intricate relationships between PCSK9 domains.In this way, we hypothesized that the formation of alpha helix is a transient event that is triggered by medium content and THA lability.Accordingly, the work of Sarkar et al. (2020) demonstrated the formation of the transient alpha helix could be triggered by the increase of n-dodecyl phosphocholine (DPC) micelles in the medium and it is dependent on the integrity of the THA segment.Additionally, we propose that the lability behavior contributes to the alpha-helix formation and this event is controlled allosterically by the other domains.As seen in the PRS analysis (Figure 3E), the CHRD mutations could interfere with the allosteric regulation of the prodomain, promoting a decrease in the alpha helix formation.To this extent, the TAH formation could be the central key that allows LDL particles to bind on PCSK9 and promote its regulation (Kosenko et al., 2013).Some CHRD mutations could diminish (R469W and F515L) or abolish (R496W) the affinity of PCSK9 by LDL particules (Sarkar et al., 2020).It has been previously reported that more than one-third of circulating human PCSK9 is bound to LDL-C, and this fraction is predominantly present in a less-active monomeric form of PCSK9.Conversely, the unbound fraction can self-associate and generate a multimeric form of PCSK9, which has a higher affinity for LDLR and is more efficient in promoting its catabolism (Fan et al., 2008).In this context, the relationship between the prodomain and LDLR binding may represent the underlying mechanism of CHRD mutations, which desensitize PCSK9 to LDLR binding and favor a multimeric state over a monomeric state.Moreover, these findings have implications for drug discovery, particularly regarding the interdomain interactions.For instance, antibodies targeting the CM1 region have been shown to reduce LDL-C levels (Schiele et al., 2014), and considering the results of this study, this could be attributed to the loss of THA flexibility and immobilization of region II (Figure 3G).

Conclusion
In conclusion, we used the molecular dynamics simulations and PRS to extract regions that might exhibit unique contributions to prodomain dynamics.PRS analysis showed that the missense variants on PCSK9 reveals the conformational changes that could affect the PCSK9 structure than WT.Hence, it allows for the understanding of the dynamics of important CHRD variants found in Brazilian elderly people and might propose new drug targets for drug discovery based on genotype group.The results highlighted the pivotal role of prodomain in the PCSK9 dynamic and the implications for the development of new drugs depending on patient group genotype.

Figure 2 .
Figure 2. PCSK9 prodomain dynamics and interchange with CHRD.(A, B) represent the density of displacement along eigenvectors 1 and 2. The complete data of PCA is shown in Figure S2 (C) The secondary structure elements (SSE%) along with the simulation for the segment TAH. (D) The distances of TAH from prodomain a-2 calculate by the Center of Mass.(E, F, G) Representative structure of PCSK9 and its variants selected by PCA clusters of prodomain.A highlight of the TAH structure shows the interdomain contact.(H, I, J) Show the principal interactions between prodomain and CHRD that match with figures E, F, and G, respectively.The prevalence of interdomain interactions is shown in Figure S3.

Figure 3 .
Figure 3.The effectiveness of mutated residues and the overall effectiveness of the prodomain.(A, B) The pair of effectiveness for the WT and V474I variant for position 474, respectively.(C, D) The pair of effectiveness for the WT and G670E variant for position 670, respectively.(E) The PRS matrix of chain A (catalytic domain and CHRD) against prodomain.The highlights show the regions of more effectiveness in the prodomain and the related structures from chain A. The full PRS matrix is shown in Figure S4.(F, G) Represent the structure of the first and second highlights of Figure E, respectively.The secondary structures nomenclature index used in these images is represented in Figure S5.