Coexpression network analysis of platelet genes in sickle cell disease.

Abstract Platelets play important roles in vascular health. Activation of platelet may contribute to coagulation and inflammation. Evidence suggests circulating platelets are chronically activated in sickle cell disease (SCD) patients with steady state and further activated in vaso-occlusive crisis. However, the molecular basis of sickle platelet dysfunction remains obscure. Here, we used weighted gene coexpression network analysis combined with differentially expressed genes (DEGs) analysis to further investigate this basis. We found 57 DEGs were closely related to platelet dysfunction in SCD. Enrichment analysis showed that these 57 genes were mostly related to protein synthesis, adenosine triphosphate (ATP) synthase activity and inflammation, suggesting a hyperactivation status of platelets in SCD. We identified six hub genes from the 57 DEGs according to their Gene Significance value ranking, including CRYM, CCT6P1, SUCNR1, PRKAB2, GSTM3 and FCGR2C. Altogether, our results offered some new insight into platelet activation and identified novel potential targets for antiplatelet therapy in SCD.


Introduction
Sickle cell disease (SCD) is the most common inherited blood disorder, caused by inheritance from both parents of an altered β-globin chain gene of which one is at least sickle hemoglobin [1]. Treatment options for SCD are still limited. Many patients require chronic red blood cells (RBCs) transfusion regimens, which however may result in iron overload and alloimmunization [2]. SCD is characterized by painful vascular occlusions, chronic hemolysis and inflammation, resulting from complex interactions between platelets, sickled RBCs, endothelial cells and plasma proteins.
A hypercoagulable and inflammatory state is known to contribute to the occurrence of vascular occlusions in SCD. Platelets are thought to be the primary mediators of hemostasis and thrombosis and also play an important role in inflammation [3]. Circulating platelets of patients with SCD have been reported to be in a higher activation than those of healthy controls and this heightened reactivity may further increase in vaso-occlusive crisis [4]. However, the exact mechanism by which platelets are activated in SCD is not clear. In addition, there may be a role for antiplatelet therapies in reducing the incidence and consequences of vascular occlusions in SCD.
To investigate the gene expression network associated SCDrelated platelet activation and find the key genes, we used weighted gene coexpression network analysis (WGCNA) in the present analysis. WGCNA is a useful method to link clustered genes to phenotypic traits. Based on this method and a microarray dataset from Gene Expression Omnibus [4], we construct a gene coexpression network involved in platelet activation in SCD. From the constructed coexpression network, we further identified several key genes including CRYM, CCT6P1, SUCNR1, PRKAB2, GSTM3, FCGR2C. These novel hub genes can be potential targets for antiplatelet therapy in SCD.

Data Preprocessing and Differentially Expressed Genes Screening
Raw data are available at the Gene Expression Omnibus database under the accession number GSE11524 (GSM290396-GSM 290425) [4]. Data for reanalysis comprise 18 SCD samples and 12 control samples (Table I). The samples were collected in steady-state condition, and none of the controls or patients were taking antiplatelet medication [4]. We normalized the data using the Robust Multi-array Average (RMA) method in R package "affy" [5]. The R/Bioconductor package "limma" was used to screen differentially expressed genes (DEGs) [6]. Genes with a false discovery rate (FDR) of below 0.05 were considered differentially expressed.

Weighted Gene Coexpression Network Analysis
By calculating the median absolute deviation (MAD), we selected the top 5000 most variant genes to generate a weighted coexpression network using R package "WCGNA" [7]. To weight highly correlated genes, the soft thresholding power (β) was set as 7, and the minimal module size was set as 30. To define clusters of genes in the dataset, the adjacency matrix was used to calculate the topological overlap matrix (TOM), which shows the degree of overlap in shared neighbors between pairs of genes in the network. 1-TOM was used as the dissimilarity measure for hierarchical clustering and module detection. Modules of clustered genes were then selected using the Dynamic Tree Cut algorithm within WGCNA. To identify modules that are significantly associated with the measured clinical traits, expression profiles of each module were summarized by the module eigengene (ME) and the correlation between the module and the trait was calculated. The associations of individual genes with SCD phenotype were quantified by Gene Significance (GS) value. For each module, module membership (MM) was defined as the correlation of the ME and the gene expression profile.

Gene Ontology and KEGG Pathway Enrichment Analysis
Gene Ontology (GO) covers three domains: molecular function (MF), cellular component (CC) and biological process (BP). GO and KEGG pathway enrichment analyses were done by Enrichr (http://amp.pharm.mssm.edu/Enrichr/) [8], which is a web-based enrichment analysis tool providing various types of visualization summaries of selected genes.

Protein-Protein Interaction (PPI) Network Enrichment Analysis
Secreted genes were submitted to STRING version 10.5 (https:// string-db.org/) to predict the interactions among proteins encoded by those genes. The PPI network was subsequently visualized by Cytoscape (version 3.5.1.) [9].

Data Normalization
Gene expression data GSE11524 [4] were downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). The detailed information regarding the samples is tabulated in Table I. Raw data before and after normalization were shown in Figure 1. The horizontal black lines represent the median gene expression value for each sample. Compared to the raw data, the normalized data showed uniform distribution of the expression levels and were appropriate for further analysis.

Weighted Gene Coexpression Network Analysis
To investigate the relationship between gene expression changes and disease status, we applied WGCNA. The normalized gene expression data should be filtered prior to WGCNA since lowexpressed or non-varying genes usually represent noise. Filtering genes by differential expression will completely invalidate the scale-free topology assumption and cause the failure of WGCNA. Filtering genes/probes by variance have been used in most WGCNA studies, so we filtered 5000 probes by variance using the MAD method. We first clustered these 30 samples based on their Euclidean distance and found two obvious outliers (samples GSM290405 and GSM290419) (Supplementary Figure 1). The two outlying samples were removed, leaving 28 samples (SCD, n = 17; Control, n = 11) for further analysis. The sample dendrogram showed the hierarchical cluster of the remained 28 samples (Figure 2(a)). A total of 5000 probes across the 28 samples were hierarchically clustered based on topological overlap (Figure 2(b)). WCGNA identified 12 modules ranging in size from 104 to 1312 probes with a median of 231.5 probes (Supplementary Table 1). Altogether, 3688 probes were assigned to modules and 1312 probes were not classified into any modules  (designated as gray) (Figure 2(b)). The ME represented the expression of all probes classified into that module. We then assessed ME relationship to the SCD phenotype and identified one upregulated module (MEturquoise) that was significantly (p < 0.05) correlated with SCD (Figure 2(c)). It is also demonstrated that GS.SCD and MM were strongly correlated in the MEturquoise module, suggesting a high correlation between the SCD phenotype and genes in the MEturquoise module (Supplementary Figure 2a). In contrast, genes in the MEgray module showed only weak correlations (Supplementary Figure  2b). To visualize the weighted network, we plotted a heatmap that depicted the Topological Overlap Matrix (TOM) among those probes (Figure 2(d)). Next, we quantified module similarity by eigengene correlation to explore the relationships among modules ( Figure 3). The dendrogram indicated that the MEpink module was highly related with the MEturquoise module, suggesting that probes/genes in these modules may have similar biological functions.

Enrichment Analysis of DEGs in MEturquoise Module
The MEturquoise module contained 848 Affymetrix probes. After removal of duplicates, 649 human genes remained. Genes filtered via WGCNA method were more likely to be of functional importance. These 649 genes were closely related to SCD phenotype and ranked according to their GS.SCD value (Supplementary  Table 1). However, not all of them were differentially expressed in the present dataset. To screen genes which were differentially expressed between SCD and normal control in the present dataset, we used "Limma" package in R/Bioconductor software with the cutoff criteria of FDR <0.05. We finally obtained 383 DEGs, including 265 up-regulated and 118 down-regulated DEGs (Figure 4(a) and Supplementary Table 2).
Next, we merged DEG analysis (Limma) results and the previous WGCNA results to identify genes which were both differentially expressed and also tightly correlated with SCD. Finally, 57 genes were obtained for the enrichment analysis (Figure 4(b) and Supplementary Table 3). Results of GO and KEGG pathway enrichment analysis of the 57 genes were shown in Figure 5. The GO and KEGG pathway categories identified were mostly related to the granule membrane or ribosome, the function of endoplasmic reticulum, ATP synthase activity and the phagosome pathway. These categories are primarily associated with the protein synthesis and microvesicle formation process, suggesting that there is a hyperactive protein synthesis and microvesicle formation status in SCD platelets compared with the healthy control.  Indeed, circulating platelets are significantly activated even under steady conditions in SCD patients and are further activated in vaso-occlusive crisis [10]. Upon activation, platelets release plenty of proteins and non-coding RNAs to plasma from their granules [11], which can aggravate the inflammation and thrombosis in the disease.
We also conducted a PPI network enrichment analysis for the above DEGs. The 57 genes were loaded into String database for analysis and a network consisted of 15 nodes connected via 17 edges was constructed (Figure 6). In a PPI network, a node represents a protein and an edge represents an interaction between proteins. Obviously, the network can be divided into three subnetworks (the red dashed line, Figure 6), with no interaction between each one. Genes in different subnetworks showed distinct biological functions/process, including functions related to ribosome/protein synthesis (Figure 6(a)) and ATP synthase activity (Figure 6(c)), and pathways related to inflammation (Figure 6(b)). In general, the whole PPI network reflected three aspects of platelet activation in SCD and could contribute to the development of innovative therapies. Perturbation of this SCD-causing network is probably able to improve the disease.

Hub Gene Identification
GS is a WGCNA parameter defined by the correlations between individual genes and clinical trait (here is SCD). Genes with their GS value ranked at top 10% in the 57 DEGs were defined as hub genes, including CRYM, CCT6P1, SUCNR1, PRKAB2, GSTM3, FCGR2C (Table II). Notably, among the six hub genes and even among all 57 DEGs, FCGR2C was the most significant DEG with the highest fold change value.

Discussion
SCD is a complex disease with diverse clinical complications. WGCNA is a systems biology method for revealing the higherorder relationships of genes and has been widely applied in studying complex diseases [12]. Key genes screened via WGCNA were more likely to be of functional importance. In the present study, we combined the WCGNA method and DEG analysis to investigate the pathological platelet activation in SCD and its molecular basis.
We identified 12 modules and found that the module MEturquoise consist of 649 genes was significantly correlated with SCD phenotype. After integrated with limma results which identified 384 DEGs, we finally obtained 57 genes. Enrichment analysis indicated that these 57 genes were mostly related to protein synthesis, ATP synthase activity and inflammation. Consistent with previous studies [13], our results revealed a hyperactive protein synthesis status in SCD, which could partially reflect a hyperactive platelet activation status. Activated platelets may promote the adhesion of sickle erythrocytes to human vascular endothelium and may also contribute to thrombosis and pulmonary hypertension in SCD [14]. Activated platelets may also release microparticles into the plasma [15]. Our results also showed that DEGs were enriched in the granule membrane, endoplasmic reticulum function and the phagosome pathway, suggesting a robust production of microparticles in SCD platelets. We also constructed a SCD-causing network from the 57 genes, which showed the interaction between these genes and could also reflect SCD-related platelet activation more directly.
According to GS ranking, we defined several key genes including CRYM, CCT6P1, SUCNR1, PRKAB2, GSTM3, FCGR2C. Among these genes, FCGR2C was the most significant DEG. FCGR2C is a kind of Fc-gamma receptors (FcγRs) encoding FcγRIIc (CD32c) [16]. FCGR2C appears to be the product of an unequal crossover of the FCGR2A and FCGR2B genes encoding the activating FcγRIIa (CD32a) and inhibitory FcγRIIb (CD32b), respectively [17]. FCGR2C gene is located on chromosome 1 and contains eight exons, which are highly homologous to exons 1-4 from FCGR2B and exons 5-8 from FCGR2A. FCGR2C are expressed on various cell types, such as RBCs, platelets, macrophages, B lymphocytes and so on. More recently, it has been found FCGR2C mutation is associated with protection from RBC  alloimmunization in SCD [18]. This mutation results in either an open reading frame (FCGR2C-ORF) or the more common stop codon (FCGR2C-Stop). SCD patients with the FCGR2C-ORF polymorphism have more than a threefold lower risk for RBC alloimmunization compared with patients without this mutation [18]. Mutations of FCGR2C determine the functional expression of this gene. Nonetheless, the biological function of FCGR2C in SCD or platelet activation remains unknown. Altered expression levels of FCGR2C on the cell surface can alter the balance between activating and inhibitory signals and therefore disrupting the homeostasis of many inflammatory responses. Since it is well known that platelets play a critical role in inflammation and various studies have shown that platelets have contributory roles in vasoocclusions and provide a link between thrombosis and inflammation in SCD [10], we suspected that the significant upregulation of FCGR2C in SCD platelets in the present study was likely to contribute to the inflammatory and coagulable status in SCD. GSTM3 belongs to the glutathione S-transferase (GST) family, which is known for their ability to catalyze the conjugation of the reduced form of glutathione to xenobiotic substrates for the purpose of detoxification [19]. Several previous studies have shown that GST polymorphisms were closely related to manifestations of SCD [20,21]. Recently, it has been reported variations of GSTM3 were associated with the phenotypic heterogeneity in SCD [22]. Another two identified hub genes, SUCNR1 and PRKAB2 (also known as AMPK beta 2), are also found expressed in platelet and associated with platelet activation and aggregation [23][24][25].
Overall, these hub genes could possibly play important roles in platelet activation in SCD. Our analysis was based on a public dataset derived from a previous study [4]. The raw data were available at the Gene Expression Omnibus database. Studies have shown that hydroxyurea or transfusion therapy can affect platelet function in various ways in SCD [26,27]. Some clinical information about hydroxyurea or transfusion therapy of patients was not found in the original study. However, it is noteworthy that none of the controls or patients were taking antiplatelet therapy in the original study and the samples were collected in the steady-state condition [4]. Therefore, the putative hub genes identified in the present study were meaningful and helpful for future study.
The most important limitation lies in the fact that the lack of enough resources to validate our findings. Further experimentation needs to be carried out in order to explore in detail the role of these hub genes in the pathology of SCD. Notwithstanding the limitations, this work offers valuable insights into the potential molecular bases associated with platelet activation in SCD.