Analysis and identification of ferroptosis-related genes in ulcerative colitis

Abstract Background Previous studies have shown that ferroptosis is associated with the pathogenesis of ulcerative colitis (UC). Therefore, this study aimed to identify key ferroptosis-related genes (FRGs) associated with the diagnosis of UC. Methods UC-related expression datasets were downloaded from the Gene Expression Omnibus (GEO) database. First, Weighted Gene Co-expression Network Analysis (WGCNA) was used to identify UC-related genes (UCRGs). Differentially expressed genes (DEGs) between normal and UC groups were screened in GSE87466, and DEGs were subjected to an intersection analysis with FRGs and UCRGs to obtain ferroptosis-related DEGs (FR DEGs). Then a protein-protein interaction (PPI) network was constructed for FR DEGs. The hub genes were extracted based on the degree, Maximum Neighborhood Component (MNC), closeness, and Maximal Clique Centrality (MCC). Biomarkers with diagnostic values were screened by support vector machine (SVM) and the least absolute shrinkage and selection operator (LASSO) algorithms. Next, the infiltration of immune cells was compared between UC and normal groups, and the correlation between different immune cells and diagnostic genes was analyzed. The biological functions, classical pathways, and intermolecular interaction networks of diagnostic genes were characterized utilizing ingenuity pathway analysis (IPA). Finally, a TF-mRNA network was constructed and potential small-molecule compounds were screened. Results Thirty-six FR DEGs were obtained, and these were enriched in biological processes such as positive regulation of cytokine production, cytokine-mediated signalling pathway, long-chain fatty acid-CoA ligase activity, etc. Among 18 hub genes, five genes (ALOX5, TIMP1, TNFAIP3, SOCS1, DUOX2) were captured with diagnostic values for UC, and they displayed significant differences between UC and normal groups. Sixteen immune cell infiltrates were significantly different between UC and normal groups, such as activated dendritic cells and resting dendritic cells. TNFAIP3 and ALOX5 were positively correlated with neutrophils, and TIMP1, SOCS1, ALOX5, and DUOX2 were negatively correlated with M2 macrophages. IPA showed that diagnostic genes were related to 43 function modules and activated 17 pathways. The constructed TF-mRNA regulatory network comprised three diagnostic genes and 17 differentially expressed TFs. Potential small-molecule compounds including helveticoside and cymarin were identified. Conclusion Our findings yielded several promising FRGs for UC, providing a scientific reference for further studies on the pathogenesis of UC.


Background
Ulcerative colitis (UC) is an inflammatory bowel disease [1].The symptoms are related to the frequency and severity of the disease.Bloody diarrhea is the most common early symptom, while others include rectal urgency, tenesmus, mucopurulent exudate, nocturnal defecation, crampy abdominal pain etc [2].Approximately 15% of patients experience an aggressive course.Some patients develop dysplasia and colorectal cancer (CRC) [3].Studies have shown that individuals with UC have a 2-3 times higher risk of developing CRC than the general population [4].However, the pathogenesis of UC is not fully understood.Previous research has shown that environmental factors, dysfunctional immune responses, genetic susceptibility, and intestinal epithelial barrier defects may underlie UC's pathogenesis [5,6].Therefore, exploring the etiological factors and pathogenesis of UC is of significance for the diagnosis and treatment of the disease.
Currently, the meaning that cannot be ignored of ferroptosis inhibitors in models of UC was revealed [7].Xu et al. showed that, in UC, phosphorylated NF-κBp65 inhibited the ferroptosis of intestinal epithelial cells (IECs) mediated by endoplasmic reticulum stress through the interaction with elF2α [8].And meanwhile, Chen et al. suggested that targeting C/EBPβ-PAK6 and Nrf2/HO-1 axis-mediated ferroptosis may be a potential therapeutic approach for UC [9,10].On the other hand, a variety of machine learning algorithms have been employed to identify potential individual or gene-based diagnostic models for the diagnosis of UC based on the known ferroptosis-related genes (FRGs), such as STAT3, LCN2, MUC1, and TIMP1 [3,[11][12][13].
Considering the importance of mucosal biopsy in the early diagnosis of UC, and inspired by the above research, we further performed bioinformatics analysis to obtain potential ferroptosisrelated biomarkers with diagnostic value for UC, and the gene-based nomogram was constructed for clinical utilisation.Besides, the transcription factor (TF) and small molecule compounds targeting key biomarkers were predicted to provide new perspectives of UC research.

Data extraction
UC-related datasets (GSE87466, GSE75214, and GSE38713) were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/).The GSE87466 dataset containing 21 normal and 87 UC samples was used as a training set, and the GSE75214 dataset containing 11 normal and 97 UC samples was used as an external validation set.Besides, considering the imbalance of the patients with UC and controls, GSE38713 which contains 13 normal and 15 UC samples was employed to further verify gene expression.Moreover, 264 FRGs were obtained from the FerrDb database (http://www.zhounan.org/ferrdb/).

Identification of UC-related genes
Weighted gene co-expression network analysis (WGCNA) was used to identify UC-related genes (UCRGs).First, the samples were clustered to remove outliers, and the soft threshold (β) of data was screened to ensure that the interactions of genes conformed to scale-free distribution to the maximum extent.Thereafter, the adjacency matrix was established and converted into a topological overlap matrix (TOM), and the gene dendrogram and module color were obtained based on the degree of dissimilarity.The genes of modules with the highest correlation coefficients were considered as UCRGs for subsequent analysis.

Identification and Functional analysis of ferroptosisrelated differentially expressed genes (FR DEGs)
The differences in mRNA expression levels between UC and normal groups in the GSE87466 dataset were compared using the "limma" package in R (version 3.52.4)with p < 0.05 and |log 2 (fold change)| >0.5 as the threshold criteria [14].The volcano map and the heatmap of up-and down-regulated differentially expressed genes (DEGs) were plotted to understand the overall distribution of DEGs.The intersection of the FRGs with the DEGs and UCRGs was obtained using the Venn tool to acquire.Finally, to assess the biological functions of the FR DEGs, the "clusterProfiler" package in R (version 4.4.4) was utilized for functional analysis of the FR DEGs, comprising Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.GO categories included biological process (BP), molecular functions (MF), and cellular components (CC).

Construction of a protein-protein interactions (PPI) network of FR DEGs and identification of hub genes
The Search Tool for the Retrieval of Interacting Genes (STRING) (http://string-db.org/)database was used to construct a PPI of FR DEGs.Subsequently, the PPI network was visualized using Cytoscape (version 3.8.2),and the hub genes were obtained through the intersection of the top 20 genes calculated based on the degree, Maximum Neighborhood Component (MNC), closeness, and Maximal Clique Centrality (MCC).

Screening diagnostic genes
First, two machine learning algorithms, support vector machine (SVM) and least absolute shrinkage and selection operator (LASSO) were applied for screening characteristic genes.Second, receiver operating characteristic (ROC) curves and the areas under the curve (AUCs) were used for estimating the diagnostic efficacy.Next, using the "rms" package, a nomogram based on characteristic genes was established, where the scores were generated according to the expression levels of the five characteristic genes, and the scores of each factor were added to the corresponding total score (TotalPoint) to predict the probability of UC.The calibration curve was drawn for evaluating the predictive performance of the nomogram.

Immune infiltration analysis
To evaluate the differences in the infiltration of immune cells between UC and normal groups in the training set, the Cell Type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) algorithm was used to compute the proportions of 22 immune cells across all samples [15].Differences between UC and normal groups were analyzed by Wilcox.test and a boxplot was plotted using the "ggplot2" R package (version 3.3.6).The correlation between diagnostic genes and differential immune cells was analyzed by Pearson analysis.

Ingenuity pathway analysis (IPA) of diagnostic genes
For exploring the biological functions of diagnostic genes, IPA was performed, where the classical pathways with Z-score >2 were considered as the significant activation state and Z-score < −2 as the inhibition activation state.Moreover, IPA was used to analyze the upstream regulatory and downstream target genes and the possible pathways of their action.

Construction of TF-mRNA network for diagnostic genes
To better understand the mechanisms related to the diagnostic genes in UC, we constructed a TF-mRNA network.The TFs of diagnostic genes were predicted using CHIP-X Enrichment Analysis 3 (ChEA3) (https://amp.pharm.mssm.edu/chea3/).Finally, the TF-mRNA network was visualized using the Cytoscape software (version 3.8.2).

Prediction of small molecule compounds
Based on the FR DEGs, potential small-molecule compounds were predicted using the Connectivity Map (CMap, https:// clue.io/).The candidate small-molecule compounds were determined with an |enrichment score| >90.

Expression profiles of diagnostic genes in external validation datasets
To further confirm the reliability of our results, we compared the levels of expression of the diagnostic genes in UC and normal groups in the GSE75214 and GSE38713 datasets for external validation.

Acquisition of UCRGs
The results of sample clustering indicated no outliers in GSE87466 (Figure 1(A)).With regard to the WGCNA parameter setting, β was 10 (Figure 1(B)), and the minimum gene number in the gene module was 100.Next, by setting MEDissThres as 0.5, a sum of 10 modules was confirmed through the dynamic tree-cutting algorithm (Figure 1(C)).Furthermore, the MEdarkgreen module was markedly correlated with UC (Figure 1(D)).Ultimately, 4516 genes were identified as UCRGs for further analysis (Figure 1(E)).

Acquisition and Functional Enrichment of FR DEGs
A total of 3119 DEGs (1801 up-regulated and 1318 down-regulated) between UC and normal groups in GSE87466 (Figures 2(A and B)) were screened.Thirty-six FR DEGs (Table   S2-3) (Figure 2(D and E)) were significantly related to FR DEGs, including positive regulation of cytokine production, cytokine-mediated signalling pathway, long-chain fatty acid-CoA ligase activity, HIF-1 signalling pathway, and NOD-like receptor signalling pathway.

Identification of hub genes
The subset of all FR DEGs (36 nodes and 79 edges) was included in the PPI network, and four proteins were interacting with multiple proteins, namely IL6, TLR4, IL1B, HIF1A (Figure 3(A)).And meanwhile, the 18 genes common to four algorithms were used as hub genes (Figures 3(B-F)).

Capture of diagnostic genes
Six characteristic genes were identified through the LASSO algorithm, including ALOX5, CD82, DUOX2, SOCS1, TIMP1, and TNFAIP3.For the SVM algorithm, we obtained eight eigengenes, namely ALOX5, TIMP1, TNFAIP3, SOCS1, DUOX2, ZEB1, IL6, CTSB.Following the intersection analysis, five characteristic genes common to LASSO and SVM algorithms were identified (ALOX5, TIMP1, TNFAIP3, SOCS1, DUOX2).The diagnostic accuracy of the five characteristic genes was evaluated by ROC curve analysis in GSE87466 and GSE75214.The AUC values of the five genes were all greater than 0.7 in both datasets (Figures 4(A and B)).Besides, a nomogram based on five characteristic genes was drawn by the "RMS" package, and the calibration curve exhibited accurate predictability for UC (Figure 4(C)), suggesting that all characteristic genes had diagnostic utility for UC.

Immune infiltration analysis between UC and normal groups
The percentage abundance of 22 immune cell infiltrates between UC and normal samples is shown in the bar chart (Figure 5(A)).Sixteen immune cells were significantly different between UC and normal groups (Figure 5(B)), including activated dendritic cells, resting dendritic cells, and macrophages M0.Finally, correlation analysis between diagnostic genes and differential immune cells showed TNFAIP3 and ALOX5 had the highest positive correlation with neutrophils (Figure 5(C)), while TIMP1, SOCS1, ALOX5, and DUOX2 showed the highest negative correlation with M2 macrophages (Figure 5(C)).

IPA of diagnostic genes
Biological functions analysis showed that the diagnostic genes were related to 43 function modules; for instance, haematological system development and function, cellular movement, cell-to-cell signaling and interaction, immune cell trafficking, cellular growth and proliferation, cardiovascular system development and function, etc. (Figure 6(A)).Classical pathway analysis showed that the diagnostic genes were related to 31 pathways, and 17 of these pathways were activated, including the Th1 pathway, macrophage classical activation signaling pathway, macrophage alternative activation signaling pathway, leukocyte extravasation signaling, acute phase response signaling, hepatic fibrosis signaling pathway, etc. (Figure 6(B)).The molecular interaction network is shown in Figure 6(C).SOCS1 could inhibit STAT6 but STAT6 potentially activated DUOX2.

Construction of a TF-mRNA network
Sixty TFs corresponded to predicted diagnostic genes.Seventeen differentially expressed TFs were obtained by taking the intersection of DEGs and TFs.The TF-mRNA regulatory network was visualized using the Cytoscape software (Figure 7).In the TF-mRNA regulatory network, all three TFs (BHLHE40, FOSL1, HNF4A) could regulate SOCS1 and TNFAIP3.Both MXI1 and RUNX3 could regulate TIMP1 and SOCS1.Three TFs could regulate TNFAIP3 and TIMP1, including EGR1, IRF1, CEBPB.

mRNA expression of diagnostic genes
Based on the visualized data, the expressions of ALOX5, TIMP1, TNFAIP3, SOCS1, and DUOX2 were evaluated.Figure 9 shows the expression of the five diagnostic genes in GSE75214.There were significant differences in the levels of expression of the five diagnostic genes between UC and normal samples.The expressional trends of the five genes were consistent in GSE87466 and GSE38713 (except for TNFAIP3), suggesting that these five genes were of high diagnostic value.

Discussion
Many drugs have been shown to regulate UC progression through mechanisms involved in mediating ferroptosis inhibition [16,17].The deposition of several iron ions in the intestine leads to severe oxidative stress, and the Fenton reaction promotes the generation of ROS, thereby triggering ferroptosis, which further stimulates damage-related molecular patterns, and causes intestinal immune and inflammatory responses [11].Therefore, exploring the ferroptosis-related marker genes in UC has an important role.In this study, we obtained five biomarkers related to ferroptosis, namely ALOX5, DUOX2, SOCS1, TIMP1, and TNFAIP3.ALOX5, can catalyze arachidonic acid to generate pro-inflammatory cytokine leukotrienes and mediates lipid peroxidation to generate lipid peroxides [18].In normal physiological conditions, the level of ALOX5 expression is low, however, under pathological conditions, the expression increases, indicating that ALOX5 is related to the occurrence and development of diseases [19].Mohan et al. showed that ALOX5 is involved in UC, Crohn's disease, and arthritis [20].Zhou et al. suggested that ALOX5 is a potential target of dried ginger, Huangqin, Huanglian, and ginseng decoction used in the treatment of UC [21].In our study, ALOX5 was found to be a biomarker and target for the diagnosis and treatment of UC.
The intestinal innate defense system maintains intestinal balance and resists the attack of pathogens through the release of active oxygen by nicotinamide-adenine dinucleotide ester (NADPH) oxidase in the mucosal barrier [22].DUOX2 is a member of the NOX/DOUX oxidase family of proteins.In active UC, DUOX2 and DUOXA2 form an enzymatic system to generate ROS.In the process, the expression of DUOX2 increases [23].In our study, the expression of DUOX2 was increased in the intestinal mucosal tissue of UC patients.Derakhshani et al. demonstrated that the expression of DUOX2 was increased in the tissues of patients with CRC and inflammatory bowel disease, consistent with the results of our study [24].
SOCS1 is a member of the suppressor family of cytokine signaling pathways, and it is a critical negative regulator of inflammatory processes [25].SOCS1 deficiency leads to dysbiosis of the gut microbiota, creating a pro-colitis-generating environment [26].In Xia's study, SOCS1 was found to be the target of miR-222-3p, and its expression was down-regulated in UC.Silencing the SOCS1 gene can eliminate the inhibitory effect of the down-regulation of miR-222-3p on the inflammatory response [27].However, in our study, the expression of the SOCS1 gene was up-regulated in UC, consistent with the result of Fernandes et al. [28].This may be due to the difference in the results within the tested population.
TIMP1, a member of the TIMP family, encodes a matrix metalloproteinase.Huang et al. showed that TIMP1 is consistently upregulated in UC-associated CRC and may be a potential biomarker for poor prognoses of these patients [29].Multiple studies have shown that TIMP1 is up-regulated in the intestinal mucosa of UC patients and is a potential biomarker of UC [30,31], consistent with our research.
TNFAIP3, a gene that is rapidly induced by tumor necrosis factor, encodes a protein with ubiquitin ligase activity and deuterium protein activity.This protein is involved in cytokine-mediated immune and inflammatory responses.TNFAIP3 plays an essential role in maintaining normal NF-κB activation and inhibiting inflammation in the intestinal environment.Knockout of TNFAIP3 in mouse models leads to severe intestinal inflammation [32,33].In Majumdar's study, the expression of TNFAIP3 was found to be related to the severity of UC disease, and the change in the expression of this gene may be a link between UC and CRC [34].
In our study, CIBERSORT analysis revealed that the infiltration of neutrophils was significantly increased but that of M2 macrophages was significantly decreased in the intestinal mucosa of UC patients.Pearson correlation analysis showed that TNFAIP3 and ALOX5 were strongly correlated with neutrophils.The TIMP1, SOCS1, ALOX5, and DUOX2 were strongly negatively correlated with M2 macrophages.IPA results indicated that the pathways that were significantly enriched and activated with diagnostic genes included the classical activation signaling pathway of macrophages and the alternative activation signaling pathway of macrophages.In the UC's immune landscape in Penrose's study, it was found that M2 macrophage populations reduced and neutrophils increased in UC patients [35].Neutrophil infiltration is a central event in the early stage of UC [36].Disruption of intestinal macrophage homeostasis leads to neutrophil infiltration [37].Macrophages are divided into M1 and M2 phenotypes.Their plasticity is based on cytokine secretion patterns and pro-inflammatory and immunomodulatory activities [38].plays an important role in the polarization of M1 macrophages.Animal experiments show that the knockout of negative regulators of the NF-κB signalling pathway can increase the susceptibility to colitis [39,40].M2 macrophages can inhibit inflammatory responses by highly expressing Arg1, yM-1, IL-10, and other cytokines to alleviate the symptoms of UC [41,42].Ginsenoside Rg1 can inhibit the activation of macrophages in DSS-induced colitis mice.It can regulate the polarization balance between M1/M2 macrophages and improve the composition of intestinal flora to improve the disease state [37].Curcumin exerts therapeutic effects in colitis mice by regulating M1/M2 macrophage polarization and the TLR signalling pathway [43].We also predicted the regulatory factors of diagnostic genes and small molecule compounds targeting UC but through which pathway and how these act on UC, necessitates further experimental studies.
Although five ferroptosis-related biomarkers as well as the gene-based nomogram showed a better predictive performance for UC outcomes through bioinformatics analysis,  some limitations our study must be We performed bioinformatics analysis only and experimental validation is lacking.Further analysis is needed to confirm our findings using more samples.Research on relevant pathways is required.
In conclusion, using the GEO datasets and two machine learning methods, five ferroptosis-related genes, ALOX5, DUOX2, SOCS1, TIMP1, and TNFAIP3, were identified as potential biomarkers for UC.Considering the importance of mucosal biopsy in the early diagnosis of UC, the nomogram that converted the expression of the five key FRGs into a total score for clinical utilisation exhibited excellent performance as well.we make the case that our research provides supplementary ideas and approaches for the diagnosis of UC.

Figure 1 .
Figure 1.identification of 4516 key module genes related to ulcerative colitis (uc).(A) Sample clustering dendrogram in GSe87466.(B) analysis of the scale-free fit index (left) and the mean connectivity (right) for screening optimal soft-thresholding power.(C) cluster dendrogram based on similar expression patterns of all genes using dynamic tree-cutting algorithm.(D) correlations heatmap between modules and clinical traits (uc vs normal).(E) Scatter diagrams for the relationship of gene significance (GS) and module membership (MM) in the Me dark green module.

Figure 2 .
Figure 2. exploration of 36 ferroptosis-related differentially expressed genes (fr deGs).(A) volcano plot of differentially expressed genes (deGs) between uc and normal samples in GSe87466 dataset.(B) Heatmap of differential gene expression in two groups.(C) venn diagrams for 36 fr deGs by overlapping module genes, deGs, and 264 ferroptosis-related genes (frGs).(D) Bar chart for gene ontology (Go) and (E) circle diagram for Kyoto encyclopedia of genes and genomes (KeGG) analysis of 36 fr deGs.

Figure 4 .
Figure 4. collection of five characteristic genes with diagnostic values for uc.(A) receiver operating characteristic (roc) curves for the predictive performence of five characteristic genes distinguishing the uc and normal samples in GSe87466.(B) roc curves of five characteristic genes in GSe75214.(C) five characteristic genes-based nomogram for clinical utilize and the calibration curve.

Figure 6 .
Figure 6.ingenuity Pathway analysis (iPa).(A) heatmap for the activited or inhibited canonical pathways.(B) Bar chart of enriched canonical pathways, where orange represents activation, blue represents inhibition.(C) the significantly enriched pathway maps targeting to characteristic genes.

Figure 8 .
Figure 8. the action map of small molecule compounds targeting key biomarkers.