Prognostic value and potential mechanism of long non-coding RNA Lnc-SMIM20-1 in acute myeloid leukemia

ABSTRACT Objectives Acute myeloid leukemia (AML) is a common hematologic malignancy with high heterogeneity and poor prognosis. Although long non-coding RNAs (lncRNAs) have been used as tumor biomarkers, the clinical relevance of many lncRNAs in AML remains to be investigated. Research design and methods Differentially expressed lncRNAs between AML and normal peripheral blood samples were identified using DESeq2. Pan-cancer analysis was performed with the GEPIA tool. Kaplan–Meier survival curves were used for prognosis analysis. KEGG pathway analysis and GSEA were used for functional enrichment. The ceRNA network was constructed with GDCRNAtools. Results Across cancer types, Lnc-SMIM20-1 was most highly expressed in AML, and it was up-regulated in the TCGA-AML cohort compared to normal peripheral blood samples. Patients with high expression of Lnc-SMIM20-1 had poor overall prognosis in both the TCGA adult AML cohort and the TARGET pediatric AML cohort, regardless of whether they were treated with chemotherapy or allo-HSCT. Lnc-SMIM20-1 might participate in cancer-associated and immune-related signaling pathways by interacting with four microRNAs and 20 mRNAs. Conclusion Lnc-SMIM20-1 was up-regulated in AML and acted as a stable poor prognostic factor. The prognostic impact of Lnc-SMIM20-1 cannot be overcome by allo-HSCT. Our findings provide insight into the clinical relevance of Lnc-SMIM20-1 in AML and may aid the development of novel therapeutics.


Background
Acute myeloid leukemia (AML) is a clonal, malignant proliferative disease of the hematopoietic system with poor prognosis and high mortality, and it is the most common acute leukemia in adults. Many prognostic molecular markers for AML are now well established [1]. For example, mutations of the NPM1 gene, which encodes a multifunctional nucleolar protein, occur in over 30% of patients with AML. The 2017 European LeukemiaNet (ELN) recommendations state that AML with NPM1 mutations is associated with good prognosis [2,3]. FLT3-ITD is a common driver mutation that presents with a high leukemic burden and confers a poor prognosis in patients with AML [4,5]. AML patients with biallelic mutations of CEBPA display a favorable clinical outcome and are defined as a unique entity in the 2016 World Health Organization classification [6,7]. In addition, somatic or germline RUNX1 mutations are associated with poorer outcomes in AML [8][9][10]. In recent years, various studies have demonstrated that long non-coding RNAs (lncRNAs) act as oncogenes or tumor suppressors in cancer [11,12]. In leukemia, lncRNA PVT1 has been shown to play a crucial role in the proliferation and apoptosis of hematologic malignancy cells [13]. LncRNA CCAT1 acts as a regulator in AML by modulating miR-155 availability [14]. LncRNA HOTAIR is overexpressed in AML and associated with poor prognosis [15,16]. LncRNA MALAT1 regulates migration, proliferation, and apoptosis by sponging miR-146a to regulate CXCR4 expression in AML cells [17]. However, the oncogenic roles and potential mechanisms of many lncRNAs remain to be elucidated. In this study, we found that the expression of the lncRNA Lnc-SMIM20-1 was dysregulated in AML and associated with poor prognosis. Our preliminary analysis suggested that Lnc-SMIM20-1 might be related to multiple clinically actionable signaling pathways in AML, including the PI3K-Akt signaling pathway, JAK-STAT signaling pathway, and MAPK signaling pathway.
Our findings provide insight into the role of Lnc-SMIM20-1 in AML and may contribute to AML diagnosis and the development of therapeutics.

Data collection
The complete transcriptome profile, microRNA expression profile, and clinical data of 151 AML patients were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov; data released in 2019). In addition, the transcriptome profiles of 32 other types of cancer were downloaded from the TCGA database. For verification, 187 patients with pediatric leukemia from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database were also enrolled in this study; their transcriptome profiles and clinical data are available on the TARGET website (https://ocg.cancer.gov/programs/target; data released in 2019). The transcriptome profiles of 70 normal peripheral blood samples were downloaded from the Genotype-Tissue Expression Project (GTEx) database (https://gtexportal.org/; data released in 2016).

Selection of prognostic lncRNA in CN-AML
First, DESeq2 [18] was used to perform differential expression analysis between the TCGA-AML cohort and the normal peripheral blood samples in the GTEx dataset [19][20][21], and 8594 up-regulated lncRNAs were identified for further analysis (Supplementary Table S1). Second, lncRNA expression was compared across cancer types, and 6847 lncRNAs were selected as being most highly expressed in AML among the 33 cancer types (Supplementary Table S2). We then intersected these two sets, which yielded 4194 lncRNAs (Supplementary Table S3). Third, univariate analysis was performed in both the TCGA-AML cohort and the TARGET-AML cohort to identify prognostic lncRNAs, and 102 lncRNAs were identified (Supplementary Table S4). Subsequently, we divided the TCGA-AML cohort into two groups according to treatment (chemotherapy and allo-HSCT) and performed univariate analysis to identify lncRNAs that were prognostic both in patients who received chemotherapy and in those who underwent allo-HSCT. In the end, only Lnc-SMIM20-1 (Ensembl: ENSG00000250541) remained (Supplementary Table S5). We therefore considered Lnc-SMIM20-1 a potential prognostic indicator in AML. A flow chart of this process is shown in Supplementary Figure S1.
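The selection funnel described above amounts to a chain of set intersections. The following is a minimal Python sketch of that logic only; the function name and the toy identifiers are hypothetical and do not come from the study's data or scripts.

```python
def select_candidates(upregulated, top_in_aml, prognostic_by_cohort):
    """Intersect the successive filters of the selection funnel.

    upregulated          -- lncRNAs up-regulated in AML vs. normal blood
    top_in_aml           -- lncRNAs most highly expressed in AML across cancers
    prognostic_by_cohort -- one collection of prognostic lncRNAs per cohort/subgroup
    """
    candidates = set(upregulated) & set(top_in_aml)
    for cohort_hits in prognostic_by_cohort:
        candidates &= set(cohort_hits)
    return candidates


# Toy identifiers (hypothetical), chosen so that a single candidate survives.
up = {"lncA", "lncB", "lncC", "lncD"}
top = {"lncB", "lncC", "lncD", "lncE"}
prognostic = [
    {"lncB", "lncC", "lncD"},  # adult cohort, all patients
    {"lncC", "lncD"},          # pediatric validation cohort
    {"lncA", "lncC", "lncD"},  # chemotherapy-only subgroup
    {"lncC"},                  # allo-HSCT subgroup
]
print(sorted(select_candidates(up, top, prognostic)))
```

Because intersection is commutative, the order of the filters does not change the final set; it only changes the intermediate counts reported at each step.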

Gene Expression Profiling Interactive Analysis (GEPIA)
The GEPIA database (https://gepia.cancer-pku.cn) is a comprehensive resource for the systematic analysis of gene expression [22]. It includes 9736 tumor and 8587 normal tissue samples from the TCGA and the Genotype-Tissue Expression (GTEx) projects. In the function module, we used the Ensembl Gene ID of Lnc-SMIM20-1 (ENSG00000250541) to perform the single-gene analysis, with all thresholds left at their default values.

Gene expression analysis
Gene expression was measured as log counts per million (logCPM), transformed from raw counts with the voom function of the R package limma. Lnc-SMIM20-1 expression was compared between the TCGA-AML cohort and the GTEx cohort using DESeq2 (|FC| ≥ 1.5, p ≤ 0.01). All statistical analyses were performed using R version 4.0.
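For orientation, the logCPM transform can be sketched in a few lines of Python. The version below uses the offsets of the voom transform (0.5 added to each count, 1 added to the library size) and assumes the library size is the plain sum of counts in a sample, with no normalization factors; it illustrates the formula and does not reproduce the study's R workflow.

```python
import math


def log_cpm(counts, lib_size=None):
    """Return log2 counts per million for one sample.

    counts   -- raw read counts, one per gene
    lib_size -- total reads in the sample; defaults to sum(counts)
    """
    if lib_size is None:
        lib_size = sum(counts)
    # The 0.5 and 1.0 offsets keep the logarithm finite for zero counts.
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


sample = [0, 10, 90, 900]  # toy counts for four genes
print([round(v, 3) for v in log_cpm(sample)])
```

The offsets matter mostly for genes with very low counts, where they damp the otherwise large variance of the log-transformed values.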

Prognosis analysis
After excluding 15 M3 patients, a total of 136 patients were divided into two groups (Lnc-SMIM20-1 high and Lnc-SMIM20-1 low) based on the median expression of Lnc-SMIM20-1. Kaplan-Meier analysis and the log-rank test were used to compare overall survival (OS) and event-free survival (EFS) between the two groups. Multivariable analysis was conducted to determine whether Lnc-SMIM20-1 expression was an independent prognostic factor in AML. A p-value < 0.05 was considered significant.
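The median split, Kaplan-Meier estimate, and two-group log-rank test described above can be sketched with the Python standard library alone. All data below are toy values, and the rule that patients exactly at the median fall into the low group is an assumption; the study itself used R, so this is an illustration of the technique rather than the authors' code.

```python
import math
import statistics


def median_split(expression):
    """Label each patient 'high' or 'low' relative to the cohort median."""
    cutoff = statistics.median(expression)
    # Assumption: values equal to the median are assigned to the low group.
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    a, b = list(zip(times_a, events_a)), list(zip(times_b, events_b))
    event_times = sorted({t for t, e in a + b if e})
    diff, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x, _ in a if x >= t)
        n_b = sum(1 for x, _ in b if x >= t)
        d_a = sum(1 for x, e in a if x == t and e)
        d_b = sum(1 for x, e in b if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        diff += d_a - d * n_a / n  # observed minus expected events in group a
        if n > 1:
            var += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    chi2 = diff * diff / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Toy cohort: expression, follow-up in months, event flag (1 = death, 0 = censored).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
months = [30, 28, 25, 24, 10, 8, 6, 4]
died = [0, 1, 0, 1, 1, 1, 1, 1]

groups = median_split(expression)
high = [i for i, g in enumerate(groups) if g == "high"]
low = [i for i, g in enumerate(groups) if g == "low"]
chi2, p_value = logrank_test(
    [months[i] for i in high], [died[i] for i in high],
    [months[i] for i in low], [died[i] for i in low],
)
print(kaplan_meier([months[i] for i in high], [died[i] for i in high]))
print(chi2, p_value)
```

The p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x / 2)), which avoids any dependency beyond the standard library.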

Competing endogenous RNAs (ceRNA) network construction and analysis
Differentially expressed microRNAs and mRNAs selected from the DEGs were used to construct the Lnc-SMIM20-1 ceRNA network, which was built with GDCRNAtools [24] and visualized with the 'igraph' R package. GDCRNAtools uses three criteria to determine competing endogenous interactions between lncRNA-mRNA pairs: the lncRNA and mRNA must share a significant number of miRNAs, the expression of the lncRNA and mRNA must be positively correlated, and the shared miRNAs should play similar roles in regulating the expression of the lncRNA and mRNA. Accordingly, the ceRNA network was constructed with the following requirements: hyperPValue < 0.01, corPValue < 0.01, and regSim ≠ 0. miRcode was used to collect predicted and experimentally validated lncRNA targets. StarBase v2.0 was used to predict miRNA-mRNA interactions. Next, functional analysis of the ceRNA network was performed and annotated using Metascape (https://metascape.org/) [25]. Least absolute shrinkage and selection operator (LASSO) Cox regression was used to establish a prognostic model based on the Lnc-SMIM20-1-related ceRNA network with the R package glmnet [26,27]. The performance of the prognostic model was evaluated in the TCGA-AML cohort by Kaplan-Meier OS curve analysis and receiver operating characteristic (ROC) analysis.
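The first criterion, that a lncRNA and an mRNA share a significant number of miRNAs, is commonly assessed with a hypergeometric test. The sketch below shows that test in Python under stated assumptions: the function name and all counts are hypothetical, and it is not the GDCRNAtools implementation.

```python
from math import comb


def shared_mirna_pvalue(n_total, n_lnc, n_mrna, n_shared):
    """Upper-tail hypergeometric p-value for miRNA sharing.

    n_total  -- number of miRNAs in the background set
    n_lnc    -- miRNAs predicted to target the lncRNA
    n_mrna   -- miRNAs predicted to target the mRNA
    n_shared -- miRNAs targeting both
    Returns P(X >= n_shared) if the mRNA's miRNAs were drawn at random.
    """
    upper = min(n_lnc, n_mrna)
    tail = sum(
        comb(n_lnc, k) * comb(n_total - n_lnc, n_mrna - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / comb(n_total, n_mrna)


# Hypothetical counts: 4 of the lncRNA's 10 miRNAs also target the mRNA.
print(shared_mirna_pvalue(n_total=500, n_lnc=10, n_mrna=12, n_shared=4))
```

A small p-value here says only that the overlap is larger than expected by chance; the expression-correlation and regulation-similarity criteria are separate checks.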

Immune infiltrate analysis
The CIBERSORT tool (https://cibersort.stanford.edu/) estimates the abundances of member cell types in a mixed cell population from gene expression data. We used CIBERSORT to estimate immune cell infiltration in the TCGA-AML cohort. Pearson's correlation analysis was used to evaluate the relationship between Lnc-SMIM20-1 expression and immune cell abundance.
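As a reminder of what the correlation step computes, here is a minimal Python sketch of Pearson's r between an expression vector and an estimated cell-type fraction. The numbers are toy values and are not CIBERSORT output.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: expression per patient and an estimated immune-cell fraction.
expression = [2.1, 3.4, 4.0, 5.2, 6.3]
cell_fraction = [0.30, 0.26, 0.22, 0.15, 0.11]
print(round(pearson_r(expression, cell_fraction), 3))
```

A negative r, as in this toy case, would correspond to lower estimated infiltration at higher expression.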

Lnc-SMIM20-1 overexpression in AML
To determine whether Lnc-SMIM20-1 plays an important role specifically in AML, the GEPIA online tool was used to perform a pan-cancer analysis. The results showed that Lnc-SMIM20-1 was mainly over-expressed in esophageal carcinoma (ESCA) tumor tissues, stomach adenocarcinoma (STAD) tumor tissues, and acute myeloid leukemia (LAML). Notably, Lnc-SMIM20-1 was most highly expressed in AML (Figure 1(a)). Furthermore, Lnc-SMIM20-1 was up-regulated in the TCGA-AML cohort compared with the normal peripheral blood samples in the GTEx cohort (Figure 1(b), p < 0.0001). AML patients were then stratified into favorable, intermediate, and adverse groups using the ELN stratification system (2017 version). ANOVA showed a significant difference in Lnc-SMIM20-1 expression among these three groups (Figure 1(c), p = 3e-06). The Tukey HSD test further showed that the favorable group had comparatively low expression of Lnc-SMIM20-1, which was confirmed with the Least Significant Difference (LSD) test (Supplementary Table S6). Next, we evaluated Lnc-SMIM20-1 expression among TCGA-AML patients by cytogenetic subtype: RUNX1-RUNX1T1, CBFB-MYH11, KMT2A rearrangements, PML-RARA, BCR-ABL1, complex karyotype, and normal karyotype. ANOVA also showed a significant difference in Lnc-SMIM20-1 expression among these seven groups (Figure 1(d), p = 0.00016). The Tukey HSD test showed that expression of Lnc-SMIM20-1 was significantly lower in patients with CBFB-MYH11 than in the other subgroups, while patients with KMT2A rearrangements showed significantly higher expression than those with RUNX1-RUNX1T1, CBFB-MYH11, PML-RARA, or normal karyotype; these results were also confirmed with the LSD test (Supplementary Table S7). In addition, we performed the same analysis in the TARGET pediatric AML cohort.
The results showed a significant difference in Lnc-SMIM20-1 expression among patients with different cytogenetic subtypes (Supplementary Figure S2, p = 2.6e-06). Pediatric AML patients with CBFB-MYH11 had significantly lower expression of Lnc-SMIM20-1 than those with normal karyotype or other cytogenetics, while patients with KMT2A rearrangements had significantly higher expression of Lnc-SMIM20-1 than those with CBFB-MYH11 or t(8;21) (Supplementary Table S8), which was consistent with the results in the TCGA-AML cohort.

The prognostic value of Lnc-SMIM20-1 in AML
AML patients were separated into a Lnc-SMIM20-1 high group and a Lnc-SMIM20-1 low group based on the median expression of Lnc-SMIM20-1. Kaplan-Meier survival curves showed that the Lnc-SMIM20-1 high group had decreased OS (Figure 2(a), p = 0.000628) and EFS (Figure 2(b), p = 0.0022) compared with the Lnc-SMIM20-1 low group. For validation, an independent cohort of 187 patients with pediatric leukemia from TARGET was used. The survival curves in the TARGET-AML cohort were consistent with those in the TCGA-AML dataset: the OS and EFS of TARGET AML patients with high expression of Lnc-SMIM20-1 were worse than those of patients with low expression of Lnc-SMIM20-1 (Figure 2(c,d); p = 0.00204, p = 0.0469).
Multivariate Cox regression analysis then showed that the hazard ratio (HR) for Lnc-SMIM20-1 expression was 1.68, with a 95% confidence interval (CI) of 1.07-2.65 (Figure 3(b), p = 0.025); CBFB-MYH11 was also associated with prognosis in AML (Figure 3(b), HR = 0.30, 95% CI = 0.09-0.99, p = 0.049). The details are shown in Supplementary Table S10. In addition, because the results above indicated that AML with KMT2A rearrangement had higher expression of Lnc-SMIM20-1 than other subtypes of AML, we included KMT2A rearrangement in the multivariate analysis to further test whether Lnc-SMIM20-1 was independently associated with prognosis. The result again indicated that Lnc-SMIM20-1 expression was an independent poor prognostic factor in AML (Supplementary Figure S3).
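The reported hazard ratios, confidence intervals, and p-values can be checked against each other, because a Wald 95% CI is symmetric on the log scale. The Python sketch below recovers the log-hazard coefficient, its standard error, and the two-sided p-value from an HR and its CI; it assumes a Wald-type interval, which the text does not state explicitly, and the function name is illustrative.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover (beta, se, z, two-sided p) from an HR and its 95% Wald CI."""
    beta = math.log(hr)
    # On the log scale the CI is beta +/- z_crit * se, so its width is 2 * z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


for label, hr, low, high in [
    ("Lnc-SMIM20-1", 1.68, 1.07, 2.65),
    ("CBFB-MYH11", 0.30, 0.09, 0.99),
]:
    beta, se, z, p = wald_from_hr(hr, low, high)
    print(label, round(beta, 3), round(se, 3), round(z, 2), round(p, 3))
```

Under that assumption, both recomputed p-values come out close to the reported 0.025 and 0.049; small differences are expected because the published HR and CI are rounded.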

Comparison of Lnc-SMIM20-1 expression in AML patients who received different treatments
Subsequently, the TCGA-AML cohort was divided into a chemotherapy-only group of 70 AML patients who received only chemotherapy and an allo-HSCT group of 66 AML patients who underwent allo-HSCT. In the chemotherapy-only group, high Lnc-SMIM20-1 expression was associated with worse OS (Figure 4(b), p = 0.00841) and EFS (Figure 4(b), p = 0.00107). In the allo-HSCT group, the OS of AML patients with high expression of Lnc-SMIM20-1 was shorter than that of patients with low expression (Figure 4(c), p = 0.0118), while EFS did not differ significantly between patients with high and low expression of Lnc-SMIM20-1 (Figure 4(d), p = 0.115). In addition, 16 patients in the TCGA-AML cohort received the hypomethylating agents decitabine or azacitidine. Hypomethylating agent (HMA) strategies are often recommended and used as frontline treatment in elderly patients [28,29], so after excluding 2 patients without expression data we divided the AML patients who received hypomethylating agents into high- and low-expression groups based on the median expression level of Lnc-SMIM20-1. We then compared overall survival between the two groups using Kaplan-Meier survival analysis. The result showed no significant difference in overall survival between the two groups (Supplementary Figure S4). A possible reason for this result is the small number of patients (n = 14).

Potential Mechanism of Lnc-SMIM20-1 in AML
Differentially expressed gene (DEG) analysis between the Lnc-SMIM20-1 high group and the Lnc-SMIM20-1 low group identified 361 up-regulated genes and 157 down-regulated genes (Figure 5(a), Supplementary Table S11). KEGG pathway analysis showed that the DEGs were enriched in Hematopoietic cell lineage, T cell receptor signaling pathway, Th1 and Th2 cell differentiation, Focal adhesion, Rap1 signaling pathway, Th17 cell differentiation, Cell adhesion molecules, Primary immunodeficiency, and NF-kappa B signaling pathway (Figure 5(b)). GSEA results further indicated that high expression of Lnc-SMIM20-1 was associated with activation of the PI3K-Akt signaling pathway, JAK-STAT signaling pathway, Cytokine-cytokine receptor interaction, T cell receptor signaling pathway, Th1 and Th2 cell differentiation, and Th17 cell differentiation (Figure 5(c-h), Supplementary Table S12). Notably, many enriched signaling pathways are closely related to tumor occurrence and development, such as the NF-kappa B signaling pathway, PI3K-Akt signaling pathway, and JAK-STAT signaling pathway. Other enriched pathways are related to immunoregulation, such as the T cell receptor signaling pathway, Th1 and Th2 cell differentiation, Th17 cell differentiation, and primary immunodeficiency. Immune infiltrate analysis was also performed to further examine the association between Lnc-SMIM20-1 and immune cell infiltration in AML. The result showed that high expression of Lnc-SMIM20-1 was associated with less infiltration of memory B cells, gamma delta T cells, and resting myeloid dendritic cells. In contrast, the expression of Lnc-SMIM20-1 was positively correlated with activated memory CD4+ T cells (Supplementary Figure S5).

Discussion
LncRNAs are non-coding RNAs longer than 200 nt that are present in large numbers in the genome. LncRNAs were once thought to be mere 'transcriptional noise' because many of them are expressed at very low levels [30]. As studies of lncRNA function have accumulated, lncRNAs have been found to act as activators, decoys, guides, or scaffolds for interacting proteins, DNA, and RNA, and thereby to participate in almost all biological processes. In addition, lncRNAs may be involved in almost all human cancers by affecting cancer cell proliferation, invasion, and metastasis [31]. Hence, lncRNAs have the potential to bring new insights to clinical diagnosis, prognosis, and therapy. In this study, we found that Lnc-SMIM20-1, a novel lncRNA that had not been investigated before, was up-regulated in AML and indicated inferior prognosis in both the TCGA adult AML cohort and the TARGET pediatric AML cohort. We also found that Lnc-SMIM20-1 was most highly expressed in AML across 33 types of cancer, which suggests that Lnc-SMIM20-1 might be a distinctive prognostic indicator in AML. Several clinical characteristics are already established as prognostic indicators. For example, RUNX1-RUNX1T1 is one of the most common genetic abnormalities in AML and has been established as a good prognostic factor [32,33]. AML with inv(16)(p13.1q22) resulting in the CBFB-MYH11 fusion is associated with a favorable prognosis [34]. Rearrangements of the KMT2A gene are found in 10% of adult AML and are associated with poor prognosis [35]. In addition, multiple risk stratification systems have been established [36]. The ELN stratification system is the latest and most widely accepted system for AML [37].
Our results showed that AML patients classified as 'favorable' by ELN, AML with RUNX1-RUNX1T1, and AML with CBFB-MYH11 had lower expression of Lnc-SMIM20-1, while AML with KMT2A rearrangements had higher expression of Lnc-SMIM20-1, which further indicates that Lnc-SMIM20-1 could be a promising prognostic indicator in AML. Moreover, the multivariable analysis indicated that Lnc-SMIM20-1 expression was an independent unfavorable prognostic factor in AML.
Chemotherapy and allo-HSCT are the two major therapies for patients with AML. Chemotherapy is the main treatment for most AML patients, while allo-HSCT remains the first-line treatment for poor and very-poor-risk patients and yields a high cure rate for AML [38,39]. Notably, our further analysis showed that the prognostic impact of Lnc-SMIM20-1 was not eliminated by allo-HSCT, which indicates that the expression level of Lnc-SMIM20-1 could be a stable prognostic indicator for AML even after treatment with allo-HSCT. To further explore the potential function of Lnc-SMIM20-1 in AML, we performed functional analysis and found that Lnc-SMIM20-1 may participate in the PI3K-Akt signaling pathway, the JAK/STAT signaling pathway, and the NF-kappa B signaling pathway in AML. Numerous studies have demonstrated that the PI3K-Akt signaling pathway [40], the JAK/STAT signaling pathway [41], and the NF-kappa B signaling pathway [42] are frequently activated in AML patient blasts and strongly contribute to the proliferation, survival, and drug resistance of these cells. These three signaling pathways have been targets for AML treatment, and the corresponding inhibitors, such as JAK inhibitors, are potential candidates for combination therapies in precision medicine for AML [43]. Our findings therefore suggest that the expression level of Lnc-SMIM20-1 might be a potential predictor of the response to some inhibitors, such as JAK inhibitors, in AML.
In addition, immunotherapy is a promising therapy for cancer patients, but the clinical activity of immunotherapy in AML remains very limited. Thus, many immunotherapy-based combination therapies have been developed. In this study, we found that Lnc-SMIM20-1 might participate in multiple immune-related signaling pathways, such as Th1 and Th2 cell differentiation and Th17 cell differentiation. Th1, Th2, and Th17 cells are three important subsets of CD4+ T helper cells in immune regulation. CD4+ helper T cells promote the priming, effector functions, and memory functions of cytotoxic T lymphocytes (CTLs) and help CTLs to overcome negative regulation, so enhancing CD4+ T cell help may improve CTL responses in cancer immunotherapy [44]. Our findings therefore suggest that the expression of Lnc-SMIM20-1 might be associated with the activation of immune-related pathways, which further indicates that Lnc-SMIM20-1 might be a potential target for combination therapy that could improve the clinical application of immunotherapy in AML.
LncRNAs regulate gene expression through various mechanisms, including sponging microRNAs. To further explore how Lnc-SMIM20-1 plays its role in AML, we constructed a Lnc-SMIM20-1-centric ceRNA network. We found that Lnc-SMIM20-1 interacted directly with four microRNAs (miR-190b, miR-20b, miR-150, let-7b) and was associated with 20 mRNAs, such as RBFOX2, ZNF831, PTK2, and HIVEP3. One study has demonstrated that high expression of miR-20b is associated with poor prognosis in AML [45]. Several studies have suggested that downregulation of miR-150 is required for leukemogenesis [46,47]. Additionally, one study has revealed that let-7b plays an important role in t(8;21) leukemogenesis and maintenance by downregulating AML1-ETO oncogene expression [48]. Although miR-190b and the mRNAs in the network have been reported to promote proliferation and progression in multiple tumors, their role in AML has not yet been reported. Our findings indicate that these genes might also play an important role in AML, which provides a starting point for further study of their potential role in AML. Moreover, according to the ceRNA hypothesis, our findings indicate that Lnc-SMIM20-1 might be a target of miR-190b, miR-20b, miR-150, and let-7b, which also provides new clues to the molecular mechanism of these microRNAs in AML.
Our study has some limitations. First, it is a retrospective study, and prospective analysis is still needed. Second, our analysis of the potential mechanism of Lnc-SMIM20-1 in AML is preliminary, and the biological mechanism has not been fully elucidated. These shortcomings may be addressed through further experiments in the future. In conclusion, we found that the lncRNA Lnc-SMIM20-1, which is highly expressed in AML, may be a potential prognostic biomarker for AML. We also found that Lnc-SMIM20-1 was associated with clinically actionable signaling pathways and immune-related pathways, providing new ideas for the diagnosis and treatment of AML.

Conclusion
High expression of Lnc-SMIM20-1 was found in AML and was associated with poor prognosis. The prognostic impact of Lnc-SMIM20-1 was not overridden by allo-HSCT, which provides potential new directions for AML therapy.