Systems biology approach delineates critical pathways associated with disease progression in rheumatoid arthritis

Abstract Rheumatoid Arthritis (RA) is a chronic systemic autoimmune disease leading to inflammation, cartilage cell death, synoviocyte proliferation, and increased and impaired differentiation of osteoclasts and osteoblasts leading to joint erosions and deformities. Transcriptomics, proteomics, and metabolomics datasets were analyzed to identify the critical pathways that drive the RA pathophysiology. Single nucleotide polymorphisms (SNPs) associated with RA were analyzed for the functional implications, clinical outcomes, and blood parameters later validated by literature. SNPs associated with RA were grouped into pathways that drive the immune response and cytokine production. Further gene set enrichment analysis (GSEA) was performed on gene expression omnibus (GEO) data sets of peripheral blood mononuclear cells (PBMCs), synovial macrophages, and synovial biopsies from RA patients showed enrichment of Th1, Th2, Th17 differentiation, viral and bacterial infections, metabolic signalling and immunological pathways with potential implications for RA. The proteomics data analysis presented pathways with genes involved in immunological signaling and metabolic pathways, including vitamin B12 and folate metabolism. Metabolomics datasets analysis showed significant pathways like amino-acyl tRNA biosynthesis, metabolism of amino acids (arginine, alanine aspartate, glutamate, glutamine, phenylalanine, and tryptophan), and nucleotide metabolism. Furthermore, our commonality analysis of multi-omics datasets identified common pathways with potential implications for joint remodeling in RA. Disease-modifying anti-rheumatic drugs (DMARDs) and biologics treatments were found to modulate many of the pathways that were deregulated in RA. Overall, our analysis identified molecular signatures associated with the observed symptoms, joint erosions, potential biomarkers, and therapeutic targets in RA. Communicated by Ramaswamy H. Sarma


Introduction
Rheumatoid arthritis (RA) is a progressive chronic autoimmune disease of the joint synovium. RA affects about 1% of the population worldwide (Almutairi et al., 2021). Despite considerable efforts, a cure has been elusive for RA, and could only be managed (Shao et al., 2017;Smolen et al., 2018;Wasserman, 2011). The inflammatory response resulting in joint destruction due to proliferation of synoviocytes and destruction of bone and cartilage is orchestrated by cells of innate and adaptive immunity through an array of cytokines (Edilova et al., 2020;Fang et al., 2020;Goldring & Gravallese, 1999;Guo et al., 2018;Karmakar et al., 2010).
The disease is multi-systemic and heterogeneous, and various genetic, epigenetic, environmental and lifestyle factors contribute to the risk for RA (Filippo et al., 2011;Scott et al., 2010). Studies have helped to include antibodies against citrullinated self-proteins in the diagnosis of RA (Niewold et al., 2007;Schoels et al., 2011). The current treatment regimen is restricted to modulating inflammation, patient assessment of disease activity, and functional disability (Demoruelle & Deane, 2012). It also correlates with a favourable prognosis (Parisi et al., 2020).
The inflammatory response in RA and the involvement of different cytokines were studied as a function of disease activity and drug response (Mateen et al., 2016(Mateen et al., , 2017Moon et al., 2012;Yap et al., 2018). Cytokines play a combinatorial role in mediating the overlapping responses of innate and adaptive immunity orchestrating stromal response, disease onset, persistence, and prognosis (Kany et al., 2019;Nevius et al., 2016). The role of TNFa and IFNG in RA is reiterated by the benefit accrued upon their inhibition (Ogata et al., 2019;Vasanthi et al., 2007). Similarly, inhibition of granulocytemacrophage colony-stimulating factor (GM-CSF) in phase II clinical trials is as effective as TNF blockade (Jimeno et al., 2015;Lotfi et al., 2019). An overlapping role for IL-17 with TNF and IL6 has been conceived, which also correlates with disease activity (Al-Saadany et al., 2016;Mateen et al., 2017). Despite the presence of functionally active IL-1 family members, inhibition of IL-1 has not been clinically effective (Mertens & Singh, 2009). Type I interferons are potentially useful as prognostic markers in RA (Muskardin & Niewold, 2018). In addition, the B cells can also secrete cytokines like TNF, IL-6, IL-1, and RANKL (Fillatreau, 2018).
The emergence of omic techniques has helped to gain a holistic understanding of RA by studying patient samples, in vitro cell culture, and animal models of diseases (Carlberg et al., 2017;Manzoni et al., 2018;Whitaker et al., 2015). Transcriptomic studies have mainly focused on understanding the contributions of different cells such as PBMCs or the cell subsets like CD4þ, B cells, Neutrophils, etc., or tissues to disease pathology as well as the effect of different treatments (Casamassimi et al., 2017;Tasaki et al., 2018). Transcriptomic analysis has helped to discern perturbations in various pathways as well as the association of type I interferon signature genes with clinical response (poor prognosis) to DMARDs which include TNF-alpha inhibitor, tocilizumab, and rituximab (de Jong et al., 2019;Rodr ıguez-Carrio et al., 2018;Sellam et al., 2014) However, gene signatures from transcriptomics limits its use in a clinical setting as the response like type I IFN is common to many autoimmune diseases (Rodr ıguez-Carrio et al., 2018).
Systems analysis of RA patients stratified based on Chinese medicine have conjured biomarkers for subtyping RA (van Wietmarschen et al., 2009;. A study by Tavasolian et al. (2021), with a systems biology approach, showed a correlation between negative expression of Vit-D receptor and the miR-125b in RA. Studies have also focused on the development of platforms that help to systematically and quantitatively predict responses to drugs and their raking (M. Zhang et al., 2019). Integrated multi-omic analysis and validation have been used to understand other diseases like avascular necrosis of the femoral head, glaucoma, amyotrophic lateral sclerosis, and liver disease (Naik et al., 2020;Narayanan et al., 2017;Pulukool et al., 2021Pulukool et al., , 2022Saha et al., 2021;Sai Swaroop et al., 2022). An integrated analysis of all the omic data sets concomitant with its correlation with pathology could help identify markers for patient stratification, prognosis, and potentially newer therapeutic targets in disease.
Herein, we have carried out an integrated data analysis of SNP, transcriptomic, proteomic, and metabolomic data of RA patients, model systems, and cell culture models. In particular, the significant genes from SNP analysis, significant differential genes in transcriptomics or proteins from proteomic data sets, or metabolites from metabolomic data sets are grouped into pathways. Further, for SNP analysis we have validated the functional implication of SNP on gene function. The various transcriptomic networks generated using significant differentially expressed genes have been used to predict important kinases and transcription factors. The results are discussed in the light of their implication for the disease process, identification of biomarkers, and therapeutic targets.

Methodology
We have employed a systems biology approach with an integrated analysis by different software-based and online tools to analyze SNPs associated with RA, publicly available data of transcriptomics, proteomics, and metabolomics to target complex diseases like RA and arrive at the possible biomarkers and therapeutic targets. Here is an overview of the methodology followed as a schematic ( Figure 1).

SNP analysis
RA is associated with various single nucleotide polymorphisms (SNPs) which contribute to its pathophysiology. In the current study, we have collated exonic and non-synonymous SNPs associated with RA from a repository that integrates all the RA-related SNPs from 686 publications called the database of rheumatoid arthritis-related polymorphisms (RADB) (http://www.bioapp.org/RADB/) for SNP analysis of RA . SNPs with rs IDs were taken and analysis was performed for 147 SNPs with amino acid change.
SIFT, a web tool that predicts the impact of coding variants on protein function was used for SNP analysis. In the current study, we analyzed a list of 147 known SNPs with aminoacid substitution curated from RADB to run in SIFT as a batch. The impact of the SNP is labeled as tolerant (if the SIFT score is <0.05), or damaging (deleterious) (if the SIFT score is >0.05) in the result. Next, polyphen-2, a web tool that predicts whether an SNP is damaging to the function of the protein by comparing it with the wild type, was used. Polyphen-2 analyses and assigns that an SNP is benign, possibly damaging, and probably damaging. The polyphen-2 score ranges from 0.0 (tolerated) to 1.0 (deleterious). Variants with scores of 0.0 are predicted to be benign. Values closer to 1.0 are more confidently predicted to be deleterious. The score can be interpreted as follows: 0-0.15benign; 0.15-1.0possibly damaging; 0.85 to-1.0damaging. The functional implications of the exonic SNPs were also validated using a literature mining approach. Mutation assessor tool (http://mutationassessor.org/r3/) was used to determine the functional impact of the exonic SNPs. This tool predicts the functional impact based on the evolutionary conservation of the amino acid change in proteins. dbSNP id or the protein symbol with amino acid change as two columns is given as input and the result of the analysis gives a functional impact score. Functional impact combined score ranges from À5.2 to 6.5 with impact represented as low, medium and high.
Further, we analyzed the impact of SNPs using SNPeffect, a database interface that allows users to search SNPs by filtering on molecular phenotypic effects, mutation type, disease, UniProt identifier, dbSNP identifier, and gene name. Molecular phenotypic effects include changes in aggregation tendency (dTANGO), amyloid formation (dWALTZ), chaperone binding (dLIMBO), and structural stability change upon mutation (ddG). A mutation is classified as 'deleterious' if the dTANGO, dWALTZ and dLIMBO scores are above 50 and for the ddG if it is more positive or negative. Finally, for SNP analysis, we used SNP Nexus, a web-based computational tool that provides functional annotation for both novel and known SNPs. Possible effects on the transcriptome and proteome levels are characterized and reported from five major annotation systems providing the most extensive information on alternate splicing. Additional information on HapMap genotype and allele frequency overlap with potential regulatory elements, or structural variations, as well as genetic diseases, can be retrieved. The SNP Nexus can take individual or batch SNPs for analysis. SNP Nexus is the only database that provides a complete set of functional annotations. Furthermore, we probed the association of various SNPs with various clinical parameters associated with RA.

Transcriptomics data analysis
Transcriptomics datasets were curated from the public GEO database with a search for specific cell types such as "macrophages, chondrocytes, synoviocytes, fibroblasts, osteoblasts, and rheumatoid arthritis" and only those GEO datasets were selected where the differential gene expression analysis could be carried out by GEO2R tool for gene expression data. We have analyzed the gene expression datasets with GSE IDs -GSE15573 (RA PBMCs) (Teixeira et al., 2009), GSE97779 (Kang et al., 2017), and GSE10500 (RA synovial macrophages) (Yarilina et al., 2008), and GSE77298 (synovial biopsies) (Broeren et al., 2015).

GEO2R analysis
GEO2R tool was used to compare two or more sample groups in a GEO dataset based on their fold change to identify the differentially expressed genes (DEGs) across different experimental or clinical conditions. In the current study, GEO datasets such as GSE15573 (18 RA patients & 15 healthy controls), GSE97779 (5 control macrophages and 9 RA synovial macrophages), GSE10500 (3 healthy donors macrophages and 5 RA patients synovial macrophages) and GSE77298 (7 microarrays of synovial biopsies and 16 microarrays of end-stage RA synovial biopsies) were analyzed and the samples are defined into two or more experimental or clinical groups (Table 1). For a two-group comparison, typically it is appropriate to define the test group first, and then define the control group. The fold change between the groups follows the convention of positive for genes upregulated and negative for genes downregulated in test samples when compared to controls. After the samples have been assigned to the groups, running the GEO2R analysis with default parameters gives fold changes of the top 250 DEGs between Figure 1. Overview of the systems approach used for identification of potential biomarkers and therapeutic targets in RA the test group and the control group ranked based on their p-value. The results with columns such as gene ID (ID) in the instrument, adjusted P-value (adj.P.Val), P-Value, relative fold change (logFC) of expression levels, and the respective gene symbol, were downloaded as a CSV (comma-separated values) format and the DEGs are sorted based on the cut-off with adjusted p-value less than 0.05 and fold change of gene expression -logFC >2 were taken for further analyses.

Enrichment analysis
Gene set enrichment analysis (GSEA) by WEBGSTALT tool Gene set enrichment analysis (GSEA) was carried out using the web-based gene set analysis toolkit (WEBGESTALT) (http://www.web gestalt.org/), a web-based tool to analyze functional enrichment of the differentially expressed genes through over-representation analysis (ORA), gene set enrichment analysis (GSEA), and network topology-based analysis (NTA) . For GSEA-based analysis, we applied basic parameters such as organism-human, method of interest as GSEA, and the functional database as KEGG. The data input was given as the list of gene names and the fold changes of the respective genes as two columns. GSEA analysis was performed with default parameters such as the number of genes for a category ranging from 5 to 2000, significance level as top 10 genes, 1000 permutations, mean as collapse method, enrichment statistic as 1, and categories expected from set cover as 10.

Gene enrichment analysis by Enrichr
Gene enrichment analysis was carried out using enrichr (https://maayanlab.cloud/Enrichr/), for the list of differentially expressed genes that are significant (with adjusted p-value 0.05) from the GEO2R analysis of the GEO datasets (Chen et al., 2013). Enrichr analyses these genes based on the backend databases such as KEGG, WIKI and reactome pathway databases, disgenet database, and gene ontology (GO) database for biological process and identify the enriched pathways and diseases these genes belong to. The results were based on the p-value ( 0.05) and are shown as a bar graph with the pathways, biological processes, and diseases and then by a clustergram with a mention of the genes and the expression of the respective pathway as a heatmap.
iLINCS server iLINCS (http://www.ilincs.org/ilincs/) is an integrated web platform to analyze gene expression data and signatures from GEO Datasets. This platform gives the option to analyze the gene enrichment networks, associated transcription factors, and kinases from the input of gene expression data. We used the X2K analysis tool of iLINCS to identify the critical transcription factors and kinases involved or associated with the DEGs of selected GEO datasets. Target genes of these transcription factors and kinases were also collated through X2K analysis and the potential pathways that they are involved in were identified using the network analyst tool (https://www.networkanalyst.ca/).

Proteomics data analysis
To understand the dynamics at the protein level, pathway analysis of the published datasets of proteomic analyses in RA patients was carried out. Two published serum proteomics studies (Mun et al., 2018;Vasko et al., 2016) of RA patients were selected for the identification of potential pathways that can contribute to the pathophysiology of RA. These two datasets were identified as they represented the analysis of serum samples of RA patients without any subcategories of the disease or any treatment. Significant proteins with a fold change greater than 2.0 and p-value 0.05 were for analysis. 40 differentially expressed proteins (DEPs) in RA patients from proteomics data of the study by Mun et al. and 25 DEPs in RA patients from the study of Vasko et al. were used for the analysis. Pathway analysis was carried out using the network analyst with protein-protein interaction (PPI) networks. A list of proteins was given as input to the network analyst, with homo sapiens as an organism, and was analyzed through an IMEx Interactome database as a reference. The minimum order network was used for visualization and the KEGG database was used for the identification of significant pathways.

Metabolomic data analysis
RA Serum metabolomics studies can give very critical insights into the pathophysiology of RA as the metabolites are the collective termini of all the changes in genomics, transcriptomics, and proteomic profiles. Metabolomic studies with metabolite levels in patients could be a better tool to predict potential biomarkers and therapeutic targets for RA. We analyzed three published serum metabolomic datasets of RA patients (J. Li et al., 2018;Madsen et al., 2011;Zhou et al., 2016). These three datasets were identified as they represented the analysis of serum samples of RA patients without any secondary complications of the disease or any treatment. These studies showed similar pathways irrespective of the study setting, population, and techniques used for the study. Metaboanalyst 4.0, a web-based tool (Chong et al., 2018) was used to analyze the pathways the metabolites from these studies were involved in, and also metabolite enrichment analysis was carried out to identify the critical metabolic pathways associated with RA.

Analysis of SNPs associated with RA shows critical pathways associated with disease progression
The genes harbored SNPs in promoter, exonic, Intronic, 5' or 3' UTR regions formed an interaction network and ClueGO gene ontology analysis identified pathways in which these genes were involved (Supplementary Table 1). The pathways that resulted from ClueGO analysis of genes with SNPs include metabolic, signaling, disease pathways, and of the immune system, and so on (Figure 2(a)). Similar pathways were obtained from the analysis of exonic SNPs associated with RA which are provided (Supplementary Table 2). The biological processes that emerged after BINGO analysis include those from immune system process and response, regulation of cytokines and their production, pathogen-associated molecular patterns, host immunity, and so on (Supplementary Figure 1). Major molecular functions include protein, carbohydrate and receptor binding and activity, and enzyme activities (Figure 2(b)). The SNPs mapped onto different chromosomes with their predicted effects such as probably deleterious, potentially deleterious, and benign (Supplementary Figure 2). Our SNP analysis shows the association of SNPs with WBC, platelet and RBC count, CRP and ACPA levels, mean corpuscular volume, hematocrit, hemoglobin levels, homocysteine level, blood protein level, and Ig levels ( Figure 2(c)). The details of the SNPs involved in each of the clinical outcomes are provided (Supplementary Figure  3). Results from mutation assessor analysis present SNPs on genes TLR3, MBL2, and PIT4 to be of high impact while another 22 SNPs on other genes were found to have a medium impact (Supplementary Figure 4). Through literature mining we found that certain SNPs had strong functional implications that can contribute to RA pathophysiology. SNPs such as rs2070600 (G82S) of the RAGE gene with higher ligand affinity promoted the production of pro-inflammatory proteins, and pulmonary fibrosis, and increased the risk of RA (Bowman et al., 2002). RA patients with SNP rs2397084 (E126G) of IL17F gene showed longer disease duration than that of wild type (Pawlik et al., 2016). RA patients with SNP rs17561 (A114S) of the IL1A gene will be processed 100-fold more effectively by the calpain-mediated cleavage which removes the precursor peptide to yield the active IL-1A protein. This SNP can lead to higher production levels of active IL1A (Kawaguchi et al., 2007). SNPs such as rs2372536 (T116S) of the PUR9 gene showed decreased response to methotrexate treatment in individuals possessing this SNP (Owen et al., 2013). The details of the functional implications of other exonic SNPs available from literature are provided in Supplementary Table 3.

Transcriptomic analysis of GEO data sets
Analysis of transcriptomics data sets such as GSE15573, GSE97779, GSE10500, and GSE77298 showed significant genes that are differentially expressed (upregulated and downregulated) in RA patients compared to healthy controls in respective cells and tissues ( Figure 3). The top 10 genes of the GSE15573 dataset of PBMCs of RA patients were FAM110A, RNASE2, S100A8, S100A9, ANXA7, TEX261, ARHGAP17, NUP62, GNE, MYL6B. The top 10 genes of transcriptomics data sets of RA synovial macrophages such as the GSE97779 dataset are IGF1, CRIP1, NQO1, APOBEC3A_B,   GSE15573: PBMCs of RA patients GSEA analysis of transcriptomic data of PBMC (GSE15573) from RA patients shows deregulation of Th1 and Th2 differentiation, enrichment of various viral infections, huntington's, alzheimer's, parkinson's, and non-alcoholic fatty liver disease, purine metabolism, oxidative phosphorylation, and thermogenesis ( Figure 4a). Analysis using Enrichr with the KEGG pathway showed pathways like splicing, Th1, Th2 and Th17cell differentiation, pyruvate metabolism, oxidative phosphorylation, Huntington's disease, and so on (Figure 4b). Enrichr analysis with WIKI pathways shows changes in mRNA processing, one-carbon metabolism, metabolic reprogramming in colon cancer, methylation pathway, adipogenesis, and so on. Enrichr analysis with reactome shows all aspects of mRNA processing and maturation. The biological process includes neutrophil-mediated immunity, its activation, and degranulation, mRNA metabolism, and processing. The disgenet showed asthma, leukemia, lymphoma, granuloma, and so on. Prediction of metabolites shows NAD, NADH, L-serine, methylmalonate semialdehyde, and so on (Supplementary Figure 5).
Later X2K analysis from the iLINCS database of the GSE15573 transcriptomics data resulted in significantly expressed (based on the p-value 0.05) TFs and kinases that can be major role players in RA pathophysiology (Supplementary Figure 6).
Further, a comparison of the significant pathways of the PBMCs dataset of RA patients (GSE15573) to the PBMCs dataset of RA patients treated with methotrexate (GSE35455 MTX) resuted in overlapping pathways that are regulated by the methotrexate treatment ( Figure 4c). 24 pathways are unique to the RA PBMCs dataset and 71 pathways to RA PBMCs dataset with methotrexate treatment (Supplementary Table 4). And there were 16 common pathways between the datasets, of which phagocytosis was one of the most significant pathways which can be altered by the methotrexate treatment. Similarly, between RA PBMCs (GSE15573) dataset and RA PBMCs dataset treated with tocilizumab (GSE35455 TCMB) (Figure 4d), we found 19 pathways that are unique to the GSE15573 dataset and 83 pathways to the GSE35455 TCMB dataset (Supplementary Table 5). And 21 pathways are common between both datasets, of which apart from phagocytosis we could identify Th1 and Th2 cell differentiation, and prion disease were the significant pathways that are regulated by the treatment with tocilizumab.
GSE97779: synovial macrophages of RA patients GSEA analysis of synovial macrophages (GSE97779) showed enrichment of cytokine-cytokine receptor interaction, NODlike receptor signaling pathway, Th17 cell differentiation, JAK-STAT signalling pathway, cell adhesion molecule, etc. The enriched metabolic pathways include glycosaminoglycan biosynthesis, cholesterol, retinol and drug metabolism, and so on (Supplementary Figure 7a).
Enrichr analysis with the KEGG pathway showed pathways like Th1, Th2, and Th17 cell differentiation, osteoclast differentiation, fluid shear stress and atherosclerosis, JAK-STAT and MAPK signaling pathway, and so on (Supplementary Figure  7b). Enrichr analysis with WIKI pathways includes oxidative phosphorylation pathway, IL-2, IL-4, IL-7 signaling, vitamin D receptor, oncostatin signalling, aryl hydrocarbon receptor pathway, etc. Enrichr analysis using reactome showed enrichment of platelet activation, degranulation, and platelet response to elevated cytosolic calcium. In addition, signalling by interleukin and leptin were also enriched. The GO biological process included neutrophil-mediated immunity, activation, and degranulation, cytokine response, stimulus, and signaling, T cell response, response to IL-21, and so on. The disgenet showed degenerative polyarthritis, arteriosclerosis, sepsis, septicemia, atherosclerosis, RA, and asthma as significant disease pathways. Prediction of metabolites showed isoforms of sn-glycero-3-phosphoethanolamine as the major ones (Supplementary Figure 8).
Later X2K analysis from the iLINCS database of the GSE97779 transcriptomics data resulted in resulted in significantly expressed (based on the p-value 0.05) TFs and kinases which can be major role players in RA pathophysiology (Supplementary Figure 6).
Further, a comparison of the significant pathways of synovial macrophages dataset of RA patients (GSE97779) to the PBMCs dataset of RA patients treated with methotrexate (GSE35455 MTX) resulted in overlapping pathways that are regulated by the methotrexate treatment (Supplementary Figure 7c). 69 pathways are unique to RA synovial macrophage dataset and 44 pathways to RA PBMCs dataset with methotrexate treatment. And there were 43 common pathways between the datasets, of which lysosome, epstein-barr virus infection, and pathways in cancer were the most significant pathways which can be altered by the methotrexate treatment (Supplementary Table 6). Similarly, between RA synovial macrophages (GSE97779) dataset and RA PBMCs dataset treated with tocilizumab (GSE35455 TCMB) (Supplementary Figure 7d), we found 60 pathways that are unique to the GSE97779 dataset and 52 pathways to the GSE35455 TCMB dataset. And 52 pathways are common between both datasets, of which we could identify similar pathways such as with methotrexate treatment even by the treatment with tocilizumab (Supplementary Table 7). GSE10500: synovial macrophages of RA patients GSEA analysis of synovial macrophages (GSE10500) shows Th1 and Th2 cell differentiation, rheumatoid arthritis, complement, and coagulation, enrichment of various infectious diseases, and so on (Supplementary Figure 9a).
Enrichr analysis with KEGG pathway showed pathways like osteoclast differentiation, apoptosis, pathways associated with various viral infections as well as cancers, and so on (Supplementary Figure 9b). Enrichr analysis with WIKI pathways shows hypertrophy model, apoptosis modulation and signaling, signaling pathways in cancer (glioma and breast cancer), and so on. Enrichr analysis with reactome shows TP53 regulates transcription of caspase activators and caspases, tRNA aminoacylation, pathways associated with cell cycle, immune system, and so on. The GO biological process includes regulation of apoptosis, negative regulation of programmed cell death, positive regulation of protein kinase activity, extrinsic apoptotic signaling pathway, mast cellmediated immunity, activation and degranulation, T cell activation and differentiation, and so on. The disgenet analysis grouped genes into various disease pathologies such as carcinoma of the lung, leukemia, glioblastoma, carcinogenesis, B-cell lymphomas, and malignant neoplasm of the prostrate. The prediction of metabolites showed selenium, L-glutamine, NAP, NADPH, and glycine as the major ones (Supplementary Figure 10).
Later X2K analysis from the iLINCS database of GSE10500 transcriptomics data resulted in significantly expressed (based on the p-value 0.05) TFs and kinases which can be major role players in RA pathophysiology (Supplementary Figure 6).
Further, a comparison of significant pathways of another synovial macrophages dataset of RA patients (GSE10500) to the PBMCs dataset of RA patients treated with methotrexate (GSE35455 MTX) resuted in overlapping pathways that are regulated by the methotrexate treatment (Supplementary Figure 9c). 32 pathways are unique to RA synovial macrophage dataset and 69 pathways to RA PBMCs dataset with methotrexate treatment. And there were 18 common pathways between the datasets, of which epstein-barr virus infection, human papillomavirus infection, and pathways in cancer were the most significant pathways which can be altered by the methotrexate treatment (Supplementary Table  8). Similarly, between RA synovial macrophages (GSE10500) dataset and RA PBMCs dataset treated with tocilizumab (GSE35455 TCMB) (Supplementary Figure 9d), we found 28 pathways that are unique to the GSE10500 dataset and 82 pathways to the GSE35455 TCMB dataset. A total of 22 pathways were found to be common in both datasets. These pathways were similar to those obtained during methotrexate or tocilizumab treatment (Supplementary Table 9).
GSE77298: synovial biopsy of RA patients GSEA analysis of synovial biopsy (GSE77298) shows enrichment of metabolic pathways and pathways in cancer (Supplementary Figure 11a). Analysis using enrichr with KEGG pathway showed pathways like adipocytokine, chemokine, and prolactin signalling pathways, other glycan degradation, etc. (Supplementary Figure 11b) Analysis with WIKI pathways showed pathways like ncRNAs involved in STAT3 signaling in hepatocellular carcinoma, osteoclast signalling, macrophage markers, lung fibrosis, NAD biosynthesis, lipid metabolism, and so on. Reactome pathway analysis resulted in activation of PGC-1alpha, hyaluronan biosynthesis, and export, TNF receptor superfamily members mediating noncanonical NF-kB pathway, downregulation of ERBB4 signalling, AMPK inhibits chREBP transcriptional activation activity, scavenging of heme from plasma, etc. The biological processes involved are positive regulation of lipid transport, bone resorption, and bone remodeling, negative regulation of smooth muscle cell migration, microtubule depolymerization, skeletal muscle fiber development, cerebral cortex cell migration, and histone serine phosphorylation. Predicted metabolites included 2-deoxycytidine diphosphate and potassium as major metabolites (Supplementary Figure 12).
Later X2K analysis from the iLINCS database of GSE77298 transcriptomics data resulted in significantly expressed (based on the p-value 0.05) TFs and kinases that can be major role players in RA pathophysiology (Supplementary Figure 6).
Further, a comparison of the significant pathways of the synovial biopsy dataset of RA patients (GSE77298) to a dataset of PBMCs collected from RA patients treated with methotrexate (GSE35455 MTX) resulted in pathways that are regulated by the methotrexate treatment (Supplementary Figure 11c). 58 pathways are unique to RA synovial biopsy dataset and 39 pathways to RA PBMCs dataset with methotrexate treatment. A total of 48 common pathways were observed between the datasets, of which lysosome, and epstein-barr virus infection were the most significant pathways which can be altered by the methotrexate treatment (Supplementary Table 10). Similarly, between the RA synovial biopsy (GSE77298) dataset and RA PBMCs dataset treated with tocilizumab (GSE35455 TCMB) (Supplementary Figure 11d), we found 52 pathways that are unique to the GSE77298 dataset and 50 pathways to the GSE35455 TCMB dataset. And 54 pathways are common between both datasets, of which we could identify the epstein-barr virus infection with methotrexate treatment even by the treatment with tocilizumab. Interestingly, Epstein-Barr virus infection leads to citrullination of cyclic peptides/proteins which is associated with RA (Supplementary Table 11).
Overall the common biological processes that were involved in all the GEO datasets (GSE15573, GSE97779, GSE10500 and GSE77298) were leukocyte aggregation and negative regulation of T-helper cell differentiation.
Analysis of published proteomic data of RA patients shows deregulated pathways with potential implications for RA Analysis using Enrichr with KEGG database shows deregulation of pathways like complement and coagulation cascade, ECM receptor interaction, Focal Adhesion, PI3K-Akt signalling pathway, phagosome, and so on ( Figure 5). Analysis with WIKI pathways shows pathways like complement and coagulation cascade, selenium micronutrient network, vitamin B12, and folate metabolism, inflammatory response pathway, focal adhesion-PI3K-Akt-mTOR signalling pathway, and so on. Analysis with reactome shows pathways like platelet activation and degranulation, response to elevated platelet cytosolic Caþþ, complement cascade, scavenging of heme from plasma, innate immune system, and so on. Analysis of published metabolomic data shows deregulation of metabolic pathways with potential implications for RA Pathway analysis of published LC-MS serum metabolic data resulted in significant pathways such as aminoacyl tRNA biosynthesis, arginine biosynthesis, alanine aspartate, glutamate metabolism, etc. The pathway enrichment analysis of significant metabolites shows the metabolism of various metabolites (phenylalanine tyrosine tryptophan biosynthesis, phenylalanine metabolism, nitrogen metabolism, arginine biosynthesis, glutamine and glutamate metabolism, alanine aspartate and glutamate metabolism) porphyrin and chlorophyll metabolism, butanoate metabolism, etc. The enrichment analysis of top significant metabolites shows aminoacyl tRNA biosynthesis, arginine biosynthesis, glutamine and glutamate metabolism, alanine aspartate and glutamate metabolism, and so on (Figure 6a).
Further analysis of published GC-MS metabolomics data (Zhou et al., 2016) with MetaboAnalyst showed phenylalanine metabolism, phenylalanine, tyrosine, and tryptophan metabolism, valine, leucine and isoleucine metabolism, aminoacyl tRNA biosynthesis, etc. Enrichment analysis of significant metabolite sets showed enrichment of aminoacyl-tRNA biosynthesis, valine, leucine, and isoleucine biosynthesis and degradation, phenylalanine tyrosine, and tryptophan biosynthesis, and so on (Figure 6b). Furthermore, analysis of a different set of published GC-MS data sets (Madsen et al., 2011) using metaboanalyst showed deregulation of aminoacyl tRNA biosynthesis, pentose phosphate pathway, metabolism of various amino acids (alanine aspartate and glutamate, glycine, serine and threonine, valine leucine and isoleucine, histidine, beta-alanine, cysteine, and methionine). The glycerolipid metabolism, glycoxylate, and dicarboxylate metabolism were also found to be deregulated. Enrichment analysis of significant metabolite sets showed enrichment of aminoacyl-tRNA biosynthesis, glycine serine and threonine biosynthesis, valine leucine and isoleucine biosynthesis, histidine metabolism, and so on (Figure 6c).

Analyses of transcription factors and kinases suggest their critical involvement in RA
Transcription factor and kinases analysis for the GEO datasets -RA PBMCs (GSE15573), RA synovial macrophages (GSE97779, GSE10500), and RA synovial biopsy (GSE77298) was carried out. The analysis showed enrichment of transcription factors and kinases that of are critical importance to processes like cartilage degradation, synoviocyte proliferation, inflammatory response, macrophage differentiation into osteoclasts, and differentiation of MSCs into osteoblasts and the mineralization. Further, commonality analyses for both transcription factors and kinases were carried out among different GEO datasets. Our analysis shows a considerable overlap of transcription factors and kinases among the different datasets (Supplementary Figure 13). The enrichment analysis showed 16 common transcription factors which include -EGR1, RUNX1, HNF4A, SPI1, AR, BACH1, CREM, FLI1, KDM5B, MITF, MTF2, PPARD, SIN3B, SOX2, SUZ12, and TP63. The enrichment analysis showed 18 common kinases which include -AKT1, CDK7, CDK9, CSNK2A1, GSK3B, HIPK2, MAPK1, TAF1, ABL1, CDK, CHUK, CSNK2A2, MAPK14, MAPK8, ATM, CDK6, MAPK3, WEE1. Further, the target genes of these transcription factors and kinases were analyzed for the pathways they are involved in. The details of the pathways that each transcription factors are involved or has impact are given in Table 2.
Commonality analysis of data from SNP analysis and multi-omics datasets analysis shows pathways that are overlapped Pathways resulted from the pathway analysis of SNPs, and data from transcriptomics, proteomics, and metabolomics had a significant overlap when a commonality analysis was carried out (Figure 7). Data from analysis of SNP, transcriptomics, and proteomics resulted in common pathways such as influenza A, systemic lupus erythematosus, PI3K -Akt signaling, phagosome, and human papillomavirus infection, proteoglycans in cancer, and ferroptosis. Whereas SNP analysis & transcriptomics datasets had overlapping pathways such as AGE-RAGE signaling in diabetic complications, NOD-like receptor signaling, tuberculosis, FoxO signaling, and herpes simplex virus 1 infection as the major pathways. Proteomics and SNP analyses had complement and coagulation cascades, malaria, african trypanosomiasis, and staphylococcus aureus infection as overlapping pathways. Proteomics and transcriptomics had common pathways such as regulation of actin cytoskeleton, focal adhesion, platelet activation, and ECM-receptor interaction. Cysteine and methionine metabolism, arginine and proline metabolism, D-glutamine and Dglutamate metabolism, and valine, leucine, and isoleucine degradation were found to be common pathways between metabolomics and transcriptomics datasets. A significant overlap of pathways was seen among SNP analysis, proteomics, and transcriptomics but not with metabolomics.
Integrative analysis of multi-omic datasets suggests pathways that are critical for RA pathophysiology To understand the play of different metabolic and signaling pathways contributing to the joint destruction in RA, we identified through literature the role of the transcription factors, kinases, and metabolites that can have a potential impact on processes such as cartilage destruction, synoviocyte proliferation, inflammation, differentiation of macrophages to osteoclasts and differentiation of MSCs to osteoblasts and their differentiation (Figure 8). From the transcriptomic data analysis of 4 GEO datasets, we have transcription factors such as EGR1, SPI1, HNF4A and PPARD activate or have a role in osteoclast differentiation from macrophages and also enhances osteoclast function while the AR transcription factor inhibits osteoclast formation. Synoviocyte proliferation is positively regulated by transcription factors such as EGR1, and TP63 and negatively regulated by CREM Similarly, kinases such as TAF1, AKT1 and HIPK2 are important role players in osteoclast differentiation from macrophages and osteoclast function. Whereas AKT1 and CSNK2A1 also have a potential role in the proliferation of synoviocytes. TAF1 is also involved in the activation of inflammatory processes.
We had identified key metabolic pathways that had implications on the joint remodelling in RA based on the serum metabolomic analysis of three public datasets of RA patients compared to controls. We found that the metabolites such as L-valine, L-isoleucine, (S)-3-Methyl 2-oxopentanoic acid, and L-leucine have potential roles in inhibiting osteoclast differentiation but they are present in low levels in RA patients due to their degradation. Valine, leucine, and isoleucine metabolism have a major role in tissue regeneration and repair and also inhibit cartilage degradation. Cysteine and methionine metabolism inhibits osteoclast differentiation from macrophages and also inhibits cartilage degradation. Arginine and proline metabolism has an important role in inducing osteoblast differentiation from MSCs, activating synoviocyte proliferation, and inhibiting inflammatory processes. D-glutamine and D-glutamate metabolism metabolites have an important role in osteoclast differentiation and function and synoviocyte proliferation, and on the contrary, it negatively regulates cartilage degradation and inflammation.

Discussion
Single nucleotide polymorphisms (SNP) in many genes were shown to be associated with RA (Saad et al., 2016). Analysis of SNPs using SNP nexus shows its association with clinical parameters like RBC, WBC and platelet count, CRP, ANA, ACPA, blood protein, and Ig levels. Previous studies have also shown that RA is associated with changes in biochemical and hematological parameters like hemoglobin levels, MCV, ESR, and hematocrit (Castro & Gourley, 2010;D'Cruz et al., 2020;Karsdal et al., 2011;Klimenta et al., 2020). Our pathway analysis of SNP data showed enrichment of inflammatory pathways among the top 50 significant pathways which majorly encompass cytokine and other inflammatory signalling pathways.
Transcriptomics analysis was carried out to understand changes and perturbations in various pathways. Consistent with the association of SNPs to immune pathways, analysis of our transcriptomic data sets revealed deregulation of Th1, Activates osteoclast differentiation and their function, and synoviocyte proliferation (Aicher et al., 1999).
Th2, and Th17 signalling pathways and cytokine signalling in RA. Previous studies have shown a role for elevated levels of Th1 cytokines and Th17 cytokines with joint destruction (Alunno et al., 2017). Our previous study has shown elevated levels of TNFa, IFNc, IL-6, RANKL, and IL17 in our RA patient cohort . Several studies have shown changes in levels of cytokines and chemokines correlate with disease activity and favorable prognosis following treatments with DMARDs or biologics (Osiri et al., 2016;Ozaki et al., 2001;Rodriguez-Garc ıa et al., 2021). RA is associated with citrullination resulting from the conversion of amino acid 'Arg' to citrulline by enzymes called peptidyl arginine deaminases (PADs). Citrullinated peptides lead to the production of autoantibodies (ACPA) by shared epitopes (Brink et al., 2013). Our previous study reported high levels of ACPAs in RA patients with elevated ADA levels . These findings gain relevance as bacterial and viral infections lead to high levels of ADA (Afrasiabian et al., 2013;Bou Ghanem et al., 2015;Conde et al., 2002;Ketavarapu et al., 2013;Yusti et al., 2018). Indeed, transcriptomic analysis of GEO data sets grouped genes into pathways belonging to bacterial and viral infections. These infections lead to the activation of pattern recognition receptors and nucleotide-binding oligomerization domain (NOD)-like receptors (NLRs) resulting in elevated levels of cytokines such as TNFa, IFNc, IL-1b, and IL-18 (McCormack et al., 2009). Previous studies reported the production of interleukin-12 (IL-12) by macrophages and mast cells (H. S. Kim & Chung, 2012). Taken together, bacterial and viral infections by inducing elevated citrullinated peptides and cytokines might invoke the development of RA symptoms, which could be mitigated by prophylactic treatment with methotrexate after the resolution of inflammation (Srimadh . Analysis of transcriptomic and metabolomic data from our previous study and those from the literature shows glycolysis/gluconeogenesis, Branch chain amino acid, and oxidative phosphorylation as the deregulated pathways. Studies have confirmed that activated macrophages and dendritic cell maturation exhibit increased glycolysis, glutamine metabolism, and pentose phosphate pathway (Dong & Bullock, 2014;Møller et al., 2021;Perl et al., 2011;Rodr ıguez-Prados et al., 2010;Williams & O'Neill, 2018). Elevated lactate levels in the blood and synovial fluid of RA (Pucino et al., 2019) are associated with pro-inflammatory response and aggressive phenotype of FLS (Biniecka et al., 2016). In collagen-induced arthritis in an animal model of RA, glycolytic inhibitors could decrease disease activity as well as reduce the proliferation of FLS (Abboud et al., 2018;Bustamante et al., 2017;Mousavi et al., 2021). BCAAs are essential amino acids, metabolized by the branch chain aminotransferase 1 (Coras et al., 2020;Hole cek, 2018). Changes in the levels of BCAA are also correlated with muscle wastage in the CIA mice model of RA attributed to protein degradation leading to changes in amino acid levels (He et al., 2019) Degradation of leucine results in acetyl CoA and glutamine which enters the TCA cycle by glutaminolysis (Anderson et al., 2018). The proliferation of FLS in RA patients was shown to be dependent on glutamine (Alyssa et al., 2020;Takahashi et al., 2017). Inhibitors of glutaminase-1 led to an impaired proliferation of FLS both in in vitro culture and in mice model of disease (Takahashi et al., 2017).
TCA cycle is upregulated in transcriptomics as well as in metabolomic data. Equally metabolites of the TCA cycle like succinate aid in the activation and proliferation of macrophages and synovial fibroblasts and play an important role in RA (Fearon et al., 2019;S. Kim et al., 2014). GPR91 which is a succinate receptor is essential for macrophage activation and IL-1b production in an animal model of RA (Littlewood- Our transcriptomic and metabolomic analysis shows deregulation of glycerolipid metabolism in RA. Studies confirm a role for choline kinase which is expressed in RA tissues in glycerolipid metabolism (Guma et al., 2015), TNFa induces the expression of choline kinase, leading to elevated levels of phosphatidylcholine in FLS (Guma et al., 2015). Consistent with this, inhibitors of choline kinase ameliorated FLS migration and resistance to cell death in vitro and also helped to achieve a favorable prognosis in an animal model of RA (Guma et al., 2015).
Levels of arginine a substrate for biosynthesis of nitric oxide (NO) by NOS is deregulated in RA (Lorin et al., 2014;Panfili et al., 2020). The synovium also expresses high levels of arginase (ARG) which degrades arginine thus indirectly modulating NO levels (Huang et al., 2001). RANKL-induced osteoclastogenesis is dependent on arginine (Brunner et al., 2020). Arginine restriction improves the murine model of arthritis (McHugh, 2020).
Increased levels of proline are also associated with RA (J. Zhou et al., 2016). Elevated levels of ornithine and proline are also associated with osteoarthritis which is correlated with the imbalance between cartilage repair and degradation (W. Zhang et al., 2016). However, the role of proline remains uncertain concerning the fate of cartilage in RA .
Methionine and cysteine metabolism was one of the common pathways among the different multi-omic datasets, which was found to be deregulated. Methionine plays a pivotal role in methylation reaction and hence in epigenetic modulation of the immune system (Roy et al., 2020). Methionine administration was found to mitigate RA in neonate rats . A high methionine diet also conferred protection against adjuvant arthritis in mice (M. Li et al., 2016). Consistent with this, methionine inhibited osteoclastogenesis in vitro (Vijayan et al., 2014). Cysteine is an important component of glutathione and has been reported to inhibit osteoclast formation. S-propargyl-cysteine was shown to attenuate inflammation in RA (Wu et al., 2016). Taken together our results show a critical role for different metabolic pathways in modulating different processes associated with RA. Achieving metabolic homeostasis might help to attain a potentially favorable prognosis.
Studies on the molecular and cellular mechanisms associated with RA have conferred a critical role for transcription factors in the disease process (Tsuchiya et al., 2020). Our TF analysis on the transcriptomic data sets shows enrichment of TF associated with RA. The common TF include the EGR1, RUNX1, SPI1, CREM, HNF4A, etc. (Figure 8). EGR1 is implicated in healing and its dysfunction is associated with fibrotic conditions (Havis & Duprez, 2020). EGR1 is also associated with the expression of collagen by fibroblasts and its increased activity is associated with fibrosis observed in RA synovium (Alexander et al., 2002). Previous systems analysis shows that RUNX1 is associated with T cell activation (Wilfahrt et al., 2019). RUNX1 also inhibits the differentiation of naïve CD4 T cells into the Th2 lineage (Komine et al., 2003). SNP in RUNX1 results in reduced expression of SLC22A4 and is associated with RA in the japanese population (Tokuhiro et al., 2003). RUNX1 SNP was reported to be in the super-enhancer region in the CD4þ T cells from synovial fluid of juvenile idiopathic arthritis patients (Peeters et al., 2015). SpI1 transcription factor which is a macrophagespecific marker is common to all the GEO data sets analyzed. Indeed, previous studies on transcriptomic data show enrichment of macrophage-specific TF SpI1 (Xue et al., 2014).
Proinflammatory signals activate SpI1 which drives the expansion of myeloid progenitors (Hernandez et al., 2020).
Severe disease activity was observed in ICER/CREM sufficient mice than in deficient ones suggesting a major role for CREM in the progression of the disease under inflammatory conditions (Yoshida et al., 2016). The cytokine TNFa significantly induced the expression of EZH2 and SUZ12 in synovial fibroblasts (Trenkmann et al., 2011). The TNFa induced NFjB target genes were impaired when EZH1 or SUZ12 was knockdown in HCT116 cells (Shuai-Kun et al., 2016). Previous studies have also shown a potential role for EZH2 and SUZ12 in the epigenetic regulation of NK cells during inflammation (Gan et al., 2018).
Our analysis shows the involvement of the androgen receptor (AR) with RA. Previous studies have shown that increased AR activity due to reduced CAG repeat is associated with the early onset of disease (Price et al., 2010). However elevated levels of androgens act as immune suppressors and delay RA (Cutolo, 2009). Reduced testosterone levels are associated with many autoimmune diseases and increased inflammatory markers (Gubbels Bupp & Jorgensen, 2018;Mohamad et al., 2019). Testosterone also suppressed the secretion of IL1 by PBMCs from RA patients and the synthesis of IL-1b by synovial macrophages (Mohamad et al., 2019). Taken together AR modulates inflammation with a potential consequence for RA.
TP63 belongs to the p53 family of transcription factors expressed in chondrocyte cell lines. Overexpression of p63 can modulate chondrocyte differentiation in vivo (Gu et al., 2013). Previous studies showed that the p63 gamma isoform has decreased chondrocytes, potentially due to the expression of apoptotic genes resulting in apoptosis. Conditional KO of p63 was shown to be chondroprotective (Taniguchi et al., 2017). P63 also regulates the expression of matrix metallopeptidase-13 (MMP13) which is essential for the invasiveness of metastatic cancers (Celardo et al., 2014). MMP13 is expressed in synovium and is involved in cartilage degradation (Moore et al., 2000).
PPARD participates in lipid homeostasis, inflammation, and balancing of blood cholesterol and glucose (Decara et al., 2020;Tan et al., 2001;. PPARD activation leads to increased expression of proteases in cartilage cells and PPARD KO conferred protection against cartilage death in the DMM model of post-traumatic OA (Ratneswaran et al., 2017). PPARD antagonist GSK0660 inhibited TNFa induced expression of cytokines and leukocyte recruitment (Wagner & Wagner, 2020). In PPARD KO mice impaired the proliferation of T lymphocytes in vitro and the development and progression of CIA (collagen-induced arthritis; (Luz-Crawford et al., 2016).
FLI1 is a critical regulator of chemokines which play a role in attracting immune cells . FLI1 regulates chondrogenesis, angiogenesis, and wound healing by modulating the expression of CCN2 in a TGFb dependent manner (Nakerakanti et al., 2011). TGFb invokes CCN2 expression by displacing FLI1 from the CCN2 promoter (Van Beek et al., 2006). FLI1 is shown to activate the transcription of IL6 . Taken together, FLI1 can be critical in RA pathophysiology.
Our transcriptomics analysis also identified common kinases that are present in all the GEO datasets analyzed. Kinases are associated with RA disease progression (Chimenti et al., 2015;Okamoto & Kobayashi, 2011). TNF-mediated activation of RANKL expression can induce MAPK signaling in RA and is involved in osteoclast differentiation (Ha et al., 2004). HIPK2 kinase is associated with cartilage degradation in RA. The miRNA miR-27b-3p inhibits of apoptosis of chondrocytes by directly inhibiting the HIPK2 expression (Zhou et al., 2019). Akt1 and Akt2 kinases play a critical role in the differentiation of osteoclasts and their survival (Sugatani & Hruska, 2005). Akt signaling is also involved in osteoclastogenesis through IL-21 and RANKL independent signaling (Xing et al., 2016). Protein kinases such as CSNK2A1 and CSNK2A2 are associated with osteoclast differentiation (Borgo et al., 2021). Taken together our kinase analysis of transcriptomics data reiterates the significant role played by kinases in RA pathophysiology.
In conclusion, RA is a multifactorial systemic autoimmune disease that eventually reduces a patient's mobility and leads to joint erosions and destruction. Despite multiple efforts on different omics platforms, there is more to be explored in the context of early biomarkers and therapeutics of RA. In our study, we carried out an integrative analysis of SNPs, transcriptomics, proteomics, and metabolomics datasets of RA. The SNPs were analyzed for their clinical and functional relevance and they are grouped into pathways. Further, the kinases and transcription factors that are common to different transcriptomics datasets as well as the pathways that are common among different multi-omics datasets were identified. These TFs, kinases and deregulated pathways modulate the processes like inflammation, cartilage death, synoviocyte proliferation as well as differentiation of macrophages and MSC into osteoclasts and osteoblasts respectively and their functions. Overall, as these pathways affect joint remodeling in RA, our analysis identified molecular signatures that are associated with the observed symptoms of RA and joint erosions. Our analysis by systems biology approach identified molecular signatures like metabolites from deregulated metabolic pathways as potential biomarkers and kinases, TFs and deregulated metabolic pathways which could be potential therapeutic targets in RA.