Identification of Potential Therapeutic Targets for Myopic Choroidal Neovascularization via Discovery-Driven Data Mining

Abstract Purpose: Myopic choroidal neovascularization (mCNV) is a prevalent cause of vision loss. However, the identification of effective therapeutic targets for mCNV has been hindered by the paucity of suitable animal models. The aim of this study is therefore to identify genes and pathways associated with mCNV and to uncover prospective therapeutic targets that can be used to devise efficacious treatments. Methods: Text data mining was used to identify genes linked to choroid, neovascularization, and myopia. g:Profiler was used to analyze Gene Ontology biological processes and Reactome pathways. Protein interaction network analysis was performed using STRING and visualized in Cytoscape. MCODE and cytoHubba were used for further screening. Results: Discovery-driven text data mining identified 55 potential genes related to choroid, neovascularization, and myopia. Gene enrichment analysis revealed 11 biological processes and seven Reactome pathways. A protein-protein interaction network with 47 nodes was constructed and analyzed using centrality ranking. Key clusters were identified through algorithmic tools. Finally, 14 genes (IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1), in addition to VEGFA, were evaluated as potential future therapeutic targets. Conclusions: This study provides new potential therapeutic targets for mCNV, including IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1, which correspond to seven potential enriched pathways. These findings provide a basis for further research and offer new possibilities for developing therapeutic interventions for this condition.


Introduction
Over the past 50 years, the prevalence of myopia, particularly high myopia, has increased considerably. However, the underlying mechanism of myopia remains unclear. 1 Recent studies have suggested a potential correlation between choroid and ocular growth and myopia. 2 The choroid plays a crucial role in various functions, including acting as the primary blood supply to the outer retina, 3 regulating vascularization and scleral growth, 4 and affecting visual focus through changes in choroidal thickness. 5 Moreover, angiogenesis and neovascularization are involved in a complex interplay of physiological and pathological mechanisms that have been implicated in various eye disorders, including myopia. Pathological myopia in adults is the second leading cause of choroidal neovascularization (CNV), which is a serious and vision-threatening complication, affecting around 5-11% of pathological myopia patients. 6 CNV involves the pathological formation of new blood vessels that affect the choroid and exhibit high permeability, structural defects, and susceptibility to bleeding and hematoma. 7 In non-pathological myopia, choroidal thinning has been linked to the onset and progression of severe myopia-related disorders. 8 Furthermore, impaired choroidal blood flow can harm the retina's microstructure and function, 9 and angiogenic factors are closely involved in maintaining vascular homeostasis.
Current research in the field has predominantly focused on vascular endothelial growth factor (VEGF), with several studies reporting a clinical correlation between myopic choroidal neovascularization (mCNV), also termed myopic macular neovascularization (mMNV), and elevated levels of VEGF in aqueous humor. 10 Anti-VEGF therapy has become the primary treatment for mCNV and has been shown to improve the overall visual outcomes of mCNV in myopic patients. 11 However, this treatment approach necessitates frequent injections, and the cost of anti-VEGF drugs can be a significant burden for patients. 12 Despite the success of anti-VEGF therapy, there is still a dearth of treatment targets and drugs for mCNV, 13 underscoring the pressing need to identify new targets for the treatment of mCNV. Nevertheless, the lack of effective animal models for mCNV presents a significant challenge in the identification and development of new therapeutic targets. 14 As a result, novel approaches are needed to facilitate the screening and validation of potential targets.
In recent years, the development of therapeutics has become increasingly expensive and time-consuming. As a result, network screening based on biological databases has emerged as a valuable tool in the early stages of target identification and drug discovery, accelerating the discovery of drug candidates and reducing therapeutic development costs. 15-19 By converting text information into database content or complex networks, text data mining can provide researchers with an efficient method for processing and extracting information in response to the rapid growth of information flow. This process may also provide context and guidance for new research, particularly with respect to complex processes involving genes, proteins, and phenotypes. Ultimately, the application of biomedical text mining and network databases can help identify new targets for therapeutic development at low cost and with high efficiency.
The objective of this study is to investigate potential genes and pathways that are associated with myopic choroidal neovascularization in an economically efficient manner by employing biomedical data mining techniques from multiple databases, in light of the lack of an effective mCNV animal model. The anticipated outcome of this research is to establish a theoretical foundation that can facilitate the development of new therapies and drugs for this condition.

Data mining
To perform discovery-driven text data mining, we utilized two databases: pubmed2ensembl 20 and GenCLiP3. 21 The pubmed2ensembl database provides access to nearly 150,000 genes in Ensembl from 50 species, while GenCLiP3 integrates CoreNLP, Sphinx, and MySQL and has undergone two architecture upgrades to improve the reliability of text mining results. Combining multiple data sources enhances the reliability of the text data mining results. To initiate the queries, we used three search terms: "myopia," "choroid," and "neovascularization," with "homo sapiens" specified as the species. We selected "search for PubMed IDs," "retrieve up to 100,000 document IDs," and "filter on MEDLINE PubMed ID" in pubmed2ensembl to generate a list of genes. In GenCLiP3, we chose "All human genes" and "MEDLINE" as screen settings. We then removed duplicate genes and obtained a gene list for each query for further analysis.
In addition, we obtained risk genes from the GWAS catalog (2019 version) 22 and the Consortium for Refractive Error and Myopia (CREAM) 23 as supplementary sources, and integrated them with the myopia-related genes obtained through data mining to create a gene list corresponding to the keyword "myopia," which was used for further analysis.
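The list handling described above (deduplication, merging the supplementary myopia genes, and finding genes shared by all three keywords) amounts to a few set operations. The following is a minimal sketch, assuming the gene symbols have already been exported as plain lists; the function names and the toy symbols are hypothetical and are not the study's actual lists.

```python
def normalize(symbols):
    """Upper-case and strip gene symbols, dropping blanks and duplicates."""
    return {s.strip().upper() for s in symbols if s and s.strip()}


def common_genes(*gene_lists):
    """Return the sorted intersection of several gene lists."""
    sets = [normalize(lst) for lst in gene_lists]
    return sorted(set.intersection(*sets)) if sets else []


# Toy inputs (illustrative only, not the study's data).
myopia_text = ["VEGFA", "MMP2", "igf1 ", "PAX6"]
myopia_gwas = ["GJD2", "MMP2"]
choroid = ["VEGFA", "MMP2", "IGF1", "CFH"]
neovascularization = ["VEGFA", "IGF1", "MMP2", "FGF2"]

# Myopia list = union of text-mining hits and the supplementary risk genes.
myopia = normalize(myopia_text) | normalize(myopia_gwas)
shared = common_genes(myopia, choroid, neovascularization)
print(shared)
```

Normalizing symbols before intersecting guards against case or whitespace differences between sources; alias resolution (for example, outdated gene symbols) is not handled here and would need a separate mapping step.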

Enrichment analyses (GO biological process and Reactome pathway)
To perform enrichment analyses for the gene intersections obtained from text data mining, we utilized g:Profiler, a tool that maps genes to functional information sources and identifies significantly enriched terms. 24 The organism was set as "Homo sapiens (human)" and the significance threshold was set by the g:SCS threshold, which corrects the p value for multiple testing. g:SCS (Set Counts and Sizes) is the correction method native to g:Profiler; it accounts for the dependence among tests that arises from overlapping functional terms when adjusting p values for multiple hypothesis testing. The threshold cut-off was selected to identify the most enriched GO biological process terms and Reactome Knowledgebase 25 pathways.
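As background for the enrichment step, over-representation of a functional term in a gene list is commonly scored with a cumulative hypergeometric test. The sketch below illustrates that test only; it is not g:Profiler's implementation, it does not reproduce the g:SCS correction, and the function name and the numbers in the example are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k): chance of seeing at least k term genes in a query.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustrative numbers: 55 query genes, 10 of which hit a 200-gene term
# in a 20,000-gene background.
p = hypergeom_enrichment_p(N=20000, K=200, n=55, k=10)
print(f"{p:.3e}")
```

An uncorrected p value like this one would still need a multiple-testing adjustment across all tested terms before being compared with a significance cut-off.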

Protein-protein interaction (PPI) network and MCODE/cytoHubba analysis
To establish a protein-protein interaction (PPI) network, we utilized the STRING database, 26 which incorporates information from various sources such as high-throughput experiments, computational predictions, and literature curation to provide a comprehensive and current perspective on protein interactions and functional networks. To generate the PPI network, we entered the gene intersections obtained from data mining and specified "Homo sapiens" as the organism of interest. The type of analysis selected was "Multiple Protein," and the network type was set to "Full network," with "evidence" chosen as the meaning of network edges. To ensure a high degree of confidence, we set a minimum interaction score of 0.700. Finally, we exported the resulting PPI network as a TSV document for subsequent analyses.
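Once the network is exported as a TSV document, the confidence cut-off can also be re-applied or tightened offline. The following sketch assumes the export contains the columns `#node1`, `node2`, and `combined_score`; these column names, the function name, and the toy rows are assumptions for illustration and should be checked against the actual file header.

```python
import csv
import io

# Toy stand-in for an exported edge table; column names are assumed.
TOY_TSV = (
    "#node1\tnode2\tcombined_score\n"
    "VEGFA\tMMP9\t0.950\n"
    "VEGFA\tEDN1\t0.720\n"
    "BDNF\tMMP3\t0.410\n"
)


def load_edges(handle, min_score=0.700):
    """Read tab-separated edges and keep those scoring at least min_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    edges = []
    for row in reader:
        score = float(row["combined_score"])
        if score >= min_score:
            edges.append((row["#node1"], row["node2"], score))
    return edges


edges = load_edges(io.StringIO(TOY_TSV))
nodes = sorted({name for a, b, _ in edges for name in (a, b)})
print(len(nodes), "nodes,", len(edges), "edges")
```

For a real file, `io.StringIO(TOY_TSV)` would be replaced by an opened file handle, for example `open(path, newline="")`.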
For visualization and analysis of the gene network, we utilized the open-source software Cytoscape 27 (Version: 3.9.1, RRID: SCR_003032), which is widely used to model, analyze, and visualize biological networks. To identify the most important genes in the network, we applied the Maximal Clique Centrality (MCC) algorithm 28 in the cytoHubba plugin (RRID: SCR_017677), which ranks the genes by importance. MCC evaluates the significance of nodes in a group based on the size of the maximal cliques they belong to, and has proven to be effective in identifying key nodes in diverse types of networks. We selected genes with a centrality score higher than the median for further analysis.
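To make the ranking step concrete, MCC can be written as MCC(v) = sum of (|C| - 1)! over all maximal cliques C that contain node v, so a node in large cliques scores much higher than one with the same number of unconnected neighbors. The sketch below is an independent illustration of that definition on a toy graph, followed by the above-median selection; it is not the cytoHubba code, and all function names and the toy edges are hypothetical.

```python
from math import factorial
from statistics import median


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = dict.fromkeys(adj, 0)
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
scores = mcc_scores(build_graph(toy_edges))
cutoff = median(scores.values())
hubs = sorted(n for n, s in scores.items() if s > cutoff)
print(scores, hubs)
```

In the toy graph, node C belongs to both the triangle and the C-D edge, so it is the only node scoring above the median. Enumerating maximal cliques is exponential in the worst case, which is acceptable for a network of a few dozen nodes.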
To identify gene clusters of significance within the PPI network, we employed the Molecular Complex Detection (MCODE) tool (RRID: SCR_015828), utilizing the following criteria: "In Whole Network," "Degree Cutoff = 2," "Node Score Cutoff = 0.2," "K-Core = 2," and "Max. Depth = 100." The Database for Annotation, Visualization and Integrated Discovery (DAVID) 29 offers a comprehensive collection of functional annotation tools that enable researchers to elucidate the biological implications of genes. In this study, we utilized the DAVID database to validate the results of our MCODE analysis and clusters.
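Of the MCODE settings listed above, the K-Core value is the easiest to illustrate in isolation: a k-core is the largest subgraph in which every node keeps at least k neighbors. The sketch below shows only that pruning idea on a toy graph; it does not reproduce MCODE's vertex weighting or cluster expansion, and the function name and toy adjacency map are hypothetical.

```python
def k_core(adj, k=2):
    """Return the k-core of an undirected graph given as {node: set(neighbors)}."""
    core = {n: set(nbrs) for n, nbrs in adj.items()}
    queue = [n for n, nbrs in core.items() if len(nbrs) < k]
    while queue:
        n = queue.pop()
        if n not in core:
            continue  # already removed via an earlier queue entry
        for m in core.pop(n):
            core[m].discard(n)
            if len(core[m]) < k:
                queue.append(m)
    return core


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
core2 = k_core(toy, k=2)
print(sorted(core2))
```

With k = 2 the pendant node D is pruned and the triangle remains; raising k to 3 would leave nothing, since no node in the toy graph retains three neighbors after pruning.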

Results of text mining
Figure 1 illustrates the text mining approach employed in this study. To mitigate algorithm bias and minimize the effect of noise words on the results, we extracted gene lists from both the pubmed2ensembl and GenCLiP3 databases. We retrieved 819 unique genes for neovascularization (Supplementary Table S1) and 544 genes for choroid (Supplementary Table S2). In addition, we obtained a total of 533 genes associated with myopia by combining the results of GWAS catalog 2019, CREAM, and text mining (Supplementary Table S3). By analyzing the gene sets, we identified 55 common genes (Table 1) across all three sets.

Results of enrichment analyses
The results of text mining were utilized to perform enrichment analyses using g:Profiler, which enabled the identification of the most enriched terms related to biological processes. To ensure the selection of the most significant items, a significance threshold cut-off (adjusted p value < 1.0E-13) was set. Text mining tools have proven valuable for enrichment analysis by providing access to biologically relevant contextual information that is not currently covered by structured database records. 30 It is worth noting that a lower p value and a higher proportion of genes may indicate a strong correlation between the biological process and myopic choroidal neovascularization. However, in certain cases, genes that appear more frequently in the literature may lead to false-positive results. To address this issue, we utilized the g:SCS correction method 31 in our enrichment analysis. This method takes into account the dependence of multiple tests by considering the overlap of functional terms, thereby reducing false-positive results.

Results of protein-protein interaction network and algorithm analysis
In this study, we utilized the STRING database (Version: 11.5) to construct a protein-protein interaction (PPI) network. To ensure the accuracy and reliability of our analysis, we set a high confidence threshold of 0.700 for interaction scores (Figure 3). We then exported the analysis results as source data and imported them into Cytoscape for hub gene analysis using algorithmic tools. The resulting Cytoscape network comprises 47 nodes and 418 edges. Node color is graded on a scale from light to dark, reflecting the automated text mining score (tscore) and the co-occurrence of genes/proteins from data mining. Consequently, darker nodes represent stronger relationships between the nodes (Figure 4).
To further identify important nodes in the gene network, we used the cytoHubba plugin to rank node importance. After reviewing relevant literature, we selected the MCC algorithm for improved sensitivity and specificity. 32 Higher scores indicate greater centrality within the overall network (Supplementary Table S4). Based on the MCC algorithm, we scored the nodes and sorted the results in descending order. We then selected nodes with scores above the median to form the sub-network required for MCODE analysis (Figure 5). Using MCODE analysis, we generated key clusters from the sub-network (Supplementary Figure S1). Cluster I (MCODE score = 7.714) consisted of 8 nodes and 27 edges corresponding to genes VEGFA, MMP9, MMP2, IL10, FGF2, TGFB1, IL6, and MMP3. Cluster II (MCODE score = 3.75) comprised 9 nodes and 15 edges corresponding to genes HGF, CTNNB1, IGF1, BDNF, NGF, TNF, CCL2, CRP, and EDN1.
To delimit the scope of our analysis and evaluate the significance of the two identified clusters, we conducted a Reactome pathway enrichment analysis on the gene lists using the DAVID database. 29 To filter out the top-ranked pathways and their corresponding genes, we adopted a threshold of p = 1.0E-4. The analysis revealed pathways corresponding to 15 genes: VEGFA, MMP9, EDN1, CCL2, MMP2, BDNF, IGF1, NGF, IL10, MMP3, IL6, CTNNB1, HGF, FGF2, and TNF.
Based on these results, a total of 15 genes belonging to the two clusters were identified as promising potential therapeutic targets. Notably, besides the conventional target VEGFA, these genes encompass 14 new targets: IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1, ranked in descending order according to their MCC score.

Discussion
Myopia is a leading cause of visual impairment and blindness globally. 1 Patients with myopia are also susceptible to developing choroidal neovascularization (CNV), which can result in a significant progressive decline in vision. 6 Currently, anti-VEGF therapy is the primary treatment for mCNV. 11,33 However, this treatment is associated with high costs and frequent injections, which can pose a burden to patients. 12 Therefore, it is essential to identify new targets for choroidal neovascularization in myopic eyes and develop corresponding therapeutic agents. Nevertheless, establishing mCNV-related animal models remains challenging. Some researchers have made valuable attempts to combine form-deprivation myopia and laser-induced CNV to mimic mCNV, 14 but there are still gaps in mechanism between this model and real mCNV. 34 Given these constraints, data mining could provide valuable insights for the therapeutic development of mCNV.
This study utilized biomedical database data mining to investigate 55 genes associated with myopic choroidal neovascularization (the intersection can be seen in Supplementary Figure S2). The enrichment analyses revealed 11 GO biological process terms and 7 Reactome pathways. The study further established a network of 14 core candidate genes (IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1) and their interaction patterns. These findings provide a foundation for future studies on the mechanisms underlying mCNV and pave the way for potential therapeutic development.
The analysis of gene function enrichment revealed significant enrichment in interleukins, platelets, and immune function. The interleukin family is recognized for its multifaceted roles in immunity and inflammation, which have been implicated in the pathogenesis of various ocular diseases. 35 Altered expression levels of interleukins, including IL6 and IL10, were reported in patients with CNV, mCNV, and high myopia. 13 The secretion of IL10 may be a compensatory response before the emergence of CNV, and the continuous decompensation of IL10 may lead to neovascularization. 36 Platelet-related pathways have been established to play a crucial role in neovascularization. 37 Apart from interleukins and platelets, immune regulation mediated by other factors is also an important focus of CNV research. Animal models have demonstrated the significance of RAGE 38 and Fas/FasL 36 in regulating immunity and their contribution to CNV development. In addition, the expression of the important inflammatory cytokines TNF and CCL2 was increased in laser-induced CNV, while intravitreal injection of heparin could inhibit the development of CNV. 39 At the same time, the expression of TNF was up-regulated in myopic eyes, which may play a role in the progression of myopia through chronic inflammation. 40 These findings emphasize the strong correlation between the aforementioned pathways and the pathology of mCNV, aligning with the current body of knowledge.
Growth factors and neurotrophic factors are classic targets for ocular drug development. The association of FGF2 with high myopia has been demonstrated in murine models. 41 Although some previous studies have suggested that targeted disruption of FGF2 alone cannot prevent the occurrence of laser-induced CNV, 42 subsequent research indicates that FGF2 may regulate pathogenic angiogenesis through indirect mechanisms. 43 HGF may serve as an initial regulator of neovascularization in CNV. 44 HGF has also been reported to be associated with high myopia in the Han Chinese population. 45 Neurotrophic factors (FGF2, NGF, IGF1, and BDNF) are thought to be associated with CNV in age-related macular degeneration. 46 Promisingly, new FGF2 inhibitors, such as dovitinib, ponatinib, and AZD4547, have shown encouraging results in preclinical models of cancer therapy, 47 and may have the potential to be a valuable adjunct to anti-VEGF therapy in the future.
MMP2, MMP3, and MMP9 are zinc-dependent endopeptidases belonging to the matrix metalloproteinase (MMP) family, which is involved in various biological processes such as angiogenesis and neovascularization. 48 Additionally, MMP3 has been linked to high myopia, 49 and MMP2 has been reported to be associated with myopic macular degeneration. 50 However, further studies are needed to elucidate the relationship of MMP2 with mCNV. Currently, MMP-targeting drugs have attracted attention in various disease fields, and some potential drugs have entered human clinical trials. 51 Given the significant involvement of MMPs in both high myopia and the CNV process, MMPs merit attention as potential targets for mCNV.
It is also worth exploring the pathogenesis of mCNV and its relationship to the identified genes. The pathogenesis of mCNV remains unclear and encompasses mechanical, genetic, and hemodynamic factors. 13 Mechanical stretching and structural changes resulting from scleral elongation and thinning of the retinal pigment epithelium and Bruch's membrane disrupt the normal physiological balance of the choroidal vasculature. 6 Moreover, dysregulation of intraocular cytokines such as VEGF, IL-6, and TNF-α contributes to choroidal neovascularization. 35,52 Notably, EDN1 has been identified as a potential novel target in this study. Studies have demonstrated that EDN1 can induce an angiogenic phenotype in cultured vascular endothelial cells, while in isolated porcine retinal arterioles, EDN1 stimulation activates EDNRA, leading to vasoconstriction and highlighting the vital role of endothelin in vascular homeostasis. 53,54 However, according to existing reports, a direct association between EDN1 and myopia or CNV has not been confirmed. Therefore, it would be worthwhile to investigate whether EDN1 plays a distinct role in mCNV progression. Furthermore, as mentioned earlier, the target genes and pathways identified in this study are implicated in mCNV, but they may also play a role in other forms of neovascularization, such as those in age-related macular degeneration and diabetic retinopathy. Further research is needed to determine the specific contributions of these factors and their roles in other neovascular diseases, ultimately aiding the development of targeted therapies for mCNV.
The use of bioinformatics tools, including text data mining, has been shown to provide new opportunities to reveal key nodes of disease. 30,55,56 By performing biomedical database-based enrichment analysis, disease-associated biological processes and pathways can be unveiled. The combination of STRING and Cytoscape enables a visual representation of gene interactions and a highly adaptable network framework. Over time, and following several iterative updates, text data mining methods and tools have been effectively employed in drug discovery and evaluation across numerous domains. 16 Our research also involved investigating potential drug-gene interactions through the DGIdb database. 57 We generated a list of 80 corresponding FDA-approved drugs (Supplementary Figure S3), in addition to the classic VEGFA target. However, we acknowledge the limitations of in silico analysis: despite the significance of this attempt, the potential drug-gene-disease interactions displayed in the results require further verification in animal experiments and human samples.
It is important to note that while we have addressed certain known issues in this study through the use of various data processing tools, there are still limitations that need to be considered. Firstly, text mining, which involves extracting high-quality information from text, is reliant on the capacity of the chosen database and may be subject to bias due to algorithmic and database limitations. 58 Balancing the accuracy and data volume of text mining is an essential topic. In our study, we combined multiple data sources to obtain sufficient data while increasing reliability, and applied multiple databases for checking and correction during the analysis. Secondly, the architecture of existing text mining tools limits their data sources mainly to abstracts or main texts of publications, rendering them susceptible to overlooking data present in supplementary documents, such as those of some GWAS studies. To address this, we added the GWAS catalog and CREAM genes to the text mining list, to compensate for the limitations of the dataset to a certain extent. Meanwhile, although the 14 target genes were strongly correlated with mCNV based on data mining, interaction network analysis, and subsequent literature retrieval, further studies, such as functional studies and experimental validation, are still needed to definitively establish causality. Finally, as a possible alternative to valid animal models, which are currently lacking, this study proposes a cost-effective method of investigation. However, developing valid animal models remains an important future direction to consider.
Furthermore, to our knowledge, there are very few studies on myopia that have utilized text data mining methods. 59,60 We believe that as the data volume expands and algorithms continue to advance, text data mining will have greater potential for application in related fields of myopia research.

Conclusion
To summarize, in the context of the lack of efficient animal models, our analysis identified a central cluster of 14 genes (IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1) that may serve as new potential therapeutic targets for mCNV. By constructing a disease-gene-protein visualization network, we were able to analyze the interaction patterns and potential molecular mechanisms of these target genes, providing a foundation for subsequent target screening and drug development.

Compliance with ethics guidelines
This study was conducted in accordance with the ethical standards of the Declaration of Helsinki and its later amendments.

Figure 1. Summary of the research path. (1) Text mining: 55 human genes were found with the keywords neovascularization, choroid, and myopia by pubmed2ensembl and GenCLiP3. (2) Enrichment analyses: Biological process and Reactome pathway analyses were performed with g:Profiler, and data visualization was completed in Cytoscape, combined with multiple algorithmic tools. (3) 14 genes were selected as potential therapeutic targets for mCNV.

Figure 2. The significance threshold cut-off (adjusted p value < 1.03E-13) was selected to obtain the most enriched biological process terms. g:SCS correction was performed to reduce false-positive results. Further screening was performed in combination with algorithmic tools and the existing literature, and the results contained 11 biological processes corresponding to 52 genes.

Figure 3. Protein-protein interaction network of high confidence score (0.700) by STRING.The network nodes represent the proteins corresponding to the genes, and the edges represent the possible protein interactions.

Figure 4. The visualized result of the gene-interaction network by Cytoscape. The network nodes represent the proteins corresponding to the genes, and the edges represent the protein interactions. Node shading, ranging from light to dark, corresponds to the tscore, reflecting the co-occurrence of genes/proteins in the text mining data of STRING. Nodes with darker shades indicate stronger connections within the network.

Table 1. List of common genes from data mining.