Identification of genes containing expanded purine repeats in the human genome and their apparent protective role against cancer

Purine repeat sequences present in a gene are unique as they have high propensity to form unusual DNA-triple helix structures. Friedreich’s ataxia is the only human disease that is well known to be associated with DNA-triplexes formed by purine repeats. The purpose of this study was to recognize the expanded purine repeats (EPRs) in human genome and find their correlation with cancer pathogenesis. We developed “” algorithm to identify non-overlapping EPRs without pyrimidine interruptions in the human genome and customized for searching repeat lengths, n ≥ 200. A total of 1158 EPRs were identified in the genome which followed Wakeby distribution. Two hundred and ninety-six EPRs were found in geneic regions of 282 genes (EPR-genes). Gene clustering of EPR-genes was done based on their cellular function and a large number of EPR-genes were found to be enzymes/enzyme modulators. Meta-analysis of 282 EPR-genes identified only 63 EPR-genes in association with cancer, mostly in breast, lung, and blood cancers. Protein–protein interaction network analysis of all 282 EPR-genes identified proteins including those in cadherins and VEGF. The two observations, that EPRs can induce mutations under malignant conditions and that identification of some EPR-gene products in vital cell signaling-mediated pathways, together suggest the crucial role of EPRs in carcinogenesis. The new link between EPR-genes and their functionally interacting proteins throws a new dimension in the present understanding of cancer pathogenesis and can help in planning therapeutic strategies. Validation of present results using techniques like NGS is required to establish the role of the EPR genes in cancer pathology.