MURPHREE FESTSCHRIFT: RESEARCH REPORT Identification of novel genes by targeted exome sequencing in Retinoblastoma

Background: Retinoblastoma (RB) is initiated by mutation in both alleles of RB1 gene. However, few cases may occur even in the absence of RB1 mutation suggesting the role of genes other than RB1. Methodology: The current study was planned to utilize targeted exome sequencing in Indian RB patients affected with unilateral non-familial RB. 75 unilateral RB patients below 5 years of age were enrolled. Genomic DNA was extracted from blood and tumor tissue. From peripheral blood DNA, all coding and exon/intron regions were amplified using PCR and direct sequencing. Cases which did not harbor pathogenic variants in peripheral blood DNA were further screened for mutations in their tumor tissue DNA using targeted exome sequencing. Three pathogenicity prediction tools (Mutation Taster, SIFT, and PolyPhen-2) were used to determine the pathogenicity of non-synonymous variations. An in-house bioinformatics pipeline was devised for the mutation screening by targeted exome sequencing. Protein modeling studies were also done to predict the effect of the mutations on the protein structure and function. Results: Using the mentioned approach, we found two novel variants (g.69673_69674insT and g.48373314C>A) in RB1 gene in peripheral blood DNA. We also found novel variants in eight genes (RB1, ACAD11, GPR151, KCNA1, OTOR, SOX30, ARL11, and MYCT1) that may be associated with RB pathogenesis. Conclusion: The present study expands our current knowledge regarding the genomic landscape of RB and also highlights the importance of NGS technologies to detect genes and novel variants that may play an important role in cancer initiation, progression, and prognosis. ARTICLE HISTORY Received 24 February 2022


Introduction
Retinoblastoma (RB, OMIM#180200) is the most common childhood eye cancer which affects children below the age of five years (1). The basic concept behind development of RB tumors was first proposed by Alfred G. Knudson's Two-hit classical hypothesis which states that loss-of-function mutation in both the alleles of RB1 (tumor suppressor gene) is the main cause for RB initiation (2). The worldwide incidence of RB is 1 case in every 15,000-20,000 live births which translates to 9,000 newly diagnosed cases every year. In India, the incidence of RB is high with 1,400-1,800 new cases every year (3). RB occurs in both hereditary and non-hereditary forms. Most of the bilateral cases are heritable (the first RB1 mutation is constitutional and the second RB1 mutation occurs somatically in one or more retinal cells) while some (15%) unilateral cases can also be heritable (4). The non-heritable form of the RB tumors leads to development of unilateral tumors where in both RB1 mutational events occur in a single somatic retinal cell. The mean age of presentation of unilateral and bilateral RB is 27 months and 15 months respectively (4).
While the initiating hit in RB is known for several decades, it is generally accepted that other hits beside mutations in the RB1 gene are required for RB tumors to develop (5). Loss-of-function mutation of RB1 gene gives rise to benign tumors called as retinoma (non-proliferative precursor of RB tumor) which is often associated with lowlevel chromosomal instability that can lead to additional genetic hits and subsequent progression to RB. The genomic instability caused due to RB1 loss-of-function mutation further leads to additional genomic aberrations. Additional genomic alterations in the putative oncogenes (MDM4, KIF14, MYCN, DEK, and E2F3) and potential tumor suppressor gene (CDH11 and NGFR) serve as essential drivers for RB oncogenesis ( Table 1 enlists the genes that are drivers in RB progression) (6). There are numerous studies which have investigated the secondary hits in RB tumors by analyzing gain-of-function, loss-of-function, and copy number variations (CNVs) in putative oncogenes and tumor suppressor genes in the genome of RB patients (6)(7)(8).
Previous studies have shown that RB tumors harbor frequent gains at 1q, 2p and 6p, and loss at 16q chromosomal loci. Furthermore, these investigations also revealed differences in chromosomal instability associated with clinical characteristics such as age at diagnosis, laterality of the tumor, heredity of the disease, and tumor differentiation (9). Next generation sequencing (NGS) is an efficient and time saving approach for molecular genetic testing of complex disorders such as cancer because of its high resolution (100X resolution) as compared to conventional sequencing techniques such as Sanger (2X resolution) (10). In the past decade, NGS has helped to better understand the genomic landscape of RB tumors with much better resolution (11,12). A higher resolution view using the NGS techniques have also unraveled the complex genomic aberrations leading to RB tumors that are required to be carefully validated (8). Furthermore, it helps to understand the complex etiology and molecular complexity of these eye tumors even though most of the RB tumors are initiated by the loss-offunction mutations in the RB1 tumor suppressor gene. Keeping under consideration the important role of NGS technology in better understanding of the genomic landscape of RB will further help to devise better management strategies for early detection, clinical management and prognosis of RB patients. In the present study, we have used targeted exome sequencing approach with an inhouse bioinformatics pipeline for the mutation screening to find out novel changes/variants (SNVs, InDels) associated with RB pathophysiology in Indian patients.

Clinical examination and selection of cases
After receiving ethical approval from the Institute's Ethics Committee (Ref. No. IESC/T-319/23.06.2015), 75 unilateral nonfamilial RB patients presenting at the Dr. R. P. Centre for Ophthalmic Sciences (AIIMS, New Delhi, India) were enrolled in this study. All cases were of Indian origin. The enrolled RB patients had no history of any prior treatment. These patients had no other ocular or systemic abnormalities. All cases with positive family history of RB (maternal/paternal) were also excluded. A total of 75 ethnically and age-matched healthy children without any history of ocular or systemic illness or cancer were enrolled as controls. They had no metabolic, genetic, or ocular disorder and an extensive history was taken regarding family, occupation of parents, any medical problem, and drug intake. Informed consent in accordance with the Declaration of Helsinki was obtained from the parents of all participants and controls.

DNA extraction
Peripheral blood samples (2-3 ml for RB patients and 4-5 ml for parents) were drawn by venipuncture and collected in EDTA vacutainers (Greiner Bio-one, Frickenhausen, Germany) and stored at −80°C until further used. Tissue sample was collected (in PBS) and stored at −80°C until further used. Genomic DNA extraction from peripheral blood samples was carried out using the Phenol Chloroform Isoamyl method (13). The tissue samples were processed for DNA extraction using the QIAamp® Fast DNA Tissue Kit as per the manufacturer's protocol.

PCR and DNA sequencing
All the exons, intron/exon boundaries, 3ʹ UTR and 5ʹ UTR regions of all the 27 exons of RB1 gene were amplified by PCR in peripheral blood DNA samples using oligonucleotide primers directed towards these regions (Primer sequences for the quantification of RB1 gene were given in Table S1). The exonintron regions of all the genes with novel mutations (RB1, ACAD11, GPR151, KCNA1, OTOR, SOX30, ARL11, and MYCT1) were also amplified (The details of the gene primers were described in Table S2). PCR amplifications were performed in a 20 μl volume containing 1.0 μl of 20 mM stock solution for each primer (Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, Iowa 52241, USA), 100ng of genomic DNA, 1 unit of Taq polymerase (Bangalore Genei, Bengaluru, Karnataka, India), 0.1 mM of each deoxynucleotide triphosphate (dNTP), and 2 μl of 10X PCR buffer (with 25 mM MgCl2). Amplified PCR products were purified using a gel/ PCR DNA fragments extraction kit (Geneaid Biotech, Sijhih City, Taiwan). Purified PCR products were sent for sequencing to Molecular Cloning Laboratories (South San Francisco, CA). All sequence variants were compared to the Human Genome Reference Sequence (NC_000002.11 and NC_000001.10) provided by the National Center for Biotechnology Information (NCBI) using ClustalW2 (multiple sequence alignment program for DNA; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK). All PCR products were sequenced in both forward and reverse directions for confirmation of any nucleotide variation.

Library preparation and targeted exome sequencing
Targeted exome sequencing was carried out in tumor DNA in 50 out of 75 RB patients in which no variants were found in the RB1 gene (in the analyzed blood DNA samples). All the libraries were prepared using Agilent HaloPlex Exome target enrichment system by following manufacturer's instruction. Briefly, 250ng of each DNA sample was used for fragmentation using restriction enzyme-based digestion. Digested DNA was subjected to probe hybridization in overnight reaction. Haloplex probe bind and capture exonic region of DNA and was captured based on streptavidin biotin chemistry. Captured exonic region was ligated with illumina specific adapters and amplified as final clinical exome library. All prepared libraries were checked on Agilent High Sensitivity (HS) chip on Bioanalyzer 2100 and quantified on fluorometer by Qubit dsDNA HS Assay kit (Life Technologies). Average size and concentration of each library were calculated from HS chip and Qubit respectively. The detail of the RB patients in which targeted exome sequencing was carried out is outlined in Table S3.

Protein modeling
The three-dimensional crystallographic structure of the novel variants (proteins) is not available in the Protein Data Bank (PDB). Schrödinger Release 2017-1: BioLuminate-2.6, Schrödinger, LLC, New York, NY, 2017 program (14) and Iterative Threading ASSEmbly Refinement (I-TASSER) (15) online server was used for building the homology models for the proteins. The analysis of Phi-Psi stereo-chemical parameters of the homology model were validated by the Ramachandran plot using the PROCHECK online server (http://services.mbi. ucla.edu/PROCHECK/). All the three-dimensional modeled structures were generated by using the PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.

Identification of pathogenic variants
We have used three pathogenicity prediction tools (MutationTaster, Sorting Intolerant From Tolerant (SIFT), and PolyPhen-2) to determine the pathogenicity of nonsynonymous variations. MutationTaster (http://www.muta tiontaster.org/) is a free, web-based application for rapid evaluation of the disease-causing potential of DNA sequence alterations (16). Mutation Taster integrates information from different biomedical databases and uses established analysis tools. SIFT analysis tool (available at http:/sift.icvi.org/) was used to predict the functional impact of missense changes identified in this study (17). SIFT is a multi-step algorithm that uses a sequence homology-based approach to classify amino acid substitutions. PolyPhen-2 is a free web-based application which structurally analyses an amino acid polymorphism and predicts whether that amino acid change is likely to be deleterious to protein function (18). PolyPhen-2 score of >2.0 indicate probably damaging to protein function. Scores of 1.5-2.0 are possibly damaging, and scores of <1.5 are likely benign. The variations were considered pathogenic only when the outcome of two out of three applications (MutationTaster, SIFT, and PolyPhen-2) suggested the variations were disease causing.

Analysis of targeted exome sequencing results
The obtained results for targeted exome sequencing were analyzed as per the analysis pipeline described in Figure 1. FastQ files generated after NGS run were considered for downstream analysis. Quality check step was done for each Fastq file using FastQC and after that adapter sequences were removed using Trimmomatic. The processed FastQ files were mapped over the reference using Burrow wheeler aligner. Genome analysis toolkit 4.0 pipeline for variant call was used. For annotation and functional effect prediction of the identified variants, we used snpEff tool. After that, we found a huge number of variants that require further comprehensive analysis. Stringent cut-off values were used to filter out the nonsignificant variants, e.g., Minor allele frequency ≤0.05, Quality score ≥100, non-synonymous variants, deleteriouseffect, etc. Stringent Along with SnpSift tool unix-based shell script was also used in variants filtering. ANNOVAR software was used for some additional information related to annotation of the probable potential variants. Finally, after all the above analysis using different cut-off criteria, we predicted eight potential variants related to RB that was further subject to PCR-Sanger sequencing for validation. Targeted exome sequencing was carried out in RB patient's enucleated tumor DNA, and later confirmed by PCR-Sanger sequencing in patient's enucleated tumor DNA and peripheral blood DNA. PCR-Sanger sequencing was also carried out in the peripheral blood DNA of RB patient's parents and unaffected siblings to identify the presence of the variants detected by NGS.

Confirmation of the variants obtained on targeted exome sequencing
The variants obtained via the targeted exome sequencing were further confirmed by conventional PCR followed by Sanger's sequencing. The details of the gene primers for validation of the targeted exome sequencing results were described in Table S2.

Participant characteristics
The present study is a case-control study comparing the mutation status of RB1 gene and other putative oncogenes and tumor suppressor genes in unilateral RB patients (cases) (N = 75) and healthy children (controls) (N = 75). All enrolled cases were from North Indian region including states of Delhi, Haryana, Uttar Pradesh, Punjab, Uttarakhand and Bihar. All cases were of Indo-Aryan ethnicity. All cases were unilateral non-familial and were enrolled consecutively as they presented to Dr. R.P. Centre for Ophthalmic Sciences. None of the cases were product of consanguineous marriage and all cases had unilateral RB. The mean age of onset of RB was 2.9 ± 1.8 years.
The cases were categorized into three groups on the basis of their age i.e., Group A (unilateral RB patients <1 year); Group B (unilateral RB patients between 1-3 years); Group C (unilateral RB patients between 3-5 years). 46 cases were males and 29 cases were females. Clinical grouping of the tumors as per the International Classification showed that 62 patients had Group E RB tumors and 13 patients had Group D RB tumors. The details regarding clinicopathologic parameters of RB patients are outlined in Table 2.

RB1
RB1 gene comprises 27 exons dispersed over about 200 kb of genomic DNA and encodes for RB protein (pRb) (a nuclear phosphoprotein having 928 amino acids and size 110 kDa) which plays a negative role in cell cycle regulation and tumor progression (19). RB1 gene harbors a variety of mutations, most common among them are point mutations, large deletions, InDels, duplications, structural variations, mutations in the regions that regulate transcription, inactivation by hypermethylation or even chromothripsis (massive genomic rearrangement) at the RB1 locus may cause gene inactivation (20). Although mutation hot-spots have been identified in the RB1 gene but they account for only 40% of the cases (21). Due to high genetic heterogeneity among RB patients, the molecular genetic testing to detect spectrum of mutations is quite difficult and time-consuming (in RB1 gene).
In the present study, direct sequencing of the coding regions and of the flanking intronic sequences of RB1 gene revealed only two nucleotide variations. The first one was a frameshift mutation (g.69673_69674insT) and the second one was a variant (g.48373314A>C). Out of the two nucleotide variations observed in this study, one (g.69673_69674insT) was found in the coding region (exon 12) and the other (g.48373314A>C) was found in the intronic region (Intron 11-12). Both the nucleotide variations were novel. The mutation (g.69673_69674insT) was present in 1.33% (1/75) patients (RB533/15) resulting in an amino acid change of p.Q383S (Figure 2a). The patient's parents and the unaffected sibling were found to be negative for this mutation. None of the controls harbored this mutation (g.69673_69674insT). The amino acid Q is conserved through various orthologs at position 383 ( Figure 2b). This insertion resulted in a frameshift mutation and subsequent loss of domain A (373-579) in pRb which constitutes one of the essential domains required for binding of E2F transcription factor and thus, regulates G1 to S phase transition during cell cycle progression ( Figure 2c). Therefore, this mutation was found to be pathogenic as it directly affects the pRb function. and imaging studies in the patient revealed unilateral left eye Group E intraocular RB. The patient's eye was enucleated at the age of 2.8 years. Histopathology findings reported poorly differentiated RB where choroid, sclera, iris, ciliary body, and optic nerve including its resected margin were found to be tumor free. The polymorphism, g.48373314A>C was present in a homozygous state in 32% (24/75) patients and has led to a change in poly(A) signal ( Figure 3). In silico analysis of the polymorphism predicted it to be disease causing {Mutation tester: Prediction disease causing; Polyphen-2: N/A; SIFT: N/ A}. 24 RB patients who had this polymorphism had poorly differentiated RB with optic nerve including its resected margin free of tumor. None of the controls had this polymorphism (g.48373314A>C). Pre-mRNA maturation involves the 3′-end cleavage and polyadenylation, which is considered critical for the nucleo-cytoplasmic translocation of the transcript, mRNA transcript stability and translation (22). The variable lengths of the 3′ UTR created by change in the poly(A) signal is a recognizable target for differential regulation and directly affect the fate of the transcript, and thus, ultimately modulates the gene expression (23). Change in poly(A) signal which is an abnormality in the 3′-end processing mechanisms is implicated in many disorders including cancers (24). Change in poly(A) signal of RB1 mRNA which is important for the mRNA stability leads to its degradation and hence, inactivation of pRb (25,26). Since, 50/75 (66.67%) patients did not harbor mutation in the RB1 gene, so we further proceeded with targeted exome sequencing in the tumor DNA in these patients.

Identification of germline SNVs and InDels in RB patients by targeted exome sequencing
Tumor DNA samples of 50 unilateral non-familial RB patients were analyzed to detect SNVs and InDels. Eight pathogenic variants were identified in 26% (13/50) patients of which five are nonsense mutation, one is frameshift mutation, one is non-synonymous mutation and one is single base change (Table 3). 10% (5/50) of the patients had mutations in more than one gene. All the 8 mutations which were obtained by targeted exome sequencing (in the patient's tumor DNA samples) were validated by PCR-Sanger sequencing. PCR-Sanger sequencing was also performed in patient's peripheral blood DNA, patient's parents and unaffected siblings (to determine if the patient is sporadic or heritable for the mutation/change). All the identified mutations were novel and were shown to be sporadic as only the patient had the mutation and not the family members (parents and siblings) and were absent in the blood of the patient making it evident these were not de-novo germ line mutations.

RB1: g.59097C>T (GenBank accession number NG_009009)
The mutation g.59097C>T was present in 10% (5/50) patients and resulted in an amino acid change of R251*, a nonsense mutation in RB1 gene. The mutation (g.59097C>T) was present in 6% (3) patients in a homozygous state (Figure 4a) and in 4% (2) patients in a heterozygous state (Figure 4b). In silico analysis showed this mutation to be pathogenic ( Table 3). The amino acid R is conserved through various orthologs at position 251 (Figure 4c). The protein change and cDNA change associated with the g.59097C>T mutation is p.Arg251X and c.751C>T respectively. Gene network analysis showed the interaction of RB1 with other genes ( Figure S1). Table S4 summarizes the important genes that interact with RB1 and may affect the pRb function.
For protein modeling, the crystal structure of pRb (PDB ID: 4ELJ) was selected as the template model for homology modeling. The sequence identities with the queried protein sequence which was 70% selected as the template for homology modeling (Figure S2.1). The Ramachandran map ( Figure  S2.2) statistics for modeled pRb provided further evidence of the modeled structure showed that 79.8% residues were in the most favored region, 16.7% in the additionally allowed regions, 2.0% in the generously allowed region and 1.4% residues in the disallowed regions (Figure S2.3, S2.4, S2.5).

ACAD11: g.42091G>T (GenBank accession number NG_042830)
The mutation g.42,091G>T was present in 4% (2/50) patients. The change was a homozygous single base change in 2 patients leading to a splice site change (splice donor variant with a consensus value of 82.96) (analyzed by Human Splicing Finder 3.1) ( Figure 5). In silico analysis showed this mutation to be pathogenic (Table 3). This is a novel mutation. Gene network analysis showed the interaction of ACAD11 with other genes ( Figure S3). Table S6 summarizes the essential genes that interact with ACAD11 and directly affects ACAD11 protein function.
The details of the RB patients showing g.42091G>t mutation are outlined in Table S7.
The protein modeling was not possible for g.42091G>T single base change as this change is present in the intronic region.

GPR151: g.374C>A (GenBank accession number NC_000005)
The mutation g.374C>A was present in 2% (1/50) patients and resulted in an amino acid change of Y99*, a nonsense mutation in GPR151 (G Protein-Coupled Receptor 151). The patient was found to be heterozygous for the change (Figure 6a). This is a novel mutation. In silico analysis showed this mutation to be pathogenic ( Table 3). The amino acid Y is conserved through various orthologs at position 99 (Figure 6b). Gene network analysis showed the interaction of GPR151 with other genes ( Figure S5). Table S8 summarizes the important genes that interact with GPR151 and may affect the GPR151 protein function.
The patient (RB41) with g.374C>A (heterozygous) mutation was a male patient diagnosed with RB at the age of 3 years and eye tumor was enucleated at the age of 4 years. The patient (RB41) harbored mutation in two other genes also: ACAD11 (g.42091G>T) and ARL11 (g.2595G>A). The parents or siblings did not harbor this mutation g.374C>A (p.Y99*).  For protein modeling, the amino acid sequence of GPR151 protein (Q8TDV0) was retrieved from UniProt. The GPR151 wild type protein has 419 amino acids. The crystal structure of GPR151 protein (PDB ID: 4ZWJ) was selected as the template model and sequence identities with the queried protein sequence which was 98% selected as the template for homology modeling (Figure S6.1). The Ramachandran map ( Figure  S6.2) shows the statistics for modeled GPR151 protein provided further evidence of the modeled structure showed that 73.2% residues are in the most favored region, 18.3% in the additionally allowed regions, 6.0% in the generously allowed region and 2.5% residues in the disallowed regions ( Figure  S6.3, S6.4, S6.5).

KCNA1: g.2882C>T(GenBank accession number NG_011815)
The mutation g.2882C>T was present in 2% (1/50) patients and resulted in an amino acid change of Q470*, a nonsense mutation in KCNA1 (potassium voltage-gated channel subfamily A member 1). This mutation was found only in one patient and resulted in premature truncation of the protein at amino acid 470. The patient was found to be heterozygous for the change (Figure 7a). In silico analysis showed this mutation to be pathogenic (Table 3). This is a novel mutation and has not been previously reported in RB patients. Multiple sequence alignment of KCNA1 protein with orthologues showed that it is highly conserved and hence, functionally important (Figure 7b). Gene network analysis showed the interaction of KCNA1 with other genes (Figure S7). Table S9 summarizes the important genes that interact with KCNA1 and directly affects the KCNA1 protein function.
The patient (RB26) with g.2882C>T(heterozygous) mutation was a male patient diagnosed with RB at the age of 2.9 years and was enucleated at the age of 4 years. The parents of the patients and unaffected sibling were also found to be negative for the mutation.
For protein modeling, the amino acid sequence of KCNA1 protein (Q09470) was retrieved from UniProt. The KCNA1 wild type protein has 495 amino acid residues in the sequence. The crystal structure of Voltage-gated potassium channel protein (PDB ID: 2R9 R) was selected as the template model and sequence identities with the queried protein sequence which was 87% selected as the template for homology modeling (Figure S8.1). The Ramachandran map (Figure S8.2) statistics for modeled KCNA1 protein provided further evidence of the model structure showed that 87.2% residues were in the most favored region, 9.6% in the additionally allowed regions, 1.8% in the generously allowed region (Figure S8.3,  S8.4, S8.5).

OTOR: g.46T>C (GenBank accession number NC_000020)
The mutation g.46T>C was present in 4% (2/50) patients. This resulted in a codon change from ATG to ACG resulted in an amino acid change from Methionine to Threonine; thus, an initiating Methionine codon is lost in OTOR (Otoraplin) gene. This resulted in start ATG (methionine) codon shifting and hence, delayed start site of translation. Wild type OTOR protein has 128 amino acids but due to this change, a truncated protein of 19 amino acids is formed. This is a non-synonymous mutation. In silico analysis showed this mutation to be pathogenic (Table 3). This change was found in two patients (RB11 and RB25) and both the patients were heterozygous for this change (Figure 8a). Multiple sequence alignment of OTOR protein with orthologues showed that it is highly conserved and hence, functionally important (Figure 8b). This change (rs17686437) is a novel mutation had not been previously reported in RB patients. Gene network analysis showed the interaction of OTOR with other genes ( Figure  S9). Table S10 summarizes the important genes that interact with OTOR and directly affects the OTOR protein function.
The parents or siblings did not harbor this variant. The protein modeling was not possible for g.46T>C mutation because wild type OTOR protein has 128 amino acids but due to this change, a truncated protein of 19 amino acids is formed. The number of amino acid residues formed after the change were very less, hence modeling the mutant protein was not possible in this case (minimum 25 amino acid residues are required for modelling a protein structure).

SOX30: g.20297C>T (GenBank accession number NC_000005)
The mutation g.20297C>T was present in 2% (1/50) patients and resulted in an amino acid change of R413*, a nonsense mutation in SOX30 (SRY-box transcription factor 30). This is a novel mutation and had not been previously reported in RB patients. The only patient (RB26) showing this mutation (g.20297C>T) was found to be heterozygous for the change (Figure 9a). In silico analysis showed this mutation to be pathogenic (Table 3). The amino acid R is conserved through various orthologs at position 413 (Figure 9b). Gene network analysis showed the interaction of SOX30 with other genes ( Figure S10). Table S12 summarizes the important genes that interact with SOX30 and may affect the SOX30 protein function. The patient (RB26) with g.20297C>T(heterozygous) mutation also harbors mutation in KCNA1 (g.2882C>T). As already discussed for the patient (RB26) in context of g.2882C>T mutation, the histopathologic risk features were found to be severe in the patient as compared to the patients that didn't have this mutation (g.20297C>T). Both these mutations g.20297C>T and g.2882C>Tare might be associated with a high RB risk leading to poor histopathological risk factors. The parents of the patients and unaffected sibling were also found to be negative for the mutation.
For protein modeling, the amino acid sequence of SOX30 protein(O94993) was retrieved from UniProt. The SOX30 wild type protein has 753 amino acids. The crystal structure of coatomer subunit gamma-1 protein (PDB ID: 5A1 U) was selected as the template model and the sequence identities with the queried protein sequence which was 96% selected as the template for homology modeling (Figure S11.1). The Ramachandran map (Figure S11.2) statistics for modeled SOX30 protein provided further evidence of the modeled structure showed that 48.8% residues were in the most favored region, 43.0% in the additionally allowed regions, and 4.6% in the generously allowed region and 3.6% residues in the disallowed region (Figure S11.3, S11.4, S11.5).

ARL11: g.2595G>A (GenBank accession number NG_021342)
The mutation g.2595G>A was present in 8% (4/50) patients and resulted in an amino acid change of W149*, a nonsense mutation in ARL11 (ADP Ribosylation Factor Like GTPase 11) protein. This is a novel mutation (rs34301344) had not been previously reported in RB patients. In silico analysis showed this mutation to be pathogenic ( Table 3). The amino acid W is conserved through various orthologs at position 149 ( Figure 10). None of the controls had this mutation (g.2595 G>A). The mutation was not confirmed by PCR-Sanger sequencing. Gene network analysis showed the interaction of ARL11with other genes ( Figure S12). Table S13 summarizes the important genes that interact with ARL11and may affect the ARL11 protein function. The details of the RB patients showing g.2595 G>A mutation is outlined in Table S14. The g.2595G>A mutation was present in 4 patients. No significant genotype-phenotype correlation of the mutation g.2595 G>A was observed in the present study but one of the patients (RB41) with g. 42091G>T mutation also have mutation in two other genes viz. ACAD11 (g.42091G>T) and GPR151 (g.374C>A) and the other patient (RB26) harbors mutation in two genes viz. SOX30 (g.24694C>T) and KCNA1 (g.2882C>T). The parents or siblings did not harbor this variant.
For protein modeling, the amino acid sequence of ARL11 protein (Q969Q4) was retrieved from UniProt. The ARL11 wild type protein has 196 amino acids. The crystal structure of GTP binding protein (PDB ID: 6A8D) was selected as the template model and sequence identities with the queried protein sequence which was 90% selected as the template for homology modelling (Figure S13.1). The Ramachandran map ( Figure S13.2) statistics for modelled ARL11 protein provided further evidence of the model structure showed that 86.3% residues were in the most favored region, 11.3% in the additionally allowed regions, 1.8% in the generously allowed region and 0.6% residues in the disallowed regions ( Figure  S13.3, S13.4, S13.5).

MYCT1: g.71_74delAGAT (GenBank accession number NC_000006)
The mutation g.71_74delAGAT was present in 8% patients (4/ 50) patients. Deletion of four nucleotides (AGAT) was observed at genomic position g.71_74, coding nucleotide number c.63_66 in four patients (RB22, RB33, RB38, RB42). This caused a frameshift after 22 nd codon and introduced a stop codon (TAG) at 55 th position and D22Efs *34. Wild type MYCT1 (MYC target 1) protein has 235 amino acids but this frameshift mutation has created a truncated MYCT1 protein of 53 amino acids. In silico analysis showed this mutation to be pathogenic (Table 3). This mutation (rs3841162) had not been previously reported in RB patients. The amino acid D is conserved through various orthologs (Figure 11). Gene network analysis showed the interaction of MYCT1 with other genes ( Figure S14). Table S15 summarizes the important genes that interact with MYCT1 and directly affects the MYCT1 protein function.
The details of the RB patients showing g.71_74delAGAT mutation is outlined in Table S16. The g.71_74delAGAT mutation was present in 4 patients, and all of them were males. All these patients had PDRB, poor histopathological risk factors (such as calcification and necrosis) and hence, very poor prognosis. All of these patients showed early age of RB onset as compared to the patients who didn't have this mutation. 1 (RB33) out of 4 patients who had g.71_74delAGAT mutation also harbors mutations in RB1 gene (g.59097C>T). We can conclude that g.71_74delAGAT mutation is associated with an early onset of RB and leads to poor histopathological risk factors. The parents and unaffected siblings did not harbor this change in MYCT1 gene.
For protein modeling, the amino acid sequence of MYCT1 (Q8N699) was retrieved from UniProt. The MYCT1 wild type protein has 235 amino acid residues. The crystal structure of human mARC1 protein (PDB ID: 6JW2) was selected as the template model and sequence identities with the queried protein sequence which was 48% selected as the template for homology modeling (Figure S15.1). The Ramachandran map ( Figure S15.2) statistics for modeled MYCT1 protein provided further evidence of the model structure showed that 54.8% residues were in the most favored region, 39.4% in the additionally allowed regions, and 2.9% in the generously allowed region and 2.9% residues in the disallowed regions ( Figure  S15.3, S15.4, S15.5).

Discussion
RB is the most common pediatric ocular malignancy that initiates from the developing retina during early childhood. Although much progress has been made in the last decade regarding RB diagnosis and management, still there is an urgent need to further improve diagnostics and patient care. The worldwide incidence of RB (1 case in every 15,000-20,000 live births) largely impacts the health care system of a country (4). The heritable nature of RB tumors and high risk of susceptibility to secondary malignancies is associated with the need for lifelong follow up which makes genetic testing and family counselling an important aspect of RB clinical management practices. Early diagnosis of the tumor (which includes genetic testing to determine the nature of the tumor: heritable/non-heritable) followed by treatment can reduce morbidity whereas late diagnosis and delay in treatment can lead to tumor spread from the eye and reduce the chances of survival (27). Knudson's two hit hypothesis was the fundamental idea describing the initiation of RB tumors and has  described the RB1 as the prototype of the first identified tumor suppressor gene and the key player in RB initiation (2,28). Mutation in the RB1 gene was considered as the main cause for initiation of all the RB tumors until 2013 when Gallie and colleagues first reported a previously unrecognized form of unilateral RB tumors (RB1 -/-/MYCN A ) with high level MYCN oncogene amplification (28-121 copies) in the tumor tissue, fully functional Rb protein (pRb), fewer overall genomic copy-number changes in genes characteristic of RB than did RB1 −/− tumors, distinct aggressive histological features, and a very early median age of RB diagnosis (4.5 months) (29). This study challenged the dogma proposed by Knudson that this tumor is always initiated by loss-of-function mutation in RB1 gene and thus, postulated that RB tumors can also be initiated by amplification of the MYCN oncogene in the presence of non-mutated RB1 gene (29). RB represents heterogenous group of tumors at cellular, molecular and genetic levels (30). RB1 gene harbors a variety of mutations and most common among them are point mutations, large deletions, small InDels, duplications, structural variations, mutations in the regions that regulate transcription, inactivation by hypermethylation or even chromothripsis (massive genomic rearrangement) at the RB1 locus that may cause gene inactivation (19). Although mutation hot-spots have been identified in the RB1 gene but they account for only 40% of the cases (21). RB tumors show genetic heterogeneity and hence, the molecular genetic testing to detect spectrum of mutations is quite difficult and time-consuming.
While several candidate oncogenes and tumor suppressor genes have been identified to have a role in RB oncogenesis, no targeted or clinical therapeutic interventions have yet been implicated in the treatment of these tumors. This stresses the need for further investigation of the genetic abnormalities in these tumors in addition to making genetic testing essential while treating the patients with RB tumors in order to determine the nature of the tumors: heritable/non-heritable (sporadic). India has one of the largest numbers of RB cases which makes genetic testing at low cost and rapid mutation screening an essential requirement (31,32).
Devarajan et al., have reported for the first time the efficiency of NGS technology in the detection of pathogenic variations in RB1 gene with high accuracy and in a time efficient way (33). The authors have utilized an in-house developed pipeline for the detection of heterogeneous spectrum of variants in RB1 gene including SNVs, InDels and CNVs in an Indian subset of RB patients. Therefore, in the light of the above-mentioned findings, molecular genetic testing has become very much essential as a part of routine clinical management of RB so that these molecular/genetic markers could be established as signatures for RB diagnosis and may further help in the management.
In RB1 gene, nonsense mutations constitute majority of the (50%) reported point mutations in RB patients till date (34). In the present study, we found nonsense mutation in RB1 gene in 5 RB patients (10%) by targeted exome sequencing. These mutations are severe and there is a loss-of-function of pRb due to loss of critical domain and thus, impaired regulation of cell cycle progression which may leads to RB and predisposes to other secondary malignancies later in life. We have found two patients heterozygous for the g.59097C>T mutation which indicates that one RB1 allele has got mutated (inactivated) and as RB is now considered as an epigenetic disorder rather than genetic (35) therefore, there is a possibility that other RB1 allele (wild type) may be hypermethylated thus, leading to its reduced expression and thus, may initiate RB or it may be possible that the wild type RB1 allele is not enough to perform the normal function thus, leading to RB (36).
ACAD11 gene encodes for acyl-CoA dehydrogenase. Combined overexpression of ACAD11 and other p53 target genes involved in fatty acid metabolism did not increase survival upon glucose starvation, suggesting that these specific genes may function in a redundant fashion and that genes in other metabolic pathways may also be important for mediating the full pro-survival effects of p53 (37). g.42091G>T mutation (rs41272317) was reported by Yamada et al. in chronic kidney disease or hyperuricemia in Japanese population (38) but has not been previously reported in RB patients. The single base change g.42091G>T which we found in our study at 11-12 intron of ACAD11 gene directly affects the process of splicing. The splicing is itself controlled by the given sequences known as splice-donor and splice-acceptor sequences. Variations in these sequences may lead to retention of large segments of intronic DNA by the mRNA, or to entire exons being spliced out of the mRNA. These changes could result in production of a non-functional protein. The essential domain of ACAD11 protein which might get affected due to single base change (g.42091G>T) have been shown in Figure S4. ACAD11 protein mediates the pro-survival function of p53 gene which has an important role as a tumor suppressor. p53 gets activated in response to cellular stress signals (DNA damage, hypoxia, oxidative stress) and mediates diverse functions in the cell such as apoptosis, senescence, and cell cycle arrest (39). In response to inactive ACAD11 protein, the activation of p53 might get affected leading to inactivation of the p53/MDM2/ p14ARF pathway which might lead to progression of RB tumors (40). Thus, we can explain an essential but indirect role of ACAD11 protein in RB tumor progression which is mediated through the action of p53 inactivation but further studies are required to determine the exact role of ACAD11 in RB progression.
GPR151 encodes an orphan member of the class A rhodopsin-like family of G-protein-coupled receptors (GPCRs). A general feature of GPCR signaling is the agonistinduced conformational change in the receptor, leading to activation of the heterotrimeric G protein (41). The activated G protein then binds to and activates numerous downstream effector proteins, which generate secondary messengers that regulate a broad range of cellular and physiological processes (42). Role of GPR151 is not elucidated in RB tumors yet but implicated in breast cancer (mediated by RPS7 gene) (43) and prostate cancer (mediated by ENO2 gene) (44). GPCRs have recently emerged as key molecules playing essential role during the processes of angiogenesis, tumorigenesis, and metastasis and have important implication in breast cancer progression and thus, poor prognosis (45). Therefore, we predicted that g.374C>A nonsense mutation in the GPCR family member (GPR151) generates inactivated GPR151 protein which may affect the essential cellular processes leading to progression and metastasis of RB tumors. Further studies are required in this direction to decipher the direct role of GPR151 in RB pathogenesis.
KCNA1 gene encodes for KCNA1 protein which is a voltage-gated potassium channel that is phylogenetically related to the Drosophila Shaker channel. In mammals, KCNA1 contributes to the regulation of the membrane potential and nerve signaling, and prevents neuronal hyperexcitability (46). KCNA1 expression is required for normal postnatal brain development and normal proliferation of neuronal precursor cells in the brain (47). KCNA1 expression is found to be reduced in human cancers which is directly correlated with an increase in breast cancer aggressiveness. In our study, KCNA1 loss-of-function (by nonsense mutation) may lead to escape of the mutant cells (harboring RB1 mutation) from oncogene induced senescence thus, leading to unlimited proliferation of the mutant cells. pRb mediates its action through limiting the ribosomal read-through during the process of oncogene induced senescence (48). However, further exploratory studies are required to be carried out to determine the exact role of KCNA1 in RB pathogenesis. The previously reported variant in the same position in KCNA1 gene was a missense variant (rs867107765) where a single nucleotide (C) was replaced by (A) which resulted in a codon change from CAG to AAG resulted in replacing amino acid glutamine with lysine. The clinical significance of this missense variant has not been reported in the literature. The role of KCNA1 in RB pathogenesis has not been reported so far.
OTOR gene comprises 4 exons. A frequent polymorphism in the translation start codon of this gene can abolish translation and may be associated with deafness (49). The direct role of OTOR gene has not been described in any cancer but indirectly it affects the functions of other genes that have important implication in other cancers. For example, OTOR regulates the function of ATP5J whose over-expression is associated with colorectal cancers (50); OTOR mediates the function of ENPP4 that has important role in regulating cell motility, angiogenesis, and tumor cell invasion (51); and OTOR also regulates the expression of GPR112 which is a novel therapeutic biomarker for gastrointestinal neuroendocrine carcinomas. How OTOR gene contributes to RB pathogenesis is still not known. Therefore, functional studies are required to determine the role of OTOR gene in RB development.
SOX30 belongs to the family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein acts as a transcriptional regulator when present in a complex with other proteins. It can activate p53 transcription to promote tumor cell apoptosis in lung cancer (52). SOX30 may be involved in the differentiation of developing male germ cells (53). Alternative splicing of this gene results in multiple transcript variants. The direct role of SOX30 gene has not been implicated in RB tumors. SOX30 plays an important role in spermatogenesis and regulate tumor angiogenesis and metastasis (mediated through TSGA10) (54); SOX30 regulates zinc finger protein (ZNF274) that has a role in cancer progression (55); SOX30 mediates the function of NUDT3 that has an important role against elevated levels of 8-oxo-dGTP that leads to AT→CG transversions (56). Although direct role of SOX30 has not been elucidated in RB tumors, but we may predict that SOX30 mediates it functions through other downstream genes (TSGA10, ZNF274, NUDT3, and POU5F) which regulates tumor angiogenesis, progression and oxidative DNA damage. Inactivation of SOX30 (by nonsense mutation) directly affects the important cancer regulatory pathways thus, may contribute to RB. However, functional studies are warranted to elucidate the direct role of SOX30 in RB pathogenesis.
ARL11 encodes for a tumor suppressor gene related to the ADP-ribosylation factor (ARF) family of proteins (57). ARL11 protein plays an essential role in apoptosis in a caspasedependent manner. Polymorphisms in ARL11 are found to be associated with some forms of familial cancers (58). Downregulation of ARL11 expression in several sporadic lung cancer and ovarian tumors attributed to promoter methylation and loss of heterozygosity (LOH) at the ARL11 locus (57). Although the direct role of ARL11 mutation has not been described in RB pathogenesis, but in our study, we have found ARL11: g.2595G>A (nonsense mutation) in 4 RB patients which showed a significant contribution of the lossof-function of tumor suppressor genes present in the 13q14.2 (location of RB1 tumor suppressor gene also) region. The actual mechanism of how ARL11 loss-of-function might contribute to RB progression is not known. But we might predict that ARL11 mediates its function through regulation of its downstream target genes (ADAM17, HIF1AN, IL17RD, and ALDH3A2) that have important role on tumorigenesis regulation. Hence, further extensive studies are required to determine the role of other genes present in the 13q14.2 locus so as to determine their essential functions in RB pathogenesis.
MYCT1 encodes for MYCT1 protein which is a direct target gene of c-MYC which is a novel candidate tumor suppressor gene first cloned from laryngeal squamous cell carcinoma (59). The down regulation or loss-of-function of MYCT1 is found to be associated with certain cancers whereas its overexpression seems to mediate many of the known phenotypic features associated with c-MYC such as initiation of apoptotic process, alteration of morphology, enhancement of anchorageindependent growth, tumorigenic conversion, promotion of genomic instability, and inhibition of hematopoietic differentiation (60). In the present study, we didn't find any mutation in MYCN oncogene that acts as a transcription factor and regulates the expression of cell cycle genes that regulates proliferation (61). MYCN gene is found to be frequently amplified in tumors of neuroectodermal origin such as neuroblastoma, RB, glioblastoma, medulloblastoma, rhabdomyosarcoma and small cell lung carcinoma, and is associated with poor prognosis and poor response to administered therapy in neuroblastoma (61-63). Rushlow et al. described that 1.5% of unilateral RB tumors occur due to somatic MYCN oncogene in the presence of non-mutated RB1 gene (29). These tumors showed aberrant expression of many genes involved in photoreceptor differentiation, mitosis and RNA biogenesis. When compared to RB1 -/tumors, RB1 +/+ /MYCN A tumors are distinct outliers in levels of expression of many other genes consistent with the idea that they originate from an earlier retinal precursor than RB1 -/tumor does (29,64). Therefore, we may conclude that this new subset of RB tumors challenges the central dogma proposed by Alfred G. Knudson that this tumor is always caused by loss-of-function mutation in both the alleles of RB1 tumor suppressor gene. In the present study, we did not find any mutation in MYCN oncogene; this is may be due to small sample size and needs to be confirmed in population of different ethnicities and in larger populations. But rather we found a frameshift mutation in MYCT1 (a direct target gene of c-Myc), a novel tumor suppressor gene whose downregulation is associated with carcinogenesis and over expression leads to inhibition of proliferation and induces apoptosis in human acute myeloid leukemia (60). In our study, we found MYCT1: g.71_74delAGAT frameshift mutation leading to a truncated MYCT1 protein of 53 amino acids (found in 4 RB patients) and thus, behaves as a loss-of-function mutation. From this, we may conclude c-Myc mediates its action through down-regulation of its downstream target MYCT1 leading to its inactivation which leads to proliferation of RB tumors and may leads to metastasis and poor prognosis. Further studies are required to validate the exact role of MYCT1 gene in RB.
In a previous study published from our laboratory, we documented that sporadic heritable RB may arise due to sperm oxidative DNA damage (65). Oxidative DNA adducts like 8-hydroxy-2'-deoxyguanosine are highly promutagenic and predispose to mutation and epimutations. It leads to de novo germ line mutations and persistence of these mutations postfertilization leads to higher genetic and epigenetic disease load in the offspring (65,66). In many cases, we were unable to detect any pathogenic variant and these cases may have hypermethylation of the RB1 gene promoter region as it has been documented in previous studies that RB is an epigenetic disease (67). Our previous study on the same subjects documented that father of these cases had high levels of oxidative DNA damage and sperm DNA fragmentation (68). This is known to cause genome wide hypomethylation and loci-specific hypermethylation (69). Previous studies have documented that tumor suppressor genes may be hypermethylated and thus, cases where no pathogenic variants were found need to be analyzed for aberrant DNA methylation marks (70). In the same cohort of patients, we found high levels of seminal oxidative stress which may lead to non-familial sporadic heritable RB (66). On targeted exome sequencing in tumor tissue, we found 8 novel variants in 13 out of 50 (26%) RB patients. The present study describes that targeted exome analysis is an important genetic testing to determine novel variants and may help to determine the therapy to be administered to RB patients. Given that RB is the most common eye malignancy in children below five years of age and represents 2.5-4% of all pediatric malignancies, future directions for RB management points towards the use of genetic testing systems in particularly NGS technologies to detect novel variants and their role in the RB initiation, progression, metastasis and poor prognosis in response to administered therapies. The main barriers in achieving efficient detection of RB1 mutations by PCR-Sanger sequencing are many, few of which include: large size of the gene, presence of mosaicism, and possibility of mutations within non-coding regions that are not routinely screened. There is much difference in RB1 gene mutation detection rate by routine PCR-Sanger sequencing and NGS techniques as reported by various studies. Molecular genetic testing will help to distinguish between heritable and non-heritable RB tumors. This knowledge may further make surveillance, management and treatment of the affected patient and related families possible, and also eliminate the risk of unnecessary clinical surveillance in cases with non-heritable RB. This may further help to lower down the burden on the healthcare system. For this, two tools are of utmost importance: Exhaustive molecular genetic testing and prompt and accurate genetic counselling to deliver the exact information to the caregivers of the affected patient. Furthermore, extensive in vivo studies are necessary to determine the role of pathogenic variants so as to determine exact therapy to the affected patients and also to translate the findings of molecular genetic testing from Bench-to-Bedside.
The first report of utilizing exome sequencing in RB patients was published by Kooi et al. In this study, the authors performed exome sequencing in 71 RB and matched blood DNA samples and found that aside from RB1 gene mutation, the other recurrent gene mutation is a rare event. In the RB samples that the authors analyzed in their study, they found alterations in two genes: BCOR in 10% cases and CREBBP in 4% cases (71). In the present study, we utilized targeted exome sequencing approach and identified novel genes (ACAD11, GPR151, KCNA1, OTOR, SOX30, ARL11, and MYCT1) associated with the pathogenesis of unilateral non-familial RB tumors. This study demonstrated the role of genes (ACAD11, GPR151, KCNA1, OTOR, SOX30, ARL11, and MYCT1) other than RB1 in RB pathogenesis. Functional validation is required to confirm their pathogenicity. None of the mutations in these genes have been reported earlier in RB patients, which highlighted that we need to look beyond the RB1 gene to investigate the pathogenesis of this ocular malignancy. Our study and study by Kooi et al. have pointed towards the importance of exome sequencing techniques over whole genome sequencing in terms of cost, time and space (71). Exome sequencing targets only a subset (protein-coding region) of the genome and thus, prove to be a valuable tool to find variants that could explain disease etiology and biology of the disease. In case of RB, the clinical screening of potential genetic causes can easily be done using exome sequencing approach and thus, exome sequencing holds an important future for diagnostics, prognostics and clinical management of RB cases as it could help to identify novel variants in RB1 and also novel variants beyond RB1.
The most common changes in RB1 gene were the nonsense mutations as documented by previous studies (Nonsense mutations contribute about 50% of pathogenic variants in RB1). 50.67% cases harbored no mutations. High levels of oxidative damage in sperm may predispose to accumulation of oxidative DNA adducts, which not only predisposes to de novo germ line mutations but also to post-zygotic mutations and epimutations. These cases may have hypermethylated promoter of RB1 gene, thus, there is a need to do epigenetic studies as well.
In our study, we have found pathogenic RB1 gene mutation (by PCR-Sanger sequencing and targeted exome sequencing) in 40% (30/75) of cases. Therefore, the exact mechanism about how this eye tumor is initiated in the presence of non-mutated RB1 gene still remains elusive. This also highlights the need to study the promoter region for hypermethylation and thus, the reduced expression of this gene in absence of mutation in this gene. The sample size in our study was small (N = 75) and included only the unilateral non-familial RB patients, limiting its generalizability to other RB tumor groups. The other limitation is absence of functional studies for novel non-synonymous variants found in the study.