Advantage of Whole Exome Sequencing over Allele-Specific and Targeted Segment Sequencing in Detection of Novel TULP1 Mutation in Leber Congenital Amaurosis.

Abstract Background: Leber congenital amaurosis (LCA) is a severe form of retinal dystrophy with marked underlying genetic heterogeneity. Until recently, allele-specific assays and Sanger sequencing of targeted segments were the only available approaches for attempted genetic diagnosis in this condition. A broader next-generation sequencing (NGS) strategy, such as whole exome sequencing, provides an improved molecular genetic diagnostic capacity for patients with these conditions. Materials and Methods: In a child with LCA, an allele-specific assay analyzing 135 known LCA-causing variations, followed by targeted segment sequencing of 61 regions in 14 causative genes was performed. Subsequently, exome sequencing was undertaken in the proband, unaffected consanguineous parents and two unaffected siblings. Bioinformatic analysis used two independent pipelines, BWA-GATK and SOAP, followed by Annovar and SnpEff to annotate the variants. Results: No disease-causing variants were found using the allele-specific or targeted segment Sanger sequencing assays. Analysis of variants in the exome sequence data revealed a novel homozygous nonsense mutation (c.1081C > T, p.Arg361*) in TULP1, a gene with roles in photoreceptor function where mutations were previously shown to cause LCA and retinitis pigmentosa. The identified homozygous variant was the top candidate using both bioinformatic pipelines. Conclusions: This study highlights the value of the broad sequencing strategy of exome sequencing for disease gene identification in LCA, over other existing methods. NGS is particularly beneficial in LCA where there are a large number of causative disease genes, few distinguishing clinical features for precise candidate disease gene selection, and few mutation hotspots in any of the known disease genes.


INTRODUCTION
Leber congenital amaurosis (LCA) is a severe form of retinal degeneration with onset in the first 12 months of life. 1 It is usually inherited in an autosomal recessive manner, and may be associated with other clinical features including photophobia, nystagmus, strabismus and the oculo-digital phenomenon. It is markedly genetically heterogeneous, with mutations identified in at least 21 genes: AIPL1, ALMS1, CABP4, CEP290, CNGA3, CRB1, CRX, GUCY2D, IMPDH1, IQCB1, KCNJ13, LCA5, LRAT, NMNAT1, OTX2, RD3, RDH12, RPE65, RPGRIP1, SPATA7 and TULP1. 1,2 Mutations in these genes account for about 60-70% of LCA cases. 3 However, few clinical features direct genetic investigation precisely to any one of these genes, and no specific regions of the genes provide genetic answers in a large majority of cases. Clinical trials are in progress for gene therapy for RPE65-related retinal degeneration. Some patients, especially those with IQCB1 and CEP290 mutations, are at risk for the subsequent development of renal impairment. These factors, along with the need for accurate genetic information for families, highlight the value of genetic diagnosis for patients with LCA.
To date, clinical investigation for disease gene identification in LCA has relied on chip-based approaches for particular known variants 4 or examination of regions from a selected group of genes, 5 owing to the high costs of gene-by-gene Sanger sequencing. While these approaches have provided answers for some patients, they have not been able to identify the causative change for the majority of patients affected with this disorder.
Disease genes underlying LCA are critical in a number of molecular events for normal photoreceptor function in the retina, including cycling of vitamin A derivatives (LRAT, RPE65, RDH12), phototransduction (GUCY2D), and transport of cargo between the inner and outer segments of the photoreceptors (CEP290, RPGRIP1). 3 TULP1, a member of the tubby gene family, has roles in photoreceptor function including contribution to trafficking of rhodopsin from the inner to outer segments, normal photoreceptor synapse formation and photoreceptor cell survival. [6][7][8] Mutations in TULP1 underlie LCA in approximately 1-2% of cases, and are also found in cases of severe or early-onset autosomal recessive retinitis pigmentosa (RP). [9][10][11] We undertook three different strategies to identify the causative mutation(s) in an affected proband with LCA. The first two, an allele-specific assay and targeted Sanger sequencing of particular LCA disease-gene segments, were unsuccessful. However, our next-generation sequencing (NGS) strategy of whole exome sequencing identified the disease-causing variant, indicating the value of this broader sequencing approach for genetic diagnosis in LCA. This led to the identification of a novel nonsense mutation in TULP1, predicted to cause deletion of much of the highly conserved C-terminal domain of the protein.

MATERIALS AND METHODS
Clinical Evaluation
The proband and family members underwent full ophthalmic examination. Genomic DNA was isolated from leukocytes of peripheral venous blood. All experiments were performed in accordance with the ethical tenets of The Children's Hospital at Westmead, Sydney, Australia.
The proband in this family, diagnosed with Leber congenital amaurosis, was the third child of first-cousin parents of Italian ethnicity (Figure 1A). Problems with her vision were first noted at the age of 6 months when she had nystagmus and head shaking. At 18 months of age, her parents reported that she only seemed able to see large objects and had particular problems with her vision at night. Electroretinogram at age 3 years 8 months revealed flat photopic and scotopic responses, and this was confirmed on repeat examinations at ages 5 and 7 years (Figure 1B). At age 8 years, her vision was recorded as 6/60 (right eye) and 6/60 + 1 (left eye) using the Snellen chart. She had a fine, high-frequency, low-amplitude vertical nystagmus in the primary position, which converted to a more exaggerated horizontal nystagmus in side gaze. Cycloplegic refraction revealed a low and normal degree of hypermetropia with some moderate astigmatism (+0.5/+1.75 × 120 in the right eye and +1.25/+1.75 × 60 in the left eye). Fundal examination showed slight attenuation of the retinal arterioles, loss of both foveal and ring reflexes of the maculae and a diffuse abnormal retinal sheen with some pigment stippling inferiorly in the left eye. She was an otherwise well child, with normal hearing and intellectual development. Her two older brothers and parents did not have any features consistent with Leber congenital amaurosis.
In this family, the available clinical diagnostic testing was undertaken in the proband at the time of her initial investigation, using an allele-specific assay of 135 of the most common LCA-causing variations, followed by Sanger sequencing of 61 regions in 14 causative genes in LCA: AIPL1, CEP290, CRB1, CRX, GUCY2D, IQCB1, LCA5, LRAT, RD3, RDH12, RPE65, RPGRIP1, SPATA7, and TULP1 (John and Marcia Carver Laboratory, University of Iowa, US). This did not lead to the identification of any disease-causing variations. To maximize our capacity for novel variant detection in this patient, we then undertook an exome sequencing strategy.

Whole Exome Sequencing and Bioinformatics
The exome capture was performed using the Agilent SureSelect Human All Exon kit (Agilent Technologies, Santa Clara, CA, USA), following the manufacturer's protocols. Subsequent sequencing was performed on Illumina HiSeq 2000 machines using the standard paired-end read sequencing protocol (Illumina, San Diego, CA, USA) to generate up to 90 cycles. Default settings were used in the Illumina pipeline to call bases from raw images, resulting in generation of raw sequencing reads in fastq format. We subsequently applied two independent analysis methods to perform alignment, variant calling and annotation. Pipeline 1: fastq files were aligned to the human reference genome (UCSC hg19) with the Burrows-Wheeler Aligner (BWA). 12 The Genome Analysis Tool Kit (GATK, version 1.4) 13 was used to call variants, followed by Annovar 14 and SnpEff 15 to perform functional annotation of the variants. Pipeline 2: fastq files were aligned to the human reference genome (UCSC hg19) with the Short Oligonucleotide Analysis Package (SOAP, version 2.21). 16 SOAPsnp (version 1.05) 16 was used for single nucleotide variant (SNV) detection, and GATK for small insertion-deletion (indel) detection, followed by BGI's self-developed programs to perform variant functional annotation.

Multiple Filtering Steps and Mendelian Genetic Analysis
Assuming a recessive inheritance model, a series of filtering criteria were applied to the variants in both pipelines. We excluded variants that were: (1) outside exonic regions, (2) synonymous changes, or (3) present with a minor allele frequency (MAF) >0.5% in either the 1000 Genomes Project (http://www.1000genomes.org/), the Exome Sequencing Project (ESP6500; http://evs.gs.washington.edu/EVS/), or our internal exome datasets. Variants near splice donor/acceptor sites and frameshift indels were given particular attention as they could cause pathogenic changes such as exon skipping or frameshifts. A compound heterozygous model of inheritance was considered as well. Variants found to be homozygous in the proband, heterozygous in the unaffected parents, and not homozygous in either unaffected sibling were considered for further analyses. Subsequent prioritization steps were based on (1) evolutionary conservation, i.e. variants with a PhyloP 17 value <0.95 were considered to be in nonconserved regions and were discarded, (2) prediction of pathogenicity by PolyPhen 18 and SIFT 19, and (3) biological and clinical relevance of identified variants, with emphasis on pathways and interaction networks of known genes and/or proteins pertinent to retinal disease.
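The exclusion and segregation criteria above can be illustrated with a minimal Python sketch. It is not the code used in either pipeline: the `Variant` record, its field names, the genotype labels and the two placeholder genes (`GENE_A`, `GENE_B`) with their values are hypothetical, and only the 0.5% MAF threshold and the recessive segregation pattern are taken from the text.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.005  # 0.5%, the threshold stated in the text


@dataclass
class Variant:
    """Hypothetical annotated variant record (field names are illustrative)."""
    gene: str
    region: str      # e.g. "exonic", "splicing", "intronic"
    effect: str      # e.g. "synonymous", "missense", "nonsense"
    max_maf: float   # highest MAF seen across the reference datasets
    genotypes: dict  # sample name -> "hom_ref" | "het" | "hom_alt"


def passes_annotation_filters(v):
    """Keep rare, non-synonymous variants in coding regions or at splice sites."""
    if v.region not in {"exonic", "splicing"}:
        return False
    if v.effect == "synonymous":
        return False
    return v.max_maf <= MAF_CUTOFF


def fits_recessive_segregation(v, proband, parents, siblings):
    """Homozygous in the proband, heterozygous in both parents,
    and not homozygous in any unaffected sibling."""
    g = v.genotypes
    return (
        g[proband] == "hom_alt"
        and all(g[p] == "het" for p in parents)
        and all(g[s] != "hom_alt" for s in siblings)
    )


# Toy input: genotypes and frequencies are made up for illustration only.
variants = [
    Variant("TULP1", "exonic", "nonsense", 0.0,
            {"proband": "hom_alt", "father": "het", "mother": "het",
             "sib1": "het", "sib2": "hom_ref"}),
    Variant("GENE_A", "exonic", "missense", 0.02,   # too common
            {"proband": "hom_alt", "father": "het", "mother": "het",
             "sib1": "het", "sib2": "het"}),
    Variant("GENE_B", "exonic", "missense", 0.001,  # sibling is homozygous
            {"proband": "hom_alt", "father": "het", "mother": "het",
             "sib1": "hom_alt", "sib2": "het"}),
]

candidates = [
    v for v in variants
    if passes_annotation_filters(v)
    and fits_recessive_segregation(
        v, "proband", ["father", "mother"], ["sib1", "sib2"])
]
print([v.gene for v in candidates])
```

In this toy input only the first record survives both steps. The conservation and pathogenicity prioritization described above would be applied afterwards and is not modelled here.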

Sanger Sequencing as Validation
Validation of the mutation was performed by Sanger sequencing in all members of the family with primers designed to the relevant region of TULP1 (reference sequence: NM_003322): the forward primer was 5'-CTGATTTCTCCCTGCAGCTCAC-3' and the reverse primer was 5'-CTATGTACATCAAAGCGAGAGGC-3'.

RESULTS
Exome Sequencing and Variant Filtering
Since the clinically available diagnostic testing of an allele-specific assay of 135 LCA variants and Sanger sequencing of 61 regions of 14 LCA genes had not revealed the causative mutation(s) in this patient, we proceeded to whole exome sequencing. This was undertaken in all five family members (Figure 1A), with each individual's exome covered at least 55-fold (Supplementary Table 1 - online only). For each of the exomes, a total of ~50,000 SNVs and ~9000 indels were called by pipeline 1, while ~60,000 SNVs and ~5000 indels were called by pipeline 2. Variants were restricted to those occurring in coding regions or at splicing sites, reducing SNVs to ~18,000 and indels to <770 in pipeline 1, and to ~15,000 SNVs and <700 indels in pipeline 2 (Supplementary Table 1). Considering a recessive model of inheritance, the causative mutation should be: (1) homozygous in the proband, (2) heterozygous in both unaffected parents, and (3) not homozygous in either unaffected sibling. We retained only rare (MAF ≤0.5% in the 1000 Genomes Project, the ESP6500 and CAG/BGI's internal exome datasets) nonsynonymous variants. We implemented the strategy using two independent methods of exome-wide analysis, and obtained four variants from pipeline 1 and six variants from pipeline 2 (Supplementary Table 2 - online only). We focused on the three overlapping variants generated by both analysis pipelines. A single nonsense variant topped both lists: chr6:35473549 (c.1081C>T, p.Arg361*, NM_003322) in TULP1 (Supplementary Figure 1A - online only), a gene known to cause LCA and autosomal recessive RP (OMIM 602280, http://www.omim.org/). We also examined the 21 genes known to cause LCA for the presence of a homozygous deleterious variant in the proband. This mutation was identified again, whereas no deleterious mutation was found in the other LCA genes.
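As a consistency check on the nomenclature, the relationship between c.1081C>T and p.Arg361* can be worked through with simple codon arithmetic. The sketch below is illustrative only and does not read the NM_003322 sequence. It assumes the standard genetic code and that coding position 1 is the A of the ATG start codon; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}  # standard genetic code


def codon_coordinates(cdna_pos):
    """Return (1-based codon number, 0-based offset within that codon)
    for a 1-based coding-sequence position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def substitute(codon, offset, ref, alt):
    """Apply a single-base substitution, checking the reference base first."""
    if codon[offset] != ref:
        raise ValueError(f"expected {ref} at offset {offset} of {codon}")
    return codon[:offset] + alt + codon[offset + 1:]


codon_number, offset = codon_coordinates(1081)
print(codon_number, offset)  # 361 0

# Arginine codons that carry a C at this offset and become a stop codon on C>T.
candidates = sorted(
    c for c in ARG_CODONS
    if c[offset] == "C" and substitute(c, offset, "C", "T") in STOP_CODONS
)
print(candidates)  # ['CGA']
```

Position 1081 is the first base of codon 361, in agreement with the protein-level description. Under the stated assumptions, CGA is the only arginine codon that a C>T change at that base converts to a stop codon (TGA).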

Mutation Validation
The mutation was considered novel as it was not

DISCUSSION
Leber congenital amaurosis is a severe, early-onset form of retinal degeneration. Accurate early genetic diagnosis is required to determine individuals who may benefit from gene-specific treatment trials, to identify those who may have mutations in genes indicating a requirement for renal monitoring, and to provide accurate recurrence risk information and reproductive options for family members. Until recently, genetic diagnosis has relied on allele-specific assays screening for known disease variants, with or without Sanger sequencing of additional gene regions. 4,5 These strategies have a relatively low detection rate compared with research studies in which more comprehensive Sanger sequencing could be undertaken. 3 In this family, the available initial clinical diagnostic testing using an allele-specific assay and sequencing of the segments of the LCA genes thought to be most commonly involved in LCA (John and Marcia Carver Laboratory, University of Iowa, US) did not lead to disease gene identification in our patient. Clinical diagnostic testing using a homozygosity mapping approach in known disease gene regions was not available. Full Sanger sequencing of all known LCA genes was also not clinically available due to its high cost. Hence, with the projected decrease in cost of whole exome sequencing as a diagnostic testing strategy in such cases, we decided to undertake this approach as a means of variant detection in a known or novel disease gene in this family. We proceeded with whole exome sequencing and comparison between family members, looking for deleterious variants homozygous in the patient, heterozygous in both parents and heterozygous or absent in the unaffected sibs. Previous reports have indicated variability in detection of SNVs among various bioinformatic pipelines used for analysis of exome sequence data. 20,21 Hence, we employed two different bioinformatic pipelines, BWA-GATK and SOAP, to increase our capacity for valid variant detection in this family. Output from these two independent analysis pipelines suggested a single novel nonsense mutation, chr6:35473549 (c.1081C>T, p.Arg361*, NM_003322) in TULP1, to be the most likely disease-causing variant. Simultaneously, given consanguinity in the family, we analyzed the exome sequence data in the proband for any evidence of a homozygous deleterious variant in any of the known LCA-causing genes. Examination of the 21 known LCA genes identified only one deleterious homozygous variant, the same variant in TULP1.
Twenty-five mutations in TULP1, including the one in this study, have been reported to cause LCA or juvenile-onset or severe autosomal recessive RP (Supplementary Table 3 - online only). 10,11,[22][23][24][25] In our patient with LCA, the novel homozygous nonsense mutation resulted in a stop codon at amino acid position 361, leading to a significantly shortened protein with loss of the conserved C-terminal domain of TULP1. This brings to three the number of nonsense mutations reported in TULP1, with all patients with this type of mutation having the severe retinal phenotype of LCA. Our patient had nystagmus, night blindness, absence of photophobia, mild peripheral pigmentary changes and changes at the macula consistent with previously reported clinical findings in patients with TULP1 mutations. In a classification system proposed by Hanein, 11 absence of increased hypermetropia was noted in patients with TULP1 mutations, as is the case for our patient. However, we note that this is not a completely consistent finding, considering that a number of individuals with the homozygous stop-gain mutation c.901C>T, p.Gln301* were found to have hypermetropia. 26 This highlights the lack of precise genotype-phenotype correlations that can be made in LCA, indicating the value of a broad sequencing strategy for pathogenic variant identification. With the advent of clinically available whole exome sequencing or NGS of all parts of a large number of genes, results from our study indicate that these strategies are likely to be advantageous over allele-specific assays and targeted gene segment Sanger sequencing in disease gene identification in highly genetically heterogeneous conditions such as LCA.
There is recent progress in therapeutic strategies for some forms of LCA. 27 Although no treatment options for LCA with TULP1 mutations are currently available, there are advances in identification of the functions of TULP1 and its genetic modifiers. 6,28 TULP1 is located in the cytoplasm where it associates with the cytoskeleton and cellular membrane. Various functions have been attributed to TULP1 including maintenance of the photoreceptor synapse, contribution to vesicular transport within the cell, as well as other proposed roles in transcriptional regulation and phagocytosis, with the C-terminal domain noted as being of particular importance. 6,7,29,30 In investigations using a TULP1 mutant mouse line, crossing with mice carrying a putative protective allele of microtubule-associated protein 1A resulted in reduced photoreceptor loss. 28 This suggests a critical role for microtubule-based intracellular transport and synapse function as a focus for further development of potential treatment strategies where there is abnormality of TULP1.
Our study shows the value of exome sequencing in discovering causes of the genetically heterogeneous condition of LCA, where there is a general lack of hotspot mutations or regions that will lead to a genetic diagnosis in the majority of cases. The exome sequencing approach was beneficial over allele-specific and targeted Sanger sequencing assays. It allowed rapid identification of a novel nonsense mutation in TULP1, and highlighted the critical role of the C-terminal domain for normal function of this protein in the retina. Use of a broader NGS strategy for improved genetic diagnosis in LCA will lead to better understanding of the relative contributions of the various disease genes and mutation types, and ultimately improved molecular understanding and treatment of these disorders.