Genome-wide association study across five cohorts identifies five novel loci associated with idiopathic pulmonary fibrosis

ABSTRACT
Idiopathic pulmonary fibrosis (IPF) is a chronic lung condition with poor survival times. We previously published a genome-wide meta-analysis of IPF risk across three studies with independent replication of associated variants in two additional studies. To maximise power and to generate more accurate effect size estimates, we performed a genome-wide meta-analysis across all five studies included in the previous IPF risk genome-wide association studies. We used the distribution of effect sizes across the five studies to assess the replicability of the results and identified five robust novel genetic association signals implicating mTOR (mammalian target of rapamycin) signalling, telomere maintenance and spindle assembly genes in IPF risk.

INTRODUCTION
Idiopathic pulmonary fibrosis (IPF) is a chronic lung disease believed to result from an aberrant response to alveolar injury leading to a build-up of scar tissue. This progressive scarring is eventually fatal with half of individuals dying within 3-5 years of diagnosis [1]. The cause of IPF is unknown but genetics play an important role in how susceptible an individual is to IPF [2]. Genome-wide association studies (GWAS) are an approach whereby genetic variants from across the genome are tested for their association with a disease. Genetic loci identified by GWAS can implicate genes important in disease pathogenesis, and drugs that target the products encoded by these genetically supported genes are twice as likely to be successful during development. The genetic association statistics from a GWAS are also widely used to identify causal markers of disease through Mendelian randomisation, to conduct heritability estimation and for genetic correlation analyses.
We recently published a GWAS of IPF risk [2]. The discovery GWAS consisted of three studies (named as the UK, Chicago and Colorado studies) and a replication analysis performed in two independent studies (named as the USA, UK and Spain (UUS) and Genentech studies). This analysis reported 14 genetic signals which implicated host defence, cell-cell adhesion, spindle assembly, transforming growth factor beta (TGF-β) signalling regulation and telomere maintenance as important biological processes involved in IPF disease risk. The effect size estimates from this analysis have been widely used in other genetic analyses [3][4][5] and have been integrated into drug target discovery pipelines.
To maximise sample sizes for detection of new genetic associations and to generate more precise effect size estimates, we have reanalysed the data and present a meta-analysis of genome-wide data from all five datasets included in our previous study. The results of this analysis implicate new genetic loci in IPF pathogenesis and provide a unique resource for other studies of IPF risk and pathogenesis.

METHODS
Quality control and sample selection have been previously described [2]. In summary, datasets comprised unrelated European-ancestry individuals from across the USA, UK and Spain, diagnosed using American Thoracic Society/European Respiratory Society guidelines [6][7]. Individuals in the Genentech study were sequenced using the HiSeq X Ten platform (Illumina), and all other individuals were imputed from genotyping data using the Haplotype Reference Consortium (HRC) reference panel [8]. Genome-wide analyses were performed in each study separately using an additive logistic regression model adjusting for the first 10 genetic principal components to account for population stratification.
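As a sketch of the per-study association model described above, the additive logistic regression for a single variant can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the original analysis: Y_i is the case status of individual i, G_i is the effect-allele dosage (0 to 2) and PC_ik is the k-th genetic principal component.

```latex
\operatorname{logit}\bigl(\Pr(Y_i = 1)\bigr)
  = \beta_0 + \beta_G \, G_i + \sum_{k=1}^{10} \gamma_k \, \mathrm{PC}_{ik}
```

Under this parameterisation, \exp(\beta_G) is the per-allele odds ratio, and the estimate of \beta_G with its SE is the quantity carried forward from each study into the meta-analysis.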
The five separate study-level GWAS were meta-analysed into a single GWAS, using an inverse-variance weighted fixed-effect meta-analysis in METAL [9]. Variants were included in the meta-analysis if they were available in at least four studies. Genomic control was performed on the meta-analysis results using the LD (linkage disequilibrium) score regression intercept to account for inflation not explained by polygenic effects [10]. Significant variants were defined as those with a meta-analysis p value of <5×10⁻⁸, and conditional analyses were performed using GCTA-COJO (genome-wide complex trait analysis-conditional and joint analysis) to identify additional independent associated variants [11]. Independent associated variants were defined as variants remaining genome-wide significant after conditioning on the most significant variant (sentinel) in the region with consistent effect size estimates in the conditional and non-conditional analyses. Annotation of the sentinel variants was then performed using Variant Effect Predictor [12]. To assess the robustness of novel results, we tested the strength and consistency of results across studies using the Meta-Analysis Model-Based Assessment of Replicability (MAMBA) [13]. Variants with a posterior probability of replication (PPR) of ≥90% were considered robust and likely to replicate should additional independent datasets become available.
Summary statistics (ie, effect size estimates, SEs, p values and basic variant information) for all variants included in the genome-wide meta-analysis can be accessed online (https://github.com/genomicsITER/PFgenetics).
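The pooling and genomic-control steps described above can be illustrated with a minimal Python sketch. It is not the METAL implementation and not the authors' pipeline: the function names, the example effect sizes and the intercept value are hypothetical, and applying the correction only when the intercept exceeds 1 is an assumed convention rather than something stated in the text.

```python
import math

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text
MIN_STUDIES = 4       # a variant must be available in at least four studies


def ivw_fixed_effect(estimates):
    """Pool per-study (beta, se) pairs with inverse-variance weights.

    Returns (pooled_beta, pooled_se), or None when the variant is
    reported by fewer than MIN_STUDIES studies.
    """
    if len(estimates) < MIN_STUDIES:
        return None
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def genomic_control_se(se, ldsc_intercept):
    """Scale the SE so the chi-square statistic is divided by the intercept.

    Assumption: intercepts below 1 are treated as 1 (no correction).
    """
    return se * math.sqrt(max(ldsc_intercept, 1.0))


def two_sided_p(beta, se):
    """Two-sided p value from a normal (Wald) test of beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Hypothetical per-study (log odds ratio, SE) pairs for one variant.
example = [(0.20, 0.08), (0.25, 0.10), (0.18, 0.09), (0.22, 0.12), (0.21, 0.07)]
pooled = ivw_fixed_effect(example)
if pooled is not None:
    beta, se = pooled
    se = genomic_control_se(se, ldsc_intercept=1.03)  # illustrative intercept
    print(beta, se, two_sided_p(beta, se) < GENOME_WIDE_P)
```

Scaling the SE by the square root of the intercept is equivalent to dividing the chi-square statistic by the intercept, so the pooled effect size itself is left unchanged by the correction.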

RESULTS
A total of 4125 cases, 20 464 controls and 7 554 248 genetic variants were included in the analysis (figure 1). The UUS study included one additional case (due to resolving a sample ID issue since the previous publication) and one fewer control (where the individual has since withdrawn consent from UK Biobank) than described in the previous GWAS [2]. After conditional analyses, there were 23 independent signals with p<5×10⁻⁸ in the genome-wide meta-analysis (figure 2). These 23 signals included all 14 associations reported in the previous GWAS (online supplemental table 1). Of the nine

DISCUSSION
By increasing the number of cases in the discovery analysis by more than 50% compared with the previous IPF risk GWAS, we identified novel genetic signals associated with IPF risk and improved the precision of estimates for previously reported signals. The five novel loci had internal evidence of replicability, giving us confidence that these signals are likely to be generalisable. The signals in RTEL1 and OBFC1 have been reported previously but did not meet the significance criteria of the previous three-way GWAS [2]. The new MAMBA analysis suggests that the consistency of effect across studies provides high confidence that the RTEL1 signal will replicate should an independent dataset become available. This is not the case for the OBFC1 signal where a low PPR suggests that there may be heterogeneity in effect across the contributing studies.
The novel signals require further characterisation to determine the likely causal gene and underlying functional effect of the variants. However, some of the genes that are closest to these new signals have strong candidacy for involvement in IPF pathogenesis. NPRL3 encodes a component of the GATOR1 complex and acts through mTORC1 signalling to inhibit mTOR kinase activity [14]. mTOR regulates TGF-β collagen synthesis, and inhibiting mTOR leads to increased deposition of scar tissue [15]. We previously reported an association implicating DEPTOR, another mTOR inhibiting gene. We also add to the evidence that cellular ageing plays a key role in IPF pathogenesis through associations at the telomere maintenance genes TERT, TERC and RTEL1. We previously reported associations in spindle assembly genes (MAD1L1 and KIF15) and have identified a novel genetic association in another spindle assembly gene, kinetochore scaffold 1 (KNL1, also known as CASC5). Stathmin 3 (STMN3) implicates another cell replication process through tubulin binding [14].
Our analysis also shows the benefits of including all samples in the genome-wide analysis. By using recent statistical methodological advances to test for the replicability of signals when all available datasets are included in the discovery GWAS [13], we were able to identify five additional variants with evidence of being robustly associated with IPF risk. Additional independent replication of these signals would strengthen the evidence for their role in IPF susceptibility.
By maximising the statistical power of the analysis, we identified novel genetic associations with IPF risk. These signals may implicate biologically relevant genes that support the importance of TGF-β signalling and cell replication as important processes in disease pathogenesis.
Competing interests AS and BLY are employees of Genentech/Roche and hold stock and stock options in Roche. JMO reports personal fees from Boehringer Ingelheim, Genentech, United Therapeutics, AmMax Bio and Lupin Pharmaceuticals unrelated to the submitted work. RGJ is a trustee of Action for Pulmonary Fibrosis and reports personal fees from Astra Zeneca, Biogen, Boehringer Ingelheim, Bristol Myers Squibb, Chiesi, Daewoong, Galapagos, Galecto, GlaxoSmithKline, Heptares, NuMedii, PatientMPower, Pliant, Promedior, Redx, Resolution Therapeutics, Roche, Veracyte and Vicore. DAS is the founder and chief scientific officer of Eleven P15, a company focused on the early detection and treatment of pulmonary fibrosis. LW reports research funding from GlaxoSmithKline and Orion Pharma, and consultancy for Galapagos (all outside of the submitted work).
Individuals from the COMET (NCT01071707) and Lung Tissue Research Consortium (NCT02988388) studies were also included in the Chicago study. All subjects in the Colorado study gave written informed consent as part of IRB-approved protocols for their recruitment at each site, and the genome-wide association studies were approved by the National Jewish Health IRB and Colorado Combined Institutional Review Boards. Subjects in the Genentech study provided written informed consent for whole-genome sequencing of their DNA. Ethical approval was provided as per the original clinical trials (INSPIRE (NCT00075998), RIFF (NCT01872689), CAPACITY (NCT00287729 and NCT00287716) and ASCEND (NCT01366209)).
For the UCSF (University of California, San Francisco) cohort, sample and data collection were approved by the University of California San Francisco Committee on Human Research, and all patients provided written informed consent. For the Vanderbilt cohort, the IRBs from Vanderbilt University approved the study, and all participants provided written informed consent before enrolment.

Patient consent for publication
Provenance and peer review Not commissioned; externally peer reviewed.