Genetic Variants in IL6R and ADAM19 are Associated with COPD Severity in a Mexican Mestizo Population.

Chronic obstructive pulmonary disease (COPD) is a complex and multifactorial disease with a strong genetic component. Our objective was to identify the genetic variants associated with COPD risk and its severity in the Mexican Mestizo population. We evaluated 1285 single-nucleotide polymorphisms (SNPs) of candidate genes in 299 smokers with COPD (COPD-S) and 531 smokers without COPD (SWOC) using an Illumina GoldenGate genotyping microarray. In addition, 251 ancestry informative markers were included. Allele A of rs2545771 in CYP2F2P is associated with a lower risk of COPD (p = 4.02E-10, odds ratio [OR] = 0.104, confidence interval [CI] 95% 0.05-0.18). When the COPD group was stratified by severity according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD; levels III + IV vs. I + II), 3 SNPs (rs4329505 and rs4845626 in interleukin 6 receptor [IL6R] and rs1422794 in a disintegrin and metalloproteinase domain 19 [ADAM19]) were associated with a lower risk of suffering the most severe stages of the disease. rs2819096 in the surfactant protein D (SFTPD) gene was associated with a higher risk of COPD GOLD III + IV (p = 7.79E-03, OR = 1.80, CI 95% 1.16-2.79). Finally, the haplotype in IL6R was associated with a lower risk of suffering from more severe COPD, whereas the haplotype in ADAM19 was associated with a higher risk (p = 7.40E-03, OR = 2.83, CI 95% 1.20-6.86) of suffering from the severe stages of the disease. Our data suggest that there are alleles and haplotypes in the IL6R, ADAM19, and SFTPD genes associated with different severity stages of COPD; in CYP2F2P, rs2545771 is associated with a lower risk of COPD.


Introduction
According to the World Health Organization, by 2030, chronic obstructive pulmonary disease (COPD) will be the third leading cause of death worldwide (1). In Mexico, the prevalence estimated by the Latin American Project of Research for Pulmonary Obstruction (Proyecto Latinoamericano de Investigación en Obstrucción Pulmonar [PLATINO]) (2) according to the criteria established by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) is 7.8% in people older than 40 years (3).
COPD is a preventable and treatable disease characterized by airflow obstruction; it is usually progressive and is associated with an exaggerated lung inflammatory reaction in response to the inhalation of particles and noxious gases. The main environmental risk factors are smoking and exposure to smoke from burning biomass (3)(4)(5). Family studies show that there are genetic components associated with COPD. The most clearly associated component is alpha-1 antitrypsin deficiency; however, in the Mexican Mestizo population, the risk allele PiZ (rs28929474/A) is detected in less than 2% of patients with COPD (6). This finding suggests that other genes, not yet clearly identified, are involved in COPD susceptibility (7). In the last 8 years, whole-genome studies have identified regions associated with the risk of suffering COPD (8). However, studies of this type do not yet exist for Latin American populations, and in particular for Mexican Mestizos. Therefore, the aim of this study was to identify genetic variants associated with the risk of suffering from COPD and its severity in the Mexican Mestizo population.

Study participants
A prospective, exploratory study was performed. A total of 1,117 individuals were included and classified into two groups: smokers with COPD (COPD-S) and without COPD (SWOC of Mexico. Only Mexican Mestizos by ancestry (Mexican-born parents and grandparents) who were older than 30 years and smokers or ex-smokers of ≥ 10 cigarettes/day for ≥ 10 years were included. The COPD diagnosis was based on clinical history, physical examination, and spirometry data, taking into account the criteria established by the American Thoracic Society and the GOLD for severity (5). The reference values calculated by Pérez-Padilla for the Mexican population were used (9). For post-bronchodilator spirometry, 400 µg of nebulized salbutamol was administered by an inhaler and spacer (10). The SWOC were recruited from a smoking cessation clinic within the same department, in addition to subjects who came to INER during a campaign for early COPD detection on World COPD Day and World No Tobacco Day, who were invited to perform a spirometry test. These individuals were considered to be the control group based on the GOLD criteria (4). Individuals with bronchial asthma, bronchiectasis, active tuberculosis, lung cancer, cystic fibrosis, hypersensitivity pneumonitis, or idiopathic pulmonary fibrosis were excluded from the study. The participants completed a survey of anthropometric data and family genetic background data. Six-milliliter samples of peripheral blood were collected in tubes with EDTA as the anticoagulant. The individuals agreed to participate voluntarily and signed an informed consent form written specifically for this protocol, which was approved by the research and biosafety bioethics committee of INER (protocol number B20-08). The Strengthening the Reporting of Genetic Association Studies (STREGA) guidelines were taken into account in the design of this genetic association study (11).

Genomic DNA extraction and concentration adjustment
Genomic DNA was extracted using a commercial BDtract isolation kit (Maxim Biotech, San Francisco CA, USA). The DNA was quantified via UV microspectrophotometry at 260 nm using a NanoDrop 2000 (Thermo Scientific, Wilmington DE, USA). Contamination levels with organic compounds and proteins were determined by establishing the ratio of the 260/240 and 260/280 measurements, respectively. The samples were considered to be contaminant-free when both ratios were between 1.7 and 2.0. All of the sample concentrations were adjusted to 50 ng/µl for subsequent genotyping.
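The purity criterion and the concentration adjustment described above can be sketched as follows. This is a minimal illustration, not the laboratory protocol used in the study: the function names, the final volume, and the example readings are hypothetical, and the dilution simply applies C1 x V1 = C2 x V2.

```python
def is_contaminant_free(ratio_organic, ratio_protein, low=1.7, high=2.0):
    """Return True when both absorbance ratios fall within [low, high].

    The 1.7-2.0 window is the one stated in the text; the argument names
    are illustrative only.
    """
    return low <= ratio_organic <= high and low <= ratio_protein <= high


def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=50.0):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The 50 ng/ul target comes from the text; the
    final volume is an assumption supplied by the caller.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical sample: 200 ng/ul stock diluted to 40 ul at 50 ng/ul.
stock_ul, diluent_ul = dilution_volumes(200.0, 40.0)
```

A sample measured below the target concentration cannot be adjusted by dilution, so the sketch raises an error in that case rather than returning a negative volume.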

Single-nucleotide polymorphism selection and microarray design
A genotyping microarray with 1,536 single-nucleotide polymorphisms (SNPs) was designed for the Illumina GoldenGate platform (Illumina, Inc., San Diego CA, USA). The selection was performed by reviewing papers found in the National Center for Biotechnology Information databases (12). The search terms COPD, association, genome-wide association studies (GWAS) and smokers were used, and papers published between 2005 and 2012 reporting SNPs associated with COPD were selected from the results obtained. Each study was analyzed for its experimental design, phenotype studied, population used (ethnic origin, (13,14) and that the populations fulfilled the criterion for Hardy-Weinberg equilibrium (HWE, p ≥ 1E-04). A total of 458 SNPs individually associated in the literature were included, from which we selected those genes with the highest numbers of associated SNPs. In total, 24 candidate genomic regions on 14 chromosomes were considered (Supplementary Table 1). The software Haploview v4.2 was used for the selection of tag SNPs (15). Minimum and maximum distances of 60 and 200 bp, respectively, were established between tag SNPs; a total of 825 tag SNPs were included. A complete list of the SNPs included is provided in Supplementary Table 2.
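As an illustration of the Hardy-Weinberg criterion mentioned above, the following sketch applies a one-degree-of-freedom chi-square goodness-of-fit test to genotype counts. The text does not state which test was used, so the chi-square approximation, the function name, and the example counts are assumptions; only the p ≥ 1E-04 threshold comes from the text.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a chi-square test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa, n_ab, n_bb, threshold=1e-4):
    """Apply the p >= 1E-04 criterion quoted in the text."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= threshold


# Hypothetical genotype counts for two markers.
in_equilibrium = passes_hwe(25, 50, 25)   # counts match expectation exactly
out_of_equilibrium = passes_hwe(50, 0, 50)  # no heterozygotes at all
```

For markers with low genotype counts, an exact test is usually preferred over this approximation.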
The population ancestry was evaluated using four reference populations: Caucasian (CEU), Eastern Asian (EA), and African (YRI) from the HapMap International Project (16) and Amerindian from the MGDP (NATIVE) (13,14). A total of 253 ancestry informative markers (AIMs) were included, which had allelic frequency differences (δ) ≥ 0.3 for each pair of reference populations.
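The allele frequency difference criterion for AIMs can be sketched as below. The wording in the text leaves open whether a marker had to separate every pair of reference populations or only some of them, so this sketch simply reports which pairs a marker separates at the 0.3 threshold. The frequencies shown are hypothetical and the function name is illustrative.

```python
from itertools import combinations


def informative_pairs(freqs, min_delta=0.3):
    """Return the population pairs whose allele frequencies differ by >= min_delta.

    freqs maps a reference population label to the frequency of one allele
    of a biallelic marker in that population.
    """
    return [
        (a, b)
        for a, b in combinations(freqs, 2)
        if abs(freqs[a] - freqs[b]) >= min_delta
    ]


# Hypothetical allele frequencies for one candidate marker.
marker = {"CEU": 0.10, "YRI": 0.80, "EA": 0.15, "NATIVE": 0.55}
pairs = informative_pairs(marker)
```

Under the stricter reading, a marker would be kept only when the returned list contains all six pairs.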

Genotyping and quality control
To obtain genotypes, the protocol designed for the Illumina GoldenGate platform was used (Illumina, Inc.). The microarray was read on a BeadArray scanner (Illumina, Inc.). Genotyping and document generation (output as .ped and .map files) were performed using the GenomeStudio 2011 v1.0 software (Illumina, Inc.). The samples that did not satisfy the call rate criterion (≥95%) were excluded from the analysis.
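The sample-level call rate filter can be sketched as follows. This is an illustration of the ≥95% criterion only, not the GenomeStudio procedure; the missing-genotype code "0" and the sample identifiers are assumptions made for the example.

```python
def call_rate(genotypes, missing="0"):
    """Fraction of markers with a called (non-missing) genotype."""
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def passing_samples(samples, threshold=0.95):
    """Return the identifiers of samples whose call rate is >= threshold."""
    return [sid for sid, g in samples.items() if call_rate(g) >= threshold]


# Hypothetical samples genotyped at 100 markers each.
samples = {
    "S001": ["AG"] * 100,               # call rate 1.00
    "S002": ["AG"] * 99 + ["0"],        # call rate 0.99
    "S003": ["AG"] * 90 + ["0"] * 10,   # call rate 0.90, excluded
}
kept = passing_samples(samples)
```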

Statistical analysis
To describe the population under study, SPSS v.15.0 software was used (SPSS software, IBM, New York, USA) to determine (20). Associations according to the GOLD classification were analyzed using PLINK 1.07 software; a logistic regression model was fitted that included covariates such as age, sex, BMI, years of smoking, cigarettes per day, and pack-year history.

Results
Out of 1,117 individuals selected, 839 met the established criteria. Of these, 302 were patients with COPD (COPD-S) and 537 were SWOC. Two subjects were removed from the case group, and three from the control group because they did not meet the genotyping criteria. In the subsequent ancestry analysis, three subjects were removed from the control group, and one from the case group. Finally, 299 cases and 531 controls met the quality criteria established in each stage.
The clinical and lung function criteria are shown in Table 1. The mean age was higher in patients with COPD compared with the controls (p < 0.001). The sex ratios [Female (F): Male (M)] were 1:1.2 in COPD-S and 1:1.5 in SWOC (p < 0.001). In general, among the variables associated with smoking, higher values were observed in the case group compared with the controls (p < 0.001 for all five comparisons). Regarding the GOLD level, most individuals were classified in levels II and III (∼70%), and GOLD II had the highest frequency (47.5%). The differences observed in the lung function measurements are inherent to the selection of cases and controls. In the COPD-S group, 22 subjects (9.6% of the sample) had a forced expiratory volume in the first second/forced vital capacity (FEV1/FVC) ratio above the lower limit of normal; however, the mean FEV1% predicted was 59 ± 20, and the smoking history and the clinical and imaging studies were compatible with COPD.

Ancestry population analysis
A total of 251 AIMs were analyzed using PCA. The first three eigenvectors were taken into account, obtaining a value of Fst = 0 between the groups studied (Supplementary Table 3). The population's structure was analyzed with the same reference populations under unsupervised conditions (Figure 1). Table 2 shows the mean population contribution of each group according to the reference populations. Between COPD-S and SWOC, the contributions were not statistically significantly different, which supports the results obtained by the PCA.
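To make the meaning of Fst = 0 concrete, the following sketch computes a simple heterozygosity-based fixation index for one biallelic marker in two groups of equal weight. This is a textbook definition used for illustration; the text does not specify the estimator used in the PCA-based analysis, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic marker with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_groups(p1, p2):
    """Fst = (Ht - Hs) / Ht for two equally weighted groups."""
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        return 0.0  # marker is monomorphic in the pooled sample
    return (h_t - h_s) / h_t


# Identical frequencies in cases and controls give Fst = 0,
# that is, no detectable population differentiation at this marker.
fst_same = fst_two_groups(0.3, 0.3)
fst_diff = fst_two_groups(0.2, 0.8)
```

A value near zero across the AIM panel is what supports treating cases and controls as drawn from the same ancestral background.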

Genetic association analysis: Alleles and haplotypes
A total of 1,201 SNPs met the established criteria. Association analyses between SNPs and the presence of COPD were performed using a logistic regression adjusted for age, sex, BMI, years of smoking, cigarettes per day, and pack-year history. We identified 29 associated SNPs (p < 0.01, Supplementary Table 4), of which 12 were strongly associated with disease (p < 0.001). Three SNPs were associated with risk (rs13109246, rs1511532, and rs995043, OR = 1.4-1.7). After the Bonferroni correction, only rs2545771/A in CYP2F2P remained associated with a lower risk of COPD (p = 4.02E-10, OR = 0.104, CI 95% 0.05-0.18). The haplotype analysis did not reveal any statistically significant associations.
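The effect of the Bonferroni correction can be illustrated with a short sketch. It assumes a family-wise alpha of 0.05 and that the number of tests equals the 1,201 SNPs that passed quality control; neither value is stated explicitly in the text.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def survives_bonferroni(p_value, n_tests, alpha=0.05):
    """True when the p-value is below the corrected threshold."""
    return p_value < bonferroni_threshold(n_tests, alpha)


N_TESTS = 1201  # assumption: one test per SNP passing quality control

threshold = bonferroni_threshold(N_TESTS)              # about 4.2E-05
top_snp_passes = survives_bonferroni(4.02e-10, N_TESTS)
nominal_snp_passes = survives_bonferroni(1e-3, N_TESTS)
```

Under these assumptions, a SNP at the p < 0.001 level is far from the corrected threshold, which is consistent with only one variant remaining after correction.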

Allele and haplotype association analysis according to the GOLD classification
The patients were classified according to the GOLD severity, grouping the lower-severity levels (GOLD I Mild and II Moderate, GMM n = 194) and the higher-severity levels (GOLD III Severe and IV Very Severe; GSVS n = 105). Allele and haplotype association analyses were performed using PLINK 1.07 software and a chi-square test; Table 3 shows the four SNPs that were associated (p < 0.01), of which three were associated with a lower risk of suffering the more severe stages (interleukin 6 receptor [IL6R] rs4329505, rs4845626 and a disintegrin and metalloproteinase domain 19 [ADAM19] rs1422794); whereas rs2819096 in surfactant protein D (SFTPD) was associated with a higher risk of suffering COPD stages III or IV. The haplotype analysis identified two associated blocks (p < 0.01, Table 4), including one in IL6R formed by five SNPs, associated with a lower risk of suffering the more severe stages of the disease (OR = 0.34), whereas the block located on ADAM19 clustered six polymorphisms and was associated with an increased risk of suffering severe and very severe COPD (OR = 2.83). The blocks formed for each gene are shown in Figure 2.
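An allelic chi-square test of the kind described above, together with the odds ratio and a Woolf-type 95% confidence interval, can be sketched from a 2 x 2 table of allele counts. The counts below are hypothetical and chosen only so that the totals match two alleles per subject in the GSVS (n = 105) and GMM (n = 194) groups; they are not taken from Table 3, and PLINK's internal implementation may differ.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square test (1 df), odds ratio and 95% CI from allele counts."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's method
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Hypothetical minor/major allele counts: GSVS (210 alleles) vs GMM (388 alleles).
odds_ratio, ci, chi2, p_value = allelic_association(30, 180, 100, 288)
```

An odds ratio below 1 with a confidence interval that excludes 1 corresponds to the "lower risk of the more severe stages" pattern reported for the IL6R and ADAM19 SNPs.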

Discussion
COPD is a worldwide health problem. According to the PLATINO study, the prevalence of COPD is 7.8% in Mexico and is as high as 20% in Montevideo. In 2011, Bruse et al. proposed that Hispanics, with a high Amerindian component, are "less susceptible" to the disease because they possess "protective alleles" that are found in high frequencies in these populations (21). In this study, we determined that allele A of rs2545771 in CYP2F2P is a protective allele found on chromosome 19, a region that encodes genes belonging to the cytochrome P450 family (22). The function of CYP2F2P in COPD is unknown, but it is possibly involved in nicotine metabolism; this variant has an allelic frequency of 5.8% in the Caucasian population. Its prevalence barely reached 2.3% in COPD-S, whereas its prevalence was 18.7% in SWOC. These data support the hypothesis proposed by Bruse et al., which indicates that the "protective" alleles are found in higher frequencies in Latin American populations compared with populations of Caucasian origin.
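As a rough consistency check, the two frequencies quoted above can be turned into a crude (unadjusted) odds ratio. This sketch assumes that the reported 2.3% and 18.7% are allele frequencies in COPD-S and SWOC, respectively; it ignores the covariate adjustment used in the logistic regression, so it is not expected to reproduce the reported OR exactly.

```python
def odds(frequency):
    """Convert a frequency in (0, 1) into odds."""
    return frequency / (1.0 - frequency)


def crude_odds_ratio(freq_cases, freq_controls):
    """Unadjusted odds ratio from allele frequencies in cases and controls."""
    return odds(freq_cases) / odds(freq_controls)


# Frequencies quoted in the text: 2.3% in COPD-S, 18.7% in SWOC.
crude_or = crude_odds_ratio(0.023, 0.187)  # roughly 0.10
```

Under that assumption the crude estimate is close to the adjusted OR of 0.104 reported for rs2545771/A.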
The genetic association analysis according to GOLD severity identified 3 alleles that are associated with a lower risk of suffering the more severe disease stages: two (rs4329505 and rs4845626) in IL6R and one (rs1422794) in ADAM19. The IL6R gene is expressed in the lung and is important in immune responses (8). Previous studies have reported differences in the expression levels between smokers with and without COPD; thus, IL6R has been proposed as a potential biomarker for this disease (23). ADAM19 is abundantly expressed in alveolar epithelial cells and inflammatory cells, which suggests its involvement in early immune defense mechanisms and perpetuation of the inflammatory process (24). It stimulates transforming growth factor beta 1 (TGFβ1) expression, can increase proinflammatory activity via tumor necrosis factor (TNF), and has been proposed to be involved in damage modulation at the lung level (25). In addition, ADAM19 participates in the proteolytic processing of beta-type neuregulin isoforms, which are involved in neurogenesis and synaptogenesis, suggesting that ADAM19 also plays a regulatory role in glial cells (26).
In our study, we found that rs1422794 (allele C) is associated with a lower risk of developing the severe stages of COPD. Previously, Hancock et al. reported that SNPs implicated for FEV1/FVC or FEV1 are intergenic, intronic, or located at 3' untranslated regions (3' UTRs). Of these, three intronic SNPs at GPR126 and one intergenic SNP near nephronectin (NPNT) are located in transcription factor binding sites (27). According to the University of California Santa Cruz (UCSC) Genome Browser, additional SNPs in ADAM19 (rs2277027 and rs1422795) were not significantly associated with FEV1/FVC at a genome-wide significance level (28), but these SNPs were associated at a genome-wide significance level in a joint meta-analysis (27).
Our data showed that in the ADAM19 gene, the C allele of rs1422794 is associated with a decreased risk of developing more severe COPD. This polymorphism is located in the intron region (between exons 8 and 9). The function of this polymorphism is currently unknown, but it may participate in the process of alternative splicing. As a consequence of this process, according to Ensembl, five transcripts are reported, of which only three encode a protein (29).
Castaldi et al. proposed that a subset of these markers would also be associated with COPD susceptibility. For the SNPs, the direction of the OR was consistent with the effect observed in the spirometric GWA studies (i.e., alleles that were associated with lower levels of FEV1 or FEV1/FVC were associated with a higher risk of COPD). The degree of linkage disequilibrium between the spirometric GWAS SNPs and the top SNPs in the gene-based analysis varied widely (r² = 0.07-0.71), suggesting that for at least some of these loci, the top gene-based extended association result may represent an independent signal (30).
There is functional evidence supporting the potential etiological role of the ADAM19 SNP rs1422795. This nonsynonymous coding SNP results in a serine-to-glycine substitution. According to the UCSC Genome Browser, rs1422795 is ∼6 kb upstream of the transcription start site of an ADAM19 transcript variant, suggesting that this SNP could be part of the cis-regulatory region of this transcript (28). Furthermore, rs1422795 is close to the translation start site of another ADAM19 transcript variant. Because of this proximity, the amino acid change could influence the expression of this transcript (25); however, in our analysis, rs1422795 did not reach statistical significance after Bonferroni correction (p = 0.0103), and it is located 185 bp from rs1422794, our associated SNP. Further, rs1422794 is not in high linkage disequilibrium with rs1422795 (r² = 0.39), suggesting an independent signal.
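The r² statistic used above to judge whether two SNPs carry independent signals can be computed from haplotype and allele frequencies, as in the following sketch. The frequencies are hypothetical; in practice the haplotype frequency has to be estimated from phased data or by an EM procedure, which is not shown here.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Linkage disequilibrium r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B, respectively.
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator


# Hypothetical frequencies: complete LD and no LD.
r2_complete = ld_r_squared(0.5, 0.5, 0.5)
r2_none = ld_r_squared(0.25, 0.5, 0.5)
```

Values close to 1 mean that one SNP is an almost perfect proxy for the other, whereas intermediate values such as the 0.39 reported here leave room for partly independent effects.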
Finally, for SFTPD, we identified a polymorphism (rs2819096, allele A) associated (p = 7.79E-03) with a higher risk (OR = 1.80, CI 95% 1.16-2.79) of developing the more severe disease stages. In animal models, it has been demonstrated that tobacco exposure increases the concentrations of surfactant protein D (SP-D) in serum (31). However, in a 2010 study, Foreman did not find any statistically significant association with such polymorphisms and the concentration of this protein in serum (32). Previous studies have proposed that SP-D level could be a biomarker associated with the risk of suffering COPD, mortality, exacerbations, or decreased lung function. Although the results are inconsistent, it is well known that this protein is crucial for maintaining lung homeostasis in addition to being the first line of defense against pathogens in mucosal surfaces (33).
Regarding the haplotype analysis, the haplotype identified in IL6R formed by rs6684439-rs7549250-rs4129267-rs10752641-rs4072391 (G-A-G-G-G) is associated with a lower risk (protection) of suffering from the more severe disease stages; it is located near the 3' UTR regulatory element of the same gene. The haplotype detected in ADAM19, formed by rs6860507-rs6879450-rs10404-rs3822585-rs10475585-rs1990950 (G-G-G-G-A-C), shows a higher linkage disequilibrium between the polymorphisms forming it compared with the haplotype obtained in IL6R; the ADAM19 haplotype is associated with a higher risk (OR = 2.83, CI 95% 1.20-6.86) of presenting the most severe stages of COPD. This haplotype is located in the 3' UTR of the gene, where gene regulation occurs through different transcription factors. The associations reported for ADAM19 point in opposite directions (the rs1422794 allele with a lower risk and the haplotype with a higher risk), but both could reflect different pathophysiological mechanisms. Functional studies should be designed to test these hypotheses and clarify the contribution of our findings to the severity of COPD.
These two associated haplotypes may be involved in the regulation of the expression of these genes, resulting in impacts on the susceptibility of certain patients with COPD for developing different disease stages, but some questions remain regarding the underlying genetic signal because of substantial linkage disequilibrium in the gene region. This possibility should be confirmed in future studies.