Synergistic polymorphic interactions of phase II metabolizing genes and their association toward lung cancer susceptibility in North Indians

ABSTRACT Lung cancer is a multifactorial carcinoma with marked heterogeneity. Genetic variations in drug-metabolizing enzymes may lead to defective detoxification and clearance of carcinogenic compounds. High-order gene-gene interactions were analyzed between different genotypes of phase II detoxification genes (NQO1, SULT1A1, NAT2, and EPHX1). Our results show that the genetic combinations of SULT1A1 R213H with NAT2*5B L161L, SULT1A1 R213H with NAT2*5C K268R, and EPHX1 H139R with NAT2*5B L161L exhibit a protective effect against lung cancer risk. Further, the triple combinations of NQO1 P187S, EPHX1 Y113H, and EPHX1 H139R; NQO1 P187S, EPHX1 Y113H, and NAT2*6 R197Q; NQO1 P187S, EPHX1 Y113H, and NAT2*7 G286E; and SULT1A1 R213H, EPHX1 H139R, and NAT2*7 G286E were associated with a two-fold increased risk of lung cancer. Genetic polymorphisms of phase II detoxifying genes (NAT2, NQO1, EPHX1, SULT1A1) are prognostic markers for lung cancer.


Introduction
Lung cancer is a multifactorial and complex carcinoma with marked heterogeneity, which complicates the pathological understanding of the disease. Several genetic and molecular changes underlie the establishment of primary lung cancer and its spread/metastasis (Coroller et al., 2016). Exposure to aromatic amines and polycyclic aromatic hydrocarbons, mainly from smoking or occupational sources, is a major cause of carcinogenesis. Besides smoking, several other risk factors, such as environmental pollution, alkylating agents, oxidizing agents, and genetic variability, determine an individual's susceptibility to lung tumours (de Groot and Munden, 2012). Physiologically, the human body has a detoxification system that metabolizes such xenobiotics, including the mutagenic and carcinogenic agents that enter the body from various sources.
Individual susceptibility to lung cancer may partly reflect an inherited predisposition arising from polymorphisms in the genes encoding the enzymes responsible for xenobiotic metabolism in lung tissue (Clemens, 1991). The drug-metabolizing enzymes NAT2, NQO1, EPHX1, and SULT1A1 play a crucial role in detoxification, but genetic variation can alter their activity or biosynthesis, leading to defective detoxification and clearance of carcinogenic compounds (Justenhoven, 2012). The current study therefore investigates genotypic combinations of these phase II detoxification genes and their association with lung cancer.
NAT2 is a crucial phase II detoxifying enzyme, and its polymorphisms lead to impaired enzyme activity. An important polymorphism of NQO1, located in exon 6 at position 609, involves a C to T substitution (Tian et al., 2014). The resulting variant enzyme has altered function: it is unstable and is rapidly ubiquitinated and degraded by the proteasome (Siegel et al., 2012). In the coding region of EPHX1, two polymorphisms have been studied extensively: Tyr113His in exon 3, a T to C change, and His139Arg in exon 4, an A to G substitution. Both polymorphisms modulate enzyme activity: the Tyr113His variant allele has been correlated with at least 50% lower enzyme activity, whereas the His139Arg variant is associated with 25% higher enzyme activity (Yu et al., 2015). The role of SULT1A1 in detoxification is also well established. Its best-studied polymorphism, Arg213His, an arginine to histidine substitution in exon 7, modulates the activity of the enzyme.
The protein derived from the 213His variant allele has reduced catalytic activity and lower thermal stability than the wild-type protein (Walia et al., 2021). Our study analyzes polymorphic variants of the phase II detoxification enzymes NAT2, NQO1, EPHX1, and SULT1A1 and their possible role in modulating lung cancer risk. The roles of other xenobiotic-metabolizing enzymes, such as CYP1A1, GSTM1, and GSTT1, in lung cancer have been published in our previous reports (Girdhar et al., 2016a; Kaur Walia et al., 2019; Sharma et al., 2015; Girdhar et al., 2016b). Researchers have previously studied high-order gene interactions and the association of various polymorphisms with susceptibility to cancers such as lung and bladder cancer. In this study, however, we used advanced statistical and analytical methods (MDR and CART) to reduce uncertainty in detecting such interactions. To increase the study's robustness, we evaluated high-order gene interactions among polymorphisms of the phase II detoxifying genes NAT2, NQO1, EPHX1, and SULT1A1, which play a crucial role in xenobiotic metabolism and are prognostic markers for lung cancer.

Study population and follow-up
The current study was a hospital-based case-control study of 550 cases and 550 healthy controls. Subjects were registered in the Department of Pulmonary Medicine of the Post Graduate Institute of Medical Education and Research (PGIMER), Chandigarh. The study was approved by the Ethical Committee Board of PGIMER and Thapar Institute of Engineering and Technology (TIET), Patiala (IEC-04/2018-884). Healthy controls had no prior history of cancer. All participants gave written informed consent. An interviewer completed a detailed questionnaire for cases and controls covering smoking history, demographic information, number of chemotherapy cycles, treatment regimen, and TNM stage; staging was obtained from the medical records of hospitalized patients. Each case was followed up by telephone every two months until death or the end of the study. Survival time was calculated from the date of lung cancer diagnosis to the date of last follow-up or death from any cause. There were no specific comorbidities among lung cancer patients. Pack-years were calculated as: pack-years = (number of cigarettes or bidis smoked per day × number of years smoked) / 20. Patients meeting all of the following requirements were eligible for enrollment: (i) diagnosis of lung cancer (NSCLC or SCLC) confirmed by histology or cytology; (ii) stage III or IV disease; (iii) no restrictions on age, gender, smoking, histology, or staging; (iv) untreated and intended for definitive chemotherapy (platinum agents cisplatin/carboplatin, as first or second line); (v) an Eastern Cooperative Oncology Group (ECOG) performance status (PS) of 0-2; (vi) at least one bi-dimensionally measurable lesion according to the RECIST criteria; (vii) adequate organ function, defined as absolute neutrophil count >1500/μL, platelet count >100,000/μL, and creatinine, liver enzyme, and alanine aminotransferase (ALT) levels less than two times the upper limit of normal (ULN); and (viii) written informed consent.
The exclusion criteria were: (i) prior history of any other carcinoma; (ii) active infection or immunosuppression (HIV); (iii) receipt of systemic steroids; (iv) chronic diarrhea due to any cause; and (v) not undergoing chemotherapy.
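As a quick check of the pack-year formula given above, the sketch below applies it directly. It is a minimal illustration rather than the authors' code: the function name is hypothetical, and, like the formula in the text, it counts one bidi the same as one cigarette and assumes 20 units per pack.

```python
def pack_years(units_per_day: float, years_smoked: float) -> float:
    """Pack-years = (cigarettes or bidis per day x years smoked) / 20.

    Hypothetical helper: like the formula in the text, it counts a bidi
    the same as a cigarette and assumes 20 units per pack.
    """
    if units_per_day < 0 or years_smoked < 0:
        raise ValueError("daily consumption and duration must be non-negative")
    return units_per_day * years_smoked / 20


# Example: 10 bidis per day for 30 years
print(pack_years(10, 30))  # 15.0
```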

DNA extractions and genotyping
DNA was isolated from 3-4 ml of blood using the phenol-chloroform extraction method with certain modifications, as described by Bahl et al. (2017). Each SNP was genotyped by the polymerase chain reaction (PCR)-restriction fragment length polymorphism (RFLP) technique. Genotyping of the NQO1 variant (Pro187Ser, 609C>T, rs1800566) was performed as previously described by Mandal et al. (2012). The SULT1A1 Arg213His (638G>A, rs9282861) polymorphic site was genotyped as reported by Arslan (2010). For the EPHX1 variants Tyr113His (337T>C, rs1051740) and His139Arg (415A>G, rs2234922), the protocol described by Ghattas and Amer (2012) was followed with slight modifications. The NAT2 variants Leu161Leu (481C>T, rs1799929), Arg197Gln (590G>A, rs1799930), Lys268Arg (803A>G, rs1208), and Gly286Glu (857G>A, rs1799931) were genotyped as previously described by Lotfi et al. (2018), with slight modifications.
Each fragment was amplified in a 25 µl PCR mixture containing 1X PCR buffer, 1.5 mM MgCl2, 0.5 µM each of forward and reverse primer, 200 µM dNTPs, 100 µg/ml bovine serum albumin (BSA), 1 U Taq polymerase (DNAzyme, Thermo Scientific), and 200 ng DNA. For EPHX1, the PCR products of 162 bp (exon 3 variant) and 357 bp (exon 4 variant) were checked on a 2.0% agarose gel and then digested at 37 °C with 5 U of EcoRV and RsaI (New England Biolabs, Ipswich, MA), respectively. The digested samples were run on 8% native PAGE, stained with ethidium bromide, and visualized under a UV transilluminator. For all variants, the amplified products were digested with their respective restriction enzymes and resolved on either agarose or polyacrylamide gels, and each sample's genotype was determined by scoring the restriction patterns. Genotyping of 20% of the samples was repeated twice to check reproducibility, and concordance was 100%.

Statistical analysis
The analysis was restricted to North Indian subjects with complete information on gender, age, and smoking status. The chi-square goodness-of-fit test was used to determine whether genotype frequencies in cases and controls were in Hardy-Weinberg equilibrium ($p^2 + 2pq + q^2 = 1$, where p and q are the frequencies of the wild-type and variant alleles, respectively). Lung cancer risk was assessed using MedCalc Statistical Software version 14.8.1 (MedCalc Software bvba, Ostend, Belgium). Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated by unconditional multivariate logistic regression, adjusting for age, gender, and smoking status. Non-parametric approaches were used to analyze the gene-gene interactions contributing to lung cancer (LC) predisposition. Multiple-comparison correction was performed by the false discovery rate (FDR) method, using online software based on the Benjamini and Hochberg approach (http://sdmproject.com/utilities/?show=FDR). Adjusted p-values are reported, and an FDR-adjusted p-value <0.01 was considered significant.
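The authors used MedCalc and an online FDR utility, so the sketch below is not their implementation; it is a hypothetical re-implementation of the two standard procedures named in the paragraph, the 1-df Hardy-Weinberg chi-square goodness-of-fit test and the Benjamini-Hochberg adjustment, to make the calculations concrete. The function names are illustrative, the genotype counts and p-values in the usage lines are not study data, and the p-value uses the identity that the survival function of a 1-df chi-square equals erfc(sqrt(x/2)), which avoids a SciPy dependency.

```python
import math


def hwe_chi_square(n_wild_hom: int, n_het: int, n_var_hom: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    p and q are the wild-type and variant allele frequencies, so the expected
    genotype counts are N*p^2, N*2pq and N*q^2.
    """
    n = n_wild_hom + n_het + n_var_hom
    if n == 0:
        raise ValueError("no genotyped subjects")
    p = (2 * n_wild_hom + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0 or q == 0:
        raise ValueError("test is undefined for a monomorphic SNP")
    observed = (n_wild_hom, n_het, n_var_hom)
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0): counts exactly in HWE
print([round(x, 4) for x in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])  # [0.02, 0.04, 0.04, 0.02]
```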
The multifactor dimensionality reduction (MDR) method reduces data dimensionality to detect multi-locus genotypic combinations that help predict the risk of a complex disease such as LC. MDR condenses multidimensional data into a single dimension by grouping genotypes into high-risk and low-risk groups. The interaction models are then evaluated by cross-validation consistency (CVC), the number of times a model is identified as the best across the cross-validation sets; a higher CVC indicates stronger support for the model. The average prediction error (1 - testing accuracy) is also computed, and permutation testing gives the p-value that determines the significance of each model (Ritchie & Motsinger, 2005). To control for confounding in this interaction analysis, we used an approach stratified by histology and smoking status, as stratification is a practical way to overcome this major drawback of the MDR approach. Classification and regression tree (CART) analysis, performed with CART software (version 6.0, Salford Systems, CA, USA), was then used to identify complex high-order interactions. CART is a recursive binary partitioning approach that divides the data by risk and builds a decision tree depicting all high- and low-risk subgroups. The first split in the tree is formed by the factor contributing most to disease susceptibility, and subsequent splits are made according to significance levels that control tree growth. The tree consists of internal nodes and terminal nodes. Splitting is repeated until the terminal nodes yield no further statistically significant splits or contain a minimal number of subjects. Because many variables are considered at once, this identifies high-risk genotypic combinations affecting LC susceptibility that traditional logistic regression does not reveal. The resulting decision tree depicts the various factors, their interactions, and the risk associated with each combination. The factors at the tree's initial splits are considered biologically significant in modulating LC predisposition. The terminal node with the lowest case rate is used as the reference for calculating the OR and 95% CI of every other node. CART uses the simple-to-calculate Gini index, the impurity measure used in constructing CART decision trees. Gini impurity measures how often a randomly selected element of a set would be mislabeled if it were labelled randomly according to the distribution of labels in the subset (Srivastava et al., 2012).
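The Gini criterion described above can be sketched as one greedy split step of a CART-style tree over genotype variables. This is a simplified illustration under stated assumptions, not the CART 6.0 software: splits are hypothetical equality tests ("genotype X vs. all others"), the SNP names and subjects are made up, and stopping rules, pruning, and the significance-based growth control mentioned in the text are not shown.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: probability that a randomly chosen subject is mislabeled
    when labels are assigned at random from the node's label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, snp, genotype):
    """Size-weighted Gini impurity of the binary split 'snp == genotype' vs. the rest."""
    left = [y for row, y in zip(rows, labels) if row[snp] == genotype]
    right = [y for row, y in zip(rows, labels) if row[snp] != genotype]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Return (impurity, snp, genotype) for the candidate split with the lowest impurity."""
    best = None
    for snp in rows[0]:
        for genotype in sorted({row[snp] for row in rows}):
            score = split_impurity(rows, labels, snp, genotype)
            if best is None or score < best[0]:
                best = (score, snp, genotype)
    return best


# Hypothetical subjects: SNP_A separates cases from controls, SNP_B does not
rows = [
    {"SNP_A": "CT", "SNP_B": "GG"},
    {"SNP_A": "CT", "SNP_B": "GA"},
    {"SNP_A": "CC", "SNP_B": "GG"},
    {"SNP_A": "CC", "SNP_B": "GA"},
]
labels = ["case", "case", "control", "control"]
print(gini(labels))              # 0.5
print(best_split(rows, labels))  # (0.0, 'SNP_A', 'CC')
```

Applying `best_split` recursively to each child node would grow the full tree; the root split chosen this way corresponds to the "most significant factor" that forms the first split.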

Demographic characteristics
The current study was a hospital-based case-control study that involved 550 cases and 550 healthy controls.The subjects in this study were evaluated based on demographic characteristics such as age, gender, smoking status, pack years, and histological subtypes, as summarized in Supplementary Table S1.

Genotypic combination of Phase II detoxification genes and their association with Lung cancer
Table 1 summarizes the minor allele frequency (MAF) in cases and controls and the risk associated with the three genotypes of each SNP of the phase II detoxifying genes studied. As shown in Table 1, the MAF of the phase II detoxification genes was higher in controls than in cases, except for NAT2*7 G286E and NQO1 P187S, for which the MAF was higher in cases than in controls (0.35 vs 0.32 and 0.31 vs 0.28, respectively). Individuals homozygous for the NAT2*5B L161L variant showed a protective effect against lung cancer (AOR = 0.50, 95% CI = 0.34-0.73, p = 0.0003, FDR p = 0.002). Furthermore, it was also evident that subjects who were heterozygous …
Table 1 notes: (a) Adjusted odds ratios, 95% confidence intervals, and corresponding p-values were calculated by logistic regression analysis after adjusting for age, gender, and smoking status; highlighted p-values mark significant values. (b) Two-sided χ² test of either genotype distribution or allele frequencies between cases and controls.
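The MAF values in Table 1 follow from genotype counts. The sketch below shows the calculation for a biallelic SNP, assuming the minor allele is simply the rarer allele in the sample being summarized; the function name and the counts are illustrative, not the study's data.

```python
def minor_allele_frequency(n_wild_hom: int, n_het: int, n_var_hom: int) -> float:
    """MAF of a biallelic SNP from genotype counts.

    Each homozygous variant subject carries two variant alleles and each
    heterozygote one, out of 2N alleles in total.
    """
    n = n_wild_hom + n_het + n_var_hom
    if n == 0:
        raise ValueError("no genotyped subjects")
    variant_freq = (2 * n_var_hom + n_het) / (2 * n)
    # The minor allele is whichever allele is rarer in this sample.
    return min(variant_freq, 1.0 - variant_freq)


# Illustrative counts (not study data): 100 wild-type, 80 heterozygous, 20 variant
print(minor_allele_frequency(100, 80, 20))  # 0.3
```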
The dual genotype combinations and their association with lung cancer risk are shown in Supplementary Table S2. For the combination of SULT1A1 R213H and NAT2*5B L161L, the wild-type genotype (GG+CC) was more frequent in cases than in controls (68.56% vs 55.91%), whereas subjects heterozygous at both loci (GA+CT) were more frequent in controls than in cases (41.36% vs 31.44%). Likewise, heterozygous and mutant subjects considered as a single group (GA+AA+CT+TT) were more frequent in controls than in cases (44.09% vs 31.44%). Logistic regression analysis revealed a protective effect against lung cancer for double heterozygotes (GA+CT) compared with the reference subjects (AOR = 0.59; 95% CI = 0.39-0.87; p = 0.009; FDR p = 0.01). Similarly, the combined heterozygous and mutant group showed a protective effect (AOR = 0.55; 95% CI = 0.37-0.82; p = 0.003; FDR p = 0.03) (Supplementary Table S2).
Subjects carrying a single copy of the variant allele for both EPHX1 Y113H and NAT2*7 G286E (heterozygous genotype, TC+GA) were more frequent among cases than controls (61.67% vs 45.58%), whereas subjects with the wild-type genotype (TT+GG) were more frequent among controls (51.16% vs 33.33%). Logistic regression analysis revealed a 2-fold risk of lung cancer for these double heterozygotes (TC+GA) compared with the reference subjects (AOR = 2.09; 95% CI = 1.38-3.17; p = 0.0005; FDR p = 0.001). Combining heterozygous and mutant genotypes into a single group (TC+CC+GA+AA) gave a similar 2-fold risk (AOR = 2.10; 95% CI = 1.39-3.17; p = 0.0004; FDR p = 0.001).
Supplementary Table S2 also shows that subjects carrying a single copy of the variant allele (GA+AG) for the combination of SULT1A1 R213H and NAT2*5C K268R were less frequent among cases than controls (24.27% vs 31.17%), suggesting a possible protective effect or no predisposition to lung cancer (AOR = 0.66; 95% CI = 0.44-0.99; p = 0.004; FDR p = 0.21). Likewise, heterozygous and mutant genotypes combined as a single group (GA+AA+AG+GG) were more frequent among controls than cases (34.00% vs 25.10%), and the corresponding odds ratio was below 1 (AOR = 0.63; 95% CI = 0.43-0.94; p = 0.02; FDR p = 0.16).

Distribution of polymorphic phase II detoxification genes and their association with histological subtypes
The combinations of phase II genes were then stratified by histological subtype to identify high-risk subgroups, as shown in Supplementary Table S3. Among subjects diagnosed with adenocarcinoma (ADCC), the distribution of heterozygotes for the NQO1 P187S and NAT2*7 G286E combination differed significantly between cases and controls (57.14% vs 43.64%). Logistic regression analysis revealed an approximately 2-fold increased risk of lung cancer for subjects heterozygous at both loci (CT+GA) (AOR = 1.92; 95% CI = 1.14-3.24; p = 0.01; FDR p = 0.02). For the SULT1A1 R213H and NAT2*5B L161L combination, the frequency of double heterozygotes was almost identical in the small cell lung cancer (SCLC; 29.69%) and squamous cell carcinoma (SQCC; 29.63%) subtypes and lower than in controls (41.36%). In the SCLC subtype, both the heterozygous (GA+CT) (AOR = 0.49; 95% CI = 0.26-0.92; p = 0.025; FDR p = 0.17) and combined (GA+AA+CT+TT) genotypes (AOR = 0.46; 95% CI = 0.25-0.87; p = 0.017; FDR p = 0.13) showed a protective effect. For the same combination, patients carrying a single copy of the variant allele for both polymorphisms (heterozygous; GA+CT) had a lower risk of SQCC (AOR = 0.45; 95% CI = 0.25-0.79; p = 0.006; FDR p = 0.03), suggesting a protective effect against lung cancer initiation. Similarly, for SULT1A1 R213H and NAT2*5C K268R in the SQCC subtype, both the heterozygous (GA+AG) (AOR = 0.44; 95% CI = 0.24-0.82; p = 0.009; FDR p = 0.04) and combined (GA+AA+AG+GG) genotypes (AOR = 0.42; 95% CI = 0.23-0.76; p = 0.005; FDR p = 0.03) showed a protective effect.

Genotypic distribution of triple combinations and combinations of four SNPs between Phase II detoxification genes (NQO1, SULT1A1, EPHX1, and NAT2) and their association with lung cancer predisposition
The present study also evaluated triple and four-SNP combinations of the eight SNPs studied, as shown in Table 2 and Supplementary Table S4, respectively. Heterozygous and mutant genotypes were evaluated as a single combined group because few or no subjects carried the mutant genotype. This combined genotype was compared with subjects carrying the wild-type genotype (reference) for all combinations studied.

GMDR analysis and risk of lung cancer
This study employed generalized multifactor dimensionality reduction (GMDR), a data mining technique, to identify potential synergistic effects among the SNPs. This approach helps to strengthen and validate the results to some extent, and it partly compensates statistically for the small sample size, since GMDR imposes no dimensional limit on the interaction analysis. GMDR was used to analyze gene-gene interactions in lung cancer and to identify the models that best predict high-risk subgroups. A total of eight SNPs were included. Among the candidate multifactor models, a higher CVC and a lower prediction error indicate a better model. As shown in Figure 1, an entropy dendrogram was used to visualize the interactions between the genes and lung cancer risk. The length of the line linking two risk factors reflects the strength of their interaction: the longer the distance, the weaker the interaction. The color of the line connecting two polymorphisms depicts the type of interaction: red and orange lines denote synergy (a non-additive relationship) between the two SNPs, yellow denotes independence or additivity, brown denotes weak interaction, and green and blue denote loss of information, which can be interpreted as redundancy or correlation (for example, linkage disequilibrium). Table 4 displays the CVC, prediction error, and p-value calculated by the GMDR software for each model evaluated (N = 1 to 8 factors). In the one-factor model, it was deduced that NAT2*5B L161 … Figure 2 shows the epistatic interaction among NQO1 P187S, EPHX1 Y113H, and NAT2*6 R197Q in lung cancer cases and healthy subjects. In this three-factor model, the light grey bar in each cell shows the frequency of lung cancer cases and the white bar the frequency of healthy individuals. High-risk genotype combinations are shown as dark grey cells, whereas light grey cells represent low-risk genotype combinations. Unshaded (white) cells represent genotype combinations for which no data were observed.
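The CVC and prediction-error summaries used above can be sketched from their definitions in the text: CVC is the number of cross-validation folds in which a model is selected as best, and prediction error is 1 minus the average testing accuracy. This is a hypothetical helper, not the GMDR software's procedure, which may score models differently (for example with balanced accuracy or covariate-adjusted scores); the SNP tuples and accuracies are illustrative only.

```python
from collections import Counter
from statistics import mean


def summarize_cross_validation(best_model_per_fold, testing_accuracy_per_fold):
    """Summarize an MDR-style k-fold cross-validation run.

    best_model_per_fold: the model (tuple of SNP names) selected in each fold.
    testing_accuracy_per_fold: held-out accuracy of the final model in each fold.
    Returns (final model, CVC, average prediction error).
    """
    # The final model is the one selected most often; that count is the CVC.
    model, cvc = Counter(best_model_per_fold).most_common(1)[0]
    prediction_error = 1.0 - mean(testing_accuracy_per_fold)
    return model, cvc, prediction_error


# Illustrative 10-fold run (not study data)
three_locus = ("NQO1 P187S", "EPHX1 Y113H", "NAT2*6 R197Q")
other = ("NQO1 P187S", "NAT2*5B L161L")
folds = [three_locus] * 9 + [other]
model, cvc, error = summarize_cross_validation(folds, [0.56] * 10)
print(model, f"CVC = {cvc}/10", f"prediction error = {error:.2f}")
```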
Figure 2 shows the distribution of high-risk and low-risk genotypes in the best three-locus model detected by MDR analysis, with high-risk genotypes shown by dark shading and low-risk genotypes by light shading. The percentage of lung cancer subjects (left black bar in each box) and control subjects (right hatched bar) is shown for each three-locus genotype combination. A box was labelled high-risk if the ratio of cases to controls met or exceeded the threshold of 1.0, and low-risk otherwise. The pattern of high-risk and low-risk genotypes in this three-locus model is evidence of gene-gene interaction.
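The high-risk/low-risk labelling rule above can be sketched as follows. The function name and counts are hypothetical. In MDR generally the threshold is the overall case:control ratio; because this study is balanced (550 cases and 550 controls), comparing raw counts with a threshold of 1.0 gives the same labels as comparing the percentages shown in the figure.

```python
def label_mdr_cells(case_counts, control_counts, threshold=1.0):
    """Label each multi-locus genotype cell as high- or low-risk.

    A cell is high-risk when cases/controls >= threshold; cells with no
    subjects in either group are left unlabelled ("empty").
    """
    labels = {}
    for cell in set(case_counts) | set(control_counts):
        cases = case_counts.get(cell, 0)
        controls = control_counts.get(cell, 0)
        if cases == 0 and controls == 0:
            labels[cell] = "empty"
        elif controls == 0 or cases / controls >= threshold:
            labels[cell] = "high"
        else:
            labels[cell] = "low"
    return labels


# Illustrative counts (not study data); keys are genotypes at three loci
cases = {("CT", "TC", "GA"): 30, ("CC", "TT", "GG"): 20, ("TT", "CC", "AA"): 0}
controls = {("CT", "TC", "GA"): 15, ("CC", "TT", "GG"): 40, ("TT", "CC", "AA"): 0}
for cell, label in sorted(label_mdr_cells(cases, controls).items()):
    print(cell, label)
```

The resulting high/low labels are what MDR uses to collapse the multi-locus genotypes into a single two-level variable.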

GMDR and association with histological subtypes of lung cancer
The study subjects were further stratified by histology to identify the best interaction model for lung cancer risk. As shown in Supplementary Table S5, NAT2*5B L161L was the best single-factor model for SCLC patients, with a CVC of 10 and a minimum prediction error of 0.43 (p = 0.0021).
In SCLC, the dendrogram shows maximum synergy between NQO1 P187S and EPHX1 Y113H (X1 & X3), which are connected by the shortest line. The blue line connecting SULT1A1 R213H and NAT2*5B L161L lies on the same (orange) branch, which bifurcates into blue, indicating a redundant effect or association with the disease phenotype. As shown in Supplementary Table S6, NAT2*7 G286E was the best single-factor model among ADCC subjects, with a CVC of 10 and a minimum prediction error of 0.46 (p = 0.005). The six-factor model (NQO1 …), with a CVC of 10/10 and prediction errors of 0.4502, 0.4611, and 0.4386, was the best interaction model for SCLC (Supplementary Table S5), ADCC (Supplementary Table S6), and SQCC (Supplementary Table S7), respectively (p <0.0001), suggesting that the risk for all three histological subtypes arises from the joint action of the six SNPs located in the three genes. The dendrogram for ADCC subjects reveals synergy between EPHX1 Y113H and NAT2*5B L161L (X4 & X7), which are joined by the shortest line. SULT1A1 R213H, NQO1 P187S, and NAT2*5C K268R lie on the same line, indicating a significant interaction.

GMDR analysis and role of gene-environment interaction in lung cancer risk
GMDR analysis was also used to evaluate interactions between phase II detoxifying genes and smoking, with the best interaction models for predicting lung cancer risk evaluated separately in smokers and non-smokers. The findings are shown in Table 3 and Supplementary Table S8. As shown in Table 3, NAT2*5B L161L was the best model, with a CVC of 10 and a prediction error of 0.43 (p <0.0001), suggesting a possible gene-environment interaction between NAT2*5B L161L and smoking. The second-best interaction model among smokers was a five-factor model that included … 161L, with a CVC of 10 and a prediction error of 0.45 (p <0.0001) (Supplementary Table S8). NAT2*7 G286E, on the other hand, was the best single-factor model for non-smokers, with a maximum CVC of 10 and a prediction error of 0.41. However, the two-factor model of NAT2*7 G286E and NAT2*5C K268R was the best interaction model, with a maximum CVC of 10 and a minimum prediction error of 0.40 (p = 0.0001). In smokers, the dendrogram shows strong interaction and synergy between SULT1A1 R213H and EPHX1 H139R (X1 & X4), which are joined by a red line, whereas the blue line linking NAT2*7 G286E and NAT2*5B L161L (X6 & X7) indicates redundancy or no interaction (Figure 3).

CART analysis of Phase II detoxification genes
CART is a tree-building technique that uses binary recursive partitioning to identify genotypic groups influencing lung cancer risk that logistic regression analysis cannot detect. It considers a large number of variables and identifies high-risk categories. The resulting decision tree displays the factors, their interactions, and the risk associated with each combination. The root split is the most critical element, as it is formed by the most important factor.
CART analysis was performed on the phase II detoxification gene polymorphisms. As shown in Table 5, NAT2 481C>T formed the initial split of the tree, making this SNP the most important interaction factor. Table 5 displays the odds ratios and p-values of the interaction subgroups. Individuals with the genotypic combination of node 33 had the highest risk of lung cancer (OR = 24.35, 95% CI = 2.94-201.26, p = 0.003). Terminal node 26 also carried a significantly high risk of lung cancer (OR = 14.61, 95% CI = 1.66-128.19, p = 0.01). Subjects with the genotypic combination of node 10 had 12-fold increased odds of lung cancer (OR = 12.17, 95% CI = 1.35-110.00, p = 0.03). Subjects with the genotypic combinations of nodes 5, 22, 38, and 64 had seven-fold increased odds (Table 5). Nodes 1, 2, 16, and 31 showed a five-fold risk of lung cancer, while nodes 8, 28, 42, and 51 and nodes 6, 20, 36, 43, 46, and 67 showed four-fold and three-fold odds, respectively (Table 5).
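The text states that node-level ORs use the terminal node with the lowest case rate as the reference, but not how the confidence intervals were computed. The sketch below assumes the common, unadjusted Woolf (logit) method on the 2×2 table formed by one node and the reference node; the function name and counts are hypothetical. A small count in any cell inflates the standard error of log(OR), which is consistent with the very wide intervals in Table 5 (e.g., 2.94-201.26), although the actual node counts are not reported here.

```python
import math


def odds_ratio_vs_reference(a, b, c, d, z=1.96):
    """Unadjusted OR with a Woolf (logit) 95% CI for one node vs. the reference node.

    a, b: cases and controls in the node of interest
    c, d: cases and controls in the reference node (lowest case rate)
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts (not study data): a small node vs. a large reference node
or_, lo, hi = odds_ratio_vs_reference(a=12, b=2, c=40, d=160)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```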
CART interaction analysis was also performed to explore risk within histological subtypes. Supplementary Table S9 shows the data for all nodes for SCLC, and Supplementary Table S10 for ADCC.
In the SCLC subgroup, the genotypic combinations of nodes 18, 39, and 45 had the highest risk of lung cancer, with an OR of 78 (p = 0.006), as shown in Supplementary Table S9. For the ADCC subtype, the genotypic combinations of the nodes are shown in Supplementary Table S10; the initial split was formed by NAT2 857G>A. Subjects with the genotypic combination of node 1, carrying NAT481C>T(W)/NAT857 G>A(M), had two-fold odds of lung cancer (OR = 2.12, 95% CI = 1.30-3.45, p = 0.002). A CART analysis incorporating smoking status was also performed to explore gene-environment interactions, and inspection of the resulting tree suggested specific risk genotypic combinations for smokers (Table 6) and non-smokers (Supplementary Table S11). Terminal node 1 showed the highest risk of lung cancer in smokers (OR = 28.77, 95% CI = 3.37-245.64, p = 0.002). Node 27 carried a significantly high risk (OR = 20.92, 95% CI = 4.21-103.93, p = 0.0002). The genotypic combination of node 2 also conferred a higher risk of lung cancer in smokers (OR = 15.69, 95% CI = 1.72-143.24, p = 0.015).

Discussion
Phase I and phase II enzymes are of immense importance in cancer because they metabolize steroid hormones, chemical carcinogens, and other environmental toxicants. In phase I, substrates are frequently reduced, oxidized, or hydroxylated to give more polar metabolites; cytochrome P450 (CYP) enzymes are the primary mediators of this phase (Guengerich, 1999). Phase II conjugation typically follows phase I metabolism: exogenous or endogenous chemicals, as well as their phase I metabolites, are conjugated to polar molecules, yielding inactive, water-soluble products that are readily eliminated in urine or bile (Yang et al., 1994; Turesky, 2004). Sulfotransferases (SULTs), N-acetyltransferases (NATs), uridine diphosphate-glucuronosyltransferases (UGTs), glutathione S-transferases (GSTs), and methyltransferases are examples of conjugating enzymes. Although combined phase I and phase II metabolism is primarily an elimination and detoxification process, both phases carry the risk of producing toxic, highly reactive intermediates that can cause or promote serious health problems such as cancer (Windmill et al., 1997). Consequently, changes in metabolic enzyme activity can increase exposure to carcinogenic chemicals and the risk of tumour formation (Brockstedt et al., 2002; Justenhoven, 2012). In the current study, we systematically assessed the association of polymorphisms in phase II detoxification enzymes, and their combined impact, with lung cancer susceptibility in patients administered platinum-based chemotherapy. To the best of our knowledge, this is the first study to investigate combinations of drug-metabolizing enzyme polymorphisms in relation to lung cancer risk.
High-order gene-gene interactions were analyzed between the different genotypes of the phase II detoxification genes (NQO1, SULT1A1, NAT2, and EPHX1). Our results show that the combinations of SULT1A1 R213H with NAT2*5B L161L, SULT1A1 R213H with NAT2*5C K268R, and EPHX1 H139R with NAT2*5B L161L exhibit a protective effect against lung cancer risk. A 2-fold risk of lung cancer was observed for the EPHX1 Y113H and NAT2*7 G286E combination, and the combination of EPHX1 H139R and NAT2*7 G286E was also associated with an increased risk of the disease. After stratification by histological subtype, the NQO1 P187S and NAT2*7 G286E combination showed an almost two-fold risk of ADCC, whereas the SULT1A1 R213H and NAT2*5B L161L combination showed a reduced risk of both SQCC and SCLC. Similarly, SULT1A1 R213H and NAT2*5C K268R together showed a protective effect in the SQCC subtype. The genotypic variants of SULT1A1 R213H and NAT2*7 G286E revealed an increased risk of the ADCC subtype. Combining EPHX1 Y113H and NAT2*6 R197Q gave a 2-fold risk of the SQCC subtype of lung cancer. EPHX1 Y113H in combination with NAT2*7 G286E was associated with a significant ADCC risk and a two-fold risk of SQCC. Likewise, EPHX1 H139R and NAT2*7 G286E together revealed a high risk in the ADCC subtype.
In our study, several combinations of SNPs were associated with lung cancer susceptibility. In the overall CART analysis, the primary split was on the NAT2 481C>T polymorphism. MDR analysis likewise identified these genes as significant determinants of lung cancer risk: the five-factor model comprising NQO1 P187S, EPHX1 Y113H, NAT2*6 R197Q, NAT2*5B L161L, and NAT2*5C K268R was the best model, with a cross-validation consistency (CVC) of 10/10, a prediction error of 0.4345, and p < 0.0001. Skjelbred et al. (2011) found that polymorphisms of GSTT1, EPHX1, MTHFR, MTR, and NAT2 affect the frequency of chromosome-type aberrations (CSAs), chromatid-type aberrations (CTAs), and chromatid gaps (CTGs), and that these effects interact differentially with smoking and age; however, the polymorphisms were not evaluated in combination. Likewise, Timofeeva et al. (2010) evaluated the association of MPO, GSTT1, GSTM1, GSTP1, EPHX1, and NQO1 polymorphisms with lung cancer risk only as individual SNPs. In a study of Phase I and Phase II metabolic enzyme polymorphisms, Mota et al. (2015) found GSTP1 and NAT2 polymorphisms to be significant according to histology and metastasis in male smokers. Zienolddiny et al. (2008) analyzed Phase I and Phase II polymorphisms in NSCLC smokers, again as individual SNPs rather than in combination. Sørensen et al. (2005) studied the association between polymorphisms in CYP1B1, GSTA1, NQO1, and NAT2 and lung cancer risk and found no overall relationship; however, the NAT2 fast acetylator genotype appeared to protect light smokers (fewer than 20 cigarettes per day) against lung cancer, but not heavy smokers (more than 20 cigarettes per day) (Sørensen et al., 2005). Another study investigated the effects of 585 SNPs in 68 genes of the xenobiotic-metabolizing (XM) pathway on breast cancer risk (Berrandou et al., 2019). Overall genetic variation in this pathway was associated with breast cancer in premenopausal women, driven mainly by variants in the AKR1C2, ALDH1A3, CYP2C18, CYP2C19, and NAT2 genes, whereas no association was found in postmenopausal women. Genetic variation in the XM pathway was also linked to breast cancer in current and former smokers but not in never-smokers (Berrandou et al., 2019). Thus, although various studies have examined xenobiotic-metabolizing enzymes, none has evaluated these SNPs in combination in relation to lung cancer risk.
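MDR, as used above, reduces a set of SNPs to a single high-risk/low-risk attribute and scores it by cross-validation. The sketch below is a simplified, hypothetical re-implementation of that idea, not the software used in the study: each multilocus genotype cell is labelled high-risk when its case:control ratio reaches the overall ratio in the training fold, SNP sets are compared by balanced accuracy, the CVC counts how many folds select the same SNP set, and the prediction error is taken as 1 minus the mean testing balanced accuracy of that set. Genotypes are assumed to be coded 0/1/2, unseen cells are assumed low-risk, and all function names are illustrative.

```python
import itertools
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def cell_of(row: Sequence[int], snps: Tuple[int, ...]) -> Cell:
    """Multilocus genotype of one subject at the selected SNPs."""
    return tuple(row[j] for j in snps)


def fit_cells(X, y, snps, threshold) -> Dict[Cell, bool]:
    """Label each genotype cell high-risk (True) or low-risk (False)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for row, label in zip(X, y):
        counts[cell_of(row, snps)][0 if label == 1 else 1] += 1
    # High-risk when the cell's case:control ratio reaches the overall ratio.
    return {c: cases >= threshold * controls
            for c, (cases, controls) in counts.items()}


def balanced_accuracy(X, y, snps, high_risk) -> float:
    tp = fn = tn = fp = 0
    for row, label in zip(X, y):
        # Cells never seen in training are treated as low-risk (assumption).
        pred = high_risk.get(cell_of(row, snps), False)
        if label == 1 and pred:
            tp += 1
        elif label == 1:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (sens + spec) / 2


def mdr_cv(X: List[List[int]], y: List[int], order: int,
           folds: int = 10, seed: int = 0):
    """Return (best SNP set, CVC, prediction error) for one interaction order."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    chunks = [idx[i::folds] for i in range(folds)]
    chosen = []
    test_scores = defaultdict(list)
    for k in range(folds):
        held_out = set(chunks[k])
        train = [i for i in idx if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in chunks[k]], [y[i] for i in chunks[k]]
        threshold = sum(ytr) / (len(ytr) - sum(ytr))  # overall case:control ratio
        best, best_ba, best_model = None, -1.0, None
        for snps in itertools.combinations(range(len(X[0])), order):
            model = fit_cells(Xtr, ytr, snps, threshold)
            ba = balanced_accuracy(Xtr, ytr, snps, model)
            if ba > best_ba:
                best, best_ba, best_model = snps, ba, model
        chosen.append(best)
        test_scores[best].append(balanced_accuracy(Xte, yte, best, best_model))
    winner, cvc = Counter(chosen).most_common(1)[0]
    prediction_error = 1 - sum(test_scores[winner]) / len(test_scores[winner])
    return winner, cvc, prediction_error
```

On real data, the best model of each order (two to five SNPs) would then be compared, and significance is typically assessed by permutation testing; both steps are omitted from this sketch.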

Conclusion
To the best of our knowledge, the combinatorial association between Phase II detoxification enzymes and lung cancer risk has not previously been reported in patients undergoing platinum-based doublet chemotherapy. The genotypic combination of NAT2*7 G286E with EPHX1 Y113H or EPHX1 H139R was associated with an increased risk of lung cancer. The combination of NAT2*7 G286E with NQO1 P187S, SULT1A1 R213H, EPHX1 Y113H, or EPHX1 H139R conferred an almost two-fold risk of the ADCC subtype, and EPHX1 Y113H with NAT2*6 R197Q or NAT2*7 G286E conferred a two-fold risk of the SQCC subtype. The triple combinations of NQO1 P187S and EPHX1 Y113H with EPHX1 H139R, NAT2*6 R197Q, or NAT2*7 G286E, and of SULT1A1 R213H, EPHX1 H139R, and NAT2*7 G286E, suggested a two-fold increased risk of lung cancer, and subjects carrying EPHX1 Y113H, EPHX1 H139R, and NAT2*7 G286E together had a three-fold risk of the disease. Further, the four-SNP combinations of EPHX1 Y113H, EPHX1 H139R, and NAT2*6 R197Q with NQO1 P187S, and of EPHX1 Y113H, EPHX1 H139R, and NAT2*7 G286E with SULT1A1 R213H or NQO1 P187S, showed 3.6-, 3.6-, and 5-fold increased risks of developing lung cancer, respectively. A 2.7-fold increased lung cancer risk was observed for the combination of NQO1 P187S, EPHX1 Y113H, NAT2*6 R197Q, and NAT2*7 G286E.

Strengths of the study
The Phase II detoxification genes were studied to understand the role of their polymorphisms in lung cancer susceptibility. This is the first Indian study to examine several Phase II detoxification genes together and their correlation with clinicopathological factors. Stratified analyses by smoking status and histology were also performed to identify high-risk subgroups. Our study is also the first in the Indian population to apply multifactorial approaches (CART and MDR) to Phase II detoxification genes in lung cancer, exploring the complex interactions between these genes and their relationship with lung cancer risk. With a larger sample size for the overall and stratified groups, the present findings would help develop biomarkers for both the diagnosis and the prognosis of lung cancer. The polymorphisms in the Phase II genes are

Table 1. Minor allele frequencies and risk associated with single-locus sites among cases and controls.

Table 2. Distribution of genotypes depicting triple combinations and their association with lung cancer risk.

Table 3. Multifactor Dimensionality Reduction (MDR) analysis showing gene-environment interactions (smokers) of Phase II detoxification variants with lung cancer risk.

Table 4. Multifactor Dimensionality Reduction (MDR) analysis showing interactions of Phase II detoxification variants with lung cancer risk.

Table 5. Overall risk estimates based on CART analysis of Phase II detoxification genes.

Table 6. Overall risk estimates based on CART analysis of Phase II detoxification genes for smokers.

The triple combinations of NQO1 P187S, EPHX1 Y113H, and EPHX1 H139R; NQO1 P187S, EPHX1 Y113H, and NAT2*6 R197Q; NQO1 P187S, EPHX1 Y113H, and NAT2*7 G286E; and SULT1A1 R213H, EPHX1 H139R, and NAT2*7 G286E suggested a two-fold increased risk of lung cancer. When EPHX1 Y113H, EPHX1 H139R, and NAT2*7 G286E occurred together, they conferred a three-fold risk of the disease. The combination of SULT1A1 R213H, NAT2*5B L161L, and NAT2*5C K268R showed a protective effect against lung cancer. Among the four-SNP interactions, the combinations of NQO1 P187S, EPHX1 Y113H, EPHX1 H139R, and NAT2*6 R197Q, and of SULT1A1 R213H, EPHX1 Y113H, EPHX1 H139R, and NAT2*7 G286E, each showed a 3.6-fold increased risk of developing lung cancer. NQO1 P187S, EPHX1 Y113H, EPHX1 H139R, and NAT2*7 G286E considered together conferred a five-fold higher risk, and a 2.7-fold increased risk was observed for the combination of NQO1 P187S, EPHX1 Y113H, NAT2*6 R197Q, and NAT2*7 G286E.
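The two-, three-, and four-SNP combinations reported above each contrast subjects who carry every variant in a set with a reference group. A minimal sketch of how such combinations can be enumerated is shown below, under stated assumptions: "carrier" means at least one variant allele at a locus, the reference group is wild-type at every locus in the set, subjects with mixed genotypes are excluded from that contrast, and the odds ratio is crude rather than adjusted. The grouping actually used in the study is not specified here, so these choices and the function name `combination_odds_ratios` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def combination_odds_ratios(carrier: Sequence[Sequence[bool]],
                            is_case: Sequence[bool],
                            names: Sequence[str],
                            order: int) -> Dict[Tuple[str, ...], float]:
    """Crude OR for every `order`-SNP set: all-variant carriers vs all-wild-type."""
    results: Dict[Tuple[str, ...], float] = {}
    for snps in combinations(range(len(names)), order):
        # a/b = cases/controls carrying every variant in the set,
        # c/d = cases/controls carrying none of them (reference group).
        a = b = c = d = 0
        for row, case in zip(carrier, is_case):
            flags = [row[j] for j in snps]
            if all(flags):
                if case:
                    a += 1
                else:
                    b += 1
            elif not any(flags):
                if case:
                    c += 1
                else:
                    d += 1
            # Subjects carrying only some of the variants are skipped (assumption).
        if b * c > 0:  # skip sets whose OR is undefined
            results[tuple(names[j] for j in snps)] = (a * d) / (b * c)
    return results
```

Calling it with `order=3` or `order=4` on per-subject carrier flags yields the triple and four-SNP contrasts; confidence intervals and covariate adjustment would be layered on top of these counts.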