Poster posted on 15.02.2017, 19:17, authored by Marzieh Ayati
In recent years, genome-wide association studies (GWAS) have successfully identified loci that harbor genetic susceptibility variants for a large number of complex diseases. However, the susceptibility loci identified by GWAS so far generally account for only a limited fraction of the genotypic variation in patient populations. Predictive models based on the identified loci also have modest success in classifying phenotype (risk assessment) and are therefore of limited practical use. More recently, considerable attention has been paid to identifying epistatic interactions, i.e., pairs of loci whose joint association with the disease is stronger than the aggregate of their individual associations. However, the large number of pairs to be tested for epistasis poses significant computational (runtime) and statistical (multiple hypothesis testing) challenges. Here, we propose a new criterion, termed difference in allele distributions (DAD), for assessing the collective association of multiple genetic variants with a disease of interest. DAD compares the distribution of the alleles of interest over a set of genomic loci between case and control samples, using the Kolmogorov-Smirnov statistic. This formulation of the coordination among multiple variants allows the use of efficient heuristic algorithms to identify sets of multiple loci that are collectively associated with the disease. It also enables the application of permutation tests for the empirical assessment of statistical significance, thereby directly correcting for multiple hypotheses. We test the proposed method on two independent data sets for two complex diseases, psoriasis and type 2 diabetes (T2D), in terms of the statistical significance of the identified associations and the performance of the resulting risk assessment models.
Our results show that, as compared to individual variants, multi-variant features provide better predictive performance in risk assessment and they are also more reproducible.
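The abstract does not spell out how DAD is computed, so the following Python sketch is only one plausible reading, not the author's implementation. It assumes each sample is summarized by the number of alleles of interest it carries across the chosen loci, that DAD is the two-sample Kolmogorov-Smirnov statistic between the case and control distributions of that count, and that significance is estimated by shuffling case/control labels. All function names, the 0/1/2 genotype encoding, and the per-sample count summary are assumptions introduced for illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = set(a_sorted) | set(b_sorted)
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )


def allele_counts(genotypes, loci):
    """Per-sample count of alleles of interest over the selected loci.

    Assumed encoding: genotypes[i][j] is 0, 1, or 2 copies of the allele
    of interest for sample i at locus j.
    """
    return [sum(sample[j] for j in loci) for sample in genotypes]


def dad_score(genotypes, is_case, loci):
    """Hypothetical DAD: KS statistic between case and control count distributions."""
    counts = allele_counts(genotypes, loci)
    cases = [c for c, y in zip(counts, is_case) if y]
    controls = [c for c, y in zip(counts, is_case) if not y]
    return ks_statistic(cases, controls)


def permutation_p_value(genotypes, is_case, loci, n_perm=1000, seed=0):
    """Empirical p-value for a fixed locus set, obtained by shuffling labels."""
    rng = random.Random(seed)
    observed = dad_score(genotypes, is_case, loci)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if dad_score(genotypes, labels, loci) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Note that this sketch tests a single, fixed set of loci. To account for multiple hypotheses in the way the abstract describes, the locus-set search itself would presumably have to be repeated on each permuted labeling, with the best score from each run forming the null distribution.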