Supplementary Material for: Efficient Identification of Null-Allele Single Nucleotide Polymorphism Markers

2015-11-28 (GMT) by Özbek U., Feingold E., Weeks D.E.
<b><i>Objectives:</i></b> At the beginning of a genome-wide association study, many markers are discarded because they fail to meet standard quality control criteria. Some of these markers are out of Hardy-Weinberg equilibrium (HWE) because they have 'null alleles' (which may be deletions or third alleles that do not hybridize to standard probes). It may be useful to identify null-allele markers so that they can be analyzed under different models or used to explore regions of copy number variation. <b><i>Methods:</i></b> We present a model for the chip-based genotype data that are produced when a null-allele single nucleotide polymorphism (SNP) is genotyped under standard (2-allele) assumptions. We show that this model can be combined with the standard HWE model to develop classification procedures for identifying null-allele SNPs, based on the supervised learning algorithms Support Vector Machines (SVM), Classification and Regression Trees (CART) or Random Forests. <b><i>Results:</i></b> We report a list of null-allele SNPs that we identified on the Illumina 660W-Quad chip and provide suggestions for applying our CART model to other SNP sets. <b><i>Conclusions:</i></b> Properly identified null-allele SNPs can be used to test for genotype-phenotype associations or to identify regions that may contain copy number variants.
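To illustrate why a null allele can push a marker out of HWE, here is a minimal Python sketch. It is not the authors' model or classifier. It assumes a simplified textbook setup: three alleles (A, B and a null allele) in Hardy-Weinberg proportions, where a 2-allele caller reports A/null as AA, B/null as BB and null/null as a missing call. The function names `observed_genotype_freqs` and `hwe_chi_square` are hypothetical and chosen only for this example.

```python
def observed_genotype_freqs(p_a, p_b, p_null):
    """Expected frequencies of *called* genotypes when a null allele exists.

    Assumption (not taken from the paper): the three alleles are in
    Hardy-Weinberg proportions, and a 2-allele caller reports A/null as AA,
    B/null as BB, and null/null as a missing call.
    """
    if abs(p_a + p_b + p_null - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return {
        "AA": p_a ** 2 + 2 * p_a * p_null,   # true AA plus A/null
        "AB": 2 * p_a * p_b,                 # true heterozygotes only
        "BB": p_b ** 2 + 2 * p_b * p_null,   # true BB plus B/null
        "missing": p_null ** 2,              # null/null gives no signal
    }


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Standard 1-df chi-square statistic for HWE under a 2-allele model."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)          # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Illustrative use: 1000 samples with a null-allele frequency of 0.1.
freqs = observed_genotype_freqs(0.5, 0.4, 0.1)
counts = {g: round(1000 * f) for g, f in freqs.items()}
stat = hwe_chi_square(counts["AA"], counts["AB"], counts["BB"])
print(counts, stat)
```

Under these assumptions the called genotypes show an excess of homozygotes and some missing calls, so the 2-allele HWE statistic is inflated. Features of this kind (HWE deviation, missing-call rate) are the sort of input a classifier such as CART could use, although the features actually used by the authors are not specified here.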