Inferring biased allele expression across the genome
Biased allele expression refers to the imbalanced expression of the two alleles in a diploid genome. Unequal transcription of alleles may arise from variation in cis-regulatory elements or from allele-specific epigenetic modifications. Allelic imbalance is associated with human diseases such as metabolic disorders and ovarian and breast cancers. However, allelic imbalance may be incorrectly inferred because of technical variation inherent in RNA-Seq data, including variable read depth, reference mapping bias, and overdispersion of read counts. To correct for this technical variation, we develop a mixed-effects logistic regression model that combines information on biased allele expression across many individuals in a population and across multiple genes. In simulations, our method does not produce an excess of false positives when inferring biased allele expression, whereas standard allele-specific expression (ASE) methods (a SNP-wise binomial test and a binomial-based logistic regression) show an excess of inflated p-values in quantile-quantile plots. We conducted additional simulations to estimate the power of the method to detect the range of possible degrees of biased allele expression under varying numbers of SNPs per gene and varying depth of coverage. We then applied this method to infer biased allele expression across the genome in 89 lymphoblastoid cell line samples from a Central European Utah population, and were able to detect modest degrees of allelic imbalance more accurately.
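The claim that a SNP-wise binomial test over-calls allelic imbalance when read counts are overdispersed can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the mixed-effects model described above: every simulated SNP has no true imbalance, overdispersion is mimicked by drawing each SNP's reference-allele fraction from a symmetric Beta distribution (a beta-binomial model with an assumed intra-class correlation `rho`), and each SNP is then tested with an exact two-sided binomial test. The function names and the parameter values (depth 30, `rho` = 0.1) are hypothetical choices for illustration.

```python
import math
import random


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance keeps floating-point ties in the sum.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))


def simulate_null_snp(depth, rho, rng):
    """Reference-allele read count for a SNP with NO true imbalance.

    rho is the (assumed) intra-class correlation of the beta-binomial;
    rho == 0 reduces to a plain Binomial(depth, 0.5)."""
    if rho == 0:
        q = 0.5
    else:
        # Beta(a, a) has mean 0.5 and rho = 1 / (2a + 1), so a = (1/rho - 1) / 2.
        a = (1.0 / rho - 1.0) / 2.0
        q = rng.betavariate(a, a)
    return sum(rng.random() < q for _ in range(depth))


def false_positive_rate(depth, rho, n_snps=2000, alpha=0.05, seed=1):
    """Fraction of null SNPs that the per-SNP binomial test calls imbalanced."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_snps):
        k = simulate_null_snp(depth, rho, rng)
        if binom_two_sided_p(k, depth) < alpha:
            hits += 1
    return hits / n_snps


if __name__ == "__main__":
    print("binomial reads:     ", false_positive_rate(30, 0.0))
    print("overdispersed reads:", false_positive_rate(30, 0.1))
```

With purely binomial reads the false-positive rate should sit near or below the nominal 0.05 (the exact test is slightly conservative at this depth), while with overdispersed reads it rises well above 0.05 even though no SNP is truly imbalanced. This is the kind of inflation that pooling information across individuals and genes in a mixed-effects model is intended to absorb.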