Analysis of Distributional Variation Through Graphical Multi-Scale Beta-Binomial Models

Version 2 2019-04-01, 11:10

Version 1 2017-11-27, 15:06

posted on 2019-04-01, 11:10 authored by Li Ma, Jacopo Soriano

Many scientific studies compare multiple datasets collected under different conditions in order to identify differences in the underlying distributions. A common challenge in these multi-sample comparison problems is overdispersion: extraneous causes, other than the conditions of interest, that also contribute to cross-sample differences. Overdispersion frequently produces false findings, that is, identified “differences” that cannot be replicated in follow-up studies. When proper replicate samples are available under each condition, one can in principle distinguish the distributional variation of interest from overdispersion through what we call the “analysis of distributional variation” (ANDOVA). We introduce a fully probabilistic framework for ANDOVA that achieves high computational efficiency. We take a divide-and-conquer, multi-scale inference strategy: (i) transform a general nonparametric ANDOVA task into a collection of ANDOVA tasks on Binomial experiments, each characterizing variation in the distributions at a particular location and scale; (ii) address each Binomial ANDOVA task using a Beta-Binomial (BB) model; and (iii) use hierarchical graphical modeling to combine the inference from the BB models. We derive an efficient, MCMC-free Bayesian inference recipe under this framework through a combination of Laplace approximation-based numerical integration and message passing, and we evaluate the performance of our method through extensive simulation. We apply the framework to DNase-seq data to identify differences in transcription factor binding. Supplementary material for this article is available online.
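
To make steps (i) and (ii) more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes observations lie in [0, 1) and are split by a dyadic partition, so that each node at a given level and index yields one Binomial experiment (how many of the points in an interval fall in its left half). The function names `node_counts` and `betabinom_logpmf`, and the mean/precision parametrization of the Beta-Binomial, are hypothetical choices made for illustration; the Laplace approximation and message passing of step (iii) are not shown.

```python
import math


def node_counts(xs, level, index):
    """Return (n, k) for one dyadic interval of [0, 1).

    n is the number of points in the interval, k the number in its left half.
    The dyadic partition is an assumption made for this illustration.
    """
    width = 1.0 / (2 ** level)
    lo = index * width
    mid = lo + width / 2.0
    hi = lo + width
    n = sum(1 for x in xs if lo <= x < hi)
    k = sum(1 for x in xs if lo <= x < mid)
    return n, k


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, mean, precision):
    """Log-probability of k successes in n trials under a Beta-Binomial.

    The success probability is Beta(a, b) with a = mean * precision and
    b = (1 - mean) * precision; smaller precision means more overdispersion.
    This parametrization is a hypothetical choice, not taken from the paper.
    """
    a = mean * precision
    b = (1.0 - mean) * precision
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def node_loglik(replicates, level, index, mean, precision):
    """Sum the Beta-Binomial log-likelihood over replicate samples at one node."""
    total = 0.0
    for xs in replicates:
        n, k = node_counts(xs, level, index)
        total += betabinom_logpmf(k, n, mean, precision)
    return total
```

In this sketch, comparing `node_loglik` for replicates pooled across conditions against the sum of `node_loglik` values fitted per condition would give a per-node indication of whether the left/right split differs between conditions beyond what replicate-to-replicate variation explains.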