Decentralized nonparametric multiple testing

2018-08-10T06:35:49Z (GMT) by Subhadeep Mukhopadhyay
<p>Consider a big data multiple testing task in which, due to storage and computational bottlenecks, a very large collection of <i>p</i>-values is split into manageable chunks and distributed over thousands of computer nodes. This paper is concerned with the following question: How can we find the <i>full data multiple testing solution</i> by operating completely independently on individual machines in parallel, without <i>any</i> data exchange between nodes? This version of the problem arises naturally in a wide range of data-intensive science and industry applications, yet <i>its methodological solution has not appeared in the literature to date</i>; therefore, we feel it is necessary to undertake such an analysis. Based on the nonparametric functional statistical viewpoint of large-scale inference, initiated in Mukhopadhyay, S. [(2016), ‘Large Scale Signal Detection: A Unifying View’, <i>Biometrics</i>, 72, 325–334], this paper furnishes a new computing model that brings unexpected simplicity to the design of an algorithm that might otherwise seem daunting using the classical approach and notation.</p>