Dataset for: A note on statistical repeatability and study design for high-throughput assays

2017-03-29T12:24:00Z (GMT) by George Nicholson Chris C. Holmes
Characterizing the technical precision of measurements is a necessary stage in the planning of experiments and in the formal sample size calculation for optimal design. Instruments that measure multiple analytes simultaneously, such as in high-throughput assays arising in biomedical research, pose particular challenges from a statistical perspective. The current most popular method for assessing precision of high-throughput assays is by scatterplotting data from technical replicates. Here we question the statistical rationale of this approach from both an empirical and theoretical perspective, illustrating our discussion using four example data sets from different genomic platforms. We demonstrate that such scatterplots convey little statistical information of relevance and are potentially highly misleading. We present an alternative framework for assessing the precision of high-throughput assays and planning biomedical experiments. Our methods are based on \emph{repeatability} -- a long-established statistical quantity also known as the intraclass correlation coefficient. We provide guidance and software for estimation and visualization of repeatability of high-throughput assays, and for its incorporation into study design.