<i>P</i> values in display items are ubiquitous and almost invariably significant: A survey of top science journals

2018-05-15T17:36:46Z (GMT) by Ioana Alina Cristea, John P. A. Ioannidis
<div><p><i>P</i> values are a widely used but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often contain the main results and are an important source of <i>P</i> values. We conducted a survey comparing the overall use of <i>P</i> values and the occurrence of significant <i>P</i> values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant <i>P</i> values. Our findings demonstrated substantial and growing reliance on <i>P</i> values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of <i>P</i> values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but were reported in many articles relying on <i>P</i> values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported <i>P</i> values were statistically significant (98%, 95% CI 96% to 99%). By contrast, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported <i>P</i> values were statistically significant. Use of Bayesian methods was scant (2.5%), and articles rarely (0.7%) relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on <i>P</i> values and implausibly high rates of reported statistical significance are worrisome.</p></div>
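As a rough illustration of why multiplicity corrections can lower the share of statistically significant P values, the sketch below applies two standard corrections (Bonferroni and Benjamini-Hochberg) to a small set of P values. The abstract does not say which corrections the surveyed articles used, so the choice of these two methods is an assumption, and the P values, the 0.05 threshold, and the function names are hypothetical and not taken from the survey.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag P values that remain significant after a Bonferroni correction."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag P values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the P values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Every P value ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            significant[idx] = True
    return significant


# Hypothetical P values, invented for illustration only.
p_values = [0.001, 0.004, 0.012, 0.03, 0.04, 0.045, 0.20]

unadjusted = [p < 0.05 for p in p_values]
print("unadjusted:", sum(unadjusted), "of", len(p_values))
print("Bonferroni:", sum(bonferroni(p_values)), "of", len(p_values))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)), "of", len(p_values))
```

With these invented inputs, fewer P values stay below the threshold once either correction is applied, which is the direction of the difference the survey reports between articles with and without described corrections.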