The Garden of Forking Paths

Posted on 2016-03-14, 17:31, authored by Dorothy V M Bishop

An illustration of how opportunities for false positives can mount up when one has a large dataset and a flexible approach to analysis.

Consider an investigator who wants to test whether handedness is associated with attention deficit hyperactivity disorder (ADHD). The investigator has access to a large dataset with measures of both hand skill and hand preference at 6 years and 10 years, as well as demographic information. Suppose further that there is no true association in the population. The green node corresponds to the simple two-group comparison, for which the probability of obtaining p < .05 by chance is 1 in 20.

An investigator who compares ADHD (A) and typical (T) children (green node) may be disappointed to find the comparison nonsignificant. But the investigator may be tempted to do a subgroup analysis, because the association looks different in older (O) and younger (Y) children (purple nodes). He might then realise that results vary for measures of hand skill (S) vs hand preference (P) (blue nodes). He could then decide to subdivide the sample by gender (M vs F) (orange nodes) and according to whether children are from urban (U) or rural (R) areas (khaki nodes). These four binary splits yield 2 × 2 × 2 × 2 = 16 final comparisons. If all of them were considered, the chance of finding at least one ‘significant’ result rises to 1 - .95^16 = .56 (assuming the different choices are independent).

When doing numerous comparisons, there are legitimate methods for adjusting p-values to guard against false positives, but in practice these are often ignored. If the researcher presents what looks like a simple two-group comparison (e.g. ADHD and typical young, urban females differ on a measure of relative hand skill) that was in reality selected from a wide range of possible contrasts, then uncorrected p-values can be highly misleading.
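The arithmetic in the caption can be checked with a short sketch. The Python below is illustrative and not part of the original figure. The function names are hypothetical. It computes the familywise error rate 1 - (1 - α)^k for k independent tests and runs a Monte Carlo check that treats each null p-value as uniform on [0, 1], which holds for continuous test statistics. It also computes the Bonferroni per-test threshold α/k, shown here as one common adjustment rather than as the author's recommendation. Like the caption, it assumes the 16 comparisons are independent. Real subgroup analyses on overlapping data are correlated, so the true rate would differ.

```python
import random


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha among k independent tests when every null is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / k


def simulate_fwer(k: int, alpha: float = 0.05, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the familywise error rate for k independent null tests."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null (continuous test statistic), each p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(k)]
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Four binary splits: age (O/Y) x measure (S/P) x gender (M/F) x area (U/R)
    k = 2 * 2 * 2 * 2
    print(f"Exact FWER for {k} tests: {familywise_error_rate(k):.2f}")  # 0.56
    print(f"Simulated FWER: {simulate_fwer(k):.2f}")
    threshold = bonferroni_threshold(k)
    print(f"Bonferroni threshold: {threshold:.4f}")  # 0.0031
    print(f"FWER after correction: {familywise_error_rate(k, threshold):.3f}")
```

Under Bonferroni, each of the 16 tests would need p < .05/16 ≈ .003 to count as significant. That brings the familywise rate back to about .05 or below. The correction is conservative, which is one reason researchers are tempted to skip it.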
The figure title is inspired by Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no 'fishing expedition' or 'p-hacking' and the research hypothesis was posited ahead of time. www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf.
Figure created by Tim Brock of DataToDisplay.com.

Keywords

Data-sharing Open Science Reproducibility Data dredging p-hacking Peer Review Journal policy Ethics Psychology not elsewhere classified Science Policy

Licence

CC BY 4.0