An illustration of how opportunities for false positives can mount up when one has a
large dataset and a flexible approach to analysis. Consider an investigator who
is interested in testing whether handedness is associated with attention
deficit hyperactivity disorder (ADHD), and has access to a large dataset that
has measures of both hand skill and hand preference at 6 years and 10 years, as
well as demographic information. Suppose further that there is no true
association in the population. The green node corresponds to the two-group
comparison, where the probability of obtaining a p-value < .05 is 1 in 20. An investigator
who compares ADHD (A) and typical (T) children (green node) may be disappointed
to find the comparison is nonsignificant. But the investigator may be tempted
to do a subgroup analysis, because the association looks different in older (O)
and younger children (Y) (purple nodes). He might then realise that results vary
for measures of hand skill (S) vs hand preference (P) (blue nodes). He could
then decide to subdivide the sample by gender (M vs F) (orange nodes) and
according to whether children are from urban (U) rather than rural (R) areas (khaki
nodes). If all these combinations of possibilities were to be considered, the chance
of finding at least one ‘significant’ result among the 16 resulting comparisons rises to 1 - .95^16 = .56 (assuming the different choices are independent). When
doing numerous comparisons, there are legitimate methods for adjusting p-values
to guard against false positives, but in practice these are often ignored. If the researcher presents what looks like a
simple two-group comparison (e.g. ADHD and Typical young, urban females differ
on a measure of relative hand skill), which is in reality selected from a wide
range of possible contrasts, then uncorrected p-values can be highly misleading. The
figure title is inspired by Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no 'fishing expedition' or 'p-hacking' and the research hypothesis was posited ahead of time. www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf. Figure created by Tim Brock of DataToDisplay.com.
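
The arithmetic behind the .56 figure can be checked with a minimal Python sketch. It assumes, as the caption does, that the 16 comparisons are independent and each is tested at alpha = .05. The function name familywise_error_rate and the variable names are illustrative only and do not come from the original figure. The Bonferroni adjustment is shown as one example of the legitimate methods for adjusting p-values mentioned above, not as the method the authors recommend.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha across n_tests independent tests
    when there is no true effect (illustrative helper, not from the source)."""
    return 1 - (1 - alpha) ** n_tests


# Four binary splits (age, measure, gender, area) give 2**4 = 16 subgroups.
n_subgroups = 2 ** 4

# Uncorrected: each of the 16 comparisons is tested at alpha = .05.
fwer = familywise_error_rate(n_subgroups)

# Bonferroni adjustment (one common correction, assumed here for illustration):
# divide alpha by the number of comparisons.
bonferroni_alpha = 0.05 / n_subgroups
fwer_corrected = familywise_error_rate(n_subgroups, bonferroni_alpha)

print(f"Uncorrected chance of a false positive: {fwer:.2f}")
print(f"Per-test alpha after Bonferroni: {bonferroni_alpha:.6f}")
print(f"Corrected chance of a false positive: {fwer_corrected:.3f}")
```

With the per-test threshold lowered in this way, the chance of at least one false positive across all 16 comparisons stays just under .05, at the cost of reduced power for any single comparison.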