Using statistics to detect potential data fabrication (WCRI 2015)

<p>These are the slides of my (CHJH) presentation at the World Conference on Research Integrity (WCRI) 2015 in Rio de Janeiro, Brazil. My presentation took place on June 1st, 2015. Below is the short summary, which was also submitted for the proceedings of WCRI 2015.</p> <p>Please note that these methods are being developed for research purposes and results indicate only <strong>potential</strong> data fabrication.</p> <p>Due to several (unconscious) heuristics, people are bad at appreciating randomness as it occurs in nature. For instance, the gambler's fallacy refers to the tendency to expect too many alternations in sequences of independent random events, such as coin tosses. Because data fabricators are not immune to these heuristics, the statistics they fabricate might not show the regularities expected of genuinely random processes. Indeed, in several known cases of data fabrication, fabricators consistently produced less variation than random sampling would yield.</p> <p>If fabricators produce highly consistent effects across supposedly independent samples, this affects the p-value distribution of comparisons of the core descriptive statistics (means and standard deviations). If there is no fabrication and the null hypothesis is true, the p-value distribution is expected to be uniform between 0 and 1; if there is a non-null population effect, the p-value distribution is right-skewed (i.e., a bulk of small p-values). In the case of highly consistent, fabricated data, however, the p-value distribution can become left-skewed or bimodal (i.e., too many high p-values). To test for such indicators of potential data fabrication, a reversed Fisher method can be applied: $\chi^2_{2k} = -2\sum\limits_{i=1}^{k} \ln(1 - p_i)$.
This method tests whether condition means and condition variances are more similar across samples than chance allows.</p> <p>The diagnostic value of the reversed Fisher method, or of other statistical methods to detect potential data fabrication, has not previously been studied. Diagnostics include the degree to which these methods correctly classify studies as fabricated or genuine. Applying the methods to a set of presumably genuine data misclassified 8% of the results as fabricated (alpha = .05). Simulations indicated that different data fabrication strategies were detected to varying degrees, ranging from approximately 25% to 100% for the most blatant fabrication strategies. Because knowledge of how researchers actually fabricate data is largely anecdotal, experimental studies to test the validity of these methods are planned for the next academic year.</p> <p>Validated statistical methods to detect potential data fabrication enable the study of misconduct on the basis of published research. More specifically, combined with text-mining methods that extract statistical information from papers, these methods can be used to estimate prevalence rates of potential data fabrication. Whereas previous estimates of misconduct prevalence mostly relied on authors' self-reported admissions, these novel prevalence estimates operate at the paper level and can estimate the percentage of potentially fabricated research papers, a more direct measure of how problematic data fabrication is for science. Additionally, these methods could be used alongside plagiarism scanners to red-flag potentially problematic papers.</p>
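<p>As a rough illustration, the reversed Fisher method can be sketched in a few lines of Python. The simulation below is a hypothetical example, not the simulation study reported above: the "fabricated" studies copy one condition's scores into the other with only tiny noise, so the comparison p-values pile up near 1, and a simple two-sample z-test stands in for whatever test produced the original p-values.</p>

```python
import math
import random

def reversed_fisher(p_values):
    """Reversed Fisher method: chi2_{2k} = -2 * sum(ln(1 - p_i)).

    A small combined p-value signals an excess of *high* p_i, i.e.
    results that are more similar across samples than chance predicts."""
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(1.0 - p) for p in p_values)
    # Chi-square survival function for even df = 2k has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = chi2 / 2.0
    p_combined = math.exp(-half) * sum(half ** j / math.factorial(j)
                                       for j in range(k))
    return chi2, min(p_combined, 1.0)

def two_sample_p(x, y):
    """Two-sided p-value from an approximate two-sample z-test on means."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
n, k = 50, 20  # hypothetical: 50 scores per condition, 20 studies

# Genuine data: independent samples, null true -> p-values roughly uniform.
genuine = [two_sample_p([random.gauss(0, 1) for _ in range(n)],
                        [random.gauss(0, 1) for _ in range(n)])
           for _ in range(k)]

# "Fabricated" data: one condition copied into the other with tiny noise,
# so the condition means are far too similar and p-values pile up near 1.
fabricated = []
for _ in range(k):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [v + random.gauss(0, 0.05) for v in x]
    fabricated.append(two_sample_p(x, y))

print(reversed_fisher(genuine))     # combined p behaves like any p-value
print(reversed_fisher(fabricated))  # very small combined p: a red flag
```

<p>Under genuine sampling the combined p-value behaves like an ordinary p-value; for the overly consistent fabricated studies it becomes vanishingly small, red-flagging the whole set.</p>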