A meta-analysis of bioinformatics software benchmarks reveals that publication-bias unduly influences software accuracy

posted on 2016-12-08, 23:46 authored by Paul Gardner, James M. Paterson, Fatemeh Ashari Ghomi, Sinan U. Umu, Stephanie McGimpsey, Aleksandra Pawlik
Computational biology has provided widely used and powerful software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade mathematical completeness for speed must be employed. We are interested in whether these trade-offs between speed and accuracy are reasonable, and in which factors are indicative of accurate software. In this work we mine published benchmarks of computational biology software, collect data on the relative accuracy and speed of different tools, and then test which factors influence accuracy, e.g. speed, author reputation, journal impact or recency. We found that author reputation, journal impact, the number of citations, software speed and age are not reliable predictors of software accuracy. This implies that useful bioinformatics software is not solely the domain of famous researchers, and that any researcher is capable of producing good software. In addition, we found an excessive number of slow and inaccurate software tools across multiple sub-disciplines of bioinformatics, while very few tools are of middling accuracy and speed. We hypothesise that a strong publication bias is unduly influencing the publication and development of bioinformatics software tools. In other words, software that is highly ranked on neither speed nor accuracy is at present difficult to publish due to editorial and reviewer practices. This leaves an unfortunate gap in the literature upon which future software refinements could be constructed.


Funding: Rutherford Discovery Fellowship

