Introduction to Computational Reproducibility (and why we care)
2017-01-03T15:06:42Z (GMT) by
Also on SpeakerDeck, for nicer viewing.
Introduction & motivation on the first day of the workshop "Essential skills for reproducible research computing," at Universidad Técnica Federico Santa María (January 2017).
The American Physical Society (APS) in its Ethics & Values document (1999) explains its position on "What is Science?" It says: "The success and credibility of science are anchored in the willingness of scientists to […] Expose their ideas and results to independent testing and replication by others. This requires the open exchange of data, procedures and materials."
The journal Nature published a News Feature article in October 2010 discussing the failings of computational science. The article mentions that coding problems can sometimes cause substantial harm, and have forced some scientists to retract papers. It tells the story of a structural-biology group at the Scripps Research Institute, led by Geoffrey Chang ... in 2006, the team realized that a code they were using had a sign error, which reversed two columns of data, causing their protein structures to be completely wrong.
Screenshot from Science, 22 December 2006.
Chang and co-authors were forced to retract five papers published in Science, the Journal of Molecular Biology and the Proceedings of the National Academy of Sciences, between the years 2001 and 2005.
Quote from Nature (2010):
As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software …
Quote from Nature (2010):
"There are terrifying statistics showing that almost all of what scientists know about coding is self-taught," says Wilson. "They just don't know how bad they are."
The Nature piece quotes Greg Wilson, leader of the “Software Carpentry” workshop series ... he ran an online survey in 2008 of 2,000 researchers working with computers in one way or another.
– only 47% of scientists have a good understanding of software testing
– only 34% of scientists think that formal training in developing software is important
– 38% of scientists spend at least 1/5 of their time developing software
And it continues to happen. A paper in the Journal of Clinical Oncology, published online in January 2016 (March 2016 in print), contained an analysis that mislabeled the data in one column, affecting how a substantial set of clinical results from 1990 to 2008 entered into the conclusions. Some of the conclusions were incorrect and the paper had to be retracted.
The principal investigator said that the coding error was made by a doctoral student, but gave no specifics.
You could say that this is just bad luck, that the PI couldn’t really have avoided this, that mistakes happen, etc. But the fact is that there are engineering practices for ensuring the quality of research software that could have prevented this: these practices are part of what we call “Reproducible Research” and include version control, code reviews, code testing, study replication, and others.
See Retraction Watch.
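As a minimal sketch of what code testing can look like for a column mix-up of this kind: the details of the actual error were not disclosed, so the column names, the allowed values, and the `load_records` helper below are all hypothetical. The idea is only that a loader can refuse data whose labels or values do not match what the analysis expects.

```python
import csv
import io

# Hypothetical schema, for illustration only.
EXPECTED_COLUMNS = ["patient_id", "treatment_arm", "outcome"]
ALLOWED_ARMS = {"control", "treatment"}


def load_records(text):
    """Parse CSV text, rejecting unexpected headers or column values."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    records = list(reader)
    for row in records:
        if row["treatment_arm"] not in ALLOWED_ARMS:
            raise ValueError(f"bad treatment_arm value: {row['treatment_arm']!r}")
    return records


def raises_value_error(text):
    """Return True if loading the text is rejected with a ValueError."""
    try:
        load_records(text)
    except ValueError:
        return True
    return False


GOOD = "patient_id,treatment_arm,outcome\n1,control,0\n2,treatment,1\n"
# Header is correct, but the values of two columns were swapped.
SWAPPED_VALUES = "patient_id,treatment_arm,outcome\n1,0,control\n"
# Header itself lists the columns in the wrong order.
SWAPPED_HEADER = "patient_id,outcome,treatment_arm\n1,0,control\n"

assert len(load_records(GOOD)) == 2
assert raises_value_error(SWAPPED_VALUES)
assert raises_value_error(SWAPPED_HEADER)
```

Checks like these run in seconds and can be kept under version control next to the analysis code, so every change to the data pipeline is tested again.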
Screenshot from The New York Times: "The Excel Depression"
Two economists at Harvard University, Carmen Reinhart and Kenneth Rogoff, published a study in 2010 titled “Growth in a Time of Debt,” suggesting a negative effect on growth from the national debt. It appeared in a non-peer-reviewed issue of the American Economic Review.
The main conclusion was that average annual growth was –0.1 % in countries with episodes of gross government debt equal to 90 % or more of GDP between 1945 and 2009.
The Reinhart-Rogoff study came out just after Greece fell into crisis, and it was widely cited by fiscal-conservative politicians to call for austerity measures.
Nobel Prize winner Paul Krugman called it “the most influential economic analysis of recent years.”
Critics of the article rightly pointed out that it could be a case of “reverse causation,” that is, it is not the debt that impacts negatively on growth, but that low growth leads to high debt.
Soon, a more serious problem emerged: other researchers tried to replicate the Reinhart-Rogoff study with similar data, but could not reach a similar finding.
Screenshots from Business Insider and The Wall Street Journal.
University of Massachusetts graduate student Thomas Herndon started a replication exercise for an econometrics term paper. After repeated failures to replicate, he approached Reinhart and Rogoff to ask for their data and their spreadsheet, and they provided them.
Herndon found that 5 out of 20 countries had been left out of the calculation, due to a botched formula in the spreadsheet.
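To illustrate how an averaging range that stops short silently changes a result, here is a small sketch in Python. The numbers are placeholders, not the Reinhart-Rogoff data, and `checked_mean` is a hypothetical helper.

```python
from statistics import mean

# Placeholder values standing in for 20 countries (NOT real growth data).
growth = [float(i) for i in range(1, 21)]

# A range that stops five rows short, like a spreadsheet formula that
# does not cover every row.
truncated = growth[:15]

print(mean(growth))     # 10.5
print(mean(truncated))  # 8.0


def checked_mean(values, expected_count):
    """Average the values, but refuse to run if rows are missing."""
    if len(values) != expected_count:
        raise ValueError(f"expected {expected_count} values, got {len(values)}")
    return mean(values)
```

Calling `checked_mean(truncated, 20)` raises a `ValueError` instead of returning a plausible but wrong average. A spreadsheet formula gives no such warning.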
Screenshot from Genome Biology.
The spreadsheet error was not the only problem in the Reinhart and Rogoff study: some countries were also omitted from the analysis, and the statistical analysis was questionable.
In other fields, too, Microsoft Excel has wreaked havoc.
In genomics, researchers estimate that 1 out of 5 publications using Excel for gene lists contains errors. The problem here is that Excel automatically converts some gene names to other formats, like dates or floating-point numbers.
The gene SEPT2 (Septin 2) gets converted to the date 2-Sep, and the identifier ‘2310009E13’ gets converted to the floating-point number 2.31E+13.
See Genome Biology.
Screenshot of a tweet by Philip B. Stark, Professor of Statistics, University of California Berkeley.
One of the first milestones of the Reproducibility movement was the “Yale Roundtable,” which resulted in a jointly-authored “Data and Code Sharing Declaration.”
About 30 experts got together ... their fields: computer science, applied mathematics, law, biostatistics, information sciences, astronomy, biochemistry.
Screenshot from Science magazine.
In December 2011, Science had a special issue on Replication & Reproducibility.
Screenshot from R. Peng's article in that issue.
The standard of reproducibility calls for the data and the computer code used to analyze the data to be made available to others.
... aim of the reproducibility standard is to fill the gap in the scientific evidence-generating process between full replication of a study and no replication
... a study may be more or less reproducible than another depending on what data and code are made available
... A critical barrier to reproducibility in many cases is that the computer code is no longer available.
We aim to carry out all research with attention to reproducibility: making all research code open-source, publishing data, plotting scripts, and figures, and citing our figshare repository when including a figure in a paper ... all with the aim of facilitating reproducibility of our results. We include a reproducibility statement in the papers.
We created this course to share what we’ve learned from years of thinking about reproducibility in computational science.
It provides an introduction to the tools and techniques that we consider fundamental for responsible use of computers in scientific research.
Syllabus of the workshop