2015-08-13-jones-esa-reproducible-science.pdf (4.61 MB)

Reproducible science via semantics and provenance for ecological data

posted on 17.08.2015 by Matthew Jones, Christopher Jones, Lauren Walker, Peter Slaughter, Benjamin Leinfelder

Reproducibility is critical for ecology and allied disciplines, both because of the inherent complexity of the inference chain needed to advance ecology and because ecological results bear on challenges important to society. At a minimum, scientific reproducibility requires the ability to locate, understand, and access the data and visualization products of analysis and modeling that form the basis for scientific conclusions. However, the majority of methods sections in journal publications are inadequate for reproducibility, and current data repositories generally support only minimal descriptions of data provenance and semantics. Through usability studies and design sessions, we designed and implemented new software for use by ecological scientists that enables tracking data inputs and outputs of analyses, storing and documenting software, and showing data derivation history for new data produced in synthesis and analysis. In addition, we extended the KNB Data Repository and the DataONE systems to support publishing these provenance annotations to aid in data discovery and interpretation.

Tracking provenance and semantics for heterogeneous data improves the accessibility of data, as it allows scientists to link data and analytical products to the manuscripts that present these results and to the computational processes that produced them. The DataONE provenance and semantics system prototypes the ability to search for source and derived data products, as well as label those products with appropriate semantics. Within this system, each data product is linked to the source data from which it was derived, and to the computational processes that were used to transform it. We demonstrate software improvements that provide a rich environment for understanding the context of ecological research conclusions, and that provide an effective means to archive the complete computational record for ecological research studies.
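
The linking described above, where each data product points to its source data and to the process that transformed it, can be illustrated with a small sketch. This is not the DataONE or KNB implementation: the `Execution` record, the `derivation_history` function, and all file and script names are hypothetical, chosen only to show how a derivation history might be walked once inputs and outputs of each analysis run are recorded.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """One run of an analysis script (hypothetical record, not a DataONE type)."""
    script: str
    inputs: list = field(default_factory=list)   # identifiers of data read
    outputs: list = field(default_factory=list)  # identifiers of data written


def derivation_history(product, executions):
    """Return every upstream data identifier that `product` was derived from."""
    # Map each output identifier to the execution that generated it.
    generated_by = {out: ex for ex in executions for out in ex.outputs}
    sources, stack = [], [product]
    while stack:
        current = stack.pop()
        ex = generated_by.get(current)
        if ex is None:
            continue  # source data: no recorded execution produced it
        for src in ex.inputs:
            if src not in sources:
                sources.append(src)
                stack.append(src)
    return sources


# Hypothetical two-step analysis: cleaning, then model fitting.
runs = [
    Execution("clean.R", inputs=["raw_counts.csv"], outputs=["clean_counts.csv"]),
    Execution("model.R", inputs=["clean_counts.csv", "sites.csv"],
              outputs=["fit_summary.csv"]),
]
history = derivation_history("fit_summary.csv", runs)
```

In this sketch, asking for the history of the final product returns both the intermediate and the original inputs, which is the kind of search for source and derived data products that the abstract describes. A real system would also store the annotations in a repository and attach semantic labels, which this example omits.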