End to End Provenance

Poster, posted on 2017-02-09, 16:56, authored by Barbara Lerner
Data provenance is the history of a digital artifact, from the point of collection to its present state. Provenance includes a precise specification of a scientist’s input data and the programs or procedures applied to the data. Most computational platforms do not record such data provenance, making it difficult to ensure reproducibility, to examine decisions made during data analysis, or even to identify errors in data handling. This project addresses this problem through the development of tools that transparently and automatically capture data provenance as part of a scientist’s normal computational workflow.
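
To make the idea of a provenance record concrete, here is a minimal Python sketch. It is not the project's implementation: the function names (`run_with_provenance`, `strip_blank_lines`, `uppercase`), the record fields, and the sample data are all hypothetical and chosen only for illustration. It assumes that a record holds a content hash of the input, the name of the procedure applied, and a content hash of the output.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Return a content hash that identifies the exact bytes."""
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(procedure, input_data: bytes, log: list) -> bytes:
    """Apply `procedure` to `input_data` and append one provenance record to `log`.

    The record layout is an assumption made for this sketch.
    """
    output_data = procedure(input_data)
    log.append({
        "procedure": procedure.__name__,
        "input_sha256": sha256_hex(input_data),
        "output_sha256": sha256_hex(output_data),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })
    return output_data


# Two hypothetical analysis steps.
def strip_blank_lines(data: bytes) -> bytes:
    return b"\n".join(line for line in data.split(b"\n") if line.strip())


def uppercase(data: bytes) -> bytes:
    return data.upper()


provenance_log = []
raw = b"site,temp\n\na,12.5\nb,13.1\n"  # made-up sample input
cleaned = run_with_provenance(strip_blank_lines, raw, provenance_log)
final = run_with_provenance(uppercase, cleaned, provenance_log)
```

Because the output hash of one step equals the input hash of the next, the log can be followed backwards from `final` to `raw`. That chain is the history the abstract describes. In this sketch the caller has to wrap each step by hand, which is exactly the burden that transparent, automatic capture is meant to remove.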

This project includes the design, development, deployment, and evaluation of an end-to-end system (eeProv) that encompasses the range of activity from original data analysis by domain scientists to management and analysis of the resulting provenance in a common framework with common tools. The tools that we are integrating to provide end-to-end provenance include programming tools (R and Python), databases (using the Core Provenance Library), and operating system artifacts (using CamFlow). Together they allow us to track the provenance of data both within individual data analysis tools and as the results of one tool flow to others.

Funding

NSF Award #1450356
