PAV lineage - dataset example.pdf (7.65 kB)

Dataset versions and their provenance

Posted on 2014-01-05, 03:25, authored by Stian Soiland-Reyes

Example of using PAV to version datasets, showing the provenance of each individual version.

From the blog post Tracking versions with PAV.

In this example, dataset-1.0.0.csv has been pav:importedFrom survey.xls, i.e. probably saved from Excel (the software can be specified using pav:createdWith). The Excel file was imported from an SPSS survey data file, but in addition had a pav:sourceAccessedAt the survey form (e.g. the creator looked up more descriptive column headers).
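The statements in this paragraph can be written down as subject, property, object triples. The following is a minimal sketch in plain Python, not the notation used in the figure. The file names survey.spv and surveyform.docx are taken from the next paragraph, bare file names stand in for full identifiers, and the helper `sources_of` is a hypothetical name used only for illustration.

```python
# PAV namespace; the property names below are the PAV terms used in the text.
PAV = "http://purl.org/pav/"

# Provenance of dataset-1.0.0.csv as (subject, property, object) triples.
# Assumption: file names stand in for the full identifiers of each resource.
TRIPLES = [
    ("dataset-1.0.0.csv", PAV + "importedFrom", "survey.xls"),
    ("survey.xls", PAV + "importedFrom", "survey.spv"),
    ("survey.xls", PAV + "sourceAccessedAt", "surveyform.docx"),
]


def sources_of(resource, triples=TRIPLES):
    """Return (property local name, source) pairs recorded directly for a resource."""
    return [(p[len(PAV):], o) for s, p, o in triples if s == resource]


if __name__ == "__main__":
    for prop, source in sources_of("survey.xls"):
        print(f"survey.xls pav:{prop} {source}")
```

Keeping pav:importedFrom and pav:sourceAccessedAt as separate properties preserves the distinction the example draws between data that was imported and a source that was merely consulted.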

For dataset-1.1.0.csv we (as humans) can see that the minor version has been incremented and that it has a different provenance: this version was imported from dataset.xlsx, which has been pav:derivedFrom the earlier survey.xls (indicating that the spreadsheet has evolved significantly). The data was imported from a different survey2.spv (which might or might not be related to survey.spv), but still accessed the same surveyform.docx.
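Because each version carries its own provenance, the full lineage of dataset-1.1.0.csv can be recovered by following the source links transitively. This is a rough sketch under assumptions: the triples are a hand transcription of the prose, the pav:sourceAccessedAt statement is attached to dataset.xlsx (the text does not say exactly which resource carries it), and `lineage` is a hypothetical helper name.

```python
# Assumption: only these PAV properties are followed as "source" links.
SOURCE_PROPS = {"pav:importedFrom", "pav:derivedFrom", "pav:sourceAccessedAt"}

# Hand-transcribed statements for dataset-1.1.0.csv and the earlier spreadsheet.
TRIPLES = [
    ("dataset-1.1.0.csv", "pav:importedFrom", "dataset.xlsx"),
    ("dataset.xlsx", "pav:derivedFrom", "survey.xls"),
    ("dataset.xlsx", "pav:importedFrom", "survey2.spv"),
    ("dataset.xlsx", "pav:sourceAccessedAt", "surveyform.docx"),
    ("survey.xls", "pav:importedFrom", "survey.spv"),
    ("survey.xls", "pav:sourceAccessedAt", "surveyform.docx"),
]


def lineage(resource, triples=TRIPLES):
    """Collect every resource reachable from `resource` through source links."""
    seen = []
    stack = [resource]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p in SOURCE_PROPS and o not in seen:
                seen.append(o)
                stack.append(o)
    return seen


if __name__ == "__main__":
    print(sorted(lineage("dataset-1.1.0.csv")))
```

Following pav:derivedFrom reaches survey.xls and, through it, the original survey.spv, even though version 1.1.0 was not imported from either of them directly.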

For dataset-2.0.0.csv the provenance is quite different: this time the scientist has simply used Survey Monkey rather than SPSS to manage their survey, and has published its exported CSV. Presumably this dataset is quite different in its CSV structure and/or the questions asked, as it has gained a new major version to become 2.0.0.
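What a human reads off the file names (a minor increment for 1.1.0, a major one for 2.0.0) can also be checked mechanically. The sketch below assumes the `name-MAJOR.MINOR.PATCH.ext` naming pattern seen in the three example files; `parse_version` and `bump_kind` are hypothetical helpers. In practice the version would more likely be read from a pav:version statement than parsed from a file name.

```python
import re


def parse_version(filename):
    """Extract (major, minor, patch) from a name like 'dataset-1.1.0.csv'."""
    match = re.search(r"-(\d+)\.(\d+)\.(\d+)\.", filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return tuple(int(part) for part in match.groups())


def bump_kind(older, newer):
    """Name the most significant version component that differs."""
    old, new = parse_version(older), parse_version(newer)
    for label, a, b in zip(("major", "minor", "patch"), old, new):
        if a != b:
            return label
    return "none"


if __name__ == "__main__":
    print(bump_kind("dataset-1.0.0.csv", "dataset-1.1.0.csv"))
    print(bump_kind("dataset-1.1.0.csv", "dataset-2.0.0.csv"))
```

The version number alone only signals how large a change is; the provenance statements attached to each version say what actually changed in how the data was obtained.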

 

Figure created using Lucidchart.

