10.6084/m9.figshare.5230639 Tobias Kuhn Egon Willighagen Chris Evelo Nuria Queralt-Rosinach Emilio Centeno Laura I. Furlong Reliable Granular References to Changing Linked Data Springer Nature 2017 Linked Data Nanopublications RDF identifiers versioning Bioinformatics metadata data interoperability incremental dataset definitions data provenance tools Data provenance decontextualization Linked Data versioning index nanopublications 2017-09-07 16:01:20 Dataset https://springernature.figshare.com/articles/dataset/Reliable_Granular_References_to_Changing_Linked_Data/5230639 This file set contains the Git repository and resulting datasets for the computational analyses used in the associated publication: Reliable Granular References to Changing Linked Data.<br><div><br></div><div>The data is supplied in compressed .zip and .gz formats that can be decompressed with standard compression utilities. The compressed files contain incremental datasets of nanopublications from both DisGeNET and WikiPathways, including TriG RDF graphs for each, along with the Git repository containing scripts, diagrams, background literature, output data and results files.</div><div><br></div><div>Background from associated publication:</div><div><div>Nanopublications are tiny packages of Linked Data that come with provenance and metadata attached. They are also a concept for representing Linked Data in a granular and provenance-aware manner, which has been successfully applied to a number of scientific datasets. We demonstrated in previous work how we can establish reliable and verifiable identifiers for nanopublications and sets thereof. Further adoption of these techniques, however, was probably hindered by the fact that nanopublications can lead to an explosion in the number of triples due to auxiliary information about the structure of each nanopublication and repetitive provenance and metadata.
We demonstrate here that this significant overhead disappears once we take the version history of nanopublication datasets into account, calculate incremental updates, and allow users to deal with the specific subsets they need. We show that the total size and overhead of evolving scientific datasets are reduced, and that typical subsets that researchers use for their analyses can be referenced and retrieved efficiently with optimized precision, persistence, and reliability.</div></div><div><br></div>