Publishing without Publishers: A Decentralized Server Network for Scientific Data
We propose a server network based on nanopublications and trusty URIs for publishing, retrieving, and reusing semantic data.
There are currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, even though such datasets have become increasingly important for science. To solve this problem, we propose to design scientific data publishing as a Web-based, bottom-up process, without top-down control by central authorities such as publishing companies. We present a protocol and a server network for the decentralized storage and archiving of data in the form of nanopublications, an RDF-based format for representing scientific data with formal semantics. We show how this approach allows researchers to produce, publish, retrieve, address, verify, and recombine datasets and their individual nanopublications. Because trusty URIs include cryptographic hash values of the content they represent, all content in the network is verifiable and immutable. Our evaluation of the current small network shows that this system is efficient and reliable, and we discuss how it could grow to handle the large amounts of structured data that modern science is producing and consuming. We believe that this network can serve as a solid basis for semantic publishing and could contribute to improving the availability and reproducibility of scientific results.
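The verification idea behind hash-bearing identifiers can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the trusty URI specification: it hashes raw bytes with SHA-256 and skips the RDF normalization a real implementation would need. The names `make_uri`, `verify`, and `fetch_verified`, the `http://example.org/np/` prefix, and the dictionaries standing in for servers are all hypothetical.

```python
import base64
import hashlib
from typing import Dict, List, Optional


def make_uri(base: str, content: bytes) -> str:
    """Append a URL-safe SHA-256 digest of the content to a base URI."""
    digest = hashlib.sha256(content).digest()
    code = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return base + code


def verify(uri: str, content: bytes) -> bool:
    """Check that the content matches the hash embedded in the URI."""
    base, _, _ = uri.rpartition("/")
    return make_uri(base + "/", content) == uri


def fetch_verified(uri: str, servers: List[Dict[str, bytes]]) -> Optional[bytes]:
    """Return the first copy whose hash matches; skip missing or altered copies.

    Each "server" here is just an in-memory dict from URI to bytes
    (an assumption made to keep the sketch self-contained).
    """
    for server in servers:
        content = server.get(uri)
        if content is not None and verify(uri, content):
            return content
    return None


# Usage: one server holds a tampered copy, another holds the original.
original = b"<s> <p> <o> ."
uri = make_uri("http://example.org/np/", original)
tampered_server = {uri: b"<s> <p> <x> ."}
honest_server = {uri: original}
result = fetch_verified(uri, [tampered_server, honest_server])
```

Because the identifier itself carries the hash, a client in this sketch does not need to trust any particular server: any copy that fails the check is simply ignored, and changing the content necessarily changes the identifier.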