COCI N-Triples dataset of the provenance information of all the citation data
Dataset posted on 07.09.2020 by OpenCitations
This dataset contains the provenance information (in N-Triples format) of all the citation data included in COCI, released on 6 September 2020. The size of the zipped archive is 38.7 GB, while the size of the unzipped N-Triples file is 1.6 TB.
Each citation in the dataset includes the following provenance information:
- [citation IRI] the Open Citation Identifier (OCI) for the citation, defined in the final part of the URL identifying the citation (https://w3id.org/oc/index/coci/ci/[OCI]);
- [property "prov:wasAttributedTo"] the IRI of the agent that has created the citation data;
- [property "prov:hadPrimarySource"] the IRI of the source dataset from which the citation data have been extracted;
- [property "prov:generatedAtTime"] the creation time of the citation data.
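Given the size of the unzipped file, reading it line by line is likely more practical than loading it into an RDF store in one pass. The following is a minimal sketch of such a reader, not an official OpenCitations tool. It assumes that each triple has the citation IRI as its subject and that the "prov:" prefix expands to the standard W3C PROV namespace (http://www.w3.org/ns/prov#); both should be checked against the actual file. The sample OCI, agent IRI, and source IRI below are made-up placeholders, not values from the dataset.

```python
import re

# Assumption: "prov:" expands to the standard W3C PROV namespace.
PROV = "http://www.w3.org/ns/prov#"
# Citation IRI prefix as given in the dataset description.
CI_PREFIX = "https://w3id.org/oc/index/coci/ci/"

# One N-Triples statement of the form: <subject> <predicate> object .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
# Lexical form of a literal object, e.g. "2020-09-06T00:00:00"^^<datatype>
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"')


def parse_line(line):
    """Return (oci, property_name, value) for a provenance triple, else None."""
    match = TRIPLE.match(line)
    if match is None:
        return None
    subject, predicate, obj = match.groups()
    # Assumption: the provenance is attached directly to the citation IRI.
    if not subject.startswith(CI_PREFIX) or not predicate.startswith(PROV):
        return None
    if obj.startswith("<") and obj.endswith(">"):
        value = obj[1:-1]  # object is an IRI
    else:
        literal = LITERAL.match(obj)
        value = literal.group(1) if literal else obj
    return subject[len(CI_PREFIX):], predicate[len(PROV):], value


def iter_provenance(lines):
    """Lazily yield parsed provenance triples from an iterable of lines."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            yield parsed


# Hypothetical sample lines; the identifiers are placeholders.
SAMPLE = [
    '<https://w3id.org/oc/index/coci/ci/0123-0456> '
    '<http://www.w3.org/ns/prov#wasAttributedTo> <https://example.org/agent> .',
    '<https://w3id.org/oc/index/coci/ci/0123-0456> '
    '<http://www.w3.org/ns/prov#hadPrimarySource> <https://example.org/source> .',
    '<https://w3id.org/oc/index/coci/ci/0123-0456> '
    '<http://www.w3.org/ns/prov#generatedAtTime> '
    '"2020-09-06T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .',
]

records = list(iter_provenance(SAMPLE))
for oci, prop, value in records:
    print(oci, prop, value)
```

For the real file, an open file object can be passed in place of SAMPLE (for example `with open(path, encoding="utf-8") as f: ...`, where `path` is wherever the unzipped file was saved), so that only one line is held in memory at a time.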
Additional information about COCI can be found on the official webpage.