COCI N-Triples dataset of all the citation data
Dataset posted on 07.09.2020 by OpenCitations
This dataset contains all the citation data (in N-Triples format) included in COCI, released on 6 September 2020. In particular, any citation in the dataset, defined as an individual of the class cito:Citation, includes the following information:
- [citation IRI] the Open Citation Identifier (OCI) for the citation, defined in the final part of the URL identifying the citation (https://w3id.org/oc/index/coci/ci/[OCI]);
- [property "cito:hasCitingEntity"] the citing entity identified by its DOI URL (http://dx.doi.org/[DOI]);
- [property "cito:hasCitedEntity"] the cited entity identified by its DOI URL (http://dx.doi.org/[DOI]);
- [property "cito:hasCitationCreationDate"] the creation date of the citation (i.e. the publication date of the citing entity);
- [property "cito:hasCitationTimeSpan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
- [type "cito:JournalSelfCitation"] it records whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
- [type "cito:AuthorSelfCitation"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).
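As an illustration of how the fields listed above might be read back from the N-Triples data, the following minimal Python sketch groups triples by citation and keys each record by its OCI. Several things here are assumptions rather than facts about the dataset: the sample triples, the OCI and the DOIs are made up; the literal datatypes (xsd:gYearMonth, xsd:duration) are guesses; the "cito:" prefix is expanded to http://purl.org/spar/cito/; and the regular expression handles only IRI subjects with IRI or literal objects, not the full N-Triples grammar.

```python
import re
from collections import defaultdict

# Namespace and prefix constants (the CiTO namespace expansion is an assumption).
CITO = "http://purl.org/spar/cito/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OCI_PREFIX = "https://w3id.org/oc/index/coci/ci/"
DOI_PREFIX = "http://dx.doi.org/"

# Simplified N-Triples pattern: <subject> <predicate> (<iri> | "literal"[^^<type> | @lang]) .
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>|@[A-Za-z-]+)?)\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object) or None if the line does not match."""
    match = TRIPLE.match(line.strip())
    if match is None:
        return None
    subj, pred, obj_iri, obj_lit = match.groups()
    return subj, pred, obj_iri if obj_iri is not None else obj_lit


def collect_citations(lines):
    """Group triples by citation, keyed by the OCI taken from the citation IRI."""
    citations = defaultdict(lambda: {"types": set()})
    for line in lines:
        triple = parse_line(line)
        if triple is None:
            continue
        subj, pred, obj = triple
        if not subj.startswith(OCI_PREFIX):
            continue
        record = citations[subj[len(OCI_PREFIX):]]
        if pred == RDF_TYPE:
            # Class membership, e.g. Citation or JournalSelfCitation.
            if obj.startswith(CITO):
                record["types"].add(obj[len(CITO):])
        elif pred.startswith(CITO):
            # Store DOIs without the resolver prefix; keep other values as-is.
            if obj.startswith(DOI_PREFIX):
                obj = obj[len(DOI_PREFIX):]
            record[pred[len(CITO):]] = obj
    return dict(citations)


# Made-up sample data, for illustration only.
SAMPLE = """\
<https://w3id.org/oc/index/coci/ci/0200101-0200202> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/spar/cito/Citation> .
<https://w3id.org/oc/index/coci/ci/0200101-0200202> <http://purl.org/spar/cito/hasCitingEntity> <http://dx.doi.org/10.1234/citing> .
<https://w3id.org/oc/index/coci/ci/0200101-0200202> <http://purl.org/spar/cito/hasCitedEntity> <http://dx.doi.org/10.1234/cited> .
<https://w3id.org/oc/index/coci/ci/0200101-0200202> <http://purl.org/spar/cito/hasCitationCreationDate> "2019-05"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .
<https://w3id.org/oc/index/coci/ci/0200101-0200202> <http://purl.org/spar/cito/hasCitationTimeSpan> "P1Y2M"^^<http://www.w3.org/2001/XMLSchema#duration> .
<https://w3id.org/oc/index/coci/ci/0200101-0200202> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/spar/cito/JournalSelfCitation> .
"""

citations = collect_citations(SAMPLE.splitlines())
```

In this sketch a self-citation is detected by checking whether "JournalSelfCitation" or "AuthorSelfCitation" appears in a record's "types" set. For real workloads a full RDF parser is likely a safer choice than a hand-written regular expression.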
This version of the dataset contains:
- 733,367,140 citations;
- 59,455,917 bibliographic resources.
The size of the zipped archive is 36.6 GB, while the size of the unzipped N-Triples file is 780 GB.
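With an unzipped size of 780 GB, it may be preferable to read the data directly from the zipped archive, one line at a time, rather than extracting it first. The sketch below shows one way to do that with the Python standard library. The internal layout of the archive is not described here, so the code assumes that the zip directly contains one or more members ending in ".nt"; the function names, the ".nt" suffix and the archive file name in the usage note are all hypothetical. The counting helper also assumes that each citation carries an explicit rdf:type cito:Citation triple written with single spaces between terms.

```python
import io
import zipfile

# Subject-independent fragment of an "rdf:type cito:Citation" triple
# (assumes single spaces between terms and the usual CiTO namespace).
CITATION_TYPE = (
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://purl.org/spar/cito/Citation>"
)


def iter_ntriples_lines(archive, suffix=".nt"):
    """Yield non-empty, non-comment lines from every matching zip member.

    `archive` may be a path or a binary file-like object. Members are
    decompressed as a stream, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.endswith(suffix):
                continue
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line and not line.startswith("#"):
                        yield line


def count_citations(lines):
    """Count lines that declare an individual of the class cito:Citation."""
    return sum(1 for line in lines if CITATION_TYPE in line)
```

A call such as count_citations(iter_ntriples_lines("coci.zip")), where "coci.zip" is a placeholder file name, would then scan the whole archive in a single pass with constant memory use.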