figshare
5 files:

  • ar.tar.gz (12.88 GB)
  • br.tar.gz (11.79 GB)
  • id.tar.gz (5.95 GB)
  • ra.tar.gz (10.44 GB)
  • re.tar.gz (2.15 GB)

OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information

Version 7 2024-06-17, 13:33
Version 6 2024-04-06, 20:33
Version 5 2023-10-25, 22:32
Version 4 2023-06-28, 13:00
Version 3 2023-02-15, 12:50
Version 2 2022-12-20, 18:27
Version 1 2022-12-19, 13:34
Dataset posted on 2024-06-17, 13:33, authored by OpenCitations

Compared to the previous version, this release includes metadata related to citing and cited bibliographic resources added in the March 2024 version of Crossref.

This dataset contains all the bibliographic metadata included in OpenCitations Meta, together with its provenance information (in JSON-LD format). The data and the provenance are organized in a nested hierarchy of folders and subfolders, so that any entity can be located quickly from its URI. The first level consists of the following folders, each provided as a separate compressed archive:

  • ar: agent roles
  • br: bibliographic resources
  • id: identifiers
  • ra: responsible agents
  • re: resource embodiments

The inner folders are named after the supplier prefix of the entities they contain. The supplier prefix identifies the index an entity belongs to (e.g., OpenCitations Meta prefixes follow the pattern 06*0, as in 06250).
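To find the right folder for an entity, its identifier first has to be split into the class folder, the supplier prefix and the sequential number. The sketch below does this with a regular expression. It assumes, as our own reading of the 06*0 notation, that a Meta supplier prefix is "06", then zero or more digits 1 to 9, then "0", and that the identifier ends in `<class>/<prefix><number>`; the names `META_ID` and `split_meta_id` are hypothetical.

```python
import re

# Assumption: an OpenCitations Meta identifier ends in "<class>/<prefix><number>",
# where <class> is one of the five first-level folders and <prefix> is "06",
# then zero or more digits 1-9, then "0" (our reading of the "06*0" pattern).
META_ID = re.compile(
    r"(?:^|/)(?P<short>ar|br|id|ra|re)/(?P<prefix>06[1-9]*0)(?P<number>[1-9][0-9]*)$"
)


def split_meta_id(identifier: str) -> tuple[str, str, int]:
    """Split an identifier into (class folder, supplier prefix, sequential number)."""
    match = META_ID.search(identifier)
    if match is None:
        raise ValueError(f"not an OpenCitations Meta identifier: {identifier!r}")
    return match["short"], match["prefix"], int(match["number"])
```

Because the prefix must end at the first "0" after the leading "06" and its digits 1 to 9, the split is unambiguous under this assumption; anything that does not fit the pattern raises `ValueError` instead of guessing.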

Below the supplier-prefix level, folders have numeric names that indicate the range of entities they contain: for example, the 10000 folder contains entities 1 to 10000. Inside, you can find the zipped RDF data.

At the same level, additional folders holding the provenance are named using the same criteria: for example, the 1000 folder holds the provenance of entities 1 to 1000. Within it, the provenance is stored in a subfolder called prov, also as zipped JSON-LD.

For example, the data for a bibliographic resource with supplier prefix 06250 and a sequential number between 1 and 1000 is located in /br/06250/10000/1000/1000.zip, while its provenance is in /br/06250/10000/1000/prov/1000.zip.
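The example paths can be derived from an entity's class folder, supplier prefix and sequential number by rounding the number up to the next multiple of each range size. The sketch below assumes range sizes of 10000 (outer folder) and 1000 (inner folder and file), as in the example, and reproduces the example layout literally; `entity_paths`, `dir_split` and `items_per_file` are hypothetical names. If the unpacked archives turn out to be arranged differently, only the final format strings need to change.

```python
def entity_paths(short_name: str, prefix: str, number: int,
                 dir_split: int = 10_000,
                 items_per_file: int = 1_000) -> tuple[str, str]:
    """Return (data_zip, prov_zip) for an entity, following the example layout.

    Assumption: the range sizes are 10000 and 1000, and data and provenance
    sit at <class>/<prefix>/<folder>/<chunk>/<chunk>.zip and
    <class>/<prefix>/<folder>/<chunk>/prov/<chunk>.zip respectively.
    """
    if number < 1:
        raise ValueError("entity numbers start at 1")
    # Round up to the next multiple of each range size:
    # 1..10000 -> 10000, 10001..20000 -> 20000; 1..1000 -> 1000, 1001..2000 -> 2000.
    folder = ((number - 1) // dir_split + 1) * dir_split
    chunk = ((number - 1) // items_per_file + 1) * items_per_file
    base = f"/{short_name}/{prefix}/{folder}/{chunk}"
    return f"{base}/{chunk}.zip", f"{base}/prov/{chunk}.zip"
```

For instance, `entity_paths("br", "06250", 1)` yields the two example paths given above, while entity number 1001 moves to the 2000 subfolder of the same 10000 folder.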

This version of the dataset contains:

  • 116,605,079 bibliographic entities
  • 348,844,164 authors and 2,561,339 editors (counted by their roles, without disambiguating individuals)
  • 724,563 publication venues
  • 242,362 publishers

The tar.gz archives total 44 GB compressed and expand to 145 GB when decompressed. The JSON-LD files inside the archives are further compressed in zip format. It is recommended to process these inner files directly in compressed form, without extracting them, to handle the data more efficiently.
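One way to follow this recommendation in Python is to stream each tar.gz archive with the standard `tarfile` module and open every inner zip in memory with `zipfile`, so nothing is written to disk. This is a minimal sketch, not an official OpenCitations tool: `iter_jsonld` is a hypothetical name, and it assumes that every file inside the inner zips is a JSON-LD document and that each single inner zip fits comfortably in memory.

```python
import io
import json
import tarfile
import zipfile
from typing import Any, Iterator


def iter_jsonld(archive_path: str) -> Iterator[tuple[str, str, Any]]:
    """Yield (zip path in the tar, file name in the zip, parsed JSON) for every
    zipped JSON-LD file in a .tar.gz archive, without extracting to disk."""
    # "r|gz" reads the gzip-compressed tar as a forward-only stream,
    # so the archive is never decompressed as a whole.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".zip")):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # ZipFile needs a seekable file, so buffer only this inner zip in memory.
            with zipfile.ZipFile(io.BytesIO(fileobj.read())) as zf:
                for name in zf.namelist():
                    if name.endswith("/"):
                        continue  # skip directory entries
                    with zf.open(name) as fh:
                        # Assumption: every inner file is a JSON-LD document.
                        yield member.name, name, json.load(fh)
```

A typical loop is `for zip_path, name, doc in iter_jsonld("br.tar.gz"): ...`. Streaming mode trades random access for low memory use: members must be consumed in archive order, which suits a full pass over the dataset.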

Additional information about OpenCitations Meta is available on its official webpage.


Funding

OpenAIRE-Nexus Scholarly Communication Services for EOSC users

European Commission

