OpenCitations Meta CSV dataset of all bibliographic metadata

Download (11.75 GB)
Version 11 2025-02-13, 09:17
Version 10 2025-02-02, 18:02
Version 9 2024-06-17, 13:33
Version 8 2024-04-06, 20:14
Version 7 2024-02-16, 11:36
Version 6 2023-11-30, 12:59
Version 5 2023-10-22, 11:55
Version 4 2023-06-28, 12:48
Version 3 2023-02-15, 12:50
Version 2 2022-12-20, 17:27
Version 1 2022-12-19, 13:35
dataset
posted on 2025-02-13, 09:17 authored by OpenCitations

Compared to the previous version, this release includes metadata related to citing and cited bibliographic resources added in the November 2024 version of Crossref, as well as the November 2024 dump of JaLC (Japan Link Center).

In this version, we have focused on correcting a specific type of error, namely the erroneous duplication of resources with the same identifier. We have successfully merged:

  • 100% of duplicated identifiers (datacite:Identifier)
  • 100% of duplicated responsible agents (foaf:Agent)
  • 70% of duplicated bibliographic resources (fabio:Expression)
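
To illustrate the kind of error described above, the following minimal sketch finds identifiers that are attached to more than one record. It is not the procedure OpenCitations used for the merge; the function name `find_shared_identifiers` and the sample records are hypothetical and invented for illustration.

```python
from collections import defaultdict


def find_shared_identifiers(records):
    """Return identifiers attached to more than one record.

    `records` maps a record key to the list of identifiers it carries.
    The result maps each shared identifier to the sorted record keys.
    """
    owners = defaultdict(set)
    for record_key, identifiers in records.items():
        for identifier in identifiers:
            owners[identifier].add(record_key)
    return {ident: sorted(keys) for ident, keys in owners.items() if len(keys) > 1}


# Hypothetical records, invented for illustration (not taken from the dataset).
records = {
    "br/1": ["doi:10.0000/a"],
    "br/2": ["doi:10.0000/a", "pmid:0000"],
    "br/3": ["doi:10.0000/b"],
}
duplicates = find_shared_identifiers(records)
print(duplicates)  # {'doi:10.0000/a': ['br/1', 'br/2']}
```

Records that share an identifier in this way are candidates for merging into a single entity.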

This dataset contains all the bibliographic metadata (in CSV format) included in OpenCitations Meta. In particular, each line of the CSV file defines a bibliographic resource, and includes the following information:

  • [field "id"] the IDs for the document described within the line;
  • [field "title"] the document's title;
  • [field "author"] the authors of the document;
  • [field "pub_date"] the date of publication;
  • [field "venue"] information about the venue, i.e. the bibliographical resource to which the document belongs;
  • [field "volume"] the volume sequence identifier (e.g. a number) to which the entity belongs;
  • [field "issue"] the issue sequence identifier (e.g. a number) to which the entity belongs;
  • [field "page"] the page range of the resource described in the row;
  • [field "type"] the type of resource described in the row;
  • [field "publisher"] the entity responsible for making the resource available;
  • [field "editor"] the editors of the document.
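
As a rough sketch of how rows with these fields might be read, the example below parses a small in-memory sample with Python's standard `csv` module. The column order, the whitespace separator between multiple IDs in one cell, and the sample row itself are assumptions made for illustration; check them against the actual files before relying on this.

```python
import csv
import io

# Column names taken from the field list above; their order is an assumption.
FIELDS = ["id", "title", "author", "pub_date", "venue", "volume",
          "issue", "page", "type", "publisher", "editor"]

# Hypothetical sample row, invented for illustration (not taken from the dataset).
SAMPLE = (
    ",".join(FIELDS) + "\n"
    '"doi:10.0000/example omid:br/0000","An Example Title","Doe, Jane; Roe, Richard",'
    '"2020-01","Example Journal","12","3","45-67","journal article","Example Press",""\n'
)


def read_records(stream):
    """Yield one dict per CSV row, with the "id" field split into a list."""
    for row in csv.DictReader(stream):
        record = {name: (row.get(name) or "").strip() for name in FIELDS}
        # Assumption: multiple IDs in one cell are separated by whitespace.
        record["id"] = record["id"].split()
        yield record


records = list(read_records(io.StringIO(SAMPLE)))
print(records[0]["id"], records[0]["title"])
```

For a file on disk, the same function could be given a handle opened with `open(path, newline="", encoding="utf-8")`. Because the extracted dataset is large, iterating over the generator row by row avoids loading a whole file into memory.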

This version of the dataset contains:

  • 121,302,680 bibliographic entities
  • 368,061,399 authors, 2,718,222 editors, and 101,612,475 publishers (counted by their roles, without disambiguating individuals)
  • 698,995 publication venues
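
Counting "by role" means that every appearance of an agent in a row counts once, so the same person listed in two rows is counted twice. The sketch below shows that style of counting on invented rows; the `;` separator between agents in one cell and the function name `count_roles` are assumptions, and this is not the script that produced the figures above.

```python
def count_roles(rows):
    """Count author, editor, and publisher occurrences across rows.

    Each occurrence counts once per row, so an agent appearing in two rows
    is counted twice (no disambiguation of individuals).
    """
    totals = {"author": 0, "editor": 0, "publisher": 0}
    for row in rows:
        for role in totals:
            cell = row.get(role, "")
            # Assumption: multiple agents in one cell are separated by ";".
            totals[role] += sum(1 for name in cell.split(";") if name.strip())
    return totals


# Hypothetical rows, invented for illustration (not taken from the dataset).
rows = [
    {"author": "Doe, Jane; Roe, Richard", "editor": "", "publisher": "Example Press"},
    {"author": "Doe, Jane", "editor": "Poe, Pat", "publisher": "Example Press"},
]
totals = count_roles(rows)
print(totals)  # {'author': 3, 'editor': 1, 'publisher': 2}
```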

The compressed dataset is about 12 GB; once extracted, it occupies 48 GB on an ext4 filesystem.

Additional information about OpenCitations Meta is available on the official webpage.

Funding

OpenAIRE-Nexus Scholarly Communication Services for EOSC users

European Commission

GraspOS: next Generation Research Assessment to Promote Open Science

European Commission
