
Number of NIH Funded papers with a link to a dataset - based on Data Citation Corpus

dataset
posted on 2024-08-14, 09:26 authored by Mark Hahnel

We at Digital Science have been looking at the Data Citation Corpus to dig deeper into data citation counts.

The first release of the corpus is based on a seed file that includes data citations from the following sources:

  • Data citations from DataCite and Crossref DOI metadata, via Event Data.
  • Data citations from the CZI Science Knowledge Graph, identified via a Named Entity Recognition model that searches for mentions of datasets in the full text of journal articles and preprints in Europe PMC.

In effect, we are looking at papers that link to a DataCite DOI or an accession number.

By combining this dataset with Dimensions.ai data in Google BigQuery, we were able to add more dimensions to the dataset (pardon the pun), such as funder or institution. Only about 70% of the paper links in the Data Citation Corpus were resolvable DOIs, so the remaining papers could not be matched. This should improve over time.
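
As a rough illustration of the matching step described above, here is a minimal sketch in plain Python. It is not the BigQuery query behind this dataset: the field names (`paper_link`, `doi`, `year`, `funders`), the helper functions, and all sample rows are hypothetical, and the DOI check is only a loose test for DOI-shaped strings, not a test that a DOI actually resolves.

```python
import re
from collections import defaultdict

# Loose check for DOI-shaped strings (an assumption, not an official rule).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(link):
    """Return a lowercase bare DOI, or None if the link is not DOI-shaped."""
    value = link.strip().lower()
    # Drop a leading resolver URL or "doi:" prefix if present.
    value = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", value)
    return value if DOI_PATTERN.match(value) else None


def count_linked_papers(citations, publications, funder):
    """Count distinct papers per year that cite a dataset and list `funder`."""
    by_doi = {p["doi"].lower(): p for p in publications}
    papers_by_year = defaultdict(set)
    for row in citations:
        doi = normalize_doi(row["paper_link"])
        paper = by_doi.get(doi) if doi else None
        if paper and funder in paper["funders"]:
            # A set keeps each paper once, however many datasets it cites.
            papers_by_year[paper["year"]].add(doi)
    return {year: len(dois) for year, dois in sorted(papers_by_year.items())}


# Made-up rows standing in for the corpus and for publication metadata.
citations = [
    {"paper_link": "https://doi.org/10.1234/ABC.1", "dataset": "10.1234/data.1"},
    {"paper_link": "10.1234/abc.1", "dataset": "ACC00001"},
    {"paper_link": "10.1234/abc.2", "dataset": "10.1234/data.2"},
    {"paper_link": "PMC1234567", "dataset": "10.1234/data.3"},  # not a DOI
]
publications = [
    {"doi": "10.1234/abc.1", "year": 2022, "funders": ["NIH"]},
    {"doi": "10.1234/abc.2", "year": 2023, "funders": ["NIH", "NSF"]},
    {"doi": "10.1234/abc.3", "year": 2023, "funders": ["NIH"]},
]

counts = count_linked_papers(citations, publications, "NIH")
print(counts)
```

Paper links that are not DOI-shaped are skipped in this sketch, which mirrors why unresolvable links drop out of the combined counts.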

This allows us to track how well policies such as the NIH open data policy are encouraging links from papers to datasets.
