figshare

covid19_preprints

Version 58 2021-12-16, 19:36
software
posted on 2021-12-16, 19:36 authored by Nicholas Fraser, Bianca Kramer
<p>This repository contains code used to extract details of preprints related to COVID-19 and visualize their distribution over time. Work by <a href="https://orcid.org/0000-0002-7582-6339" rel="nofollow">Nicholas Fraser</a> and <a href="https://orcid.org/0000-0002-5965-6560" rel="nofollow">Bianca Kramer</a>.</p><p><br></p><p>Preprint data are currently updated weekly. Details of each release can be found in <code>data/metadata.json</code>, where <code>release_date</code> is the date on which data were collected, and <code>sample_date</code> is the cut-off posting date for preprints to be included.</p><p><br></p> <p>Note that this dataset is not exhaustive, but aims to collate information from some of the main sources of preprint metadata.</p><p><br></p><p>The process for collecting preprint metadata is documented fully <a href="https://github.com/nicholasmfraser/covid19_preprints/blob/master/covid19_preprints.md">here</a>. In general terms, preprint metadata are harvested from four main sources:</p> <ul><li> <p>Crossref (using the <a href="https://github.com/ropensci/rcrossref">rcrossref</a> package). All records with the <code>type</code> field defined as <code>posted-content</code> are harvested, as well as records from SSRN (where the <code>type</code> field is instead defined as <code>journal-article</code>). Preprint records are then matched to known preprint repositories based on the <code>institution</code>, <code>publisher</code> and <code>group-title</code> metadata fields.</p></li><li> <p>DataCite (using the <a href="https://github.com/ropensci/rdatacite">rdatacite</a> package). All records with the <code>resourceType</code> field defined as <code>Preprint</code> are harvested. Preprint records are matched to known preprint repositories based on <code>client</code> fields.</p> </li><li> <p>arXiv (using the <a href="https://github.com/ropensci/aRxiv">aRxiv</a> package). 
Records are harvested by searching directly for COVID-19 related keywords in titles or abstracts using the built-in search functionality of the arXiv API.</p> </li><li> <p>RePEc (using the <a href="https://github.com/ropensci/oai">oai</a> package). All record types are initially harvested, and subsequently filtered for those with the <code>Type</code> field defined as <code>preprint</code>.</p> </li></ul> <p>For all sources, preprints are classified as being related to COVID-19 on the basis of keyword matches in their titles or abstracts (where available). The search string is defined as: <code>coronavirus OR covid-19 OR sars-cov OR ncov-2019 OR 2019-ncov OR hcov-19 OR sars-2</code>.</p> <p>In some cases, multiple preprint metadata records are registered for a single preprint (e.g. ChemRxiv registers a new Crossref record for each new version of a preprint). In these cases, only the earliest posted version is included in this dataset. Additionally, some preprints are deposited to multiple preprint repositories; in these cases, all preprint records are included.</p>
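<p>The keyword classification and the earliest-version rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the repository's own code (which relies on the R packages listed above). It assumes case-insensitive substring matching, and the record fields <code>source</code>, <code>preprint_id</code>, <code>posted_date</code>, <code>title</code> and <code>abstract</code> are hypothetical names chosen for the example. The sample records are invented purely to exercise the functions.</p>

```python
import re
from datetime import date

# Search terms copied from the description above. Treating them as
# case-insensitive substrings is an assumption of this sketch.
COVID_TERMS = [
    "coronavirus", "covid-19", "sars-cov", "ncov-2019",
    "2019-ncov", "hcov-19", "sars-2",
]
COVID_PATTERN = re.compile(
    "|".join(re.escape(term) for term in COVID_TERMS), re.IGNORECASE
)


def is_covid_related(title, abstract=None):
    """Return True if any search term occurs in the title or abstract."""
    text = " ".join(part for part in (title, abstract) if part)
    return COVID_PATTERN.search(text) is not None


def keep_earliest_version(records):
    """Keep, per (source, preprint_id), only the earliest posted record.

    Records for the same preprint in different repositories have different
    keys, so all of them are retained.
    """
    earliest = {}
    for rec in records:
        key = (rec["source"], rec["preprint_id"])  # hypothetical field names
        if key not in earliest or rec["posted_date"] < earliest[key]["posted_date"]:
            earliest[key] = rec
    return list(earliest.values())


# Invented example records (not taken from the dataset).
records = [
    {"source": "ChemRxiv", "preprint_id": "a1", "posted_date": date(2020, 4, 2),
     "title": "Docking study of SARS-CoV-2 protease", "abstract": None},
    {"source": "ChemRxiv", "preprint_id": "a1", "posted_date": date(2020, 3, 20),
     "title": "Docking study of SARS-CoV-2 protease", "abstract": None},
    {"source": "arXiv", "preprint_id": "b7", "posted_date": date(2020, 5, 1),
     "title": "Graph neural networks for traffic forecasting",
     "abstract": "We propose a new model."},
]

covid_records = [r for r in records if is_covid_related(r["title"], r["abstract"])]
deduplicated = keep_earliest_version(covid_records)
```

<p>Keying on the repository as well as the identifier is one way to honour both rules at once: repeated versions within a repository collapse to the earliest record, while deposits of the same preprint in several repositories all remain.</p>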
