
Center for Open Science Preprint Analysis

dataset
posted on 2019-04-24, 02:25 authored by Tom Narock, Evan Goldstein
This dataset contains results from an analysis of 9 Center for Open Science (COS) preprint systems. Data were collected in December 2018 using the Open Science Framework (OSF) Application Programming Interface (API, https://developer.osf.io/). The 9 preprint systems analyzed were eartharxiv, engrxiv, lawarxiv, lissa, marxiv, mindrxiv, paleorxiv, psyarxiv, and socarxiv. These systems were chosen because they met the following three conditions:

1. The service must have at least 100 total manuscripts to enable meaningful statistics.
2. The service must provide English-language manuscripts to enable topic analysis.
3. The manuscripts must be accessible through the OSF API, so that the service can be analyzed programmatically.

Each of the 9 preprint services has its own subdirectory containing two files: a .log file (e.g. eartharxiv.log, engrxiv.log) and a keyword count log file (e.g. eartharxiv_keywords_count.log, engrxiv_keywords_count.log). The former is a delimited text file that uses a semi-colon as the delimiter. Paper titles often contain commas, so separating columns with semi-colons preserves the titles intact. The semi-colon delimited columns are:

identifier; preprint provider; preprint doi; peer review doi; preprint publication date; peer review publication date; title; author list; keyword list

identifier is a unique identifier supplied by the COS preprint system at the time of manuscript submission.
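As a rough illustration of the column layout above, a .log file could be read along the following lines. This is a minimal sketch, not the authors' software: the function name `read_preprint_log` and the underscore-style field names are hypothetical, and it assumes the files have no header row, that no field contains a semi-colon, and that quote characters in titles are literal text. The sample line is invented purely to show the shape of a record.

```python
import csv
from typing import Dict, Iterable, List

# Field names follow the column order given in the description;
# the underscore spelling is an assumption made for this sketch.
COLUMNS = [
    "identifier",
    "preprint_provider",
    "preprint_doi",
    "peer_review_doi",
    "preprint_publication_date",
    "peer_review_publication_date",
    "title",
    "author_list",
    "keyword_list",
]


def read_preprint_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse semi-colon delimited lines into one dict per manuscript."""
    # QUOTE_NONE treats quote characters in titles as ordinary text.
    reader = csv.reader(lines, delimiter=";", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        # Skip blank lines and rows that do not have exactly 9 columns.
        if len(row) != len(COLUMNS):
            continue
        records.append({name: value.strip() for name, value in zip(COLUMNS, row)})
    return records


# Invented sample line (not taken from the dataset).
sample = ["id001;eartharxiv;doi-a;doi-b;date-a;date-b;A title, with a comma;Author One, Author Two;kw1, kw2"]
for record in read_preprint_log(sample):
    print(record["identifier"], "|", record["title"])
```

For a real file, the same function can be passed an open file handle, for example `read_preprint_log(open("eartharxiv/eartharxiv.log", encoding="utf-8"))`, where the path and encoding are assumptions about how the archive is unpacked.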

The keyword count files summarize how many times each COS keyword is used. Each file has two columns, again delimited by a semi-colon: keyword and count.
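The two-column keyword count files can be loaded in a similar way. The sketch below is illustrative only: `read_keyword_counts` is a hypothetical helper, and it assumes the count is the text after the last semi-colon on each line and is a non-negative integer. Lines that do not fit that shape, such as a possible header row, are skipped.

```python
from typing import Dict, Iterable


def read_keyword_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each keyword to its usage count."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST semi-colon so a keyword that itself
        # contains a semi-colon is kept whole.
        keyword, sep, count = line.rpartition(";")
        if not sep or not count.strip().isdigit():
            continue  # not a "keyword;count" line
        counts[keyword.strip()] = int(count)
    return counts


# Invented sample lines (not taken from the dataset).
sample = ["example keyword;12", "another keyword;3"]
counts = read_keyword_counts(sample)
# List keywords from most to least used.
for keyword in sorted(counts, key=counts.get, reverse=True):
    print(keyword, counts[keyword])
```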

RatioData.csv summarizes results returned from the Unpaywall API. It contains preprint-to-postprint ratios for each of the 9 COS preprint systems.

The software used to generate this dataset is available at: https://doi.org/10.5281/zenodo.2649580
