
Does Data Sharing Influence Data Reuse in Biodiversity? A Citation Analysis

Posted on 2018-12-04, 09:46, authored by Nushrat Khan, Mike Thelwall, Kayvan Kousha

Making research data openly accessible promotes reproducibility in science. Previous studies have suggested that articles that publicly share their research data have higher citation rates in the biological and social sciences.

However, research data repositories rarely make information about whether and how their data is reused openly accessible. This study focuses on biodiversity datasets published on the Global Biodiversity Information Facility (GBIF) because research data is frequently reused in this field. GBIF was chosen as the data source because it provides citation counts for datasets, a feature that most repositories do not offer. Metadata for 38,878 datasets was collected through the GBIF API.
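The collection step can be sketched as a paging loop over GBIF's public dataset listing. This is a minimal sketch, not the authors' harvester: it assumes the `https://api.gbif.org/v1/dataset` endpoint with `limit`/`offset` paging and an `endOfRecords` flag in each response, and the helper names `fetch_json` and `iter_gbif_datasets` are hypothetical. It does not retrieve citation counts, and the page size of 200 is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed public GBIF dataset listing endpoint (paged with limit/offset).
GBIF_DATASET_URL = "https://api.gbif.org/v1/dataset"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_gbif_datasets(
    limit: int = 200,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield dataset metadata records page by page until endOfRecords is true."""
    offset = 0
    while True:
        query = urllib.parse.urlencode({"limit": limit, "offset": offset})
        page = fetch(f"{GBIF_DATASET_URL}?{query}")
        results = page.get("results", [])
        yield from results
        # Stop when GBIF signals the last page, or defensively on an empty page.
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)
```

Injecting a different `fetch` function keeps the loop testable offline. Each yielded record is the raw JSON dict returned by GBIF, so fields such as `key`, `title`, `type`, `created` and `modified` can be read from it directly.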

The data shows that biodiversity datasets on GBIF are frequently updated, which is unusual for research data. An analysis of dataset types, citation counts, and creation and update times suggests that citation rates vary across dataset types.
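A comparison of citation rates across dataset types and creation years could look like the sketch below. It assumes each record has already been reduced to a dict with hypothetical keys `type`, `citations` (an integer) and `created` (an ISO-8601 timestamp string); the abstract does not specify which summary statistics were used, so mean and median are illustrative choices.

```python
from collections import defaultdict
from statistics import mean, median


def citations_by_type(datasets):
    """Summarise citation counts (n, mean, median) for each dataset type."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[ds["type"]].append(ds["citations"])
    return {
        dtype: {"n": len(counts), "mean": mean(counts), "median": median(counts)}
        for dtype, counts in groups.items()
    }


def mean_citations_by_year(datasets):
    """Mean citations per dataset, keyed by the year of the 'created' timestamp."""
    groups = defaultdict(list)
    for ds in datasets:
        year = int(ds["created"][:4])  # ISO-8601 strings start with YYYY
        groups[year].append(ds["citations"])
    return {year: mean(counts) for year, counts in sorted(groups.items())}
```

Grouping by creation year is what lets older and newer datasets be compared, since recently created datasets have had less time to accrue citations.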

'Occurrence' datasets, which contain more granular information, have higher citation rates than checklist and metadata-only datasets. Correlation tests also suggest that more frequently updated datasets tend to receive more citations. An analysis of the number of occurrence datasets published between 2007 and 2018 and the citations they received indicates that, as with articles, it takes 2-3 years for datasets to accrue most of their citations.
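The abstract does not name the correlation test used. Because citation counts are typically highly skewed, a rank-based measure such as Spearman's rho is one reasonable choice; the standard-library sketch below computes it as the Pearson correlation of tie-averaged ranks. The update-frequency input (for example, updates per year for each dataset) would be a hypothetical precomputed variable.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))
```

If SciPy is available, `scipy.stats.spearmanr` returns the same coefficient together with a p-value.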

Furthermore, an analysis of dataset titles suggests that some regions appear more frequently than others; frequently occurring region terms include China, Brazil, Atlantic, Australia and India.
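One simple way to surface frequently mentioned regions is to count how many titles each word appears in. The sketch below is a hypothetical illustration, not the study's text-analysis method: the stop-word list is deliberately tiny, and multi-word place names would need n-grams.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "from", "for", "on", "at", "to"}


def title_word_frequencies(titles):
    """Count in how many titles each non-stop word appears (case-insensitive)."""
    counts = Counter()
    for title in titles:
        # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
        words = set(re.findall(r"[^\W\d_]+", title.lower()))
        counts.update(words - STOPWORDS)
    return counts
```

Counting each word at most once per title measures how many datasets mention a place, rather than how often a single verbose title repeats it.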

The results suggest that data reuse and data citation are common in biodiversity research, and that richer, regularly maintained datasets attract more citations. Therefore, including citation counts for datasets in repositories can help reveal how data citation practices differ across fields and whether citation evidence can be used to promote the impact of research data.