Supplementary material to manuscript: Analyzing data citation practices to the Data Citation Index
Supplementary material to an analysis of data citation practices based on the Data Citation Index (DCI) from Thomson Reuters. This database, launched in 2012, aims to link data sets and data studies with the citations they receive from the rest of the Thomson Reuters citation indexes. Funding bodies and research organizations increasingly require researchers to make their scientific data available in a reusable and reproducible manner, aiming to maximize the return on funding while making the scientific process more transparent. The DCI harvests citations to research data from papers indexed in the Web of Knowledge. Because data citation practices are inconsistent or nonexistent in many cases, it relies on the information provided by the data repositories. The findings of this study show that data citation practices are far from common in most research fields. Researchers in different areas also cite data differently: in Science and in Engineering & Technology, data sets were the most cited record type, whereas in Social Sciences and in Arts & Humanities, data studies play a greater role. Overall, 88.1% of the records have received no citations, although some repositories show very low uncitedness rates. While data citation practices are rare in most fields, they have become established in disciplines such as Crystallography and Genomics. We conclude by emphasizing the role the DCI may play in encouraging consistent and standardized citation of research data, which would make it possible to use the index to trace the research process followed by researchers, from data collection to publication.
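The uncitedness rate mentioned above is the share of records that have received no citations. As a minimal sketch, assuming each record is reduced to a (repository, citation count) pair, the rate per repository could be computed as follows. The function name, the record layout, and the toy values are all hypothetical and are not taken from the supplementary files.

```python
from collections import defaultdict


def uncitedness_by_repository(records):
    """Return {repository: share of its records with zero citations}.

    `records` is an iterable of (repository, citations) pairs. This layout
    is an assumption made for illustration, not the format of the DCI export.
    """
    totals = defaultdict(int)
    uncited = defaultdict(int)
    for repository, citations in records:
        totals[repository] += 1
        if citations == 0:
            uncited[repository] += 1
    return {repo: uncited[repo] / totals[repo] for repo in totals}


# Hypothetical toy records, not real DCI data.
toy_records = [
    ("Repo A", 0), ("Repo A", 0), ("Repo A", 3), ("Repo A", 0),
    ("Repo B", 1), ("Repo B", 2),
]
rates = uncitedness_by_repository(toy_records)
```

With these toy values, three of the four "Repo A" records are uncited, while every "Repo B" record has at least one citation, which mirrors the contrast between highly uncited and rarely uncited repositories described in the abstract.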