semsacc-scientometrics2019.zip (6.79 MB)

Material of the article "The practice of self-citations: a longitudinal study"

Version 6 2019-08-10, 14:47
Version 5 2018-08-05, 07:10
Version 4 2018-08-05, 07:09
Version 3 2018-08-05, 07:08
Version 2 2018-08-05, 07:06
Version 1 2018-08-05, 07:06
dataset
posted on 2019-08-10, 14:47 authored by Silvio Peroni
This package contains the materials, data, and results of the experiments introduced in the article "The practice of self-citations: a longitudinal study" by Silvio Peroni, Francesco Poggi, Andrea Giovanni Nuzzolese, Paolo Ciancarini, Aldo Gangemi, and Valentina Presutti, submitted to Scientometrics.

In particular, it contains:

1. A README.txt file (this file).

2. The directory "data", which contains the original data used for the experiments. In particular, it contains two CSV files called "author-self-citations.csv" and "author-network-self-citations.csv". The first file counts all the author self-citations (i.e. those where the citing article and the cited article share at least one author) for each of the articles analysed. The second file counts all the author network self-citations (i.e. those where a co-author of any author of the citing article is also an author of the cited article) for the same set of articles. Both files share the same tabular structure:
- "id" is the local identifier of the article in consideration;
- "year" is the year of publication of the article;
- "category" is the discipline to which the article belongs;
- "citation" is the number of bibliographic references in its reference list, i.e. the citations that it makes to other works;
- "self" is the number of bibliographic references that denote a self-citation.
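
As an illustration of how the columns above can be used, the following minimal sketch aggregates, for each discipline and for each period (up to 2012, after 2012), the share of references that are self-citations. It is not one of the scripts shipped in this package: the sample rows are invented, the function name "self_citation_share" is hypothetical, and the real files are assumed to be comma-separated with a header row.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows: the values are invented for illustration and
# only mirror the column layout described above.
SAMPLE = """id,year,category,citation,self
a1,2010,Physics,20,2
a2,2014,Physics,30,6
a3,2011,Computer Science,10,0
a4,2015,Computer Science,25,5
"""


def self_citation_share(rows):
    """Return {(category, period): self-citations / references}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [self-citations, references]
    for row in rows:
        period = "<=2012" if int(row["year"]) <= 2012 else ">2012"
        key = (row["category"], period)
        totals[key][0] += int(row["self"])
        totals[key][1] += int(row["citation"])
    # Skip groups with no references at all to avoid dividing by zero.
    return {key: s / c for key, (s, c) in totals.items() if c > 0}


# To run this on the real data, replace io.StringIO(SAMPLE) with
# open("data/author-self-citations.csv", newline="")
# (comma delimiter and header row are assumptions).
shares = self_citation_share(csv.DictReader(io.StringIO(SAMPLE)))
for key, share in sorted(shares.items()):
    print(key, round(share, 3))
```

Note that this share is a different statistic from the per-article mean reported in the directory "evaluation"; it is shown only to clarify the meaning of the "citation" and "self" columns.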

3. The directory "evaluation", which contains several CSV files and images describing, for each discipline considered, the difference in the means of the number of self-citations before and after 2012. In particular, the sub-directory "author-self-citations" contains two CSV files, "author-self-citations-1957-2016.csv" and "author-self-citations-2009-2016.csv", which contain the aforementioned data about author self-citations for the 1957-2016 and the 2009-2016 publication windows respectively, together with the related diagram showing the confidence intervals computed in both cases. Similarly, the sub-directory "author-network-self-citations" contains two other CSV files and a diagram that describe the same information for author network self-citations. The data in the CSV files are structured as follows:
- "category" is the discipline in consideration;
- "# p[year <= 2012]" is the number of articles published by 2012;
- "mean p[year <= 2012]" is the mean of self-citations per article, considering those ones published by 2012;
- "st p[year <= 2012]" is the standard deviation of the previous mean;
- "# p[year > 2012]" is the number of articles published after 2012;
- "mean p[year > 2012]" is the mean of self-citations per article, considering those ones published after 2012;
- "st p[year > 2012]" is the standard deviation of the previous mean;
- "diff" is the difference between the two means;
- "ci-low" is the lower confidence interval limit (margin of error) of the previous difference;
- "ci-high" is the higher confidence interval limit of the previous difference.
All these CSV files and diagrams are also accompanied by additional figures we used for the study.
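
To make the columns above concrete, the following sketch computes the same kind of row for one discipline from two lists of per-article self-citation counts. It is an illustration under stated assumptions, not the method implemented in "analyse_data.py": this README does not say how the confidence interval is computed, so the sketch assumes a 95% normal-approximation interval with unpooled variances, and it assumes that "diff" is the mean after 2012 minus the mean up to 2012. The function name "mean_difference_ci" and the input values are hypothetical.

```python
import math
import statistics


def mean_difference_ci(before, after, z=1.96):
    """Difference of means (after - before) with an approximate interval.

    Assumptions: sample standard deviation, unpooled variances, and a
    normal approximation (z = 1.96 for roughly 95% coverage).
    """
    m1, m2 = statistics.mean(before), statistics.mean(after)
    s1, s2 = statistics.stdev(before), statistics.stdev(after)
    diff = m2 - m1
    margin = z * math.sqrt(s1 ** 2 / len(before) + s2 ** 2 / len(after))
    return {
        "# p[year <= 2012]": len(before),
        "mean p[year <= 2012]": m1,
        "st p[year <= 2012]": s1,
        "# p[year > 2012]": len(after),
        "mean p[year > 2012]": m2,
        "st p[year > 2012]": s2,
        "diff": diff,
        "ci-low": diff - margin,
        "ci-high": diff + margin,
    }


# Invented self-citation counts per article for a single discipline.
row = mean_difference_ci(before=[1, 2, 3, 2, 2], after=[4, 5, 6, 5, 5])
for name, value in row.items():
    print(name, value)

# An interval that does not contain zero suggests a difference in means
# at the chosen confidence level.
excludes_zero = row["ci-low"] > 0 or row["ci-high"] < 0
```

With this reading, a discipline whose "ci-low" and "ci-high" have the same sign is one where the mean number of self-citations per article differs between the two periods.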

4. The directory "script" that contains three Python scripts that have been used for calculating the aforementioned results and diagrams. In particular:
- "analyse_data.py" has been used to create the CSV files contained in the directory "evaluation" starting from the information contained in the directory "data";
- "error_diagram.py" has been used to create the two diagrams included in the directory "evaluation";
- "analyse_pandas.py" has been used to create all the additional figures included in the directory "evaluation".

All the documents and data in this package are released with a CC0 waiver (https://creativecommons.org/publicdomain/zero/1.0/legalcode), while the Python scripts are licensed with an ISC Licence (https://opensource.org/licenses/ISC).

Funding

Partially funded by ANVUR
