BIP! DB: A Dataset of Impact Measures for Scientific Publications

dataset

posted on 2021-10-04, 16:11 authored by Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi

This dataset contains impact measures (metrics/indicators) for ~119M scientific articles. In particular, for each article we have calculated the following measures:

Citation count: The total number of citations, reflecting the "influence" (i.e., the total impact) of an article.
Incubation Citation Count (3-year CC): This is a time-restricted version of the citation count in which the time window has the same fixed length for all papers and starts at each paper's publication date, i.e., only citations received within the 3 years following a paper’s publication are counted. This measure can be seen as an indicator of a paper's "impulse", i.e., its initial momentum directly after its publication.
PageRank score: This is a citation-based measure reflecting the "influence" (i.e., the total impact) of an article. It is based on the PageRank¹ network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.
RAM score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the RAM² method and is essentially a citation count in which recent citations are considered more important. This type of “time awareness” alleviates the problems of methods like PageRank, which are biased against recently published articles (new articles need time to receive a “sufficient” number of citations). Hence, RAM is more suitable for capturing the current “hype” of an article.
AttRank score: This is a citation network analysis-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the AttRank³ method. AttRank alleviates PageRank’s bias against recently published papers by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, that explicitly captures researchers’ preference for reading papers that have recently received a lot of attention. This makes it more suitable for capturing the current “hype” of an article.
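
To make the measures above more concrete, the following is a minimal sketch of the citation count, the 3-year CC, a RAM-style score, and PageRank on a toy citation network. Everything in it is an assumption made for illustration: the papers, years, and citations are invented, the 3-year window is treated as inclusive, and the parameter values (gamma = 0.5, alpha = 0.5, current year 2021) are arbitrary and are not the settings used for the released files, which are encoded in the filenames. AttRank is omitted; its configuration is described in the original paper.³

```python
from collections import defaultdict

# Toy data (hypothetical): publication year per paper and (citing, cited) pairs.
pub_year = {"A": 2010, "B": 2012, "C": 2015, "D": 2020}
citations = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "C")]


def citation_count(citations):
    """Total number of citations received by each paper."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        counts[cited] += 1
    return dict(counts)


def incubation_count(citations, pub_year, window=3):
    """Citations received within `window` years of publication (inclusive, by assumption)."""
    counts = defaultdict(int)
    for citing, cited in citations:
        if 0 <= pub_year[citing] - pub_year[cited] <= window:
            counts[cited] += 1
    return dict(counts)


def ram_score(citations, pub_year, current_year, gamma=0.5):
    """Citation count in which a citation made `age` years ago weighs gamma**age."""
    scores = defaultdict(float)
    for citing, cited in citations:
        scores[cited] += gamma ** (current_year - pub_year[citing])
    return dict(scores)


def pagerank(citations, nodes, alpha=0.5, iterations=50):
    """Power iteration; papers that cite nothing spread their score uniformly."""
    n = len(nodes)
    out = defaultdict(list)
    for citing, cited in citations:
        out[citing].append(cited)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(score[v] for v in nodes if not out[v])
        nxt = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                nxt[w] += alpha * score[v] / len(out[v])
        score = nxt
    return score


cc = citation_count(citations)
cc3 = incubation_count(citations, pub_year)
ram = ram_score(citations, pub_year, current_year=2021)
pr = pagerank(citations, list(pub_year))
print(cc, cc3, ram, pr, sep="\n")
```

In this toy network, B and C each receive a single citation, but C's citation is recent, so C obtains a higher RAM-style score than B. This is the kind of difference between "influence" and "popularity" that the measures are meant to capture.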

We provide five compressed CSV files (one for each measure/score provided) where each line follows the format “DOI score”. The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores, see our extensive experimental study⁴; the configuration of AttRank is described in the original paper.³
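
As a rough illustration of how one of these files might be loaded, the sketch below reads “DOI score” lines into (DOI, score) pairs. It rests on assumptions that the description above does not confirm: gzip compression, a whitespace (e.g., tab) delimiter, and no header line. The function name `read_scores`, the sample filename, and the example DOIs are hypothetical; adjust them to the actual files.

```python
import gzip
import os
import tempfile


def read_scores(path, delimiter=None):
    """Yield (doi, score) pairs from a gzip-compressed "DOI score" file.

    delimiter=None splits on any whitespace; pass "," or "\t" if needed.
    Splitting from the right keeps a DOI intact even if it contains the delimiter.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            doi, score = line.rsplit(delimiter, 1)
            yield doi, float(score)


# Self-contained demo on a tiny made-up file (not real BIP! DB data).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample_scores.txt.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as out:
        out.write("10.1000/example.1\t3\n10.1000/example.2\t0.25\n")
    rows = list(read_scores(sample_path))

print(rows)
```

Because the reader is a generator, a full file can be streamed line by line rather than loaded into memory at once, which may matter for files covering on the order of a hundred million articles.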

The data used to produce the citation network on which we calculated the provided measures have been gathered from (a) the OpenCitations’ COCI dataset (Sep-2021 version), (b) a MAG⁵,⁶ snapshot from Jul-2021, and (c) a Crossref snapshot from Jan-2021. The union of all distinct DOI-to-DOI citations that could be found in these sources has been considered (entries without a DOI were omitted).
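
The merging step described above can be sketched as a set union over (citing DOI, cited DOI) pairs. This is only an illustration under stated assumptions, not the pipeline actually used: the DOI normalization (lowercasing, stripping a few common prefixes) is our assumption, the helper names `normalize_doi` and `union_citations` are hypothetical, and the sample records are made up.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common resolver prefixes (assumed normalization)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def union_citations(*sources):
    """Return the set of distinct (citing, cited) DOI pairs across all sources."""
    edges = set()
    for source in sources:
        for citing, cited in source:
            if not citing or not cited:
                continue  # omit entries without a DOI on either side
            edges.add((normalize_doi(citing), normalize_doi(cited)))
    return edges


# Made-up records standing in for the three sources.
coci = [("10.1/a", "10.1/b")]
mag = [("10.1/A", "10.1/B"), ("10.1/c", None)]
crossref = [("https://doi.org/10.1/c", "10.1/a")]

merged = union_citations(coci, mag, crossref)
print(sorted(merged))
```

Using a set means that a citation reported by more than one source is counted once, and the entry with a missing DOI is dropped.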

Note: This is the 6th release of this dataset. You can find the previous releases here: https://doi.org/10.5281/zenodo.4386934

References:

1. L. Page, S. Brin, R. Motwani and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
2. Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.
3. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)
4. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)
5. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839
6. K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045

Our academic search engine, BIP! Finder, is built on top of these data. Note also that all calculated scores are available through BIP! Finder’s API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

More details about BIP! DB can be found in our pre-print:

T. Vergoulis, I. Kanellos, C. Atzori, A. Mannocci, S. Chatzopoulos, S. La Bruzzo, N. Manola, P. Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. arXiv 2021, 2101.12001

We kindly request that any published research that makes use of BIP! DB cite the above article:

Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi. "BIP! DB: A Dataset of Impact Measures for Scientific Publications". arXiv:2101.12001