PubMed Classification

Posted on 25.11.2021 - 13:45 by Peter Sjögårde

This collection contains a classification of PubMed publications from 1995 onwards. The classification was created by clustering (community detection) in a citation network.

The classification was obtained largely by following the procedure in Waltman and van Eck (2012), using the Leiden algorithm (Traag et al., 2019). Citation relations come from the NIH Open Citation Collection (OCC; iCite, 2019). Note that the coverage of the OCC, and thereby of the classification, is better for recent years. At the time of the first version of this collection (Feb 2021), the OCC coverage of PubMed articles was about 94% for 2019 and about 88% for 2000.

The classification is hierarchical and has four levels. The most granular level corresponds to research topics. Topics are grouped into research specialties, specialties into disciplines, and disciplines into broad research areas. Each publication belongs to exactly one class at each level. The granularity of topics and of specialties has been set to approximately the class sizes obtained in Sjögårde and Ahlgren (2018) and Sjögårde and Ahlgren (2020), respectively.
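The nesting rule can be checked mechanically. The following is a minimal sketch, not part of the collection: the `Assignment` record and the `check_nesting` helper are hypothetical names, and the record covers only the three levels shown in the example publication on this page (topic, specialty, discipline), since no broad research area is listed for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One publication with its class label at three of the four levels (hypothetical record layout)."""
    pmid: int
    topic: str
    specialty: str
    discipline: str


def check_nesting(assignments):
    """Return True if every topic maps to one specialty and every specialty to one discipline."""
    topic_parent = {}
    specialty_parent = {}
    for a in assignments:
        # setdefault stores the first parent seen; any later mismatch breaks the hierarchy.
        if topic_parent.setdefault(a.topic, a.specialty) != a.specialty:
            return False
        if specialty_parent.setdefault(a.specialty, a.discipline) != a.discipline:
            return False
    return True


# Labels taken from the example publication on this page (PMID 30131936).
example = Assignment(
    pmid=30131936,
    topic="abdominal muscle; transversus abdominis; core stability",
    specialty="ergonomics; posture; vibration",
    discipline="pain; low; anesthesiology",
)
print(check_nesting([example]))
```

A check like this is useful after loading the data from any tabular export, because a topic that appears under two specialties would indicate a join or parsing error rather than a property of the classification.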

Labels have been created by extracting noun phrases from titles, author keywords, medical subject headings (MeSH), journals and author addresses. The three most relevant terms have been selected by a term-weighting approach and concatenated into a label. The labeling approach is described in Sjögårde et al. (2021).
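The final step, picking the three highest-weighted terms and joining them, can be sketched as follows. This is an illustration only: the `make_label` function is a hypothetical helper, the scores are invented for the example, and the term-weighting step itself (which produces the scores) is not reproduced here. The "; " separator follows the labels shown on this page.

```python
def make_label(term_scores, n_terms=3, sep="; "):
    """Join the n_terms highest-scoring terms into a label (ties broken alphabetically)."""
    ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return sep.join(term for term, _ in ranked[:n_terms])


# Invented scores, for illustration only; real weights come from the term-weighting approach.
scores = {
    "core stability": 0.4,
    "abdominal muscle": 0.9,
    "transversus abdominis": 0.7,
    "exercise": 0.1,
}
label = make_label(scores)
print(label)
```

Because labels are concatenations of terms, a label such as "pain; low; anesthesiology" should be read as three separate descriptors rather than as a phrase.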

Example of classification of a publication

“Comparison of balance and stabilizing trainings on balance indices in patients suffering from nonspecific chronic low back pain.” Hosseinifar M, Akbari A, Mahdavi M, Rahmati M. J Adv Pharm Technol Res. 2018 Apr-Jun;9(2):44-50. doi: 10.4103/japtr.JAPTR_130_18. PMID: 30131936.

Discipline: pain; low; anesthesiology

Specialty: ergonomics; posture; vibration

Topic: abdominal muscle; transversus abdominis; core stability


iCite, Hutchins, B. I., & Santangelo, G. (2019). iCite Database Snapshots (NIH Open Citation Collection). The NIH Figshare Archive. Collection.

Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics. Journal of Informetrics, 12(1), 133–152.

Sjögårde, P., & Ahlgren, P. (2020). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of specialties. Quantitative Science Studies, 1(1), 207–238.

Sjögårde, P., Ahlgren, P., & Waltman, L. (2021). Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches. Journal of the Association for Information Science and Technology, 72(7), 853–869.

Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: Guaranteeing well-connected communities. Scientific Reports, 9(1), 5233.

Waltman, L., & van Eck, N. J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63(12), 2378–2392.


Sjögårde, Peter (2021): PubMed Classification. figshare. Collection.