This collection contains a classification of PubMed publications from 1995 onwards. The classification was created by clustering (community detection) in a citation network.
The classification was obtained essentially by following the procedure of Waltman and van Eck (2012), using the Leiden algorithm (Traag et al., 2019). Citation relations are from the NIH Open Citation Collection (OCC; iCite, 2019). Note that the coverage of the OCC, and thereby of the classification, is better for recent years. At the time of the first version of this collection (Feb 2021), the OCC coverage of PubMed articles was about 94% for 2019 and about 88% for 2000.
The classification is hierarchical and has four levels. The most granular level corresponds to research topics. Topics are grouped into research specialties, specialties into disciplines, and disciplines into broad research areas. Each publication belongs to exactly one class at each level. The granularity at the topic and specialty levels has been set to approximately the class sizes obtained in Sjögårde and Ahlgren (2018) and Sjögårde and Ahlgren (2020), respectively.
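The hierarchy described above can be pictured as one record per publication with one class per level. The following is a minimal sketch, not the format of the distributed data: the field names, the class identifiers (T1, S1, ...) and the PMIDs are hypothetical and chosen only for illustration. It checks that every class has exactly one parent class at the next level up.

```python
from dataclasses import dataclass

# Levels from most granular to broadest (names are assumptions, not the file's column names).
LEVELS = ("topic", "specialty", "discipline", "broad_area")


@dataclass(frozen=True)
class Assignment:
    """One publication with exactly one class at each of the four levels."""
    pmid: int
    topic: str
    specialty: str
    discipline: str
    broad_area: str


def check_hierarchy(assignments):
    """Return True if every class has exactly one parent at the next level up."""
    parent_of = {}
    for a in assignments:
        path = [getattr(a, level) for level in LEVELS]
        # Pair each class with the class one level above it.
        for depth, (child, parent) in enumerate(zip(path, path[1:])):
            if parent_of.setdefault((depth, child), parent) != parent:
                return False
    return True


# Hypothetical identifiers: two topics in the same specialty.
records = [
    Assignment(1, "T1", "S1", "D1", "A1"),
    Assignment(2, "T2", "S1", "D1", "A1"),
]
print(check_hierarchy(records))
```

A record that placed topic T1 under a second specialty would make the check return False, since a topic cannot be grouped into two specialties.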
Labels have been created by extracting noun phrases from titles, author keywords, medical subject headings (MeSH), journals and author addresses. The three most relevant terms have been selected by a term-weighting approach and concatenated into a label. The labeling approach is described in Sjögårde et al. (2021).
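As a rough illustration of how a label can be assembled from weighted terms, the sketch below ranks the terms of one class and joins the top three with "; ". The weighting used here is a generic TF-IDF-style score and is an assumption made for the example; it is not necessarily the weighting evaluated in the labeling paper. The function name, parameters and all counts are hypothetical.

```python
import math
from collections import Counter


def build_label(class_terms, class_freq, n_classes, k=3):
    """Join the k highest-weighted terms of a class into a label.

    class_terms: noun phrases extracted from the publications of one class.
    class_freq:  number of classes in which each term occurs (assumed >= 1).
    n_classes:   total number of classes at this level.
    """
    tf = Counter(class_terms)

    def weight(term):
        # Frequent in this class, rare across classes -> high weight.
        return tf[term] * math.log(n_classes / class_freq[term])

    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-weight(t), t))
    return "; ".join(ranked[:k])


# Made-up counts for illustration only.
terms = (["abdominal muscle"] * 5 + ["transversus abdominis"] * 4
         + ["core stability"] * 3 + ["patient"] * 6)
freq = {"abdominal muscle": 2, "transversus abdominis": 2,
        "core stability": 2, "patient": 100}
print(build_label(terms, freq, n_classes=100))
# -> abdominal muscle; transversus abdominis; core stability
```

In this toy input the term "patient" is the most frequent but occurs in every class, so its weight is zero and it is left out of the label.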
Example of classification of a publication
“Comparison of balance and stabilizing trainings on balance indices in patients suffering from nonspecific chronic low back pain.” Hosseinifar M, Akbari A, Mahdavi M, Rahmati M. J Adv Pharm Technol Res. 2018 Apr-Jun;9(2):44-50. doi: 10.4103/japtr.JAPTR_130_18. PMID: 30131936.
Discipline: pain; low; anesthesiology
Specialty: ergonomics; posture; vibration
Topic: abdominal muscle; transversus abdominis; core stability
iCite, Hutchins, B. I., & Santangelo, G. (2019). ICite Database Snapshots (NIH Open Citation Collection). The NIH Figshare Archive. Collection. https://doi.org/10.35092/yhjc.c.4586573
Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics. Journal of Informetrics, 12(1), 133–152. https://doi.org/10.1016/j.joi.2017.12.006
Sjögårde, P., & Ahlgren, P. (2020). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of specialties. Quantitative Science Studies, 1(1), 207–238. https://doi.org/10.1162/qss_a_00004
Sjögårde, P., Ahlgren, P., & Waltman, L. (2021). Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches. Journal of the Association for Information Science and Technology, 72(7), 853–869. https://doi.org/10.1002/asi.24452
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: Guaranteeing well-connected communities. Scientific Reports, 9(1), 5233. https://doi.org/10.1038/s41598-019-41695-z
Waltman, L., & van Eck, N. J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63(12), 2378–2392. https://doi.org/10.1002/asi.22748