This collection contains a classification of PubMed publications from 1995 onwards. The classification was created by clustering (community detection) in a citation network.

The classification was obtained essentially by following the procedure in Waltman and van Eck (2012), using the Leiden algorithm (Traag et al., 2019). Citation relations are from the NIH Open Citation Collection (OCC; iCite, 2019). Note that the coverage of the OCC, and thereby of the classification, is better for recent years. At the time of the first version of this collection (Feb 2021), the OCC coverage of PubMed articles was about 94% for 2019 and about 88% for 2000.

The classification is hierarchical and has four levels. The most granular level corresponds to research topics. Topics are grouped into research specialties, specialties into disciplines, and disciplines into broad research areas. Each publication belongs to exactly one class at each level. The granularity at the topic and specialty levels has been set to approximately match the class sizes obtained in Sjögårde and Ahlgren (2018) and Sjögårde and Ahlgren (2020), respectively.
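
The hierarchy described above can be sketched as a chain of parent links. This is a minimal illustration only: the identifiers (topic_1, pmid_A, and so on), the level name "area" and the dictionary layout are hypothetical and do not reflect the actual file format of the collection.

```python
from typing import Dict, Optional

# Levels ordered from most granular to broadest (names are illustrative).
LEVELS = ["topic", "specialty", "discipline", "area"]

# Hypothetical class identifiers; each class points to its single parent.
PARENT: Dict[str, Optional[str]] = {
    "topic_1": "specialty_1",
    "topic_2": "specialty_1",
    "specialty_1": "discipline_1",
    "discipline_1": "area_1",
    "area_1": None,
}

# Each publication is assigned to exactly one topic (hypothetical IDs).
PUBLICATION_TOPIC: Dict[str, str] = {
    "pmid_A": "topic_1",
    "pmid_B": "topic_2",
}


def classes_for(pub_id: str) -> Dict[str, str]:
    """Return the single class of a publication at each of the four levels."""
    result: Dict[str, str] = {}
    current: Optional[str] = PUBLICATION_TOPIC[pub_id]
    for level in LEVELS:
        if current is None:
            raise ValueError(f"hierarchy is incomplete above level {level!r}")
        result[level] = current
        current = PARENT[current]  # walk one step up the hierarchy
    return result


print(classes_for("pmid_A"))
```

Because every class has exactly one parent, assigning a publication to a topic determines its specialty, discipline and broad research area.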

Labels have been created by extracting noun phrases from titles, author keywords, medical subject headings (MeSH), journals and author addresses. The three most relevant terms have been selected by a term-weighting approach and concatenated into a label. The labeling approach is described in Sjögårde et al. (2020).
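
The final step, selecting the three highest-weighted terms and concatenating them, can be sketched as follows. The weighting used here (in-class frequency multiplied by the ratio of in-class share to overall share) is a simple stand-in chosen for illustration, not the weighting of Sjögårde et al. (2020). The function name, the term counts and the background counts are all invented toy values, and the "; " separator is assumed from the example labels in this description.

```python
from collections import Counter
from typing import List


def label_class(class_terms: List[str], background: Counter, top_n: int = 3) -> str:
    """Build a label from the top_n highest-weighted terms of one class.

    Stand-in weighting (an assumption, not the published method):
    weight = in-class count * (share within class / share in all classes).
    """
    in_class = Counter(class_terms)
    total_class = sum(in_class.values())
    total_bg = sum(background.values())

    def weight(term: str) -> float:
        if background[term] == 0:
            raise ValueError(f"term {term!r} is missing from the background counts")
        share_in_class = in_class[term] / total_class
        share_overall = background[term] / total_bg
        return in_class[term] * share_in_class / share_overall

    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(in_class, key=lambda t: (-weight(t), t))
    return "; ".join(ranked[:top_n])


# Toy data: counts are made up purely to exercise the function.
class_terms = (
    ["abdominal muscle"] * 4
    + ["transversus abdominis"] * 3
    + ["core stability"] * 2
    + ["patient"] * 4
    + ["study"] * 3
)
background = Counter({
    "abdominal muscle": 4,
    "transversus abdominis": 3,
    "core stability": 2,
    "patient": 400,
    "study": 300,
})

print(label_class(class_terms, background))
```

Terms that are frequent everywhere, such as "patient" in the toy data, receive low weights and are excluded from the label even though they occur often within the class.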

Example classification of a publication

“Comparison of balance and stabilizing trainings on balance indices in patients suffering from nonspecific chronic low back pain.” Hosseinifar M, Akbari A, Mahdavi M, Rahmati M. J Adv Pharm Technol Res. 2018 Apr-Jun;9(2):44-50. doi: 10.4103/japtr.JAPTR_130_18. PMID: 30131936.

Discipline: pain; low; anesthesiology

Specialty: ergonomics; posture; vibration

Topic: abdominal muscle; transversus abdominis; core stability


Sjögårde, Peter (2021): PubMed Classification. figshare. Collection.
