PubMed classification 202401
The classification contains about 21 million PubMed publications from 1995 onward. It was created by clustering publications in a citation network.
The January 2024 update is a completely new version of the classification, based on new clustering and labeling.
File description
PMID_cluster_relation_[date].csv contains the relation between PMIDs and clusters. Four levels are included:
- Level 1 - Topics - Most granular
- Level 2 - Specialties
- Level 3 - Disciplines
- Level 4 - Discipline group - Most coarse
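As a rough illustration of how the four levels can be read from the relation file, the sketch below counts publications per cluster at each level. The sample rows are invented, and the column names pmid and lev2_cluster_id to lev4_cluster_id are assumptions extrapolated from lev1_cluster_id; check the header of the actual file before relying on them.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for PMID_cluster_relation_[date].csv.
# Column names other than lev1_cluster_id are assumptions.
SAMPLE = """pmid,lev1_cluster_id,lev2_cluster_id,lev3_cluster_id,lev4_cluster_id
101,11,5,2,1
102,11,5,2,1
103,12,5,2,1
104,27,9,3,1
"""


def count_per_level(handle, levels=(1, 2, 3, 4)):
    """Count publications per cluster ID at each classification level."""
    counts = {level: Counter() for level in levels}
    for row in csv.DictReader(handle):
        for level in levels:
            counts[level][row[f"lev{level}_cluster_id"]] += 1
    return counts


counts = count_per_level(io.StringIO(SAMPLE))
for level, counter in counts.items():
    print(f"Level {level}: {dict(counter)}")
```

To run this on the real file, pass an open file handle (for example open(path, newline="")) instead of the in-memory sample.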
Labels
For each level there is a table with labels (e.g. labels_lev1_[date].csv), linked to the clusters by an ID (e.g. lev1_cluster_id).
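A minimal sketch of attaching level 1 labels to PMIDs through lev1_cluster_id might look as follows. The sample data is invented, and the column names pmid and label are assumptions; the real label table may use different or additional columns.

```python
import csv
import io

# Invented samples standing in for the relation and label files.
# The column names "pmid" and "label" are assumptions.
RELATION = """pmid,lev1_cluster_id
101,11
102,11
103,12
"""
LABELS = """lev1_cluster_id,label
11,example topic A
12,example topic B
"""


def load_labels(handle, id_column="lev1_cluster_id", label_column="label"):
    """Build a cluster ID -> label lookup from a label table."""
    return {row[id_column]: row[label_column] for row in csv.DictReader(handle)}


def attach_labels(relation_handle, labels, id_column="lev1_cluster_id"):
    """Return (pmid, label) pairs; label is None if the cluster has no entry."""
    return [
        (row["pmid"], labels.get(row[id_column]))
        for row in csv.DictReader(relation_handle)
    ]


labels = load_labels(io.StringIO(LABELS))
labelled = attach_labels(io.StringIO(RELATION), labels)
for pmid, label in labelled:
    print(pmid, label)
```

The same pattern should apply to the other levels by changing id_column, provided their ID columns follow the same naming scheme.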
Stats
For each level there is a table with statistics (e.g. lev1_stats). The table includes the columns below. For more information about the "Clinical", "Human", "Animal" and "Molecular/Cellular Biology" categories, see https://nih.figshare.com/collections/iCite_Database_Snapshots_NIH_Open_Citation_Collection_/4586573
- p - The number of publications in the cluster in the initial clustering.
- pct_clinical - The proportion of clinical articles.
- sum_clinical - The number of clinical articles.
- pct_human - The average of the fraction of MeSH terms that are in the "Human" category.
- sum_human - The sum of the fractions of MeSH terms that are in the "Human" category.
- pct_animal - The average of the fraction of MeSH terms that are in the "Animal" category.
- sum_animal - The sum of the fractions of MeSH terms that are in the "Animal" category.
- pct_molecular_cellular - The average of the fraction of MeSH terms that are in the "Molecular/Cellular Biology" category.
- sum_molecular_cellular - The sum of the fractions of MeSH terms that are in the "Molecular/Cellular Biology" category.
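To make the relation between the sum_ and pct_ columns concrete, the sketch below aggregates per-article values into cluster-level statistics. It is an illustration only, not the procedure used to build the tables: the Article fields are hypothetical stand-ins for per-article iCite values, and dividing by p (all publications in the cluster) is an assumption, since the actual tables may average only over articles for which iCite data is available.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical per-article values, modeled on the iCite categories.
    is_clinical: bool
    human: float               # fraction of MeSH terms in the "Human" category
    animal: float              # fraction in the "Animal" category
    molecular_cellular: float  # fraction in "Molecular/Cellular Biology"


def cluster_stats(articles):
    """Aggregate per-article values into sum_* and pct_* columns.

    Assumption: averages are taken over all p articles in the cluster.
    """
    p = len(articles)
    if p == 0:
        raise ValueError("cluster has no articles")
    stats = {"p": p}
    stats["sum_clinical"] = sum(1 for a in articles if a.is_clinical)
    stats["pct_clinical"] = stats["sum_clinical"] / p
    for name in ("human", "animal", "molecular_cellular"):
        total = sum(getattr(a, name) for a in articles)
        stats[f"sum_{name}"] = total
        stats[f"pct_{name}"] = total / p
    return stats


# Invented example cluster with four articles.
articles = [
    Article(True, 1.0, 0.0, 0.0),
    Article(False, 0.5, 0.0, 0.5),
    Article(False, 0.0, 0.5, 0.5),
    Article(True, 0.5, 0.5, 0.0),
]
stats = cluster_stats(articles)
print(stats)
```

Under these assumptions each pct_ value is the corresponding sum_ value divided by p, which gives a quick consistency check when reading the stats tables.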
Visualizations: