9 files

  • stats_lev2_202401.csv (193.64 kB)
  • stats_lev3_202401.csv (15.67 kB)
  • stats_lev4_202401.csv (2.35 kB)
  • labels_lev1_202401.csv (5.05 MB)
  • labels_lev2_202401.csv (98.66 kB)
  • labels_lev3_202401.csv (6.36 kB)
  • labels_lev4_202401.csv (0.85 kB)
  • PMID_cluster_relation_202401.csv (493.03 MB)
  • stats_lev1_202401.csv (7.6 MB)

PubMed classification 202401

Version 7 2024-02-13, 12:54
Version 6 2023-06-12, 11:19
Version 5 2023-02-21, 07:14
Version 4 2022-09-16, 09:27
Version 3 2022-04-07, 09:09
Version 2 2021-11-25, 13:36
Version 1 2021-10-14, 09:34
Dataset posted on 2024-02-13, 12:54, authored by Peter Sjögårde

The classification contains about 21 million PubMed publications from 1995 onward. It was created by clustering publications in a citation network.

The January 2024 update is a completely new version of the classification, based on new clustering and labeling.

File description

PMID_cluster_relation_[date].csv contains the relation between PMIDs and clusters, that is, the cluster each publication belongs to. Four levels are included:

  • Level 1 - Topics - Most granular
  • Level 2 - Specialties
  • Level 3 - Disciplines
  • Level 4 - Discipline groups - Coarsest
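
As a rough illustration of how the relation file might be read, the sketch below maps each PMID to its cluster ids at the four levels. The column names (pmid and lev1_cluster_id through lev4_cluster_id) are assumptions modeled on the lev1_cluster_id example in this description, and the sample rows are invented for the example, not taken from the dataset. Check the header of the actual file before relying on these names.

```python
import csv
import io

# Toy stand-in for PMID_cluster_relation_[date].csv.
# Column names and all values here are hypothetical.
SAMPLE = """pmid,lev1_cluster_id,lev2_cluster_id,lev3_cluster_id,lev4_cluster_id
1001,17,4,2,1
1002,17,4,2,1
1003,52,9,2,1
"""

# Assumed id columns, ordered from most granular to coarsest.
LEVELS = (
    "lev1_cluster_id",
    "lev2_cluster_id",
    "lev3_cluster_id",
    "lev4_cluster_id",
)


def load_hierarchy(handle):
    """Map each PMID to a tuple of cluster ids, level 1 through level 4."""
    return {
        row["pmid"]: tuple(row[level] for level in LEVELS)
        for row in csv.DictReader(handle)
    }


hierarchy = load_hierarchy(io.StringIO(SAMPLE))
print(hierarchy["1003"])  # ('52', '9', '2', '1')
```

For the real file, an open file handle can be passed instead of the in-memory sample. Since that file is about 493 MB, holding every PMID in a dictionary may use a lot of memory; filtering rows while iterating over the reader is one alternative.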

Labels

For each level there is a table with labels (e.g. labels_lev1_[date].csv), related to the clusters by an id (e.g. lev1_cluster_id).
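
A minimal sketch of attaching labels to publications through the cluster id is shown below. Only lev1_cluster_id is named in this description; the pmid and label column names, and every value in the sample tables, are hypothetical placeholders.

```python
import csv
import io

# Hypothetical stand-ins for labels_lev1_[date].csv and the relation file.
# Only the lev1_cluster_id column name comes from the dataset description.
LABELS = """lev1_cluster_id,label
17,example topic A
52,example topic B
"""
RELATION = """pmid,lev1_cluster_id
1001,17
1003,52
1004,99
"""


def read_labels(handle, id_column="lev1_cluster_id", label_column="label"):
    """Build a lookup from cluster id to label text."""
    return {row[id_column]: row[label_column] for row in csv.DictReader(handle)}


def label_pmids(relation_handle, labels, id_column="lev1_cluster_id"):
    """Yield (pmid, label) pairs; the label is None if the id has no label row."""
    for row in csv.DictReader(relation_handle):
        yield row["pmid"], labels.get(row[id_column])


labels = read_labels(io.StringIO(LABELS))
labelled = list(label_pmids(io.StringIO(RELATION), labels))
print(labelled)
```

The same pattern should apply at levels 2 to 4 by passing a different id_column, assuming those files follow the same naming scheme.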

Stats

For each level there is a table with statistics (e.g. stats_lev1_[date].csv). The table includes the columns below. For more information about the "Clinical", "Human", "Animal" and "Molecular/Cellular Biology" categories, see https://nih.figshare.com/collections/iCite_Database_Snapshots_NIH_Open_Citation_Collection_/4586573

  • p - The number of publications in the cluster in the initial clustering.
  • pct_clinical - The proportion of clinical articles.
  • sum_clinical - The number of clinical articles.
  • pct_human - The average of the fractions of MeSH terms that are in the "Human" category.
  • sum_human - The sum of the fractions of MeSH terms that are in the "Human" category.
  • pct_animal - The average of the fractions of MeSH terms that are in the "Animal" category.
  • sum_animal - The sum of the fractions of MeSH terms that are in the "Animal" category.
  • pct_molecular_cellular - The average of the fractions of MeSH terms that are in the "Molecular/Cellular Biology" category.
  • sum_molecular_cellular - The sum of the fractions of MeSH terms that are in the "Molecular/Cellular Biology" category.
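
To make the difference between the sum_ and pct_ columns concrete, the toy example below computes both from per-publication fractions. It assumes that each publication carries one fraction per category and that the average is taken over all publications in the cluster; the description does not confirm the denominator, nor whether the pct_ columns are stored as fractions or as percentages. The input values are invented.

```python
def summarize(fractions):
    """Return (count, sum, average) for per-publication category fractions."""
    count = len(fractions)
    total = sum(fractions)
    # Guard against an empty cluster to avoid division by zero.
    average = total / count if count else 0.0
    return count, total, average


# Hypothetical cluster of four publications. Each value is the share of a
# publication's MeSH terms that fall in the "Human" category.
human_fractions = [1.0, 0.5, 0.5, 0.0]

p, sum_human, pct_human = summarize(human_fractions)
print(p, sum_human, pct_human)  # 4 2.0 0.5
```

Under these assumptions, the sum column behaves like a fractional count of publications in the category, while the average column describes the typical orientation of the cluster.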

Visualizations:

See the figshare collection for further description.
