Building a PubMed Knowledge Graph

PubMed is an important resource for the medical domain, but useful concepts are either difficult to extract from it or are ambiguous, which has significantly hindered knowledge discovery. To address this, we constructed a PubMed knowledge graph (PKG): we extracted bio-entities from 29 million PubMed articles, disambiguated author names, and integrated funding data from NIH ExPORTER, authors' affiliation history and educational background from ORCID, and fine-grained affiliation data from MapAffil. Integrating these credible, multi-source data allowed us to create connections among bio-entities, authors, articles, affiliations, and funding.


15.5 GB