Folding the human proteome using BioNeMo: A fused dataset of structural models for machine learning applications.

This dataset contains predicted protein structures for 42,042 distinct human proteins derived from the UniProt reference proteome UP000005640. It was generated by combining the state-of-the-art modeling tools AlphaFold 2, OpenFold, and ESMFold, provided within NVIDIA’s BioNeMo platform, with homology modeling using Innophore’s CavitomiX platform. The dataset is offered in both unedited and edited formats to suit diverse research requirements. The unedited version contains the structures as generated by the different prediction methods, whereas the edited version contains refined variants, including a set of structures with low prediction-confidence regions removed and a set of structures in complex with ligands predicted from homologs in the PDB.
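
As an illustration of what removing low prediction-confidence regions can look like in practice, the following is a minimal sketch and not the procedure used to build this dataset. It assumes PDB-format input in which the per-residue confidence score (pLDDT, on a 0 to 100 scale) is stored in the B-factor column, as AlphaFold 2 output does by convention. The function name `strip_low_confidence` and the cutoff of 70 are hypothetical choices made for this example; the cutoff actually applied to the edited dataset is not stated here.

```python
def strip_low_confidence(pdb_text: str, min_plddt: float = 70.0) -> str:
    """Return PDB text without atoms whose B-factor (assumed pLDDT) is below min_plddt.

    Assumptions (not taken from the dataset description):
    - the input follows the fixed-column PDB format,
    - columns 61-66 of ATOM/HETATM records hold pLDDT on a 0-100 scale,
    - 70.0 is only an illustrative default cutoff.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Columns 61-66 (1-based) are the temperature-factor field.
            plddt = float(line[60:66])
            if plddt < min_plddt:
                continue  # drop low-confidence atoms
        # Non-coordinate records (HEADER, TER, END, ...) pass through unchanged.
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python strip_low_confidence.py model.pdb > model_trimmed.pdb
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(strip_low_confidence(handle.read()))
```

This sketch filters atom by atom, which is equivalent to filtering by residue only when all atoms of a residue share one score. Models whose confidence is stored on a 0 to 1 scale would need the cutoff rescaled accordingly.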


