Mondo Disease Ontology: harmonizing disease concepts around the world
Standards exist for describing gene variants (e.g. HGVS), but there is no definitive standard for encoding diseases for information exchange. Existing sources of disease definitions include the National Cancer Institute Thesaurus (NCIt), Online Mendelian Inheritance in Man (OMIM), SNOMED CT, ICD, ICD-O, OncoTree, MedGen, and numerous others. However, these standards partially overlap and often conflict, making it difficult to align knowledge sources, such as those covering variant interpretation or drug responsiveness. This need to integrate information has resulted in a proliferation of mappings between disease entries in different resources; these mappings lack completeness, accuracy, and precision, and are often inconsistent between resources. The UMLS provides intermediate concepts through which other resources can be mapped, but these mappings suffer from the same challenges: they are not guaranteed to be one-to-one, especially in areas with evolving disease concepts such as rare disease. Further, the UMLS is not intended for classification; for example, it contains cycles.
To use our collective knowledge sources computationally for diagnostics and to reveal the underlying mechanisms of diseases, we need to understand which terms are truly equivalent across different resources. This will allow integration of associated information, such as treatments, genetics, and phenotypes. We therefore created the Mondo Disease Ontology to provide a logic-based structure for unifying multiple disease resources. Mondo is built through a combination of algorithmic equivalence determination, using the kBOOM algorithm, and expert curation. Mondo does provide equivalence mappings to other disease resources, but in contrast to other mapping sets, Mondo precisely annotates each mapping using strict semantics, so that we know when two diseases are precisely equivalent or merely closely related. This distinction allows computational integration of associated data.
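As a minimal sketch of why typed mappings matter for data integration, the following Python snippet merges annotations from external resources onto a Mondo term only when the mapping is marked as exactly equivalent. The predicate labels (`exactMatch`, `closeMatch`), the identifiers, the annotation values, and the function name `integrate` are all hypothetical placeholders chosen for illustration; the abstract does not specify the vocabulary or data format Mondo uses.

```python
from collections import defaultdict

# Assumed predicate labels (SKOS-style); the abstract does not name the vocabulary.
EXACT = "exactMatch"
CLOSE = "closeMatch"

# Hypothetical mappings: (mondo_id, predicate, external_id). All IDs are placeholders.
MAPPINGS = [
    ("MONDO:A", EXACT, "OMIM:1"),
    ("MONDO:A", EXACT, "NCIT:7"),
    ("MONDO:A", CLOSE, "ICD:9"),
    ("MONDO:B", EXACT, "OMIM:2"),
]

# Hypothetical annotations (genes, drugs) attached to the external disease entries.
ANNOTATIONS = {
    "OMIM:1": {"gene:G1"},
    "NCIT:7": {"drug:D1"},
    "ICD:9": {"drug:D2"},
    "OMIM:2": {"gene:G2"},
}


def integrate(mappings, annotations, allowed=(EXACT,)):
    """Collect annotations per Mondo term, following only the allowed predicates."""
    merged = defaultdict(set)
    for mondo_id, predicate, external_id in mappings:
        if predicate in allowed:
            merged[mondo_id] |= annotations.get(external_id, set())
    return dict(merged)


if __name__ == "__main__":
    # Strict integration: only precisely equivalent entries contribute data.
    print(integrate(MAPPINGS, ANNOTATIONS))
    # Looser integration: closely related entries are pulled in as well.
    print(integrate(MAPPINGS, ANNOTATIONS, allowed=(EXACT, CLOSE)))
```

With the strict setting, the annotation attached to the merely close entry (`ICD:9`) is left out of `MONDO:A`; an untyped mapping set would offer no way to make that choice.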
Mondo, NCIt and the Human Phenotype Ontology (HPO) are used to structure disease and phenotype descriptions in a number of resources, such as ClinGen, the Monarch Initiative, and the Gabriella Miller Kids First Data Resource, which is a curated database of clinical and genetic sequence data from pediatric patients with structural abnormalities or childhood cancers.