Machine reasoning about phenotypes: enhancing expert knowledge about the genetics of a fossil transition

<div>The Devonian transition from aquatic fins to terrestrial limbs in tetrapodomorph vertebrates is well studied in the fossil record, and the genes responsible for the complex suite of anatomical changes have been the subject of much speculation. A recent review by Mastick and Mabee (MM) found evidence for 162 fin-limb candidate genes in the evo-devo literature. As a test case for the usefulness of machine reasoning about phenotypes, we asked to what extent an expert system would recover the same set of candidate genes using only knowledge about (i) the fin-limb phenotypes of the relevant fossil taxa and (ii) the phenotypes resulting from perturbing individual genes, as catalogued by the relevant model organism (zebrafish, mouse, Xenopus) and human databases. We used the Phenoscape Knowledgebase (kb.phenoscape.org) to compute an information-theoretic measure of semantic similarity between ontologically curated phenotypes, taking it as an indication of the strength of a candidate gene association. The distribution of semantic similarity scores between fossil and gene phenotypes is significantly shifted upward for the MM candidates relative to the non-candidates. To understand why some genes performed counter to expectation, we examine the clustering of candidates and non-candidates within protein interaction networks. Our results demonstrate the potential of machine reasoning to accurately rank the strength of evidence for candidate genes when presented with a large volume of descriptive phenotype information. In principle, this approach could be used to replace, evaluate, and/or enhance candidate gene hypotheses culled from the literature.</div>
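The abstract does not spell out which information-theoretic measure is used. As a minimal sketch, assuming a Resnik-style score (the information content of the most informative common ancestor of two ontology terms) combined across phenotype profiles by a symmetric best-match average, the idea of ranking genes by their phenotypic similarity to a fossil taxon can be illustrated in Python. The toy ontology, the profiles `taxon_A` and `gene_1` to `gene_3`, and all function names are hypothetical stand-ins, not data or code from the Phenoscape Knowledgebase.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology (term -> direct is_a parents); NOT the real
# anatomy/phenotype ontologies used by the Phenoscape Knowledgebase.
PARENTS: Dict[str, Set[str]] = {
    "structure": set(),
    "appendage": {"structure"},
    "fin": {"appendage"},
    "limb": {"appendage"},
    "pectoral_fin": {"fin"},
    "forelimb": {"limb"},
    "skull": {"structure"},
}

# Hypothetical phenotype profiles: one fossil taxon and three genes.
PROFILES: Dict[str, Set[str]] = {
    "taxon_A": {"pectoral_fin"},
    "gene_1": {"forelimb"},
    "gene_2": {"skull"},
    "gene_3": {"pectoral_fin", "skull"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its transitive is_a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(profiles: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(t) = log2(N / n_t), where n_t counts profiles annotated to t or a descendant."""
    n = len(profiles)
    counts = {t: 0 for t in PARENTS}
    for terms in profiles.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    # Terms no profile reaches get no entry; they never appear as common ancestors.
    return {t: math.log2(n / c) for t, c in counts.items() if c > 0}


def resnik(t1: str, t2: str, ic: Dict[str, float]) -> float:
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1) & ancestors(t2)
    return max((ic[t] for t in common), default=0.0)


def best_match_average(a: Set[str], b: Set[str], ic: Dict[str, float]) -> float:
    """Symmetric best-match average of pairwise Resnik scores between two profiles."""
    def one_way(x: Set[str], y: Set[str]) -> float:
        return sum(max(resnik(s, t, ic) for t in y) for s in x) / len(x)

    return (one_way(a, b) + one_way(b, a)) / 2


ic = information_content(PROFILES)
taxon = PROFILES["taxon_A"]
scores = {g: best_match_average(taxon, p, ic) for g, p in PROFILES.items() if g != "taxon_A"}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['gene_3', 'gene_1', 'gene_2']
```

Because information content grows as a term becomes rarer, a shared specific term (here `pectoral_fin`) contributes more than a shared general one (`appendage`), and a match only at the root (`structure`) contributes nothing. This is one plausible way a similarity score can separate strong from weak candidate associations.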