Signatures of gene expression levels in branch lengths of protein evolutionary trees

Poster, posted on 2015-01-13, 19:57, authored by Fran Supek, Marina Marcet-Houben, Toni Gabaldón

Introduction. More than 3500 microbial genomes have been sequenced, and this number is rising rapidly, leading to new insights into pathogenicity and drug resistance, host-microbiome interactions, survival in extreme environments, and other areas.

Microbes may adapt to diverse environments through changes in gene expression, encompassing both changes in basal expression levels and changes in transcriptional regulation. Predicting such changes will prove useful in elucidating various aspects of microbial physiology and ecology.

Motivation, goals. It is known that the expression level of a gene influences the rate of evolutionary change in the corresponding protein: highly expressed proteins evolve more slowly. We aim to systematically investigate whether this correlation is strong enough to be predictive of gene and protein expression levels in practical terms. Furthermore, a tremendous amount of high-throughput phylogenetics data is available, e.g. in PhylomeDB, presenting a unique opportunity for unbiased large-scale screens. We also aim to develop a data-mining methodology for systematic screens for evolutionary signatures in phylogenomics data, in theory applicable to any functional property of genes.

Methods. Reconstructing phylogenetic trees for all proteins across 19 diverse bacterial genomes (phylomes) allowed us to compare the terminal branch lengths in the trees to experimentally measured mRNA levels. In addition, we systematically examined a number of other features extracted from the phylogenetic trees: (A) topological similarity to the consensus tree, (B) tree length, root-to-tips, (C) terminal branch lengths, (D) gene family distribution breadth across genomes, and (E) number and age of duplications. As a baseline, we compared against the prediction accuracy of codon biases, a widely accepted sequence signature of basal expression levels.
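
The comparison of terminal branch lengths to mRNA levels can be illustrated with a rank correlation. The following is only a minimal sketch, not the procedure used for the poster: the gene names, branch lengths, and mRNA values are invented toy data, and the choice of the Spearman coefficient is an assumption made for illustration.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

# Hypothetical toy data (NOT from the poster): terminal branch length of each
# protein in its gene tree, and a measured mRNA level for the same gene.
terminal_branch_length = {
    "geneA": 0.02, "geneB": 0.10, "geneC": 0.35, "geneD": 0.08, "geneE": 0.50,
}
mrna_level = {
    "geneA": 950.0, "geneB": 400.0, "geneC": 30.0, "geneD": 420.0, "geneE": 12.0,
}

genes = sorted(terminal_branch_length)
rho = spearman(
    [terminal_branch_length[g] for g in genes],
    [mrna_level[g] for g in genes],
)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient in such a comparison would be in line with highly expressed proteins sitting on shorter terminal branches; the toy values were chosen to show that direction and carry no empirical weight.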

The predictive power of each set of features was evaluated using Random Forests (ensembles of M5' regression trees in Weka).
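
The idea of scoring a feature set by how well an ensemble of regression trees predicts expression can be sketched as follows. This is a deliberately simplified stand-in, not the Weka setup named above: it bags one-split regression trees (stumps) instead of M5' trees, scores each feature set by the correlation between out-of-bag predictions and observed values (an assumed evaluation protocol), and runs on synthetic data. All function and variable names are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def fit_stump(X, y, feature):
    """Fit a one-split regression tree on one feature by least squares."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        ml, mr = mean(left), mean(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:  # the feature is constant in this bootstrap sample
        m = mean(y)
        return (feature, 0.0, m, m)
    return (feature, best[1], best[2], best[3])

def predict_stump(stump, row):
    feature, threshold, ml, mr = stump
    return ml if row[feature] <= threshold else mr

def oob_predictions(X, y, n_trees=200, seed=0):
    """Average, for every gene, the predictions of stumps that did not train on it."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    votes = [[] for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(sample)
        feature = rng.randrange(n_features)            # random feature choice
        stump = fit_stump([X[i] for i in sample], [y[i] for i in sample], feature)
        for i in range(n):
            if i not in in_bag:
                votes[i].append(predict_stump(stump, X[i]))
    return [mean(v) if v else None for v in votes]

def evaluate(X, y):
    """Score a feature set: correlation of out-of-bag predictions with y."""
    preds = oob_predictions(X, y)
    pairs = [(p, t) for p, t in zip(preds, y) if p is not None]
    return pearson([p for p, _ in pairs], [t for _, t in pairs])

# Synthetic toy data (NOT from the poster): expression falls with branch length.
rng = random.Random(1)
n_genes = 80
branch = [rng.random() for _ in range(n_genes)]
noise_feature = [rng.random() for _ in range(n_genes)]
log_expr = [3.0 - 2.0 * b + rng.gauss(0, 0.2) for b in branch]

feature_sets = {
    "terminal branch length": [[b] for b in branch],
    "uninformative control": [[z] for z in noise_feature],
}
scores = {name: evaluate(X, log_expr) for name, X in feature_sets.items()}
for name, r in scores.items():
    print(f"{name}: out-of-bag r = {r:.2f}")
```

Scoring every feature set with the same procedure is what makes a comparison against a baseline, such as codon biases, meaningful; the uninformative control here only shows what a score looks like when a feature set carries no signal.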

Results. The signatures of evolutionary history at the protein sequence level, as captured by our phylogenetic tree descriptors, predict gene expression as well as the codon biases at the DNA level do. The two sources of expression-related evolutionary signal complement each other to some extent.

A combination of only two sets of phylogenetic features was highly informative: (i) the terminal branch lengths, and (ii) the number and age of duplications. The first finding is consistent with the slower evolutionary rate of highly expressed proteins, while the second might reflect the divergence in expression levels after duplications.

The predictive ability of the tree features varies greatly between the 19 bacterial genomes examined. Additionally, the correlation coefficients between the codon bias and the tree features are themselves correlated.

(poster presented at the ECCB 2012 conference)
