Krisko-Supek-Genome-Biology-2014_inferring_gene_function_from_evolution_of_codon_bias_in_microbes.xlsx (292.03 kB)

Inferring gene function from evolutionary change in signatures of translation efficiency

Posted on 2015-01-01, 17:18; authored by Fran Supek

Krisko, Copic, Gabaldon, Lehner and Supek. Genome Biol 2014. doi:10.1186/gb-2014-15-3-r44
"Inferring gene function from evolutionary change in signatures of translation efficiency"

Additional file 18. All COG gene families found to be significantly enriched/depleted for
highly expressed (HE) genes across 911 microbial genomes in any of the 24 examined
phenotypes or environments. The HE genes were determined by quantifying the use of
optimal codons, which was compared against the composition of intergenic DNA
using a randomization test.

(Table A) All 200 shown expression-phenotype links have (i) Fisher's exact test
for enrichment P<0.01, (ii) magnitude of enrichment >2x or <0.5x, and
(iii) Random Forest (RF) randomization test for confounding
phenotypes/environments/phylogeny P<0.01. A total of 187 unique COGs are
represented in the list.

The "numOrgs" column shows the number of organisms positive for the given trait,
and the total number of organisms where that trait is defined. For instance,
traits named "Mammalian pathogen: (tissue or organ)" are positive for pathogens
infecting that organ, negative for all other mammalian pathogens,
and undefined for the rest of examined Bacteria and Archaea. In some cases
(e.g. halophiles), many microbes have undefined values due to missing annotations.

For every COG, column "a" is the number of HE genes in microbes positive for the given trait,
"b" is the number of non-HE genes in the same microbes, "c" is the number of HE genes in
microbes where the trait is negative, and "d" is the number of non-HE genes in the negative
organisms. "Enrich" is the enrichment = ( a/(a+b) ) / ( c/(c+d) ).
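
As an illustration of how the "a", "b", "c" and "d" columns relate to "Enrich" and to the Fisher's exact test P-value, here is a minimal Python sketch. The function names are hypothetical, and the two-sided form of the test is an assumption; the description above does not state which alternative hypothesis was used.

```python
from math import comb


def enrichment(a, b, c, d):
    """Ratio of HE-gene fractions, (a/(a+b)) / (c/(c+d)).

    Undefined (raises ZeroDivisionError) if c is 0 or if either
    group of organisms contributes no genes.
    """
    return (a / (a + b)) / (c / (c + d))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: two-sided P-value, summing the probabilities of all
    tables (with the same margins) that are no more likely than the
    observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x HE genes in trait-positive microbes.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Made-up counts for illustration only (not taken from the dataset).
print(enrichment(20, 20, 10, 30))
print(fisher_exact_two_sided(20, 20, 10, 30))
```

Values of "Enrich" above 1 indicate that the COG is more often highly expressed in trait-positive organisms, and values below 1 indicate the opposite.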

(Table B) Same as Table A, but with the threshold for the RF test for confounders
slightly relaxed to P<0.05, yielding 167 additional COG-phenotype links (total: 367).

(Table C) A set of 2386 inferred COG-phenotype links with broader coverage due to relaxed
criteria for magnitude of enrichment (>1.5x or <0.66x, instead of >2x or <0.5x); the criteria
for significance of the enrichment remained the same (P<0.01 in Fisher's exact test).
Additionally, the COG-phenotype links in this table did not have to pass the
RF randomization test for confounders (meaning there is no control for phylogeny,
genome size/GC content, or the inter-correlation of phenotypes).
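
The inclusion criteria for the three tables can be summarized as a small filter. This is a sketch in Python; the function name and arguments are hypothetical, and it only encodes the thresholds stated above. It assumes that the Fisher's exact test P-value, the enrichment, and (optionally) the RF randomization test P-value are already available for a COG-phenotype link.

```python
def qualifying_tables(fisher_p, enrich, rf_p=None):
    """Return the tables ("A", "B", "C") whose stated criteria a link meets.

    fisher_p: Fisher's exact test P-value for the enrichment
    enrich:   enrichment, (a/(a+b)) / (c/(c+d))
    rf_p:     RF randomization test P-value, or None if not computed
    """
    tables = []
    significant = fisher_p < 0.01
    strong = enrich > 2.0 or enrich < 0.5      # Tables A and B
    moderate = enrich > 1.5 or enrich < 0.66   # Table C

    if significant and strong and rf_p is not None:
        if rf_p < 0.01:
            tables.append("A")
        if rf_p < 0.05:
            # Table B contains the Table A links plus the additional ones.
            tables.append("B")
    if significant and moderate:
        # Table C does not require the RF test for confounders.
        tables.append("C")
    return tables


# Made-up values for illustration only.
print(qualifying_tables(0.001, 2.5, rf_p=0.03))
```

Under these thresholds any link meeting the Table A or Table B criteria also meets the Table C criteria, although the description does not say explicitly whether Table C repeats those links.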

 
