Systematically identify phylogenetic markers at different taxonomic levels for bacteria and archaea

For many kinds of studies it is useful to have a collection of so-called “phylogenetic marker” genes for the taxa of interest. Such genes would ideally be found in most or all of the taxa under study, and phylogenetic trees built from them would produce reliable and robust results. Relatedly, such marker genes would ideally be resistant to issues that cause the phylogeny of a gene family to diverge from species history. Examples of such issues include difficulty in generating alignments, extensive lateral gene transfer, convergent evolution, and highly variable rates of evolution between taxa or over time. We are interested in phylogenetic studies of bacteria and archaea, and we present here an extensive analysis of potential phylogenetic marker genes in these taxa based on available complete genome sequences. We describe a protocol for the automatic identification of phylogenetic markers at different taxonomic levels. The protocol uses rapid searching and clustering algorithms to generate protein families for a selection of genomes. Phylogenetic trees are then built for the families, and clades from the trees are automatically sampled and evaluated for features that make them potentially useful “marker” genes for both phylogenetic analysis (e.g., universality across genomes of interest) and ecological studies (e.g., evenness in copy number). Potential marker families are then further assessed using multiple comparative and phylogenetic analyses. Using this approach, we have identified 114 phylogenetic markers for “all bacteria” (expanding on previous collections of ~30 markers), including 40 markers that cover bacteria and archaea simultaneously. In addition, we have identified hundreds to thousands of markers for individual phyla of bacteria, which should allow much more detailed automated phylogenetic analyses of these groups than previously possible.

 

###############
Markers at Different Taxonomic Levels (HMMs, amino acid sequences, and alignments)

actinobacteria.tgz
alphaproteobacteria.tgz
archaea.tgz
bacteria_and_archaea.tgz
bacteria.tgz
bacteroidetes.tgz
betaproteobacteria.tgz
chlamydiae.tgz
chloroflexi.tgz
cyanobacteria.tgz
deinococcus-thermus.tgz
deltaproteobacteria.tgz
epsilonproteobacteria.tgz
firmicutes.tgz
gammaproteobacteria.tgz
spirochaetes.tgz
thermotogae.tgz
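
The internal layout of these archives is not described on this page, so it can help to list an archive's contents before unpacking it. The following is a minimal sketch using only the Python standard library; the helper name `list_members` is hypothetical, and no particular file naming inside the archives is assumed.

```python
import tarfile


def list_members(archive_path):
    """Return the sorted names of regular files in a gzip-compressed tar archive.

    Hypothetical helper for inspection only: nothing is extracted to disk.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For example, `list_members("bacteria.tgz")` would return the file names stored in the "all bacteria" marker archive, assuming it has been downloaded to the working directory.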

#################
Super-family and Sub-family relationships for the markers (No. 1-114; other identification numbers
bear no relationship between the taxonomic-level-specific markers)

super_sub_family.xls

################
Universality, Evenness, and Monophyletic Value Measurements

Trees: tree.tgz
Measurements for Figures 2 and 3: Figure2and3.tgz
Perl scripts: scripts.tgz
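
To illustrate the kind of quantity the description refers to as universality and evenness in copy number, here is a small Python sketch. The definitions used are assumptions chosen for illustration: universality as the fraction of genomes containing the family, and evenness as the fraction of those genomes holding exactly one copy. They are not taken from the Perl scripts in scripts.tgz, which may define these measures differently. The function names and the input format (a mapping from genome name to copy count) are likewise hypothetical.

```python
def universality(copy_counts):
    """Fraction of genomes with at least one copy of the family.

    Assumed definition for illustration; copy_counts maps genome -> copy number.
    """
    if not copy_counts:
        raise ValueError("no genomes given")
    present = sum(1 for n in copy_counts.values() if n > 0)
    return present / len(copy_counts)


def evenness(copy_counts):
    """Fraction of family-carrying genomes that hold exactly one copy.

    Assumed definition for illustration; returns 0.0 if no genome carries the family.
    """
    carriers = [n for n in copy_counts.values() if n > 0]
    if not carriers:
        return 0.0
    return sum(1 for n in carriers if n == 1) / len(carriers)
```

Under these assumed definitions, a family present as a single copy in every genome scores 1.0 on both measures, while a family that is often missing or often duplicated scores lower on universality or evenness, respectively.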

 

 

License

CC BY 4.0