Systematically identify phylogenetic markers at different taxonomic levels for bacteria and archaea

<p>Many kinds of studies benefit from a collection of so-called “phylogenetic marker” genes for the taxa of interest. Ideally, such genes are found in most or all of the taxa under study, and phylogenetic trees built from them are reliable and robust. Marker genes should also be resistant to problems that cause the phylogeny of a gene family to diverge from species history, such as difficulty in generating alignments, extensive lateral gene transfer, convergent evolution, and highly variable rates of evolution between taxa or over time. We are interested in phylogenetic studies of bacteria and archaea, and we describe here an extensive analysis of potential phylogenetic marker genes in these taxa based on available complete genome sequences. We present a protocol for the automatic identification of phylogenetic markers at different taxonomic levels. The protocol uses rapid searching and clustering algorithms to generate protein families for a selection of genomes. Phylogenetic trees are then built for the families, and clades from the trees are automatically sampled and evaluated for features that make them potentially useful “marker” genes for both phylogenetic analysis (e.g., universality across genomes of interest) and ecological studies (e.g., evenness in copy number). Potential marker families are then further assessed using multiple comparative and phylogenetic analyses. Using this approach, we have identified 114 phylogenetic markers for “all bacteria” (expanding on previous collections of about 30 markers), including 40 markers that cover bacteria and archaea simultaneously.
In addition, we have identified hundreds to thousands of markers for individual phyla of bacteria, which should allow much more detailed automated phylogenetic analyses of these groups than was previously possible.</p> <p>###############<br>Markers at Different Taxonomic Levels (HMMs, amino acid sequences, and alignments)</p> <p>actinobacteria.tgz<br>alphaproteobacteria.tgz<br>archaea.tgz<br>bacteria_and_archaea.tgz<br>bacteria.tgz<br>bacteroidetes.tgz<br>betaproteobacteria.tgz<br>chlamydiae.tgz<br>chloroflexi.tgz<br>cyanobacteria.tgz<br>deinococcus-thermus.tgz<br>deltaproteobacteria.tgz<br>epsilonproteobacteria.tgz<br>firmicutes.tgz<br>gammaproteobacteria.tgz<br>spirochaetes.tgz<br>thermotogae.tgz</p> <p>#################<br>Super-family and Sub-family Relationships for the Markers (No. 1-114; other identification numbers bear no relationship between the taxonomic-level-specific markers)</p> <p>super_sub_family.xls</p> <p>################<br>Universality, Evenness, and Monophyletic Value Measurements</p> <p>Trees: tree.tgz<br>Measurements for Figures 2 and 3: Figure2and3.tgz<br>Perl scripts: scripts.tgz</p>
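<p>The universality and evenness criteria named in the description can be illustrated with a minimal Python sketch. The definitions used here are assumptions chosen for illustration only, and may differ from those used to produce the measurements in this dataset: universality is taken as the fraction of genomes carrying at least one copy of a family, and evenness as 1 / (1 + population standard deviation of copy number). The function names and the example genome labels are hypothetical.</p>

```python
from statistics import pstdev


def universality(copy_counts):
    """Fraction of genomes with at least one copy of the family.

    copy_counts maps a genome identifier to the number of family
    members found in that genome (assumed definition, for illustration).
    """
    if not copy_counts:
        raise ValueError("copy_counts must contain at least one genome")
    present = sum(1 for n in copy_counts.values() if n > 0)
    return present / len(copy_counts)


def evenness(copy_counts):
    """Score in (0, 1]; equals 1.0 when every genome has the same copy number.

    Assumed definition: 1 / (1 + population standard deviation of the
    per-genome copy numbers).
    """
    if not copy_counts:
        raise ValueError("copy_counts must contain at least one genome")
    return 1.0 / (1.0 + pstdev(list(copy_counts.values())))


# Hypothetical copy numbers of one candidate family in four genomes.
example = {"genome_A": 1, "genome_B": 1, "genome_C": 0, "genome_D": 2}
print(f"universality = {universality(example):.2f}")
print(f"evenness     = {evenness(example):.2f}")
```

<p>Under these assumed definitions, a family present exactly once in every genome scores 1.0 on both measures, which matches the intuition that a good marker is both widespread and single-copy.</p>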