TCS: A new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction
Jia-Ming Chang, Paolo Di Tommaso, and Cedric Notredame. TCS: A new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction, Mol Biol Evol first published online April 1, 2014, doi:10.1093/molbev/msu117
Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs, and both are known to be sensitive to the accuracy of the underlying MSA. In this work we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from the BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction, using both an established simulated dataset and a novel empirical yeast dataset. For this purpose, we describe a novel lossless alternative to site filtering that involves over-weighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third-party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS to HoT, GUIDANCE, Gblocks and trimAl and found that it leads to significantly better estimates of structural accuracy as well as more accurate phylogenetic trees.
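The idea of over-weighting trustworthy columns instead of filtering sites can be illustrated with a minimal sketch. The function name `replicate_columns`, the integer per-column scores, and the rule of repeating each column `max(1, score)` times are assumptions made here for illustration only; the abstract does not specify the weighting scheme, and this is not the T-Coffee implementation. The floor of one copy is what keeps the sketch lossless: no column is discarded, but high-scoring columns contribute more sites to any downstream tree reconstruction.

```python
from typing import List


def replicate_columns(alignment: List[str], column_scores: List[int]) -> List[str]:
    """Return a new alignment in which column j appears max(1, column_scores[j]) times.

    Hypothetical helper: the replication rule is an illustrative assumption,
    not the weighting scheme of the published method.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if len(column_scores) != length:
        raise ValueError("need exactly one score per alignment column")

    weighted_rows = []
    for row in alignment:
        # Repeat each residue (or gap) according to its column's score,
        # keeping at least one copy so that no site is lost.
        weighted_rows.append(
            "".join(residue * max(1, score) for residue, score in zip(row, column_scores))
        )
    return weighted_rows


# Toy three-sequence alignment with made-up per-column reliability scores.
toy_alignment = ["AC-GT", "ACAGT", "TC-GA"]
toy_scores = [3, 2, 0, 1, 2]
weighted = replicate_columns(toy_alignment, toy_scores)
```

In this toy case the low-scoring gapped column is kept once, while the first column is represented three times. A filtering approach would instead drop the low-scoring column entirely, which is the information loss that weighting avoids.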