figshare

Evaluating fast maximum likelihood-based phylogenetic programs using empirical phylogenomic data sets

Published by Xiaofan Zhou
High-throughput DNA sequencing technologies have transformed both the amounts of data and the analytical tools employed in many biological disciplines. For example, the field of phylogenetics has witnessed dramatic increases in the sizes of data matrices assembled to resolve branches of the tree of life, which in turn have motivated the development of programs for fast, yet accurate, inference. In particular, several fast programs have been developed in the very popular maximum likelihood (ML) framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these four programs have been widely used and cited, a systematic evaluation and comparison of their performance using real data, particularly in the context of genome-scale analyses, has so far been lacking. To address this gap in knowledge, we evaluated these four fast ML programs on a rich collection of 19 empirical phylogenomic data sets from diverse animal, plant, and fungal lineages with respect to likelihood maximization, topological accuracy, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategy (ten RAxML searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations using ASTRAL. For concatenation-based species tree inference, we found that IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete runs on supermatrices, whereas FastTree was the fastest of all programs in both types of analyses, but at the cost of lower likelihood values and topological accuracy. Finally, we found that data matrix properties, such as the number of taxa and the information content, sometimes substantially influenced the relative performance of the programs.
Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.

Funding

DEB-1442113, DEB-1442148, BER DE-FC02-07ER64494, and Hatch project 1003258
