High-throughput DNA sequencing technologies have transformed both the volume of data and the analytical tools used in many biological disciplines. In phylogenetics, for example, the data matrices assembled to resolve branches of the tree of life have grown dramatically in size, which in turn has motivated the development of programs for fast, yet accurate, inference. Several fast programs have been developed in the very popular maximum likelihood (ML) framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these four programs have been widely used and cited, a systematic evaluation and comparison of their performance on real data, particularly in the context of genome-scale analyses, has so far been lacking. To address this gap, we evaluated these four fast ML programs on a rich collection of 19 empirical phylogenomic data sets from diverse animal, plant, and fungal lineages with respect to likelihood maximization, topological accuracy, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategy (ten RAxML searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimates using ASTRAL. For concatenation-based species tree inference, we found that IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete runs on supermatrices, whereas FastTree was the fastest of all programs in both types of analyses, but at the cost of lower likelihood values and topological accuracy. Finally, we found that data matrix properties, such as the number of taxa and the information content, sometimes substantially influenced the relative performance of the programs. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.