Drosophila 25 species phylogeny

dataset
posted on 2017-09-28, 14:22 authored by Gregg Thomas, Matthew Hahn
We used the peptide sequences of 25 Drosophila species and one outgroup species, Musca domestica, that were available on NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/) and FlyBase (ftp://flybase.net/12_species_analysis/genomes/) as of April 21, 2017. We used the clustering algorithm MCL to predict orthologous groups of peptides and kept the 1,634 ortho-groups predicted to be single-copy in all species for phylogeny reconstruction. These groups were aligned with PASTA, and the alignments were concatenated and given to RAxML with the PROTGAMMAJTTF model to estimate the species topology and relative branch lengths. Other methods based on individual gene trees (average consensus and ASTRAL) recovered the same topology. We then used r8s to estimate branch lengths in millions of years using three calibration points: 49-59 million years ago (mya) for the common ancestor of D. pseudoobscura and D. melanogaster, 64-74 mya for the common ancestor of D. grimshawi and D. melanogaster, and 156 mya for the common ancestor of D. melanogaster and Musca domestica.
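The concatenation step described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `concatenate_alignments`, the dictionary layout (species name mapped to aligned sequence), and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-group alignments into one supermatrix, species by species.

    Assumes every alignment holds exactly one sequence per species
    (single-copy groups) and that sequences within a group are equal length.
    """
    if not alignments:
        raise ValueError("need at least one alignment")
    species = sorted(alignments[0])
    for index, alignment in enumerate(alignments):
        if sorted(alignment) != species:
            raise ValueError(f"alignment {index} has a different species set")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"alignment {index} has unequal sequence lengths")
    # Keep the column order of the input list so partitions stay traceable.
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


# Toy data (hypothetical species names and sequences).
toy_alignments = [
    {"sp1": "MK-L", "sp2": "MKAL"},
    {"sp1": "GG", "sp2": "GA"},
]
supermatrix = concatenate_alignments(toy_alignments)
print(supermatrix)
```

The checks on species sets and sequence lengths reflect why only groups that are single-copy in all species can be concatenated this way.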

File summary:
1. drosophila-25spec-concordance-factor-tree.png
An image of the phylogeny with branches scaled by relative number of substitutions. Nodes are labeled with concordance factors. Concordance factors are calculated for each node in the species phylogeny as: (# of gene trees with that node / total # of gene trees).

2. drosophila-25spec-node-age-tree.png
An image of the phylogeny with branches scaled by millions of years. Nodes are labeled with their ages in millions of years.

3. drosophila-25spec-time-tree.tre
The Newick string of the phylogeny with branches scaled by millions of years. Nodes are labeled with concordance factors.

4. drosophila-25spec-tree.tre
The Newick string of the phylogeny with branches scaled by relative number of substitutions. Nodes are labeled with concordance factors.
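As a rough illustration of the concordance factor definition given for file 1, the sketch below counts how many gene trees contain a given node. Representing each gene tree as a set of clades (each clade a frozenset of tip names) is an assumption made here for simplicity, and the function name and toy trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def concordance_factor(clade: Clade, gene_trees: Iterable[Set[Clade]]) -> float:
    """Return (# of gene trees with that node) / (total # of gene trees)."""
    trees = list(gene_trees)
    if not trees:
        raise ValueError("need at least one gene tree")
    supporting = sum(1 for tree in trees if clade in tree)
    return supporting / len(trees)


# Toy data: four gene trees on tips A, B, C; three of them group A with B.
toy_gene_trees = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"B", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
]
cf_ab = concordance_factor(frozenset({"A", "B"}), toy_gene_trees)
print(cf_ab)  # 0.75
```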

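For readers who want to inspect the two .tre files without a phylogenetics library, here is a minimal sketch of a Newick reader that collects internal-node labels and, for a time-scaled tree, node ages. It assumes unquoted labels, no bracketed comments, and no whitespace inside the string; the exact formatting of the files in this dataset has not been checked against it, and the toy string below is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str = ""                 # tip name, or internal-node label
    length: Optional[float] = None  # length of the branch above this node
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (unquoted labels, no comments)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        start = pos
        while text[pos] not in ":,);":
            pos += 1
        node.label = text[start:pos]
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",);":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected {text[pos]!r} at {pos}")
    return root


def tip_names(node: Node) -> List[str]:
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in tip_names(child)]


def node_age(node: Node) -> float:
    """Distance from a node to its tips, following the first child.

    Only meaningful as an age if the tree is ultrametric (time-scaled).
    """
    if not node.children:
        return 0.0
    child = node.children[0]
    return (child.length or 0.0) + node_age(child)


def internal_nodes(node: Node) -> List[Tuple[str, str, float]]:
    """List (tips, label, age) for every internal node, root first."""
    if not node.children:
        return []
    rows = [(",".join(sorted(tip_names(node))), node.label, node_age(node))]
    for child in node.children:
        rows.extend(internal_nodes(child))
    return rows


# Toy tree: the internal label 0.75 stands in for a concordance factor.
toy_root = parse_newick("((A:10.0,B:10.0)0.75:20.0,C:30.0);")
rows = internal_nodes(toy_root)
print(rows)
```

On drosophila-25spec-tree.tre the branch lengths are relative substitutions rather than time, so the value returned by `node_age` would not be an age there.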