Sequence alignment for phylogenetic tree construction

This alignment was used to build a tree containing our metagenome-assembled genomes (MAGs), all taxa identified by Burgess et al. (2012) for which complete genomes were available on NCBI (downloaded 2017-09-06), and all archaeal and bacterial genomes used in Hug et al. (2016). The genomes used in this tree and a mapping file can be found on figshare.

Genomes from Hug et al.’s (2016) tree of life: https://doi.org/10.6084/m9.figshare.6863594.v1, https://doi.org/10.6084/m9.figshare.6863744.v2, https://doi.org/10.6084/m9.figshare.6863813.v1. Genomes from Burgess et al. (2012): https://doi.org/10.6084/m9.figshare.6863798.v1.

PhyloSift builds an alignment of the concatenated sequences for a set of core markers for each taxon. We used 37 of these single-copy marker genes (ribosomal proteins S2 rpsB, S10 rpsJ, L1 rplA, L22, L4/L1e rplD, L2 rplB, S9 rpsI, L3 rplC, L14b/L23e rplN, S5, S19 rpsS, S7, L16/L10E rplP, S13 rpsM, L15, L25/L23, L6 rplF, L11 rplK, L5 rplE, S12/S23, L29, S3 rpsC, S11 rpsK, L10, S8, L18P/L5E, S15P/S13e, S17, L13 rplM, L24; and translation initiation factor IF-2, metalloendopeptidase, phenylalanyl-tRNA synthetase beta subunit, phenylalanyl-tRNA synthetase alpha subunit, tRNA pseudouridine synthase B, porphobilinogen deaminase, and ribonuclease HII; i.e., PhyloSift markers DNGNGWU00001 to DNGNGWU00040, excluding DNGNGWU00004, DNGNGWU00008, and DNGNGWU00038). The amino acid alignment of these 37 concatenated genes was trimmed using trimAl v.1.2. Columns with gaps in more than 5% of the sequences were removed, as were taxa with less than 75% of the concatenated sequences. MAGs from ARK and ZAV that did not meet this threshold were manually kept in the alignment.
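The marker selection and the two filtering thresholds described above can be illustrated with a minimal Python sketch. This is not trimAl's implementation and not necessarily the exact procedure used here. It assumes the alignment is held as a dictionary of equal-length sequences with "-" as the gap character, that "75% of the concatenated sequences" means the fraction of non-gap positions per taxon, and that MAGs to retain can be recognised by a name prefix. The function names, the `always_keep` parameter, and the prefix convention are all hypothetical.

```python
from typing import Dict, List, Tuple

GAP = "-"  # assumed gap character


def marker_ids() -> List[str]:
    """PhyloSift marker IDs DNGNGWU00001 to DNGNGWU00040, minus the three excluded markers."""
    excluded = {4, 8, 38}
    return [f"DNGNGWU{i:05d}" for i in range(1, 41) if i not in excluded]


def drop_gappy_columns(aln: Dict[str, str], max_gap_fraction: float = 0.05) -> Dict[str, str]:
    """Remove columns in which more than max_gap_fraction of the sequences have a gap."""
    names = list(aln)
    if not names:
        return {}
    n_seqs = len(names)
    length = len(aln[names[0]])
    keep = [
        j for j in range(length)
        if sum(aln[name][j] == GAP for name in names) / n_seqs <= max_gap_fraction
    ]
    return {name: "".join(aln[name][j] for j in keep) for name in names}


def drop_sparse_taxa(
    aln: Dict[str, str],
    min_coverage: float = 0.75,
    always_keep: Tuple[str, ...] = (),
) -> Dict[str, str]:
    """Remove taxa whose non-gap fraction is below min_coverage.

    Taxa whose name starts with a prefix in always_keep are retained regardless
    (hypothetical stand-in for the manual retention of ARK and ZAV MAGs).
    """
    kept = {}
    for name, seq in aln.items():
        coverage = sum(c != GAP for c in seq) / len(seq) if seq else 0.0
        if coverage >= min_coverage or name.startswith(tuple(always_keep)):
            kept[name] = seq
    return kept
```

For example, `drop_sparse_taxa(aln, always_keep=("ARK", "ZAV"))` would keep low-coverage MAGs whose names begin with those prefixes. The order in which the column and taxon filters are applied is not stated in the description, so the two functions are kept independent.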

License

CC BY 4.0