Supplementary data for Shakya et al. (2017)
Shakya M, Soucy SM, and Zhaxybayeva O. "Insights into Origin and Evolution of α-proteobacterial Gene Transfer Agents", submitted.
GTA_Rhodobacterales_queries.zip: FASTA-formatted files of RcGTA homologs from Rhodobacterales that were used as queries in BLAST searches of the RefSeq database and of 255 α-proteobacterial genomes.
RefSeq_bacterial_hits.zip: FASTA-formatted files of detected bacterial homologs of RcGTA genes in RefSeq database release 76. The filenames correspond to gene names listed in Supplementary Table S4.
RefSeq_viral_hits.zip: FASTA-formatted files of detected viral homologs of RcGTA genes within RefSeq database release 76. The filenames correspond to gene names listed in Supplementary Table S4.
StructuralClusterHomologs.xlsx: An Excel spreadsheet with information about RcGTA homologs found in small clusters (SC) and large clusters (LC) across α-proteobacterial genomes. The table contains the GI and accession numbers of each homolog, as well as accession number and taxonomic information of the source genome.
SC_and_LC_homologs_per_genome.zip: FASTA-formatted files of RcGTA structural cluster homologs identified during the screen of 255 fully sequenced α-proteobacterial genomes. Each file represents an individual cluster found within a genome, and the file name contains the source genome name, the genome accession number, and the type of cluster (LC or SC). Within each file, the definition line of each FASTA record is augmented with the type of cluster (SC or LC) and the RcGTA gene name of the homolog (see the first column of Supplementary Table S4 for notations).
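As an illustration only, the FASTA files described above could be read with a few lines of Python. This is a minimal sketch, not part of the original data set: the function name parse_fasta and the example record text are hypothetical, and because the exact layout of the augmented definition lines is not specified here, the sketch returns each definition line unparsed.

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example text; real definition lines in the archive will differ.
example = ">seq1 hypothetical annotation\nMKT\nAAL\n>seq2\nMSS\n"
for definition, sequence in parse_fasta(example):
    print(definition, len(sequence))
```

The cluster type and gene name can then be looked up in each definition line once its layout has been checked against the files themselves.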
individual_proteins_fa.zip: FASTA-formatted sets of individual RcGTA structural cluster genes and their large cluster (LC) homologs used to create the LC-locus alignment. The filenames correspond to gene names listed in Supplementary Table S4.
individual_proteins_aln.zip: FASTA-formatted alignments of individual RcGTA structural cluster genes and their large cluster (LC) homologs used to create the LC-locus alignment. The filenames correspond to gene names listed in Supplementary Table S4.
individual_trees.zip: NEWICK-formatted phylogenetic trees reconstructed from the alignments in the individual_proteins_aln.zip file. These trees were used in the analyses shown in Supplementary Table S3.
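For a quick check of which taxa a NEWICK-formatted tree contains, the leaf labels can be pulled out without a phylogenetics library. The following is a minimal sketch under stated assumptions: the function name leaf_names and the example tree are hypothetical, and the pattern assumes unquoted labels without spaces, which may not hold for every tree in the archive.

```python
import re


def leaf_names(newick):
    """Return leaf labels of a Newick string, in order of appearance.

    A leaf label directly follows "(" or ",". Labels that follow ")" are
    internal node labels (for example support values) and are skipped.
    Assumes unquoted labels that contain no whitespace.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


# Hypothetical example tree with one internal support value (90).
tree = "((A:0.1,B:0.2)90:0.3,C:0.4);"
print(leaf_names(tree))
```

For anything beyond listing labels, a dedicated tree-handling library is a safer choice than a regular expression.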
LC_locus.zip: FASTA-formatted LC-locus alignment and NEWICK-formatted phylogenetic tree of the LC-locus (the right panel of Figure 6).
PPD.zip: Tab-delimited text files of pairwise phylogenetic distances (PPDs) of RcGTA homologs found in large clusters (LC), small clusters (SC), and viruses, and the FASTA-formatted alignments of RcGTA homologs used to calculate the PPDs. The data are shown in Supplementary Figure S4.
flanking_genes.zip: FASTA-formatted alignments and NEWICK-formatted phylogenetic trees of three genes that were found flanking large clusters detected in non-α-proteobacterial genomes. The trees are shown in Supplementary Figure S8.
reference_tree.zip: PHYLIP-formatted concatenation of 99 alignments of genes conserved across α-proteobacteria (see Supplementary Table S2), and NEWICK-formatted phylogenetic trees reconstructed from this concatenated alignment (see Figure 6 and Supplementary Figure S3).