15 files

Supplementary data for Shakya et al. (2017)

Version 4 2018-06-19, 21:46
Version 3 2017-10-27, 16:24
Version 2 2017-09-25, 14:23
Version 1 2017-09-17, 00:46
dataset
posted on 2018-06-19, 21:46 authored by Migun Shakya, Shannon M. Soucy, Olga Zhaxybayeva
This data set contains sequences, sequence alignments and phylogenetic trees used in the bioinformatic analyses presented in:

Shakya M, Soucy SM, and Zhaxybayeva O. "Insights into Origin and Evolution of α-proteobacterial Gene Transfer Agents", submitted.

File Contents:

Supplementary_Figures_final.pdf: Supplementary Figures S1-S9 referred to in the manuscript.

SupplementaryTables.pdf and SupplementaryTables.xlsx: Supplementary Tables S1-S5 referred to in the manuscript.

GTA_Rhodobacterales_queries.zip: FASTA-formatted files of RcGTA homologs from Rhodobacterales that were used as queries in BLAST searches of the RefSeq database and of 255 α-proteobacterial genomes.

RefSeq_bacterial_hits.zip: FASTA-formatted files of detected bacterial homologs of RcGTA genes in RefSeq database release 76. The filenames correspond to gene names listed in Supplementary Table S4.

RefSeq_viral_hits.zip: FASTA-formatted files of detected viral homologs of RcGTA genes within RefSeq database release 76. The filenames correspond to gene names listed in Supplementary Table S4.


StructuralClusterHomologs.xlsx: An Excel spreadsheet with information about RcGTA homologs found in small clusters (SC) and large clusters (LC) across α-proteobacterial genomes. The table contains the GI and accession numbers of each homolog, as well as the accession number and taxonomic information of the source genome.


SC_and_LC_homologs_per_genome.zip: FASTA-formatted files of RcGTA structural cluster homologs identified during the screen of 255 fully sequenced α-proteobacterial genomes. Each file represents an individual cluster found within a genome, and the name of the file contains the source genome name, the genome accession number and the type of cluster (LC or SC). Within each file, the definition line of each FASTA record is augmented with the type of cluster (SC or LC) and the RcGTA gene name of the homolog (see the first column of Supplementary Table S4 for notations).
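The exact layout of the augmented definition lines is not spelled out here, so it is best checked by listing them before writing any parser for the cluster type or gene name. A minimal sketch of a plain FASTA reader that could be used for this, relying only on the Python standard library. The function name read_fasta and the example records are hypothetical and are not taken from the data set:

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (definition_line, sequence) pairs from FASTA-formatted lines."""
    header = None          # definition line of the record being read
    chunks: List[str] = []  # sequence lines collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records for illustration only; real definition lines in the
# archive will look different and should be inspected first.
example = """>seq1 hypothetical definition line
MKT
AAL
>seq2 another hypothetical definition line
MSD
""".splitlines()

records = list(read_fasta(example))
for definition, sequence in records:
    print(definition, len(sequence))
```

To apply it to an extracted file, pass an open file handle in place of the example lines, for instance read_fasta(open(path)) with path pointing at one of the unzipped FASTA files.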


individual_proteins_fa.zip: FASTA-formatted sets of individual RcGTA structural cluster genes and their large cluster (LC) homologs used to create the LC-locus alignment. The filenames correspond to gene names listed in Supplementary Table S4.


individual_proteins_aln.zip: FASTA-formatted alignments of individual RcGTA structural cluster genes and their large cluster (LC) homologs used to create the LC-locus alignment. The filenames correspond to gene names listed in Supplementary Table S4.


individual_trees.zip: NEWICK-formatted phylogenetic trees reconstructed from the alignments in the individual_proteins_aln.zip file. These trees were used in analyses shown in Supplementary Table S3.


LC_locus.zip: FASTA-formatted LC-locus alignment and NEWICK-formatted phylogenetic tree of the LC-locus (the right panel of Figure 6).


PPD.zip: Pairwise phylogenetic distances (PPDs) of RcGTA homologs found in large clusters (LC), small clusters (SC), and viruses in tab-delimited text files, and FASTA-formatted alignments of RcGTA homologs used to calculate the PPDs. The data are shown in Supplementary Figure S4.


flanking_genes.zip: FASTA-formatted alignments and NEWICK-formatted phylogenetic trees of three genes that were found flanking large clusters detected in non-α-proteobacterial genomes. The trees are shown in Supplementary Figure S8.


reference_tree.zip: PHYLIP-formatted concatenation of 99 alignments of genes conserved across α-proteobacteria (see Supplementary Table S2), and NEWICK-formatted phylogenetic trees reconstructed using this alignment (see Figure 6 and Supplementary Figure S3).



Funding

The National Science Foundation (NSF-DEB 1551674 to O.Z.); the Simons Foundation (Investigator in Mathematical Modeling of Living Systems award 327936 to O.Z.); and the Neukom Institute CompX award to O.Z.
