Metabolic Diversity within the Globally Abundant Marine Group II Euryarchaea Offers Insight into Ecological Patterns - Accepted

2018-11-27T18:46:43Z (GMT) by Benjamin Tully
Supplementary and Additional Data accompanying the manuscript analyzing 250 MGII genomes. Includes FASTA files of genomic contigs and proteins, and Anvi'o databases for re-examining the manually refined genomes.
AdditionalData_PhylogenomicMarkers-M120.tar.gz
FASTA format of the 120 marker proteins for all MGII genomes.
AdditionalData_MEROPS.pfam.hmm.tar.gz
Pfam HMMs used to identify MEROPS peptidases. The link between Pfams and MEROPS can be found in Supplementary Table 4.
AdditionalData_ProposedMotilityOperons-Genbank.tar.gz
GenBank files gathered from NCBI for the putative archaeal flagellum operon in the Tully et al. (2018) genomes.
AdditionalData_AllGlobalSamples-rawreadcounts.tar.gz
Raw read counts for all MGII contigs generated from Tara Oceans and Ocean Sampling Day samples.
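Raw read counts of this kind are commonly normalized to RPKM (reads per kilobase per million reads) before samples are compared. The sketch below shows only the standard formula. The function name, the example numbers, and the use of the sample's total read count as the denominator are assumptions for illustration, not a description of how the values in this dataset were derived.

```python
def rpkm(read_count, length_bp, total_reads):
    """Reads per kilobase of sequence per million reads in the sample."""
    if length_bp <= 0 or total_reads <= 0:
        raise ValueError("length_bp and total_reads must be positive")
    length_kb = length_bp / 1_000             # sequence length in kilobases
    reads_millions = total_reads / 1_000_000  # sample depth in millions of reads
    return read_count / (length_kb * reads_millions)


# Hypothetical values: 500 reads recruited to a 2,000 bp contig
# in a sample containing 10 million reads.
example = rpkm(500, 2_000, 10_000_000)
print(example)  # 25.0
```

To get a per-genome value, one option would be to sum the read counts and lengths over that genome's contigs before calling the function. This too is an assumption rather than the documented procedure.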

Supplementary Table 1. Information for all genomes used in study, including source ID numbers, clade assignment, completion stats (length, percent complete, percent contamination, percent strain heterogeneity), and source reference. Estimated completeness (%Comp); estimated contamination (%Contam); estimated strain heterogeneity (%Strain); Size is displayed in Mbp.
Supplementary Table 2. Information for analyzed genomes used in study, including clade and subclade assignment, completion stats (length, percent complete, percent contamination, percent strain heterogeneity), and source reference. Estimated completeness (%Comp); estimated contamination (%Contam); estimated strain heterogeneity (%Strain); percent GC (%G+C). All lengths are in Mbp. Approximate length determined by (Size × %Comp).
Supplementary Table 3. All genomes with identified archaeal flagellum components, including the number of identified components and a prediction of whether a full operon is present. Genomes from Tully et al. (2018) used to visualize the putative operon have NCBI contig accession and operon protein IDs listed.
Supplementary Table 4. A breakdown of the peptidases from the MEROPS database with corresponding IDs and Pfams.
Supplementary Table 5. Corresponding metadata (environmental parameters) for Tara Oceans samples. Latitude (degrees North); Size (Tara size fraction scaled from 0-8); Depth (m); Temp (temperature °C); Oxygen (μmol kg⁻¹); Chloro (HPLC chlorophyll mg Chl m⁻³); PO4 (phosphate μM); NO2NO3 (nitrite + nitrate μM); Sal (salinity PSU).
Supplementary Table 6. Corresponding metadata (environmental parameters) for Ocean Sampling Day samples. Lat (latitude degrees North); Depth (m); Temp (temperature °C); PO4 (phosphate μM); NO3 (nitrate μM); Oxygen (μmol kg⁻¹); Coast (distance from coast in m).

Supplementary Data 1. Groups of MGII genomes sharing ≥98.5% ANI, with representative and duplicate genome IDs denoted.
Supplementary Data 2. Spreadsheet of the pairwise ANI and AAI values for MGIIa and MGIIb.
Supplementary Data 3. Counts of the number of proteins identified as an extracellular peptidase or carbohydrate-active enzyme for each genome.
Supplementary Data 4. Newick file of the phylogenomic tree generated using 120 concatenated marker proteins for the full redundant set of MGII genomes with ≥60 markers. Corresponds to Supplementary Figure 1.
Supplementary Data 5. Newick file of the phylogenomic tree generated using 120 concatenated marker proteins for the analyzed set of MGII genomes. Corresponds to Figure 1.
Supplementary Data 6. FASTA format of the 16S rRNA gene sequences present in the MGII genomes.
Supplementary Data 7. Newick file of the phylogenetic tree generated using the 16S rRNA gene sequences present in the MGII genomes. Corresponds to Supplementary Figure 4.
Supplementary Data 8. FASTA format of the putative proteorhodopsin sequences identified in the MGII genomes.
Supplementary Data 9. Newick file of the phylogenetic tree generated using the proteorhodopsin sequences present in the MGII genomes. Corresponds to Supplementary Figure 6.
Supplementary Data 10. Protein clusters generated using the Anvi’o pangenome workflow identified as either ‘Core MGII’, ‘Core MGIIa’, or ‘Core MGIIb’. Only includes proteins with putative KEGG assignments.
Supplementary Data 11. Derived relative fraction and RPKM values for all MGII genomes for all Tara Oceans and Ocean Sampling Day samples. Corresponds to Figure 5 and Supplementary Figure 8.
Supplementary Data 12. Spreadsheet of the PERMANOVA statistics (F statistic, p-value, and Benjamini-Hochberg false discovery rate-corrected p-value) and the results of the ad hoc pairwise PERMANOVA tests (Bonferroni-corrected p-value).
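
For readers comparing the two multiple-testing corrections named in Supplementary Data 12, the following is a minimal Python sketch of the standard Benjamini-Hochberg and Bonferroni adjustments. The function names and the p-values are hypothetical, and this is not the code used to produce the spreadsheet.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
print(bonferroni(raw))
```

Bonferroni multiplies every p-value by the number of tests, whereas Benjamini-Hochberg scales each p-value by the number of tests divided by its rank, so it is never more conservative than Bonferroni on the same set of tests.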