Reconstruction of the Carbohydrate 6-O Sulfotransferase Gene Family Evolution in Vertebrates Reveals Novel Member, CHST16, Lost in Amniotes

dataset
posted on 2020-07-14, 12:21, authored by Daniel Ocampo Daza, Tatjana Haitina
Master_table_C6OST.xlsx: Complete table of all identified C6OST sequences. Includes sequence names, chromosomal locations, database identifiers and annotation notes of all C6OST sequences identified in this study. Also includes complete list of species, species abbreviations and genome assemblies used in the study.

Sequence names consist of a species abbreviation followed by the chromosome/linkage group designation (if available) and the gene symbol. Asterisks indicate incomplete/partial sequences. Paralogs (within-species duplicates) with uncertain phylogenetic relationships are designated (1of2), (2of2), etc.
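The naming conventions above can be checked programmatically. The following is a minimal sketch, not part of the dataset: the helper `describe_name` and the example name are hypothetical, and only the two markers described above (the asterisk and the `(NofM)` suffix) are interpreted. The delimiter between the species abbreviation, chromosome and gene symbol is not assumed.

```python
import re

# Matches paralog designations such as "(1of2)" or "(2of3)".
PARALOG_RE = re.compile(r"\((\d+)of(\d+)\)")

def describe_name(name):
    """Flag partial sequences and paralog designations in one sequence name.

    Hypothetical helper: it reads only the asterisk and "(NofM)" markers
    and makes no assumption about the rest of the name.
    """
    match = PARALOG_RE.search(name)
    return {
        "name": name,
        "partial": "*" in name,  # asterisk marks an incomplete sequence
        "paralog": (int(match.group(1)), int(match.group(2))) if match else None,
    }

# Made-up name for illustration only, not taken from the dataset.
info = describe_name("Xxx.1.CHST3(1of2)*")
```

Here `info["partial"]` is `True` and `info["paralog"]` is `(1, 2)`.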

We have followed the phylogeny and classification of birds suggested by Prum et al. (2015) Nature 526:569–573 doi: 10.1038/nature15697, and of teleost fishes suggested by Near et al. (2012) PNAS 109:13698–703 doi: 10.1073/pnas.1206625109 and Betancur-R et al. (2017) BMC Evol. Biol. 17:162. doi: 10.1186/s12862-017-0958-3.

For invertebrate species, sequences were also sought using the profile-Hidden Markov Model search tool HMMER (hmmer.org) aimed at reference proteomes.

Master_C6OST_all.rtf: All identified C6OST sequences in FASTA format, in the same order as in Master_table_C6OST.xlsx. Rich Text Format file marking exon junctions in alternating colors.

Master_C6OST_all.fasta: All identified C6OST sequences in FASTA format, in the same order as in Master_table_C6OST.xlsx. FASTA format file for alignment/sequence viewing applications.

Ident_C6OST_seq.txt: List of identical C6OST sequences in this dataset.
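A list of identical sequences like this one could be reproduced from the FASTA file along the following lines. This is a sketch under assumptions, not the authors' procedure: `read_fasta` and `identical_groups` are hypothetical helpers, sequences are compared case-insensitively, and every record is assumed to have a unique name.

```python
from collections import defaultdict

def read_fasta(text):
    """Parse FASTA text into {name: sequence}; assumes unique record names."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def identical_groups(records):
    """Return lists of names that share exactly the same sequence."""
    groups = defaultdict(list)
    for name, seq in records.items():
        groups[seq.upper()].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

To use it, read Master_C6OST_all.fasta into a string and pass the result of `read_fasta` to `identical_groups`.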

Short_unused_C6OST_seq.txt: List of partial C6OST sequences in this dataset that are shorter than 50% of final alignments. These were not used in phylogenies.
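The 50% cutoff can be illustrated with a short sketch. It assumes one particular reading of the criterion, namely that a sequence is "short" when its number of non-gap residues is below half the alignment length. The function name `short_sequences` and its parameters are hypothetical, and the input is a dictionary of already aligned sequences.

```python
def short_sequences(aligned, threshold=0.5, gap_chars="-."):
    """Names whose ungapped length is below threshold * alignment length.

    Assumption: "shorter than 50% of final alignments" is read as
    non-gap residues < 0.5 * number of alignment columns.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (aln_len,) = lengths
    short = []
    for name, seq in aligned.items():
        residues = sum(1 for ch in seq if ch not in gap_chars)
        if residues < threshold * aln_len:
            short.append(name)
    return short
```

The threshold is a parameter so that a different cutoff can be tried without changing the function.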

190121_C6OST_full_align.fasta: Alignment including the full repertoire of C6OST sequences (CHST1, CHST2, CHST3, CHST4, CHST5, CHST6, CHST7, CHST16 and related genes) in a smaller set of species.

190121_C6OST_full_IQ-TREE.tar.gz: Phylogenetic analysis (IQ-TREE output files) for the full repertoire of C6OST sequences in a smaller set of species. This analysis corresponds to Figures 1-5 in the publication. The file 190305_CHST7_align.fasta.treefile includes the phylogenetic tree in Newick format.

The following files contain the alignments and phylogenetic analyses for each of the C6OST subfamilies with the full representation of species. These analyses correspond to Supplementary Figures S1-S8 in the publication. Within the IQ-TREE output, the files ending in .treefile contain the phylogenetic trees in Newick format.

190130_CHST1_align.fasta
190130_CHST1_IQ-TREE.tar.gz
190130_CHST2_align.fasta
190130_CHST2_IQ-TREE.tar.gz
190130_CHST3_align.fasta
190130_CHST3_IQ-TREE.tar.gz
190130_CHST4-5-6_align.fasta
190130_CHST4-5-6_IQ-TREE.tar.gz
190130_CHST16_align.fasta
190130_CHST16_IQ-TREE.tar.gz
190305_CHST7_align.fasta
190305_CHST7_IQ-TREE.tar.gz
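The Newick trees can be read straight from the archives without unpacking them to disk. The sketch below uses only Python's standard `tarfile` module. The function name `read_treefiles` is hypothetical, and the internal directory layout of the archives is not assumed: any member whose name ends in .treefile is returned.

```python
import tarfile

def read_treefiles(archive_path):
    """Return {member name: Newick string} for every .treefile in a .tar.gz."""
    trees = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".treefile"):
                handle = archive.extractfile(member)
                trees[member.name] = handle.read().decode("utf-8").strip()
    return trees
```

For example, `read_treefiles("190130_CHST1_IQ-TREE.tar.gz")` would return the tree or trees stored in that archive, keyed by their path inside it.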

Conserved_synteny_gene_lists_Ens83.xlsx: Lists of genes from the C6OST gene-bearing chromosome regions in the human, Carolina anole lizard, spotted gar and zebrafish genomes. Lists are arranged by Ensembl protein family predictions (Ensembl version 83) and number of times each protein family is represented on C6OST-bearing chromosome regions (column named '#').

Conserved_synteny_data.xlsx: Chromosomal/conserved synteny data. Includes chromosomal locations and database identifiers of all C6OST-neighboring genes identified in the study. New gene symbols/names suggested by this study are highlighted in yellow. This file also includes all identified conserved synteny blocks in the human, chicken, Western clawed frog, spotted gar, zebrafish and medaka genomes.

Channel_catfish_CHST1a_region.xlsx: Genes neighboring CHST1a in the channel catfish genome. Used to identify the orthologous region of the zebrafish genome, which lacks a CHST1a gene.

Anolis_Chr2_conserved_synteny.xlsx: Genes neighboring the "CHST4/5-like" gene on Carolina anole lizard chromosome 2. Used to identify the orthologous regions of the human and spotted gar genomes.

Inshore_hagfish_cons_synteny.xlsx: Genes neighboring the inshore hagfish (Eptatretus burgeri) C6OST genes. Used to infer orthology between jawless vertebrate and jawed vertebrate C6OST genes.

Funding

Genomic and developmental approaches to illuminate the evolution of chitinous tissues in vertebrates

Swedish Research Council

Development of cranial tendons in the musculoskeletal system of vertebrates

Swedish Research Council
