Phylogenetic analyses of 34 syntenic gene families in visual opsin gene-bearing chromosome regions

posted on 2015-08-20, 10:17 authored by David Lagman, Daniel Ocampo Daza, Jenny Widmark, Görel Sundström, Dan Larhammar

Sequence-based phylogenetic analyses of 34 vertebrate gene families identified in an analysis of conserved synteny in chromosome regions containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). For each gene family, amino acid sequences were predicted from the Ensembl genome browser (http://www.ensembl.org) and used to create sequence alignments and phylogenetic trees. Vertebrate gene families were defined based on Ensembl protein family predictions. Database identifiers, location data, genome assembly information and annotation notes for all identified protein families and sequences are included in 'Supplemental Table 705852.xlsx' (Excel spreadsheet). This spreadsheet also includes information on 7 gene families that were discarded from the analyses. Gene families are identified by unique abbreviations based on approved HUGO Gene Nomenclature Committee (HGNC) gene symbols, or known aliases from the NCBI Entrez Gene database.

File information:

For each gene family, an alignment file '...align.fasta', a neighbor joining tree '...NJ.phb' and a phylogenetic maximum likelihood tree '...PhyML.phb' are included. Alignments are included in FASTA format with the extension '.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). Corresponding figures for all phylogenetic trees are also included as PDF files. Sequence names/leaf names include species abbreviations (see below) as well as chromosome/linkage group/genomic scaffold numbers, with lowercase letters to distinguish sequences located on the same chromosome, linkage group or scaffold. For the human sequences the full HGNC gene symbol is included.
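
As a minimal sketch of how the alignment and tree files could be read without dedicated software, the Python snippet below parses FASTA text into a dictionary and extracts leaf names from a Newick string. The function names and the example sequence names are hypothetical and are not taken from the file set; it also assumes unquoted Newick labels.

```python
import re

def read_fasta(text):
    """Parse FASTA-formatted text into {sequence name: aligned sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after '>' is used as the sequence name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def newick_leaf_names(newick):
    """Return leaf names from a Newick string, in order of appearance.

    A leaf label directly follows '(' or ','. Labels that follow ')' are
    internal node labels (for example support values) and are skipped.
    Assumption: labels are unquoted and contain no whitespace.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

# Hypothetical example data, not real entries from the file set.
example_fasta = ">seq_a\nMK-\nLV\n>seq_b\nMKT\nLV\n"
example_tree = "((seq_a:0.1,seq_b:0.2)95:0.05,seq_c:0.3);"
alignment = read_fasta(example_fasta)
leaves = newick_leaf_names(example_tree)
```

For anything beyond a quick inspection, an established library or one of the tree viewers named above is likely the safer choice, since Newick files can contain quoted labels and comments that this sketch does not handle.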

The species included in these analyses were (abbreviations and common names in parentheses): Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Monodelphis domestica (Mdo, grey short-tailed opossum), Gallus gallus (Gga, chicken), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Ciona intestinalis (Cin, tunicate), Ciona savignyi (Csa, tunicate) and Drosophila melanogaster (Dme, fruit fly). In some analyses the following additional species were used: Sarcophilus harrisii (Sha, Tasmanian devil), Taeniopygia guttata (Tgu, zebra finch), Anolis carolinensis (Aca, Carolina anole lizard), Xenopus (Silurana) tropicalis (Xtr, Western clawed frog), Takifugu rubripes (Tru, Japanese pufferfish), Branchiostoma floridae (Bfl, Florida lancelet) and Caenorhabditis elegans (Cel, nematode).
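
The abbreviations above can be kept in a small lookup table when working with the leaf names programmatically. The Python sketch below is one way to do this. It assumes that the three-letter species abbreviation forms the first three characters of a leaf name; the description only states that leaf names include the abbreviation, so this position is an assumption, and the example leaf names are hypothetical.

```python
# Species abbreviations as listed in the dataset description.
SPECIES = {
    "Hsa": "Homo sapiens",
    "Mmu": "Mus musculus",
    "Mdo": "Monodelphis domestica",
    "Gga": "Gallus gallus",
    "Dre": "Danio rerio",
    "Ola": "Oryzias latipes",
    "Gac": "Gasterosteus aculeatus",
    "Tni": "Tetraodon nigroviridis",
    "Cin": "Ciona intestinalis",
    "Csa": "Ciona savignyi",
    "Dme": "Drosophila melanogaster",
    # Additional species used in some analyses.
    "Sha": "Sarcophilus harrisii",
    "Tgu": "Taeniopygia guttata",
    "Aca": "Anolis carolinensis",
    "Xtr": "Xenopus (Silurana) tropicalis",
    "Tru": "Takifugu rubripes",
    "Bfl": "Branchiostoma floridae",
    "Cel": "Caenorhabditis elegans",
}

def species_of(leaf_name):
    """Return the species for a leaf name, or None if it is not recognised.

    Assumption: the abbreviation is the first three characters of the name.
    """
    return SPECIES.get(leaf_name[:3])

# Hypothetical leaf names, for illustration only.
print(species_of("Dre.7a"))
print(species_of("unknown"))
```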

The following vertebrate gene families are included in this file set:

ATP2B: ATPase, Ca++ transporting, plasma membrane
B4GALNT: Beta-1,4-N-acetyl-galactosaminyl transferase
CACNA2D: Calcium channel, voltage-dependent, alpha 2/delta subunit
CAMK1: Calcium/calmodulin dependent protein kinase
CDK: Cyclin-dependent kinase, members 16, 17 and 18
CELSR: Cadherin, EGF LAG seven-pass G-type receptor (flamingo homolog, Drosophila)
CNTN: Contactin precursor
COPG: Coatomer protein complex, subunit gamma
ERC: ELKS/RAB6-interacting/CAST family
FLN: Filamin
GXYLT: Glucoside xylosyltransferase
IKBKE: Kinase epsilon and TANK-binding kinase
IQSEC: IQ motif and Sec7 domain containing
KDM: Lysine specific demethylase 5
KLHDC: Kelch domain containing 8
L1CAM: L1 cell adhesion molecule
LRRN: Leucine rich repeat neuronal
MAGI: Membrane associated guanylate kinase, WW and PDZ domain containing
PHTF: Putative homeodomain transcription factor
PLG: Plasminogen ortholog
PLXNA: Plexin A
PPM1: Protein phosphatase, Mg2+/Mn2+ dependent
PRICKLE: Prickle homolog
PTPN: Protein tyrosine phosphatase, non-receptor type
RBM: RNA binding motif protein
RSBN: Round spermatid basic protein
SEMA3: Sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin)
SRGAP: SLIT-ROBO Rho GTPase activating protein
SYP: Synaptophysin
TIMM: Translocase of inner mitochondrial membrane 17
TWF: Twinfilin
UBA: Ubiquitin-like modifier activating enzyme, members 1 and 7
USP: Ubiquitin specific peptidase, members 4, 11, 15 and 19
WNK: WNK lysine deficient protein kinase

Method details:

Alignments were created using the ClustalW sequence alignment algorithm with the following settings: Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20.
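
For readers who want to reproduce an alignment from the command line, the settings above might map onto ClustalW2 options roughly as in the Python sketch below. This is an assumption about an equivalent invocation, not a record of how the alignments were actually produced, and the description does not say whether the gap penalties refer to the pairwise or the multiple alignment stage (the multiple alignment options are used here). The function name, file names and executable name are hypothetical.

```python
def clustalw_command(infile, outfile, executable="clustalw2"):
    """Build a ClustalW2 argument list matching the stated alignment settings.

    Assumption: the stated settings apply to the multiple alignment stage.
    """
    return [
        executable,
        f"-INFILE={infile}",
        "-ALIGN",             # perform a full multiple alignment
        "-TYPE=PROTEIN",      # amino acid sequences
        "-MATRIX=GONNET",     # Gonnet weight matrix
        "-GAPOPEN=10.0",      # gap opening penalty
        "-GAPEXT=0.20",       # gap extension penalty
        "-OUTPUT=FASTA",      # write the alignment in FASTA format
        f"-OUTFILE={outfile}",
    ]

# Hypothetical file names; pass the list to subprocess.run() to execute it.
command = clustalw_command("family.fasta", "family.align.fasta")
print(" ".join(command))
```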

Phylogenetic analyses were carried out based on the included alignments using bootstrap-supported neighbor joining (NJ) as well as phylogenetic maximum likelihood (PhyML) methods supported by approximate likelihood ratio tests (aLRT). Phylogenetic trees are rooted with identified Drosophila melanogaster (fruit fly) sequences, if possible. Alternatively, some phylogenetic trees are rooted with other identified invertebrate sequences (see Supplemental Table 1). The B4GALNT, PLG, PTPN, RBM, SEMA3 and USP trees are presented as midpoint-rooted trees in the figures (PDF); however, the phylogenetic tree files (.phb) are unrooted. NJ trees were made using standard settings in ClustalX 2.0.12 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. PhyML trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The amino acid substitution model was chosen based on ProtTest3.2 (http://code.google.com/p/prottest3/) results. The JTT model was applied for all gene families except B4GALNT, CACNA2D, COL, L1CAM, PLG, PPP, QSOX and UBA where the WAG model was chosen, and RPL and TWF where the LG model was chosen. PhyML trees are supported by approximate likelihood ratio tests (aLRT) with SH-like branch supports applied through PhyML.
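
The model assignment and PhyML settings described above can be summarised in a short Python sketch. The mapping of settings to PhyML 3.0 command-line options is an assumption about an equivalent invocation (the description does not state which interface was used), and the function names, file name and executable name are hypothetical. PhyML reads PHYLIP-formatted alignments, so the included FASTA files would need to be converted first.

```python
# Families listed in the description as exceptions to the JTT model.
WAG_FAMILIES = {"B4GALNT", "CACNA2D", "COL", "L1CAM", "PLG", "PPP", "QSOX", "UBA"}
LG_FAMILIES = {"RPL", "TWF"}

def substitution_model(family):
    """Return the amino acid substitution model stated for a gene family."""
    if family in WAG_FAMILIES:
        return "WAG"
    if family in LG_FAMILIES:
        return "LG"
    return "JTT"

def phyml_command(alignment, family, support=-4, executable="phyml"):
    """Build a PhyML 3.0 argument list (assumed mapping of the settings).

    support=-4 requests SH-like aLRT branch supports; a positive integer
    requests that many non-parametric bootstrap replicates instead.
    BIONJ is PhyML's default starting tree, so no option is passed for it.
    """
    return [
        executable,
        "-i", alignment,                   # PHYLIP-format alignment
        "-d", "aa",                        # amino acid data
        "-m", substitution_model(family),
        "-f", "e",                         # frequencies estimated from the alignment
        "-v", "e",                         # estimate proportion of invariable sites
        "-a", "e",                         # estimate gamma shape parameter
        "-c", "8",                         # substitution rate categories
        "-s", "BEST",                      # best of NNI and SPR searches
        "-o", "tlr",                       # optimise topology, lengths, rates
        "-b", str(support),
    ]

# Hypothetical file name, for illustration only.
command = phyml_command("TWF.align.phy", "TWF")
print(" ".join(command))
```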

For the CAMK and GXYLT gene families the PhyML trees were repeated (same settings) using a non-parametric bootstrap analysis with 100 replicates rather than aLRT in PhyML. These trees did not improve on the aLRT-supported tree topologies.
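
To illustrate what a non-parametric bootstrap replicate is, the Python sketch below resamples alignment columns with replacement to build one pseudo-replicate alignment of the same length. It shows only the general resampling idea, not PhyML's or ClustalX's internal implementation; the function name and the toy alignment are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    Columns are drawn with replacement, and the same sampled columns are
    used for every sequence so that each column stays intact.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(sequence[i] for i in columns)
        for name, sequence in alignment.items()
    }

# Toy alignment, for illustration only.
toy = {"seq_a": "MKTL", "seq_b": "MK-L"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

In a full analysis, a tree would be inferred from each of the replicates (100 in the analysis described above), and the support for a branch would be the fraction of replicate trees that contain it.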
