
Phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades

dataset
posted on 2013-06-24, 16:28, authored by David Lagman, Daniel Ocampo Daza, Görel Sundström, Dan Larhammar

Sequence-based phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades, with additional analyses including pinopsins, vertebrate ancient (V/A) opsins and Ciona intestinalis opsins. The phylogenetic analyses were made using amino acid sequence predictions from the Ensembl genome browser (http://www.ensembl.org) version 60 (Nov 2010) and the Lepisosteus oculatus (spotted gar) genome assembly LepOcu1 (http://www.ncbi.nlm.nih.gov/genome/assembly/327908/), as well as sequences identified in the NCBI RefSeq database. Database identifiers, location data, genome assembly, and annotation notes for all sequences are included in 'Supplementary Table OPN.xlsx' (Excel spreadsheet).

File information:

Alignment files are included in FASTA format: 'align_visual_opsins.fasta' and 'align_visual_opsins_VA_pinops.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. The second alignment file includes additional pinopsin, V/A opsin and Ciona intestinalis opsin sequences, as detailed in 'Supplementary Table OPN.xlsx'. Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). The phylogenetic analyses were carried out on the included alignments using both neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods. Phylogenetic trees are rooted with the human OPN3 amino acid sequence. Corresponding figures for all phylogenetic trees are also included as PDF files.
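
As a rough illustration of how the FASTA alignment files could be read without dedicated software, here is a minimal Python sketch. The function names `read_fasta` and `is_aligned`, and the toy sequence names and residues, are assumptions made for this example and are not taken from the dataset. Real files may call for a more robust parser.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name to sequence string."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after '>' is used as the sequence name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(records):
    """In an alignment, every row has the same length (gaps included)."""
    return len({len(seq) for seq in records.values()}) <= 1


# Toy input with hypothetical names and residues (not from the dataset).
example = io.StringIO(">seqA\nMNGTE-GP\nNFY\n>seqB\nMNGTEEGP\nNF-\n")
alignment = read_fasta(example)
```

To apply the same function to one of the included files, pass an open file handle instead of the in-memory example, for instance one opened on 'align_visual_opsins.fasta'.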

Sequence names/leaf names include species abbreviations (see below) as well as chromosome numbers where known. For the human and zebrafish sequences the full HGNC and ZFIN gene symbols are included. For other species the clade name is indicated in the sequence names/leaf names.

The species included in these analyses were (abbreviations and common names in parentheses): Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Monodelphis domestica (Mdo, grey short-tailed opossum), Gallus gallus (Gga, chicken), Anolis carolinensis (Aca, Carolina anole lizard), Xenopus (Silurana) tropicalis (Xtr, Western clawed frog), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Geotria australis (Gau, pouched lamprey) and Ciona intestinalis (Cin, transparent sea squirt).
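
The abbreviations listed above could be used to label sequences by species when processing the alignment or tree files. The sketch below assumes that the abbreviation appears in each sequence name as a separate token, delimited by a non-alphanumeric character such as a dot or underscore. That delimiter convention, the function name `species_of`, and the example names are hypothetical and should be checked against the actual files.

```python
import re

# Abbreviation to species name, as listed in the dataset description.
SPECIES = {
    "Hsa": "Homo sapiens",
    "Mmu": "Mus musculus",
    "Mdo": "Monodelphis domestica",
    "Gga": "Gallus gallus",
    "Aca": "Anolis carolinensis",
    "Xtr": "Xenopus (Silurana) tropicalis",
    "Lch": "Latimeria chalumnae",
    "Loc": "Lepisosteus oculatus",
    "Dre": "Danio rerio",
    "Ola": "Oryzias latipes",
    "Gac": "Gasterosteus aculeatus",
    "Tni": "Tetraodon nigroviridis",
    "Gau": "Geotria australis",
    "Cin": "Ciona intestinalis",
}


def species_of(leaf_name):
    """Return the species for the first known abbreviation found as a token.

    Assumes tokens are separated by non-alphanumeric characters; returns
    None when no listed abbreviation is present.
    """
    for token in re.split(r"[^A-Za-z0-9]+", leaf_name):
        if token in SPECIES:
            return SPECIES[token]
    return None


# Hypothetical leaf names, for illustration only.
examples = ["Hsa.X.OPN1LW", "Dre_5_lws1", "no_match_here"]
labels = [species_of(name) for name in examples]
```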

Method details:

Alignments were created using the ClustalW algorithm with the following settings: Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20. The alignments were edited manually in order to curate short, incomplete or highly divergent amino acid sequence predictions from the genome databases. In this way, erroneous automatic exon predictions could be rectified and exons that had not been predicted could be added.

Phylogenetic analyses were carried out based on the included alignments. NJ trees were made using standard settings in ClustalX 2.0.12 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. PhyML trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads). PhyML trees are supported by a non-parametric bootstrap analysis with 100 replicates applied through PhyML.
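
To make the idea of a non-parametric bootstrap more concrete, the sketch below shows only the resampling step: alignment columns are drawn with replacement to build pseudo-replicate alignments of the original length. This is not the implementation used by ClustalX or PhyML. The function name `bootstrap_alignment`, the toy sequences and the fixed random seed are assumptions for illustration. In a full analysis a tree would be inferred from each replicate, and support values would reflect how often each branch is recovered.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    # Apply the same sampled columns to every sequence so rows stay aligned.
    return {n: "".join(alignment[n][i] for i in columns) for n in names}


# Toy alignment with hypothetical names and residues (not from the dataset).
toy = {"seqA": "MNGTEGPNFY", "seqB": "MNGSEGPDFY", "seqC": "MNGTDGPNFY"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```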
