Phylogenetic analyses of the vertebrate oxytocin and vasopressin receptor gene family

dataset
posted on 2013-06-24, 16:28 authored by Daniel Ocampo Daza, Dan Larhammar

Sequence-based phylogenetic analyses of vertebrate oxytocin receptor (OTR) and vasopressin receptor (VPR) genes using amino acid sequences predicted primarily from the Ensembl (http://www.ensembl.org) and Pre Ensembl (http://pre.ensembl.org) genome browsers. These analyses are based on our previously published study identifying OTR and VPR sequences in vertebrate genomes, including previously unrecognised subtypes of V2 receptors: Ocampo Daza D., Lewicka M. and Larhammar D. (2012) The oxytocin/vasopressin receptor family has at least five members in the gnathostome lineage, including two distinct V2 subtypes, General and Comparative Endocrinology 175(1):135-143 (link below). These updated analyses include more species and suggest an update of VPR gene nomenclature.

Species and genome assembly information, database identifiers, location data and annotation notes for all identified sequences are included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. These tables also detail the updated vs. outdated nomenclature. All identified and curated amino acid sequences are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'.

Legends:

Sequences marked * are not full-length. Sequences marked # are not full-length, and their prediction of intracellular loop 3 (IL3) is not clear. The sequence marked § is a putative pseudogene. See details in 'Master_OTR_VPR_sequence_tables.xlsx'. Numbers in sequence names indicate the chromosome/linkage group where known.
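
The legend above can be applied programmatically when working with the sequence names. The following is a minimal Python sketch, assuming the marker character appears somewhere in the sequence name as plain text; the function name `classify` and the example name used in the note below are hypothetical and not taken from the dataset.

```python
# Marker meanings as given in the legend; everything else here is illustrative.
MARKERS = {
    "*": "not full-length",
    "#": "not full-length; IL3 prediction unclear",
    "§": "putative pseudogene",
}


def classify(name):
    """Return (name_without_marker, note) for one sequence name.

    Assumption: a name carries at most one of the legend markers.
    """
    for marker, note in MARKERS.items():
        if marker in name:
            return name.replace(marker, "").strip(), note
    return name.strip(), "no marker"
```

For a made-up name such as `Example.1*`, `classify` returns the name without the asterisk together with the note "not full-length".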

File information 1:

Species included in these analyses, with abbreviations: human (Homo sapiens, Hsa), mouse (Mus musculus, Mmu), grey short-tailed opossum (Monodelphis domestica, Mdo), chicken (Gallus gallus, Gga), Carolina anole lizard (Anolis carolinensis, Aca), Western clawed frog (Xenopus tropicalis, Xtr), coelacanth (Latimeria chalumnae, Lch), spotted gar (Lepisosteus oculatus, Loc), zebrafish (Danio rerio, Dre), three-spined stickleback (Gasterosteus aculeatus, Gac), medaka (Oryzias latipes, Ola), Southern platyfish (Xiphophorus maculatus, Xma), Japanese pufferfish (Takifugu rubripes, Tru) and Elephant shark (Callorhinchus milii, Cmi).

Alignment file included in FASTA-format: 'align_OTR_VPR_edited.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. This alignment has been curated and edited as described in the Methods sections and Supplementary Material 3 of Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) (link below), removing parts of the amino terminal, carboxy terminal and intracellular loop 3. The alignment was created using the MUSCLE algorithm applied through eBioX (http://www.ebioinformatics.org/ebiox/) using standard settings with 16 iterations. The alignment was edited manually in eBioX.
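
Because the alignment is plain FASTA text, it can also be read without specialised software. The sketch below is one possible way to do this in Python using only the standard library; the helper names `read_fasta` and `alignment_length` are hypothetical. It also checks that every row has the same length, which should hold for an aligned file.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_length(records):
    """Return the common row length, or raise if rows differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("rows differ in length: %s" % sorted(lengths))
    return lengths.pop()


# Possible usage, assuming the file sits in the working directory:
# with open("align_OTR_VPR_edited.fasta") as handle:
#     alignment = read_fasta(handle)
# print(len(alignment), "sequences,", alignment_length(alignment), "columns")
```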

Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). All trees were made using the alignment described above. Corresponding figures for each phylogenetic tree are also included as PDF-files. Red nodes and support values indicate values lower than 50%.
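
Weakly supported nodes can also be listed directly from the tree text. The following Python sketch assumes that support values are stored as internal node labels immediately after a closing parenthesis, e.g. `(A:0.1,B:0.2)95:0.05`; whether each '.phb' file in this dataset uses that layout, and whether values are on a 0-100 or 0-1 scale, is not stated here and should be checked against the files. The function names are hypothetical.

```python
import re

# A number directly after ")" is treated as a node support label.
_SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick):
    """Return all internal node labels of a Newick string as floats."""
    compact = "".join(newick.split())  # drop line breaks and spaces
    return [float(value) for value in _SUPPORT.findall(compact)]


def low_support(newick, threshold=50.0):
    """Return the support values below the threshold.

    Use threshold=0.5 if the file stores supports as fractions.
    """
    return [value for value in support_values(newick) if value < threshold]
```

This only extracts the numbers; a tree viewer such as FigTree is still needed to see which clades they belong to.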

The neighbor joining (NJ) tree, 'NJ_tree_OPR_VPR.phb', was made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates.

Phylogenetic Maximum Likelihood (PhyML) trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) through the PhyML-aBayes application. One tree is supported by a non-parametric bootstrap analysis with 100 replicates, 'PhyML_tree_OTR_VPR_boot.phb', and one is supported by an SH-like approximate likelihood ratio test (aLRT), 'PhyML_tree_OTR_VPR_aLRT.phb'. Both PhyML trees were made with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments; the number of substitution rate categories was set to 8; BIONJ was chosen to create the starting tree; both NNI and SPR tree optimization methods were considered; and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads).
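
For readers who prefer the command line, the settings above can be approximated with the standalone PhyML 3.x program. The trees in this dataset were made through the PhyML-aBayes application, so the mapping below is an interpretation of the described settings and not the authors' actual invocation. The Python sketch only builds the argument list; the function name `phyml_args` and the input file name are hypothetical, and PhyML expects a PHYLIP-format alignment, so the FASTA file would need converting first.

```python
def phyml_args(alignment, support):
    """Build a PhyML 3.x argument list for the settings described above.

    support: "bootstrap" (100 replicates) or "alrt" (SH-like aLRT).
    """
    if support == "bootstrap":
        branch_support = "100"
    elif support == "alrt":
        branch_support = "-4"  # PhyML code for SH-like aLRT supports
    else:
        raise ValueError("support must be 'bootstrap' or 'alrt'")
    return [
        "phyml",
        "-i", alignment,       # PHYLIP-format alignment (assumed file)
        "-d", "aa",            # amino acid data
        "-m", "JTT",           # substitution model chosen with ProtTest
        "-f", "e",             # frequencies estimated from the alignment
        "-v", "e",             # proportion of invariable sites estimated
        "-a", "e",             # gamma shape parameter estimated
        "-c", "8",             # eight substitution rate categories
        "-s", "BEST",          # best of NNI and SPR searches
        "-o", "tlr",           # optimise topology, branch lengths, rates
        "-b", branch_support,
    ]
    # BIONJ is PhyML's default starting tree, so no option is passed for it.
```

The resulting list could be passed to `subprocess.run` on a system where a `phyml` binary is installed.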

File information 2:

The alignment file '120922_align_Tni.fasta' includes OTR and VPR sequences identified in the spotted green pufferfish (Tetraodon nigroviridis, Tni) genome. The alignment file '120922_align_Psi_Cpi.fasta' includes OTR and VPR sequences identified in the Chinese softshell turtle (Pelodiscus sinensis, Psi) and painted turtle (Chrysemys picta bellii, Cpi) genomes. These alignments are based on the alignment used for the study described in Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) and were made using the ClustalW algorithm in ClustalX 2.0 (http://www.clustal.org/clustal2/) with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20).

For the spotted green pufferfish, only the automatic Ensembl predictions were used to verify all family members. For the two turtles, the identified sequences were curated manually in order to rectify erroneous automatic exon predictions and to predict exons or whole genes that had not been identified. Genome assembly information, database identifiers, location data and annotation notes for these sequences are also included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. The un-aligned sequence predictions are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'.

These sequences were tested in NJ trees made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. The file '120922_NJ_tree_Tni.phb' includes spotted green pufferfish and the file '121022_NJ_tree_Psi_Cpi.phb' includes the two turtle species. Both tree files are in Phylip/Newick format. Corresponding figures for each phylogenetic tree are also included as PDF-files, with the spotted green pufferfish and turtle sequences marked in color.
