Phylogenetic analyses of 47 syntenic gene families in SSTR gene-bearing chromosome regions

Version 8 2015-08-20, 10:24

dataset

posted on 2015-08-20, 10:24, authored by Daniel Ocampo Daza, Christina A Bergqvist, Dan Larhammar, Görel Sundström

Sequence-based phylogenetic analyses of 47 gene families identified in an analysis of conserved synteny around somatostatin receptor (SSTR) gene-bearing chromosome regions. For each gene family, amino acid sequences were predicted from the Ensembl genome browser (http://www.ensembl.org) and used to create sequence alignments and phylogenetic trees. Gene families were defined based on Ensembl protein family predictions. Database identifiers, location data, genome assembly information and annotation notes for all identified protein families and sequences are included in 'Supplemental Table 2.xlsx' and 'Supplemental Table 3.xlsx' (Excel spreadsheets).

File information:

Gene families are identified by unique abbreviations based on approved HUGO Gene Nomenclature Committee (HGNC) gene symbols, or known aliases from the NCBI Entrez Gene database. For each gene family, three files are included: an alignment file '...align.fasta', a neighbor joining tree '...NJ_rooted.phb' and a phylogenetic maximum likelihood tree '...PhyML_rooted.phb'.
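
As a rough illustration of the naming scheme described above, the following Python sketch groups a list of file names by the three documented suffixes. Only the suffixes are taken from this description; the `XYZ_` prefix in the usage note is hypothetical, since the part of each file name before the suffix is abbreviated here.

```python
# Suffixes documented in this description; whatever precedes them is left open.
SUFFIXES = {
    "align.fasta": "alignment",
    "NJ_rooted.phb": "nj_tree",
    "PhyML_rooted.phb": "phyml_tree",
}


def classify(filename):
    """Return the file kind for a documented suffix, or None otherwise."""
    for suffix, kind in SUFFIXES.items():
        if filename.endswith(suffix):
            return kind
    return None


def group_by_kind(filenames):
    """Map each file kind to the sorted list of matching file names."""
    groups = {kind: [] for kind in SUFFIXES.values()}
    for name in filenames:
        kind = classify(name)
        if kind is not None:
            groups[kind].append(name)
    return {kind: sorted(names) for kind, names in groups.items()}
```

For example, `group_by_kind(["XYZ_align.fasta", "XYZ_NJ_rooted.phb"])` would place the first name under `"alignment"` and the second under `"nj_tree"`; files with other extensions, such as the supplemental spreadsheets, are ignored.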

Alignments are included in FASTA format with the extension '.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. Alignments were created using the ClustalWS sequence alignment program with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20) through the JABAWS 2 tool in Jalview 2.7 (http://www.jalview.org/).
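
Because the alignments are plain FASTA text, they can also be read with a few lines of code. The sketch below is a minimal, standard-library-only reader that additionally checks that all sequences have the same length, as expected for an alignment. The sequence names and residues in the example are invented for illustration and do not come from the dataset.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def is_aligned(records):
    """True when every sequence (gap characters included) has the same length."""
    return len({len(seq) for seq in records.values()}) <= 1


# Invented two-sequence example with wrapped lines and gap characters.
example = """>seq_one
MKT-LLA
GG
>seq_two
MKTALLA
G-
"""
records = read_fasta(example)
print(is_aligned(records))
```

To use it on a file from this set, the file contents could be passed in with `read_fasta(open(path).read())`, where `path` points to one of the '.fasta' files.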

Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). The phylogenetic analyses were carried out based on the included alignments using bootstrap-supported neighbor joining (NJ) as well as phylogenetic maximum likelihood (PhyML) methods. Phylogenetic trees are rooted with identified Drosophila melanogaster (fruit fly) sequences, or with identified Ciona intestinalis or Ciona savignyi (tunicates), Branchiostoma floridae (Florida lancelet, amphioxus), or Caenorhabditis elegans (nematode) sequences if no fruit fly sequence could be found.
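
For readers who prefer to inspect the tree files programmatically, the following Python sketch lists the leaf (sequence) names in a Newick string. It is a simplified reader, not a full Newick parser: it assumes unquoted labels, and the tree and labels in the example are invented rather than taken from the dataset.

```python
import re

# A leaf label directly follows "(" or ","; labels on internal nodes
# (for example bootstrap values) follow ")" and are therefore skipped.
LEAF = re.compile(r"(?<=[(,])\s*([^\s(),:;\[\]][^(),:;\[\]]*)")


def leaf_names(newick):
    """Return the leaf labels of a Newick string in order of appearance."""
    return [match.group(1).strip() for match in LEAF.finditer(newick)]


# Invented three-leaf tree with branch lengths and one internal support value.
tree = "((leaf_a:0.1,leaf_b:0.2)95:0.05,leaf_c:0.3);"
print(leaf_names(tree))
```

Comparing the leaf names of an NJ tree and a PhyML tree for the same gene family is one simple way to confirm that both were built from the same set of sequences.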

The NJ trees are supported by non-parametric bootstrap analyses with 1000 replicates, applied through ClustalX 2.0 (http://www.clustal.org/clustal2/) with standard settings. The PhyML trees are supported by non-parametric bootstrap analyses with 100 replicates made using the PhyML 3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma-shape parameters were estimated from the datasets; the number of substitution rate categories was set to 8; BIONJ was chosen to create the starting tree and the nearest neighbor interchange (NNI) tree improvement method was used to estimate the best topology; both tree topology and branch length optimization were chosen. The LG model of amino acid substitution, which is standard for PhyML 3.0, was chosen.

Species abbreviations are applied as follows: Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Canis familiaris (Cfa, dog), Monodelphis domestica (Mdo, grey short-tailed opossum), Macropus eugenii (Meu, tammar wallaby), Ornithorhynchus anatinus (Oan, platypus), Gallus gallus (Gga, chicken), Taeniopygia guttata (Tgu, zebra finch), Meleagris gallopavo (Mga, turkey), Anolis carolinensis (Aca, Carolina anole lizard), Silurana (Xenopus) tropicalis (Xtr, Western clawed frog), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Takifugu rubripes (Tru, fugu), Ciona intestinalis (Cin, tunicate), Ciona savignyi (Csa, tunicate), Branchiostoma floridae (Bfl, amphioxus), Caenorhabditis elegans (Cel, nematode) and Drosophila melanogaster (Dme, fruit fly).
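
The species abbreviations listed above can be kept in a small lookup table when working with the files in code. The Python sketch below does this and resolves a sequence label to its species. It assumes, hypothetically, that a label begins with the three-letter abbreviation; the exact label format used in the files is not specified in this description, and the label in the usage note is invented.

```python
# Abbreviation -> (scientific name, common name), as listed in this description.
SPECIES = {
    "Hsa": ("Homo sapiens", "human"),
    "Mmu": ("Mus musculus", "mouse"),
    "Cfa": ("Canis familiaris", "dog"),
    "Mdo": ("Monodelphis domestica", "grey short-tailed opossum"),
    "Meu": ("Macropus eugenii", "tammar wallaby"),
    "Oan": ("Ornithorhynchus anatinus", "platypus"),
    "Gga": ("Gallus gallus", "chicken"),
    "Tgu": ("Taeniopygia guttata", "zebra finch"),
    "Mga": ("Meleagris gallopavo", "turkey"),
    "Aca": ("Anolis carolinensis", "Carolina anole lizard"),
    "Xtr": ("Silurana (Xenopus) tropicalis", "Western clawed frog"),
    "Dre": ("Danio rerio", "zebrafish"),
    "Ola": ("Oryzias latipes", "medaka"),
    "Gac": ("Gasterosteus aculeatus", "three-spined stickleback"),
    "Tni": ("Tetraodon nigroviridis", "green spotted pufferfish"),
    "Tru": ("Takifugu rubripes", "fugu"),
    "Cin": ("Ciona intestinalis", "tunicate"),
    "Csa": ("Ciona savignyi", "tunicate"),
    "Bfl": ("Branchiostoma floridae", "amphioxus"),
    "Cel": ("Caenorhabditis elegans", "nematode"),
    "Dme": ("Drosophila melanogaster", "fruit fly"),
}


def species_of(label):
    """Look up the species for a label.

    Assumption: the label starts with the three-letter species abbreviation.
    Returns a (scientific name, common name) tuple, or None if unknown.
    """
    return SPECIES.get(label[:3])
```

With this table, `species_of("Dre_example")` would return the zebrafish entry, and an unrecognised prefix returns `None`.
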

The following gene families are included in this file set:

ABHD12: Abhydrolase domain containing 12
CFL: Cofilin and destrin (actin depolymerizing factor)
FLRT: Fibronectin leucine rich transmembrane protein
FOXA: Forkhead box A
ISM: Isthmin homolog
JAG: Jagged
NIN: Ninein (GSK3B interacting protein)
NKX2: NK2 homeobox 1 and 4
PAX: Paired box 1 and 9
PYG: Glycogen phosphorylase; brain, liver and muscle variants
RALGAPA: Ral GTPase activating protein, alpha subunit
RIN: Ras and Rab interactor
SEC23: Sec23 homologs A and B
SLC24A: Solute carrier family 24 members 3 and 4
SNX: Sorting nexin 5, 6 and 32
SPTLC: Serine palmitoyltransferase, long chain base subunit 2 and 3
VSX: Visual system homeobox
ADAP: ArfGAP with dual PH domains
ATP2A: ATPase, Ca++ transporting, cardiac muscle, fast twitch
C1QTNF: C1q and tumor necrosis factor related protein
CABP: Calcium binding protein 1, 3, 4 and 5
CACNA1: Calcium channel, voltage dependent, T type alpha subunit
CREBBP: CREB binding protein
CYTH: Cytohesin
FAM20: Family with sequence similarity 20
FNG: Fringe homolog
FSCN: Fascin homolog 1 and 2, actin-bundling protein
GLPR: Glucagon, glucagon-like and gastric inhibitory polypeptide receptors
GGA: Golgi-associated, gamma adaptin ear containing, ARF-binding protein
GRIN2: Glutamate receptor, ionotropic, N-methyl D-aspartate 2
KCNJ: Potassium inwardly-rectifying channel, subfamily J member 2, 4, 12 and 14
KCTD: Potassium channel tetramerisation domain containing 2, 5 and 17
METRN: Meteorin, glial cell differentiation regulator
NDE: nudE nuclear distribution gene E homolog
RAB11FIP: RAB11 family interacting protein 3 and 4 (class II)
RADIL: Ras association and DIL domains/Ras interacting protein
RHBDF: Rhomboid 5 homolog
RHOT: Ras homolog gene family, member T1 and T2
RPH3A: Rabphilin 3A homolog/double C2-like domains, alpha
SDK: Sidekick cell adhesion molecule
SOX: Sex-determining region Y-box 8, 9 and 10
TEX2: Testis expressed 2
TNRC6: Trinucleotide repeat containing 6
TOM1: Target of myb1
TTYH: Tweety homolog
USP: Ubiquitin specific peptidase 31 and 43
WFIKKN: WAP, follistatin/kazal, immunoglobulin, kunitz and netrin domain containing