Phylogenetic analyses of the somatostatin receptor gene family

<p>This dataset contains sequence-based phylogenetic analyses of the somatostatin receptor gene family, using amino acid sequence predictions from the Ensembl genome browser and the <em>Lepisosteus oculatus</em> (spotted gar) genome assembly LepOcu1. Database identifiers, location data, genome assembly information and annotation notes for all identified sequences are included in 'Supplemental Table 1.xlsx' (Excel spreadsheet).</p>
<p>File information:</p>
<p>Alignment files are included in FASTA format: 'SSTR_alignment.fasta' and 'SSTR_additional_alignment.fasta'. This file format can be opened by most sequence analysis applications as well as by text editors. The second alignment file includes additional teleost fish SSTR sequences from the NCBI Reference Sequence database, as detailed in 'Supplemental Table 1.xlsx'. Alignments were created using the ClustalWS sequence alignment program with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20) through the JABAWS 2 tool in Jalview 2.7. The alignments were edited manually in Jalview in order to curate short, incomplete or highly divergent amino acid sequence predictions from the genome database. In this way, erroneous automatic exon predictions and exons that had not been predicted could be rectified.</p>
<p>Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree and TreeView. The phylogenetic analyses were carried out on the included alignments using both neighbor-joining (NJ) and phylogenetic maximum likelihood (PhyML) methods. All phylogenetic trees are rooted with the human kisspeptin receptor (<em>KISS1R</em>/<em>GPR54</em>) amino acid sequence.
</p>
<p>The NJ trees, 'SSTR_NJ_rooted.phb' and 'SSTR_NJ_additional_rooted.phb', are supported by a non-parametric bootstrap analysis with 1000 replicates, applied through ClustalX 2.0 with standard settings. The second phylogenetic tree file includes the additional teleost fish SSTR sequences from the NCBI Reference Sequence database, as detailed above.</p>
<p>The PhyML trees are supported by a non-parametric bootstrap analysis with 100 replicates ('SSTR_PhyML_bootstrapped_rooted.phb') and by an SH-like approximate likelihood ratio test ('SSTR_PhyML_aLRT_rooted.phb'). Both trees were made using the PhyML 3.0 algorithm with the following settings: amino acid frequencies (equilibrium frequencies), the proportion of invariable sites (with optimised p-invar) and the gamma-shape parameter were estimated from the datasets; the number of substitution rate categories was set to 8; BIONJ was used to create the starting tree, and both the nearest neighbor interchange (NNI) and subtree pruning and regrafting (SPR) tree improvement methods were used to estimate the best topology; both tree topology and branch lengths were optimised. The JTT model of amino acid substitution was selected using ProtTest 3.0.
</p> <p>Species abbreviations are applied as follows: </p> <p><em>Homo sapiens</em> (Hsa, human), <em>Mus musculus</em> (Mmu, mouse), <em>Canis familiaris</em> (Cfa, dog), <em>Monodelphis domestica</em> (Mdo, grey short-tailed opossum), <em>Gallus gallus</em> (Gga, chicken), <em>Anolis carolinensis</em> (Aca, Carolina anole lizard), <em>Silurana (Xenopus) tropicalis</em> (Xtr, Western clawed frog), <em>Latimeria chalumnae</em> (Lch, Comoran coelacanth), <em>Lepisosteus oculatus</em> (Loc, spotted gar), <em>Danio rerio</em> (Dre, zebrafish), <em>Oryzias latipes</em> (Ola, medaka), <em>Gasterosteus aculeatus</em> (Gac, three-spined stickleback), <em>Tetraodon nigroviridis</em> (Tni, green spotted pufferfish), <em>Takifugu rubripes</em> (Tru, fugu) and <em>Drosophila melanogaster</em> (Dme, fruit fly).</p>
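<p>As a rough illustration of how the FASTA alignment files and the species abbreviations listed above could be used together, the following minimal Python sketch parses FASTA-formatted text and counts sequences per species. It rests on two assumptions that are not confirmed by this description: that each sequence header begins with the three-letter species abbreviation, and that the headers look like the made-up examples shown. The function names <code>parse_fasta</code> and <code>count_by_species</code> and the toy sequences are hypothetical and are not taken from the deposited files.</p>

```python
from collections import Counter

# Species abbreviations as listed in the description above.
SPECIES = {
    "Hsa": "Homo sapiens",
    "Mmu": "Mus musculus",
    "Cfa": "Canis familiaris",
    "Mdo": "Monodelphis domestica",
    "Gga": "Gallus gallus",
    "Aca": "Anolis carolinensis",
    "Xtr": "Silurana (Xenopus) tropicalis",
    "Lch": "Latimeria chalumnae",
    "Loc": "Lepisosteus oculatus",
    "Dre": "Danio rerio",
    "Ola": "Oryzias latipes",
    "Gac": "Gasterosteus aculeatus",
    "Tni": "Tetraodon nigroviridis",
    "Tru": "Takifugu rubripes",
    "Dme": "Drosophila melanogaster",
}


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; sequence lines that follow belong to it.
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join wrapped sequence lines into one string per record.
    return {name: "".join(parts) for name, parts in records.items()}


def count_by_species(records):
    """Count sequences per species.

    ASSUMPTION: each header starts with a three-letter species
    abbreviation; headers with an unrecognised prefix are counted
    under "unknown".
    """
    counts = Counter()
    for name in records:
        counts[SPECIES.get(name[:3], "unknown")] += 1
    return counts


# Hypothetical toy alignment (made-up headers and residues, not real data).
EXAMPLE = """>Hsa_SSTR1
MFPNG--SA
>Dre_sstr1a
MLPNGTASA
>Hsa_KISS1R
MHTVA--TS
"""

example_records = parse_fasta(EXAMPLE)
example_counts = count_by_species(example_records)
```

<p>To apply the sketch to one of the deposited alignments, the file contents would be read into a string first, for example with <code>open('SSTR_alignment.fasta').read()</code>, and the header convention should be checked against the actual file before relying on the per-species counts.</p>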