10.6084/m9.figshare.103144.v1 Daniel Ocampo Daza Christina A Bergqvist Dan Larhammar Phylogenetic analyses of the insulin-like growth factor binding protein (IGFBP) family figshare 2012 igfbp phylogeny phylogenetic trees evolution Molecular Biology Bioinformatics Evolutionary Biology 2012-12-06 15:50:14 Dataset https://figshare.com/articles/dataset/Phylogenetic_analyses_of_the_insulin-like_growth_factor_binding_protein_IGFBP_family/103144 <p>Phylogenetic re-analyses of Insulin-like Growth Factor Binding Proteins (IGFBPs) based on amino acid sequences. The sequences and alignment described in <em>Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89</em> (link below) were used to analyze additional IGFBP sequences identified in the genome databases of <em>Anolis carolinensis</em> (anole lizard), <em>Latimeria chalumnae</em> (coelacanth) and <em>Lepisosteus oculatus</em> (spotted gar). Phylogenetic trees were made using neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods, both supported by bootstrap analyses (details below). Figures (PDF files) of the finished trees are included in the files <em>IGFBP_NJ_figure.pdf</em> and <em>IGFBP_PhyML_figure.pdf</em>. Branch colors are based on chromosomal locations and follow the trees published in <em>Ocampo Daza et al. 
(2011)</em> (link below).</p> <p><strong>Species abbreviations</strong></p> <p><em>Homo sapiens</em> (Hsa, human), <em>Mus musculus</em> (Mmu, mouse), <em>Canis familiaris</em> (Cfa, dog), <em>Monodelphis domestica</em> (Mdo, opossum), <em>Gallus gallus</em> (Gga, chicken), <em>Taeniopygia guttata</em> (Tgu, zebra finch), <em>Anolis carolinensis</em> (Aca, anole lizard), <em>Latimeria chalumnae</em> (Lch, coelacanth), <em>Lepisosteus oculatus</em> (Loc, spotted gar), <em>Danio rerio</em> (Dre, zebrafish), <em>Oryzias latipes</em> (Ola, medaka), <em>Gasterosteus aculeatus</em> (Gac, stickleback), <em>Tetraodon nigroviridis</em> (Tni, green-spotted pufferfish), <em>Takifugu rubripes</em> (Tru, fugu), <em>Ciona intestinalis</em> (Cin, vase tunicate), <em>Ciona savignyi</em> (Csa, Pacific transparent tunicate) and <em>Branchiostoma floridae</em> (Bfl, Florida lancelet).</p> <p><strong>Sequences used</strong></p> <p>Detailed information about all sequences that were used is included in the file <em>Sequence_info_Tab1.xlsx</em> (MS Excel spreadsheet). This includes database identifiers and chromosome/linkage group locations as well as notes on the manual curation/annotation of the sequences.</p> <p><strong>Alignment</strong></p> <p>The full amino acid sequence alignment used for the phylogenetic analyses is included in an interleaved format (.aln) and a sequential format (.fasta) in the files <em>IGFBP_alignment_interleaved.aln</em> and <em>IGFBP_alignment_sequential.fasta</em>. The alignment was made using the ClustalW algorithm and edited manually as described in <em>Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89</em> (link below). Anole lizard, coelacanth and spotted gar sequences marked with asterisks are fragments and do not span the full length of the alignment (details in the file <em>Sequence_info_Tab1.xlsx</em>). 
</p> <p><strong>Phylogenetic analysis, NJ method</strong></p> <p>The Neighbor Joining tree was made in ClustalX 2.0, with settings as described in <em>Ocampo Daza et al. (2011)</em> (link below). The tree is supported by a bootstrap analysis with 1000 bootstrap replicates. The raw output is included in the file <em>IGFBP_NJ.txt</em> and the final tree, rooted with the lancelet IGFBP sequence, is included in the file <em>IGFBP_NJ_rooted.phb</em>. Both files are in the Newick/Phylip data format.</p> <p><strong>Phylogenetic analysis, PhyML method</strong></p> <p>The Phylogenetic Maximum Likelihood tree was made using the PhyML3.0 algorithm implemented through the web-based interface available at <em>http://www.atgc-montpellier.fr/phyml/</em>. The following settings were used:</p> <p>. Amino acid subst. model : LG<br>. Proportion of invariable sites : estimated<br>. Number of subst. rate categs : 8<br>. Gamma distribution parameter : estimated<br>. 'Middle' of each rate class : mean<br>. Amino acid equilibrium frequencies : empirical<br>. Optimise tree topology : yes<br>. Tree topology search : NNIs<br>. Starting tree : BioNJ<br>. Add random input tree : no<br>. Optimise branch lengths : yes<br>. Optimise substitution model parameters : yes</p> <p>The tree is supported by a bootstrap analysis with 100 bootstrap replicates. The final tree, rooted with the lancelet IGFBP sequence, is included in the file <em>IGFBP_PhyML.phb</em> (Newick/Phylip format). The raw output files of the PhyML analysis are included in the following files:</p> <p>. <em>igfbp_ml_121119_phy_stdout.txt</em></p> <p>. <em>igfbp_ml_121119_phy_phyml_tree.txt</em></p> <p>. <em>igfbp_ml_121119_phy_phyml_stats.txt</em></p> <p>. <em>igfbp_ml_121119_phy_phyml_boot_trees.txt</em></p> <p>. <em>igfbp_ml_121119_phy_phyml_boot_stats</em></p> <p><strong>File formats</strong></p> <p>All phylogenetic trees are included in the Newick/Phylip format. 
For more information on the PhyML output files and data formats, see <em>http://www.atgc-montpellier.fr/download/papers/phyml_manual_2009.pdf</em>.</p>
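<p>As an illustration of how a Newick/Phylip tree string can be inspected, the following is a minimal Python sketch that is not part of the original analysis. The function names <em>leaf_names</em> and <em>count_by_species</em> and the toy tree are hypothetical. The sketch assumes unquoted leaf labels and, for the per-species count, that each label begins with one of the three-letter species abbreviations listed above; both assumptions should be checked against the actual tree files.</p>

```python
import re
from collections import Counter

# Toy Newick string for illustration only; these labels, branch lengths and
# support values are invented and are NOT taken from the dataset files.
TOY_TREE = (
    "((Hsa_IGFBP1:0.1,Mmu_IGFBP1:0.12)98:0.05,"
    "(Dre_IGFBP1a:0.2,Dre_IGFBP1b:0.21)75:0.07,Bfl_IGFBP:0.5);"
)


def leaf_names(newick):
    """Return leaf labels in the order they appear in a Newick string.

    A leaf label directly follows "(" or ",". Labels that follow ")" are
    internal node labels (e.g. bootstrap support) and are skipped.
    Assumption: labels are unquoted and contain no whitespace.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def count_by_species(names):
    """Count leaves per species.

    Assumption: the first three characters of each label are the species
    abbreviation (Hsa, Mmu, Dre, ...).
    """
    return Counter(name[:3] for name in names)


leaves = leaf_names(TOY_TREE)
print(leaves)
print(dict(count_by_species(leaves)))
```

<p>To apply the sketch to one of the tree files, the file contents could be read into a string and passed to <em>leaf_names</em>. A dedicated phylogenetics library is a better choice for anything beyond listing labels, since this pattern does not handle quoted labels or bracketed comments.</p>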