%0 Generic %A Ocampo Daza, Daniel %A A Bergqvist, Christina %A Larhammar, Dan %D 2012 %T Phylogenetic analyses of the insulin-like growth factor binding protein (IGFBP) family %U https://figshare.com/articles/dataset/Phylogenetic_analyses_of_the_insulin-like_growth_factor_binding_protein_IGFBP_family/103144 %R 10.6084/m9.figshare.103144.v1 %2 https://ndownloader.figshare.com/files/3068738 %2 https://ndownloader.figshare.com/files/3068744 %2 https://ndownloader.figshare.com/files/3068750 %2 https://ndownloader.figshare.com/files/3068759 %2 https://ndownloader.figshare.com/files/3068765 %2 https://ndownloader.figshare.com/files/3068774 %2 https://ndownloader.figshare.com/files/3068777 %2 https://ndownloader.figshare.com/files/3068786 %2 https://ndownloader.figshare.com/files/3068798 %2 https://ndownloader.figshare.com/files/3068804 %2 https://ndownloader.figshare.com/files/3068807 %2 https://ndownloader.figshare.com/files/3068816 %2 https://ndownloader.figshare.com/files/3068822 %K igfbp %K phylogeny %K phylogenetic trees %K evolution %K Molecular Biology %K Bioinformatics %K Evolutionary Biology %X

Phylogenetic re-analyses of insulin-like growth factor binding proteins (IGFBPs) based on amino acid sequences. The sequences and alignment described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below) were used to analyze additional IGFBP sequences identified in the genome databases of Anolis carolinensis (anole lizard), Latimeria chalumnae (coelacanth) and Lepisosteus oculatus (spotted gar). Phylogenetic trees were made using the neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods, both supported by bootstrap analyses (details below). Figures of the finished trees are included as PDF files: IGFBP_NJ_figure.pdf and IGFBP_PhyML_figure.pdf. Branch colors are based on chromosomal locations and follow the trees published in Ocampo Daza et al. (2011) (link below).

Species abbreviations

Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Canis familiaris (Cfa, dog), Monodelphis domestica (Mdo, opossum), Gallus gallus (Gga, chicken), Taeniopygia guttata (Tgu, zebra finch), Anolis carolinensis (Aca, anole lizard), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, stickleback), Tetraodon nigroviridis (Tni, green-spotted pufferfish), Takifugu rubripes (Tru, fugu), Ciona intestinalis (Cin, vase tunicate), Ciona savignyi (Csa, Pacific transparent tunicate) and Branchiostoma floridae (Bfl, Florida lancelet).

Sequences used

Detailed information about all sequences that were used is included in the file Sequence_info_Tab1.xlsx (MS Excel spreadsheet). This includes database identifiers and chromosome/linkage group locations as well as notes on the manual curation/annotation of the sequences.

Alignment

The full amino acid sequence alignment used for the phylogenetic analyses is included in an interleaved format (.aln) and a sequential format (.fasta) in the files IGFBP_alignment_interleaved.aln and IGFBP_alignment_sequential.fasta. The alignment was made using the ClustalW algorithm and edited manually as described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below). Anole lizard, coelacanth and spotted gar sequences marked with asterisks are fragments and do not span the full length of the alignment (details in the file Sequence_info_Tab1.xlsx). 
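
As a quick sanity check on the sequential alignment file, the following minimal sketch reads FASTA text, confirms that all rows have the same aligned length, and lists the rows whose names carry an asterisk. It assumes that the asterisk marking a fragment appears in the sequence name; the demo records are made up and are not real IGFBP data.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}, keeping file order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_report(records):
    """Return the set of row lengths and the names that contain an asterisk."""
    lengths = {len(seq) for seq in records.values()}
    fragments = [n for n in records if "*" in n]
    return lengths, fragments


# Made-up demo input (hypothetical names, not real IGFBP sequences).
demo = """>Hsa.demo
MK-LLV
>Aca.demo*
--ALLV
"""
demo_lengths, demo_fragments = alignment_report(read_fasta(demo))

# Runs on the real file only if it is present in the working directory.
path = Path("IGFBP_alignment_sequential.fasta")
if path.exists():
    lengths, fragments = alignment_report(read_fasta(path.read_text()))
    print("aligned lengths:", lengths)
    print("rows marked as fragments:", fragments)
```

A valid alignment should report a single aligned length; more than one value suggests a parsing or editing problem.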

Phylogenetic analysis, NJ method

The Neighbor Joining tree was made in ClustalX 2.0, with settings as described in Ocampo Daza et al. (2011) (link below). The tree is supported by a bootstrap analysis with 1000 bootstrap replicates. The raw output is included in the file IGFBP_NJ.txt and the final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_NJ_rooted.phb. Both files are in the Newick/Phylip data format. 
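
Because the tree files are plain Newick text, the leaf names can be pulled out without specialised software. The sketch below is a simplified reader, not a full Newick parser: it assumes unquoted names and treats any label that directly follows "(" or "," as a leaf. The demo tree and its names are hypothetical.

```python
import re
from pathlib import Path

# A leaf name directly follows "(" or ","; labels after ")" belong to internal nodes.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\[\]\s]+)")


def leaf_names(newick):
    """Return leaf names in the order they appear in a Newick string."""
    return LEAF_PATTERN.findall(newick)


# Made-up demo tree with a support value of 95 on the inner node.
demo_tree = "((Hsa.X:0.1,Mmu.X:0.2)95:0.05,Bfl.X:0.3);"
demo_leaves = leaf_names(demo_tree)

# Runs on the real file only if it is present in the working directory.
path = Path("IGFBP_NJ_rooted.phb")
if path.exists():
    names = leaf_names(path.read_text())
    print(len(names), "sequences in the NJ tree")
```

For anything beyond listing names, such as rerooting or reading support values, a dedicated tree library is the safer choice.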

Phylogenetic analysis, PhyML method

The Phylogenetic Maximum Likelihood tree was made using PhyML 3.0 through the web-based interface available at http://www.atgc-montpellier.fr/phyml/. The following settings were used:

. Amino acid subst. model : LG
. Proportion of invariable sites : estimated
. Number of subst. rate categs : 8
. Gamma distribution parameter : estimated
. 'Middle' of each rate class : mean
. Amino acid equilibrium frequencies : empirical
. Optimise tree topology : yes
. Tree topology search : NNIs
. Starting tree : BioNJ
. Add random input tree : no
. Optimise branch lengths : yes
. Optimise substitution model parameters : yes
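
The analysis was run through the web interface, so no command line is recorded here. For readers who want to try comparable settings with a local PhyML 3.0 installation, the sketch below assembles a roughly corresponding command. The executable name and the input file name are hypothetical, the flag mapping is an assumption based on the PhyML command-line options, and the bootstrap count is the 100 replicates reported for this tree. The command is only built and printed, not executed.

```python
import shlex

# Hypothetical names: adjust to the local installation and PHYLIP input file.
PHYML_BIN = "phyml"
ALIGNMENT = "igfbp_alignment.phy"

command = [
    PHYML_BIN,
    "-i", ALIGNMENT,  # input alignment in PHYLIP format
    "-d", "aa",       # amino acid data
    "-m", "LG",       # amino acid substitution model
    "-f", "e",        # empirical equilibrium frequencies
    "-v", "e",        # estimate the proportion of invariable sites
    "-c", "8",        # number of substitution rate categories
    "-a", "e",        # estimate the gamma distribution parameter
    "-s", "NNI",      # tree topology search by NNIs
    "-o", "tlr",      # optimise topology, branch lengths and rate parameters
    "-b", "100",      # bootstrap replicates
]
# No starting tree is passed, so PhyML uses its default BioNJ tree;
# the mean of each rate class is likewise the default.

print(shlex.join(command))
```

Results from a local run may differ from the deposited files, since program versions and random seeds can vary.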

The tree is supported by a bootstrap analysis with 100 bootstrap replicates. The final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_PhyML.phb (Newick/Phylip format). The raw output files of the PhyML analysis are included in the following files:

. igfbp_ml_121119_phy_stdout.txt
. igfbp_ml_121119_phy_phyml_tree.txt
. igfbp_ml_121119_phy_phyml_stats.txt
. igfbp_ml_121119_phy_phyml_boot_trees.txt
. igfbp_ml_121119_phy_phyml_boot_stats
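
One simple way to check that the bootstrap tree file is complete is to count the trees it contains. The sketch below assumes the file holds one Newick tree per replicate, each terminated by a semicolon, and that no semicolons occur inside names or comments. The demo trees are made up.

```python
from pathlib import Path


def count_newick_trees(text):
    """Count trees in Newick text, where each tree ends with a semicolon."""
    return text.count(";")


# Made-up demo input with two small trees.
demo = "((A:0.1,B:0.2):0.05,C:0.3);\n((A:0.1,C:0.2):0.05,B:0.3);\n"
demo_count = count_newick_trees(demo)

# Runs on the real file only if it is present in the working directory.
boot_path = Path("igfbp_ml_121119_phy_phyml_boot_trees.txt")
if boot_path.exists():
    print(count_newick_trees(boot_path.read_text()), "bootstrap trees found")
```

For this dataset the expected count would match the number of bootstrap replicates stated for the PhyML analysis.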

File formats

All phylogenetic trees are included in the Newick/Phylip format. For more information on the PhyML output files and data formats, see http://www.atgc-montpellier.fr/download/papers/phyml_manual_2009.pdf.

%I figshare