10.6084/m9.figshare.1162875.v1 John Wares John Wares Thomas F. Turner Thomas F. Turner Mitochondrial diversity in rainbow trout and close relatives figshare 2014 alignment mitochondrial sequence trout salmonid Zoology Evolutionary Biology 2014-09-09 16:55:15 Dataset https://figshare.com/articles/dataset/Mitochondrial_diversity_in_rainbow_trout_and_close_relatives/1162875 <p>This is an old but perhaps useful effort from postdoctoral work on trout diversity while JPW was working with TFT at the University of New Mexico. Here we aligned all available (as of 2002) D-loop sequence data for rainbow trout (Oncorhynchus mykiss) and close relatives, including Gila trout and Mexican golden trout. Many of these data were only available in the gray literature; as such, this is an alignment of data that may not otherwise be accessible to researchers.</p> <p><strong>Methods</strong></p> <p>Sequence data from the mitochondrial control region were compiled from previously published data sets (Bagley and Gall 1998; Nielsen papers) and collected <em>de novo</em> from the protected southwestern trout species (<em>O. gilae gilae</em> and <em>O. g. apache</em>; GenBank accessions AF517763-AF517767; Wares et al. 2004). Table 1 lists the full set of haplotypes. Additionally, the complete sequence of the mitochondrial control region for <em>O. mykiss</em> (Digby et al. 1992) was included for cross-referencing among haplotypes in different studies. We aligned these data using ClustalW; the conserved sequence blocks CSB-2 and CSB-3 (Digby et al. 1992) were included as ‘anchors’ for aligning highly variable regions. To ensure positional homology, we performed a Bayesian analysis of alignment stability (ProAlign 0.2, Löytynoja and Milinkovitch 2003). Sites with < 90% posterior probability of positional homology were excluded.</p> <p>The most-parsimonious (MP) haplotype trees were generated using PAUP*4.0b10 (Swofford 2002), counting gaps/indels as a fifth character state. 
Each tree was midpoint-rooted as an approximate means of finding the coalescent root haplotype (Wares et al. 2001) without knowledge of haplotype frequencies in each population from previously published data sets. The set of basal haplotypes that appeared as the midpoint root or within a single substitution of this position in 95% of the first 1000 MP trees is considered the ‘root’ set of haplotypes in the <em>O. mykiss</em> species complex. Majority-rule consensus scores (based on the first 1000 MP trees) were calculated for each clade.</p> <p> </p> <p>An additional measure of clade support was obtained using MrBayes 3.0 (F. Ronquist and J. P. Huelsenbeck). The best-fit base model was chosen using likelihood-ratio tests (ModelTest 3.06; Posada and Crandall 1998). Because there are often few parsimony-informative sites in intraspecific data sets, typical methods for inferring statistical support (e.g. nonparametric bootstrapping) often only weakly support intraspecific clades. However, Bayesian methods test the likelihood of genealogical hypotheses rather than the likelihood of the data (Huelsenbeck et al. 2001) and estimate the probability that a clade represents a true historical pattern (Huelsenbeck et al. 2002; but see Suzuki et al. 2002, Douady et al. 2003). The analysis started from a random tree and ran for 10<sup>7</sup> cycles, sampling a tree every 100 cycles. We ran four MCMC chains and discarded the initial 5 × 10<sup>5</sup> cycles (5%) as burn-in; this limit was obtained by monitoring likelihood scores graphically for convergence on a stationary value. The analysis was repeated three times to ensure global stability of our estimate. A Dirichlet distribution was assumed for the base frequency parameters and an uninformative prior was used for the topology. 
</p> <p> </p> <p><strong>Results</strong></p> <p>From the aligned data set of 442 bp of D-loop sequence data, 30 sites were excluded because their posterior probability of positional homology was < 90% (alignment available from JPW). Four indel positions were observed. Of 63 variable sites, 48 are parsimony-informative. The best-fit model of substitution is HKY (Ti:Tv = 3.99; Hasegawa et al. 1985) with gamma-distributed rate variation (α = 0.087). These sequence data only marginally reject a model of clocklike evolution (χ<sup>2</sup> = 94.117, d.f. = 72, p = 0.041). The haplotype network is shown here as Figure 1. The alignment and explanatory table are also part of this Fileset.</p> <p> </p>
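<p>As a rough cross-check of the figures reported above, the following Python sketch recomputes the site counts, the number of MCMC samples implied by the stated run length and sampling interval, and the tail probability of the clock test. It is not part of the original analysis. All variable names are illustrative assumptions, the sample counts ignore the starting state of the chain, and the chi-square tail probability uses the Wilson-Hilferty normal approximation rather than the exact distribution, so it should only be expected to land close to the reported p = 0.041.</p>

```python
import math


def chi2_upper_tail_wh(x, k):
    """Approximate P(X >= x) for a chi-square variate with k degrees of
    freedom, using the Wilson-Hilferty cube-root normal approximation
    (an approximation chosen here for illustration, not the exact test)."""
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Site counts reported in the Results.
total_sites = 442
excluded_sites = 30        # posterior probability of homology below 90%
variable_sites = 63
informative_sites = 48

retained_sites = total_sites - excluded_sites
uninformative_variable_sites = variable_sites - informative_sites

# MCMC settings reported in the Methods.
cycles = 10**7
sample_every = 100
burn_in_cycles = 5 * 10**5

samples_total = cycles // sample_every             # starting state not counted
samples_burn_in = burn_in_cycles // sample_every
samples_kept = samples_total - samples_burn_in
burn_in_fraction = burn_in_cycles / cycles

# Likelihood-ratio test of clocklike evolution.
p_clock = chi2_upper_tail_wh(94.117, 72)

print(f"sites retained after filtering: {retained_sites}")
print(f"variable but not parsimony-informative: {uninformative_variable_sites}")
print(f"samples kept per run: {samples_kept} of {samples_total}")
print(f"burn-in fraction: {burn_in_fraction:.2%}")
print(f"approximate clock-test p-value: {p_clock:.3f}")
```

<p>With the settings described, each run would retain 95,000 of 100,000 sampled trees, and 412 aligned positions remain after filtering.</p>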