Supplemental Material for Li and Ralph, 2018

2018-11-20T18:19:29Z (GMT) by Peter L. Ralph, Han Li
<div>Supplementary Tables and Figures for Li and Ralph, "Local PCA Shows How the Effect of Population Structure Differs Along the Genome"<br></div><div><br></div><div><br>Table S1.<br><br>Correlations between the MDS coordinates of genomic windows obtained from runs with different parameter values. To produce these, we first ran the algorithm with the specified window size and number of PCs (k) on the full Medicago truncatula dataset. Then, to obtain the correlation between the results for parameters A (a row of the matrix above) and parameters B (a column of the matrix above), we mapped the windows of B onto those of A by averaging the MDS coordinates of all windows of B whose midpoints lay in the corresponding window of A; we then computed the correlation between the MDS coordinates of A and the averaged MDS coordinates of B. This operation is not symmetric in A and B, so the matrices are not symmetric. As expected, parameter values with smaller windows produce noisier estimates, but plots of MDS values along the genome are visually very similar.<br><br>Figure S1.<br><br>PCA plots for chromosome arms 2L, 2R, 3L, 3R and X of the Drosophila melanogaster dataset.<br><br>Figure S2.<br><br>PCA plots for all 22 human autosomes from the POPRES data.<br><br>Figure S3.<br><br>PCA plots for all 8 chromosomes in the Medicago truncatula dataset.<br><br>Figure S4.<br><br>MDS visualizations of the Gaussian genotypes described in the Appendix, for 50 individuals from each of three populations. 
(top) The first quarter, middle half, and final quarter of the chromosome each have different population structure, as expected, despite the possibility of PC switching within each. (bottom) The same picture results even after marking a random 50% of the genotypes in the first half of the chromosome as missing.<br><br>Figure S5.<br><br>MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations are neutral, and recombination is: (top) constant; (top middle) varies stepwise by factors of two in seven equal-length segments, with highest rates on the ends, so the middle segment has a recombination rate 64 times lower than the ends; (bottom middle) according to the HapMap human female chromosome 7 map. The bottom figure shows PCA plots corresponding to the three colored windows of the last (HapMap) scenario; the outlying windows lie in long regions of low recombination rate, each of which can be dominated by a few correlated trees, similar to an inversion. The inset provides a key to the locations of the individuals on the spatial landscape.<br><br>Figure S6.<br><br>MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations incorporate linked selection by allowing selected mutations to appear in the same two regions of the genome: the one-sixth of the genome immediately before the halfway point, and the last one-sixth of the genome. (top) Constant recombination rate. (top middle) Stepwise varying recombination rate. (bottom middle) Constant recombination rate with spatially varying effects of selection. 
(bottom) PCA plots corresponding to the highlighted corners of the last MDS visualization, showing how spatially varying linked selection has affected patterns of relatedness. The inset provides a key to the locations of the individuals on the spatial landscape.<br><br>Figure S7.<br><br>MDS visualizations for each chromosome arm of Drosophila melanogaster, as in Figure 2, except that the method was run using five PCs (k=5) instead of two.<br><br>Figure S8.<br><br>The proportion of missing data in each window, compared with the first MDS coordinate, for the Drosophila melanogaster data from Figure 2.<br><br>Figure S9.<br><br>PCA plots for the three sets of genomic windows colored in Figure 2, on each chromosome arm of Drosophila melanogaster. In all plots, each point represents a sample. The first column shows the combined PCA plot for windows whose points are colored green in Figure 2; the second is for orange windows; and the third is for purple windows.<br><br>Figure S10.<br><br>Variation in structure for windows of 1,000 SNPs across Drosophila melanogaster chromosome arms, without inversions. As in Figure 2, but after omitting, for each chromosome arm, individuals carrying the less frequent orientation of any inversion on that arm. The values differ from those in Figure 4 in the window size used and in that some MDS coordinates have the opposite sign (relative orientation is meaningless, because chromosome arms were run separately, unlike for Medicago). In all plots, each point represents one window along the genome. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosome arms. Vertical lines show the breakpoints of known polymorphic inversions. 
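<br><br>The midpoint-based matching described for Table S1 is one way to compare runs whose windows differ in size, as the runs behind Figure S10 and Figure 4 do. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each window is described by start and end positions (in bp) plus one MDS coordinate, and the helper names `map_windows` and `mapped_correlation` are hypothetical. Because MDS coordinates from separate runs are defined only up to sign, a negative correlation between two runs may simply reflect a flipped orientation.

```python
import numpy as np

# Sketch only: window boundaries, helper names, and toy values are hypothetical.

def map_windows(a_starts, a_ends, b_starts, b_ends, b_values):
    """For each window of run A, average the values of all run-B windows
    whose midpoints fall in [start, end) of that A window (NaN if none)."""
    a_starts = np.asarray(a_starts, dtype=float)
    a_ends = np.asarray(a_ends, dtype=float)
    b_mid = (np.asarray(b_starts, dtype=float) + np.asarray(b_ends, dtype=float)) / 2
    b_values = np.asarray(b_values, dtype=float)
    mapped = np.full(len(a_starts), np.nan)
    for i, (lo, hi) in enumerate(zip(a_starts, a_ends)):
        inside = (b_mid >= lo) & (b_mid < hi)
        if inside.any():
            mapped[i] = b_values[inside].mean()
    return mapped


def mapped_correlation(a_values, mapped_b):
    """Pearson correlation between A's coordinate and B's averaged coordinate,
    using only A windows that received at least one B window."""
    a_values = np.asarray(a_values, dtype=float)
    ok = ~np.isnan(mapped_b)
    return np.corrcoef(a_values[ok], mapped_b[ok])[0, 1]


# Toy example: run A uses 100 bp windows, run B uses 50 bp windows.
a_starts, a_ends = [0, 100, 200], [100, 200, 300]
a_mds1 = [1.0, 0.0, -1.0]
b_starts = [0, 50, 100, 150, 200, 250]
b_ends = [50, 100, 150, 200, 250, 300]
b_mds1 = [0.9, 0.7, 0.1, -0.1, -0.8, -1.2]

b_on_a = map_windows(a_starts, a_ends, b_starts, b_ends, b_mds1)
r_ab = mapped_correlation(a_mds1, b_on_a)
```

Swapping the roles of A and B gives a different mapping (here most of the smaller B windows receive no A midpoint at all), which is why the matrices in Table S1 are not symmetric.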
<br><br>Figure S11.<br><br>Recombination rate and the effects of population structure for Drosophila melanogaster: the first MDS coordinate and recombination rate (in cM/Mbp), as in Figure 4, plotted against each other. Since the windows underlying the two estimates in Figure 4 do not coincide, to obtain correlations we divided the genome into 100 kbp bins and, for each variable (recombination rate and MDS coordinate 1), averaged within each bin the values of all overlapping windows, with weight proportional to the proportion of overlap. The correlation coefficients and r<sup>2</sup> values for each linear regression are as follows: 2L: correlation=0.52, r<sup>2</sup>=0.27; 2R: correlation=0.43, r<sup>2</sup>=0.18; 3L: correlation=0.47, r<sup>2</sup>=0.21; 3R: correlation=0.46, r<sup>2</sup>=0.21; X: correlation=0.50, r<sup>2</sup>=0.24.<br><br>Figure S12.<br><br>MDS plots for human chromosomes 1-8. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosomes. Colored vertical lines show the breakpoints of validated inversions, while grey vertical lines show the breakpoints of predicted inversions.<br><br>Figure S13.<br><br>MDS plots for human chromosomes 9-16, as in Supplementary Figure S12.<br><br>Figure S14.<br><br>MDS plots for human chromosomes 17-22, as in Supplementary Figure S12.<br><br>Figure S15.<br><br>Comparison of PCA plots within outlying windows (center column) and flanking non-outlying windows (left and right columns) for the two windows with outlying MDS scores on chromosome 8.<br><br>Figure S16.<br><br>MDS visualization of variation in the effects of population structure among windows across all human autosomes simultaneously. The small group of windows with outlying positive MDS values lies around the inversion at 8p23.<br><br>Figure S17.<br><br>First MDS coordinate against gene density for all 8 chromosomes of M. truncatula. The first MDS coordinate is significantly correlated with gene count (r=0.149, p &lt; 2.2e-16).<br><br>Figure S18.<br><br>MDS visualizations of the effects of population structure for all 8 chromosomes of the Medicago truncatula data, using windows of 10,000 SNPs.<br><br>Figure S19.<br><br>PCA plots for regions colored in Figure S18 on all 8 chromosomes of Medicago truncatula: (A) green, (B) orange, and (C) purple.<br><br></div>
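The bin-averaging step described for Figure S11 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each window is given by start and end positions (in bp) and a single value, weights each window by the length of its overlap with the bin (within a bin of fixed size this is equivalent to weighting by the fraction of the bin covered), and uses the hypothetical helper name `bin_average`. The toy values are invented for illustration only.

```python
import numpy as np

# Sketch only: the helper name, window layout, and toy values are hypothetical.

def bin_average(starts, ends, values, chrom_length, bin_size=100_000):
    """Average window values into fixed-size bins, weighting each window by
    its overlap (in bp) with the bin; bins with no overlapping window are NaN."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    values = np.asarray(values, dtype=float)
    n_bins = int(np.ceil(chrom_length / bin_size))
    binned = np.full(n_bins, np.nan)
    for b in range(n_bins):
        lo, hi = b * bin_size, min((b + 1) * bin_size, chrom_length)
        # Overlap of each half-open window [start, end) with the bin [lo, hi).
        overlap = np.clip(np.minimum(ends, hi) - np.maximum(starts, lo), 0, None)
        if overlap.sum() > 0:
            binned[b] = np.sum(overlap * values) / overlap.sum()
    return binned


# Toy 300 kbp chromosome: the MDS windows and recombination-rate windows differ.
mds1 = bin_average([0, 150_000], [150_000, 300_000], [1.0, -1.0], 300_000)
rate = bin_average([0, 50_000, 200_000], [50_000, 200_000, 300_000],
                   [2.0, 1.0, 0.5], 300_000)

# Correlate over bins where both variables are defined; for a simple linear
# regression, r^2 is the square of this correlation.
ok = ~(np.isnan(mds1) | np.isnan(rate))
r = np.corrcoef(mds1[ok], rate[ok])[0, 1]
r_squared = r ** 2
```

Because both variables are resampled onto the same bins, their correlation can be computed even though the original windows do not coincide.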