Hufford et al. 2012
Files and data from
Hufford, M.B.*, X. Xun*, J. van Heerwaarden*, T. Pyhäjärvi*, J-M. Chia, R.A. Cartwright, R.J. Elshire, J.C. Glaubitz, K.E. Guill, S. Kaeppler, J. Lai, P.L. Morrell, L.M. Shannon, C. Song, N.M. Springer, R.A. Swanson-Wagner, P. Tiffin, J. Wang, G. Zhang, J. Doebley, M.D. McMullen, D. Ware, E.S. Buckler, S. Yang, J. Ross-Ibarra. 2012. Comparative population genomics of maize domestication and improvement. Nature Genetics 44:808-811
55K SNP data in Hapmap_55K.zip
Files rho.teo, rho.LR, and rho.mz contain estimates of the population recombination rate rho, obtained with Hudson's maxhap, for teosinte, landraces, and maize, respectively, from Hufford et al. Nat. Genet. 2012 (though these data were not published with the paper). The files are a bit redundant, but each line looks like:
1 100 141 100 14 0.000098
That is: the chromosome, then the window index and the number of SNPs (this pair appears twice), followed by the maximum-likelihood estimate (MLE) of rho. So the line above would be the 100th 10kb window on chromosome 1 (on reference genome AGPv1), with rho = 0.000098. I would be hesitant to trust values with a low number of SNPs, S (definitely S < 10, and probably S < 20 or even S < 30), as those likely reflect noise more than anything else.
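A minimal sketch for reading these files and dropping low-S windows, assuming each line has exactly six whitespace-separated fields in the order described above. Because the two SNP counts in the example line differ (141 and 14), it is not certain which one is S, so this sketch conservatively filters on the smaller of the two. The names `RhoWindow`, `parse_rho`, and `keep_informative` are hypothetical helpers, not part of the released files.

```python
from typing import Iterable, Iterator, NamedTuple


class RhoWindow(NamedTuple):
    chrom: int
    window: int       # 10kb window index on AGPv1, as given in the file
    snps_first: int   # third field (number of SNPs)
    snps_second: int  # fifth field (number of SNPs, repeated)
    rho: float        # maxhap MLE of rho


def parse_rho(lines: Iterable[str]) -> Iterator[RhoWindow]:
    """Parse lines of rho.teo / rho.LR / rho.mz into RhoWindow records.

    Assumption: six whitespace-separated fields per line:
    chromosome, window, SNPs, window (repeated), SNPs (repeated), rho.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        chrom, window, snps_first, _window_again, snps_second, rho = fields
        yield RhoWindow(int(chrom), int(window), int(snps_first),
                        int(snps_second), float(rho))


def keep_informative(windows: Iterable[RhoWindow],
                     min_snps: int = 20) -> list[RhoWindow]:
    """Keep only windows with at least min_snps SNPs.

    Uses the smaller of the two SNP counts, to be conservative.
    """
    return [w for w in windows
            if min(w.snps_first, w.snps_second) >= min_snps]
```

For example, the line shown above (SNP counts 141 and 14) would be dropped with `min_snps=20` but kept with `min_snps=10`; passing an open file handle, e.g. `keep_informative(parse_rho(fh), min_snps=30)`, applies the strictest threshold mentioned above to a whole file.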
Summary statistics for 10kb windows genome-wide and for genes in the maize v2 filtered gene set.
There is one gene file for each of teosinte, landraces, and maize (as above), plus a single file with 10kb-window statistics for all 3 taxa.
See the paper for details on the criteria for calling SNPs, the data used for each statistic, etc.
locus: GRM name of the gene in the filtered gene set
S: number of segregating sites
ThetaW: Watterson's estimate of theta (per locus)
ThetaPi: nucleotide diversity (per locus)
ThetaH: Fay and Wu (2000) estimator (per locus)
TajD: Tajima's D
seqbp: number of bp sequenced. Use this as the denominator to convert S and the per-locus theta estimates above to per-bp values (Tajima's D is already normalized and should not be divided).
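A minimal sketch of the per-bp conversion, assuming the summary-statistics files are tab-delimited with a header row that uses the column names listed above (the actual delimiter and header layout are assumptions, not confirmed here). `per_bp` and `read_stats` are hypothetical helpers; Tajima's D is passed through unchanged because it is a normalized statistic rather than a per-locus sum.

```python
import csv
from typing import Iterable, Iterator

# Statistics reported per locus; dividing by seqbp gives per-bp values.
PER_LOCUS_STATS = ("S", "ThetaW", "ThetaPi", "ThetaH")


def per_bp(row: dict[str, str]) -> dict[str, object]:
    """Convert one summary-statistics row to per-bp values (hypothetical helper)."""
    seqbp = float(row["seqbp"])
    if seqbp <= 0:
        raise ValueError(f"seqbp must be positive, got {seqbp}")
    # Window rows may lack a 'locus' column; .get() then returns None.
    out: dict[str, object] = {"locus": row.get("locus")}
    for name in PER_LOCUS_STATS:
        out[f"{name}_per_bp"] = float(row[name]) / seqbp
    # Tajima's D is already normalized, so it is not divided by seqbp.
    out["TajD"] = float(row["TajD"])
    out["seqbp"] = seqbp
    return out


def read_stats(lines: Iterable[str]) -> Iterator[dict[str, object]]:
    """Read a tab-delimited table with a header row (assumed format)
    and yield one per-bp dictionary per data row."""
    for row in csv.DictReader(lines, delimiter="\t"):
        yield per_bp(row)
```

Passing an open file handle for one of the gene files to `read_stats` would yield one dictionary per gene; non-numeric entries (for example, missing values) would raise a `ValueError` from `float` and would need handling suited to the real files.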
The modified version of the XP-CLR code used in the paper.