10 files

Hufford et al. 2012

posted on 23.03.2014, 08:54 by Jeffrey Ross-Ibarra

Files and data from:

Hufford, M.B.*, X. Xu*, J. van Heerwaarden*, T. Pyhäjärvi*, J-M. Chia, R.A. Cartwright, R.J. Elshire, J.C. Glaubitz, K.E. Guill, S. Kaeppler, J. Lai, P.L. Morrell, L.M. Shannon, C. Song, N.M. Springer, R.A. Swanson-Wagner, P. Tiffin, J. Wang, G. Zhang, J. Doebley, M.D. McMullen, D. Ware, E.S. Buckler, S. Yang, J. Ross-Ibarra. 2012. Comparative population genomics of maize domestication and improvement. Nature Genetics 44:808-811


55K Data

55K SNP data in Hapmap_55K.zip



The files rho.LR, rho.teo, and rho.mz are estimates of rho, obtained with Hudson's maxhap, for landraces, teosinte, and maize respectively, from Hufford et al. Nat. Genet. 2012 (though these data were not published with the paper). The files are a bit redundant, but each line looks like:

1 100 141 100 14 0.000098

The fields are chromosome, window, and number of SNPs, repeated twice, followed by the MLE of rho. So the line above is the 100th 10kb window on chromosome 1 (on reference genome AGPv1), with rho = 0.000098. I would be hesitant to trust values from windows with low S (definitely S < 10, and probably S < 20 or even S < 30), as those probably reflect noise more than anything else.
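As a minimal sketch of how a line in this format might be read and filtered, the Python below parses the example line, converts a window index to bp coordinates, and applies a minimum-SNP cutoff. Several things here are assumptions rather than part of the data description: that the first field is the chromosome, the second the window index, and the last the MLE of rho; that the SNP count sits in the field selected by the hypothetical `snp_field` parameter; that windows are numbered from 1; and that a cutoff of 30 SNPs is a sensible default. The function names are illustrative only.

```python
from typing import NamedTuple, Tuple

WINDOW_BP = 10_000  # 10kb windows, as stated in the description


class RhoRecord(NamedTuple):
    chrom: str
    window: int
    snps: int
    rho: float


def parse_rho_line(line: str, snp_field: int = 2) -> RhoRecord:
    """Parse one whitespace-delimited line of a rho file.

    Assumptions: field 0 is the chromosome, field 1 the window index,
    the last field the MLE of rho, and `snp_field` (0-based) the SNP
    count. Check `snp_field` against the actual files before use.
    """
    fields = line.split()
    return RhoRecord(
        chrom=fields[0],
        window=int(fields[1]),
        snps=int(fields[snp_field]),
        rho=float(fields[-1]),
    )


def window_span(window: int, size: int = WINDOW_BP) -> Tuple[int, int]:
    """Return 1-based inclusive (start, end) in bp, assuming windows count from 1."""
    return ((window - 1) * size + 1, window * size)


def is_reliable(rec: RhoRecord, min_snps: int = 30) -> bool:
    """Keep only windows with at least `min_snps` SNPs (cutoff is a judgment call)."""
    return rec.snps >= min_snps


rec = parse_rho_line("1 100 141 100 14 0.000098")
print(rec, window_span(rec.window), is_reliable(rec))
```

Making the SNP field a parameter keeps the sketch usable whichever column turns out to hold S in the actual files.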

Popgen Stats

Summary statistics for 10kb windows genome-wide and for genes in the maize v2 filtered gene set. 

There is one gene-level file each for teo, LR, and maize, as above, plus a single file of 10kb windows covering all 3 taxa.

See details in the paper for criteria for calling SNPs, data used for statistics, etc.

Columns are:

locus: GRM name of gene in the filtered gene set

S: number of segregating sites

ThetaW: Watterson's estimate of theta (per locus)

ThetaPi: nucleotide diversity (per locus)

ThetaH: Fay and Wu (2000) estimator (per locus)

TajD: Tajima's D

seqbp: number of bp sequenced. This should be used as the denominator to calculate per-bp values of the per-locus statistics above.
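The per-bp conversion described for seqbp can be sketched as follows. This is an illustration under assumptions: each row is taken to be a mapping keyed by the column names listed above (for example, as produced by `csv.DictReader`), and the file delimiter is not specified here. The choice to leave TajD unscaled is this sketch's own, since Tajima's D is already a normalized statistic. The example row uses made-up values, not values from the data files.

```python
from typing import Dict, Optional

# Columns described as per-locus quantities; TajD is deliberately excluded.
PER_LOCUS_COLUMNS = ("S", "ThetaW", "ThetaPi", "ThetaH")


def per_bp(row: Dict[str, str]) -> Optional[Dict[str, float]]:
    """Divide per-locus statistics by seqbp; return None if seqbp is not positive."""
    seqbp = float(row["seqbp"])
    if seqbp <= 0:
        return None
    return {
        f"{name}_per_bp": float(row[name]) / seqbp
        for name in PER_LOCUS_COLUMNS
    }


# Made-up values for illustration only; not taken from the data files.
example = {
    "locus": "example_gene",
    "S": "12",
    "ThetaW": "2.5",
    "ThetaPi": "2.0",
    "ThetaH": "3.0",
    "TajD": "-0.5",
    "seqbp": "1000",
}
print(per_bp(example))
```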



Also included is the modified version of the XP-CLR code used in the paper.