Empirical Comparisons of Different Statistical Models to Identify and Validate Kernel Row Number-Associated Variants from Structured Multiparent Mapping Populations of Maize
Advances in next-generation sequencing technologies and statistical approaches enable genome-wide dissection of phenotypic traits via genome-wide association studies (GWAS). Although multiple statistical approaches for conducting GWAS are available, the power and false discovery rates of many approaches have mostly been tested using simulated data. Empirical comparisons of single-variant (SV) and multi-variant (MV) GWAS approaches have not been conducted to test whether a single approach or a combination of SV and Bayesian MV is effective, as judged by identification and cross-validation of trait-associated loci. In this study, kernel row number (KRN) data were collected from a set of 6,230 entries derived from the Nested Association Mapping (NAM) population and related populations. Three types of GWAS analysis were performed: 1) a single-variant (SV) model, 2) stepwise regression (STR), and 3) a Bayesian-based multi-variant (MV) model. Using the SV, STR, and MV models, 257, 300, and 442 KRN-associated variants (KAVs), respectively, were identified in the initial GWAS analyses. Of these, 231 KAVs were subjected to genetic validation using three unrelated populations that were not included in the initial GWAS. The genetic validation results suggest that the three GWAS approaches are complementary. Interestingly, KAVs in low-recombination regions were more likely to exhibit associations in independent populations than KAVs in recombinationally active regions, probably as a consequence of linkage disequilibrium. The KAVs identified in this study have the potential to enhance our understanding of the developmental steps involved in ear development.
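To make the difference between a single-variant scan and stepwise regression concrete, the following is a minimal Python sketch on simulated data. It is not the pipeline used in the study: the genotype matrix, effect sizes, the threshold `t_threshold`, and the function names `single_variant_scan` and `forward_stepwise` are all hypothetical, and the stepwise routine is a simplified residual-based forward selection. The Bayesian MV model is not illustrated here.

```python
import numpy as np


def single_variant_scan(G, y):
    """Test each variant alone: |t| statistic of a simple linear regression of y on it."""
    n = len(y)
    Gc = G - G.mean(axis=0)          # center genotypes
    yc = y - y.mean()                # center phenotype
    r = (Gc.T @ yc) / (np.linalg.norm(Gc, axis=0) * np.linalg.norm(yc))
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return np.abs(t)


def forward_stepwise(G, y, max_steps=5, t_threshold=5.0):
    """Simplified forward selection: repeatedly add the variant most associated
    with the residual of the current model, until none exceeds t_threshold."""
    selected = []
    residual = y - y.mean()
    for _ in range(max_steps):
        t = single_variant_scan(G, residual)
        if selected:
            t[selected] = 0.0        # never pick the same variant twice
        best = int(np.argmax(t))
        if t[best] < t_threshold:
            break
        selected.append(best)
        # Refit the joint model with all selected variants and update the residual.
        X = np.column_stack([np.ones(len(y)), G[:, selected]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = y - X @ beta
    return selected


# Simulated example (assumed values, not data from the study):
# 500 lines, 50 variants, two causal variants at columns 3 and 17.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.5, size=(500, 50)).astype(float)
y = 1.0 * G[:, 3] + 1.0 * G[:, 17] + rng.normal(0.0, 1.0, size=500)

sv_stats = single_variant_scan(G, y)
selected = forward_stepwise(G, y)
print("top SV hits:", np.argsort(sv_stats)[-2:])
print("stepwise selection:", sorted(selected))
```

The single-variant scan scores every variant independently, whereas the stepwise routine conditions each new test on the variants already in the model, which is one reason the two kinds of approach can report different sets of associated variants.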