Two-stage genome-wide search for epistasis with implementation to Recombinant Inbred Lines (RIL) populations
Gene expression data (filter.genes.csv):
Arabidopsis thaliana gene expression data were downloaded from the TAIR database (ftp://ftp.arabidopsis.org/home/tair/Microarrays/analyzed_data/affy_data_1436_10132005.zip). The sample comprised 211 individuals from a RIL population derived from a cross between two inbred accessions, Bayreuth-0 (Bay-0) and Shahdara (Sha). Transcript (mRNA) levels were quantified using Affymetrix whole-genome microarrays, with two replicate arrays per individual. The gene expression data were preprocessed with the Variance Stabilization Normalization method [31]. Traits with essentially no expression (15,566 traits) were filtered out using the EM algorithm for a mixture of univariate normal distributions (via the ‘mixtools’ R package), leaving 7,244 traits for the analysis.
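The filtering step can be illustrated with a minimal sketch. It is not the authors' code and does not use the ‘mixtools’ package. It assumes that each trait is summarised by one value (for example its mean expression), that a two-component normal mixture is fitted to those values by EM, and that a trait is kept when its posterior probability of belonging to the low-expression component is below 0.5. The function names, the 0.5 cutoff, and the simulated values are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_low(x, weights, means, sds):
    """Posterior probability that x belongs to component 0."""
    p0 = weights[0] * normal_pdf(x, means[0], sds[0])
    p1 = weights[1] * normal_pdf(x, means[1], sds[1])
    total = p0 + p1
    return p0 / total if total > 0.0 else 0.5


def mixture_loglik(values, weights, means, sds):
    """Log-likelihood of the data under the two-component mixture."""
    return sum(
        math.log(max(weights[0] * normal_pdf(x, means[0], sds[0])
                     + weights[1] * normal_pdf(x, means[1], sds[1]), 1e-300))
        for x in values
    )


def fit_two_normals(values, n_iter=200, tol=1e-8):
    """EM for a two-component univariate normal mixture.

    Returns (weights, means, sds), ordered so component 0 has the lower mean.
    """
    values = list(values)
    n = len(values)
    xs = sorted(values)
    half = n // 2
    # Initialise the means from the lower and upper halves of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand = sum(xs) / n
    sd = max(math.sqrt(sum((x - grand) ** 2 for x in xs) / n), 1e-6)
    sds = [sd, sd]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every value.
        resp = [posterior_low(x, weights, means, sds) for x in values]
        # M-step: weighted updates of mixing proportions, means and SDs.
        n0 = max(sum(resp), 1e-12)
        n1 = max(n - n0, 1e-12)
        means = [
            sum(r * x for r, x in zip(resp, values)) / n0,
            sum((1.0 - r) * x for r, x in zip(resp, values)) / n1,
        ]
        var0 = sum(r * (x - means[0]) ** 2 for r, x in zip(resp, values)) / n0
        var1 = sum((1.0 - r) * (x - means[1]) ** 2 for r, x in zip(resp, values)) / n1
        sds = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        weights = [n0 / n, n1 / n]
        # Stop once the log-likelihood no longer improves.
        ll = mixture_loglik(values, weights, means, sds)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    if means[0] > means[1]:
        weights, means, sds = weights[::-1], means[::-1], sds[::-1]
    return weights, means, sds


# Simulated per-trait summaries (hypothetical, not the TAIR data).
random.seed(0)
trait_means = {f"silent_{i}": random.gauss(2.0, 0.3) for i in range(300)}
trait_means.update({f"expr_{i}": random.gauss(8.0, 1.0) for i in range(200)})

weights, means, sds = fit_two_normals(trait_means.values())
kept = [name for name, x in trait_means.items()
        if posterior_low(x, weights, means, sds) < 0.5]
```

With real data, the per-trait summary and the cutoff would need to match whatever was actually used in the analysis; neither is specified in this description.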
Genetic map (genetic_map.csv):
The Arabidopsis thaliana genetic map consists of 579 molecular markers. It was originally obtained from http://elp.ucdavis.edu/data/analysis/211_RILs_SFP_map/211_RILs_SFP_map.html. The genotypic data were preprocessed with the MultiPoint software (http://www.multiqtl.com) to eliminate non-informative overlapping markers and markers that cause local neighborhood instability in the map [30]. In total, 493 markers remained for the analysis.
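As a rough illustration of what removing overlapping markers can mean, the sketch below keeps only the first marker of each run of adjacent markers with identical genotype calls across all lines. This is not MultiPoint's algorithm, and it does not address the map-stability criterion mentioned above. The function name, the marker names, and the A/B genotype coding are assumptions made for the example.

```python
def drop_cosegregating_markers(markers):
    """Keep the first marker of every run of adjacent identical markers.

    `markers` is a list of (name, calls) pairs in map order, where `calls`
    is a string with one genotype code per line (hypothetical A/B coding).
    """
    kept = []
    for name, calls in markers:
        # Skip a marker whose calls duplicate the previously kept marker.
        if kept and kept[-1][1] == calls:
            continue
        kept.append((name, calls))
    return kept


# Toy map with five markers scored on five lines (hypothetical data).
example_markers = [
    ("m1", "AABBA"),
    ("m2", "AABBA"),
    ("m3", "ABBBA"),
    ("m4", "ABBBA"),
    ("m5", "ABBAA"),
]
skeleton = drop_cosegregating_markers(example_markers)
```

Here `skeleton` retains one representative per group of co-segregating markers; handling of missing genotype calls is deliberately left out of the sketch.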