7 files

Genotypes, Phenotypes and R code for genomic analysis of E. polybractea progeny trial population

Download all (287.85 MB) This item is shared privately
dataset
modified on 2018-03-16, 18:40
INSTRUCTIONS

To run the genomic prediction, do the following:

1. Download all files in this repository to one directory. Unzip anything archived.
2. Open the R code in RStudio. You will need several prerequisite libraries (dplyr, magrittr, cpgen, BLUPGA, rrBLUP).
3. Run the code interactively (i.e. line by line, chunk by chunk). The results will be generated in a data frame called 'df'.
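
Before running the script, it may help to check which of the prerequisite libraries are already installed. The snippet below is a minimal sketch, not part of GenomicSelection.pub.R. Package names are case-sensitive in R, and the CRAN package is spelled rrBLUP. The sketch only reports what is missing and does not try to install anything, because where each package is best obtained (CRAN, GitHub, or an archive) is not stated here.

```r
# Prerequisite libraries listed in step 2 (case-sensitive spelling).
required <- c("dplyr", "magrittr", "cpgen", "BLUPGA", "rrBLUP")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Install before running the script: ",
          paste(missing_pkgs, collapse = ", "))
} else {
  message("All prerequisite libraries are available.")
}
```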



thunder.filtered.whole.culled.MAF02.VIF50.468.gds

This file contains 2.38 million filtered SNPs with MAF > 0.02 for 468 trees, used for GWAS and genomic prediction. The GDS format is a highly compressed representation of additive genotypes and is easily read in R with the SNPRelate or GWASTools packages.
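
As an illustration of reading the GDS file, here is a minimal sketch using SNPRelate and gdsfmt. It assumes the file follows the standard SNP GDS layout with a "snp.id" node. The helper name read_genotypes and its snp_index argument (positions within the full SNP set) are hypothetical and not taken from the published script. Reading a subset is advisable, since the full matrix of 2.38 million SNPs by 468 trees is large.

```r
# Hypothetical helper: read additive genotypes for a subset of SNPs.
# Assumes SNPRelate and gdsfmt are installed; they are only called at run time.
read_genotypes <- function(gds_path, snp_index = NULL) {
  gds <- SNPRelate::snpgdsOpen(gds_path)
  on.exit(SNPRelate::snpgdsClose(gds))

  # snpgdsGetGeno() selects by SNP ID, so map positions to IDs first.
  snp_ids <- gdsfmt::read.gdsn(gdsfmt::index.gdsn(gds, "snp.id"))
  if (!is.null(snp_index)) {
    snp_ids <- snp_ids[snp_index]
  }

  # Returns a list with sample.id, snp.id and genotype (samples in rows).
  SNPRelate::snpgdsGetGeno(gds, snp.id = snp_ids,
                           snpfirstdim = FALSE, with.id = TRUE)
}

# Example call (not run here; needs the downloaded file):
# geno <- read_genotypes("thunder.filtered.whole.culled.MAF02.VIF50.468.gds",
#                        snp_index = 1:1000)
```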

FT2_GWAS_pheno.pub.tsv

This tab-delimited file contains the matrix of phenotypes for the 468 trees. There are more phenotypes here than were used in the genomic prediction or GWAS.

CV_valsets.zip

This archive contains 10 txt files. Each txt file has 6 columns of 78 values, which identify the individuals used for the validation sets during 6-fold cross-validation. That is, 78 individuals are held out in each CV fold (6 × 78 = 468), and the whole CV process is performed 10 times with different sets of folds.
You need to unzip this file to work with the R script!
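
To show how one such file defines the folds, here is a minimal sketch on simulated data. The matrix valsets stands in for a single txt file (6 columns of 78 individuals); the real files would first need to be read with something like read.table, and whether they have a header row or which separator they use is not stated here. The function name cv_splits is hypothetical.

```r
set.seed(42)
all_ids <- seq_len(468)

# Simulated stand-in for one txt file: each column is one validation set.
valsets <- matrix(sample(all_ids), nrow = 78, ncol = 6)

# For each fold, the column gives the validation individuals and
# everyone else forms the training set.
cv_splits <- function(valsets, all_ids) {
  lapply(seq_len(ncol(valsets)), function(k) {
    validation <- valsets[, k]
    list(validation = validation,
         training = setdiff(all_ids, validation))
  })
}

folds <- cv_splits(valsets, all_ids)

# Sizes of the training and validation sets in each of the 6 folds.
fold_sizes <- sapply(folds, function(f) {
  c(training = length(f$training), validation = length(f$validation))
})
print(fold_sizes)
```

With 468 trees and 78 held out per fold, each fold trains on the remaining 390. Repeating this over the 10 files gives the 10 replicate runs of 6-fold cross-validation.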

SNP ID files

These are three files containing the SNP IDs (actually indices within the full set of 2.38 million SNPs) that identify which SNPs are used for the 10k, 100k and 500k analyses.

GenomicSelection.pub.R

This code will run ABLUP, BayesB, and BLUP|GA (which includes GBLUP as a base case). It is designed to run multithreaded on Windows (one phenotype per thread).
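
The one-phenotype-per-thread pattern can be sketched with the base parallel package, whose socket clusters work on Windows. This is an illustration under stated assumptions and not necessarily how GenomicSelection.pub.R is implemented: the phenotype table, trait names and fit_one_trait function below are hypothetical placeholders, and the placeholder computes only a mean rather than fitting any prediction model.

```r
library(parallel)

set.seed(1)
# Hypothetical phenotype table with three traits.
pheno <- data.frame(trait_a = rnorm(20),
                    trait_b = rnorm(20),
                    trait_c = rnorm(20))

# Placeholder for the per-trait analysis; a real run would fit the models here.
fit_one_trait <- function(trait, data) {
  c(trait_mean = mean(data[[trait]]))
}

# Socket cluster: each worker is a separate R process.
cl <- makeCluster(2)
results <- parLapply(cl, names(pheno), fit_one_trait, data = pheno)
stopCluster(cl)

names(results) <- names(pheno)
print(results)
```

Passing the data as an argument, rather than relying on global variables, matters with socket clusters because workers start with an empty workspace.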

Funding

Australian Research Council Linkage LP110100184