Estimation for High-Dimensional Multivariate Linear Mixed Models in Structured Populations

Complex traits are thought to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed-effects models (LMMs) can account for correlations due to relatedness, but they are not applicable in high-dimensional (HD) settings where the number of predictors greatly exceeds the number of samples. Two-stage approaches, in which the residuals estimated from a null model adjusted for the subjects’ relationship structure are subsequently used as the response in a standard penalized regression model, can produce false negatives. To overcome these challenges, we develop a general penalized LMM framework that selects variables and estimates their effects for structured populations simultaneously, in a single step. Our method can accommodate several sparsity-inducing penalties, such as the lasso and elastic net, and also readily handles prior annotation information in the form of weights. Our algorithm is computationally efficient and scales to HD settings, and we prove mathematically that it converges to a stationary point. Through simulations, we show that when there are several correlated causal variants with small effects, our method has greater power than the two-stage approach. We apply our method to identify SNPs that predict blood pressure in 20 large Mexican American pedigrees from the Genetic Analysis Workshop 18 data. This approach can also be used to generate genetic risk scores that can be useful for risk stratification and clinical decision making. Our algorithms are available in an R package (