Robust Variable and Interaction Selection for Logistic Regression and General Index Models

2017-11-14T16:08:58Z (GMT) by Yang Li Jun S. Liu

Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds in predictors that have significant overall effects, whereas in the backward stage SODA removes unimportant terms to optimize the extended Bayesian information criterion (EBIC). Compared with existing methods for variable selection in quadratic discriminant analysis, SODA can deal with high-dimensional data in which the number of predictors is much larger than the sample size and does not require the joint normality assumption on predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with existing variable selection methods based on the sliced inverse regression (SIR), SODA requires neither linearity nor constant variance condition and is thus more robust. Our theoretical analysis establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate superior performances of SODA in dealing with non-Gaussian design matrices in both logistic and general index models. Supplementary materials for this article are available online.