Post-Selection Inference for Generalized Linear Models With Many Controls

<p>This article considers generalized linear models in the presence of many controls. We lay out a general methodology for estimating an effect of interest, based on the construction of an instrument that immunizes against model selection mistakes, and apply it to the logistic binary choice model. More specifically, we propose new methods for estimating, and constructing confidence regions for, a regression parameter of primary interest α<sub>0</sub>: the coefficient on the regressor of interest, such as a treatment or policy variable. Under sparsity assumptions, these methods allow α<sub>0</sub> to be estimated at the root-<i>n</i> rate even when the total number <i>p</i> of other regressors, called controls, exceeds the sample size <i>n</i>. The sparsity assumption means that there is a subset of <i>s</i> < <i>n</i> controls that suffices to approximate the nuisance part of the regression function accurately. Importantly, the estimators and the resulting confidence regions are valid uniformly over <i>s</i>-sparse models satisfying <i>s</i><sup>2</sup> log<sup>2</sup> <i>p</i> = <i>o</i>(<i>n</i>) and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity; in fact, they are robust to moderate mistakes in variable selection. Under suitable conditions, the estimators are semiparametrically efficient, in the sense that they attain the semiparametric efficiency bounds for the class of models considered in this article.</p>
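<p>The role of the constructed instrument can be illustrated with a minimal sketch of the final estimating step. It assumes the logistic model P(y = 1 | d, x) = Λ(dα<sub>0</sub> + x'β<sub>0</sub>), with Λ the logistic CDF, and a Neyman-orthogonal score of the form (y − Λ(dα + x'β))(d − x'θ). Here the instrument z = d − x'θ is the residual from a regression of d on the controls weighted by Λ(1 − Λ) evaluated at the true index. This is one standard construction, shown as an assumption rather than as the article's exact algorithm. The names <code>orthogonal_alpha</code>, <code>beta_hat</code> and <code>theta_hat</code> are hypothetical. The first-stage estimates (for example, from ℓ<sub>1</sub>-penalized regressions) are supplied by the caller, and the function only solves the final equation for α by Newton's method.</p>

```python
import math
from typing import Sequence


def logistic(t: float) -> float:
    """Numerically stable logistic CDF, Lambda(t) = 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length sequences (0.0 if both are empty)."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_alpha(
    y: Sequence[float],
    d: Sequence[float],
    x: Sequence[Sequence[float]],
    beta_hat: Sequence[float],
    theta_hat: Sequence[float],
    alpha_init: float = 0.0,
    tol: float = 1e-12,
    max_iter: int = 100,
) -> float:
    """Hypothetical helper: solve the orthogonal moment equation
        (1/n) * sum_i (y_i - Lambda(d_i*alpha + x_i'beta_hat)) * z_i = 0
    for alpha, where z_i = d_i - x_i'theta_hat is the constructed instrument.

    beta_hat and theta_hat are first-stage nuisance estimates supplied by the
    caller (assumed to come from, e.g., lasso-type fits); they are not computed here.
    """
    n = len(y)
    offsets = [dot(xi, beta_hat) for xi in x]              # x_i'beta_hat
    z = [di - dot(xi, theta_hat) for di, xi in zip(d, x)]  # instrument z_i
    alpha = alpha_init
    for _ in range(max_iter):
        score = 0.0
        jac = 0.0
        for yi, di, oi, zi in zip(y, d, offsets, z):
            p = logistic(di * alpha + oi)
            score += (yi - p) * zi
            # d/d(alpha) of (y_i - Lambda(.)) * z_i = -Lambda(1 - Lambda) * d_i * z_i
            jac -= p * (1.0 - p) * di * zi
        score /= n
        jac /= n
        if jac == 0.0:
            raise ZeroDivisionError("score is flat in alpha; Newton step undefined")
        step = score / jac
        alpha -= step  # Newton update
        if abs(step) < tol:
            return alpha
    raise RuntimeError("Newton iterations did not converge")
```

<p>With no controls, the equation reduces to the ordinary logistic score for a single regressor: <code>orthogonal_alpha([1, 1, 1, 0], [1, 1, 1, 1], [[], [], [], []], [], [])</code> returns log 3 ≈ 1.0986, because Λ(α) must equal the sample mean 0.75. The point of the orthogonal form is that, at the true nuisance values, the derivative of the expected score with respect to β is zero. Small first-stage errors in <code>beta_hat</code>, such as those caused by moderate selection mistakes, therefore affect the estimate of α only at second order.</p>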