GLM_Inference_MakingOnlineAppendix_A_to_E.pdf (339.45 kB)

Post-Selection Inference for Generalized Linear Models With Many Controls

Version 2 2016-09-15, 07:58
Version 1 2016-03-22, 21:25
journal contribution
posted on 2016-09-15, 07:58 authored by Alexandre Belloni, Victor Chernozhukov, Ying Wei

This article considers generalized linear models in the presence of many controls. We lay out a general methodology for estimating an effect of interest, based on the construction of an instrument that immunizes against model selection mistakes, and apply it to the logistic binary choice model. More specifically, we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest α0, the coefficient on the regressor of interest, such as a treatment variable or a policy variable. Under sparsity assumptions, these methods allow α0 to be estimated at the root-n rate even when the total number p of other regressors, called controls, exceeds the sample size n. The sparsity assumption means that there is a subset of s < n controls that suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and the resulting confidence regions are valid uniformly over s-sparse models satisfying s² log² p = o(n) and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity; in fact, they are robust to moderate mistakes in variable selection. Under suitable conditions, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models considered in this article.
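
For orientation, the setting described in the abstract can be written out as a short sketch. Only α0, p, n, s, and the growth condition come from the abstract. The symbols y_i (binary outcome), d_i (regressor of interest), x_i (the p controls), β0 (nuisance coefficients), r_i (approximation error), and Λ (logistic link) are notation assumed here for illustration and may differ from the article's own.

```latex
% Logistic binary choice model with a regressor of interest d_i and p controls x_i
% (assumed notation; r_i is the error from approximating the nuisance part sparsely)
\mathrm{E}[y_i \mid d_i, x_i] = \Lambda\!\left(d_i \alpha_0 + x_i'\beta_0 + r_i\right),
\qquad \Lambda(t) = \frac{e^{t}}{1 + e^{t}}

% Sparsity: at most s < n controls are needed, while p may exceed n
\|\beta_0\|_0 \le s, \qquad s^{2} \log^{2} p = o(n)

% Root-n rate for the estimator of the parameter of interest
\hat{\alpha} - \alpha_0 = O_P\!\left(n^{-1/2}\right)
```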
