Cross-Fitted Residual Regression for High-Dimensional Heteroscedasticity Pursuit

Abstract There is a vast literature on high-dimensional regression. The common starting point of the existing theoretical work is to assume that the data-generating model is a homoscedastic linear regression model with some sparsity structure. In reality, the homoscedasticity assumption is often violated, and hence understanding the heteroscedasticity of the data is of critical importance. In this article we systematically study the estimation of a high-dimensional heteroscedastic regression model, with emphasis on how to detect and estimate the heteroscedasticity effects reliably and efficiently. To this end, we propose a cross-fitted residual regression approach, prove that the resulting estimator is selection consistent for heteroscedasticity effects, and establish its rates of convergence. In practice, the tuning parameters of our estimator must be determined by the data. We propose a novel high-dimensional BIC for tuning parameter selection and establish its consistency; this is the first high-dimensional BIC result under heteroscedasticity. The theoretical analysis is more involved in order to handle heteroscedasticity, and we develop a couple of new concentration inequalities that are of independent interest.


Introduction
High-dimensional linear regression has been extensively studied during the past two decades. Many fundamental developments have been made, among which sparse regularized estimation plays an essential role, including $\ell_1$ penalization (the lasso) (Tibshirani 1996), linearly constrained $\ell_1$ minimization (the Dantzig selector) (Candes and Tao 2007), and folded-concave penalization (Fan and Li 2001), among others. For a comprehensive review of sparse regression, the reader is referred to Bühlmann and Van De Geer (2011), Fan et al. (2020), etc. The theories supporting sparse regression methods typically begin with the common assumption that the data are generated from a homoscedastic linear model (see Fan et al. 2020)
$$y_i = x_i^T\beta^* + \epsilon_i, \qquad (1)$$
where the $\epsilon_i$'s are independent and identically distributed model noise with mean zero and variance $\sigma^2$. While the homoscedastic noise assumption is convenient in theory, there are real applications where it is easily violated, and we must consider heteroscedasticity effects in the model. For example, financial time series often exhibit nonconstant variance due to varying volatility over time. In genomic studies, outlying measurements can accumulate and cause heteroscedasticity, and several studies (Wang, Wu, and Li 2012; Daye, Chen, and Li 2012) have shown that not only the mean but also the scale of genetic responses can be affected by relevant genetic covariates. Nevertheless, most existing work on high-dimensional analysis neglects the heteroscedasticity issue.
There are only a couple of articles that deal with the heteroscedasticity issue in high-dimensional regression. Sparse quantile regression (Wang, Wu, and Li 2012) generalizes quantile regression (Koenker and Bassett 1978) by adopting folded-concave penalization for sparsity. It assumes that the conditional $100\tau\%$ quantile of $y_i \mid x_i$ is $x_i^T\beta$, and the sparse quantile estimator is defined by
$$\hat\beta = \operatorname*{arg\,min}_{\beta}\ \frac{1}{n}\sum_{i=1}^n \rho_\tau(y_i - x_i^T\beta) + \sum_{j=1}^p p_\lambda(|\beta_j|),$$
where $\rho_\tau(t) = t(\tau - I_{\{t<0\}})$ is the quantile check loss and $p_\lambda(\cdot)$ is some penalty function with tuning parameter $\lambda$. Wang, Wu, and Li (2012) recommended fitting quantile functions at several $\tau$ values; one can then detect heteroscedasticity effects by comparing the fitted quantile functions. The main idea is that under model (1) the theoretical quantile functions should be parallel to each other, so nonparallel fitted quantile functions indicate a violation of model (1) and hence heteroscedasticity effects. The underlying model for the sparse quantile regression approach is
$$y_i = x_i^T\beta^* + (x_i^T\omega^*)\epsilon_i, \qquad (2)$$
under which the $\tau$ quantile function is linear in $x$ for every $\tau \in (0, 1)$. Due to the nondifferentiability of the check loss, sparse quantile regression can be computationally very expensive; only recently was a fast and scalable algorithm based on ADMM proposed and implemented in the R package FHDQR (Gu et al. 2018). Moreover, sparse quantile regression does not directly show which variables are important for the mean effects and which are important for the scale. To handle both issues, Gu and Zou (2016) proposed COSALES, based on linear expectile regression (Newey and Powell 1987; Efron 1991):
$$(\hat\gamma, \hat\phi) = \operatorname*{arg\,min}_{\gamma,\phi}\ S_n(\gamma, \phi) + \sum_{j=1}^p p_{\lambda_1}(|\gamma_j|) + \sum_{j=1}^p p_{\lambda_2}(|\phi_j|),$$
where $S_n(\gamma,\phi) = \frac{1}{n}\sum_{i=1}^n \big[\frac{1}{2}(y_i - x_i^T\gamma)^2 + \Psi_\tau\{y_i - x_i^T(\gamma + \phi)\}\big]$ and $\Psi_\tau(u) = |\tau - I(u < 0)|u^2$ is the asymmetric squared error loss. Theoretical justifications of COSALES were established under model (2). COSALES is computationally very efficient because the asymmetric squared error loss is convex and differentiable.
Model (2) is more or less a mathematical device for studying heteroscedasticity, and it may be difficult to apply in real applications. Note that the conditional scale of $y_i \mid x_i$ cannot be directly interpreted as $x_i^T\omega$, only as its absolute value. A general model for heteroscedasticity effects would be
$$y_i = f(x_i) + g(x_i)\epsilon_i, \qquad (3)$$
where $f$ and $g$ are unknown functionals belonging to some function classes $\mathcal F_1$ and $\mathcal F_2$, and $\epsilon_i$ is model noise with mean zero. In this article, we focus on the linear version of model (3), in which $f$ and the logarithm of $g$ are modeled as linear functions of $x$:
$$y_i = \beta_0^* + x_i^T\beta^* + e^{\gamma_0^* + x_i^T\gamma^*}\epsilon_i. \qquad (4)$$
Indeed, the low-dimensional version of this linear model has been widely considered in the literature (Feigl and Zelen 1965; Cox and Snell 1968; Cook and Weisberg 1983; Carroll and Ruppert 1988). The interpretation of model (4) is straightforward. The parameter $\gamma^*$ in the variance structure can be interpreted directly: with the other covariates fixed, if the $j$th covariate increases by one unit, then the conditional standard deviation of the response given the covariates is multiplied by $e^{\gamma_j^*}$, that is, it increases by $100(e^{\gamma_j^*} - 1)\%$ (e.g., $\gamma_j^* = 0.5$ yields an increase of about $65\%$). Note that the conditional mean of $y \mid x$ is still a linear function of $x$; thus, the usual sparse mean regression methods should be able to estimate $\beta^*$ well and recover its support under a sparsity assumption on the mean effects. Let $p$ be the dimension of the covariates. It is directly implied by Fano's lemma that, even if we knew $\beta^*$, there is no consistent estimator of $\gamma^*$ unless $\log p / n \to 0$. In many real applications, researchers are interested in knowing which variables are important for heteroscedasticity effects, so it is natural to assume $\gamma^*$ is sparse. The problem of estimating $\gamma^*$ is called high-dimensional heteroscedasticity pursuit. Quantile regression and COSALES do not work under model (4) because the conditional quantile and expectile functions are no longer linear in $x$. Assuming the error distribution is normal, a penalized maximum likelihood estimator (MLE) was considered by Daye, Chen, and Li (2012):
$$(\hat\beta, \hat\gamma) = \operatorname*{arg\,min}_{\beta,\gamma}\ \frac{1}{2n}\sum_{i=1}^n \big\{ (y_i - x_i^T\beta)^2 e^{-2x_i^T\gamma} + 2x_i^T\gamma \big\} + \sum_{j=1}^p p_{\lambda_1}(|\beta_j|) + \sum_{j=1}^p p_{\lambda_2}(|\gamma_j|). \qquad (5)$$
Note that Equation (5) is nonconvex. Daye, Chen, and Li (2012) observed that it is biconvex: given $\gamma$, solving for $\beta$ is a convex problem, and given $\beta$, solving for $\gamma$ is a convex problem. They proposed an algorithm that alternates between solving for $\beta$ and for $\gamma$, using coordinate descent within each iteration, and implemented their method in an R package whose main function is written in Fortran. We tried their code in simulations and found its computational efficiency too low to be practically useful: it took 9 hours to compute one fit with $n = 700$, $p = 5000$. Moreover, no theoretical justification was given for the estimator; the nonconvexity of the log-likelihood makes its analysis technically nontrivial.
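To make the biconvexity concrete, here is a minimal Python sketch of an objective of the form (5); the function name hhr_objective and the plain lasso penalties standing in for the generic $p_\lambda$ terms are our illustrative choices, not the implementation of Daye, Chen, and Li (2012):

```python
import numpy as np

def hhr_objective(beta, gamma, X, y, lam_beta=0.0, lam_gamma=0.0):
    """Penalized Gaussian negative log-likelihood of the form (5), under the
    parameterization sd(y|x) = exp(x^T gamma). For fixed gamma the objective
    is a convex quadratic in beta; for fixed beta it is convex in gamma
    (exp of a linear map plus a linear term) -- hence biconvex overall."""
    eta = X @ gamma                       # log conditional scale
    r2 = (y - X @ beta) ** 2              # squared residuals
    nll = np.mean(r2 * np.exp(-2.0 * eta) + 2.0 * eta) / 2.0
    # lasso penalties stand in for the generic folded-concave p_lambda terms
    return nll + lam_beta * np.abs(beta).sum() + lam_gamma * np.abs(gamma).sum()
```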
In light of the above discussion, we now formally state our objective. Heteroscedasticity pursuit: can we design an explicit and efficient estimation procedure for estimating $\gamma^*$ in model (4), under a sparsity assumption on $\gamma^*$ and requiring only $\log p/n \to 0$, without knowing the error distribution?
By "explicit" we mean that the whole procedure is clearly defined without any ambiguity, such as the choice of starting value in an iterative algorithm. In this paper, we propose a novel cross-fitted penalized residual regression method to estimate $\gamma^*$ in model (4). The method is shown to consistently estimate the heteroscedasticity effects in ultra-high dimensions. From a computational perspective, our algorithm is very fast, since it reduces to computing several weighted lasso-type regressions; our method can thus take advantage of the computational efficiency of $\ell_1$-penalized least squares (Efron et al. 2004). We also develop the first information criterion for consistent model selection under heteroscedasticity and high dimensions. The theoretical analysis of the new procedure is highly nontrivial compared with the homoscedastic regression case. For that, we develop a couple of new concentration inequalities that are of independent interest.
The rest of this article is organized as follows. In Section 2, we introduce the basic setup. In Section 3, we introduce our methodology and propose the cross-fitted penalized residual regression procedure. In Section 4, we establish the theoretical properties of our estimators. In Section 5, we introduce a high-dimensional BIC for selecting tuning parameters and establish its model selection consistency. In Section 6, we present simulation studies and a real data example to demonstrate the performance of our methodology. Some discussion is given in Section 7. The technical proofs of the theoretical results are given in the appendices.

Basic Setup and Notation
We introduce some notation first. For an index set $A \subset \{1, \ldots, p\}$, any vector $c = (c_1, \ldots, c_p)^T$, and any $n \times p$ matrix $U$, let $c_A = (c_i, i \in A)$ and let $U_A$ be the submatrix consisting of the columns of $U$ whose indices are in $A$. The complement of an index set $A$ is denoted $A^c = \{1, \ldots, p\} \setminus A$. For any finite set $B$, let $|B|$ be the number of its elements. For a vector $c \in \mathbb R^p$ and $q \in [1, \infty)$, let $\|c\|_q = (\sum_{j=1}^p |c_j|^q)^{1/q}$ be its $\ell_q$ norm, let $\|c\|_\infty$ (or $\|c\|_{\max}$) $= \max_j |c_j|$ be its $\ell_\infty$ norm, and let $\|c\|_{\min} = \min_j |c_j|$ be its minimum absolute value. For a symmetric matrix $M$, let $\lambda_{\min}(M)$ and $\lambda_{\max}(M)$ be its smallest and largest eigenvalues, respectively; this standard eigenvalue notation should not be confused with the penalization parameter $\lambda$ used in a penalty function. For symmetric matrices $A$ and $B$, we write $A \preceq B$ if $B - A$ is positive semidefinite. For any matrix $G$, let $\|G\| = \sqrt{\lambda_{\max}(G^TG)}$ be its spectral norm; in particular, for a vector $c$, $\|c\| = \|c\|_2$. For $a, b \in \mathbb R$, let $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For two nonnegative sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \gtrsim b_n$ if there exists $C > 0$ such that $b_n \le C a_n$ for all $n \ge 1$, and we write $a_n \asymp b_n$ if $a_n \gtrsim b_n$ and $b_n \gtrsim a_n$. Also, we use $b_n = o(a_n)$ to mean $b_n/a_n \to 0$.

Let $X = (X_1, \ldots, X_p) \in \mathbb R^{n\times p}$ be the design matrix, with $X_j = (x_{1j}, \ldots, x_{nj})^T$ containing the observations of the $j$th variable, $j = 1, \ldots, p$. The $i$th row of $X$ is written as $x_i = (x_{i1}, \ldots, x_{ip})^T$. Let $y = (y_1, \ldots, y_n)^T$ be the $n$-dimensional response vector. We consider the following regression model:
$$y_i = \beta_0^* + x_i^T\beta^* + e^{\gamma_0^* + x_i^T\gamma^*}\epsilon_i, \qquad (6)$$
where $\beta_0^* \in \mathbb R$, $\beta^* \in \mathbb R^p$ are unknown parameters that control the conditional mean, and $\gamma_0^* \in \mathbb R$, $\gamma^* \in \mathbb R^p$ are unknown parameters that control the conditional scale; $\epsilon_1, \ldots, \epsilon_n$ are iid random variables, independent of the covariates, with mean zero.

Remark 1. We assume that $E[|\log|\epsilon_i||] < \infty$, which is satisfied by most continuous probability distributions. Without loss of generality, we can assume $E[\log|\epsilon_i|] = 0$; otherwise its mean can be absorbed into the intercept $\gamma_0^*$.

Notice that one can always set the first column of the design matrix to be all ones, so, without loss of generality and for notational convenience, we rewrite model (6) as
$$y_i = x_i^T\beta^* + e^{x_i^T\gamma^*}\epsilon_i, \qquad (7)$$
where the intercepts are absorbed into $\beta^*$ and $\gamma^*$. Let $A_1 = \{j : \beta_j^* \neq 0\}$ and $A_2 = \{j : \gamma_j^* \neq 0\}$ be the true support sets of $\beta^*$ and $\gamma^*$, respectively. Assume that $|A_1| = s_1$ and $|A_2| = s_2$ are of smaller order than $n$, while $p$ is allowed to increase exponentially with $n$.

Penalized methods are used for sparse recovery. We consider the general folded-concave penalty (Fan, Xue, and Zou 2014) in this article; namely, $p_\lambda(\cdot)$ is a function defined on $(-\infty, \infty)$ satisfying: (i) $p_\lambda(t)$ is increasing and concave in $t \in [0, \infty)$ with $p_\lambda(0) = 0$; (ii) $p_\lambda(t)$ is differentiable in $t \in (0, \infty)$ with $p'_\lambda(0) := p'_\lambda(0+) \ge a_1\lambda$; (iii) $p'_\lambda(t) \ge a_1\lambda$ for $t \in (0, a_2\lambda]$, where $a_1$ and $a_2$ are two fixed positive constants. Special cases of the general folded-concave penalty are SCAD (Fan and Li 2001) and MCP (Zhang 2010). The SCAD penalty has the derivative
$$p'_\lambda(t) = \lambda\Big\{ I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a-1)\lambda} I(t > \lambda) \Big\} \quad \text{for some } a > 2,$$
which corresponds to $a_1 = a_2 = 1$. The MCP penalty function is defined through its derivative
$$p'_\lambda(t) = \Big(\lambda - \frac{t}{a}\Big)_+ \quad \text{for some } a > 1,$$
which corresponds to $a_1 = 1 - \frac{1}{a}$, $a_2 = 1$.
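For concreteness, the two penalty derivatives can be coded directly; this is a minimal sketch in Python with NumPy, with the conventional default $a = 3.7$ for SCAD following Fan and Li (2001) and an illustrative default $a = 3$ for MCP:

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """SCAD derivative p'_lambda(t) for t >= 0 (Fan and Li 2001):
    lambda on (0, lambda], linear decay on (lambda, a*lambda], then 0."""
    t = np.abs(t)
    return lam * ((t <= lam) + (t > lam) * np.maximum(a * lam - t, 0.0)
                  / ((a - 1.0) * lam))

def mcp_deriv(t, lam, a=3.0):
    """MCP derivative p'_lambda(t) = (lambda - t/a)_+ for t >= 0 (Zhang 2010)."""
    return np.maximum(lam - np.abs(t) / a, 0.0)
```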

Heuristics for Penalized Residual Regression
Our goal is to produce an estimator $\hat\gamma$ that can efficiently perform estimation and support recovery for $\gamma^*$. The mean coefficient $\beta^*$ is a nuisance parameter in this problem, although its estimation plays an important role in the estimation of $\gamma^*$. To get some intuition, let us assume that $\beta^*$ is fully known. Then recall from Equation (7) that
$$\log|y_i - x_i^T\beta^*| = x_i^T\gamma^* + \log|\epsilon_i|,$$
so we can view this as a new linear model with $\{\log|\epsilon_i|\}_{i=1}^n$ being iid errors. We would consider estimating $\gamma^*$ by minimizing
$$\frac{1}{2n}\sum_{i=1}^n \big(\log|y_i - x_i^T\beta^*| - x_i^T\gamma\big)^2 + \sum_{j=1}^p p_\lambda(|\gamma_j|),$$
where $p_\lambda(\cdot)$ is a folded-concave penalty imposed on $\gamma$. To mimic this idea when $\beta^*$ is unknown, we may consider minimizing
$$\frac{1}{2n}\sum_{i=1}^n \big(\log|y_i - x_i^T\hat\beta| - x_i^T\gamma\big)^2 + \sum_{j=1}^p p_\lambda(|\gamma_j|),$$
where $\hat\beta$ is some good estimator of $\beta^*$. Because $y_i - x_i^T\hat\beta$ is the fitted residual in regression, the above method for estimating $\gamma^*$ is called penalized residual regression.
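In code, the key step is simply a transform of the fitted residuals; a minimal sketch follows (the small eps guard against exactly zero residuals is our numerical safeguard, not part of the method):

```python
import numpy as np

def log_residual_response(y, X, beta_hat, eps=1e-10):
    """Turn fitted residuals into the response of the scale regression:
    z_i = log|y_i - x_i^T beta_hat|, which under model (7) is approximately
    x_i^T gamma* + log|eps_i| when beta_hat is close to beta*."""
    r = y - X @ beta_hat
    return np.log(np.abs(r) + eps)
```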
We must show that there is an appropriate estimator of $\beta^*$ such that the above procedure leads to a good estimator of $\gamma^*$. In the high-dimensional setting, we also need to keep the computational efficiency of the whole procedure in mind. Thus, a natural way of estimating $\beta^*$ is penalized ordinary least squares, that is, minimizing
$$\frac{1}{2n}\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \sum_{j=1}^p p_\lambda(|\beta_j|)$$
over $\beta \in \mathbb R^p$. We show that, despite the heteroscedasticity effects, the penalized OLS estimator of $\beta^*$ still enjoys nice rates of convergence.

Cross-Fitted Penalized Residual Regression
The penalized residual regression is clearly computationally efficient. For the theoretical justification of the method, however, we need to carefully handle the dependence between $\hat\beta$ and the data, which causes technical challenges in the analysis of penalized residual regression. We therefore propose using cross-fitted penalized residual regression to estimate the heteroscedasticity effect $\gamma^*$. The whole procedure is summarized in Figure 1. The cross-fitting technique allows us to handle a broad class of error distributions. Cross-fitting has been used for variance estimation in homoscedastic high-dimensional linear regression by Fan, Guo, and Hao (2012), who showed that in sparse linear regression the usual mean squared error estimator underestimates the true variance, and proposed the so-called refitted cross-validation technique, which consistently estimates the variance. The idea was extended to estimating the constant variance in high-dimensional sparse additive models by Chen, Fan, and Li (2018). To the best of our knowledge, this work is the first to extend cross-fitting to heteroscedastic models.
Remark 2. Empirical experiments suggest that penalized residual regression without cross-fitting may also work well under certain error distributions. However, penalized residual regression without cross-fitting has no rigorous theoretical justification under general error distributions so far. Therefore, we do not present it in this paper.
Next, we explain in detail the steps (O, R) in Figure 1.

Penalized OLS (O) in Figure 1
We take the penalized OLS on $Z^{(1)}$ as an example (i.e., $O_1$) and illustrate how we obtain the initial estimator $\hat\beta(Z^{(1)})$; $\hat\beta(Z^{(2)})$ is computed through exactly the same procedure on the other half of the data (with its own tuning parameter), as indicated in Figure 1. We use a folded-concave penalty rather than the lasso penalty for estimating $\beta^*$ in the penalized OLS, because the latter induces unnecessary bias that propagates into the penalized residual regression. We use
$$\hat\beta(Z^{(1)}) = \operatorname*{arg\,min}_{\beta}\ \frac{1}{2n_1}\sum_{i \in Z^{(1)}} (y_i - x_i^T\beta)^2 + \sum_{j=1}^p p_\lambda(|\beta_j|), \qquad (12)$$
where $n_1 = |Z^{(1)}|$, $p_\lambda(\cdot)$ is a folded-concave penalty, and $\lambda > 0$ is the tuning parameter. Following Fan, Xue, and Zou (2014), we use the local linear approximation (LLA) algorithm (Zou and Li 2008) to solve (12), with the zero vector as the initial value; $\hat\beta(Z^{(1)})$ is defined as the solution the LLA algorithm yields after convergence. Similarly, we define $\hat\beta(Z^{(2)})$ and let the corresponding tuning parameter be $\tilde\lambda$.
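A compact sketch of the LLA idea: each step solves a weighted lasso whose weights are the penalty derivative evaluated at the current iterate, implemented here with a hand-rolled coordinate descent (scad_deriv is the helper sketched in Section 2). This illustrates the algorithm, under the assumption that no column of X is identically zero; it is not our production code:

```python
import numpy as np

def weighted_lasso_cd(X, y, w, n_iter=200, tol=1e-7):
    """Coordinate descent for min_b ||y - X b||^2 / (2n) + sum_j w_j |b_j|.
    Assumes no column of X is identically zero."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # residual y - X b for current b
    col_sq = (X ** 2).sum(axis=0) / n     # ||X_j||^2 / n
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = np.sign(rho) * max(abs(rho) - w[j], 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)  # keep residual in sync
            b[j] = b_new
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b

def lla(X, y, lam, deriv=scad_deriv, n_steps=3):
    """LLA (Zou and Li 2008) started at zero: iteratively reweighted lasso
    with weights p'_lambda(|b_j|) from the previous iterate."""
    b = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w = deriv(b, lam)                 # scad_deriv takes |b| internally
        b = weighted_lasso_cd(X, y, w)
    return b
```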

Penalized Residual Regression (R) in Figure 1
We take the penalized residual regression on $Z^{(2)}$ as an example (i.e., $R_2$) and explain how we obtain $\hat\gamma(Z^{(1)} \to Z^{(2)})$. Following the heuristics provided above, we minimize
$$\frac{1}{2n_2}\sum_{i \in Z^{(2)}} \big\{\log|y_i - x_i^T\hat\beta(Z^{(1)})| - x_i^T\gamma\big\}^2 + \sum_{j=1}^p p_{\lambda_1}(|\gamma_j|) \qquad (13)$$
over $\gamma \in \mathbb R^p$, where $n_2 = |Z^{(2)}|$. Here, $\lambda_1$ is a tuning parameter that can differ from the previous ones. Again, we solve (13) with the LLA algorithm initialized at the zero vector; $\hat\gamma(Z^{(1)} \to Z^{(2)})$ is defined as the solution after the LLA algorithm converges. Similarly, we can define $\hat\gamma(Z^{(2)} \to Z^{(1)})$ and let $\tilde\lambda_1$ be the corresponding tuning parameter. The final estimator of $\gamma^*$, as shown in Figure 1, is $\hat\gamma^{\text{ave}} = \{\hat\gamma(Z^{(1)} \to Z^{(2)}) + \hat\gamma(Z^{(2)} \to Z^{(1)})\}/2$. The nonzero elements of $\hat\gamma^{\text{ave}}$ form the estimated subset of variables that impact the scale function.
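Putting the pieces together, here is an end-to-end sketch of the procedure in Figure 1, reusing the lla and log_residual_response helpers sketched earlier; the even first-half/second-half split mirrors the one we use in the simulations, and tuning-parameter selection (Section 5) is omitted for brevity:

```python
import numpy as np

def cross_fitted_gamma(X, y, lam_beta, lam_gamma):
    """Cross-fitted penalized residual regression (Figure 1):
    O_k fits the mean on one half, R_k regresses log-residuals on the other;
    the two gamma estimates are averaged into gamma_ave."""
    n = X.shape[0]
    half1, half2 = np.arange(n // 2), np.arange(n // 2, n)
    gammas = []
    for fit_idx, res_idx in [(half1, half2), (half2, half1)]:
        beta_hat = lla(X[fit_idx], y[fit_idx], lam_beta)          # O step
        z = log_residual_response(y[res_idx], X[res_idx], beta_hat)
        gammas.append(lla(X[res_idx], z, lam_gamma))              # R step
    return 0.5 * (gammas[0] + gammas[1])                          # gamma_ave
```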

Theory
In this section we provide the theoretical justifications for the estimators.
We make the following assumption on the distribution of the $\epsilon_i$'s: they are iid sub-Gaussian random variables with $E[\log|\epsilon_i|] = 0$, and the density of $\epsilon_i$ is Lipschitz continuous.

Remark 3. The sub-Gaussian assumption is common in the literature on high-dimensional statistics. The condition $E[\log|\epsilon_i|] = 0$ has been justified in Remark 1. The assumption that the error has a Lipschitz continuous density is satisfied by most continuous probability distributions.

Consistency of BIC Tuning
Theorem 1 shows that the proposed estimator is good in principle. In practice, we also need to specify the tuning parameters used in the procedure. For homoscedastic regression models, the commonly used tuning methods are cross-validation (CV) and information criteria. However, the theory of CV and information criteria is not well understood under heteroscedastic regression models. In this section, we propose a new BIC for selecting the tuning parameters in our estimator and prove its selection consistency. Note that high-dimensional BIC has been studied in several papers, such as Wang, Kim, and Li (2013) and Fan and Tang (2013), where the underlying model is the standard homoscedastic regression model. To the best of our knowledge, our BIC theory for heteroscedastic regression is the first in the literature.
Under the conditions of Theorem 1, assume that there exists a positive constant $c_0$ such that, for $i = 1, 2$,
$$\min_{|A| \le K_n,\ A \not\supseteq A_1}\ \frac{1}{n}\big\|(I - P^{(i)}_A)\,X^{(i)}\beta^*\big\|_2^2 \ \ge\ c_0\,\frac{s_1 \log p}{n}, \qquad (15)$$
where $P^{(i)}_A$ is the projection matrix onto the column space of $X^{(i)}_A$.

Remark 4. Condition (15) is the asymptotic model identifiability condition used in Wang, Kim, and Li (2013). Proposition 1 gives the selection consistency of BIC for mean regression under heteroscedasticity. As expected, it is similar to previous results on BIC for mean regression under homoscedasticity: we only need to bound the additional influence of the heteroscedasticity, and the rest of the proof of Proposition 1 is similar to Wang, Kim, and Li (2013). We thus omit the proof for the sake of space.
Proposition 1 is an intermediate step in our BIC theory, because our goal is to develop a BIC for model selection in the heteroscedasticity pursuit problem. We need to construct a BIC for our estimator of $\gamma^*$ and then prove its selection consistency. To gain some intuition, suppose the true $\beta^*$ is known; then recall that
$$\log|y_i - x_i^T\beta^*| = x_i^T\gamma^* + \log|\epsilon_i|,$$
which can be treated as a homoscedastic regression, so the BIC for the homoscedastic case can be applied. However, $\beta^*$ is unknown, so we replace it with its estimators. Following our cross-fitted procedure in Figure 1, we define the HBICs in $R_1$ and $R_2$ as
$$\mathrm{HBIC}^{(3)}(\lambda_1) = \log\bigg[\frac{1}{n_1}\sum_{i \in Z^{(1)}}\big\{\log|y_i - x_i^T\hat\beta(Z^{(2)})| - x_i^T\hat\gamma_{\lambda_1}\big\}^2\bigg] + |\hat A_{\lambda_1}|\,\frac{C^{(3)}_{n,p}\log p}{n_1},$$
with $\mathrm{HBIC}^{(4)}$ defined analogously on $Z^{(2)}$ using $\hat\beta(Z^{(1)})$, where $\hat\gamma_{\lambda_1}$ is the estimator from the corresponding R step with tuning parameter $\lambda_1$ and $\hat A_{\lambda_1}$ is its support. Here $\hat\beta(Z^{(2)})$ in $\mathrm{HBIC}^{(3)}$ is the $\mathrm{HBIC}^{(2)}$-tuned estimator from step $O_2$ in Figure 1; likewise, $\hat\beta(Z^{(1)})$ in $\mathrm{HBIC}^{(4)}$ is the $\mathrm{HBIC}^{(1)}$-tuned estimator from step $O_1$ in Figure 1. The tuning parameters in $R_1$ and $R_2$ are then chosen by minimizing $\mathrm{HBIC}^{(3)}$ and $\mathrm{HBIC}^{(4)}$, respectively. The choice of $C^{(i)}_{n,p}$, $i = 3, 4$, needed to achieve model selection consistency is discussed in Theorem 2.
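As an illustration of how such a criterion is computed, here is a minimal sketch; the penalty scaling follows the Wang-Kim-Li form as in the display above, so treat the constants as indicative rather than definitive:

```python
import numpy as np

def hbic(z, X, gamma_hat, C_np):
    """HBIC for one residual-regression fold: log(mean squared residual of
    the log-residual regression) + |support(gamma_hat)| * C_np * log(p) / n."""
    n, p = X.shape
    mse = np.mean((z - X @ gamma_hat) ** 2)
    df = np.count_nonzero(gamma_hat)
    return np.log(mse) + df * C_np * np.log(p) / n

# usage sketch: pick the lambda on a grid that minimizes the criterion, with
# z = log_residual_response(...) computed from the other fold's beta estimate
# best_lam = min(lam_grid, key=lambda l: hbic(z, X2, lla(X2, z, l), C_np))
```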
Theorem 2 requires, in addition to the conditions of Theorem 1, that there exists a positive constant $c_1$ such that, for $i = 1, 2$,
$$\min_{|A| \le K_n,\ A \not\supseteq A_2}\ \frac{1}{n}\big\|(I - P^{(i)}_A)\,X^{(i)}\gamma^*\big\|_2^2 \ \ge\ c_1\,\frac{s_2 \log p}{n}, \qquad (16)$$
where $P^{(i)}_A$ is the projection matrix onto the column space of $X^{(i)}_A$, $i = 1, 2$. Also assume that $\phi := \min_{i\in\{1,2\}} \min_{|A| \le K_n} \lambda_{\min}\big\{\frac{1}{n}(X^{(i)}_A)^T X^{(i)}_A\big\}$ is bounded away from zero.

Remark 5. Similar to Equation (15), Equation (16) is the asymptotic model identifiability condition for the scale effects.

Numeric Results
We demonstrate the sparsity recovery and estimation accuracy of our procedure through several simulation examples and a real data example. For the nonconvex penalty function, we use the SCAD penalty.
One of the simulation designs (with $\rho = 0.5$) is
$$y = 6x_{25} + e^{0.5x_1 + 0.5x_{12} + 0.5x_{25} + 0.5x_{46}}\epsilon,$$
where $\rho$ parameterizes the correlation among the covariates and $\epsilon$ is the model noise. We used the first $n/2$ rows of the data in the $O_1$ and $R_1$ steps in Figure 1 and the remaining $n/2$ rows in the $O_2$ and $R_2$ steps. We used 0 as the initial value for all the LLA algorithms involved in our procedure. The tuning parameters were selected using the high-dimensional BIC method of Section 5, with $C^{(i)}_{n,p} = \log\log n$, $i = 1, 2, 3, 4$. Let $\hat\beta^{\text{ave}}$ be the average of the two estimators of $\beta^*$ from $O_1$ and $O_2$. Let $\hat A_1 = \{j : \hat\beta^{\text{ave}}_j \neq 0\}$ and $\hat A_2 = \{j : \hat\gamma^{\text{ave}}_j \neq 0\}$ be our selected submodels. After model selection, we refit $\gamma^*$ by the same cross-fitted residual regression on the selected submodel without any penalty. This extra step does not change the model selection results but may further reduce the estimation bias due to penalization, in the spirit of the relaxed lasso (Meinshausen 2007; Hastie, Tibshirani, and Tibshirani 2017).
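For reproducibility, a sketch of a data generator for this design follows; the AR(1) covariate correlation with parameter rho and the standard normal error are assumptions made for illustration, since the full design description does not survive above:

```python
import numpy as np

def gen_example(n, p, rho=0.5, seed=None):
    """Generate (X, y) with y = 6*x25 + exp(0.5*(x1 + x12 + x25 + x46)) * eps.
    Covariates are Gaussian with AR(1) correlation rho (assumed here);
    eps ~ N(0, 1) (assumed here)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p))
    X = np.empty((n, p))
    X[:, 0] = Z[:, 0]
    for j in range(1, p):                 # AR(1): corr(x_j, x_k) = rho^|j-k|
        X[:, j] = rho * X[:, j - 1] + np.sqrt(1.0 - rho ** 2) * Z[:, j]
    beta = np.zeros(p);  beta[24] = 6.0   # x25 in 1-based indexing
    gamma = np.zeros(p); gamma[[0, 11, 24, 45]] = 0.5   # x1, x12, x25, x46
    eps = rng.standard_normal(n)
    y = X @ beta + np.exp(X @ gamma) * eps
    return X, y, beta, gamma
```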
We report the following selection metrics (a small code sketch computing them follows this list):
• CP: coverage probability, the proportion of replications in which $A_2 \subset \hat A_2$, where $A_2 = \{j : \gamma^*_j \neq 0\}$ is the active set of $\gamma^*$ and $\hat A_2 = \{j : \hat\gamma_j \neq 0\}$ is the active set of $\hat\gamma$.
• TP: true positives, the average size of the set $A_2 \cap \hat A_2$.
• FP: false positives, the average size of the set $A_2^c \cap \hat A_2$.
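A one-replication version of these metrics is sketched below; CP is then the average of the covered indicator across replications:

```python
import numpy as np

def selection_metrics(gamma_hat, gamma_true):
    """Selection metrics for one replication: covered = (A2 subset of A2_hat),
    TP = |A2 ∩ A2_hat|, FP = |A2^c ∩ A2_hat|."""
    A2 = set(np.flatnonzero(gamma_true))
    A2_hat = set(np.flatnonzero(gamma_hat))
    return {"covered": A2 <= A2_hat,
            "TP": len(A2 & A2_hat),
            "FP": len(A2_hat - A2)}
```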
In general, estimating the scale function is different from estimating the mean function: in our model, the error in estimating $\beta^*$ is transferred into the estimation of $\gamma^*$. To highlight the impact of estimating $\beta^*$ on the estimation of $\gamma^*$, we consider an idealized estimator of $\gamma^*$, denoted by $\hat\gamma^{\text{oracle}}$, obtained by assuming the true support set of $\beta^*$ is known in our estimation procedure. In other words, we replace the estimators of $\beta^*$ in our original procedure by the corresponding oracle estimators of $\beta^*$. The estimator $\hat\gamma^{\text{oracle}}$ uses the information of the support set of $\beta^*$, which is not available in practice, and thus avoids the high dimensionality in estimating $\beta^*$; its performance provides a benchmark against which to compare an actual estimator.
The results for the idealized estimator ($\hat\gamma^{\text{oracle}}$) and our actual estimator ($\hat\gamma$) are summarized in Tables 1 and 2. (Notes for the tables: the sparsity recovery performance is measured by the last three columns; the ideal selection result would be CP = 1, TP = 1, FP = 0 for Examples 1-2 and CP = 1, TP = 2, FP = 0 for Examples 3-5, shown in the "Truth" rows; the standard errors listed in parentheses are based on 400 replications.) We first discuss the selection results. We can see that $\hat\gamma$ covers the true active set of $\gamma^*$ very well. Even under a highly correlated design matrix ($\rho = 0.85$), the selection performance of our method is only slightly affected. Thus, in terms of variable selection, our proposed method is almost perfect, which confirms our theoretical results on the selection consistency of the proposed method. As for estimation accuracy, Tables 1 and 2 show that the estimation error of $\hat\gamma$ is very close to that of $\hat\gamma^{\text{oracle}}$ in all examples. This comparison demonstrates that our actual estimator $\hat\gamma$ performs almost the same as the idealized estimator $\hat\gamma^{\text{oracle}}$, which also confirms our theory; moreover, it suggests that there is little room for improving the estimation performance of our estimator. Finally, our method works very well in the cases where the error does not follow a normal distribution, consistent with the fact that our theory does not require the error distribution to be normal.
Remark 6. The HHR algorithm implemented by Daye, Chen, and Li (2012) was initially considered as a competing method for estimating $\gamma^*$. However, their algorithm takes over 9 hours for a single replication, so we do not present its results here.

A Real Data Example
In this section, we apply our procedure to a microarray dataset used in Scheetz et al. (2006). The dataset consists of summary gene expression values of 18,986 probe sets on 120 twelve-week-old male rats, analyzed on a logarithmic scale. The goal is to find the genes whose expression shares a strong association with the expression of gene TRIM32 (corresponding to probe 1389163_at). This gene was found to cause Bardet-Biedl syndrome (Chiang et al. 2006), a human genetic disorder that affects many body systems, including the retina. We apply the fused Kolmogorov filter (Mai and Zou 2015) to obtain a reduced set of 300 probes, which are standardized to have mean zero and standard deviation one. The key motivation for using variable screening is to remove noise variables and boost the signal-to-noise ratio in the data, which in practice can benefit the performance of statistical methods in the subsequent analysis. Many methods for variable screening have been proposed in the literature; the reader is referred to Fan et al. (2020, chap. 8) for a comprehensive review of this topic. Here, we choose the fused Kolmogorov filter as part of our procedure because it is a nonparametric, model-free method with strong theoretical guarantees and excellent empirical performance (Mai and Zou 2015). We first compare our method with the standard penalized least squares developed for homoscedastic regression. We randomly split the data to generate a training set with 100 samples, treating the rest as testing data on which we compute the prediction error; the whole process was repeated 100 times. Table 3 summarizes the prediction errors as well as the variable selection outcomes, with standard errors listed in parentheses. For our method, we list the number of variables selected for the mean ($|\hat A_1|$) and the scale ($|\hat A_2|$), as well as the number of variables that affect both the mean and the scale ($|\hat A_1 \cap \hat A_2|$); for SCAD penalized least squares, $|\hat A_2|$ is not applicable. These values, along with their standard errors, are reported in the last three columns of Table 3. We see that our method provides more accurate prediction, suggesting that the loss of prediction power of SCAD penalized least squares comes from ignoring possible heteroscedasticity of the data. We further examine the model fitted by our method on the whole dataset. In Table 4, we report the selected probes and their corresponding estimated coefficients. It is worth noting that probe X1396130_at is found to be related to both the mean and the scale.

Discussion
In this article we have proposed a log-linear model for the heteroscedasticity in a high-dimensional regression model and have developed cross-fitted penalized residual regression for estimating the heteroscedasticity effects, with strong theoretical guarantees and excellent empirical performance. There are other ways to model heteroscedasticity. For example, it is known that mixed-effects models can also be used to model heteroscedasticity. A high-dimensional linear mixed-effects model can be written as
$$y_i = x_i^T\beta + z_i^T\gamma_i + \epsilon_i, \quad i = 1, \ldots, n,$$
where $\{\epsilon_i\}_{i=1}^n$ are iid random errors with mean zero and variance $\sigma^2$, $\beta$ is the vector of unknown fixed effects, $\{\gamma_i\}_{i=1}^n$ are iid unknown random effects with mean zero and covariance matrix $G$, and $\{\gamma_i\}_{i=1}^n$ are independent of $\{\epsilon_i\}_{i=1}^n$. Here $G$ is a $q \times q$ covariance matrix, which is typically unknown. In high dimensions, $p$ and $q$ are allowed to grow exponentially with $n$, and sparsity in both the fixed effects and the random effects is typically assumed. Under these settings, the conditional mean and variance of the response given the covariates are $E[y_i \mid x_i, z_i] = x_i^T\beta$ and $\mathrm{var}(y_i \mid x_i, z_i) = z_i^T G z_i + \sigma^2$. Therefore, the high-dimensional linear mixed-effects model also allows for high-dimensional heteroscedasticity: the scale function is $(z_i^T G z_i + \sigma^2)^{1/2}$, or equivalently the log-scale function is $\frac{1}{2}\log(z_i^T G z_i + \sigma^2)$. In our model the log-scale function is modeled as $x_i^T\gamma^*$. As mentioned before, it is natural to consider a linear model for the log-scale function, which has been considered in the literature (Feigl and Zelen 1965; Cox and Snell 1968; Cook and Weisberg 1983; Carroll and Ruppert 1988; Daye, Chen, and Li 2012). For a positive univariate quantity, it is often more natural to consider its multiplicative effects, which are well handled by a logarithm transformation (Cleveland 1993).
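The variance display follows from a one-line calculation, using the independence of the random effects and the errors:
$$\mathrm{var}(y_i \mid x_i, z_i) = \mathrm{var}(z_i^T\gamma_i \mid z_i) + \mathrm{var}(\epsilon_i) = z_i^T\,\mathrm{var}(\gamma_i)\,z_i + \sigma^2 = z_i^T G z_i + \sigma^2.$$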
Mathematically, it is hard to tell which model for the log-scale function is best. We note that the existing popular works on high-dimensional mixed-effects models all focus on the mean function and on variable selection in the mixed effects; it has been shown that, without knowing the true $G$ matrix, such goals can be achieved (Fan and Li 2012). On the other hand, estimating the heteroscedasticity in the linear mixed-effects model requires estimating the true $G$ matrix, which appears to be much understudied in the high-dimensional literature. This would be an important topic for future study if one uses the linear mixed-effects model for heteroscedasticity.