Post-selection Inference of High-dimensional Logistic Regression Under Case–Control Design

Abstract. Confidence sets are of key importance in high-dimensional statistical inference. Under the case–control design, a popular response-selective sampling scheme in medical studies and econometrics, we consider confidence intervals and statistical tests for single or low-dimensional parameters in the high-dimensional logistic regression model. The asymptotic properties of the resulting estimators are established under mild conditions. We also study statistical tests for more general and complex hypotheses on the high-dimensional parameters. The general testing procedures are proved to be asymptotically exact and to have satisfactory power. Numerical studies, including extensive simulations and a real data example, confirm that the proposed method performs well in practical settings.


Introduction
High-dimensional data, in which the number of regressors can be substantially larger than the sample size, have been actively studied in various scientific applications such as signal processing, econometrics, recommender systems, and genomics. State-of-the-art statistical methodologies have been developed for high-dimensional regression; see Greenshtein and Ritov (2004), Meinshausen and Bühlmann (2006), Zhao and Yu (2006), Candès and Tao (2007), Fan and Lv (2008), Huang, Ma, and Zhang (2008), Zhang and Huang (2008), Bickel, Ritov, and Tsybakov (2009), Huang, Horowitz, and Ma (2010), Fan and Lv (2010), Fan and Lv (2011), Raskutti, Wainwright, and Yu (2011), van de Geer (2011), and Negahban et al. (2012), among many others. Variable selection methods and their associated theories mostly focus on point estimation. Statistical inference based on the selected model can be inaccurate if the variable selection procedure fails to deliver satisfactory model recovery.
Notably, interval estimation and hypothesis testing play a fundamental role in statistical inference for high-dimensional regression, yet they were largely untouched until the pioneering works of Zhang and Zhang (2014), van de Geer et al. (2014), and Javanmard and Montanari (2014). In particular, Zhang and Zhang (2014) introduced the idea of de-biasing for constructing valid confidence intervals for a single coordinate with the scaled lasso as the initial estimator. van de Geer et al. (2014) and Javanmard and Montanari (2014) considered constructing confidence intervals and tests for single or low-dimensional parameters by adjusting the bias caused by the lasso regularization. Zhang and Cheng (2017) studied a Gaussian multiplier bootstrap de-biasing method for simultaneous testing in high-dimensional linear models. Important findings on another inference approach, based on moment functions that satisfy Neyman's orthogonalization condition, can be found in Belloni, Chernozhukov, and Kato (2015, 2019), Belloni, Chernozhukov, and Wei (2016), and Belloni, Chernozhukov, and Kaul (2017) for high-dimensional homoscedastic median regression, generalized linear models, linear models with errors-in-variables, and heteroscedastic quantile regression. Recently, Zhu and Bradic (2017) developed a novel method for testing general high-dimensional hypotheses, such as the sparsity level and the minimum signal strength of the regression parameters. Javanmard and Lee (2020) proposed a flexible framework for testing general hypotheses in high-dimensional linear regression. All the aforementioned methods have been shown to be effective for constructing valid confidence intervals and hypothesis tests with prospective samples, that is, random samples from the underlying population.
In prospective studies, a random sample of individuals is followed, and their respective outcome variables are observed and recorded. In contrast, in many medical or epidemiological studies, the case-control design is a primary tool for studying the relationship between risk factors and rare disease incidence, by sampling separately from the control population and the case population (Chen and Lo 1999; Chen 2001). Under the case-control sampling scheme, one may oversample the minority class (cases) and undersample the majority class (controls) to obtain more informative samples. In econometrics, one is interested in the relationship between the predictors and the choices made by individuals. It may happen that one or several choices are rarely made but are of particular interest. A random sample, unless of very large size, will contain very few subjects making those choices and thus results in poor estimates of the parameters of interest (Wang 2019). It is more effective to divide the population into subsets/strata, each consisting of subjects who made a particular choice. Choice-based sampling takes independent samples from each stratum (Manski and McFadden 1981). When the number of predictors is small relative to the sample size, statistical analysis of case-control sampling has been studied in Anderson (1972), Prentice and Pyke (1979), Chen and Lo (1999), Chen (2001), Fithian and Hastie (2014), Liu, Jiang, and Zhou (2014), Chen et al. (2017), Qin (2017), and Borgan et al. (2018). Recently, variable screening for high-dimensional categorical data under the case-control design was studied in Xie et al. (2020).
In this article, under the case-control design, we complement the literature by proposing an inference procedure for single or low-dimensional components of the regression parameters in high-dimensional logistic regression; we further develop a procedure for testing general and complex hypotheses on the high-dimensional regression parameters. Our method rests on a celebrated result for logistic regression under case-control studies: the prospective estimating equation derived from maximum likelihood estimation remains valid for consistent estimation of the slope parameters, though not of the intercept term. We develop a one-step procedure with the lasso estimator as the initial estimator. The proposed method is simple, easy to implement, and computationally efficient. Under mild regularity conditions, we prove that the proposed de-biased estimator is asymptotically normal, which enables us to construct confidence intervals and conduct hypothesis tests for high-dimensional logistic regression under case-control studies. Moreover, under a slightly stronger sparsity assumption, the general testing procedure for the high-dimensional parameter is shown to be asymptotically exact and to possess satisfactory power.
The rest of this article is organized as follows. In Section 2, we present the model, the proposed de-biasing procedure and the general testing methodology. In Section 3, we study the theoretical properties. We evaluate the performance of the proposed procedure through extensive simulation studies in Section 4 and a real data example in Section 5. Technical details are deferred to the supplementary materials.

Model and Methodologies
Logistic regression (Cox 1958) is probably the most widely-used statistical tool for modeling a categorical outcome variable, and it has been popular in biomedical science for decades for studying the effect of exposures on disease. Without loss of generality, we focus on a binary response Y coded as 0/1, that is, Y = 0 represents the controls (nondisease) and Y = 1 represents the cases (disease). Consider a high-dimensional logistic regression model
Pr(Y = 1 | X) = exp(α + X β)/{1 + exp(α + X β)}, (2.1)
where X = (X 1 , . . . , X p n ) is a p n -vector of explanatory variables, α is an intercept term and β = (β 1 , . . . , β p n ) is a vector of slope parameters. Let θ 0 = (α 0 , β 0 ) be the true value of θ = (α, β ). Let f 1 (x) be the density function of the case population, that is, the conditional density function of X given Y = 1, and f 0 (x) be the density function of the control population, that is, the conditional density function of X given Y = 0. A case-control study is conducted by taking a random sample of n 1 cases, denoted by X 1 1 , . . . , X 1 n 1 , from the case population, and a random sample of n 0 controls, denoted by X 0 1 , . . . , X 0 n 0 , from the control population. Notice that n 1 and n 0 are prespecified and nonrandom in case-control studies. Thus, the observations consist of
{X 1 1 , . . . , X 1 n 1 } ∪ {X 0 1 , . . . , X 0 n 0 } (2.2)
with sample size n = n 0 + n 1 . Throughout the article, the dimensionality p n can be larger than the sample size n. The main goal of this article is to construct valid inference for the individual coordinates β 0j , j = 1, . . . , p n , under case-control studies.
Remark 1. Our framework differs from the classical setting with iid samples, in which (X i , Y i ) are iid pairs of covariates and outcomes from model (2.1). Under the case-control sampling scheme in (2.2), no iid samples from model (2.1) are available. Moreover, the common simple random sampling (prospective sampling) ensures that the joint distribution of the samples (X i , Y i ), i = 1, . . . , n, is the same as that of (X, Y) in model (2.1). Under case-control sampling, however, the distribution of the covariates of cases X 1 (controls X 0 ) is the same as the conditional distribution of X given Y = 1 (Y = 0) in the population, and the distributions of X 1 and X 0 are generally different from that of X in the population, unless n 1 /n (n 0 /n) coincides with the proportion of cases (controls) in the population.

A De-biased Lasso Estimator
Let E l (·) be the conditional expectation of X given Y = l, l = 0, 1. Suppose that n l /n → ρ l as n → ∞, l = 0, 1. Write Z = (1, X ). An important finding on case-control logistic regression is that the prospective estimating equation derived from maximum likelihood estimation is still valid for consistent estimation of the slope parameter β, though not of the intercept term α (Prentice and Pyke 1979). We present this finding in the following proposition.
Proposition 1. Define b 0 = (α̃ 0 , β 0 ) with α̃ 0 = α 0 + log{ρ 1 W 0 /(ρ 0 W 1 )}, where W 0 = Pr(Y = 0) and W 1 = Pr(Y = 1) are the proportions of controls and cases in the population, respectively. Then, b 0 is the solution to the population version of the prospective estimating equation. Note that α̃ 0 ≠ α 0 in general, as W 1 and W 0 are generally different from the proportions n 1 /n and n 0 /n in the case-control design. Proposition 1 indicates that the case-control sampling design leads to biased estimation only of the intercept α 0 , but not of the slope parameter vector in logistic regression. With this view, we shall be able to conduct post-selection inference for the high-dimensional slope parameters β. The theoretical results in Section 3 thus pertain to b 0 rather than θ 0 .
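To make Proposition 1 concrete, the following sketch (the simulation setup and all names are our illustrative choices, not from the article) fits an ordinary prospective logistic regression to a simulated case-control sample with one covariate: the slope estimate recovers β 0 , while the intercept estimate recovers α 0 only after removing the known offset log{ρ 1 W 0 /(ρ 0 W 1 )}.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
alpha0, beta0 = -2.0, 1.0

# Draw a large prospective population, then sample cases and controls separately.
N = 200_000
X = rng.normal(size=N)
Y = rng.random(N) < 1.0 / (1.0 + np.exp(-(alpha0 + beta0 * X)))
W1 = Y.mean()                              # population proportion of cases
n1 = n0 = 5_000                            # so rho1 = rho0 = 1/2
x_cc = np.concatenate([rng.choice(X[Y], n1, replace=False),
                       rng.choice(X[~Y], n0, replace=False)])
y_cc = np.concatenate([np.ones(n1), np.zeros(n0)])

# Ordinary prospective logistic log-likelihood, fitted to the case-control sample.
def negloglik(theta):
    eta = theta[0] + theta[1] * x_cc
    return np.sum(np.logaddexp(0.0, eta) - y_cc * eta)

a_hat, b_hat = minimize(negloglik, np.zeros(2), method="BFGS").x
offset = np.log((0.5 * (1.0 - W1)) / (0.5 * W1))   # log{rho1*W0/(rho0*W1)}
print(round(b_hat, 2))           # close to beta0 = 1: the slope is consistent
print(round(a_hat - offset, 2))  # close to alpha0 = -2 after removing the offset
```

The slope comes out consistent even though half the sample are cases while the population prevalence W 1 is much smaller; only the intercept absorbs the sampling bias.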
Hence, for the high-dimensional logistic regression under case-control studies, we first minimize the following penalized loss function:
θ̂ init = argmin θ (1/n) Σ n i=1 [log{1 + exp(Z i θ)} − Y i Z i θ] + λ||β|| 1 , (2.3)
where λ > 0 is a tuning parameter. Nevertheless, it is known that the lasso estimator is not root-n consistent and thus does not have a tractable limiting distribution (Zhao and Yu 2006; Bühlmann and van de Geer 2011). To circumvent the problem, a novel idea to improve the initial estimator is to find a root that is close to the solution to φ̂ n (θ) = 0, where φ̂ n (θ) is the score function of the loss, and that asymptotically behaves like the oracle estimator. We then define the population version of the (p n + 1) × (p n + 1) Hessian matrix of φ n (θ) as
Σ = E[π(Z b 0 ){1 − π(Z b 0 )}ZZ ], where π(t) = exp(t)/{1 + exp(t)}. (2.4)
Thus, a de-biased estimator, based on β̂ init j by subtracting a term related to the subgradient of the loss function evaluated at the estimate θ̂ init , is given for β j , j = 1, . . . , p n , by
β̃ j = β̂ init j − (Σ −1 j+1 ) φ̂ n (θ̂ init ),
where Σ −1 j+1 is the (j + 1)th column of Σ −1 .
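The one-step correction can be sketched in a low-dimensional setting where the plug-in Hessian is directly invertible; in the genuinely high-dimensional case the exact inverse below must be replaced by the nodewise lasso estimator of the next subsection. The simulated data and the use of sklearn's l1-penalized logistic regression as the initial estimator are our illustrative choices, not the article's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 400, 20
X = rng.normal(size=(n, p))
beta0 = np.zeros(p); beta0[:3] = 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-(-1.0 + X @ beta0)))).astype(float)

# Initial lasso (l1-penalized) logistic estimate of theta = (alpha, beta).
fit = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)
theta_init = np.concatenate([fit.intercept_, fit.coef_.ravel()])

Z = np.hstack([np.ones((n, 1)), X])            # design with intercept column
mu = 1 / (1 + np.exp(-Z @ theta_init))
score = Z.T @ (mu - y) / n                     # gradient of the logistic loss
Sigma_hat = (Z * (mu * (1 - mu))[:, None]).T @ Z / n   # plug-in Hessian
Theta_hat = np.linalg.inv(Sigma_hat)           # exact inverse: valid for p << n only

beta_debiased = (theta_init - Theta_hat @ score)[1:]   # drop the intercept
```

The one-step update subtracts an (estimated inverse Hessian) × (score) term, which removes the first-order shrinkage bias of the lasso on the active coordinates while leaving the inactive coordinates near zero.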

Nodewise Lasso Regression
In practice, Σ −1 is usually unknown and has to be properly estimated. According to the definition of Σ, a plug-in estimator of Σ is
Σ̂ = (1/n) Σ n i=1 π(Z i θ̂ init ){1 − π(Z i θ̂ init )}Z i Z i .
For notational convenience, we further write Z̃ = Ŵ Z, where Ŵ is the diagonal weight matrix with entries [π(Z i θ̂ init ){1 − π(Z i θ̂ init )}] 1/2 , so that Σ̂ = Z̃ Z̃/n.
Σ̂ is often rank deficient and thus not invertible in the high-dimensional case. To find a proper estimator of the inverse of Σ, we use nodewise regression (Meinshausen and Bühlmann 2006; van de Geer et al. 2014) by projecting each column of the weighted design Ŵ Z on all its remaining columns. To be specific, a nodewise lasso estimator of Σ −1 j+1 , j = 1, . . . , p n , is built from
γ̂ j = argmin γ { ||Z̃ j+1 − Z̃ −(j+1) γ|| 2 2 /(2n) + λ j ||γ|| 1 },
where || · || 2 is the Euclidean norm, Z̃ j+1 is the (j + 1)th column of Z̃, Z̃ −(j+1) is a submatrix of the design matrix Z̃ with the (j + 1)th column deleted, and λ j is a tuning parameter which can be different from λ. Consequently, with Θ̂ j+1 denoting the resulting estimator of Σ −1 j+1 , we define our proposed de-biased estimator of β j , j = 1, . . . , p n , as
β̂ j = β̂ init j − Θ̂ j+1 φ̂ n (θ̂ init ). (2.5)
The theoretical properties of the proposed de-biased estimator are presented in Section 3.
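A minimal sketch of a single nodewise regression step, assuming the weighted design matrix is already available; sklearn's Lasso is a stand-in for whichever lasso solver one prefers, and the function name is ours. By the KKT conditions of the lasso, the returned vector θ j satisfies (Z̃ Z̃/n)θ j ≈ e j , with the j-th coordinate matched exactly.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_row(Zt, j, lam):
    """Estimate row j of the precision matrix of Zt (n x d) by regressing
    column j on the remaining columns with the lasso."""
    n, d = Zt.shape
    idx = [k for k in range(d) if k != j]
    gamma = Lasso(alpha=lam, fit_intercept=False).fit(Zt[:, idx], Zt[:, j]).coef_
    resid = Zt[:, j] - Zt[:, idx] @ gamma
    tau2 = resid @ Zt[:, j] / n          # tau_j^2, the nodewise scaling factor
    theta = np.zeros(d)
    theta[j] = 1.0
    theta[idx] = -gamma
    return theta / tau2

# Check on Gaussian data with an AR(1) covariance: Sigma_hat @ theta_j ~ e_j.
rng = np.random.default_rng(2)
d = 10
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
Zt = rng.multivariate_normal(np.zeros(d), Sigma, size=2000)
theta2 = nodewise_row(Zt, 2, lam=0.05)
e2 = (Zt.T @ Zt / 2000) @ theta2         # approximately the unit vector e_2
```

The off-diagonal entries of e2 are bounded by λ j /τ̂ j 2 , which is how the tuning parameter controls the bias of the resulting de-biased estimator.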

A General Test with Multiplier Bootstrap
In view of the state of the art in high-dimensional statistics, our next goal is to develop a valid procedure for testing the following general composite null hypothesis on the high-dimensional slope parameter β 0 under case-control design:
H 0 : β 0 ∈ B 0 versus H 1 : β 0 ∉ B 0 ,
for a given closed set B 0 ⊂ R p n . Remarkably, no assumptions, for example on convexity or dimensionality, are made on the set B 0 . In other words, the size of B 0 can grow with the dimensionality p n . Such a null hypothesis is general enough to cover two important cases: B 0 = {β : ||β|| 0 ≤ k 1 } and B 0 = {β : min j∈S(β) |β j | ≥ k 2 }, where S(β) = {j : β j ≠ 0} is the support set, and k 1 , k 2 are prespecified nonnegative constants. The former is for testing the sparsity level, while the latter is for testing the minimum signal strength.
Motivated by Zhu and Bradic (2017), we use the Hausdorff distance based on the l 1 -norm, that is, d(B 0 , β 0 ) = min β∈B 0 ||β − β 0 || 1 , to measure the deviation from the null. Nevertheless, b 0 is typically unknown. One may replace b 0 by the initial estimate θ̂ init given in Section 2.1, which leads to the so-called projection pursuit estimator; see Huber (1985) and Friedman (1987). That is,
θ̃ = argmin θ∈B 0 ||θ − θ̂ init || 1 . (2.7)
By definition, ||θ̃ − θ̂ init || 1 can be regarded as an estimator of the Hausdorff distance d(B 0 , b 0 ). Next, we are in a position to introduce a test statistic based on the estimated Hausdorff distance, so that a larger value indicates a larger discrepancy between the null hypothesis and the alternative. Nonetheless, such a test would be theoretically challenging and practically infeasible when the dimensionality p n ≫ n, as the regularization in the estimation cannot induce a tractable limiting distribution. Following Zhu and Bradic (2017), we consider a test statistic of the form
T n = √n ||β̂ init − β̃ − δ̂|| ∞ ,
where δ̂ ≡ (δ̂ 1 , . . . , δ̂ p n ) ∈ R p n ×1 is a data-dependent vector to control the bias of the projection residuals. Heuristically, β̂ init − β̃ − δ̂ approximately has zero mean under the null hypothesis H 0 . To avoid mathematical challenges arising from the reuse of the samples, the idea of sample splitting is adopted for estimating δ̂: the full case-control sample is randomly split into two halves G 1 and G 2 , and δ̂ is constructed in a cross-fitted manner using Θ̂ (1) , Θ̂ (2) ∈ R (p n +1)×p n , the nodewise lasso estimators of Σ −1 based on the datasets G 1 and G 2 with the first column deleted, respectively.
Due to highly sophisticated dependencies in high dimensions and the flexibility of the null set B 0 , it is hard to obtain an asymptotic pivotal distribution of T n . To circumvent the difficulty, we introduce an efficient multiplier bootstrap procedure to obtain a data-driven critical value for T n ; see Zhang and Cheng (2017) and Chernozhukov, Chetverikov, and Kato (2013, 2017). To this end, we define score vectors S 1 , . . . , S n ∈ R p n ×1 from the projection residuals. Then, the bootstrapped test statistic is defined as
T n * = max 1≤j≤p n |n −1/2 Σ n i=1 e i S ij |,
where e 1 , . . . , e n are iid standard normal multipliers generated independently of the data. It can be shown that the asymptotic distribution of T n can be well approximated by the resampling distribution of T n * while fixing the observations. Let μ be the nominal level. We repeatedly generate iid standard normal random multipliers to obtain the empirical (1 − μ) quantile of T n * , which serves as the critical value of the test.
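A generic sketch of the multiplier bootstrap critical value for a sup-norm statistic; the matrix S below is a stand-in for the paper's projection-residual scores, and the function name and defaults are ours.

```python
import numpy as np

def bootstrap_critical_value(S, mu=0.05, B=2000, seed=0):
    """(1 - mu) quantile of T*_b = max_j |n^{-1/2} sum_i e_i S_ij|,
    with e_i iid N(0, 1) multipliers and the data S (n x p) held fixed."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    E = rng.standard_normal((B, n))                      # one row per bootstrap draw
    return np.quantile(np.max(np.abs(E @ S), axis=1) / np.sqrt(n), 1 - mu)

# Sanity check: for a single standard normal column, the statistic is
# approximately |N(0, 1)|, whose 95% quantile is about 1.96.
rng = np.random.default_rng(3)
S = rng.standard_normal((500, 1))
crit = bootstrap_critical_value(S)
```

Because the multipliers are generated conditionally on the data, the bootstrap reproduces the dependence among the p n coordinates without requiring a pivotal limit for the maximum.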

Asymptotic Properties
Some notation is needed. Define M = {j | β 0j ≠ 0, j = 1, . . . , p n } and s 0 = Σ p n j=1 I(j ∈ M). Let λ min (A) and λ max (A) be the smallest and largest eigenvalues of a matrix A, respectively. For a p n -vector a = (a 1 , . . . , a p n ), ||a|| 0 = Σ p n j=1 I(a j ≠ 0), ||a|| r = (Σ p n j=1 |a j | r ) 1/r for r ≥ 1, and ||a|| ∞ = max j=1,...,p n |a j |. For a random variable X, we define its sub-Gaussian norm as ||X|| ψ 2 = sup q≥1 q −1/2 (E|X| q ) 1/q ; for a random vector X, its sub-Gaussian norm is defined as ||X|| ψ 2 = sup ||v|| 2 =1 ||v X|| ψ 2 . We say X is sub-Gaussian if its sub-Gaussian norm ||X|| ψ 2 is bounded. For two sequences a n and b n , a n = o(b n ) if lim n→∞ a n /b n = 0, a n = O(b n ) if |a n | ≤ Cb n for some constant C ≥ 0 independent of n, and a n ≍ b n if a n = O(b n ) and b n = O(a n ).
Recall that the design matrix Z̃ = (Z̃ 0 , Z̃ 1 ) with Z̃ 0 = (Z 0 1 , . . . , Z 0 n 0 ) and Z̃ 1 = (Z 1 1 , . . . , Z 1 n 1 ), and ρ l = lim n→∞ n l /n, l = 0, 1. Let X̃ = (X 0 1 , . . . , X 0 n 0 , X 1 1 , . . . , X 1 n 1 ) denote the design matrix without the intercept. The following conditions are needed to establish the asymptotic properties.
Condition (C1). There exist two positive constants c 1 and c 2 such that 0 < c 1 ≤ min(ρ 0 , ρ 1 ) ≤ max(ρ 0 , ρ 1 ) ≤ c 2 < 1, and n 0 /n converges to ρ 0 sufficiently fast.
Condition (C2). The rows of Z̃ are either sub-Gaussian or bounded, that is, max 1≤i≤n ||Z i || ψ 2 = O(1) or max 1≤i≤n ||Z i || ∞ = O(1). For both designs, assume that |α 0 | ≤ m and ||X̃ β 0 || ∞ ≤ m for some constant m. In addition, 0 < ν l,min ≤ λ min {cov(X | Y = l)} ≤ λ max {cov(X | Y = l)} ≤ ν l,max < ∞, l = 0, 1.
Condition (C1) is a regular condition under case-control design, which requires that the limiting user-specified proportions of Y = 1 and Y = 0 are not close to 0 or 1. To establish the nonasymptotic error bounds and asymptotic normality in the high-dimensional case, we also require that the convergence rate of n 0 /n is not too slow. This is easily satisfied, as n 0 , n, and ρ 0 are user-specified quantities under case-control sampling. Condition (C2) assumes that the predictors follow a bounded or sub-Gaussian distribution, and that the smallest and largest eigenvalues of the corresponding covariance matrix are bounded away from 0 and ∞, respectively. Moreover, we require the boundedness of |α 0 | and ||X̃ β 0 || ∞ . Such an assumption is standard for high-dimensional models (van de Geer et al. 2014; Battey et al. 2018), and it is shown later that the boundedness condition still holds in the rare-event case. Condition (C3) is a common condition on the precision matrix Σ −1 . Condition (C4) gives the sparsity conditions for β 0 and the precision matrix. Comparisons between Condition (C4) and the sparsity conditions in the existing literature are given in Remark 2. Condition (C5) is needed to ensure good behavior of the matrix Z̃ Σ −1 . A similar condition can be found in van de Geer et al. (2014).
It is worth noting that, under the sub-Gaussian design, Condition (C5) could be derived directly from Conditions (C2)-(C3).
Condition (C2) requires the boundedness of |α 0 | and ||X̃ β 0 || ∞ . Recall that α̃ 0 = α 0 + log{ρ 1 W 0 /(ρ 0 W 1 )}. In many existing studies, the population proportion of cases (controls), W 1 (W 0 ), is regarded as a free parameter independent of the sample size. For the rare-event case where W 1 → 0, following Wang (2020), a basic assumption for modeling rare events in logistic regression is that β 0 is fixed while α 0 → −∞ at a certain rate such that W 1 → 0. Under mild conditions, the following proposition shows that α̃ 0 is still bounded in the rare-event case.
Proposition 2 (Rare event). Consider the rare-event case where W 1 → 0 and α 0 → −∞. Assume that Condition (C1) holds, and that X, the p n -vector of explanatory variables defined in (2.1), satisfies |X β 0 | < C almost surely for some constant C. Then, α̃ 0 remains bounded. The proof of Proposition 2 is given in the supplementary materials. Proposition 2 indicates that Condition (C2) still holds in the rare-event case considered in Wang (2020). In other words, in the rare-event case, the assumption that |α 0 | ≤ m in Condition (C2) can be replaced by the conditions of Proposition 2.
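A quick numerical check of Proposition 2, under the illustrative assumption that X β 0 is Uniform(−1, 1) (so |X β 0 | < C holds) and ρ 0 = ρ 1 = 1/2: as α 0 decreases toward −∞, α̃ 0 settles near a finite limit.

```python
import numpy as np

def alpha_tilde(alpha0, rho1=0.5):
    """alpha0 + log{rho1 * W0 / (rho0 * W1)} with X'beta0 ~ Uniform(-1, 1),
    so that |X'beta0| < C as required by Proposition 2."""
    u = np.linspace(-1.0, 1.0, 200_001)
    W1 = np.mean(1.0 / (1.0 + np.exp(-(alpha0 + u))))   # Pr(Y = 1), by quadrature
    return alpha0 + np.log(rho1 * (1.0 - W1) / ((1.0 - rho1) * W1))

# As alpha0 -> -infinity (W1 -> 0), alpha_tilde stabilizes near a finite limit.
vals = [alpha_tilde(a) for a in (-6.0, -10.0, -14.0)]
```

Intuitively, log W 1 ≈ α 0 + log E[exp(X β 0 )] as α 0 → −∞, so the divergence of α 0 is exactly cancelled by the offset.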
Theorem 1 gives the upper bounds and convergence rates of the initial estimator θ̂ init in both the l 1 -norm and the l 2 -norm. Notably, b 0 coincides with the true parameter θ 0 except for the intercept α 0 . Since α 0 cannot be consistently estimated under case-control studies (Prentice and Pyke 1979), the resulting estimator for the intercept is a consistent estimator of α̃ 0 = α 0 + log{ρ 1 W 0 /(ρ 0 W 1 )}. Theorem 1 demonstrates this fact in the high-dimensional case.
Theorem 2. Under Conditions (C1)-(C5), suppose that λ = O[{log(p n + 1)/n} 1/2 ]; then for each j = 1, . . . , p n , √n(β̂ j − β 0j )/σ j converges in distribution to N(0, 1), where σ j 2 = Θ̂ j+1 Σ̂ Θ̂ j+1 . Note that Theorem 1 is still true in the rare-event case as long as the conditions in Proposition 2 hold.
Remark 2. We give two sets of sparsity assumptions, for the bounded design and the sub-Gaussian design, respectively. When ||Z|| ∞ is bounded, the required growth conditions on s 0 , s 1 , p n , and n are the same as those in Corollary 3.1 of van de Geer et al. (2014) and Theorem 3.8 of Battey et al. (2018). For the sub-Gaussian case, a stronger condition is required. This is not surprising: as pointed out by Battey et al. (2018), the sub-Gaussian design requires an extra factor, a polynomial of log p n , compared with the order under the bounded design.
In fact, it can be shown without further difficulty that the proposed de-biased estimator in (2.5) remains √n-consistent and asymptotically normal, as long as the initial estimate θ̂ init satisfies ||θ̂ init − b 0 || 2 ≤ C√{s 0 log(p n + 1)/n} and ||θ̂ init || 0 ≤ Cs 0 with high probability for some constant C > 0.

Theorem 3. For any fixed subset G ⊆ {1, . . . , p n }, √n(β̂ G − β 0G ) converges in distribution to N(0, Σ G ) as n → ∞.
Theorem 3 indicates that one can conduct simultaneous inference for β 0G for any fixed subset G ⊂ {1, . . . , p n }, where Σ G j,j denotes the (j, j)th entry of Σ G . For instance, one may consider testing H 0 : Q 1 β 0 = 0 for Q 1 = (I |G| , 0) ∈ R |G|×p n , which is of full rank. Here I |G| is an identity matrix of order |G|, the cardinality of G, which is fixed and does not vary with n and p n . Clearly, testing H 0 is equivalent to testing β 0j = 0 for all 1 ≤ j ≤ |G|. In particular, according to Theorem 3, the distribution of ||√n Σ G −1/2 β̂ G || 2 2 is asymptotically χ 2 (|G|) under H 0 . Let ξ μ be the (1 − μ)-quantile of χ 2 (|G|). Then, H 0 is rejected if max j∈G n|β̂ j | 2 /Σ G j,j > ξ μ .
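A sketch of the resulting group test in its quadratic-form version (the function name and inputs are illustrative; in practice β̂ G and Σ G come from the de-biased estimator and the nodewise lasso):

```python
import numpy as np
from scipy.stats import chi2

def group_chisq_test(beta_hat_G, Sigma_G, n, mu=0.05):
    """Simultaneous test of H0: beta_{0G} = 0. The quadratic form
    n * beta' Sigma_G^{-1} beta is compared with the chi2(|G|) quantile."""
    stat = n * beta_hat_G @ np.linalg.solve(Sigma_G, beta_hat_G)
    return stat, stat > chi2.ppf(1 - mu, df=len(beta_hat_G))

# Example: |G| = 2 and estimates of size ~ n^{-1/2}, so H0 is not rejected.
stat, reject = group_chisq_test(np.array([0.1, 0.0]), np.eye(2), n=100)
```

Here the quadratic form equals n(0.1² + 0²) = 1, well below the 95% quantile of χ 2 (2) ≈ 5.99, so the test retains H 0 .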
To study the theoretical properties for the general test, a relatively stronger sparsity condition is needed.
Condition (C6). The sparsity levels satisfy s 4 0 {log(p n )} q /n → 0 as n → ∞ for some constant q ≥ 1.

Remark 3. Condition (C6) is similar to the conditions in Zhu and Bradic (2017). Since we are considering a testing problem with a general B 0 , the price to pay is a stronger sparsity condition. Specifically, Condition (C6) requires s 4 0 {log(p n )} q /n → 0 as n → ∞, while Condition (C4) imposes s 2 0 {log(p n )} q /n → 0 as n → ∞ for some constant q ≥ 1. The reason is that, unlike the lasso estimator θ̂ init , the estimator θ̃ defined in (2.7) lies in the possibly irregular set B 0 , and it is hard to derive a sharp bound for ||θ̃ − b 0 || 2 directly; we can only bound ||θ̃ − b 0 || 2 via ||θ̃ − b 0 || 1 . Thus, a higher power of s 0 in Condition (C6) is inevitable.
Theorem 4 shows that the asymptotic distribution of T n can be well approximated by the resampling distribution of T n * given the observations. In particular, the proposed general test is asymptotically exact at level μ; in other words, the probability of committing a Type I error tends to μ as n → ∞.
Next, we investigate the power of the proposed test. For a given suitable positive sequence c n , we define the alternative hypothesis as H 1,n : min β∈B 0 ||β − β 0 || ∞ ≥ c n . Theorem 5 presents a power guarantee for the proposed test, indicating that the test is able to correctly reject H 0 when the deviation is of order greater than O(n −1/4 ).
Remark 4. Statistical inference on generalized linear models has been studied by several important works, for example, van de Geer et al. (2014), Belloni, Chernozhukov, and Wei (2016), Zhu and Bradic (2017), and Zhang and Cheng (2017). The main difference between our work and the aforementioned works is that, their results are established with random samples or prospective samples, that is, the observations are independent and identically distributed pairs of (X, Y).

Simulation Studies
Extensive simulations are conducted to examine the finite-sample performance of the proposed method. We generate independent data from model (2.1) with β 0 = (β 01 , . . . , β 0j , . . . , β 0p n ), where β 0j ≠ 0 for 1 ≤ j ≤ s 0 and β 0j = 0 otherwise. The predictor vector X = (X 1 , . . . , X p n ) follows N(0, Σ). For each case, the intercept α 0 is chosen so that the population percentage of cases is Pr(Y = 1) = 0.05, that is, a rare incidence rate. We collect case-control samples by separately taking a random sample of n 1 cases from the case population and a random sample of n 0 controls from the control population according to (2.2). Different combinations of (n 0 , n 1 , p n , s 0 ) are tried and the results are shown in the figures and tables. In the simulations, the tuning parameters λ and λ j , j = 1, . . . , p n , are chosen by 10-fold cross-validation. The results are based on 200 replications for each configuration.
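Calibrating α 0 to hit a target prevalence such as Pr(Y = 1) = 0.05 amounts to solving a one-dimensional equation; a sketch (the helper name is ours), assuming a Monte Carlo sample of the linear predictor X β 0 is available:

```python
import numpy as np
from scipy.optimize import brentq

def calibrate_intercept(xb, target=0.05):
    """Solve E[expit(alpha0 + X'beta0)] = target for alpha0, given a Monte
    Carlo sample xb of the linear predictor X'beta0."""
    f = lambda a: np.mean(1.0 / (1.0 + np.exp(-(a + xb)))) - target
    return brentq(f, -30.0, 30.0)       # the mean prevalence is monotone in alpha0

# With a degenerate (all-zero) predictor the answer is the exact logit of the target.
a0 = calibrate_intercept(np.zeros(10), target=0.05)
```

Since the prevalence is strictly increasing in α 0 , the root is unique and Brent's method converges quickly.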
Let CI j denote a two-sided 95% confidence interval for β 0j , j = 1, . . . , p n . For each scenario, we report the coverage probabilities and interval lengths averaged over all coordinates, over the coordinates in the support S 0 , and over the coordinates in S c 0 , respectively, where P(·) and Avglength(·) are the empirical coverage probability and the average length of the confidence intervals based on 200 replications. In addition, for the testing procedure, we present the empirical size and power of the test statistic ||√n Σ G −1/2 β̂ G || 2 2 under the null hypothesis H 0 : β 0j = 0 for all j ∈ G at the significance level μ = 0.05. The results are based on 200 replications. The configurations of G can be found in Table 2. For comparison, we also collect prospective samples with sample size n = n 0 + n 1 and compute the de-biased estimators based on the prospective samples.
The results are summarized in Tables 1 and 2. It can be seen that the proposed method gives more accurate coverage probabilities and shorter interval lengths than the counterpart based on prospective samples across different settings. The performance of the proposed method improves as the sample size increases. By contrast, the performance of the de-biased estimator based on prospective samples deteriorates markedly in the imbalanced cases, with low coverage probabilities on S 0 and larger AL S 0 , AL S c 0 , and AL. Similar to our observation for the confidence intervals, the proposed test based on the statistic ||√n Σ G −1/2 β̂ G || 2 2 is reliable and powerful, with empirical sizes and powers close to the nominal size 0.05 and to 1, respectively.
Next, for cases (i)-(ii) with n 0 = 150, n 1 = 150, p n = 400, and s 0 = 5, we present histograms of the proposed de-biased estimates √n(β̂ j − β 0j )/σ j , j = 1, 2, 9, 10, in Figures 1 and 2 to check the asymptotic normality of β̂ j , where σ j is given in Theorem 2. Similarly, we collect prospective samples of size 300 and compute the de-biased estimator based on the prospective samples for comparison. Note that j = 1, 2 are on the support of β 0 and j = 9, 10 are not. Figures 1 and 2 confirm the asymptotic normality of the proposed de-biased estimator. On the contrary, the de-biased estimator for j = 1 and j = 2 based on the prospective samples fails to converge in the imbalanced cases.
Lastly, we examine the point estimation for case (i) under different settings of (n 0 , n 1 , p n , s 0 ); see Figure 3. The performance of the initial lasso estimator and of the de-biased estimator with prospective samples is also displayed. For ease of presentation, the estimates for the first 100 coordinates are plotted in Figure 3. It can be seen that, for j ∈ S 0 , the proposed de-biased estimator is less biased than the lasso estimator and the counterpart based on prospective samples. For j ∈ S c 0 , the proposed de-biased estimator is close to zero. In other words, the proposed de-biased estimator outperforms the de-biased estimator based on prospective samples in terms of bias correction, which is consistent with our theoretical results.

Simulated Data: The General Testing Problem
In the second part, we conduct simulations to check the performance of the general test for two null hypotheses, H 0 : ||β 0 || 0 ≤ k 1 and H 0 : min j∈S(β 0 ) |β 0j | ≥ k 2 , where k 1 and k 2 are prespecified constants. The former tests the sparsity level and the latter tests the minimum signal strength. In the simulation setting, the true parameter β 0 is set to be β 0j = 1.5 for 1 ≤ j ≤ 5 and zero otherwise. For the design matrix, we consider two scenarios: (a) Σ is Toeplitz with Σ ij = 0.8 |i−j| ; (b) Σ is block diagonal, that is, Σ ii = 1, Σ ij = 0.9 for 5(k − 1) + 1 ≤ i ≠ j ≤ 5k with k = 1, . . . , ⌊p n /5⌋, and Σ ij = 0 otherwise, where ⌊z⌋ denotes the integer part of z.
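The two covariance scenarios can be constructed as follows (a small helper of our own, assuming p is a multiple of 5 in the block case):

```python
import numpy as np
from scipy.linalg import toeplitz

def make_sigma(p, kind):
    """Covariance matrices for the two design scenarios of the simulation."""
    if kind == "toeplitz":            # (a) Sigma_ij = 0.8^{|i-j|}
        return toeplitz(0.8 ** np.arange(p))
    if kind == "block":               # (b) 5x5 blocks, off-diagonal 0.9
        block = np.full((5, 5), 0.9)
        np.fill_diagonal(block, 1.0)
        return np.kron(np.eye(p // 5), block)
    raise ValueError(kind)
```

Sampling the predictors is then a single call such as `rng.multivariate_normal(np.zeros(p), make_sigma(p, "toeplitz"), size=n)`.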
We set k 1 = 5 and k 2 = 1.0. Table 3 summarizes the average Type I errors of the proposed test for the two cases. It can be seen that the proposed test can effectively cope with both convex and nonconvex null sets B 0 and performs close to the nominal level μ = 0.05. Moreover, the proposed test works reasonably well across the dimensionality p n and performs better than the counterpart based on prospective samples.
Next, we check the power of the proposed test in detecting alternatives to the two null hypotheses under scenarios (a) and (b). Set p n = 400. Different combinations of (k 1 , k 2 ) are considered, and the average rejection probabilities are reported in Table 4. One can see from Table 4 that the empirical powers for the two cases are satisfactory.
NOTE: Proposed, the proposed de-biased method; prospective, the de-biased method based on the prospective samples.

Application
As an illustration, we apply the proposed method to the colorectal cancer methylome dataset (Fennell et al. 2019), which is available in the EMBL-EBI ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8148/?query=case+control). The goal of this study is to select attributes that are significantly related to colorectal cancer. The response variable is coded as 1 (colorectal cancer) and 0 (nondisease). The dataset consists of 248 samples with 27,951 attributes, among which there are n 1 = 216 cases and n 0 = 32 controls. We fit the logistic regression model to this case-control dataset. Although over 25,000 attributes are represented on the Illumina HumanMethylation450 BeadChip arrays, many of them are not expressed in colorectal cancer. Hence, two preprocessing steps are carried out: (a) remove any attribute for which more than 90% of the observations among the 248 samples are zero, as a predominance of zeros can be regarded as a lack of information; (b) apply a preliminary screening method, such as that of Xie et al. (2020), to remove attributes with low marginal association with the response, reducing the dimensionality to a more feasible size of 300 attributes. After these two preprocessing steps, the sample size is n = 248 and the number of attributes is p n = 300.
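A sketch of the two preprocessing steps; marginal absolute correlation is used here as a simple stand-in for the screening procedure of Xie et al. (2020), and the function name and defaults are ours:

```python
import numpy as np

def preprocess(X, y, zero_frac=0.9, keep=300):
    """(a) drop attributes that are zero in more than zero_frac of samples;
    (b) keep the `keep` attributes with the largest absolute marginal
    correlation with y (an illustrative stand-in for formal screening)."""
    X = X[:, (X == 0.0).mean(axis=0) <= zero_frac]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return X[:, np.argsort(corr)[::-1][:keep]]

# Toy check: the all-zero column is dropped; the column equal to y ranks first.
y = np.tile([0.0, 1.0], 5)
X = np.column_stack([y, np.linspace(0.0, 1.0, 10), np.zeros(10)])
Xs = preprocess(X, y, keep=1)
```

On the real data one would call `preprocess` with `keep=300` to arrive at the p n = 300 attributes used in the analysis.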

Supplementary Materials
The supplementary material contains all technical proofs.