Single-index Thresholding in Quantile Regression

Abstract: Threshold regression models are useful for identifying subgroups with heterogeneous parameters. Conventional threshold regression models split the sample based on a single, observed threshold variable, which forces the threshold point to be equal for all subgroups of the population. In this article, we consider a more flexible single-index threshold model in the quantile regression setup, in which the sample is split based on a linear combination of predictors. We propose a new estimator that smooths the indicator function in thresholding, which enables Gaussian approximation for statistical inference and allows us to characterize the limiting distribution when the quantile process is of interest. We further construct a mixed-bootstrap inference method with faster computation, and a procedure for testing the constancy of the threshold parameters across quantiles. Finally, we demonstrate the value of the proposed methods via simulation studies, as well as through an application to executive compensation data.


Introduction
Threshold regression is useful for identifying subgroups with heterogeneous effects determined by some threshold variables. Heterogeneous effects are commonly seen in many applications from economics, biomedicine, and marketing. For example, research has shown that countries may have different growth patterns depending on whether their initial endowment or exports are large or small (Hansen 2000; Durlauf, Johnson, and Temple 2005; Christopoulos, McAdam, and Tzavalis 2021). The effect of body mass index (BMI) on blood pressure was found to be different for subgroups with high and low BMI (Kaufman et al. 1997; Kerry et al. 2005; Tesfaye et al. 2007; Zhang, Wang, and Zhu 2014). In personalized medicine, the efficacy of treatment was shown to vary with patients' characteristics such as demographic information and pretreatment covariates (Rothwell 1995; Wang et al. 2007; VanderWeele et al. 2019). The estimation of heterogeneous effects and the identification of subgroups are important in these applications.
This article focuses on threshold quantile regression, which was first introduced by Cai and Stander (2008), Cai (2010), and Galvao, Montes-Rojas, and Olmo (2011) in the context of autoregression, and further studied by Lee, Seo, and Shin (2011), Li et al. (2011), Yu (2013), and Su and Xu (2019) in other settings. Different from mean regression, threshold quantile regression allows researchers to study effect heterogeneity in different tails of the outcome distribution, which may be of more interest in applications involving medical costs, birth weight, income, and so on.
One limitation of the existing threshold quantile regression models is that the threshold splits the sample based on a single and observed variable. Numerous studies have shown that subgroups may be characterized by several variables together. For example, Yu and Fan (2020) argued that the effect of experience on salary tends to vary with both education and gender. Furthermore, Shen and He (2015) and Fan, Song, and Lu (2017) showed that a risk score, defined as some function of multiple predictors, is helpful for identifying the subgroup of AIDS patients who benefit more from the treatment. Jiang et al. (2014) demonstrated that multiple SNPs (single-nucleotide polymorphisms), rather than any individual SNP, tend to alter the risk of developing a particular disease. The importance of using numerous covariates to identify the optimal subgroup for treatment was also discussed in He, Lin, and Tu (2018) and VanderWeele et al. (2019). These applications suggest that it is more desirable to use multiple covariates to determine the subgroup and heterogeneous effects.

CONTACT Huixia Wang judywang@gwu.edu Department of Statistics, The George Washington University, Washington, DC 20052. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA.
Motivated by the above insights, we consider the following single-index threshold quantile regression model:

Q_y(τ | z_i, x_i) = z_i^T β(τ) + z̃_i^T δ(τ) I{x_i^T γ(τ) > 0},   (1)

where y_i is the response from the ith subject, Q_y(τ | ·) denotes the τth conditional quantile of the response given covariates, τ ∈ (0, 1) is the quantile level of interest, z_i is the vector of design variables, z̃_i is a subset of z_i, and x_i includes the threshold variables. The parameters β(τ) and δ(τ) are regression coefficients: β(τ) measures the baseline effect of z_i on the τth quantile of y_i, and δ(τ) is the threshold effect, depicting the heterogeneous effect of z̃_i in the subgroup. The parameters γ(τ) are threshold coefficients, which determine the single-index threshold, an unknown linear combination of variables. In this article, we focus on the fixed-threshold-effect framework by assuming δ(τ) ≠ 0.
The threshold parameters γ(τ) play a central role in determining the single index and therefore the subgroup with heterogeneous effects. Inference on γ(τ) can help to identify important threshold variables and determine the characteristics of the subgroup. However, the estimation and uncertainty quantification for γ(τ) are complicated by the indicator function involved in Model (1). Existing work on threshold regression under the fixed-threshold-effect framework showed that direct estimators of threshold parameters have nonstandard limiting distributions, which are too complicated to be practically useful for statistical inference; see Chan (1993), Bai (1995), Yu (2013), and Yu and Fan (2020), to name a few. Other researchers developed inference methods based on the asymptotic theory under the shrinking-effect framework, but these methods are limited to cases with small threshold effects; see Hansen (2000), Caner (2002), Kuan, Michalopoulos, and Xiao (2017), and Su and Xu (2019). The problem is technically even more challenging for Model (1) because of the nondifferentiability of the loss function involved in quantile regression (Koenker 2005) and the additional nonconstant covariates in the thresholding.
To overcome these challenges, we propose a smoothed estimator obtained by smoothing the indicator function in the thresholding. We show that the resulting estimators of both the regression and threshold coefficients are asymptotically normal, and that they are asymptotically independent of each other. In addition, we show that the smoothed estimator of γ(τ) has the convergence rate √(h_n/n), where h_n → 0 is the bandwidth involved in smoothing the indicator function. While the unsmoothed estimator of γ(τ) could have a faster convergence rate, the proposed smoothing brings three major advantages. First, the asymptotic normality of the threshold coefficient estimator facilitates statistical inference based on direct variance estimation. Second, the asymptotic normality also allows us to develop a computationally efficient mixed-bootstrap algorithm, which is convenient and less sensitive to the choice of bandwidth than direct variance estimation. Third, under this framework, we are able to establish the asymptotic properties of the quantile process, which are useful for characterizing and comparing subgroups across quantiles. In particular, we develop new procedures for testing the constancy of γ(τ) across τ in a region of quantiles. Such tests can help researchers identify commonalities across quantiles, which can subsequently be used to improve the efficiency of subgroup analysis.
To the best of our knowledge, this article is the first to formally study the estimation and inference for single-index threshold quantile regression models. Among the limited literature, Seo and Linton (2007) and Yu and Fan (2020) also studied single-index thresholds, but in the mean regression setup. Yu and Fan (2020) considered the direct estimator without smoothing, and proposed a nonparametric posterior interval method for inference on the threshold parameters. Seo and Linton (2007) considered smoothed estimation and suggested Wald-type, likelihood-type, and bootstrap methods for inference. However, these inference methods either do not apply to models with heteroscedastic errors, for which quantile regression has more appeal, or are computationally intensive, and even more so for quantile regression. In this article, we develop new inference procedures to address the unique challenges arising from quantile regression, which require different techniques both computationally and theoretically.
The rest of the article is organized as follows. In the next section, we discuss the identification problem, present the proposed smoothed estimator and the profiled estimation procedure, and establish the large sample properties of the estimator. In Section 3, we propose a Wald-type method and a mixed-bootstrap method for inference. Section 4 is devoted to testing the constancy of the threshold parameters across quantiles. Simulation studies are reported in Section 5, followed by an application to executive compensation data in Section 6. Some concluding remarks are provided in Section 7. Technical proofs are relegated to the Appendix and the online supplement.

Smoothed Estimator and Asymptotic Properties
We first address the identifiability of Model (1). Note that the same model continues to hold if γ(τ) is divided by some positive constant. Although the normalization constraint ‖γ(τ)‖_2 = 1 is often used in the literature to ensure identifiability, we consider a different normalization for easier interpretation. The identification of γ(τ) requires that there exists a variable x_1 whose coefficient is nonzero, and that the probability distribution of x_1 conditional on x_2 = x \ x_1 is absolutely continuous with respect to the Lebesgue measure (Horowitz 1992). The notation x \ x_1 denotes the variables in x excluding x_1. Let the first element in x_2 be the intercept. Model (1) can be rewritten as

Q_y(τ | z_i, x_i) = z_i^T β(τ) + z̃_i^T δ(τ) I{x_{1i} + x_{2i}^T ψ(τ) > 0},   (2)

where ψ(τ) = γ_{-1}(τ)/γ_1(τ), with γ_1(τ) and γ_{-1}(τ) corresponding to x_1 and x_2, respectively. If γ_1(τ) is positive, the normalization does not alter β(τ) and δ(τ). If γ_1(τ) is negative, β(τ) and δ(τ) can be redefined to make Model (2) consistent with Model (1).

Remark 1.
The identifiability constraint in Model (2) requires the specification of one threshold variable x_1 whose coefficient is nonzero. In practice, this variable can be determined by domain knowledge, for example, in the executive compensation study considered in Section 6. When prior knowledge is not available, we can conduct a preliminary analysis to select this variable as follows. For a candidate x_1, we can test the hypothesis H_0. The rejection of H_0 would suggest that there exists a subgroup whose structure is influenced by x_1, so that x_1 should be included in the threshold function, and not otherwise. We can test this hypothesis by extending the score-type specification tests in Zhang, Wang, and Zhu (2014) or Yu (2013), which were designed for threshold regression with a single threshold variable. The critical values can be calculated using resampling methods similar to those in Fan, Song, and Lu (2017) and Yu (2013). The score-type test is constructed under the null hypothesis and thus is computationally more convenient than the likelihood-ratio-type test in Lee, Seo, and Shin (2011) and the Wald-type test in Galvao et al. (2014).

The Smoothed Estimator
For notational convenience, hereafter we let θ(τ) = (β(τ)^T, δ(τ)^T)^T denote the regression coefficients, and let η(τ) = (θ(τ)^T, ψ(τ)^T)^T be the collection of all unknown parameters. Note that we can view the quantile function in Model (2) as a nonlinear function of η(τ). Therefore, a natural and direct estimator of η(τ) can be obtained as

η̃(τ) = argmin_η Σ_{i=1}^n ρ_τ( y_i − z_i^T β − z̃_i^T δ I{x_{1i} + x_{2i}^T ψ > 0} ),

where ρ_τ(u) = {τ − I(u < 0)}u is the quantile "check" loss function (Koenker 2005). Under the fixed-threshold-effect framework of threshold regression, it is known that the direct estimator of the threshold parameter often has a faster convergence rate than that of the regression parameter, but the former has a nonstandard limiting distribution that is too complicated to be useful for statistical inference. This challenge was observed in both least-squares estimation (Chan 1993; Yu and Fan 2020) and quantile regression (Bai 1995; Yu 2013). To overcome this challenge, some researchers established asymptotic distributions for the direct estimator under the shrinking-threshold-effect framework, which assumes that the threshold effect decays to zero as n increases; see, for instance, Hansen (2000), Caner (2002), and Kuan, Michalopoulos, and Xiao (2017). While these asymptotic distributions can be approximated through simulation, they are limited to the small-effect case and could lead to conservative confidence intervals when the threshold effect size is fixed. In a recent article, Su and Xu (2019) thoroughly studied the conventional threshold quantile regression under the shrinking-threshold-effect framework, and proposed a likelihood-ratio-based test procedure.
Unfortunately, the method cannot be applied to the single-index threshold quantile regression model, since the limiting distribution of ψ̂(τ) will no longer be a two-sided Brownian motion, but rather a two-sided Brownian field, due to the additional nonconstant covariates in the thresholding; see Yu and Fan (2020) for related discussion in mean regression. Consequently, it is difficult to simulate the asymptotic distribution of the likelihood-ratio-type statistic.
For convenient statistical inference, we propose a smoothed estimator obtained by smoothing the indicator function in the thresholding. Let G(·) be a bounded continuous function satisfying lim_{s→−∞} G(s) = 0 and lim_{s→+∞} G(s) = 1. Note that G is analogous to a cumulative distribution function rather than a density function traditionally used in kernel estimation. The proposed smoothed estimator of η(τ) is defined as

η̂(τ) = argmin_η S_n(η; h_n),  with  S_n(η; h_n) = Σ_{i=1}^n ρ_τ( y_i − z_i^T β − z̃_i^T δ G{(x_{1i} + x_{2i}^T ψ)/h_n} ),   (3)

where η = (β^T, δ^T, ψ^T)^T, and h_n → 0 is a positive sequence of bandwidth parameters.
The idea of smoothing has been used in various contexts to ease computational or theoretical challenges. Horowitz (1992) and Seo and Linton (2007) proposed smoothing the indicator link function in binary regression and the threshold function in threshold mean regression, respectively, to obtain asymptotic normality. Brown and Wang (2005), Fu, Wang, and Bai (2010), and Pang, Lu, and Wang (2012) considered induced smoothing to avoid direct estimation of the error density involved in the asymptotic covariance matrices for rank and quantile regression. The quantile score function was smoothed in Whang (2006) and Wang and Zhu (2011) to achieve higher-order accuracy of inference, and in Wang, Stefanski, and Zhu (2012) to provide an appropriate correction for covariate measurement errors. Our setup involves two different sources of nonsmoothness, one in the quantile loss function and the other in the thresholding. We propose to smooth only the indicator function involved in the thresholding, so as to obtain asymptotic normality and thereby facilitate statistical inference; thus, the smoothing serves a similar purpose as in Horowitz (1992) and Seo and Linton (2007). The nondifferentiable quantile loss function, on the other hand, brings additional challenges to both computation and theory. We address the computational challenge below, and discuss the challenges for statistical inference in Section 3.
The minimization in Equation (3) is challenging because the objective function is not differentiable everywhere and is not convex in ψ(τ). However, for every given ψ(τ), minimizing S_n(η; h_n) is equivalent to solving a linear quantile regression problem. Based on this observation, we consider the following profiled estimation method. First, for a given ψ(τ), define

(β̂(ψ(τ)), δ̂(ψ(τ))) = argmin_{β,δ} S_n(β, δ, ψ(τ); h_n).

Then estimate the threshold parameters by

ψ̂(τ) = argmin_{ψ(τ)} S_n( β̂(ψ(τ)), δ̂(ψ(τ)), ψ(τ); h_n ).   (4)

The profiled estimator for the regression coefficients is thus defined as θ̂(τ) = (β̂(ψ̂(τ))^T, δ̂(ψ̂(τ))^T)^T. For practical implementation, we recommend using the Nelder-Mead algorithm (in the R function "optim") when a good initialization is available, and otherwise the genetic algorithm for optimizing nonsmooth objective functions, as used in Zhang et al. (2012) and Wang et al. (2018).
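The profiled procedure can be sketched in code. The following is a toy illustration under our own simplifying choices, not the authors' implementation: the inner step solves the linear quantile regression by a crude iteratively reweighted least squares (IRLS) approximation at τ = 0.5, and the outer step uses a grid search over a scalar ψ instead of Nelder-Mead; G(x) = Φ(x) + xφ(x) as in Section 2.

```python
import math
import numpy as np

def G(s):
    """Smoothing function Phi(s) + s * phi(s)."""
    s = np.asarray(s, dtype=float)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(s / math.sqrt(2.0)))
    phi = np.exp(-s * s / 2.0) / math.sqrt(2.0 * math.pi)
    return Phi + s * phi

def check_loss(u, tau):
    # rho_tau(u) = {tau - I(u < 0)} u
    return float(np.sum(u * (tau - (u < 0))))

def fit_inner(y, D, iters=40):
    """Median regression of y on design D via crude IRLS (stand-in for a QR solver)."""
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - D @ beta), 1e-4)
        beta = np.linalg.solve(D.T @ (D * w[:, None]), D.T @ (w * y))
    return beta

# toy data: jump of size delta0 in the subgroup {x1 + psi0 * x2 > 0}
rng = np.random.default_rng(1)
n, tau, h = 1500, 0.5, 0.2
z = rng.normal(size=n)
x1, x2 = rng.normal(size=n), rng.normal(size=n)
psi0, delta0 = 1.0, 2.0
y = 0.5 + z + delta0 * (x1 + psi0 * x2 > 0) + 0.5 * rng.normal(size=n)

def profiled(psi):
    # inner step: linear quantile fit with the smoothed threshold regressor
    D = np.column_stack([np.ones(n), z, G((x1 + psi * x2) / h)])
    coef = fit_inner(y, D)
    return check_loss(y - D @ coef, tau), coef

# outer step: minimize the profiled objective over a grid of psi
grid = np.linspace(-1.0, 3.0, 81)
vals = [profiled(p)[0] for p in grid]
psi_hat = float(grid[int(np.argmin(vals))])
coef_hat = profiled(psi_hat)[1]   # (intercept, slope of z, threshold effect)
```

The grid search stands in for the derivative-free optimizers recommended in the text; with multiple threshold covariates, Nelder-Mead or a genetic algorithm replaces the one-dimensional grid.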

Asymptotics for a Single Quantile
For convenience, let e_i(τ) denote the quantile regression error of the ith subject, and let Z_i denote the design vector when the threshold parameters are known. Let w_i denote all the variables, including z_i and x_i, and let f_{e(τ)}(·|w) denote the conditional density function of e(τ). We begin by imposing a set of regularity conditions needed for consistency.
Assumption 1. (a) The true parameters θ_0(τ) and ψ_0(τ) are in the interior of compact subspaces Θ and Ψ, respectively. (b) δ_0(τ) ≠ 0. ... (d) The probability distribution of x_1 conditional on x_2 has everywhere positive density with respect to the Lebesgue measure for almost every x_2. (e) The minimum eigenvalue of E{f_{e(τ)}(0|w)ZZ^T | x} is bounded away from zero uniformly over x.
Our article works under the fixed-threshold-effect framework, as indicated by Assumption 1(b), that is, δ_0(τ) ≠ 0. Assumptions 1(c) and 1(d) are needed to establish the asymptotic equivalence between the smoothed and unsmoothed objective functions as h_n → 0. Assumption 1(e) ensures the identification of the regression coefficients. The following theorem presents the consistency of the smoothed estimator.
Given the established consistency, the next step is to derive the asymptotic distribution. We first apply a linear transformation of the variables.
We are now ready to define the quantities involved in the asymptotic covariances of ψ̂(τ) and θ̂(τ), where a = ∫ G′(s)² ds. To establish the asymptotic normality, we make the following additional assumptions. We let M denote a generic constant whose value may vary with the context.
The boundedness condition in 2(a) is assumed for theoretical convenience, but it can be relaxed at the expense of lengthier proofs. Assumption 2(b) imposes restrictions on the conditional density of the thresholding variable. In Assumption 2(c), e(τ) is assumed to be independent of x_1, so that f_{e(τ)|w}(u|w) = f_{e(τ)|w_2}(u|w_2), to facilitate the proof. Assumptions 2(e) and 2(f) impose conditions on the smoothing function. One candidate kernel satisfying the conditions is G(x) = Φ(x) + xφ(x), where Φ and φ are the distribution and density functions of the standard normal.
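The stated conditions on G can be checked numerically for this candidate kernel. In the sketch below (our own illustration), we verify the tail limits and evaluate the constant a, which we read as ∫ G′(s)² ds from the definition above; for this particular G, G′(s) = φ(s)(2 − s²), and the integral has the closed form 2.75/(2√π), which we derive ourselves only for this check.

```python
import math
import numpy as np

def G(x):
    """Candidate kernel Phi(x) + x * phi(x)."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return 0.5 * (1 + math.erf(x / math.sqrt(2))) + x * phi

# limit conditions: G -> 0 at -infinity and G -> 1 at +infinity
tails_ok = abs(G(-8.0)) < 1e-8 and abs(G(8.0) - 1.0) < 1e-8

# G'(s) = phi(s) * (2 - s^2); evaluate a = integral of G'(s)^2 by the trapezoid rule
s = np.linspace(-10.0, 10.0, 200001)
gprime = np.exp(-s * s / 2) / math.sqrt(2 * math.pi) * (2 - s * s)
f = gprime ** 2
a_num = float((f[:-1] + f[1:]).sum() * (s[1] - s[0]) / 2)
a_exact = 2.75 / (2 * math.sqrt(math.pi))   # closed form we derive for this G
```

Note that G′ changes sign at |s| = √2, so this G is not monotone; it is a higher-order smoothing function that still satisfies the limit conditions.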
Remark 2. As also observed in the threshold regression literature (Seo and Linton 2007; Yu 2013; Su and Xu 2019), the regression coefficient estimator θ̂(τ) is not affected asymptotically by the threshold parameter estimator ψ̂(τ), and the latter converges at a faster rate. Consequently, statistical inference for the regression coefficients θ_0(τ) can be conducted as if ψ_0(τ) were known, and existing inference methods such as the Wald and rank score methods for quantile regression (Koenker 2005) can be directly applied. The asymptotic independence is not as surprising as it may appear. One intuitive explanation is as follows.
As also argued in Yu (2012) and Yu (2015) for threshold mean regression, only local information in the neighborhood of ψ_0(τ) is relevant to the estimation of the threshold parameter ψ_0(τ). In contrast, the estimation of the regression coefficients θ(τ) depends on global information, that is, information related to the moments of the data. Theoretically, the asymptotic independence between the estimator of the regression coefficients and that of the threshold parameters depends on the bandwidth parameter h_n used to smooth the threshold indicator function.
The proof of Lemma 1 shows that the asymptotic covariance of the two estimators converges to zero at a rate of h_n^{1/2}. That is, a larger bandwidth means that the information in a wider neighborhood of the boundary is used, which consequently leads to stronger dependence between the two estimators. A similar asymptotic independence property was observed in the boundary literature, for instance in Chernozhukov and Hong (2004), where the boundary is defined by a jump in the conditional density of the regression error, analogous to the jump in the conditional quantile function in our framework.
Remark 3. The convergence rate of ψ̂(τ) is √(h_n/n), which means that a faster decay of the bandwidth yields a faster convergence rate for ψ̂(τ). Based on Yu and Fan (2020) and Su and Xu (2019), we conjecture that the convergence rate of the unsmoothed estimator of the threshold parameters is n^{−1} under the fixed-threshold-effect framework, and n^{2a−1} under the shrinking-threshold-effect framework with δ_0(τ) = O(n^{−a}). Therefore, we obtain asymptotic normality at the expense of the convergence rate. Note that, in an attempt to resolve these theoretical challenges, Su and Xu (2019) also sacrificed the convergence rate. The convergence rate of our proposed smoothed estimator satisfies n^{−3/4} ≲ √(h_n/n) ≲ n^{−2/3} for k = 1, and n^{−3/4} ≲ √(h_n/n) ≲ n^{−5/8} for k > 1. Therefore, the convergence rate is not faster than n^{−3/4}, but is faster than n^{−5/8}.

Asymptotics for the Quantile Process
In practice, it is valuable to study the quantile process to obtain a complete analysis of the stochastic relationship between variables. The theoretical analysis also helps us develop tools for testing whether the threshold parameters ψ(τ) vary with the quantile level. If ψ(τ) is found to be constant across τ (or across a region of quantiles), composite quantile estimation can be used to provide a more efficient estimator of ψ(τ). As a result of the smoothing, when ψ̂(τ) is treated as a stochastic process, the limiting process is Gaussian and can be characterized by its mean and covariance kernel.
Theorem 3. Suppose that Model (2) and Assumptions 1 and 2 hold for all τ ∈ T, where T is a compact set in (0, 1). In addition, assume that the parameter space in Assumption 1 and the constant M in Assumption 2 are independent of τ. Then the process √(n h_n^{−1}){ψ̂(·) − ψ_0(·)} converges weakly to a mean-zero Gaussian process with covariance function V_ψ(τ, τ′); when ψ_0(τ) ≠ ψ_0(τ′), V_ψ(τ, τ′) = 0, and the two processes are asymptotically independent. Here, the independence between two stochastic processes means that all finite-dimensional marginals of the two processes are independent of each other.

Inference on Threshold Parameters
As mentioned in Remark 2, inference on the regression parameters can be conducted as if ψ_0(τ) were known. Thus, we focus on inference for the threshold parameters. The threshold parameters ψ(τ) play a central role in determining the single index and therefore the subgroup with heterogeneous effects. Inference on ψ(τ) can help to identify important threshold variables and determine the characteristics of the subgroup. We propose two procedures, a Wald-type method and a mixed-bootstrap method, together with a finite-sample bias correction step to improve finite-sample performance.

Wald-type Method
Based on the asymptotic normality in Theorem 2, a Wald-type method can be constructed for inference on ψ(τ) by directly estimating the asymptotic covariance matrix Q_ψ(τ)^{−1} V_ψ(τ) Q_ψ(τ)^{−1}. From the proofs of Lemmas 1 and 4(c) in the supplement, we obtain asymptotic representations of V_ψ(τ) and Q_ψ(τ). Thus, we can consistently estimate both V_ψ(τ) and Q_ψ(τ) by their sample versions, where l_n is an appropriately chosen bandwidth and the unknown quantities are calculated by plugging in the parameter estimates. Following Powell (1986), when l_n → 0 and nl_n² → ∞, we can verify that the sample versions are consistent. The asymptotic covariance matrix depends on a number of quantities, including the conditional density of e(τ), the kernel function G(·), and the bandwidth h_n. Our numerical studies and previous research on quantile regression show that the Wald-type method can be unstable in finite samples and is sensitive to the choice of bandwidth used in estimating the conditional density function; see Kocherginsky et al. (2005) and Wang and Zhu (2011). Based on the theoretical properties of the smoothed estimator, we propose a new bootstrap procedure as an alternative inference method in the next subsection.
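Powell's (1986) kernel idea replaces the conditional density at zero by a kernel weight on the residuals. The toy check below (our own construction; the true errors are plugged in so that only the kernel step is examined) estimates a matrix of the form E{f_{e(τ)}(0|w)ZZ^T} and compares it with its population value.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 5000
Z = np.column_stack([np.ones(n), rng.normal(size=n)])   # design vector Z
e = rng.normal(size=n)                                  # errors, f_e(0) = 1/sqrt(2*pi)

l = 0.3                                                 # bandwidth: l -> 0, n*l^2 -> inf
K = np.exp(-(e / l) ** 2 / 2) / math.sqrt(2 * math.pi)  # Gaussian kernel weight
Q_hat = (Z * (K / (n * l))[:, None]).T @ Z              # (n*l)^{-1} sum K(e_i/l) Z_i Z_i^T

Q_true = np.eye(2) / math.sqrt(2 * math.pi)             # f_e(0) * E[Z Z^T]
err = float(np.max(np.abs(Q_hat - Q_true)))
```

In practice, the residuals from the fitted model replace the true errors, which is exactly the source of the bandwidth sensitivity discussed above.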

Mixed-bootstrap Method
Thanks to the asymptotic normality of the smoothed estimator, we can construct an alternative and consistent inference approach based on the bootstrap, which avoids direct estimation of the asymptotic covariance matrix ofψ(τ ). However, the standard bootstrap method is computationally demanding, and even more so in our setup, as we need to perform profile estimation and solve quantile minimization multiple times for each resampling. To reduce the computational burden, we propose a new mixed-bootstrap procedure by using the theoretical properties of the smoothed estimator. Specifically, based on the Bahadur representation of the smoothed estimator, given in Equation (A.6) in the appendix, we first perturb the score functions to estimate the regression coefficients, which requires no optimization but only a summation calculation, and then update the threshold parameters. This procedure avoids optimizing the profiled objective function and thus is computationally efficient.
The detailed mixed-bootstrap procedure is as follows. • Step 1: Obtain the resampled counterpart of θ̂(τ) as

θ*(τ) = θ̂(τ) + n^{−1} Q̂_θ(τ)^{−1} Σ_{i=1}^n (g_i − 1) Ẑ_i {τ − I(ê_i(τ) < 0)},

where ê_i(τ) is the fitted residual, Ẑ_i is the estimated design vector, and {g_i}_{i=1}^n are iid nonnegative random weights with mean and variance both equal to one. Multiplying the score functions by (g_i − 1) guarantees that the mean of θ*(τ) conditional on the data is θ̂(τ). A consistent estimator of Q_θ(τ) is

Q̂_θ(τ) = (n l_n)^{−1} Σ_{i=1}^n K(ê_i(τ)/l_n) Ẑ_i Ẑ_i^T,

where K(·) is a kernel density function, l_n → 0, and √n l_n → ∞. The estimator Q̂_θ(τ) can also be obtained with the function "summary.rq" from the R package quantreg; see page 81 of Koenker (2005) for the detailed implementation.
• Step 2: Given θ*(τ), obtain ψ*(τ) by minimizing the weighted quantile loss function with θ*(τ) plugged in. Since the optimization is only with respect to ψ(τ), it is simpler than that in the profiled estimation in Equation (4). Any existing optimization algorithm, including those designed for nonlinear quantile regression in Koenker and Park (1996) and Hunter and Lange (2000), can be used in Step 2. • Step 3: Repeat Steps 1 and 2 B times and calculate E^{(b)}(τ) = √(n h_n^{−1}) {ψ*^{(b)}(τ) − ψ̂(τ)}, b = 1, …, B. The sample covariance of {E^{(b)}(τ)}_{b=1}^B can then be used to approximate the covariance of √(n h_n^{−1}) {ψ̂(τ) − ψ_0(τ)}.
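Step 1 can be sketched for a plain linear quantile regression, as a stand-in for the paper's design that includes the smoothed threshold term. The weight distribution (standard exponential, which has mean and variance one but is unbounded, unlike the bounded weights assumed by the theory) and all names are our choices; the point illustrated is that perturbing the scores by (g_i − 1) keeps the conditional mean of θ*(τ) at θ̂(τ) while generating resampling variation.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, tau = 500, 0.5
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_hat = np.array([1.0, 2.0])                 # stand-in for the fitted coefficients
y = Z @ theta_hat + 0.5 * rng.normal(size=n)

r = y - Z @ theta_hat                            # residuals
score = Z * (tau - (r < 0))[:, None]             # quantile score functions

# Powell-type plug-in estimate of Q_theta = E{f_e(0|w) Z Z^T}
l = 0.2
K = np.exp(-(r / l) ** 2 / 2) / math.sqrt(2 * math.pi)
Q_inv = np.linalg.inv((Z * (K / (n * l))[:, None]).T @ Z)

# Step 1: perturb the scores with iid weights g_i, with mean 1 and variance 1
B = 2000
draws = np.empty((B, 2))
for b in range(B):
    g = rng.exponential(1.0, size=n)
    draws[b] = theta_hat + Q_inv @ ((g - 1.0) @ score) / n

mean_dev = float(np.abs(draws.mean(axis=0) - theta_hat).max())
boot_sd = draws.std(axis=0)                      # bootstrap spread around theta_hat
```

No optimization is needed in this step, which is what makes the mixed bootstrap fast: only Step 2 (updating ψ) requires a minimization, and only over the threshold parameters.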
Based on the asymptotic normality, we can construct the 100(1 − α)% confidence interval of ψ_j(τ) as ψ̂_j(τ) ± z_{1−α/2} σ̂_j √(h_n/n), where z_{1−α/2} is the (1 − α/2)th quantile of the standard normal distribution and σ̂_j is the bootstrap standard deviation of √(n h_n^{−1}) ψ̂_j(τ).
The following theorem justifies that the asymptotic covariance of the mixed-bootstrap estimator is the same as that of the smoothed estimator. Note that the mixed-bootstrap procedure relies on the asymptotic normality and the Bahadur representation of the smoothed estimator, and thus cannot be applied to the unsmoothed estimator.
Theorem 4. Suppose that Model (2) holds at τ ∈ (0, 1), and Assumptions 1 and 2 hold. In addition, suppose {g_i}_{i=1}^n are iid weights with mean and variance both equal to one and sup_{i≥1} |g_i| ≤ M almost surely. Then, conditional on the data, √(n h_n^{−1}) {ψ*(τ) − ψ̂(τ)} converges to the same limiting distribution as √(n h_n^{−1}) {ψ̂(τ) − ψ_0(τ)}.

Bias-corrected Confidence Interval
Both the Wald-type and mixed-bootstrap methods rely on the asymptotic normality of the smoothed estimator ψ̂(τ). As discussed in Remark 3, the smoothing facilitates convenient inference at the expense of a slower convergence rate and bias. A smaller bandwidth h_n leads to smaller bias but larger deviation from normality. Although the bias is asymptotically negligible, in finite samples it may affect the coverage of confidence intervals. To further improve the finite-sample performance, we propose a bias correction method for the confidence intervals of ψ_0(τ).
By the Bahadur representation of the smoothed estimator given in Equation (A.7) in the appendix, the bias of ψ̂(τ) − ψ_0(τ) can be characterized analytically. The matrix Q_ψ(η_0(τ); h_n)^{−1}, defined explicitly in Equation (A.5) in the appendix and abbreviated as (Q_ψ)^{−1}, can be estimated through its relationship with Σ(τ) = var{ψ̂(τ) − ψ_0(τ)}, which can in turn be estimated by the mixed-bootstrap procedure, a more stable option than the Wald-type method. Replacing the mean and covariance of f_ψ by their sample counterparts, we obtain the estimated bias, denoted as d̂(τ). Then the bias-corrected 100(1 − α)% confidence interval of ψ_j(τ) can be constructed as

[ ψ̂_j(τ) − d̂_j^+(τ) − z_{1−α/2} σ̂_j √(h_n/n),  ψ̂_j(τ) − d̂_j^−(τ) + z_{1−α/2} σ̂_j √(h_n/n) ],

where d̂_j^+(τ) = max{d̂_j(τ), 0}, d̂_j^−(τ) = min{d̂_j(τ), 0}, and σ̂_j is the estimated standard deviation of √(n h_n^{−1}) ψ̂_j(τ), computed by either the Wald-type method or the mixed-bootstrap method. Here, we replace d̂_j(τ) by d̂_j^+(τ) and d̂_j^−(τ) because, when the bias is small, treating the estimated bias as the true value can cause the confidence interval to have substantial undercoverage. The idea, also employed by Qu and Yoon (2015), is to allow for, but not to enforce, a bias adjustment. By the proofs of Lemmas 1 and 4 in the supplement, we can show that d̂_j(τ) converges to zero at the rate of h_n^{k+2} (k ≥ 1 is defined in Assumption 2(f)), while the term associated with the standard deviation in the confidence interval converges to zero at the rate of √(h_n/n). Therefore, when n h_n^{2k+3} → 0, the bias term diminishes faster than the standard deviation term, so the bias correction does not affect the asymptotic coverage of the confidence interval.
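The "allow but do not enforce" adjustment reduces to a few lines. The helper below is our own illustration of the clipping step, with d̂^+ = max{d̂, 0} and d̂^− = min{d̂, 0}: a zero estimated bias returns the unadjusted interval, and a nonzero bias only widens the interval on the side the bias points toward.

```python
def bias_corrected_ci(psi_hat, d_hat, sd, z=1.96):
    """[psi_hat - d_plus - z*sd, psi_hat - d_minus + z*sd], clipping the bias."""
    d_plus, d_minus = max(d_hat, 0.0), min(d_hat, 0.0)
    return (psi_hat - d_plus - z * sd, psi_hat - d_minus + z * sd)

plain = bias_corrected_ci(1.0, 0.0, 0.1)    # no estimated bias: usual interval
pos = bias_corrected_ci(1.0, 0.05, 0.1)     # positive bias: extend downward only
neg = bias_corrected_ci(1.0, -0.05, 0.1)    # negative bias: extend upward only
```

Because the correction can only enlarge the interval, the adjusted interval always contains the unadjusted one, which is what protects the coverage when d̂ is noisy.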

Testing for the Constancy of ψ(τ ) Across τ
Model (2) allows both the regression coefficients and the subgroup structure to vary with the quantile level τ. In many applications, however, the subgroup structure, determined by the threshold parameters ψ(τ), varies little across quantiles in a certain region. It is then of interest to test the constancy of ψ(τ) across τ. If the test suggests that ψ(τ) is constant, this not only simplifies the interpretation of the model, but also allows more efficient estimation of the common parameters by combining information across quantiles, for instance by adopting the composite quantile estimation idea as in Zhang, Wang, and Zhu (2017) and Su and Xu (2019). Based on the asymptotic properties in Theorem 3, we propose methods for testing the constancy of ψ(τ) across τ in two cases, where the common value ψ_0 is either known or left unspecified.

Constancy of ψ(τ )-Known ψ 0
We first consider testing the hypotheses H_0: ψ(τ) = ψ_0 for all τ ∈ T, versus H_1: ψ(τ) ≠ ψ_0 for some τ ∈ T, where the null value ψ_0 is assumed to be known.
We focus on the process v_n(·) = √(n h_n^{−1}) {ψ̂(·) − ψ_0} and construct the Kolmogorov-Smirnov (KS) type statistic S_n = sup_{τ∈T} ‖v_n(τ)‖_{Σ̂(τ)}, where Σ̂(τ) can be any consistent estimator of var{v_n(τ)} and ‖v_n(τ)‖²_{Σ̂(τ)} = v_n(τ)^T Σ̂(τ)^{−1} v_n(τ). The following theorem states the limiting distribution of the KS statistic under the null and the alternative hypotheses.

Theorem 5. Under the assumptions in Theorem
In our implementation, we also perform a bias correction and modify v_n(·) to v_n(·) = √(n h_n^{−1}) {ψ̂(·) − d̂(·) − ψ_0}. As before, we calculate the test statistic and critical value by adapting the mixed-bootstrap method to avoid direct estimation of Σ(τ). Specifically, in each resampling, we calculate E^{(b)}(τ) over the quantile grid and use the resulting sup-norm statistics to approximate the critical value. No bias correction is needed in the mixed-bootstrap statistic, since the bias of E^{(b)}(τ) conditional on the data is zero by construction. In practice, we use a grid of equally spaced quantiles {τ_1, …, τ_m} ⊂ T to approximate the entire process. Our numerical results show that the test is insensitive to the choice of m, as long as the mesh of the quantile grid is not too large.
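The grid-based test can be sketched as follows. This toy (our own construction) bypasses the model fitting entirely: the bootstrap draws E^(b)(τ_j) are simulated as iid standard normals for a scalar threshold parameter, and the critical value is the 0.95 quantile of the resampled sup statistics.

```python
import numpy as np

rng = np.random.default_rng(4)
m, B = 9, 2000                        # quantile grid size, bootstrap replications

# simulated bootstrap draws E^(b)(tau_j): iid N(0,1) for a scalar psi
E = rng.normal(size=(B, m))
sigma_hat = E.std(axis=0)             # estimated sd of v_n(tau_j) at each grid point

def ks_stat(v):
    """Sup over the quantile grid of the normalized process."""
    return float(np.max(np.abs(v) / sigma_hat))

boot_sups = np.max(np.abs(E) / sigma_hat, axis=1)
crit = float(np.quantile(boot_sups, 0.95))   # bootstrap critical value

v_alt = rng.normal(size=m) + 10.0            # process far from the null value
reject_alt = ks_stat(v_alt) > crit
```

In the actual procedure, v_n(τ_j) comes from the smoothed fit at each grid quantile and E^(b)(τ_j) from the mixed bootstrap; the sup-over-grid construction is the same.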

Constancy of ψ(τ )-Unknown ψ 0
In general, ψ_0 is unknown. In this case, we consider the hypotheses H_0: ψ(τ) = ψ_0 for all τ ∈ T and some unknown ψ_0, versus H_1: no such common value exists. Note that in this setup ψ(τ) is a function of τ, and ψ_0 represents an unknown nuisance parameter; both have to be estimated. Such unknown parameters jeopardize the distribution-free character of classical tests, a difficulty known as the "Durbin problem", first posed in Durbin (1973) for parametric empirical processes. There are two common ways to overcome the problem in the literature. One is the Khmaladze martingale transformation, exemplified by Koenker and Xiao (2002). The other is a resampling procedure, as used by Chernozhukov and Hansen (2006). The simulation studies in Chernozhukov and Hansen (2006) suggested that resampling leads to accurate size and better power than Khmaladzation.
We propose a different solution. As noted in Remark 2, the unsmoothed estimator of the threshold parameters converges at a faster rate than the smoothed estimator. Motivated by this, we can replace the unknown ψ0 in ṽ_n(·) by a consistent unsmoothed estimator ψ̃ and define the process ṽ_n(·) and the statistic S̃_n accordingly. The estimator ψ̃ can be taken as the unsmoothed estimator obtained at any quantile level τ ∈ T. In practice, to achieve more stable estimation, we propose to estimate ψ0 by the unsmoothed composite quantile estimator obtained under H0. Since ψ̃ converges at a faster rate than the smoothed estimator ψ̂(·), we can show that ṽ_n(·) and S̃_n converge to the same asymptotic distributions as v_n(·) = (n/h_n)^{1/2} {ψ̂(·) − ψ0} and S_n = sup_{τ∈T} ‖v_n(τ)‖_{Σ̂(τ)} under H0. The critical value of S̃_n can then be calculated by following the same procedure as in Section 4.1.
Simulation Studies

Under Model (6), the threshold parameters are constant across quantiles when d = 0, and are more complicated and quantile-specific when d > 0. For Case 2, the true conditional mean function contains no threshold effect; that is, the covariates (z1, z2) have threshold effects on the τth quantile of Y but not on the mean. For such scenarios, the proposed single-index threshold quantile regression can capture heterogeneous covariate effects that may be overlooked by mean regression. Throughout our numerical studies, we let G(x) = Φ(x) + xφ(x) be the kernel function used in the smoothed estimator, where Φ and φ denote the standard normal distribution and density functions. We consider four sample sizes n = 200, 500, 1000, and 2000, and repeat the simulation 1000 times for each scenario.
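The kernel G(x) = Φ(x) + xφ(x) plays the role of a smoothed indicator: the smoothed estimator replaces the jump 1{u > 0} with G(u/h_n). A minimal sketch (the function name is ours, not the paper's):

```python
from scipy.stats import norm

def smooth_indicator(u, h):
    """Smoothed version of the indicator 1{u > 0}: G(u/h) with
    G(x) = Phi(x) + x*phi(x), the kernel used in the numerical studies.
    Note G can slightly overshoot [0, 1] for moderate |x|, as is typical
    of such higher-order smoothing kernels."""
    x = u / h
    return norm.cdf(x) + x * norm.pdf(x)
```

By symmetry, G(x) + G(−x) = 1 and G(0) = 1/2, so the smoothing is centered at the threshold, and G(u/h) approaches 1{u > 0} as h shrinks.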

Estimation With Different Bandwidths
The proposed smoothed estimator depends on the bandwidth h_n. Based on the theoretical conditions on h_n in Assumption 2(g), we consider the following rule of thumb: h_n = c s log(n)/√n, where c is a constant and s is the standard deviation of x2ᵀψ0(τ) + x1. In practice, s can be estimated by the sample standard deviation of x2ᵀψ̃(τ) + x1, with ψ̃(τ) the unsmoothed estimator. We assess the sensitivity of ψ̂(τ) to the bandwidth by varying c. Figure 1 shows the bias and standard deviation of ψ̂0(τ) (the intercept in the thresholding) and ψ̂1(τ) (the threshold parameter for x2) at τ = 0.5. As expected, a smaller bandwidth produces smaller bias and standard deviation, consistent with the discussion in Remark 3. Furthermore, Figure 2 presents normal Q-Q plots of ψ̂0(τ) and ψ̂1(τ) at τ = 0.5 for n = 1000. The results confirm that the empirical distribution of the threshold parameter estimators may deviate from normality when the bandwidth approaches zero; the deviation is more obvious for Case 1. As a result, if statistical inference based on the Gaussian approximation is of interest, a larger bandwidth is preferred, such as c ∈ [0.5, 2].
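The rule of thumb above is straightforward to compute from the data; the sketch below assumes the threshold index has the form x2ᵀψ(τ) + x1, with a preliminary unsmoothed estimate supplied by the user (function and argument names are ours):

```python
import numpy as np

def rule_of_thumb_bandwidth(x1, x2, psi_tau, c=1.0):
    """Rule-of-thumb bandwidth h_n = c * s * log(n) / sqrt(n), where s is the
    sample standard deviation of the threshold index x2^T psi(tau) + x1 and
    psi_tau is a preliminary (unsmoothed) estimate of the threshold parameters.

    x1 : (n,) array, threshold variable with unit coefficient
    x2 : (n, q) array, remaining threshold variables
    """
    n = len(x1)
    index = x2 @ psi_tau + x1
    s = index.std(ddof=1)
    return c * s * np.log(n) / np.sqrt(n)
```

The constant c then controls the bias/normality trade-off discussed above: small c favors small bias, larger c favors the Gaussian approximation.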

Inference for ψ(τ ) at a Given Quantile Level
We first compare the Wald-type and mixed-bootstrap methods for estimating the standard deviation (SD) of ψ̂0(τ) and ψ̂1(τ). As suggested in Section 5.1, we focus on the bandwidth h_n = c s log(n)/√n with c = 0.5, 1, and 2, as these choices lead to a decent Gaussian approximation. For the Wald-type method, we choose l_n = 1.843 σ̂ n^{-1/5}, where σ̂ is a sample standard deviation. For the mixed-bootstrap method, the random weights {g_i}, i = 1, . . . , n, are generated from a two-point distribution taking values 0 and 2 with probability 0.5 each (that is, 1 plus a Rademacher variable), and 1000 bootstrap replicates are generated. We conduct simulations for τ = 0.5 and τ = 0.7. Table 1 reports the means of the estimated standard deviations of ψ̂0(τ) and ψ̂1(τ); the Monte Carlo SD serves as a gold standard. We observe that the SD estimated by the mixed-bootstrap is very close to the Monte Carlo SD, while the Wald-type method shows unstable performance for Case 2. Table 2 reports the coverage probabilities of the 95% confidence intervals for ψ0(τ) and ψ1(τ) obtained from the different methods. The coverage probability of the Wald-type CI is generally low, especially for Case 2, partly due to the unstable SD estimation. The mixed-bootstrap method performs better, although in some cases the CIs show slight undercoverage because of the smoothing bias; this undercoverage is more visible for c = 2, since the larger bandwidth produces larger bias. Overall, bias correction improves both the mixed-bootstrap and Wald-type confidence intervals. We suggest using the bias-corrected mixed-bootstrap CI, since its coverage probabilities are more stable and less sensitive to the bandwidth choice across scenarios. Please refer to the supplementary material for additional simulation results, including cases with more threshold variables, a comparison with the likelihood-ratio-based inference method in Su and Xu (2019), and a comparison with the mean inference method in Seo and Linton (2007).
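The multiplier weights and the bootstrap critical value are simple to generate; the sketch below illustrates the two ingredients under our naming (the weights take values 0 and 2 with probability 0.5 each, so they have mean 1 and variance 1, as a multiplier bootstrap requires):

```python
import numpy as np

def bootstrap_weights(n, B, seed=None):
    """B x n multiplier weights g_i in {0, 2}, each with probability 0.5
    (equivalently, 1 + Rademacher), so E[g_i] = 1 and Var(g_i) = 1."""
    rng = np.random.default_rng(seed)
    return 2 * rng.integers(0, 2, size=(B, n))

def bootstrap_critical_value(boot_stats, level=0.95):
    """Critical value: the empirical `level`-quantile of the B bootstrap
    statistics computed from the reweighted objective functions."""
    return np.quantile(boot_stats, level)
```

Each row of weights is used to recompute the statistic once; the test then rejects when the observed statistic exceeds the empirical 95% quantile of the bootstrap statistics.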
We assess the type I error and power of the two proposed KS-type tests, KS1 and KS2, which assume known and unknown ψ0, respectively. We calculate the test statistics based on two quantile grids: m = 17 and m = 33 quantiles equally spaced on [0.1, 0.9]. For both methods, the critical values are obtained by the mixed-bootstrap with 1000 bootstrap samples. Using h_n = c s log(n)/√n with c = 0.5, 1, and 2, we found that c = 0.5 and 1 tend to be somewhat conservative. The rejection percentages at the 5% significance level with c = 2 are summarized in Table 3. The two tests perform similarly: the sizes are well controlled under the null hypothesis, and the power increases to one as d increases. In addition, the two tests appear insensitive to the number of quantile levels m used to calculate the statistics; our experience suggests that quantile grids with mesh no greater than 0.05 are often sufficient.
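For reference, the two grids used here correspond to mesh sizes 0.05 (m = 17) and 0.025 (m = 33) on [0.1, 0.9]; a small sketch makes the relation explicit:

```python
import numpy as np

def quantile_grid(m, lo=0.1, hi=0.9):
    """m equally spaced quantile levels on [lo, hi]; the mesh (spacing
    between adjacent levels) is (hi - lo) / (m - 1)."""
    return np.linspace(lo, hi, m)
```

So m = 17 yields mesh 0.8/16 = 0.05, the coarsest grid our experience suggests is safe, and m = 33 yields 0.025.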

An Application to Executive Compensation Data
The rapid rise in executive compensation from the mid-1980s until 2005 has sparked an intense debate and investigation regarding the nature of the pay-setting process; see Bertrand and Mullainathan (2000), Garvey and Milbourn (2006), Frydman and Jenter (2010), and Edmans et al. (2017). Compensation is generally decided at the end of each fiscal year by the compensation committee of the board of directors, at which time the committee already knows whether market performance has been beneficial for the firm (Garvey and Milbourn 2006). Some researchers, such as Bertrand and Mullainathan (2001), found that executives can benefit from luck, where luck represents market performance, that is, changes in firm performance that are beyond the chief executive officer's control. Further, Garvey and Milbourn (2006) found that the effect of luck on compensation is significantly larger when luck is up (luck > 0) than when luck is down (luck < 0). They refer to this phenomenon as "the asymmetry in pay for luck."

Table 2. Coverage percentages of the 95% confidence intervals for ψ0(τ) and ψ1(τ) in Cases 1-2 by the Wald-type (Wald), Wald-type with bias correction (WBC), mixed-bootstrap (MB), and mixed-bootstrap with bias correction (BBC), where h_n = c s log(n)/√n.

Table 3. Rejection percentages of two proposed tests for testing the constancy of ψ(τ) across τ ∈ [0.1, 0.9] with h_n = 2 s log(n)/√n. KS1 and KS2 are tests assuming known and unknown ψ0, respectively. m is the number of equally spaced quantiles in [0.1, 0.9]. The nominal level is 5%.
We apply the proposed methods to executive compensation data to study the heterogeneous effects of predictors on compensation. The compensation data used in this article were drawn from Standard and Poor's ExecuComp, and the firms' return data were drawn from the Center for Research in Security Prices (CRSP), available from https://wrds-www.wharton.upenn.edu/users/tou/. Our sample period covers the years 1993-2005, when the data were available. In total, there are 15,837 observations for 2508 firms and 3709 executives. The response y is defined as the change in the logarithm of total compensation. We consider two predictors, luck and skill, which are determined as follows. First, firm performance (measured by the firm's one-year stock return) is regressed on equal-weighted and value-weighted industry returns. Then the predicted value (centered at zero) is taken to represent luck (i.e., market performance), and the residual is used to represent skill (i.e., firm-specific performance).
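The luck/skill construction above is a fitted-value/residual decomposition from an auxiliary regression; a minimal sketch (function and variable names are ours, and the regression here is a generic least-squares fit on the industry-return columns):

```python
import numpy as np

def luck_and_skill(firm_return, industry_returns):
    """Decompose firm performance as described in the text: regress the
    firm's return on industry returns (with intercept); the centered
    fitted value is 'luck' and the residual is 'skill'.

    firm_return      : (n,) array of one-year stock returns
    industry_returns : (n, k) array, e.g., equal- and value-weighted returns
    """
    X = np.column_stack([np.ones(len(firm_return)), industry_returns])
    beta, *_ = np.linalg.lstsq(X, firm_return, rcond=None)
    fitted = X @ beta
    luck = fitted - fitted.mean()   # market-driven component, centered at zero
    skill = firm_return - fitted    # firm-specific component
    return luck, skill
```

With an intercept in the auxiliary regression, skill automatically averages to zero, and luck is centered by construction.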
As an exploratory data analysis, we plot in Figure 3 the estimation results from quantile regression of y on luck or skill for the subgroups luck > 0 and luck < 0 separately at three quantiles, τ = 0.2, 0.5, and 0.8. At all three quantiles, the effects of luck and skill appear to differ between the two subgroups, indicating the "asymmetry in pay for luck" phenomenon. In addition, the asymmetry patterns tend to vary across quantiles. As pointed out in Garvey and Milbourn (2006), the asymmetry may not necessarily occur around the zero value of luck, and, as suggested in Yu and Fan (2020), the threshold may depend on both luck and firm size. These observations motivate the model in Equation (9), where L is luck, S is skill, and F is the standardized firm size, measured by the market value of the firm in millions of dollars. The model allows us to analyze the heterogeneous effects of predictors at different quantiles. In addition, the response distribution appears to be heavy-tailed (Figure 4), so median regression may be more desirable than mean regression for analyzing these data. We fit the model in Equation (9) at 33 equally spaced quantile levels in [0.1, 0.9] using the smoothed estimation method. The bandwidth is chosen as h_n = ŝ(τ) log(n)/√n, where ŝ(τ) is the interquantile range of xᵀψ̃(τ) and ψ̃(τ) is the unsmoothed estimator; since firm size is heavily skewed, we use the interquantile range to obtain a robust estimate of the standard deviation. Figure 5 presents the estimates of δ1(τ), δ2(τ), ψ0(τ), and ψ1(τ) across τ ∈ [0.1, 0.9], along with 95% pointwise confidence bands. The confidence intervals for δ1(τ) and δ2(τ) are constructed by inverting the rank score test in Koenker (2005) with the estimated threshold parameters plugged in; those for ψ0(τ) and ψ1(τ) are constructed by the mixed-bootstrap method with bias correction.
Some interesting findings emerge from Figure 5. First, the results suggest that executives in the upper quantiles (τ ≥ 0.65) enjoy enhanced "pay for luck," while those at the lower quantiles suffer from reduced "pay for luck." One possible explanation is that executives at the right tail of the compensation distribution may have more influence over their pay; as a result, they can insulate themselves from bad luck and reap more rewards from good luck. On the other hand, executives at lower quantiles are likely to be more risk-averse and to have lower-incentive contracts, as exemplified by Becker (2006); thus, these executives benefit less when the market is bullish.

Figure 5. The estimates of the regression coefficients of luck and skill, (δ1(τ), δ2(τ)), and of the intercept and slope coefficient of firm size in the thresholding, (ψ0(τ), ψ1(τ)), across τ ∈ [0.1, 0.9]. The shaded areas are the corresponding 95% pointwise confidence bands.
Second, Figure 5 suggests that the threshold parameters tend to be constant within three quantile regions. We therefore apply the proposed KS-type test to test the constancy of ψ(τ) across τ ∈ [0.1, 0.6], [0.65, 0.75], and [0.8, 0.85] separately. Following the suggestion from the simulation study, we calculate the KS statistic based on an equally spaced quantile grid with mesh 0.025 within each region, and let h_n = 2ŝ log(n)/√n, where ŝ = 0.32 is the interquantile range of xᵀψ̃(τ) calculated at τ = 0.5. The resulting p-values are 0.61, 0.55, and 0.41, suggesting that the thresholds do not vary significantly within each of the three quantile regions. On the other hand, testing the constancy of ψ(τ) across τ ∈ [0.1, 0.9] gives a p-value of zero, indicating that the thresholds vary significantly across this wider region.
We then use this piecewise-constancy information and obtain the composite quantile estimators of ψ(τ) within each quantile region, using the same idea as in Equation (5). Bertrand and Mullainathan (2001) argued that larger firms typically have fewer large shareholders because their shares are more expensive; thus, executives in larger firms can have more influence over their pay. This point was also noted in Garvey and Milbourn (2006).

Discussion
In this article, we consider single-index thresholding in quantile regression. This model complements the subgroup-analysis literature by characterizing subgroups through thresholding. Technically, the additional nonconstant variables in the thresholding pose extra challenges for the statistical inference of threshold parameters. We propose a smoothed estimator that enables convenient statistical inference both at a single quantile and for the quantile process. The proposed profile estimation algorithm works well with a modest number of threshold variables; when the number of candidate threshold variables is large, more efficient algorithms are needed for estimation and variable selection. We also develop tests to assess the constancy of the vector of threshold parameters ψ(τ) across τ. The proposed method can be directly extended to test the constancy of a subset of ψ(τ) across τ. If the test fails to reject the null hypothesis, a more efficient composite quantile estimator can be used to estimate the common parameter by pooling information across quantiles. We leave the study of the theoretical properties of the composite quantile estimator and the related inference to future work.
For the inference of threshold parameters, an interesting question is whether a bootstrap method can be constructed directly for the nonsmoothed estimator. The standard bootstrap was proven to be inconsistent for threshold mean regression (Yu 2014). Common remedies for this limitation include the smoothed bootstrap and subsampling. We leave the construction of valid bootstrap methods for the nonsmoothed estimator in the single-index threshold quantile regression framework as a topic for future investigation.
While the proposed methods work for models with multiple threshold variables, the computation can be challenging when the dimension of the threshold variables is high. To accommodate high-dimensional covariates, one possible solution is to adapt the path-following algorithm of Feng, Ning, and Zhao (2019), developed for binary threshold regression. The computational and statistical guarantees of this algorithm in our setting are largely unknown and require more careful investigation.

Supplementary Material
The online supplement contains additional simulation results, and the proofs for the theorems in the article.