Estimation and Testing of Varying Coefficients in Quantile Regression

In this article, we establish a novel connection between the null hypothesis H0 on the coefficients and a rank-reducible form of the varying coefficient model in quantile regression. We use B-splines to approximate the varying coefficients in the rank-reducible model, and make use of the fact that the null hypothesis H0 implies a unidimensional structure of a transformed coefficient matrix for the B-spline basis functions. By evaluating the unidimensional structure, we alleviate the difficulty of testing such hypotheses commonly considered in varying coefficient quantile models. We demonstrate through numerical studies that the proposed method can be much more powerful than the rank score test which is widely used in the quantile regression literature. Supplementary materials for this article are available online.


INTRODUCTION
Since the seminal work of Koenker and Bassett (1978), linear quantile regression has been extensively studied in the statistics and econometrics communities. Quantile regression has also emerged as an essential statistical tool in many scientific fields due to its natural association with a variety of interesting practical problems; see, for example, Fitzenberger, Koenker, and Machado (2002), Huang et al. (2008), and Baur, Dimpfl, and Jung (2012).
Fully nonparametric quantile regression (Chaudhuri 1991) is flexible, yet it is not effective when the covariates are multidimensional. In the presence of multidimensional covariates, varying coefficient quantile regression naturally fills the gap between the interpretability of linear quantile regression and the flexibility of nonparametric quantile regression. The varying coefficient regression model was first introduced by Hastie and Tibshirani (1993) and comprehensively reviewed by Fan and Zhang (2008). Some important works along the line of varying coefficient quantile regression include Kim (2007), Wang, Zhu, and Zhou (2009), Cai and Xu (2009), and Lee, Mammen, and Park (2013). The varying coefficient quantile regression assumes that

Q_τ(Y | x, T) = x^T β_0τ(T),    (1)

where Q_τ(· | x, T) represents the τth conditional quantile of the response Y given x and T, x is the m-dimensional covariate vector, T is a random variable, and β_0τ(t) = {β_{1,0τ}(t), . . . , β_{m,0τ}(t)}^T is an unknown functional vector. To simplify notation, we omit the subscript τ from the function β_0τ(t) subsequently.
To estimate the varying coefficients in quantile regression model (1), Kim (2007) and Wang, Zhu, and Zhou (2009) used regression splines to approximate the varying coefficients, and Cai and Xu (2009) considered the kernel method. Compared with parameter estimation, however, hypothesis testing has attracted less attention in quantile regression with varying coefficients. In the context of least-squares estimation, Fan and Zhang (2000) considered the hypotheses H_0: β_0(t) = β(t) versus H_a: β_0(t) ≠ β(t), where β(t) is a given function, by constructing simultaneous confidence bands for coefficient curves. Fan, Zhang, and Zhang (2001) proposed a generalized likelihood ratio test which is applicable to nonparametric regression under the Gaussian error assumption. In a partially linear varying coefficient model with multidimensional covariates, Zhang and Wu (2012) used the kernel method to test the null hypothesis

H_0: C^T β_0(t) = c_0 for all t,    (2)

where C is a given m × (m − k) matrix with full rank and c_0 is a prespecified (m − k)-vector, for 0 ≤ k < m. The null hypothesis (2) generalizes the hypothesis on contrasts in classical linear regression. With an appropriate choice of the matrix C, we are able to test the shifting effects between two coefficient curves, which is especially interesting when we compare groups at different levels. In a different context, such problems have been considered by Wei and He (2006). The least-squares loss usually facilitates the theoretical development of asymptotic results due to the explicit forms of the estimates. In this article, we consider the null hypothesis (2) for the varying coefficient quantile regression (1), when the vector c_0 is either known or unknown. We make the following three major contributions to the quantile regression literature.
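To make the hypothesis on contrasts concrete, the sketch below (with an illustrative m = 3 and a hypothetical contrast matrix C, not taken from the paper) shows how a choice of C encodes the parallelism of two coefficient curves: the contrast C^T β(t) is constant in t exactly when the curves differ by a constant shift.

```python
import numpy as np

# Hypothetical illustration: with m = 3 coefficient curves, the contrast
# matrix C below tests whether beta_2(t) - beta_1(t) is constant in t,
# i.e., whether the first two coefficient curves are parallel.
C = np.array([[-1.0], [1.0], [0.0]])   # m x (m - k) with k = 2

t = np.linspace(0.0, 1.0, 5)
beta = np.vstack([
    np.sin(t),          # beta_1(t)
    np.sin(t) + 2.0,    # beta_2(t): a parallel shift of beta_1
    np.cos(t),          # beta_3(t): unrestricted
])                      # shape (3, len(t))

contrast = C.T @ beta   # shape (1, len(t))
# Under parallelism the contrast equals the shift c_0 = 2 at every t.
print(np.allclose(contrast, 2.0))  # True
```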
1. We introduce a rank-reducible varying coefficient quantile regression, which includes the classical linear and full-rank varying coefficient quantile regressions as special cases. We demonstrate through a case study that the rank-reducible structure can capture local structures of the data, while the full-rank model fails to produce a well-defined estimate due to data sparsity.
2. We use B-splines to approximate the varying coefficients, and derive the Bahadur representation of the resultant estimators for both the parametric and the nonparametric components of the rank-reducible varying coefficient model. The Bahadur representation facilitates subsequent statistical inference.
3. We establish a novel connection between estimation of a rank-reducible structure and the null hypothesis on contrasts. We demonstrate through simulations that the proposed test is much more powerful than the usual rank score test under the varying coefficient quantile regression model.
The rest of this article is organized as follows. Section 2 motivates the rank-reducible varying coefficient quantile regression. We develop a procedure to estimate the parameters in the rank-reducible quantile regression, and an efficient algorithm is also provided. We give the large-sample theory for the resultant estimates under different scenarios. In Section 3 we reveal that the null hypothesis H_0 implies a unidimensional structure of a transformed coefficient matrix of the B-spline basis functions. We then propose a test statistic to evaluate the unidimensional structure. We further demonstrate the usefulness of our proposal through simulations in Section 4 and an application in Section 5. This article concludes with a brief discussion in Section 6. All technical proofs are relegated to an online supplementary file.

ESTIMATION UNDER THE NULL HYPOTHESIS
In the classical testing methods, we may fit model (1) and conduct the hypothesis test by using the distributions of the (standardized) estimates under the null hypothesis; or we can fit both the full model and the rank-reducible model under the null hypothesis H_0: C^T β_0(t) = c_0, and then compare the two models. In both cases, we need to estimate the full model. By contrast, under certain hypotheses as considered by Kim (2007) and Wang, Zhu, and Zhou (2009), rank score tests can be implemented more easily after proper reparameterization. We usually prefer a model with less complexity to improve estimation efficiency, especially when the sample size is relatively small and hypothesis testing is of primary interest. In this article we consider an alternative strategy: by establishing a connection between a rank-reducible structure and the null hypothesis, we also study the asymptotics of the resultant estimators.
To be precise, we let β_1(t) and β_2(t) denote the vectors composed of the first k and the last m − k elements of β_0(t), respectively. In addition, we let E_0 = (1_k, I_k), where 1_k is the k-vector composed of ones and I_k is the k × k identity matrix. We reparameterize β_1(t) as

β_1(t) = E_0 {1, γ_0(t)^T}^T,    (3)

where γ_0(t) = {γ_01(t), . . . , γ_0k(t)}^T. Clearly, γ_0j(t) = β_0j(t) − 1, for j = 1, . . . , k. We further let C_1^T and C_2^T be the respective matrices composed of the first k and the last m − k columns of the matrix C^T, namely, C^T = (C_1^T, C_2^T). Without loss of generality, we suppose that C_2 is invertible. Let

A_0 = (C_2^T)^{−1} (c_0 − C_1^T 1_k, −C_1^T).

Then under the null hypothesis (2), we obtain β_2(t) = A_0 {1, γ_0(t)^T}^T. As a consequence, the null hypothesis (2) motivates us to consider a rank-reducible structure of coefficient curves as follows,

β_1(t) = E_0 {1, γ_0(t)^T}^T and β_2(t) = A_0 {1, γ_0(t)^T}^T,    (4)

where the matrix E_0 = (1_k, I_k) and γ_0(t) = {γ_01(t), . . . , γ_0k(t)}^T is a vector of unobserved functions. In (4), A_0 is an unknown (m − k) × (k + 1) matrix if either C or c_0 in (2) is unknown; A_0 is fully specified only if both C and c_0 in (2) are known. Note that model (1) with (4) indeed includes the classical linear quantile regression and the usual full-rank varying coefficient quantile regression as special cases.
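A small numerical check of this reparameterization may help. The displayed formula for A_0 did not survive typesetting here, so the expression used below is an assumption consistent with the surrounding text; the check verifies that, with this A_0, the constructed β(t) satisfies C^T β(t) = c_0 for any γ(t).

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2                      # m covariates; C has m - k = 3 columns

# Hypothetical C and c0; C2 (built from the last m - k rows of C) must be invertible.
C = rng.standard_normal((m, m - k))
c0 = rng.standard_normal(m - k)
C1t, C2t = C[:k].T, C[k:].T      # C^T = (C1^T, C2^T)

# Reconstructed A0 (an assumption, not the paper's display): beta_2(t) =
# A0 {1, gamma(t)^T}^T solves C1^T beta_1(t) + C2^T beta_2(t) = c0.
A0 = np.linalg.solve(C2t, np.column_stack([c0 - C1t @ np.ones(k), -C1t]))

for _ in range(3):               # check at a few random "time points"
    gamma = rng.standard_normal(k)
    beta1 = np.ones(k) + gamma                 # beta_1(t) = 1_k + gamma(t)
    beta2 = A0 @ np.concatenate([[1.0], gamma])
    beta = np.concatenate([beta1, beta2])
    assert np.allclose(C.T @ beta, c0)         # null constraint holds
print("constraint satisfied")
```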
Thus far, we have established a connection between the rank-reducible structure (4) and the null hypothesis (2). In this scenario, the parameters in model (1) can be estimated by minimizing the quantile objective function

Σ_{i=1}^n ρ_τ{Y_i − x_i^T β(T_i)},    (5)

subject to the structure (4), where ρ_τ(u) = u{τ − I(u < 0)}. In the objective function, the curve vector γ(t) is infinite dimensional, so we use B-spline approximations for estimation.
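For intuition, the check loss ρ_τ and its defining property, that minimizing it over a constant recovers the τth sample quantile, can be verified numerically. The crude grid search below is purely illustrative.

```python
import numpy as np

def rho(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

# Minimizing the summed check loss over a constant recovers the tau-quantile.
rng = np.random.default_rng(1)
y = rng.standard_normal(2000)
tau = 0.75
grid = np.linspace(-2.0, 2.0, 401)
losses = np.array([rho(y - q, tau).sum() for q in grid])
best = grid[int(np.argmin(losses))]
print(abs(best - np.quantile(y, tau)) < 0.05)   # True
```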

Approximation and Computation
Let π_{κ_n}(t) = {π_1(t), . . . , π_{κ_n+p}(t)}^T be the normalized B-spline basis functions of order p with κ_n quasi-uniform knots (Pena 1997). We use a spline B^T π_{κ_n}(t) to approximate the function vector γ(t) in (5), where B^T is a k × (κ_n + p) matrix. Since the sum of the normalized B-spline basis functions at any t within a specified interval is always equal to 1 (Pena 1997), we have 1_{κ_n+p}^T π_{κ_n}(t) = 1. Ignoring the approximation error of the B-splines, the objective function (5) can thus be written as

L_n(A, B) = Σ_{i=1}^n ρ_τ{Y_i − x_i^T β(T_i)},    (6)

where β(t) is given by (4) with γ_0(t) replaced by B^T π_{κ_n}(t). If a full model is considered, the problem can be solved efficiently because the structure (4) turns out to be

β(t) = A B^T π_{κ_n}(t).    (7)

In this case, we denote F = AB^T and minimize (6) with respect to the vector vec(F^T), because the following equation holds,

x_i^T F π_{κ_n}(T_i) = {x_i ⊗ π_{κ_n}(T_i)}^T vec(F^T),

where vec(·) represents the vec operator and ⊗ is the Kronecker product. Furthermore, with a given set of knots for the B-splines, the above minimization problem is easy to solve by the routine linear programming algorithm that is well studied in the quantile regression literature; we refer to Koenker (2005) for details. For a rank-reducible model, we numerically minimize (6) by an alternating regression procedure when the matrix A_0 in (4) is unknown. The function rq in the R package quantreg can be used in the computational implementation, and the method of Wang, Zhu, and Zhou (2009) is utilized for the choice of knots. Choose an m × (k + 1) matrix A^(0) as the initial point. The computational procedure can be described in the following steps.
(S1) Suppose we have obtained the matrix A^(l) for some l ≥ 0. Given A = A^(l), the objective (6) is linear in the entries of B, so we use the regular quantile regression procedure to minimize (6) with respect to the matrix B. Denote the estimated matrix by B^(l).
(S2) Similarly, given B = B^(l), the objective (6) is linear in the entries of A, and we repeat the optimization procedure as in Step (S1), but with respect to A, to obtain the estimated matrix A^(l+1).
(S3) Repeat Steps (S1)-(S2) until a certain stopping criterion is satisfied. Denote the estimated matrices by A and B, respectively.
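The alternating scheme above can be sketched as follows. This is a minimal illustration for a simplified rank-one specification x_i^T a (b^T π_i), not the paper's full structure (4); each conditional step is an ordinary linear quantile regression, solved here exactly as a linear program (a stand-in for the rq routine the paper uses in R).

```python
import numpy as np
from scipy.optimize import linprog

def qreg(Z, y, tau):
    """Linear quantile regression via its standard LP formulation."""
    n, q = Z.shape
    # Variables: theta (free), u, v = positive/negative residual parts.
    c = np.r_[np.zeros(q), tau * np.ones(n), (1 - tau) * np.ones(n)]
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])   # Z theta + u - v = y
    bounds = [(None, None)] * q + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:q]

def check_loss(r, tau):
    return np.sum(r * (tau - (r < 0)))

# Hypothetical rank-one data: y_i = (x_i^T a0)(b0^T pi_i) + noise.
rng = np.random.default_rng(3)
n, m, nb, tau = 200, 3, 4, 0.5
X = rng.standard_normal((n, m))
Pi = np.abs(rng.standard_normal((n, nb)))          # stand-in basis values
a0 = np.array([1.0, -1.0, 0.5])
b0 = np.array([0.5, 1.0, -0.5, 0.25])
y = (X @ a0) * (Pi @ b0) + 0.1 * rng.standard_normal(n)

a, b = np.ones(m), np.ones(nb)
losses = []
for _ in range(5):
    a = qreg(X * (Pi @ b)[:, None], y, tau)        # analogue of one block update
    b = qreg(Pi * (X @ a)[:, None], y, tau)        # analogue of the other block
    losses.append(check_loss(y - (X @ a) * (Pi @ b), tau))
print(all(l2 <= l1 + 1e-6 for l1, l2 in zip(losses, losses[1:])))  # True
```

Because each block update minimizes the objective exactly with the other block fixed, the objective value is nonincreasing across cycles, which is what the final check confirms.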
Remark 1. The objective function L_n(A, B) defined in (6) is convex if the full-rank structure (4) with k = m is used. It is also strictly convex as long as either A or B is known a priori. However, if a rank-reducible structure is considered and both A and B have to be estimated from the data, then L_n(A, B) is not convex. Even if we write F = AB^T and define the objective function as L*_n(F), the function L*_n is convex only when we allow F to have full rank, that is, k = m. For a rank-reducible structure, we may try different initial values and choose the solution that minimizes the objective function L_n(A, B). Our limited experience shows that when the sample size is large, the solutions from the algorithm are all close to the global minimizer with very high probability.
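Two facts used in this section, the partition of unity of a clamped normalized B-spline basis and the vec/Kronecker identity, can be checked numerically. The knot placement below is illustrative and not the quasi-uniform scheme of the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

# Partition of unity: normalized B-spline basis functions sum to one.
p = 2                                    # quadratic splines (degree 2)
interior = np.array([0.25, 0.5, 0.75])   # illustrative interior knots
knots = np.r_[np.zeros(p + 1), interior, np.ones(p + 1)]
nbasis = len(knots) - p - 1              # number of basis functions
basis = BSpline(knots, np.eye(nbasis), p)

x = np.linspace(0.0, 0.999, 50)
vals = basis(x)                          # shape (50, nbasis)
print(np.allclose(vals.sum(axis=1), 1.0))   # True

# Kronecker/vec identity: x^T F pi(t) = {x kron pi(t)}^T vec(F^T).
rng = np.random.default_rng(2)
xv = rng.standard_normal(4)              # covariate vector, m = 4
F = rng.standard_normal((4, nbasis))     # coefficient matrix F = A B^T
pi_t = basis(0.3)
lhs = xv @ F @ pi_t
rhs = np.kron(xv, pi_t) @ F.flatten()    # row-major flatten of F = vec(F^T)
print(np.isclose(lhs, rhs))              # True
```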

Large Sample Properties
Suppose the support of the random variable T is [0, 1]. Let e_i = Y_i − x_i^T β_0(T_i) and z_i = π_{κ_n}(T_i) ⊗ x_i, and denote by f_i(·) the conditional probability density of e_i given x_i and T_i, for i = 1, . . . , n. We make the following assumptions.
where r is some integer greater than 2, κ_n = O(n^δ) for (2r)^{−1} < δ < 3^{−1}, and the matrices A_0 and B_0 are of the special forms given in (8), where E_0 is defined in (4).
Assumptions (A1) and (A2) are commonly used in the quantile regression literature and facilitate the development of asymptotic results. Assumption (A4) ensures a sufficiently accurate linear approximation under the rank-reducible structure (4). We further assume κ_n = O(n^δ) for (2r)^{−1} < δ < 3^{−1} to limit the asymptotic bias between the true curves and their linear approximations. Assumptions (A5) and (A6) are used to obtain the Bahadur representation of the estimators; these conditions are also considered by He and Shao (2000).
For convenience, we omit κ_n from B_{0κ_n}, but we shall bear in mind that the row dimension of this matrix depends on the number of knots κ_n. Denote the matrices composed of the first column and the last k columns of the matrix A_0 by A_0^(1) and A_0^(2), respectively. Let θ_0 be defined as in (8), and let θ be its estimate. We obtain the Bahadur representation of the estimator in the following theorem; this representation will be used for testing the null hypothesis (2) in the following section.

HYPOTHESIS TESTING
In this section, we establish the connection between the null hypothesis (2) and special unidimensional structures of the coefficient matrix A_0 B_0^T under different circumstances. We then propose a statistic for testing this unidimensional structure accordingly.

Hypothesis Testing With an Unknown Vector c 0
In this section, we consider the null hypothesis (2) versus the alternative hypothesis H_a: at least one of the components of the vector C^T β_0(t) is time-varying.
Our test is based on the fact that the sum of the normalized B-spline basis functions at any given point is exactly equal to one. This property helps establish the connection between the null hypothesis (2) and a unidimensional structure of the coefficient matrix A_0 B_0^T. Consider the (κ_n + p) × (κ_n + p) transformation matrix G whose first row is 1_{κ_n+p}^T and whose remaining rows coincide with those of the identity matrix. By the partition-of-unity property,

G π_{κ_n}(t) = {1, π_2(t), . . . , π_{κ_n+p}(t)}^T.
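Concretely, G replaces the first basis function by the sum of all basis functions, which equals one everywhere by the partition of unity. A quick numerical check of this construction (the basis values below are a normalized stand-in, not actual spline evaluations):

```python
import numpy as np

nb = 5                          # kappa_n + p basis functions (illustrative)
G = np.eye(nb)
G[0, :] = 1.0                   # first row of ones: (G pi)(t)_1 = sum_j pi_j(t) = 1

# Stand-in for normalized basis values at one point t (nonnegative, sums to 1).
rng = np.random.default_rng(4)
pi_t = rng.random(nb)
pi_t /= pi_t.sum()

print(np.isclose((G @ pi_t)[0], 1.0))           # True: first entry becomes 1
print(np.allclose((G @ pi_t)[1:], pi_t[1:]))    # True: remaining entries unchanged

# G is unit upper triangular, hence invertible.
Ginv = np.linalg.inv(G)
print(np.allclose(G @ Ginv, np.eye(nb)))        # True
```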
Let P_0 = C^T A_0 B_0^T, where the matrices A_0 and B_0 are defined in (8), so that C^T β_0(t) = P_0 π_{κ_n}(t). Note that P_0 π_{κ_n}(t) = (P_0 G^{−1}){G π_{κ_n}(t)}. Thus, the constancy of C^T β_0(t) in (2) is equivalent to P_0 G^− = 0, where the matrix G^− is composed of the last (κ_n + p − 1) columns of the matrix G^{−1}.
Since the vector space R^{κ_n+p} is spanned by the columns of the matrix G^{−1}, the equation P_0 G^− = 0 implies that each row of the matrix P_0 is proportional to a common vector d_0^T. Thus, under the null hypothesis, the matrix P_0 has a unidimensional structure with right singular vector d_0. Furthermore, we have P_0 = c_0 d_0^T, where the vector c_0 is given in (2). Some methods have been proposed in the literature for testing the ranks of matrices with fixed dimensions; see, for example, Gill (1992), Cragg and Donald (1996), and Kleibergen and Paap (2006), among others. Feng and He (2009) considered a unidimensional structure test in which the observed matrix contains independent vectors. However, their method is not applicable in this case because the asymptotic covariance structure of the estimated matrix P in (16) is unclear. Under the null hypothesis (2), the restricted forms (8) are needed for the matrices A_0 and B_0. Let z_1i and z_2i be the vectors of the last (m − k) × (k + 1) and the first k × (κ_n + p) elements of the vector z_i, respectively, where the matrix D_0n is given in (10) and the matrix S_κ_n is defined in (11). We use ivec(·, d) to represent the inverse operation of vec(·), which constructs a matrix column by column, with row dimension equal to d, based on a given vector. The test statistic is T_n^(1), given in (18). We obtain the asymptotic distribution of the test statistic T_n^(1) as follows.
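The unidimensional structure is conveniently seen through singular values: under the null, P_0 = c_0 d_0^T has exactly one nonzero singular value, while departures from the null inflate the second singular value. The sketch below illustrates the idea only; it is not the paper's test statistic, which also accounts for the estimation error in P.

```python
import numpy as np

rng = np.random.default_rng(5)
mk, nb = 3, 6                      # (m - k) rows, kappa_n + p columns
c0 = rng.standard_normal(mk)
d0 = rng.standard_normal(nb)

P_null = np.outer(c0, d0)          # unidimensional: P0 = c0 d0^T
P_alt = P_null + 0.5 * rng.standard_normal((mk, nb))

s_null = np.linalg.svd(P_null, compute_uv=False)
s_alt = np.linalg.svd(P_alt, compute_uv=False)
print(s_null[1] < 1e-10)           # True: singular values after the first vanish
print(s_alt[1] > 1e-3)             # True: the perturbation breaks the structure
```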
In practice, the vector x_i^(1) is not fully observed due to the unknown parameter matrices A_0 and B_0 and the probability densities f_i. To estimate the vector x_i^(1), we replace the true parameters by the estimates A and B. Moreover, we estimate the value of the probability density f_i at zero with a modified version of the method of Hendricks and Koenker (1992),

f_i(0) = 2h / {x_i^T β_{τ+h}(T_i) − x_i^T β_{τ−h}(T_i) + ε},    (20)

where β_{τ*} is the estimated parameter at quantile level τ*, h is the bandwidth, and ε is a small positive constant that avoids a zero denominator. Throughout our numerical studies, we set ε to be the spacing of floating point numbers in the R software, which is approximately 3.67 × 10^{−11}. The estimator in (20) accounts for the issue that in finite samples the estimated quantiles may cross, so that the upper quantiles may be estimated to be smaller than the lower quantiles. A similar idea was adopted by Wang, Zhu, and Zhou (2009) to estimate density functions. We refer to Hendricks and Koenker (1992) for details on this estimation method. Replacing A_0, B_0, and f_i(0) with their estimates in x_i^(1) will not affect the limiting distribution of the statistic T_n^(1), as long as the estimates of A_0, B_0, and f_i(0) are all consistent.
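The idea behind (20) can be illustrated on a univariate sample: the density at the τth quantile is approximately 2h divided by the spacing of the (τ ± h) quantiles. The helper below is a simplified stand-in for the Hendricks and Koenker (1992) estimator, not the exact regression-based formula used in the paper.

```python
import numpy as np
from scipy.stats import norm

def f_hat(sample, tau, h, eps=1e-10):
    """Difference-quotient density estimate at the tau-th quantile, in the
    spirit of Hendricks and Koenker (1992); eps guards against quantile
    crossing producing a zero (or negative) denominator."""
    spacing = np.quantile(sample, tau + h) - np.quantile(sample, tau - h)
    return 2 * h / (spacing + eps)

rng = np.random.default_rng(6)
sample = rng.standard_normal(100_000)
tau, h = 0.5, 0.05
est = f_hat(sample, tau, h)
true = norm.pdf(norm.ppf(tau))     # about 0.3989 at the median of N(0, 1)
print(abs(est - true) < 0.01)      # True for this sample size and bandwidth
```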

Remark 2. Implementing the test statistic T_n^(1) involves estimating f_i(0), which in turn involves the tuning parameter h in (20). How to select an appropriate bandwidth h is an important issue. Our limited experience shows that, when the sample size is moderate or large, the performance of the proposed test relies little on the choice of the bandwidth h. However, when the sample size is small, estimating the density value f_i(0) accurately is typically difficult. In the small-sample scenario, a useful strategy to bypass this difficulty is a resampling approach, which will be detailed in Section 4.

Hypothesis Testing With a Known Vector c 0
Sometimes we compare groups in a data analysis and a specific vector c_0 is given, say c_0 = 0. As pointed out in the previous section, we have P_0 = c_0 d_0^T under the null hypothesis. We are interested in the alternative hypothesis H_a: C^T β_0(t) ≠ c_0 for some t.
We estimate the matrices A_0 and B_0 as described in Steps (S1)-(S3), and use the residual matrix P − c_0 d_0^T to construct the statistic T_n^(2) given in (21), where the matrix P is defined in (16) and the matrices Z_1i and Z_2i are given in (19). We then have the following asymptotic result.
Theorem 3. Suppose the conditions of Theorem 1 hold. Under the null hypothesis (2), the statistic T_n^(2) converges in distribution. The limiting distribution of T_n^(2) is slightly different from that of T_n^(1) in Theorem 2 because c_0 is a known vector here.

SIMULATION STUDIES
In this section, we conduct simulations to compare the finite sample performance of the tests proposed in Section 3 and the rank score test of Wang, Zhu, and Zhou (2009) under the assumption of heterogeneous model errors. Two simulation studies are reported based on the full model structure (7) and the rank-reducible structure (4), respectively.

The First Study
In this study, we consider three quantile levels, τ = 0.25, 0.5, and 0.75, with sample sizes n = 200, 400, and 1000. We independently generate 5000 Monte Carlo samples. Each sample is simulated from a heteroscedastic model in which the covariate T_i is generated from the uniform distribution U(0, 1); the continuous variables X_i1 and X_i2 are generated independently from the exponential distribution exp(1); the discrete variables X_i3 and X_i4 are composed of ones and zeros with a relative ratio of 1/4; and the random error ε_i follows the standard normal distribution. We set μ(t) = 10 + 10 sin(tπ/5). Under the alternative model, we generate datasets by setting μ(t) = 10 + 10 sin(tπ/5), α(t) = 1 + (10 − t)^2/10, and β(t) = −2 + 2 cos{(2t + 5)π/5}. In this study, we use the full model structure (7) for estimation. Moreover, quadratic splines are used in the B-spline approximation, with the knots selected by Wang, Zhu, and Zhou's (2009) approach. Recall that implementing our proposed tests requires estimating f_i(0), the density function of the model error at zero. We gave an estimate of f_i(0) in (20), which involves a tuning parameter h. To evaluate how the performance of our tests varies with the bandwidth h, we consider two bandwidths, h_1 = 1.5 h_opt and h_2 = 2.0 h_opt, at quantiles τ = 0.25, 0.50, and 0.75 for the three sample sizes n = 200, 400, and 1000, where h_opt is the optimal bandwidth given by the function bandwidth.rq in the R package quantreg with the option hs = TRUE. When the sample size is small, estimating f_i(0) accurately is very difficult. To bypass this difficulty, we resort to the wild bootstrap method of Feng, He, and Hu (2011), and propose to estimate the variance of the test statistic given in (16) with 100 bootstrapped samples. We consider two sets of hypotheses.
The first set is H_0: γ(t) is parallel to η(t), versus H_a: the two curves are not parallel to each other; the second is H*_0: γ(t) and η(t) are identical, versus H*_a: the two curves are not identical. We use the statistic T_n^(1) given in (18) to test the first set of hypotheses and T_n^(2) given in (21) to test the second set with c_0 = 0. By transforming the design matrix to (X_1, X_2, X_3 + X_4, X_4), the parallelism of the curves γ(t) and η(t) is equivalent to the constancy of the curve η(t) − γ(t). This implies that the rank score test of Wang, Zhu, and Zhou (2009) can also be used.

Table 1. Estimated Type I errors and powers at the nominal level of 5% for the first simulation study, where the null hypothesis H_0 is considered; "h_1," "h_2," "Boot," and "QRS" refer to our test proposed in Section 3.1 using two different estimates of f_i(0), the test based on the wild bootstrap, and the rank score test, respectively.
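For readers who wish to replicate a design of this flavor, the sketch below generates covariates as described above. The displayed model equation did not survive typesetting here, so the coefficient curves attached to X_3 and X_4, the way heteroscedasticity enters, and the 1/4 ratio interpretation are hypothetical placeholders, clearly not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Covariates as described in the text.
T = rng.uniform(0.0, 1.0, n)                 # T_i ~ U(0, 1)
X1 = rng.exponential(1.0, n)                 # X_i1 ~ exp(1)
X2 = rng.exponential(1.0, n)                 # X_i2 ~ exp(1)
X3 = (rng.random(n) < 0.25).astype(float)    # binary; the 1/4 ratio is our reading
X4 = (rng.random(n) < 0.25).astype(float)
eps = rng.standard_normal(n)                 # standard normal errors

# mu is from the text; gamma, eta, and the error scale are assumptions.
mu = 10 + 10 * np.sin(T * np.pi / 5)
gamma = np.cos(T)                            # hypothetical gamma(t)
eta = np.cos(T) + 1.0                        # hypothetical parallel eta(t)
y = mu + gamma * X3 + eta * X4 + (1 + 0.5 * X1) * eps   # assumed heteroscedasticity
```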
In what follows, we first compare the test proposed in Section 3.1 with the rank score test of Wang, Zhu, and Zhou (2009). The simulation results are summarized in Table 1 when H_0 is considered and in Table 2 when H*_0 is considered. It can be seen from these two tables that the Type I errors are well preserved for both tests at all three quantile levels. The standard errors of all the estimated Type I errors are smaller than 0.44%. The powers of the two tests are similar at the quantile level τ = 0.25. However, the proposed test achieves higher power at the higher quantile levels.
Since the proposed test tends to be aggressive while the rank score test seems conservative, which leads to different sizes in this study, we further plot ROC curves to assess the performance of the two tests. The results for n = 400 in Figure 1 deliver messages similar to those observed in Table 1: at the higher quantile levels, our proposed test has higher power than the rank score test.
We can further observe from Tables 1 and 2 that, as the sample size n increases, the performance of our tests depends less on the choice of the bandwidth. For example, when n = 200 and τ = 0.50, the Type I error of our test is 0.076 for h_1 and 0.048 for h_2; when n increases to 400, the Type I errors obtained from the two bandwidths get closer; and when n = 1000, the Type I errors for both h_1 and h_2 are very close to the nominal level of 0.05. When the sample size is small, say n = 200, the performance of our proposed method based on density estimation is poor, while the version based on the bootstrap estimate remains promising.
The Second Study

Hypotheses of this kind are often of practical interest when we investigate the relationship between the responses and different levels of a categorical variable; such an application will be reported in the following section. Under the null hypothesis, all four curves α(t), β(t), γ(t), and η(t) can be expressed as linear combinations of a certain function and a constant. Hence, a rank-reducible structure (4) with k = 2 is used for estimation, and quadratic splines are used for the approximation. We simulate 500 Monte Carlo samples to compare the performance of the different tests. The results in Table 3 imply that the proposed tests may be aggressive, and the bootstrap method leads to more satisfactory results; the rank score test is, however, too conservative. The ROC curves for n = 400 are summarized in Figure 2, which indicates that the proposed method improves dramatically on the rank score test. The proposed test evaluates the deviation of the whole coefficient matrix from a target unidimensional matrix, while the rank score test considers the residuals based on the product of the coefficient matrix and the B-spline basis, which may explain why our proposed test is more sensitive to nonconstancy than the rank score test.

Figure 1. The ROC curves for the first simulation study with n = 400. The ROC curves are generated as follows: given a nominal level α ∈ (0, 1), the horizontal axis stands for 1-specificity (Type I error) and the vertical axis stands for sensitivity (1 - Type II error). The solid line represents our proposed test and the dashed line represents the rank score test.
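The ROC construction described in the figure captions can be sketched as follows, using simulated p-values in place of the actual tests: for each nominal level α, the empirical rejection rate under the null gives 1-specificity and the rate under the alternative gives sensitivity.

```python
import numpy as np

def roc_points(pvals_null, pvals_alt, alphas):
    """Empirical ROC: rejection rate under H0 (1-specificity) and
    under Ha (sensitivity) at each nominal level alpha."""
    fpr = np.array([(pvals_null < a).mean() for a in alphas])
    tpr = np.array([(pvals_alt < a).mean() for a in alphas])
    return fpr, tpr

rng = np.random.default_rng(8)
p_null = rng.uniform(size=5000)          # valid p-values are uniform under H0
p_alt = rng.beta(0.2, 1.0, size=5000)    # stochastically smaller under Ha
alphas = np.linspace(0.0, 1.0, 101)
fpr, tpr = roc_points(p_null, p_alt, alphas)

# Area under the empirical ROC curve (trapezoid rule, computed by hand).
auc = float(np.sum((tpr[1:] + tpr[:-1]) / 2 * np.diff(fpr)))
print(auc > 0.7)   # True: the test separates the alternative from the null
```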

A CASE STUDY
In this section, we analyze a dataset collected from car dealers. The original dataset can be obtained from the website http://www.amstat.org/publications/jse/jse_data_archive.htm.
In this dataset, 428 new cars and trucks sold in 2004 are recorded with 19 variables, including the type of vehicle, suggested retail and invoice prices, engine size, and some other variables on size and fuel efficiency. The details of this dataset are discussed on pages 104-123 of volume 57 of Kiplinger's Personal Finance. We remove the data on pickup trucks so as to focus the analysis on cars. Consequently, 404 records on cars are used for our analysis.
We are interested in the relationship between the type of car and the suggested retail price, and the relationship might be complicated (Figure 3). To better explore such a relationship, we introduce the model

Q_τ(Y_i | X_i, T_i) = μ(T_i) + α(T_i)X_i1 + β(T_i)X_i2 + γ(T_i)X_i3 + η(T_i)X_i4,

where the responses Y_i refer to the suggested retail prices, and the dummy variables X_i1, X_i2, X_i3, and X_i4 indicate the type of each car, corresponding to sports car, sport utility vehicle (SUV), wagon, and minivan, respectively. The coefficients vary with the engine size T. It is of interest to check whether there exists a consistent relationship between engine size and retail price across sports cars, SUVs, wagons, and minivans. This amounts to testing the hypotheses H_0: α(T), β(T), γ(T), η(T) are parallel, versus H_a: these curves are not parallel.

Table 3. Estimated Type I errors and powers at the nominal level of 10% for the second simulation study; "Proposed" refers to the proposed test of Section 3.2 using the bandwidth 2h_opt, where h_opt is the optimal bandwidth given by the function bandwidth.rq in the R package quantreg with the option hs = TRUE; "Boot" and "QRS" are the wild bootstrap method and the rank score test, respectively.
Two quantile levels, τ = 0.5 and 0.75, are considered, and the R package quantreg is used for computation. In this study, we cannot estimate the coefficient curves under the full model structure (7), since the R package reports an error message of a singular design matrix based on the spline bases. Thus, we use the rank-reducible structure (4) with k = 2 and the proposed method of Section 3.1 for hypothesis testing. At each quantile level, 20 random starting points are used for the iterative estimation procedure (S1)-(S3) in Section 2.2. The obtained p-values are less than 0.001 at both quantile levels. For the cars with median price, the suggested retail price is affected differently by the engine size across the different types of cars. For the more expensive cars, further analysis indicates that the effect of engine size on price seems distinctive for wagons.

Figure 2. The ROC curves of the second simulation study with n = 400. The horizontal axis stands for 1-specificity (Type I error) and the vertical axis stands for sensitivity (1 - Type II error). The solid line represents our proposed test, the dashed line represents the rank score test, and the dotted line represents the bootstrap adjusted test.

DISCUSSION
In this article, we introduced a rank-reducible structure in the varying coefficient quantile regression and explored its connection with certain hypotheses on the quantile coefficient functions. The rank-reducible structure helps capture local structures of the data and avoids the computational difficulties that arise in estimating a full model whose pseudo-design matrix is sometimes ill-conditioned.
We made use of the characteristics of the normalized B-spline basis functions to propose an appropriate test. Our proposed test is more powerful than the rank score tests in quantile regression. We observe that a direct use of the asymptotic distributions of the test statistics may lead to inflated Type I errors, but a bootstrap approach is found to be effective in our numerical studies.
Quantile regression is also closely related to the mean function, by noting that E(Y | x, T) = ∫_0^1 Q_τ(Y | x, T) dτ. Consequently, if there exists a common rank-reducible structure in the quantile regressions at all quantile levels, there must exist a rank-reducible structure in the mean function as well. We remark that our idea of constructing test statistics in the quantile regression context can be readily adapted to the mean regression context, though no details are pursued in the present article.
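The identity E(Y | x, T) = ∫_0^1 Q_τ(Y | x, T) dτ can be checked numerically for a simple distribution; here Y ~ Exp(1), whose mean is 1, with the τ → 1 tail trimmed to keep the quantiles finite.

```python
import numpy as np
from scipy.stats import expon

# Numerically check E(Y) = integral_0^1 Q_tau(Y) d tau for Y ~ Exp(1).
taus = np.linspace(0.0005, 0.9995, 2000)   # trim the tau -> 1 tail
approx_mean = expon.ppf(taus).mean()       # Riemann average of the quantiles
print(abs(approx_mean - 1.0) < 0.01)       # True
```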