Smoothed and Corrected Score Approach to Censored Quantile Regression With Measurement Errors

Censored quantile regression is an important alternative to the Cox proportional hazards model in survival analysis. In contrast to the usual central covariate effects, quantile regression can effectively characterize the covariate effects at different quantiles of the survival time. When covariates are measured with errors, it is known that naively treating mismeasured covariates as error-free would result in estimation bias. Under censored quantile regression, we propose smoothed and corrected estimating equations to obtain consistent estimators. We establish consistency and asymptotic normality for the proposed estimators of quantile regression coefficients. Compared with the naive estimator, the proposed method can eliminate the estimation bias under various measurement error distributions and model error distributions. We conduct simulation studies to examine the finite-sample properties of the new method and apply it to a lung cancer study. Supplementary materials for this article are available online.


INTRODUCTION
Mean-based regression models have been extensively studied for randomly censored survival data. For example, the Cox (1972) proportional hazards model characterizes the hazard as a function of covariates, and the accelerated failure time (AFT) model directly formulates a linear regression between the logarithm of the failure time and the covariates. However, neither the Cox model nor the AFT model can differentiate the covariate effects at higher or lower quantiles of the survival time, as they only capture central covariate effects. In particular, the AFT model concerns only the mean regression, for which the estimation procedure is typically based on least squares or rank methods (Prentice 1978; Buckley and James 1979; Ritov 1990; Tsiatis 1990; Wei, Ying, and Lin 1990; Lai and Ying 1991; and Jin et al. 2003). On the other hand, quantile regression provides a robust alternative to mean-based regression models. Under this framework, we can model the median or any other quantile of the outcome or survival time (Koenker and Bassett 1978; and Koenker 2005). Regression parameters are often estimated by minimizing a check function, and the corresponding variance estimates are typically obtained by resampling methods, such as the bootstrap. When censoring times are assumed to be fixed and known, quantile regression has been extensively studied, particularly in the field of econometrics; for example, see Powell (1984), Fitzenberger (1997), Buchinsky and Hahn (1998), and Khan and Powell (2001). In survival analysis with random censoring, censored quantile regression (CQR) has been proposed and is gaining much popularity (Ying, Jung, and Wei 1995; Lindgren 1997; Yang 1999; Koenker and Geling 2001; Bang and Tsiatis 2002; Chernozhukov and Hong 2002; Portnoy 2003; Peng and Huang 2008; and Wang and Wang 2009).
In practice, covariates are often subject to measurement errors. The most common measurement error structure is W = Z + U, where W is the observed surrogate, Z is the true but unobserved covariate, and U is the random measurement error. For a comprehensive coverage of various measurement error models and inference procedures under mean-based regression, see Carroll et al. (2006). In the context of quantile regression with measurement errors, Brown (1982) examined median regression and described the difficulty involved in parameter estimation. He and Liang (2000) proposed root-n consistent estimators for linear and partially linear quantile regression models; their method assumes that the random error in the response and the measurement errors in the covariates follow a spherically symmetric distribution. Wei and Carroll (2009) proposed a novel approach to quantile regression with measurement errors by using the derivative property of the quantile function when the same quantile regression structure is assumed at all quantile levels. Recently, Wang, Stefanski, and Zhu (2012) developed a corrected-loss function for the smoothed check function, a substantial advance in this area. However, there is limited research on quantile regression with covariate measurement errors under censoring. Ma and Yin (2011) studied covariate measurement errors in CQR models based on an inverse probability weighting scheme, but their method also requires a spherically symmetric distribution. In this article, we study the issue of covariate measurement errors in quantile regression with randomly censored data. We propose a smoothed and corrected martingale-based estimating equation, consider grid-based estimates of the quantile regression coefficients, and establish the asymptotic properties of the proposed estimator by employing empirical process theory.
Our proposed method allows a rich class of distributions for the error in the response: for example, it can be light- or heavy-tailed, symmetric or asymmetric, and homoscedastic or heteroscedastic.
The rest of the article is organized as follows. In Section 2, we describe the CQR model with measurement errors, develop a corrected estimating equation based on a kernel smoothing approximation, and establish the asymptotic properties of the resulting estimator. Section 3 contains simulation studies evaluating the finite-sample performance of the proposed method. A dataset concerning lung cancer is analyzed in Section 4, and some concluding remarks are provided in Section 5. The assumptions imposed in the article are listed and discussed in the Appendix, and the detailed proofs of the theorems are deferred to the online supplementary material.

Model Specification
Let T denote the transformed failure time under a known monotone transformation, for example, the logarithm function. Let C denote the censoring time under the same transformation. Let Z be a p-vector of covariates, X = T ∧ C be the observed time, and Δ = I(T ≤ C) be the censoring indicator, where a ∧ b is the minimum of a and b, and I(·) is the indicator function. Assume that T and C are conditionally independent given the covariate Z.
For τ ∈ (0, 1), the conditional τth quantile function of the survival time T given covariate Z is defined as Q_T(τ|Z) = inf{t: P(T ≤ t|Z) ≥ τ}. The quantile regression model associated with covariate Z has the form

Q_T(τ|Z) = Z^T β(τ), τ ∈ (0, 1), (2.1)

where β(τ) is an unknown p-vector of regression coefficients, representing the effect of Z on the τth quantile of the transformed survival time.
In reality, covariate Z may be measured with error, so that we do not directly observe Z but its surrogate W. We assume the classical error structure

W = Z + U,

where U is a p-variate random vector with mean 0 and covariance matrix Σ. The case that some covariates are error-free is accommodated in our model by setting the relevant entries of Σ to zero. We further make the typical surrogacy assumption that (T, C) and W are conditionally independent given covariate Z. For ease of exposition, we provisionally assume Σ to be known, since Σ can easily be estimated with replicated observations or validation data.
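As a small illustration of this error structure, the following sketch simulates W = Z + U with a zero row and column of Σ encoding an error-free covariate; the dimensions, error variances, and the normality of U here are illustrative assumptions, not part of the model specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 3

# True (latent) covariates; suppose only the first two are mismeasured.
Z = rng.uniform(0.0, 1.0, size=(n, p))

# Covariance Sigma of the additive error U: the zero third row/column
# encodes that the third covariate is error-free, as described above.
Sigma = np.diag([0.25, 0.25, 0.0])

# Classical additive structure W = Z + U with (here) normal U.
U = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
W = Z + U
```

The error-free column of W then coincides with the corresponding column of Z, while the mismeasured columns are perturbed with standard deviation 0.5.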

Approximately Corrected Estimating Equation
We first introduce notation: F_T(t|Z) = P(T ≤ t|Z), Λ_T(t|Z) = −log{1 − F_T(t|Z)}, N(t) = I(X ≤ t, Δ = 1), and M(t) = N(t) − Λ_T(t ∧ X|Z). Following the argument in Fleming and Harrington (1991), it is easy to show that, evaluated at β_0(τ), the true value of β(τ), M(t) is a martingale process associated with the counting process N(t). Furthermore, because E{M(t)|Z} = 0 at β_0(τ) for any t, we have

E[Z{N(Z^T β_0(τ)) − Λ_T(Z^T β_0(τ) ∧ X|Z)}] = 0 (2.2)

for τ ∈ (0, 1). Under model (2.1), after some algebraic manipulations, we obtain that

Λ_T(Z^T β_0(τ) ∧ X|Z) = ∫_0^τ I{X ≥ Z^T β_0(u)} dH(u), (2.3)

where H(u) = −log(1 − u). Based on (2.2) and (2.3), when all the Z_i's are observed, Peng and Huang (2008) proposed an estimating equation for β(τ),

n^{−1/2} Σ_{i=1}^n Z_i [N_i{Z_i^T β(τ)} − ∫_0^τ I{X_i ≥ Z_i^T β(u)} dH(u)] = 0. (2.4)

However, when the covariates Z_i are measured with errors, naively treating the mismeasured covariates as error-free would cause estimation bias and thus lead to incorrect inference. In (2.4), because the covariate Z_i lies inside the indicator function, which is discontinuous, it is difficult to construct a consistent estimator when the surrogates W_i, instead of the Z_i, are observed. To overcome the challenges caused by discontinuity and measurement errors, we propose an approximately corrected estimating equation for (2.4) and further establish the asymptotic properties of the resultant estimators of the regression quantile coefficients.
We denote the observed data O = (X, Δ, W) and let U = (X, Δ, Z). In view of the estimating equation (2.4), if we can find a function g*{O, β(τ)} such that

E[g*{O, β(τ)}|U] = Z I{X > Z^T β(τ)}

for τ ∈ (0, 1), we can then follow the corrected score argument (Stefanski 1989; Nakamura 1990) to construct an unbiased estimating equation by replacing each term in (2.4) that involves Z_i with its corrected counterpart. However, the cusp in the indicator function makes it difficult to find such a function. On the other hand, Horowitz (1992, 1998) proposed the smoothed maximum score estimator for the binary response model and the smoothed least absolute deviation estimator for median regression. Motivated by this smoothing scheme, we circumvent the discontinuity stemming from the indicator function and consider a smooth function that approaches the indicator function as n → ∞. More specifically, assume that a smooth function K(·) satisfies lim_{x→−∞} K(x) = 0 and lim_{x→∞} K(x) = 1. If we consider a positive scale parameter h_n that converges to zero as the sample size n → ∞, then K(x/h_n) may provide an adequate approximation to I(x > 0) as n → ∞, where h_n behaves like the bandwidth in kernel smoothing.
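A quick numerical check of this approximation, with K taken to be the standard normal distribution function (one admissible choice of smoother):

```python
import math

def Phi(x):
    """Standard normal CDF, one admissible choice for the smoother K."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_indicator(x, h):
    """K(x/h): a smooth stand-in for the indicator I(x > 0)."""
    return Phi(x / h)

# As the bandwidth h shrinks, K(x/h) approaches I(x > 0) for x != 0,
# while K(0/h) = 1/2 for every h.
approx = [smoothed_indicator(0.5, h) for h in (1.0, 0.1, 0.01)]
```

For x = 0.5 the values climb toward 1 as h decreases, while for x = −0.5 they fall toward 0, mimicking the indicator away from the cusp.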
If we can find a function G{O, β(τ); h_n} such that

E[G{O, β(τ); h_n}|U] = K[{X − Z^T β(τ)}/h_n], (2.5)

we can take g{O, β(τ); h_n} to be the analogous corrected counterpart of Z K[{X − Z^T β(τ)}/h_n] and conclude that E[g{O, β(τ); h_n}|U] is close to Z I{X > Z^T β(τ)}. As a result, we can construct an approximately corrected estimating equation

n^{−1/2} Σ_{i=1}^n [Δ_i {W_i − g{O_i, β(τ); h_n}} − ∫_0^τ g{O_i, β(u); h_n} dH(u)] = 0, (2.6)

where H(u) = −log(1 − u). Since it is challenging to obtain the functional solution to the integral Equation (2.6), we follow Peng and Huang (2008) and develop a grid-based estimation procedure for β_0(·). Assume that τ_U is a deterministic constant in (0, 1) subject to certain identifiability constraints, for example, Assumption A4-(iii) in the Appendix. Due to the inherent nonidentifiability of the regression quantiles beyond the level τ_U, we confine estimation of β_0(τ) to τ ∈ (0, τ_U]. We denote a partition of the interval [0, τ_U] by S_{q_n} = {0 ≡ τ_0 < τ_1 < ··· < τ_{q_n} ≡ τ_U}, where the number of grid points q_n depends on n. We consider an estimator of β_0(τ) that is a right-continuous piecewise-constant function and jumps only at the grid points in S_{q_n}. Noting that Z^T β_0(τ_0) = −∞, we intuitively set g{O, β(τ_0); h_n} = W. For a given h_n, employing the Newton-Raphson algorithm, the estimates β̂(τ_j), j = 1, . . . , q_n, can be obtained sequentially by solving

n^{−1/2} Σ_{i=1}^n [Δ_i {W_i − g{O_i, β(τ_j); h_n}} − Σ_{k=0}^{j−1} g{O_i, β̂(τ_k); h_n} {H(τ_{k+1}) − H(τ_k)}] = 0. (2.7)
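A toy sketch of this sequential, grid-based scheme follows. The estimating function below is a stand-in (not the paper's Equation (2.7)), and simple bisection replaces the Newton-Raphson step; the point is only to show the one-quantile-at-a-time structure, where the solution at each grid point depends on the solutions already obtained at earlier grid points.

```python
import math

def H(tau):
    """H(tau) = -log(1 - tau), the transform used by Peng and Huang."""
    return -math.log(1.0 - tau)

def solve_on_grid(grid, estfun, lo=-10.0, hi=10.0, tol=1e-10):
    """Sequentially solve estfun(b, tau, history) = 0 at each grid point
    by bisection, mimicking the one-quantile-at-a-time structure."""
    history = []
    for tau in grid:
        f = lambda b: estfun(b, tau, history)
        a, c = lo, hi
        while c - a > tol:
            m = 0.5 * (a + c)
            if f(a) * f(m) <= 0:
                c = m
            else:
                a = m
        history.append(0.5 * (a + c))
    return history

# Stand-in estimating function whose root at tau is H(tau) plus the
# previous grid value, purely to exercise the sequential dependence.
est = solve_on_grid(
    [0.1, 0.2, 0.3],
    lambda b, tau, hist: b - H(tau) - (hist[-1] if hist else 0.0),
)
```

The returned list is the step function's value at each grid point, and each entry builds on its predecessors, just as β̂(τ_j) depends on β̂(τ_0), …, β̂(τ_{j−1}).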

Laplace and Normal Measurement Errors
Clearly, it is crucial to find a function G such that (2.5) holds. For illustration, we construct G when the measurement errors follow a multivariate Laplace and a multivariate normal distribution, respectively. Wang, Stefanski, and Zhu (2012) also considered these two types of measurement errors, as Laplace distributions are heavier-tailed than normal distributions, and both are widely used in practice.
Assume that U is a p-variate Laplace distributed random vector with mean 0 and covariance matrix Σ, denoted by U ∼ L_p(0, Σ), whose characteristic function is given by ϕ(t) = 1/(1 + 0.5 t^T Σ t) for t ∈ R^p (Kotz, Kozubowski, and Podgorski 2001). Following the work of Hong and Tamer (2003) and Wang, Stefanski, and Zhu (2012), we take

G_L{O, β(τ); h_n} = K{ε(τ)/h_n} − {β(τ)^T Σ β(τ)/(2h_n^2)} K^{(2)}{ε(τ)/h_n},

where ε(τ) = X − W^T β(τ) and K^{(j)}(x) = d^j K(x)/dx^j for j = 1, 2, 3, 4. It is easy to show that G_L{O, β(τ); h_n} satisfies (2.5). Therefore, the corrected function g_L{O, β(τ); h_n} can be obtained from K and its derivatives K^{(j)}, j = 1, 2, 3, 4. After plugging g_L{O, β(τ); h_n} into (2.7), we can solve for β(τ). We next consider the more common case that U is a p-variate normal random vector with mean 0 and covariance matrix Σ, that is, U ∼ N_p(0, Σ). Motivated by Stefanski (1989) and Wang, Stefanski, and Zhu (2012), we take the objective function G_N{O, β(τ); h_n} to be the infinite series

G_N{O, β(τ); h_n} = Σ_{k=0}^∞ {(−1)^k / k!} {β(τ)^T Σ β(τ)/(2h_n^2)}^k K^{(2k)}{ε(τ)/h_n}.

Consequently, g_N{O, β(τ); h_n} can be obtained by taking the derivative of G_N{O, β(τ); h_n}. Although we can construct the approximately corrected estimating equation as in (2.6) and theoretically define an estimator based on the resultant grid-based solution for β_0(·), it is infeasible to solve the equation exactly because G_N involves an infinite series. Following the recommendation of Stefanski (1989), we keep the first two summands of G_N as an approximation, which is found to be adequate in our simulation studies. More interestingly, using the first two summands leads to exactly the same form of the approximately corrected estimating equation as in the Laplace measurement error model.
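With K taken to be the standard normal distribution function, the derivatives K^{(j)} used above have simple closed forms in terms of the normal density; a minimal sketch:

```python
import math

def Phi(x):
    """K(x): the standard normal CDF used as the smoothing function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density, K's first derivative."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def K_deriv(x, j):
    """K^(j)(x) = d^j K(x) / dx^j for j = 1, ..., 4, in closed form."""
    if j == 1:
        return phi(x)
    if j == 2:
        return -x * phi(x)
    if j == 3:
        return (x * x - 1.0) * phi(x)
    if j == 4:
        return (3.0 * x - x ** 3) * phi(x)
    raise ValueError("only j = 1, ..., 4 are needed here")
```

Each formula follows from repeatedly differentiating φ(x) using φ'(x) = −xφ(x), and can be verified against finite differences.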

Asymptotic Properties
Denote by a_n = max_{1≤j≤q_n} |τ_j − τ_{j−1}| the maximum distance between two adjacent grid points in S_{q_n}. The asymptotic properties of the estimator β̂(τ) that solves (2.7) are summarized in the following two theorems.
Theorem 1. Under Assumptions A1-A4 in the Appendix, if a_n = o(1), then sup_{τ∈[ν,τ_U]} ||β̂(τ) − β_0(τ)|| → 0 in probability for any ν ∈ (0, τ_U] as n → ∞.

Theorem 2. Under Assumptions A1-A5 in the Appendix, if a_n = o(n^{−1/2}), then n^{1/2}{β̂(τ) − β_0(τ)} converges weakly to a mean zero Gaussian random field over τ ∈ [ν, τ_U] for any ν ∈ (0, τ_U] as n → ∞.

Both the consistency and the weak convergence of the proposed estimator hold only for quantile levels bounded away from zero, due to data sparsity when τ is close to zero. A much finer partition, with a step size of order o(n^{−1/2}), is required to establish the weak convergence property. The proofs of both theorems rely heavily on empirical process theory and are provided in the supplementary material.
It is crucial to select the smoothing parameter h_n. Without loss of generality, assume that Z includes the intercept as its first element, and let ḡ_1 and g_1 denote the first elements of ḡ and g, respectively. Noting that E[g{O, β(τ); h_n}|U] is close to Z I{X > Z^T β(τ)}, we can construct a corrected analogue of the martingale residual M{Z^T β(τ)} from the observed data. In practice, we recommend a d-fold cross-validation method to choose h_n. We randomly divide the data into d nonoverlapping and approximately equal-sized subgroups. For the jth subgroup D_j, we fit the proposed procedure using the data excluding D_j, denoted by D^(−j), and calculate a loss function L_j(h_n) based on a cumulative sum of the corrected martingale residuals over the error-free covariates, where |D_j| denotes the cardinality of the set D_j, W°_i denotes the error-free elements of Z_i, and W°_i ≤ w° means every entry of W°_i is not larger than the counterpart of w°. Here, β̂^(−j)(τ) for τ ∈ [0, τ_U] is obtained by applying the proposed procedure to the data D^(−j). Finally, we select the bandwidth by minimizing the total loss L(h_n) = Σ_{j=1}^d L_j(h_n).
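The d-fold cross-validation loop itself can be sketched generically; `fit` and `loss` below are toy stand-ins for the proposed estimation procedure and the martingale-residual loss L_j(h_n), which are not reproduced here.

```python
import random

def cross_validate_bandwidth(data, bandwidths, fit, loss, d=10, seed=0):
    """Select the h minimizing the total d-fold CV loss sum_j L_j(h).

    `fit(train, h)` and `loss(model, held_out)` are user-supplied
    stand-ins for the estimation procedure and the residual-based loss.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::d] for j in range(d)]   # d roughly equal-sized parts
    best_h, best_total = None, float("inf")
    for h in bandwidths:
        total = 0.0
        for fold in folds:
            held = set(fold)
            train = [data[i] for i in idx if i not in held]
            model = fit(train, h)
            total += loss(model, [data[i] for i in fold])
        if total < best_total:
            best_h, best_total = h, total
    return best_h

# Toy stand-ins: "fitting" just returns h, and the loss penalizes the
# distance of h from 2, so the selected bandwidth should be 2.0.
best = cross_validate_bandwidth(
    data=list(range(20)),
    bandwidths=[1.0, 2.0, 3.0],
    fit=lambda train, h: h,
    loss=lambda model, held: (model - 2.0) ** 2 * len(held),
)
```

In the actual procedure, `fit` would run the grid-based estimation on D^(−j) and `loss` would evaluate the cumulative-sum criterion on D_j.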

SIMULATION STUDIES
We conducted extensive simulation studies to assess the performance of the proposed method with finite samples. We generated the survival time T̃ from the log-transformed linear model with heteroscedastic errors,

log T̃ = −0.5 + Z + (1 + 0.2Z)ε,

where the model error ε followed the standard normal distribution, and Z was generated from the uniform distribution Unif(0, √12). The corresponding CQR model (2.1) given Z = (1, Z)^T takes the form of

Q_T(τ|Z) = β_0(τ) + β_1(τ)Z,

where T = log T̃, β_0(τ) = −0.5 + Q_ε(τ), β_1(τ) = 1 + 0.2 Q_ε(τ), and Q_ε(τ) is the τth quantile of ε. We further assumed that Z was measured with error in the form of W = Z + U, where U is the measurement error and W is the surrogate of Z. We generated the measurement errors from three different distributions, respectively; that is, Laplace: U ∼ L_1(0, 0.5²), normal: U ∼ N(0, 0.5²), and uniform: U ∼ Unif(−√3/2, √3/2). These choices of measurement error distributions correspond to a signal-to-noise ratio of 0.8. The censoring time C, dependent on Z, was generated from Unif(c_1, c_2) if Z < √12/2 and from Unif(c_1 + 1, c_2) otherwise. For each scenario, c_1 and c_2 were chosen to yield a censoring rate of around 20%. Note that although the proposed method is developed to handle Laplace or normal measurement errors, we also considered uniform measurement errors to examine the robustness of our approach. We chose the bandwidth h_n = 1, while a sensitivity analysis with different values of h_n is given at the end of this section. We set the smoothing function K(·) to be the standard normal distribution function, and adopted an equally spaced grid of quantile levels. The proposed estimator, treating the measurement error as Laplace, was obtained through the Newton-Raphson algorithm by taking the naive estimator as the initial value. We set the sample size n = 200 and simulated 500 replicated datasets under each configuration. Following the convention in quantile regression, we used the bootstrap with 200 bootstrap samples to obtain the variances of the parameter estimates.
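This data-generation scheme can be sketched as follows, assuming the generating model log T̃ = −0.5 + Z + (1 + 0.2Z)ε implied by the stated quantile coefficients; the censoring bounds c1 and c2 below are illustrative values rather than the ones tuned in the paper to reach a roughly 20% censoring rate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Heteroscedastic log-linear model consistent with the stated
# coefficients: beta0(tau) = -0.5 + Q_eps(tau), beta1(tau) = 1 + 0.2 Q_eps(tau).
Z = rng.uniform(0.0, np.sqrt(12.0), size=n)
eps = rng.standard_normal(n)
T = -0.5 + Z + (1.0 + 0.2 * Z) * eps          # T = log(T~)

# Surrogate with normal measurement error (one of the three cases).
W = Z + rng.normal(0.0, 0.5, size=n)

# Covariate-dependent censoring; c1, c2 here are illustrative only.
c1, c2 = 2.0, 8.0
C = np.where(Z < np.sqrt(12.0) / 2.0,
             rng.uniform(c1, c2, size=n),
             rng.uniform(c1 + 1.0, c2, size=n))

X = np.minimum(T, C)                          # observed time
delta = (T <= C).astype(int)                  # censoring indicator
```

The observed data per subject are then (X, delta, W), with Z latent.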
In Table 1, the column labeled "Est" is the median of the estimates; "SE" is the rescaled median absolute deviation of the estimates, that is, the median absolute deviation divided by Φ^{−1}(0.75), where Φ(·) is the standard normal cumulative distribution function, which is a robust estimate of the standard error (van der Vaart 1998, Example 21.11); "ESE" is the average of the bootstrap rescaled median absolute deviations; and "CP" is the coverage probability of the 95% confidence intervals. With sample size n = 200, the proposed estimation method performs reasonably well under the standard normal distribution for the model error ε, coupled with three different distributions for the measurement error U. The biases are essentially negligible except at the lower quantile levels near 0.2, owing to the sparse event information observed at the initial follow-up times. The estimated standard errors using the bootstrap method agree well with the sampling standard errors, and the coverage probabilities of the 95% confidence intervals are around the nominal level. We specifically point out that the performance is similar in all three measurement error cases, even though, strictly speaking, our implementation here is only valid for Laplace measurement errors and serves as an approximation for normal measurement errors.
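The robust standard-error estimate based on the median absolute deviation can be computed as follows; the division by Φ^{−1}(0.75) ≈ 0.6745 is what makes the MAD consistent for the standard deviation under normality.

```python
import statistics
from statistics import NormalDist

def rescaled_mad(estimates):
    """Median absolute deviation divided by Phi^{-1}(0.75) (~0.6745),
    a robust stand-in for the standard error of the estimates
    (van der Vaart 1998, Example 21.11)."""
    med = statistics.median(estimates)
    mad = statistics.median(abs(e - med) for e in estimates)
    return mad / NormalDist().inv_cdf(0.75)
```

For the sample [1, 2, 3, 4, 5] the MAD is 1, so the rescaled value is 1/0.6745 ≈ 1.4826.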
We also explored different distributions for the model error ε; for example, an extreme value distribution with the cumulative distribution function F(x) = 1 − exp(−e^x) and Student's t distribution with two degrees of freedom, while keeping the rest of the data-generation scheme the same as before. The corresponding simulation results are presented in Tables 2 and 3, respectively, from which we can draw similar conclusions. When the sample size is small, some nonconvergent cases might be encountered with the Newton-Raphson algorithm; the nonconvergence issue typically disappears with a larger sample size. An alternative solution is to minimize the L_2-norm of the estimating function, which brings the estimating equation value as close to zero as possible. More detailed discussions of numerical issues are given in the supplementary material.
To evaluate the overall performance of the proposed method and to compare it with the naive method, we present the biases of the estimated quantile intercept and slope coefficients across τ ∈ [0.2, 0.78] under different model error and covariate measurement error distributions in Figures 1 and 2, respectively. It can be seen that the proposed method can effectively correct the biases caused by measurement errors, whereas the naive method produces serious biases, especially for the quantile slope coefficients, which are typically of more interest in practice. Moreover, Figures 3 and 4 display the mean squared errors (MSEs) of the estimated quantile intercept and slope coefficients, respectively. The MSEs under the proposed method are much smaller than those under the naive method, which further demonstrates that the proposed method is a viable approach to CQR with measurement errors.
When Σ is unknown, it can be estimated from replicated data, in which case Table 4 shows that the proposed method also performs well. Obviously, when the censoring rate becomes heavier, the range of estimable quantile levels shrinks. It is evident from Table 5 that the performance of the proposed method is satisfactory even with a censoring rate of 50%. Furthermore, when the true measurement error follows an asymmetric distribution, for example, U ∼ Exp(1/λ) − λ with λ = 0.5, so that the symmetric Laplace working assumption is misspecified, the conclusions remain the same.
We investigated the sensitivity of the proposed method to the smoothing parameter h_n when the data were generated from the log-transformed CQR model with heteroscedastic model errors for ε ∼ N(0, 1) and covariate measurement errors U ∼ N(0, 0.5²). As shown in Figure 5, the biases and MSEs vary only slightly with different values of h_n, demonstrating estimation stability; more strikingly, both are always much smaller than those from the naive method. We also explored the situation where multiple covariates are subject to measurement errors while others are measured precisely. For normal measurement errors, we conducted simulations to compare our infinite-series correction function with the integral correction function proposed by Wang, Stefanski, and Zhu (2012). In addition, we examined the simulation and extrapolation (SIMEX) method of He, Yi, and Xiong (2007) under the AFT model, and the simulation results in the supplementary material demonstrate the comparability of our proposed method with these alternatives.

APPLICATION
As an illustration, we applied the proposed estimation and inference procedure for CQR with measurement errors to a lung cancer study. This dataset contains 280 lung cancer patients, whose survival times were recorded with a censoring rate of 64.3%. One of the main objectives of this study was to assess the association of patient survival with the expression of a certain biomarker in the tumor cell cytoplasm. The reading of the biomarker expression was performed by pathologists and could be subjective. As a result, the readings of the biomarker expression for each patient were considered imprecise measurements. To reduce the possible subjectivity of the evaluation, for some patients, two readings of the biomarker expression were provided by two different pathologists (replicates). However, neither of the two measurements of biomarker expression can be considered precise. Our main interest lies in investigating the potential of the biomarker as a new prognostic marker and therapeutic target for lung cancer. Other confounding covariates of interest include tumor histology (61% of patients with adenocarcinoma coded as 1, and 39% with squamous cell carcinoma coded as 0), age (ranging from 34 to 90 years with a mean of 66 years), and sex (52% female coded as 1, 48% male coded as 0). In our analysis, we standardized the patients' ages by subtracting the mean and dividing by the standard deviation. Half of the patients in the dataset had duplicated readings of the biomarker expression, and the average of the two expression readings was taken as the surrogate variable in our analysis. Based on the duplicates, we were able to calculate the variance of the measurement error.

Table 4. Simulation results for the log-transformed censored quantile regression with the heteroscedastic model error for ε ∼ N(0, 1) and covariate measurement error U ∼ N(0, 0.5²), where the standard error of the measurement error is estimated based on five replications
We selected the smoothing parameter h_n through the 10-fold cross-validation procedure proposed in Section 2 and obtained the optimal h_n = 1.88. Figure 6 displays the proposed quantile regression estimates of the covariate effects and the corresponding 95% pointwise confidence intervals for τ ∈ [0.1, 0.5] on the basis of 200 bootstrap samples. As the censoring rate is high, we can only estimate regression quantiles up to τ_U = 0.5. We observe that, in general, patients with a tumor histology of adenocarcinoma had significantly better survival than those with squamous cell carcinoma, and younger patients could be expected to live longer at a lower risk of death. We did not find any significant effect of patients' sex on survival at any of the considered regression quantiles. There was no significant effect of the biomarker expression on survival at the regression quantiles that we considered. However, there was a trend that a lower level of biomarker expression tended to be associated with a longer survival time, which nevertheless did not reach statistical significance. Li and Ryan (2004) proposed a first-order bias correction method for the Cox proportional hazards model with covariate measurement errors. For comparison, we also analyzed the lung cancer data using the method of Li and Ryan (2004) as well as the SIMEX method of He, Yi, and Xiong (2007). The corresponding results are summarized in Table 7, from which we can see that patients with a tumor histology of adenocarcinoma and younger patients could be expected to experience significantly longer survival, whereas patients' sex and biomarker expression were not significantly associated with survival time. These results generally agree with those obtained by the smoothed and corrected method for CQR. Nevertheless, the Cox model in Li and Ryan (2004) and the AFT model in He, Yi, and Xiong (2007) cannot provide the dynamic covariate effects as the quantile level varies.
For model checking, we consider the cumulative residuals over the precisely measured covariates W° = (Histology, Age, Sex)^T, the error-free covariates excluding the biomarker expression. The null distribution of the resulting cumulative-residual process T(w°, τ) can be approximated by a zero-mean process.

Table 7. Analysis results of the lung cancer data using the first-order bias correction method for the Cox proportional hazards model (Li and Ryan 2004) and the simulation-extrapolation method for the accelerated failure time (AFT) model (He, Yi, and Xiong 2007)

REMARKS
We have proposed a corrected estimating equation approach to CQR models for survival data when covariates are measured with errors. Using a smooth function to approximate the indicator function, the resultant estimating function is smoothed, and thus conventional iterative root-finding procedures, such as the Newton-Raphson algorithm, can be applied (Wang, Stefanski, and Zhu 2012). We have established the asymptotic consistency and weak convergence of the proposed estimator through modern empirical process techniques. Numerical results show that the proposed method is promising in terms of correcting the bias arising from covariate measurement errors, whereas the naive method typically produces biased estimates.

Figure 6. Estimated covariate effects and the corresponding 95% pointwise confidence intervals under the proposed measurement error quantile regression (solid lines) and the naive method (dashed lines) for the lung cancer data.
For variance estimation, we also explored directly using the sandwich-type variance estimator based on the smoothed estimating Equation (2.6), but the resulting coverage probability was found to be generally above the nominal level. An alternative resampling method, the Markov chain marginal bootstrap, can be tailored for variance estimation in quantile regression (Kocherginsky, He, and Mu 2005). When the covariates are high-dimensional, particularly the mismeasured ones, estimation can be difficult; regularization methods may potentially be incorporated into the estimation procedure to alleviate the instability caused by high dimensionality.
Identifiability is an inherent and subtle issue in CQR. Regression quantiles with τ close to 1 may not be identifiable due to the lack of event information in the upper tail. Theoretically, τ U should satisfy the identifiability Assumption A4-(iii) in the Appendix. In practical implementation, we first set τ U to be close to one minus the censoring rate, and then select the final τ U in an adaptive manner as follows. If all the regression quantiles over [ν, τ U ] can be estimated, we increase τ U by some small step size, for example, 0.05 or 0.1; otherwise, we decrease τ U slightly. Through this trial-and-error process, we can push τ U to the largest value at which the model parameters can be identified. Similarly, ν could also be chosen in such an adaptive way.
The proposed method requires the global linearity assumption, as in Portnoy (2003), Peng and Huang (2008), and Wei and Carroll (2009); that is, to estimate the τth conditional regression quantile, it is necessary to assume that the conditional quantile functionals at all lower quantile levels are also linear. When the linearity assumption holds only at one specific quantile level τ of interest, research along the lines of Wang and Wang (2009) is warranted.
Assumption A1 holds if K(·) is the standard normal cumulative distribution function. Clearly, F_{X,Δ=1}(t|z) and F_X(t|z) satisfy a boundedness condition similar to Assumption A2, which is a standard assumption in survival analysis. Assumptions A3 and A4 are commonly used in CQR models (Peng and Huang 2008). Assumption A5 essentially imposes a faster convergence rate on the smoothing parameter h_n than Assumption A1-(iii). If we take K(·) to be the standard normal cumulative distribution function, Assumption A5 holds for both the Laplace and the normal measurement errors, because the exponential factor can be factored out of the derivatives of K(·) of any order, and the component-wise continuity then follows directly.