Parametric inference for quantile event times with adjustment for covariates on competing risks data

ABSTRACT We propose parametric inferences for quantile event times with adjustment for covariates on competing risks data. We develop parametric quantile inferences using parametric regression modeling of the cumulative incidence function from the cause-specific hazard and direct approaches. Maximum likelihood inferences are developed for estimation of the cumulative incidence function and quantiles. We develop the construction of parametric confidence intervals for quantiles. Simulation studies show that the proposed methods perform well. We illustrate the methods using early stage breast cancer data.


Introduction
The median survival time has been frequently used in medical research as a measure for summarizing the survival curve. Methods for constructing confidence intervals for the median survival time have been extensively studied [5,10,16,26,28]. However, such quantile analyses have not been widely investigated in the analysis of competing risks. These analyses can be carried out using the cumulative incidence function. Peng and Fine [24] proposed nonparametric quantile inferences using the cumulative incidence function. Lee and Fine [22] proposed parametric quantile inferences based on the cumulative incidence function, along with simplified procedures for the nonparametric quantile inference. Jeong and Fine [20] and Jeong [17] proposed nonparametric and parametric inferences for the quantile residual lifetime in the presence of competing risks, respectively. When covariate information is available, Lee and Han [23] proposed covariate-adjusted quantile inference using semiparametric estimation of the cumulative incidence function under the cause-specific proportional hazards model. In this paper, we propose parametric quantile inferences based on parametric regression modeling of the cumulative incidence function, which would provide a tool for summarizing the cumulative incidence curve with adjustment for covariates.
In the analysis of competing risks, one may be interested in predicting the probability of experiencing a particular type of failure by time t in the presence of competing risks. The probability is called by the cumulative incidence function defined by where T is the time to failure, ∈ {1, . . . , K} is the cause of failure, and Z is a vector of covariates. The cause-specific hazard function for cause k at time t conditional on covariates Z is defined by The cumulative incidence function can be expressed as a function of all cause-specific hazard functions as where S(t; Z) = Pr(T > t | Z) is the overall survival function conditional on covariates Z. Employing a general notation of the quantile function, the pth quantile of F k (t; Z) is defined by The pth quantile of F k (t; Z) can be estimated byF −1 k (p; Z) = inf{t :F k (t; Z) ≥ p}. For parametric quantile inferences, we consider the cause-specific hazard approach [25] and direct approach [19]. For the cause-specific hazard approach, we consider the proportional hazards model [8] for the cause-specific hazard function and specify a parametric model for a baseline hazard function. We illustrate the approach using parameterization of a baseline hazard function via the Weibull distribution. In Supplementary Materials, we illustrate the cause-specific hazard approach using the exponential and log logistic baseline hazard functions. For the direct approach, we consider a transformation regression model [11,12] for direct regression modeling of the cumulative incidence function and specify a parametric model for a baseline function. Jeong and Fine [19] proposed a parametric regression model for the cumulative incidence function via the Gompertz [14] distribution. Jeong and Fine [18] and Cheng [7] discussed direct modeling of the cumulative incidence function using the conditional Weibull model. Shi et al. [27] proposed a parametric regression model for the cumulative incidence function using a modified three-parameter logistic function [7]. We illustrate the direct approach using parameterization of a baseline function via the Gompertz distribution. In Supplementary Materials, we illustrate the direct approach by specifying conditional Weibull model and modified logistic function for a baseline function.
Brookmeyer and Crowley [5] proposed a nonparametric confidence interval for the median survival time by inverting a test statistic as whereŜ(t) is the Kaplan-Meier estimator [21], var(Ŝ(t)) is the Greenwood variance estimator ofŜ(t) [15], and χ 2 1,α is the 100α upper percentile of the chi-square distribution with 1 degree of freedom, that is, Pr(χ 2 1 > χ 2 1,α ) = α. We use this inversion method to construct parametric confidence intervals for quantile event times with adjustment for covariates. If a closed-form of a quantile event time exists, we derive the asymptotic distribution and variance estimator of a quantile estimator using the delta method. In this case, we construct a Wald confidence interval for a quantile event time. We discuss this method both for the cause-specific hazard approach and for the direct approach.
We evaluate the performance of the parametric quantile inferences in the simulation studies. We illustrate the practical utility of our methods with an analysis of early stage breast cancer data.

Inference procedures
In this paper, we consider only two causes of failure, ∈ {1, 2}, where the cause of interest is cause 1 and other competing risks are combined into cause 2. Define X = min(T, C), where T and C are the failure time and censoring time, respectively. Let Z be a p × 1 vector of time-independent covariates. We assume that the censoring time C is independent of (T, ) conditional on Z. Let δ = I(T ≤ C) , where I(·) is the indicator function. We observe Given Z i = z i , a likelihood function for competing risks data is given by where f k (x; z) = dF k (x; z)/dx is the density function of the cumulative incidence function F k (x; z) (k = 1, 2).

Cause-specific hazard approach
We propose parametric quantile inferences from regression modeling of the cumulative incidence function via parameterization of the cause-specific hazard functions. The likelihood function (1) can be expressed in terms of the cause-specific hazard func- du is the cumulative cause-specific hazard function for cause k. The likelihood function (1) can be written as The proportional hazards model [8] is commonly used for regression modeling of the cause-specific hazard function as where λ 0k (t) is an unspecified baseline hazard function for cause k and β k is a p × 1 parameter vector for cause k. For parametric inference, we specify a parametric model for the baseline hazard function λ 0k (t). Let λ 0k (t; k ) be a parametric model specified for λ 0k (t), where k is a parameter vector. The fully parameterized likelihood function under the model (2) is given by du is the parameterization of the baseline cumulative hazard function for cause k. The parameter can be estimated by maximizing the likelihood function with respect to , which yields the maximum likelihood estimator (MLE)ˆ . Under the model (2), the parameterization of the cumulative incidence function for cause k conditional on Z = z is given by which can be estimated by replacing with the MLEˆ aŝ The pth quantile of F k (t; , z) can be estimated bŷ The consistency and asymptotic normality of the MLE are derived under the regularity conditions [2,3]. The asymptotic variance-covariance matrix ofˆ is obtained by the inverse of the observed information matrix evaluated at the MLE, I −1 (ˆ ). By the delta method, where σ 2 (t; , z) can be consistently estimated bŷ and ∂F k (t; , z)/∂ is a vector of the first derivatives of F k (t; , z) with respect to . Then Suppose that we are interested in testing a null hypothesis H 0 : F k (t; , z) = p. Under the null hypothesis H 0 , Following Brookmeyer and Crowley [5], an asymptotic 100(1 − α) percentile confidence interval for F −1 k (p; , z) can be constructed by inverting the test statistic as We will refer to this as a parametric confidence interval for the pth quantile from the causespecific hazard approach.

Parametric quantile inferences via the weibull baseline hazard function
The Weibull distribution has been widely used for parametric regression modeling of lifetime data. We illustrate the parametric quantile inference from the cause-specific hazard approach when the baseline hazard function λ 0k (t) in the model (2) is parameterized by that of a Weibull distribution as λ 0k (t; α k , . The cumulative incidence function for cause k under the model (2) with the Weibull baseline hazard function is given by , z) only exists when the two shape parameters are identical, that is, α 1 = α 2 = α. Under the common shape parameter assumption, the cumulative incidence function for cause k is given by where = (α, θ 1 , θ 2 , β 1 , β 2 ) with the common shape parameter α. The parametric estimator of F k (t; , z) is obtained by replacing with the MLEˆ . The variance ofF k (t;ˆ , z) is estimated by the delta method as The pth quantile of F k (t; , z) is given by The parametric estimator of F −1 k (p; , z) is obtained by replacing with the MLEˆ . A 95% parametric confidence interval for F −1 k (p; , z) is given by Since a closed-form of F −1 k (p; , z) exists, a Wald confidence interval for F −1 k (p; , z) can be constructed. By the delta method, the variance of F −1 k (p; , z) can be consistently estimated by We refer to this as a parametric Wald confidence interval for the pth quantile from the cause-specific hazard approach. The exponential and log logistic distributions can be used for the parameterization of the baseline hazard function λ 0k (t) in the model (2). The details of the parametric quantile inferences via the exponential and log logistic baseline hazard functions can be found in Appendix A of Supplmentary Materials.

Direct approach
We propose parametric quantile inferences from direct parametric regression modeling of the cumulative incidence function.
The likelihood function (1) can be expressed in terms of the cumulative incidence function as Fine and Gray [12] and Fine [11] considered a transformation regression model for direct regression modeling of the cumulative incidence function as where g k (·) is a known increasing function, h k (t) is a completely unspecified, invertible, and monotone increasing function, and β k is a p × 1 parameter vector for cause k. The link  (4), the parameterization of the cumulative incidence function for cause k conditional on Z = z is given by The fully parameterized likelihood function under the model (4) is given by The parameter can be estimated by maximizing the likelihood function with respect to . The parametric estimator of F k (t; k , z) can be obtained by replacing k with the MLEˆ k aŝ The pth quantile of F k (t; k , z) can be estimated bŷ The consistency and asymptotic normality of the MLE are derived under the regularity conditions [2,3]. By the delta method, where σ 2 (t; k , z) can be consistently estimated bŷ with respect to k , and var(ˆ k ) is a submatrix corresponding to a variance-covariance matrix ofˆ k from the inverse of the observed information matrix evaluated at the MLE. Then, Under a null hypothesis H 0 : Following Brookmeyer and Crowley [5], we obtain an asymptotic 100(1 − α) percentile confidence interval for F −1 k (p; k , z) as We will refer to this as a parametric confidence interval for the pth quantile from the direct approach.

Parametric quantile inferences via the gompertz baseline function
Jeong and Fine [19] proposed direct parametric regression modeling of the cumulative incidence function using a generalized odds rate model [9] with the Gompertz [14] baseline distribution. We illustrate the parametric quantile inference from the direct approach when the baseline function h k (t) in the model (4) is specified by the log cumulative hazard function of a Gompertz distribution as in [19] } and the Gompertz baseline function is given by The pth quantile of F k (t; k , z) is given by The parametric estimator of F −1 k (p; k , z) is obtained by replacing k by the MLEˆ k . A 95% parametric confidence interval for F −1 k (p; k , z) is given by From the closed-form of F −1 k (p; k , z), a Wald confidence interval for F −1 k (p; k , z) can be constructed. The variance of F −1 k (p; k , z) can be consistently estimated by the delta method as where ∂F −1 k (p; k , z)/∂ k is a vector of the first derivatives of F −1 k (p; k , z) with respect to k . A 95% Wald confidence interval for F −1 k (p; k , z) is given bŷ We refer to this as a parametric Wald confidence interval for the pth quantile from the direct approach.
As an alternative to the Gompertz baseline function, the conditional Weibull model [7,18] can be specified for the baseline function h k (t) in the model (4). Shi et al. [27] proposed direct regression modeling of the cumulative incidence function assuming a generalized odds rate model [9] and specifying a modified three-parameter logistic function [7] for the baseline function. A complementary log-log function of a modified three-parameter logistic function can be specified for the baseline function h k (t) in the model (4). The details of the parametric quantile inferences via the conditional Weibull and modified logistic baseline functions can be found in Appendix B of Supplmentary Materials.

Numerical studies
We conducted simulation studies to evaluate the performance of the parametric quantile inferences. In the first scenario, we evaluated the performance of the parametric quantile inference from the cause-specific hazard approach. The second scenario evaluates the performance of the parametric quantile inference from the direct approach. The third scenario explores the performance of the parametric quantile inference when a parametric model is misspecified. We conducted 1000 simulations with sample size 400.
In the first scenario, we assumed that the baseline hazard functions for causes 1 and 2 are that of a Weibull distribution as λ 0k (t) = (α/θ k )(t/θ k ) α−1 (k = 1, 2) with the common shape parameter α. A covariate Z was generated from the standard normal distribution. The cause-specific hazard function for cause k under the proportional hazards model is 2, 3, 1, 1). The censoring time C was generated from a uniform distribution U(0, c 1 ), where c 1 is determined by the prespecified censoring proportions. Under this scenario, we had 69% failures from cause 1 and 31% failures from cause 2 in the absence of censoring. We fitted the cause-specific proportional hazards model with the Weibull baseline hazard function to estimate F −1 1 (p; , Z = 0.3) in (3) with = (α, θ 1 , θ 2 , β 1 , β 2 ) = (2, 2, 3, 1, 1). Table 1 shows the biases, empirical variances, averages of the variance estimates (E( var)) of the 0.25th and 0.5th quantile estimates conditional on Z = 0.3, empirical coverage probabilities, and the median widths of 95% confidence intervals for F −1 1 (p; , Z = 0.3) with p = 0.25 and 0.5 from the cause-specific proportional hazards model with the Weibull baseline hazard function. The biases are small and the variance estimates agree with the empirical variances. The empirical coverage probabilities are close to the nominal level.
In the second scenario, the cumulative incidence functions for causes 1 and 2 are assumed to be F k (t; Z) = 1 − exp{− exp(h k (t)) exp(β k Z)} with the Gompertz baseline function h k (t) = log{γ k (exp(ρ k t) − 1)/ρ k } (k = 1, 2). A covariate Z was generated from a uniform distribution U (−3, 0). The true parameter is (γ 1 , ρ 1 , γ 2 , ρ 2 , β 1 , β 2 ) = (0.6, −0.6, −0.4 log(1 − e −1 ), −0.4, 0.2, 0.3). The censoring time C was generated from a uniform distribution U(0, c 2 ), where c 2 is determined by the prespecified censoring proportions. Under this scenario, we had 52% failures from cause 1 and 48% failures from cause 2 in the absence of censoring. We fitted the direct proportional hazards model with the Gompertz baseline function to estimate F −1 1 (p; 1 , Z = −2) in (5) with 1 = (γ 1 , ρ 1 , β 1 ) = (0.6, −0.6, 0.2). Table 2 shows the biases, empirical variances, averages of the variance estimates (E( var)) of the 0.1th and 0.3th quantile estimates conditional on   Finally, we conducted simulations to assess the performance of the parametric quantile inferences when both the cause-specific hazards and direct parametric regression models are misspecified. We assumed that the cumulative incidence functions for causes 1 and 2 are F k (t; A covariate Z was generated from a uniform distribution U (−3, 0). The true parameter is (β 1 , β 2 ) = (0.2, 0.3). Under this scenario, we had 66% failures from cause 1 and 34% failures from cause 2 in the absence of censoring. We fitted the cause-specific proportional hazards model with the Weibull baseline hazard function λ 0k (t; α, θ k ) = (α/θ k )(t/θ k ) α−1 (k = 1, 2) and the direct proportional hazards model with the Gompertz baseline function h k (t; γ k , ρ k ) = log{γ k (exp(ρ k t) − 1)/ρ k } (k = 1, 2). Table 3 shows that the biases, empirical variances, averages of the variance estimates (E( var)), mean square errors of the 0.1th and 0.3th quantile estimates conditional on Z = −2, empirical coverage probabilities, and median widths of 95% confidence intervals for F −1 1 (p; Z = −2) with p = 0.1 and 0.3 from the cause-specific proportional hazards model with the Weibull baseline hazard function and from the direct proportional hazards model with the Gompertz baseline function. The biases and mean square errors are somewhat large under the misspecified models, especially for estimates of F −1 1 (0.3; Z = −2) from the causespecific proportional hazards model with the Weibull baseline hazard function. This is the because the cause-specific proportional hazards model with the Weibull baseline hazard function is badly misspecified at p = 0.3, resulting in larger biases and mean square errors in estimates of F −1 1 (0.3; Z = −2) compared with the direct proportional hazards model with the Gompertz baseline function. More details can be found in Appendix C of Supplementary Materials. The variance estimates from the two parametric models agree with their empirical variances. The median widths of the parametric Wald confidence intervals for the two parametric models are shorter than those from the inversion method. The empirical coverage probabilities are underestimated under the model misspecification.
Overall, simulation studies show that the parametric quantile inferences perform well when the parametric model is appropriately specified. The biases of the quantile estimates are small and variance estimates agree with the empirical variances. The empirical coverage probabilities are close to the nominal level.

Application to breast cancer data
We illustrate our parametric quantile inferences with early stage breast cancer data [13]. Between December 1992 and June 2000, 769 women with early breast cancer were enrolled; 386 were randomly assigned to receive tamoxifen plus breast irradiation and 383 to receive tamoxifen alone. Of these 769 women, 641 were enrolled at the Princess Margaret hospital and 128 at the British Columbia Cancer Agency. Only 641 women enrolled at the Princess Margaret hospital were used to this quantile analysis; 320 were randomly assigned to receive radiation and tamoxifen (RT+Tam) and 321 to receive tamoxifen alone (Tam). The median duration of follow-up was 5.4 years. An event of interest was local relapse but some patients died without having local relapse, yielding two competing risks. The purpose of this study was to investigate the effect of the treatment (RT+Tam versus Tam) on local relapse. We included treatment, tumor size (median: 1.5 cm), and age at the time of randomization (median: 67 years old) as covariates in the analysis.
We fitted the cause-specific proportional hazards models with the Weibull baseline hazard function, exponential baseline hazard function, and log logistic baseline hazard function. For the Weibull cause-specific proportional hazards model, we relaxed the common shape parameter assumption. We also fitted the direct proportional hazards models with the Gompertz baseline function, conditional Weibull baseline function, and modified logistic baseline function. Tables 4 and 5 show the parameter estimates (standard errors) for the parametric models, along with the regression parameter estimates (standard errors) for the Cox proportional hazards model [8] and for the Fine and Gray [12] model. For all parametric and semiparametric models, treatment and tumor size were significant for local relapse while age was not significant for local relapse. Figure 1 shows the cumulative incidence estimates for local relapse from the cause-specific hazards models and from the direct parametric regression models for a woman aged 50 at the time of randomization of tumor size 2.3 cm on tamoxifen plus breast irradiation and on tamoxifen alone. For comparison, we also plotted semiparametric estimates of the cumulative incidence function for local relapse from the cause-specific proportional hazards model [6] and from the proportional subdistribution hazards model [12]. The parametric curves agree well with the semiparametric curves, except for the parametric estimates from the exponential cause-specific proportional hazards model on tamoxifen alone. For all models, the rate of local relapse of the patient in the tamoxifen plus irradiation group is lower than that in the tamoxifen alone group, which indicates that radiotherapy plus tamoxifen significantly reduces the risk of local relapse. To check parametric baseline assumptions, we compared   show that the Weibull baseline estimates, log logistic baseline estimates, Gompertz baseline estimates, and modified logistic baseline estimates are close to the corresponding semiparametric estimates. These findings suggest that the cause-specific proportional hazards models with the Weibull baseline hazard function and log logistic baseline hazard function, and the direct proportional hazards models with the Gompertz baseline function and modified logistic baseline function are adequate for modeling of the cumulative incidence function on the breast cancer data. To compare the goodness of fit of these parametric models, we computed the Akaike information criterion (AIC) [1], where AIC = −2×log-likelihood+2×number of parameters. Based on the AIC in Tables 4 and 5, the Weibull cause-specific proportional hazards model provides a better fit than the exponential and log logistic cause-specific proportional hazards models. The direct Gompertz model has a smaller AIC than the conditional Weibull and modified logistic models. Among the Weibull and Gompertz models, the Weibull cause-specific proportional hazards model provides a better fit than the direct Gompertz proportional hazards model. Table 6 shows the parametric estimates and 95% confidence intervals for quantiles of time to local relapse for the woman on tamoxifen plus irradiation and on tamoxifen alone from the Weibull cause-specific proportional hazards model and from the direct Gompertz proportional hazards model that provide better fits for each approach. On tamoxifen plus irradiation, the 1% and 2% quantile estimates of time to local relapse were about 3 years and 5 years, respectively. On tamoxifen alone, the 1% and 2% quantile estimates of time to local relapse were about 0.3 year and 0.6 year, respectively. Thus, the quantile estimates of time  to local relapse on tamoxifen plus irradiation are roughly seven times as large as those on tamoxifen alone, which indicates that radiotherapy plus tamoxifen has a beneficial effect on local relapse. The parametric confidence intervals for the quantiles of time to local relapse on tamoxifen plus irradiation do not overlap with those on tamoxifen alone. This confirms that radiotherapy plus tamoxifen is significant for local relapse as shown in Tables 4 and 5.

Discussion
We investigated parametric quantile inferences from parametric regression modeling of the cumulative incidence function. We considered the cause-specific hazard and direct approaches for the parametric quantile inferences. We developed the construction of parametric confidence intervals for quantile event times with adjustment for covariates by inverting a test statistic following Brookmeyer and Crowley [5]. When a closed-form of a quantile event time exists, we constructed a Wald confidence interval for a quantile event time using the delta method. The simulation studies showed that the parametric quantile inferences performed well when a parametric model is appropriately specified.
In this paper, we used several different parametric models for the cause-specific hazard approach and for the direct approach. Practitioners often wish to consider several models and then choose the best-fitting one. As shown in the data example, the appropriateness of parametric models for a particular dataset can be assessed by comparing parametric estimates with semiparametric counterparts through graphical checks. The AIC can be used to select a parametric model that provides the best fit, which can be used for estimation and inference of quantile event times of interest.
For parametric quantile inferences, we considered the proportional hazards model [8] for the cause-specific hazard approach and a transformation regression model [11,12] for the direct approach. As suggested by one referee, to improve model flexibility and model misspecification, one may consider parametric quantile inferences under a generalized odds rate model [9] for the cause-specific hazard and direct approaches. This may be a topic for future research.
The proposed quantile inferences were developed based on the cumulative incidence function, which is an improper function since its upper limit is less than one. The pth quantile event time may not exist if the upper limit of the cumulative incidence function is less than p. Therefore, quantile event times based on the cumulative incidence function are restricted to quantile levels lower than the upper limit of the the cumulative incidence function. To avoid this issue, as suggested by one referee, one may consider quantile inferences based on the conditional distribution of the event time given a cause of failure. Investigations along this direction may merit future research but are beyond the scope of this paper.