Goodness-of-fit inference for the additive hazards regression model with clustered current status data

Clustered current status data are frequently encountered in biomedical research and other areas that require survival analysis. This paper proposes graphical and formal model assessment procedures to evaluate the goodness of fit of the additive hazards model to clustered current status data. The proposed test statistics are based on sums of martingale-based residuals. Relevant asymptotic properties are established, and empirical distributions of the test statistics can be simulated using Gaussian multipliers. Extensive simulation studies confirmed that the proposed test procedures work well in practical scenarios. The proposed method applies when failure times within the same cluster are correlated and, in particular, when cluster sizes can be informative about intra-cluster correlations. The method is applied to analyze clustered current status data from a lung tumorigenicity study.


Introduction
Current status data, or case I interval-censored failure time data, originated in demographic studies and are frequently encountered in survival analysis. For this data type, each study subject is observed only once, at some possibly subject-specific observation time.
The only information known about the failure event is whether or not it has occurred by the time of observation, so the exact failure onset time is either left- or right-censored. In the literature, statistical methods have been developed extensively for the analysis of current status data. Among others, Lin et al. [9], Chen et al. [2] and Li et al. [8] considered regression analysis using the additive hazards model. Wang et al. [19] and Hu et al. [7] proposed regression procedures for bivariate current status data under the proportional hazards model. Hu et al. [6], Yang et al. [21] and Lu et al. [12] investigated varying-coefficient partially linear proportional hazards models. In various research fields where current status data are encountered, such as tumorigenicity research [5], epidemiology, demography and social research, current status data can be "clustered" when correlated events occur on the same individual, or when a failure event is assessed among a group of individuals who share the same social or environmental factors. For instance, in epidemiology, similar diseases can be observed among members of the same family, and the disease onset times can be correlated. Moreover, the strength of correlation may depend on the family size.
To incorporate intra-cluster correlations when analyzing clustered current status data, a collection of regression-based statistical methods has been developed. Lin and Wang [11] analyzed clustered current status data under the frailty proportional odds model. Wen and Chen [20] developed an algorithm for nonparametric maximum likelihood estimation with clustered current status data based on a gamma-frailty Cox model. Su and Chi [16] studied marginal regression methods with the additive hazards model. Feng et al. [3] proposed a regression method that corrects the estimation bias incurred by standard current status data regression when cluster-size-informative intra-cluster correlation is ignored. All of the aforementioned methods are derived under the premise that the corresponding models fit the data well. In practice, however, there is no assurance that a model fits without a model checking process. For models associated with right-censored data, Lin et al. [10] and Yin [22] studied model assessment tests for the Cox model and the additive hazards model, respectively, by employing cumulative sums of martingale-based residuals. For current status data, Feng et al. [4] studied a model checking technique for the additive hazards model, but no such procedure has been available for clustered current status data.
One motivating example arises from the clustered current status data of a National Toxicology Program (NTP) 2-year rodent carcinogenicity study of chloroprene, a high-volume production chemical with limited information on its carcinogenic potential [13]. Male and female F344/N rats and B6C3F1 mice were exposed to chloroprene by inhalation for 16 days, 13 weeks, or 2 years. In the 2-year study on B6C3F1 mice, groups of 50 male and 50 female mice were exposed to chloroprene at concentrations of 0, 12.8, 32, or 80 ppm by inhalation, 6 hours per day, 5 days per week, for 2 years. Lung tumors in the B6C3F1 mice were found at death, but the exact tumor onset times were unknown, and thus current status data arose. The detailed study protocol and the main findings were published in the NTP technical report [13]. In addition, the emergence statuses of lung tumors formed clustered current status data, with the number of lung tumor types recorded at each observation time being the cluster size [3]. The inference procedure proposed by Feng et al. [3] assumes the additive hazards model, which specifies the treatment group effects as additive terms on the baseline hazard function of each tumor type. Compared with the popular proportional hazards models, such models can be more intuitive and plausible to interpret [9]. However, the goodness of fit of the additive hazards model was unknown for this set of clustered current status data, owing to the lack of an appropriate model assessment procedure.
In this paper, we propose a class of model assessment procedures to evaluate the assumptions associated with the additive hazards model fitted to clustered current status data. The rest of this paper is organized as follows. As a basis for the model-checking techniques, we first briefly review the additive hazards model for clustered current status data and its inference procedure in Section 2. In Section 3, we propose graphical and numerical model-checking techniques through a class of model assessment procedures, and establish the asymptotic properties of the proposed test statistics. In Section 4, we present results from an extensive simulation study that evaluates the finite sample performance of the proposed method. In Section 5, an illustrative example is provided.

Model and inference
Consider a survival analysis based on current status data from n independent clusters. Let n_i denote the size of the ith cluster (i = 1, ..., n). Let C_ik represent the observation time for subject k in cluster i (k = 1, ..., n_i), and T_ik the unobserved failure time of interest. Let Z_ik(t) be a p-dimensional vector of external covariates observed at time t. For the same i, the T_ik can be correlated across k = 1, ..., n_i; however, subjects from different clusters are assumed to be independent, and we also assume that C_ik is conditionally independent of T_ik given Z_ik(t).
To incorporate potentially informative cluster sizes and intra-cluster correlations, assume that T_ik satisfies the additive hazards model (1), where λ_0(t) denotes the unknown baseline hazard function, β_0 denotes an unknown p-dimensional vector of regression coefficients, and ξ_i represents a cluster-specific latent process that may be related to the cluster size n_i. To estimate the regression parameter β_0, a local square-integrable martingale can be constructed from the counting process associated with C_ik, involving Λ_0(t) = ∫_0^t λ_0(s) ds and the hazard function λ_c(t) of the C_ik's. Then β_0 can be estimated by solving a weighted estimating equation, in which a^{⊗l} denotes 1, a and aa' for l = 0, 1 and 2, respectively. Under some regularity conditions, it was shown that the resulting estimator β̂ is consistent and that √n(β̂ − β_0) converges in distribution to a zero-mean normal distribution. In this inference method, the contribution of each individual to the overall estimating equation is inversely weighted by the corresponding cluster size to account for possibly informative cluster sizes, which corrects the bias that a standard estimating equation incurs by ignoring informative cluster sizes.

Model checking techniques
For model (1), we construct model checking techniques based on the martingale residual M̂_ik(t), which can be viewed as the difference at time t between the observed and the expected number of failures for subject k in cluster i. For a given data set, plotting M̂_ik(t) against the observation times offers a visual check: the more closely M̂_ik(t) fluctuates around 0, the better model (1) fits [17]. However, as noted by Yin [22] among others, such a graphical examination can hardly serve as an objective diagnostic test. To this end, we study a multi-parameter stochastic process W(β̂, t, z), extending the work of Yin [22] and Feng et al. [4], where f(·) is a known bounded vector-valued function. Both z and f(·) are selected according to the model-checking assumption of interest. For instance, following Lin et al. [10], a common practice is to choose f(Z*_ik(s)) = Z*_ik(s) and z = ∞ to check the additive hazards assumption; f(Z*_ik(s)) ≡ 1, t = τ and z_l = ∞ for all l ≠ j (l = 1, ..., p) to assess the linear covariate effect assumption for Z_j; and f(·) ≡ 1 for an omnibus goodness-of-fit assessment. Regardless of these choices, under model (1) the process W(β̂, t, z) is expected to fluctuate randomly around zero. To develop a rigorous test procedure based on W(β̂, t, z), the following theorem establishes the asymptotic property of the stochastic process W(β̂, t, z).

Theorem 3.1: Under model (1) and the regularity conditions presented in the appendix, n^{-1/2}W(β̂, t, z) converges weakly to a zero-mean Gaussian random field with covariance function between (t_1, z_1) and (t_2, z_2) given by E{U_1(t_1, z_1)U_1(t_2, z_2)'}.
The proof of Theorem 3.1 is sketched in the appendix. Intuitively, model assessment can be performed by evaluating the largest departures from an assumed perfect fit. Statistically, this idea can be implemented through supremum statistics of |n^{-1/2}W(β̂, t, z)|. Despite the desirable asymptotic behavior of n^{-1/2}W(β̂, t, z) itself, the distributions of such supremum-based statistics are hard to derive analytically. To this end, we adopt the Gaussian multiplier technique and define a perturbed version Ŵ(β̂, t, z) of W(β̂, t, z), in which {η_1, ..., η_n} is a random sample from the standard normal distribution. For a given data set, a large number of standard normal samples yields a collection of realizations from the distribution of Ŵ(β̂, t, z). The next theorem lays the foundation for constructing a class of formal testing procedures based on the relationship between Ŵ(β̂, t, z) and W(β̂, t, z).

Theorem 3.2: Given the observed data, under model (1) and the regularity conditions presented in the appendix, n^{-1/2}Ŵ(β̂, t, z) converges weakly to the same zero-mean Gaussian random field as n^{-1/2}W(β̂, t, z).
See the appendix for the proof of Theorem 3.2. Theorem 3.2 implies that a test based on any functional of W(β̂, t, z) can be performed by comparing its observed value with the empirical distribution of the same functional of Ŵ(β̂, t, z), the latter given by a large number of realizations generated with Gaussian multipliers. Next, we show how to construct specific test statistics for checking various model assumptions associated with model (1). Based on Theorems 3.1 and 3.2, it can be shown that all the resulting statistical tests are consistent.
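As a concrete illustration of how the multiplier realizations translate into an empirical p-value, the following sketch computes a supremum statistic from per-subject residual paths evaluated on a finite (t, z) grid and compares it with perturbed copies. The helper `multiplier_pvalue` is hypothetical and deliberately simplified: it assumes the residual paths are already in hand and omits the correction term that accounts for the estimation of β̂.

```python
import numpy as np

def multiplier_pvalue(resid_paths, n_perturb=1000, seed=0):
    """Approximate the p-value of a supremum-type statistic by
    Gaussian multiplier resampling (illustrative sketch only).

    resid_paths : (n, m) array; row i is subject i's residual process
                  evaluated on a grid of m (t, z) points.
    """
    rng = np.random.default_rng(seed)
    n, m = resid_paths.shape
    # Observed statistic: sup over the grid of |n^{-1/2} * column sums|.
    obs = np.abs(resid_paths.sum(axis=0)).max() / np.sqrt(n)
    # Perturbed copies: multiply each subject's path by an iid N(0, 1) draw.
    eta = rng.standard_normal((n_perturb, n))
    perturbed = eta @ resid_paths / np.sqrt(n)      # shape (n_perturb, m)
    sup_star = np.abs(perturbed).max(axis=1)
    # Empirical p-value: proportion of perturbed suprema exceeding the observed.
    return float(np.mean(sup_star >= obs))
```

Under the null the perturbed suprema mimic the distribution of the observed supremum, so the returned proportion behaves like a p-value; a systematic departure in the residual paths drives the observed supremum beyond the perturbed ones.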
(i) Model assessment of the additive hazards assumption. To check the additive hazards assumption under model (1), take f(Z*_ik(s)) = Z*_ik(s) and z = ∞. Then W(β̂, t, z) reduces to a score-type process W(β̂, t), and the covariance matrix of n^{-1/2}W(β_0, τ) can be consistently estimated. For j = 1, ..., p, we propose the supremum statistic T_j = sup_t |n^{-1/2}W_j(β̂, t)| for checking the additive hazards contribution of the jth covariate, with perturbed counterpart T̂_j built from the jth component of the perturbed score-type process Ŵ(β̂, t). The p-value can be estimated empirically as the proportion of realizations with T̂_j > t_j, where the realizations of T̂_j are generated using the Gaussian multipliers contained in Ŵ(β̂, t).
In addition, test statistics for the joint additive hazards assumption across all covariates can be obtained by taking the supremum of a norm of the vector process n^{-1/2}W(β̂, t). (ii) Model assessment of the linear assumption of covariate effects. To check whether one (say, the jth) covariate effect enters linearly, take f(·) ≡ 1, t = τ and z_l = ∞ for all l ≠ j. A p-value can then be obtained by comparing sup_z |W_j(β̂, τ, z)| with a large number of realizations of sup_z |Ŵ_j(β̂, τ, z)|, where Ŵ_j(β̂, τ, z) is the jth component of Ŵ(β̂, τ, z) defined by (4).
(iii) An omnibus model assessment against departures of any form. For an omnibus test of the overall fit of model (1), let f(·) ≡ 1, so that W(β̂, t, z) becomes the process W_0(β̂, t, z). We propose the omnibus goodness-of-fit statistic T_0 = sup_{t,z} |n^{-1/2}W_0(β̂, t, z)|. A formal test is obtained by computing the p-value P(T̂_0 > t_0), where T̂_0 is the corresponding supremum of Ŵ_0(β̂, t, z), the perturbed version of W_0(β̂, t, z) as defined by (4).

Remarks:
(1) To accommodate informative cluster sizes, inverse cluster-size weights are employed in the estimating equation and the resulting statistics. This weighting makes each individual's contribution inversely proportional to the corresponding cluster size, which prevents large clusters from being overweighted. (2) For simplicity, we have assumed that the observation time C_ik and the covariate Z_ik are statistically independent. However, the method can be generalized straightforwardly. For example, suppose the hazard function of C_ik depends on Z_ik through a regression model; then a corresponding local square-integrable martingale can be constructed, and based on equation (8), test statistics and the corresponding asymptotics follow in a manner similar to that described earlier in this section.
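The effect of the inverse cluster-size weighting in Remark (1) can be seen in a toy computation, hypothetical and not part of the paper's estimator: weighting each observation by 1/n_i makes every cluster contribute equally, so a weighted average equals the average of cluster-level means rather than being dominated by large clusters.

```python
import numpy as np

def cluster_weighted_mean(values, cluster_ids):
    """Illustrative sketch (not the paper's estimator): weight each
    observation by the inverse of its cluster size, so that every
    cluster contributes equally regardless of how many subjects it has."""
    values = np.asarray(values, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    # Size of each cluster, looked up per observation.
    sizes = {c: int(np.sum(cluster_ids == c)) for c in np.unique(cluster_ids)}
    w = np.array([1.0 / sizes[c] for c in cluster_ids])
    return float(np.sum(w * values) / np.sum(w))
```

For instance, `cluster_weighted_mean([1, 1, 1, 1, 5], ["a", "a", "a", "a", "b"])` returns 3.0, the average of the two cluster means (1 and 5), whereas the unweighted mean 1.8 is pulled toward the four-subject cluster.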

Simulation studies
Extensive simulation studies were conducted to evaluate the finite sample performance of the model checking test statistics proposed in the previous section. To mimic the motivating example, all covariates were assumed to be time-independent. For subject k in cluster i, Z_ik was generated in three ways: (1) from the uniform distribution on (0, 1), (2) from the Bernoulli distribution with success probability 0.5, and (3) from the normal distribution with mean and variance both equal to 0.5. Given the Z_ik's, the failure times T_ik were generated from model (1) with λ_0(t) = 1 or 2, and the observation (censoring) times C_ik were generated from the exponential distribution with unit mean or the uniform distribution on (0, 1). To simulate cluster sizes potentially informative about intra-cluster correlations, each cluster size n_i was generated from the binomial distribution B(6, 0.9) if ξ_i > 0 and from B(6, 0.1) otherwise, where ξ_i was a normal random variable with mean zero and standard deviation 0.4. Zero values of n_i were discarded and regenerated until a non-zero value was obtained. The results below are based on 1000 replications with sample sizes n = 100 or 200 and significance level α = 0.05. Table 1 presents the empirical sizes of the proposed omnibus goodness-of-fit test when β_0 = 0, ±0.1, ±0.2 or ±0.5. The results indicate that the proposed test procedure yields empirical sizes close to the nominal level 0.05 in the situations considered, and, as expected, the agreement strengthens as the sample size increases. To evaluate empirical powers, we first inspected the test of the additive hazards assumption and generated data from an alternative model departing from the additive hazards structure, with λ_0(t) = 1. Table 2 presents the empirical powers for fitting model (1) when β_0 = ±0.6, ±1.2 or ±1.8.
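The informative cluster-size mechanism described above can be sketched as follows; the function name and seed handling are our own, but the distributions match the setup in the text: ξ_i ~ N(0, 0.4²) drives the size distribution, B(6, 0.9) when ξ_i > 0 and B(6, 0.1) otherwise, with zero sizes redrawn.

```python
import numpy as np

def draw_cluster_sizes(n, seed=0):
    """Sketch of the simulation's cluster-size mechanism: the latent
    variable xi_i ~ N(0, 0.4^2) selects the size distribution,
    Binomial(6, 0.9) if xi_i > 0 and Binomial(6, 0.1) otherwise;
    zero sizes are discarded and re-drawn."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, 0.4, size=n)
    sizes = np.empty(n, dtype=int)
    for i in range(n):
        p = 0.9 if xi[i] > 0 else 0.1
        m = rng.binomial(6, p)
        while m == 0:                     # discard zeros and regenerate
            m = rng.binomial(6, p)
        sizes[i] = m
    return xi, sizes
```

Clusters with positive latent effects are systematically larger, which is exactly what makes the cluster sizes informative.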
The results show that the proposed test procedure yields reasonable powers, which increase with the sample size. We then employed the proposed method to test the linear covariate effect assumption, with data generated from model (10): given ξ_i and Z_ik, the failure times were generated from model (10) with λ_0(t) = 1, β = 0.8 and n = 200. The censoring times were generated from the exponential distribution with unit mean. To obtain empirical test sizes and powers, we considered the following two cases: Case 1 (the null model): γ = 0; Case 2 (the alternative model): γ = 1.
The empirical test sizes are 0.053 and 0.046, and the test powers are 0.643 and 0.599, when Z_ik followed the aforementioned normal and uniform distributions, respectively. The results again indicate that the proposed method preserves the type I error at the nominal level and is sufficiently powerful to detect model misspecification when the linear covariate effect assumption is not met. To further examine the graphical method presented in Section 3, we plotted the martingale-based residual process W(β̂, t, z) against 50 realizations of Ŵ(β̂, t, z); the results are displayed in Supplemental Figure 1 for both cases above with n = 200. These plots show that while the bold curves of the observed W(β̂, t, z) fluctuate closely around zero under the null model, under the alternative model the curves exhibit large departures from zero and their ranges exceed those of Ŵ(β̂, t, z).
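A simple numeric counterpart of this graphical check, assuming again that per-subject residual paths on a grid are available, counts how often the observed curve escapes the pointwise envelope of the perturbed realizations; `envelope_check` is a hypothetical helper, not the paper's code.

```python
import numpy as np

def envelope_check(resid_paths, n_curves=50, seed=1):
    """Fraction of grid points at which the observed residual curve
    escapes the pointwise envelope of perturbed multiplier curves
    (illustrative sketch of the graphical diagnostic)."""
    rng = np.random.default_rng(seed)
    n, _ = resid_paths.shape
    observed = resid_paths.sum(axis=0) / np.sqrt(n)   # observed curve
    eta = rng.standard_normal((n_curves, n))
    curves = eta @ resid_paths / np.sqrt(n)           # (n_curves, m)
    lower, upper = curves.min(axis=0), curves.max(axis=0)
    return float(np.mean((observed < lower) | (observed > upper)))
```

Under the null the observed curve stays mostly inside the envelope; under a misspecified model it escapes at most grid points, mirroring the visual pattern described for Supplemental Figure 1.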
In addition, we assessed the performance of the omnibus goodness-of-fit test when data were generated under various other setups. For example, Tables 3-5 present the empirical test sizes and powers for data generated under the same setups as Tables 1 and 2 except for a few changes representing different clustering effects or multiple covariates. More specifically, for Tables 3 and 4, ξ_i was generated from zero-mean normal distributions with various standard deviations (σ) corresponding to different cluster effects: the smaller σ is, the less variable the cluster-specific baseline hazards are, and thus the weaker the clustering effects. Table 5 presents the empirical test sizes and powers for the same setups as Tables 1 and 2 but with Z_ik = (Z_1ik, Z_2ik, Z_3ik, Z_4ik, Z_5ik)', where the C_ik's were generated from the exponential distribution with unit mean, λ_0(t) = 1, and the elements of Z_ik were assumed to follow a multivariate normal distribution. Our findings from these additional simulation setups are all consistent with those obtained earlier.
In assessing the omnibus goodness of fit of model (1) with clustered current status data, one question is whether the potentially informative clustering effects could be ignored so that an existing model assessment procedure could be employed. To address this question empirically, we applied the procedure of Feng et al. [4] to the data generated for Tables 1 and 2, with the empirical test sizes and powers presented in Supplemental Tables 1 and 2. Compared with Tables 1 and 2, Supplemental Tables 1 and 2 show empirical test sizes that fluctuate further from the nominal level 0.05, and the empirical test powers are lower overall. These comparisons indicate that ignoring the clustering effects can lead to less reliable or biased model assessment results. Beyond the results above, other setups with different sample sizes were generated to assess the performance of the proposed test statistics. For example, Supplemental Tables 3 and 4 include the empirical sizes and powers for the same setups as Tables 1 and 2 but with n = 50. All results lead to conclusions consistent with those above.

Case study
In this section, we applied the proposed test procedure to analyze the clustered current status data on lung tumors from the B6C3F1 mice studied in the NTP tumorigenicity research, as described in the introduction. In this study, current status data arose because tumor emergence was observed only at the times of natural death or sacrifice, while the exact tumor onset times were unknown. Three types of lung tumors were studied: Alveolar/Bronchiolar Adenoma (Type A/B A), Alveolar/Bronchiolar Carcinoma (Type A/B C) and Mediastinum (Type M). Emergence statuses were complete for Types A/B A and A/B C but partly missing for Type M. The number of lung tumor types with statuses recorded at each observation time defined the cluster size.
To study the effect of high-concentration chloroprene (80 ppm) on the three types of tumor formation in comparison with the control group (0 ppm), Feng et al. [3] proposed an inference procedure tailored to the clustered current status data structure assuming model (1). To assess the adequacy of model (1), the proposed test procedure was applied for an omnibus goodness of fit. Males and females were evaluated separately. For each individual in each gender group, the covariate was coded 1 if from the high-concentration chloroprene group and 0 otherwise. Our test yielded p-values of 0.201 and 0.018 for the male and female groups, respectively. The results indicate that model (1) fits the male data well, consistent with the inference performed by Feng et al. [3], but is not suitable for the female data.

Discussion
This paper discusses a class of model assessment procedures for evaluating the appropriateness of the additive hazards model with clustered current status data. Based on cumulative sums of martingale-based residuals over time and covariates, test statistics are proposed for three specific model checking tasks: the additive hazards assumption, the linear covariate effect assumption and the omnibus goodness of fit. Relevant asymptotic properties are established, and empirical distributions of the test statistics can be simulated using Gaussian multipliers. Extensive simulation studies confirmed that the proposed test procedures work well in practical scenarios. The procedures apply when failure times within the same cluster are correlated and, in particular, when cluster sizes can be informative about intra-cluster correlations. The Matlab function for the proposed procedure is available from the authors upon request. The proposed model checking method applies only to clustered current status data, or case I interval-censored data, and it would be desirable to generalize it to scenarios with general interval-censored data, for which both observation times associated with interval censoring need to be taken into account and the cumulative sums of martingale-based residuals need to be modified. Within the current status context, we have assumed that the distribution of the observation time C_ik does not depend on the covariate Z_ik; when this assumption is not met in practice, additional modeling may be necessary for C_ik and the test procedure should be updated accordingly. With respect to the parametric form of covariate effects, there are several other directions for future work. For example, it may be more appropriate to consider a more general class of failure time models, such as the linear transformation models. Sometimes important covariates may contain missing information and only auxiliary covariates are available.
Although the graphical model checking idea presented in Section 3 could still work, it would be more challenging to derive formal model checking procedures for such situations.
Appendix

Proof: First, we note that n^{-1/2}W(β̂, t, z) is asymptotically normal for fixed t and z. Using the consistency of β̂ and a Taylor series expansion of U(β̂) around β_0, and noting that the estimated slope matrix in this expansion is positive definite in probability, a Taylor series expansion of W(β̂, t, z) around β_0 shows that, under the regularity conditions (C1)-(C3), n^{-1/2}W(β̂, t, z) is asymptotically equivalent to n^{-1/2}W̃(t, z) = n^{-1/2}Σ_{i=1}^n U_i(t, z). For any fixed t and z, the U_i(t, z), i = 1, ..., n, are independent random vectors with zero mean and bounded variance. By the multivariate central limit theorem, n^{-1/2}W̃(t, z) converges to a zero-mean normal distribution, and hence so does n^{-1/2}W(β̂, t, z). To complete the proof of Theorem 3.1, it then suffices to prove the tightness of n^{-1/2}W(β̂, t, z). Without loss of generality, both the covariates and the censoring times are assumed to be bounded in [-1, 1], and we establish the tightness of n^{-1/2}W(β̂, t, z) on [0, τ] × [-1, 1]^p. For the perturbed process, we consider n^{-1/2}Σ_{i=1}^n η_i{J_i(β_0, t) − Ĵ_i(β̂, t)}, which can be decomposed into three terms. The third term converges to zero in probability by the consistency of β̂. Using the strong representation theorem ([14, Theorem 9.4]) and Lemma A.3 of Bilias et al. [1], as n → ∞, the first term converges to zero in probability by Lemma A.3 in Spiekerman and Lin [15] and equation (A4); by Theorem 2 in Spiekerman and Lin [15], the consistency of the relevant estimators and equation (A4), the second term also converges to zero in probability. Thus n^{-1/2}Σ_{i=1}^n η_i{J_i(β_0, t) − Ĵ_i(β̂, t)} → 0 in probability as n → ∞. Finally, defining W*(t, z) as the corresponding limiting version, we obtain that n^{-1/2}Ŵ(β̂, t, z) − n^{-1/2}W*(t, z) → 0 in probability as n → ∞. This completes the proof.