Nonparametric estimation and test of conditional Kendall's tau under semi-competing risks data and truncated data

In this article, we focus on estimation and testing of conditional Kendall's tau under semi-competing risks data and truncated data. We apply the inverse probability censoring weighted (IPCW) technique to construct an estimator of conditional Kendall's tau, $\tau_c$. We then provide a test statistic for $H_0: \tau_c = \tau_0$, where $\tau_0 \in (-1, 1)$. When two random variables are quasi-independent, $\tau_c = 0$; thus, $H_0: \tau_c = 0$ is a proxy for quasi-independence. Tsai [12], and Martin and Betensky [10] considered the testing problem for quasi-independence. Via simulation studies, we compare the three test statistics for quasi-independence and examine the finite-sample performance of the proposed estimator and the suggested test statistic. Furthermore, we provide the large-sample properties of the proposed estimator. Finally, we provide two real data examples for illustration.


Introduction
Let $(X, Y)$ be a pair of possibly correlated failure times. Bivariate survival analysis has attracted substantial attention because of its wide use in biomedical applications. In the past few years, applications have been extended to multiple event times that are subject to dependent censoring or truncation, such as semi-competing risks data and truncated data. For semi-competing risks data, there are two types of events, a terminal event and a non-terminal event. The time to the terminal event, $Y$, may censor the time to the non-terminal event, $X$, but not vice versa. Tsai [10] considered truncated data in which one variable $X$ truncates the other variable $Y$ and the two variables are possibly correlated; the pair $(X, Y)$ is therefore observed only when $X < Y$. In bivariate survival analysis, we are often interested in measuring the association between $X$ and $Y$. Kendall's tau is a commonly used association measure in lifetime models. However, when the data are incomplete because of dependent censoring or truncation, as with semi-competing risks data or truncated data, Kendall's tau is non-identifiable without extra assumptions. Several authors have discussed the association of $(X, Y)$ for semi-competing risks data and truncated data under suitable assumptions, such as Fine et al. [2], Wang [11], and Lakhal et al. [6] for semi-competing risks data, and Chaieb et al. [1] for truncated data. All of these works use a copula model to specify the dependence.
Tsai [10] proposed the conditional Kendall's tau, $\tau_c$, to measure the association of $(X, Y)$ on the comparable set for truncated data. Although $\tau_c$ cannot capture all the association information of $(X, Y)$, it reveals part of it, and through this partial information we can understand the relation between $X$ and $Y$ under the restriction imposed by incomplete data. Following Tsai's definition, we can likewise define the conditional Kendall's tau for semi-competing risks data. When $X$ and $Y$ are independent on the comparable set, we say that $X$ and $Y$ are quasi-independent, denoted $X \perp_Q Y$. Tsai [10] and Martin and Betensky [8] provided statistical approaches to test the hypothesis $H_0: \tau_c = 0$, which is a proxy for quasi-independence under truncated data.
In this article, we focus on estimation and testing of conditional Kendall's tau, $\tau_c$, under semi-competing risks data and truncated data. $\tau_c$ reveals the association of $(X, Y)$ on the comparable set, and so provides as much association information about $(X, Y)$ as possible under the restriction caused by incomplete data. For the testing problem, we consider the more general hypothesis $H_0: \tau_c = \tau_0$, where $\tau_0 \in (-1, 1)$, instead of $H_0: \tau_c = 0$ considered by Tsai [10] and Martin and Betensky [8].
In Section 2, we introduce two data structures, semi-competing risks data and truncated data, and the definition of conditional Kendall's tau. In Section 3, we use the IPCW technique to construct an estimator of conditional Kendall's tau, derive the large sample properties, and propose a test statistic for conditional Kendall's tau under semi-competing risks data and truncated data. Then, we examine the finite-sample performance of our proposed estimator and testing procedure in Section 4. In Section 5, we analyze two real data sets for illustration. Finally, we give the concluding remarks in Section 6.

Data and conditional Kendall's tau
Let $(X, Y)$ be two lifetime variables and $C$ an external right-censoring variable. We discuss the two data structures below.

Semi-competing risks data
Semi-competing risks data arise when one event is terminal, such as death, while the other is non-terminal, such as disease progression. Let $X$ be the time to the non-terminal event and $Y$ the time to the terminal event. $Y$ may censor $X$, but $Y$ is not censored by $X$. Let $C$ be a right-censoring variable that is independent of $X$ and $Y$. Define the observed variables $S = \min(X, Y, C)$, $T = \min(Y, C)$, $\delta_X = I(X \le Y \wedge C)$, and $\delta_Y = I(Y \le C)$, where $I(\cdot)$ is an indicator function and $Y \wedge C = \min(Y, C)$. The observable data are $(S, T, \delta_X, \delta_Y)$. For example, in the study of leukemia patients receiving bone marrow transplants, let $X$ be the time of leukemia relapse, $Y$ the time of death, and $C$ the right-censoring time. When death occurs before relapse, we cannot observe $X$; but if a patient relapses, we can still observe the death time $Y$ provided $Y \le C$. So the terminal event censors the non-terminal event, but not vice versa.
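The mapping from the latent times to the observable data can be sketched in a few lines of Python. This is only an illustration: the function name `observe_semicompeting` and the numeric values are hypothetical, and the definitions of $S$, $T$, $\delta_X$, and $\delta_Y$ are those used for the bone marrow transplants data later in the article.

```python
def observe_semicompeting(x, y, c):
    """Map latent times (x, y, c) to the observable tuple (S, T, delta_X, delta_Y)."""
    s = min(x, y, c)                # first of non-terminal event, terminal event, censoring
    t = min(y, c)                   # terminal event or censoring, whichever comes first
    delta_x = int(x <= min(y, c))   # 1 if the non-terminal event is observed
    delta_y = int(y <= c)           # 1 if the terminal event is observed
    return s, t, delta_x, delta_y


# Relapse at 2, death at 5, censoring at 8: both events are observed.
relapse_then_death = observe_semicompeting(x=2.0, y=5.0, c=8.0)

# Death at 3 precedes relapse at 6: the relapse time is censored by death.
death_before_relapse = observe_semicompeting(x=6.0, y=3.0, c=8.0)
```

In the second case $S = T = 3$ and $\delta_X = 0$, which is the one-directional censoring that distinguishes semi-competing risks data from ordinary bivariate censoring.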

Truncated data
Tsai [10] considered the data structure in which one variable truncates the other and the two variables are possibly correlated. Let $X$ be the left-truncation time and $Y$ the failure time. We can observe a pair $(X, Y)$ only if $X < Y$; such data are called truncated data. Let $C$ be a right-censoring variable, and assume that $C$ and $Y$ are conditionally independent given $X$. Define the observable variables $T = \min(Y, C)$ and $\delta = I(Y \le C)$. The observable data are $(X, T, \delta)$. In the Channing House retirement center example [5], subjects are observed only if they live long enough to enter the retirement house. Let $X$ be the age at entry to the retirement house, $Y$ the lifetime, and $C$ the right-censoring time. We observe a subject only when the subject enters the retirement house, that is, when $X < Y$.
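The observation scheme for truncated data can be illustrated with a short Python sketch. The function name `observe_truncated` and the ages used are hypothetical; the censoring time is written as $C = X + D$ with a follow-up time $D$, which is the decomposition the article adopts in its estimation section.

```python
def observe_truncated(x, y, d):
    """Return the observable (X, T, delta), or None when the subject is truncated.

    x: truncation time, y: failure time, d: follow-up time after entry,
    so that the censoring time is c = x + d (an assumption taken from the article).
    """
    if not x < y:
        return None                 # subject never enters the sample
    c = x + d
    t = min(y, c)                   # failure or censoring, whichever comes first
    delta = int(y <= c)             # 1 if the failure time is observed
    return x, t, delta


censored_subject = observe_truncated(x=70.0, y=85.0, d=10.0)
observed_failure = observe_truncated(x=70.0, y=75.0, d=10.0)
truncated_subject = observe_truncated(x=80.0, y=75.0, d=10.0)
```

The third call returns `None`: a subject who dies before the entry age leaves no record at all, which is what separates truncation from censoring.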

Conditional Kendall's tau
In bivariate survival analysis, we are often interested in the association between two variables $X$ and $Y$. Kendall [4] proposed Kendall's tau to measure the dependence between $X$ and $Y$. Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be two independent vectors with the same bivariate distribution as $(X, Y)$. Kendall's tau is defined as
$$\tau = E[\operatorname{sgn}\{(X_1 - X_2)(Y_1 - Y_2)\}],$$
where $\operatorname{sgn}(x) = 1$ for $x > 0$, $\operatorname{sgn}(x) = -1$ for $x < 0$, and $\operatorname{sgn}(x) = 0$ for $x = 0$. The value of Kendall's tau lies between $-1$ and $1$. Because of its rank-invariance property, it is suitable for measuring association in lifetime models. When $X$ and $Y$ are independent, the probability of concordance equals the probability of discordance, so $\tau = 0$. Therefore, the null hypothesis $H_0: \tau = 0$ is a proxy for the null hypothesis $H_0: X \perp Y$.
In some situations, the data are incomplete because of dependent censoring or truncation, as with semi-competing risks data or truncated data. For data of this kind, the pairs $(X_1, Y_1)$ and $(X_2, Y_2)$ are comparable only on the comparable set, $\Omega_{12}$, and Kendall's tau is therefore non-identifiable. Tsai [10] proposed the conditional Kendall's tau for truncated data, which is defined as
$$\tau_c = E[\operatorname{sgn}\{(X_1 - X_2)(Y_1 - Y_2)\} \mid \Omega_{12}],$$
where $\Omega_{12}$ is the condition that $(X_1, Y_1)$ and $(X_2, Y_2)$ are comparable under incomplete data. For semi-competing risks data, the comparable set is $\Omega^s_{12} = \{\tilde{X}_{12} < \tilde{Y}_{12}\}$, where $\tilde{X}_{12} = \min(X_1, X_2)$ and $\tilde{Y}_{12} = \min(Y_1, Y_2)$. For truncated data, the comparable set is $\Omega^t_{12} = \{\breve{X}_{12} < \tilde{Y}_{12}\}$, where $\breve{X}_{12} = \max(X_1, X_2)$.
$\tau_c$ measures the association between $X$ and $Y$ on the comparable set $\Omega_{12}$, and its value lies between $-1$ and $1$. Although $\tau_c$ cannot provide all the association information of $(X, Y)$, it presents as much of that information as possible under the restriction imposed by incomplete data. When $X$ and $Y$ are independent under the condition $\Omega_{12}$, we say that $X$ and $Y$ are quasi-independent, denoted $X \perp_Q Y$. Under $X \perp_Q Y$, it can be shown that $\tau_c = 0$. Therefore, $H_0: \tau_c = 0$ is a proxy for $H_0: X \perp_Q Y$.
In the presence of right-censoring, we cannot estimate $\tau_c$ directly because the pairs $(X_1, Y_1)$ and $(X_2, Y_2)$ are comparable only on the comparable and orderable set, $A_{12}$, which is contained in $\Omega_{12}$.
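For intuition, the sample analogue of $\tau_c$ on fully observed pairs (no right-censoring) can be sketched as follows. This is a minimal illustration rather than the estimator proposed in this article; the function names and the toy data are hypothetical, and the two comparable sets follow the definitions given above.

```python
from itertools import combinations


def sgn(v):
    """Sign function: 1 for v > 0, -1 for v < 0, 0 for v == 0."""
    return (v > 0) - (v < 0)


def comparable(p, q, kind):
    """Check whether two (x, y) pairs fall in the comparable set for the given data type."""
    (x1, y1), (x2, y2) = p, q
    y_min = min(y1, y2)
    if kind == "semi-competing":
        return min(x1, x2) < y_min
    if kind == "truncated":
        return max(x1, x2) < y_min
    raise ValueError("kind must be 'semi-competing' or 'truncated'")


def conditional_tau(pairs, kind):
    """Average concordance sign over comparable pairs only (complete data assumed)."""
    total, count = 0, 0
    for p, q in combinations(pairs, 2):
        if comparable(p, q, kind):
            total += sgn((p[0] - q[0]) * (p[1] - q[1]))
            count += 1
    return total / count if count else float("nan")


toy_pairs = [(1, 5), (2, 4), (6, 7)]
tau_semi = conditional_tau(toy_pairs, "semi-competing")
tau_trunc = conditional_tau(toy_pairs, "truncated")
```

With these toy pairs, all three pairings are comparable under the semi-competing definition, whereas only the first pairing is comparable under the truncated definition, so the two conditional measures differ on the same data.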

Estimation of conditional Kendall's tau
The existing estimator for semi-competing risks data is valid for estimating $\tau_c$ only when $X \perp_Q Y$. We therefore want to develop an appropriate estimator of conditional Kendall's tau for the general situation under semi-competing risks data and truncated data. To do so, we use the inverse probability censoring weighted (IPCW) technique [7] to modify the estimator.
Under semi-competing risks data with a right-censoring variable, we can determine the concordance or discordance of two pairs only on the comparable and orderable set, $A^s_{12} = \{\tilde{X}_{12} < \tilde{Y}_{12} < \tilde{C}_{12}\}$. We apply the survey sampling technique [3] to correct the bias in the estimation of $\tau_c$: each comparable $(i, j)$ pair is assigned a selection probability $p^s_{ij}$, and by the IPCW technique we propose an estimator $\hat{\tau}^s_c$ of $\tau_c$ for semi-competing risks data (Equation (4)), in which the estimate $\hat{p}^s_{ij}$ corrects the selection bias introduced by censoring.
Under truncated data, Martin and Betensky [8] provided a consistent estimator of $\tau_c$, denoted $\hat{\tau}^t_{c,m}$. However, that estimator is valid for estimating $\tau_c$ only when $X \perp_Q Y$. We would therefore like to develop an estimator of $\tau_c$ under truncated data for the general situation. The method parallels the procedure for semi-competing risks data: we define an individual selection probability $p^t_{ij}$ for each comparable $(i, j)$ pair. Because $C > X$ always holds, let $C = X + D$ and assume that $D$ is independent of $(X, Y)$. This assumption is reasonable because $D$ can be viewed as the follow-up time after entering the study, which can be assumed independent of the truncation time and the lifetime. Under this assumption, $p^t_{ij}$ can be expressed through the distribution of $D$, and by the IPCW technique we estimate $\tau_c$ for truncated data by $\hat{\tau}^t_c$ (Equation (5)).
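To make the weighting idea concrete, the following Python sketch shows one common form of an IPCW-weighted concordance estimator for semi-competing risks data. It is an illustration under stated assumptions, not necessarily the exact estimator of this article: it assumes $\tilde{C}_{ij} = \min(C_i, C_j)$, independent censoring, and a selection probability estimated by $\hat{G}(\tilde{Y}_{ij}-)^2$, where $\hat{G}$ is the Kaplan-Meier estimate of $P(C > t)$. The function names are hypothetical.

```python
from itertools import combinations


def sgn(v):
    return (v > 0) - (v < 0)


def km_censoring_survival(times, delta_y):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating delta_y == 0 as the 'event'."""
    steps, surv = [], 1.0
    for u in sorted({t for t, d in zip(times, delta_y) if d == 0}):
        at_risk = sum(1 for t in times if t >= u)
        events = sum(1 for t, d in zip(times, delta_y) if t == u and d == 0)
        surv *= 1.0 - events / at_risk
        steps.append((u, surv))

    def g_left(t):
        value = 1.0
        for u, s in steps:
            if u < t:               # only censoring times strictly before t
                value = s
        return value

    return g_left


def ipcw_tau_semicompeting(data):
    """data: list of (S, T, delta_X, delta_Y). Returns a weighted concordance estimate."""
    g_left = km_censoring_survival([r[1] for r in data], [r[3] for r in data])
    num = den = 0.0
    for a, b in combinations(data, 2):
        first_s = a if a[0] <= b[0] else b      # record holding min(S_i, S_j)
        first_t = a if a[1] <= b[1] else b      # record holding min(T_i, T_j)
        orderable = (first_s[2] == 1 and first_t[3] == 1
                     and first_s[0] < first_t[1])
        if not orderable:
            continue
        weight = 1.0 / g_left(first_t[1]) ** 2  # inverse of assumed selection probability
        num += weight * sgn((a[0] - b[0]) * (a[1] - b[1]))
        den += weight
    return num / den if den > 0 else float("nan")


toy = [(1.0, 4.0, 1, 1), (2.0, 3.0, 1, 1), (5.0, 5.0, 0, 0), (3.5, 3.5, 0, 0)]
tau_hat = ipcw_tau_semicompeting(toy)
```

In the toy data, the pair whose earlier death occurs at time 4 receives weight $1/(2/3)^2 = 2.25$ because one subject was censored at 3.5; pairs decided before any censoring keep weight 1. Without censoring all weights equal 1 and the estimate reduces to the unweighted proportion of concordant minus discordant orderable pairs.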

Large sample properties
In this section, we provide the asymptotic properties of our proposed estimator. To derive the large-sample properties, we assume the following conditions.

C1: For semi-competing risks data, a condition under which $p^s_{ij}$ is bounded away from zero for all $(i, j)$ with probability 1.

C2: For truncated data, assume that $Y - X$ falls inside $[0, L_{XY}]$ and $P(D > L_{XY}) > \epsilon_2 > 0$ for some constant $\epsilon_2$.

Conditions C1 and C2 prevent $p^*_{ij}$ ($* = s$ or $t$) from being zero, so that no zero appears in a denominator in the derivation of the asymptotic properties. In practical applications, however, the conditions are not needed because $\hat{p}^s_{ij}$ and $\hat{p}^t_{ij}$ must be positive. The following theorem gives the asymptotic properties of our suggested estimator.
Theorem 1 Under condition C1 for semi-competing risks data and C2 for truncated data, $\hat{\tau}^*_c$ is a consistent estimator of $\tau_c$ and is asymptotically normally distributed.
The detailed proof for semi-competing risks data is given in the appendix of the supplemental materials. The proof for truncated data is similar and is omitted.
Since the asymptotic variance of $\hat{\tau}^*_c$, $* = s$ or $t$, is difficult to estimate, we use the jackknife approach to estimate the variance.
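The jackknife step can be sketched generically. The code below uses the standard delete-one jackknife variance formula, $\frac{n-1}{n}\sum_i (\hat{\theta}_{(-i)} - \bar{\theta})^2$, which is assumed here to be the form intended; `jackknife_variance` and `sample_mean` are hypothetical names, and any estimator that accepts a list of records (for example an estimator of $\tau_c$) could be passed in place of the mean.

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate for estimator(data)."""
    n = len(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((v - center) ** 2 for v in leave_one_out)


def sample_mean(values):
    return sum(values) / len(values)


# Sanity check on a case with a known answer: for the sample mean,
# the jackknife variance equals the usual s^2 / n.
var_of_mean = jackknife_variance([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean)
```

For the five values above, $s^2 = 2.5$ and $n = 5$, so the jackknife variance of the mean is 0.5. The square root of the jackknife variance of $\hat{\tau}^*_c$ would play the role of $\hat{\sigma}$ in a test statistic.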

Testing for conditional Kendall's tau
Tsai [10] suggested an approach to test whether $X$ and $Y$ are quasi-independent. Since $X \perp_Q Y$ implies $\tau_c = 0$, $H_0: \tau_c = 0$ is a proxy for $H_0: X \perp_Q Y$. Tsai tested $H_0: \tau_c = 0$ by a U-statistic with an exact variance formula. Martin and Betensky [8] also provided a test for $H_0: \tau_c = 0$ by a U-statistic with an asymptotic variance. In Section 3.1, we proposed an estimator of $\tau_c$, and in Section 3.2 we provided its asymptotic properties. In this section, we suggest a test statistic based on $\hat{\tau}_c$. Our statistic can test not only $H_0: \tau_c = 0$ but also $H_0: \tau_c = \tau_0$, where $\tau_0 \in (-1, 1)$.
Under semi-competing risks data, our estimator of $\tau_c$ in Section 3.1 is $\hat{\tau}^s_c$, which is shown in (4). For $H_0: \tau_c = \tau_0$, our test statistic is
$$T^s = \frac{\hat{\tau}^s_c - \tau_0}{\hat{\sigma}}, \tag{6}$$
where $\hat{\sigma}$ is the jackknife estimator of the standard deviation. By the theorem in Section 3.2 and Theorem 2.1 in Shao and Tu [9], we can show that $T^s \xrightarrow{d} N(0, 1)$ under $H_0$, and we reject $H_0$ when $|T^s| > z_{\alpha/2}$, where $\alpha$ is the type I error and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
Under truncated data, our estimator of $\tau_c$ is $\hat{\tau}^t_c$, as shown in Equation (5). For $H_0: \tau_c = \tau_0$, our test statistic is
$$T^t = \frac{\hat{\tau}^t_c - \tau_0}{\hat{\sigma}}, \tag{7}$$
where $\hat{\sigma}$ is the jackknife estimator of the standard deviation. By the theorem in Section 3.2 and Theorem 2.1 in Shao and Tu [9], we have that $T^t \xrightarrow{d} N(0, 1)$ under $H_0$, and we reject $H_0$ when $|T^t| > z_{\alpha/2}$.
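As an illustration of how such a test might be carried out, the sketch below standardizes an estimate of $\tau_c$ by a standard deviation estimate and computes a two-sided p-value from the standard normal distribution. The two-sided rejection rule is an assumption of this sketch, and the function name `tau_test` and the input numbers are hypothetical.

```python
import math


def tau_test(tau_hat, sigma_hat, tau0=0.0, alpha=0.05):
    """Wald-type test of H0: tau_c = tau0 using an asymptotic N(0, 1) reference."""
    z = (tau_hat - tau0) / sigma_hat
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < alpha


# Estimate three standard deviations away from tau0 = 0: reject at the 5% level.
z_far, p_far, reject_far = tau_test(tau_hat=0.3, sigma_hat=0.1)

# Estimate one standard deviation away from tau0 = 0.5: do not reject.
z_near, p_near, reject_near = tau_test(tau_hat=0.6, sigma_hat=0.1, tau0=0.5)
```

Setting `tau0=0.0` gives the proxy test for quasi-independence, while other values of `tau0` in $(-1, 1)$ give the more general hypothesis considered in this article.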

Simulation studies
In this section, we examine the finite-sample performance of our proposed estimator and testing procedure. We consider two data structures, semi-competing risks data and truncated data, and simulate under three dependence structures: the Clayton copula, the Frank copula, and the beta frailty model. Furthermore, we compare our test statistic for quasi-independence with those of Tsai [10] and Martin and Betensky [8].

Semi-competing risks data
For semi-competing risks data, we consider three dependence structures: the Clayton copula, the Frank copula, and the beta frailty model. For the Clayton and Frank copulas, the joint survival function of $X$ and $Y$ is written in terms of the marginal survival functions $S_X(x)$ and $S_Y(y)$ of $X$ and $Y$ and the corresponding association parameter $\alpha$. For the frailty model, the joint survival function of $X$ and $Y$ is written in terms of the density function $f_r(r)$ of a frailty variable $r$; here, $r$ follows $\mathrm{Beta}(a, b)$.
Let $X$ follow an exponential distribution with mean 0.6 and $Y$ an exponential distribution with mean 1. For the Clayton and Frank copulas, $C$ is uniform on $(0, 6)$; for the beta frailty model, $C$ is uniform on $(0, 15)$, a different choice made to control the censoring percentage. The sample size is 100 and the number of replications is 1000. For estimation, we set $\tau_c$ from 0.1 to 0.8 for the Clayton copula, from $-0.8$ to 0.8 for the Frank copula, and from 0.1 to 0.5 for the beta frailty model. For the Clayton copula, $\tau_c$ is set according to the relation $\tau = \tau_c = \alpha/(\alpha + 2)$; for the Frank copula and the beta frailty model, we compute the approximate true $\tau_c$ from a large number of $(X_i, Y_i)$. The results are given in Tables 1 and 2. For the testing procedure, let $\tau_0 = 0.5$ for the Clayton copula; $\tau_0 = 0.5$ and $-0.5$ for the Frank copula; and $\tau_0 = 0.3$ for the beta frailty model. The results are presented in Figure 1.
The results show that the biases and mean squared errors are very small and that the coverage probability of the nominal 95% confidence interval is near 0.95. Furthermore, the average jackknife standard deviation is close to the empirical standard deviation, and the suggested test statistic performs well.
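A data-generating step of this kind can be sketched for the Clayton case. The sketch assumes the common Clayton form $C(u, v) = (u^{-\alpha} + v^{-\alpha} - 1)^{-1/\alpha}$, which is consistent with the relation $\tau = \alpha/(\alpha + 2)$ stated above, applied to the marginal survival functions; the conditional-inverse sampler and all function names are choices made for this illustration, not details taken from the article.

```python
import math
import random
from itertools import combinations


def clayton_alpha_from_tau(tau):
    """Invert tau = alpha / (alpha + 2) for tau in (0, 1)."""
    return 2.0 * tau / (1.0 - tau)


def sample_clayton_pair(alpha, rng):
    """Draw (U, V) from the assumed Clayton copula by the conditional-inverse method."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids raising 0 to a negative power
    w = 1.0 - rng.random()
    v = (u ** (-alpha) * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
    return u, v


def simulate_semicompeting(n, tau, seed=0):
    """Return latent (X, Y) pairs and observed (S, T, delta_X, delta_Y) records."""
    rng = random.Random(seed)
    alpha = clayton_alpha_from_tau(tau)
    latent, observed = [], []
    for _ in range(n):
        u, v = sample_clayton_pair(alpha, rng)   # treated as S_X(X) and S_Y(Y)
        x = -0.6 * math.log(u)                   # exponential with mean 0.6
        y = -1.0 * math.log(v)                   # exponential with mean 1
        c = rng.uniform(0.0, 6.0)                # censoring, uniform on (0, 6)
        latent.append((x, y))
        observed.append((min(x, y, c), min(y, c), int(x <= min(y, c)), int(y <= c)))
    return latent, observed


def kendall_tau(pairs):
    """Plain sample Kendall's tau on complete pairs."""
    total = sum(((a[0] > b[0]) - (a[0] < b[0])) * ((a[1] > b[1]) - (a[1] < b[1]))
                for a, b in combinations(pairs, 2))
    n = len(pairs)
    return total / (n * (n - 1) / 2)


latent, observed = simulate_semicompeting(n=300, tau=0.5, seed=1)
tau_latent = kendall_tau(latent)
```

Because $X$ and $Y$ are both decreasing transforms of $(U, V)$, concordance is preserved, so the sample Kendall's tau of the latent pairs should be close to the target value of 0.5; the observed records are what an estimator of $\tau_c$ would actually receive.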

Truncated data
For truncated data, we also consider the three dependence structures: the Clayton copula, whose specification here involves $F_X(x)$, the cumulative distribution function of $X$; the Frank copula; and the frailty model with $r \sim \mathrm{Beta}(a, b)$. Let $X$ follow an exponential distribution with mean 0.6 and $Y$ an exponential distribution with mean 1. For the Clayton and Frank copulas, $D$ is uniform on $(0, 4)$; for the beta frailty model, $D$ is uniform on $(0, 6)$. The sample size is 100 and the number of replications is 1000. For estimation, we set $\tau_c$ from $-0.8$ to $-0.1$ for the Clayton copula, from $-0.8$ to 0.8 for the Frank copula, and from $-0.5$ to $-0.1$ for the beta frailty model. The results are presented in Tables 3 and 4. For testing, let $\tau_0 = -0.5$ for the Clayton copula; $\tau_0 = 0.5$ and $-0.5$ for the Frank copula; and $\tau_0 = -0.3$ for the beta frailty model. The results are shown in Figure 2. As before, the biases and mean squared errors are very small, and the mean jackknife standard deviation is close to the empirical standard deviation. Furthermore, the coverage probability of the 95% confidence interval is near 95%, and the test statistic performs well.

Testing for quasi-independence under truncated data
When $X$ and $Y$ are quasi-independent, $\tau_c = 0$. Thus, we test $H_0: \tau_c = 0$ instead of $H_0: X \perp_Q Y$. In this subsection, we compare our test statistic, Equation (7), with Tsai's test statistic and Martin and Betensky's test statistic for testing quasi-independence.
Based on the same setting as in Section 4.2, the simulation results are shown in Figure 3. Figure 3 reveals that, for the Clayton copula, the power of our test statistic is close to that of Tsai's and slightly higher than that of Martin and Betensky's. For the Frank copula, Tsai's test is slightly more powerful when $\tau_c$ is negative. For the beta frailty model, our test statistic is slightly more powerful than Tsai's and Martin and Betensky's. Furthermore, we present the empirical sizes of the three tests under the null hypothesis with $\alpha = 0.05$ in Table 5. We run the simulation three times, and all the empirical sizes are close to 0.05.

Table 3. Estimation of $\tau_c$ under truncated data for Clayton copula and the beta frailty model.

Bone marrow transplants data
In this subsection, we apply our method to the bone marrow transplants data given in [5, p. 464]. There were 137 leukemia patients receiving bone marrow transplants. Let $X$ be the time of leukemia relapse, which is a non-terminal event time, $Y$ the time of death, which is a terminal event time, and $C$ the time to the end of the study. Note that $X$ may be censored by $Y$, but not vice versa. Thus, these are semi-competing risks data, and the observable variables are $S = \min(X, Y, C)$, $T = \min(Y, C)$, $\delta_X = I(X \le Y \wedge C)$, and $\delta_Y = I(Y \le C)$. The patients can be divided into three groups: the acute lymphoblastic leukemia (ALL) group with 38 patients, the acute myelogenous leukemia (AML) low-risk group with 54 patients, and the AML high-risk group with 45 patients. We estimate $\tau_c$ and test $H_0: \tau_c = 0$ for the three groups and for all patients by our suggested approach. Let the ALL group be G1, the AML low-risk group G2, the AML high-risk group G3, and all patients G4. The results are shown in the upper half of Table 6. All the p-values are close to zero; therefore, $X$ and $Y$ are not quasi-independent for G1-G4. Furthermore, all values of $\hat{\tau}^s_c$ are greater than 0.7, which means that the associations between $X$ and $Y$ on the comparable set are positive and high for G1-G4. Thus, we can conjecture that the association of $(X, Y)$ is positive and high.

Channing house retirement community data
We also apply our method to the Channing House retirement community data given in [5, p. 480]. There were 97 males and 365 females who were in residence during the period January 1964 to July 1975. Let $X$ be the entry age in months, $Y$ the death age in months, and $C$ the age at the end of the study in months. Note that we can observe a pair $(X, Y)$ only if $X < Y$. Thus, these are truncated data, and the observable variables are $T = \min(Y, C)$ and $\delta = I(Y \le C)$. By the proposed method, we estimate $\tau_c$ and test $H_0: \tau_c = 0$ for these data. The results are shown in the lower half of Table 6. All values of $\hat{\tau}^t_c$ are between 0 and 0.2, so the associations of $(X, Y)$ on the comparable set for the three cases are positive but low.
For all people and for females, the p-values are greater than 0.05, so $\hat{\tau}^t_c$ is not significantly different from zero and we cannot reject that $X$ and $Y$ are quasi-independent. For males, the p-value is smaller than 0.05, so $\hat{\tau}^t_c$ is significantly different from zero at the 0.05 significance level. Furthermore, these conclusions coincide with those from Tsai's and Martin and Betensky's test statistics, presented at the bottom of Table 6.

Concluding remarks
We follow Tsai's definition of conditional Kendall's tau, $\tau_c$, under truncated data, and define $\tau_c$ under semi-competing risks data in the same way. In this article, we consider the estimation and testing of $\tau_c$ under semi-competing risks data and truncated data. We provide a consistent estimator, $\hat{\tau}^*_c$ ($* = s$ or $t$), for semi-competing risks data and truncated data, which applies the IPCW technique to correct the bias. Furthermore, we use the proposed estimator of $\tau_c$ to construct a test statistic for $H_0: \tau_c = \tau_0$, where $\tau_0 \in (-1, 1)$. When $\tau_0 = 0$, the test is a proxy for quasi-independence. Tsai [10] and Martin and Betensky [8] provided test statistics for quasi-independence. In simulation studies, we compare the three test statistics under three dependence structures: the Clayton copula, the Frank copula, and the beta frailty model. From the results, we conclude that none of the three test statistics is uniformly best.
We also provide the large-sample properties of our proposed estimator, which help us to construct the null distribution of the suggested test statistic. The simulation studies show that the proposed estimator and test statistic perform well. Finally, we apply our proposed method to two data sets. For the bone marrow transplants data, we discuss four cases and conclude that there is a positive and high association between the time of leukemia relapse and the time of death on the comparable set in each case, and that the two event times are not quasi-independent. For the Channing House retirement community data, the association between the entry age and the death age is positive and low on the comparable set for the three cases.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This paper was financially supported by the National Science Council of Taiwan [NSC99-2118-M-194-001].

Supplemental data
The appendix referenced in Section 3.2 is available under the paper information at doi:10.1080/02664763.2015.1004624.