Inference on semi-parametric transformation model with a pairwise likelihood based on left-truncated and interval-censored data

Semi-parametric transformation models provide a general and flexible class of models for regression analysis of failure time data, and many methods have been developed for their estimation. In particular, they include the proportional hazards and proportional odds models as special cases. In this paper, we discuss the situation where one observes left-truncated and interval-censored data, for which no established method seems to exist. For this problem, in contrast to the commonly used conditional approach, which may not be efficient, a pairwise pseudo-likelihood method is proposed to recover some of the information missed by the conditional method. The proposed estimators are shown to be consistent, asymptotically normal, and efficient. A simulation study is conducted to assess the empirical performance of the method and suggests that it works well in practical situations. The method is illustrated using a set of real data arising from an HIV/AIDS cohort study.


Introduction
Semi-parametric transformation models have become increasingly popular in regression analysis of failure time data, as they provide a general and flexible class of models that can cover various types of covariate effects (Fine, Ying, and Wei 1998; Chen, Jin, and Ying 2002; Zhang, Sun, Zhao, and Sun 2005). In particular, they include the proportional hazards model, the most commonly used model in failure time data analysis, and the proportional odds model as special cases. In this paper, we discuss their estimation when one faces left-truncated and interval-censored failure time data.
Left truncation occurs in many fields, and one common situation where it arises is prevalent cohort sampling, which recruits only subjects who have already experienced some initial event, such as infection by some virus, or satisfied certain conditions (Pan and Chappell 1999; Wang, Li, and Sun 2021). A specific example is AIDS cohort studies that follow HIV-infected subjects and aim to investigate AIDS incubation or death risk (De Gruttola and Lagakos 1989). Interval censoring is another feature of failure time data that one often has to deal with; it means that the failure time of interest is known or observed only to belong to an interval instead of being observed exactly (Sun 2006). It is easy to see that interval-censored data include right-censored data as a special case (Kalbfleisch and Prentice 2011). There exist several types of interval-censored data. One is so-called case I or current status data, for which each study subject is observed only once. Correspondingly, one may observe case II or case K interval-censored data, meaning that there exist two observations or a sequence of observations, respectively, on each subject (Huang and Wellner 1997). In the following, we will focus on case K interval-censored failure time data.
A great deal of literature has been established for the analysis of interval-censored data, and in particular, many authors have considered the application of parametric methods or models such as Weibull models and log-linear models. See, for instance, Marshall (1974), Hazelrig, Turner, and Blackstone (1982), Odell, Anderson, and D'Agostino (1992), and Li and Ma (2010). Many semi-parametric methods have also been developed for the analysis of interval-censored failure time data (Sun 2006; Kalbfleisch and Prentice 2011). For example, Huang (1996) and Huang and Wellner (1993) considered the nonparametric maximum likelihood estimation of the proportional hazards and odds models, and Rossini and Tsiatis (1996), Huang and Rossini (1997) and Shen (1998) developed sieve maximum likelihood estimation procedures for the proportional odds model. Furthermore, to fit the semi-parametric transformation model to interval-censored data, Zhang et al. (2005) and Zhang and Zhao (2013) proposed an estimating equation method and an empirical likelihood inference approach, respectively, and Zeng, Mao, and Lin (2016) developed an EM-type algorithm. Wang, Zhao, Du, and Sun (2018) and Xu, Zhao, Hu, and Sun (2019) also discussed the same problem but under a more complicated type of interval censoring.
It is well known that left truncation can make the analysis much more complicated, and an analysis that ignores it may yield biased estimation (Pan and Chappell 1999). Several methods have been proposed for regression analysis of left-truncated and interval-censored data. For example, Kim (2003) considered the fitting of the proportional hazards model, a particular case of the semi-parametric transformation model, to left-truncated and current status data, and Pan and Chappell (2002) and Shen (2014) discussed the same problem for general interval-censored data. Also, Wang, Tong, Zhao, and Sun (2015) investigated regression analysis of left-truncated and interval-censored data arising from the additive hazards model. Under the semi-parametric transformation model, Shen, Chen, Pan, and Chen (2019) proposed a computationally efficient EM algorithm for left-truncated and interval-censored data without or with a cure fraction. All methods mentioned above, as well as most of the existing methods for left-truncated data, are conditional approaches given the truncation times. As a consequence, they may not be efficient. To address this and improve the efficiency, some composite or pairwise likelihood-based methods have been proposed under the proportional hazards or odds model (Huang and Qin 2012; Huang and Qin 2013; Wu, Kim, Qin, Saran, and Li 2018; Gao and Chan 2019). In particular, Wang et al. (2021) gave a pairwise likelihood approach for fitting the proportional hazards model to left-truncated and case II interval-censored data. In this paper, we consider left-truncated and case K interval-censored data and generalise their method to semi-parametric transformation models.
The remainder of this paper is organised as follows. In Section 2, we introduce some notation and assumptions that will be used throughout the paper, as well as the structure of the observed data. The proposed pairwise pseudo-likelihood approach is presented in Section 3; in this method, we augment the conditional likelihood with a pairwise likelihood constructed from the truncation times. In Section 4, the asymptotic properties of the proposed estimators are established. Section 5 presents results from a simulation study conducted to assess the empirical performance of the proposed method, which suggest that it works well and is efficient in practical situations. In Section 6, the proposed approach is illustrated using a set of real data from an AIDS cohort study, and Section 7 contains some concluding remarks.

Notations and assumptions
Consider a failure time study and let T* and Z* denote the underlying failure time of interest and a p × 1 vector of time-independent covariates, respectively. Suppose that there exists another random variable, an underlying truncation time denoted by A*, such that T* is observed only if T* ≥ A*. An example of this is that T* represents the time to an event of interest, such as the diagnosis of a disease, while A* is the time of entry into the study. In the following, we will assume that the underlying truncation time A* is independent of both T* and the covariate Z*.
To describe the covariate effects, we will assume that given Z*, T* follows the semi-parametric transformation model given, in terms of the cumulative hazard function, by

Λ(t | Z*) = G( Λ0(t) exp(β⊤Z*) ). (1)

In the above, β denotes a p × 1 vector of regression coefficients, G(·) is a known, strictly increasing transformation function, and Λ0(·) denotes an unknown increasing function. One major advantage of the model above is its flexibility, as it includes many commonly used models as special cases. For example, it yields the proportional hazards model or the proportional odds model with the choice of G(x) = x or G(x) = log(1 + x), respectively. Among others, one commonly used class of transformation functions is the class of logarithmic transformations given by G(x) = r^{−1} log(1 + rx), r ≥ 0. Under model (1) and given Z*, the survival function of T* can be written as S(t | Z*) = exp{ −G( Λ0(t) exp(β⊤Z*) ) }. As mentioned above, it will be assumed that one only observes left-truncated data, or the pair (A*, T*) with A* ≤ T*, and we will use (A, T, Z) ≡ (A*, T*, Z*) | (A* ≤ T*) to denote the corresponding realisation. Furthermore, we will assume that the study consists of n independent subjects and that for subject i, there exists a sequence of observation times U_i0 = A_i < U_i1 < · · · < U_iK_i, where K_i denotes the number of observation times on subject i, i = 1, . . . , n. In the following, it will be supposed that one can write U_ij as U_ij = A_i + V_ij and that the K_i and V_ij's are independent of (A_i, T_i) given Z_i. Then the observed data have the form O = { O_i = ( A_i, Z_i, τ_i, U_ij, δ_ij = I(U_{i,j−1} < T_i ≤ U_ij), j = 1, . . . , K_i ), i = 1, . . . , n }, where A_i, T_i and Z_i denote A, T and Z, respectively, associated with subject i, and τ_i denotes a follow-up time for the ith subject that is assumed to be independent of T_i. Also it will be assumed that the main goal is to make inference about β.
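As a concrete illustration of the transformation class (our own sketch, not code from the paper; the function names and the baseline choice Λ0(t) = t are ours), the survival function implied by model (1) with G(x) = r^{−1} log(1 + rx) can be evaluated numerically, with r = 0 recovering the proportional hazards model and r = 1 the proportional odds model:

```python
import numpy as np

def G_log(x, r):
    """Logarithmic transformation G(x) = r^{-1} log(1 + r x); G(x) = x when r = 0."""
    x = np.asarray(x, dtype=float)
    if r == 0.0:
        return x  # proportional hazards limit
    return np.log1p(r * x) / r

def survival(t, z, beta, r, Lambda0=lambda t: t):
    """S(t | Z) = exp{-G(Lambda0(t) exp(beta' z))} under model (1).

    Lambda0(t) = t is an arbitrary illustrative baseline, not the paper's."""
    eta = np.exp(np.dot(beta, z))
    return np.exp(-G_log(Lambda0(t) * eta, r))

# r = 0: S(t|z) = exp{-Lambda0(t) e^{beta'z}}   (proportional hazards)
# r = 1: S(t|z) = 1 / (1 + Lambda0(t) e^{beta'z})  (proportional odds)
```

With β = 0 and Λ0(1) = 1, for example, r = 1 gives S(1) = 1/2 and r = 0 gives S(1) = e^{−1}, matching the two special cases directly.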
Let H denote the cumulative distribution function of A*. Then under the assumptions above, one can derive that the full likelihood function is proportional to L^F_n = L^C_n × L^M_n. In the above, L^C_n denotes the conditional likelihood of (U_ij, δ_ij, i = 1, . . . , n, j = 1, . . . , K_i) given the (A_i, Z_i)'s, and L^M_n denotes the marginal likelihood of the A_i's given the Z_i's. For estimation of β, a natural way would be to maximise the full likelihood function L^F_n. However, this would not be easy since L^F_n involves two unknown functions, H(·) and Λ0(·). Another option would be to base the estimation on the conditional likelihood function L^C_n, but as discussed above, this may not be efficient since it ignores the information about the parameters contained in L^M_n. In the following, we propose an efficient composite approach.

An efficient pairwise likelihood approach
In this section, we will present the proposed pairwise likelihood estimation procedure. One key feature of the method is that one needs neither to impose any parametric assumption on, nor to estimate, the underlying truncation distribution H. For Λ0(·), by following Zhou, Hu, and Sun (2017) and others, we will employ the sieve approach to approximate it.
To derive the pairwise likelihood function, note that conditional on (Z_i, Z_j) and the order statistics of the pair (A_i, A_j), the probability of the observed assignment of the two truncation times is {1 + R_ij(β, Λ0)}^{−1}, where R_ij(β, Λ0) = S(A_j | Z_i) S(A_i | Z_j) / { S(A_i | Z_i) S(A_j | Z_j) } denotes the generalised odds ratio. Also the generalised odds ratio R_ij(β, Λ0) can be rewritten as R_ij(β, Λ0) = exp{ G(Λ0(A_i) e^{β⊤Z_i}) + G(Λ0(A_j) e^{β⊤Z_j}) − G(Λ0(A_j) e^{β⊤Z_i}) − G(Λ0(A_i) e^{β⊤Z_j}) } under the semi-parametric transformation model (1). Define the pairwise likelihood L^P_n(β, Λ0) = ∏_{i<j} {1 + R_ij(β, Λ0)}^{−1}, and let θ = (β, Λ0). To account for the different magnitudes of L^C_n and L^P_n, which involve n and n(n − 1)/2 terms respectively, for estimation of θ, we propose to consider the log pairwise pseudo-likelihood function ℓn(θ) = log L^C_n(θ) + (n − 1)^{−1} log L^P_n(θ). It is easy to see that the log pairwise pseudo-likelihood function ℓn(θ) is a function of θ only, that is, it does not involve the unknown function H(·). Also ℓn(θ) is formed from log L^C_n and log L^P_n, which means that it retains the majority of the information about β and Λ0(·) and thus yields more efficient estimation than the conditional log-likelihood ℓ^C_n(θ) alone (Liang and Qin 2000). On the other hand, it still involves the unknown function Λ0(·), and for this, by following Zhou et al. (2017) and others, we suggest approximating Λ0(·) by Bernstein polynomials before maximising ℓn(θ). Specifically, let B_k(t, m, t_l, t_u), k = 0, . . . , m, denote the Bernstein basis polynomials of degree m = o(n^v) over [t_l, t_u] for some v ∈ (0, 1), and approximate Λ0(t) in the sieve space Θ_n by Λ_n(t) = Σ_{k=0}^{m} φ_k B_k(t, m, t_l, t_u) with coefficients satisfying 0 ≤ φ_0 ≤ · · · ≤ φ_m ≤ M_n, where M_n = o(n^a) controls the size of the sieve space with a ∈ (0, 1). In practice, [t_l, t_u] is usually taken as the range of the A_i's, U_i's, and V_i's. To remove the nonnegativity and monotonicity constraints on the φ_k's, one can re-parameterise them through cumulative sums of exponentials. Let θ̂_n = (β̂_n, Λ̂_n) denote the estimator of θ given by maximising ℓn(θ) over the sieve space Θ_n with the re-parameterisation above. One can obtain θ̂_n by directly solving the score equations ∂ℓn(θ)/∂θ = 0. Also for the implementation of the estimation approach proposed above, it is clear that one needs to determine the degree m of the Bernstein polynomials, which controls the smoothness of the approximation, as well as the transformation function G in model (1).
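To make the sieve step concrete, the following sketch (our own illustration, not code from the paper) evaluates a Bernstein polynomial approximation of Λ0 using a cumulative-sum-of-exponentials re-parameterisation, one standard way to obtain nondecreasing coefficients without explicit constraints; the function names and the rescaling to [t_l, t_u] are assumptions:

```python
import numpy as np
from math import comb

def bernstein_basis(t, m, tl, tu):
    """Bernstein basis polynomials B_{k,m}, k = 0..m, at t rescaled to [tl, tu]."""
    s = (np.asarray(t, dtype=float) - tl) / (tu - tl)
    return np.array([comb(m, k) * s**k * (1 - s)**(m - k) for k in range(m + 1)])

def Lambda_n(t, phi, tl, tu):
    """Sieve approximation of Lambda0 at t.

    The coefficients are cumulative sums of exp(phi_k), hence positive and
    nondecreasing, which makes Lambda_n monotone increasing in t."""
    m = len(phi) - 1
    coef = np.cumsum(np.exp(phi))          # 0 < a_0 <= a_1 <= ... <= a_m
    return coef @ bernstein_basis(t, m, tl, tu)
```

A Bernstein polynomial with nondecreasing coefficients is itself nondecreasing, which is exactly the monotonicity needed for a cumulative hazard; maximising the pseudo-likelihood over the unconstrained φ_k's then replaces the constrained optimisation.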
For this, we suggest considering several different values of m, with m = o(n^v), and various transformation functions such as those discussed in the next section, and then employing the AIC, choosing the combination of (m, G) that minimises the AIC value. Some more discussion and guidelines on the implementation as well as the selection of (m, G) are given below. For the determination of θ̂_n, in the numerical study below, the function fsolve in Matlab will be used.
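The selection of (m, G) can be sketched as a simple grid search over AIC values; here `loglik` is a hypothetical callable standing in for the maximised log pairwise pseudo-likelihood at a given transformation parameter r and Bernstein degree m, and the penalty 2(p + m + 1), counting the p regression coefficients plus the m + 1 Bernstein coefficients, is our reading of the usual AIC for this sieve, not a formula taken from the paper:

```python
def select_model(loglik, p, r_grid=(0.0, 0.5, 1.0), m_grid=(2, 3, 4, 5)):
    """Return the (r, m) pair minimising AIC = 2(p + m + 1) - 2 * loglik(r, m).

    `loglik` is a hypothetical stand-in for the maximised log pairwise
    pseudo-likelihood; p is the number of regression coefficients."""
    best = None
    for r in r_grid:
        for m in m_grid:
            aic = 2 * (p + m + 1) - 2 * loglik(r, m)
            if best is None or aic < best[0]:
                best = (aic, r, m)
    return best[1], best[2]
```

In practice each call to `loglik` would involve a full sieve maximisation, so the grids are kept small, e.g. r ∈ {0, 0.5, 1} as in the simulation study below.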

Asymptotic properties
Now we will establish the asymptotic properties of θ̂_n, including consistency and the rate of convergence as well as the asymptotic normality. For this, let θ0 = (β0, Λ0) denote the true value of θ, and define the distance between θ1 = (β1, Λ1) and θ2 = (β2, Λ2) ∈ Θ as d(θ1, θ2) = { ‖β1 − β2‖² + ‖Λ1 − Λ2‖²₂ }^{1/2}, where ‖·‖ is the Euclidean norm and ‖·‖₂ denotes the L2 norm. First we will give the asymptotic consistency and convergence rate of θ̂_n and then the asymptotic normality of β̂_n as n → ∞.
Theorem 4.1 (consistency): Under the regularity conditions given in the Appendix, d(θ̂_n, θ0) → 0 in probability as n → ∞. Theorem 4.2 (rate of convergence): Under the same conditions, d(θ̂_n, θ0) = O_p(n^{−min{dv, 1−v}/2}), where v ∈ (0, 1) is such that m = o(n^v) and d is defined in Condition A.1.
Theorem 4.3 (asymptotic normality): Under the regularity conditions given in the Appendix, n^{1/2}(β̂_n − β0) converges in distribution to a mean-zero normal random vector whose covariance matrix attains the semi-parametric efficiency bound for β with respect to the log pairwise pseudo-likelihood function r_ij(θ) defined in the Appendix, with I*(β0) given in the proof.
The proofs of the theorems above are sketched in the Supplementary Material. To see theoretically why the proposed method yields a more efficient estimator than the conditional method, note that by following Huang and Qin (2013) and assuming that Λ0 is known, the conditional estimator β̂C_n solves the conditional score equation, while the proposed estimator β̂_n solves the augmented score equation, with V^C_1 and V^C_1 + V^P_1 denoting the asymptotic covariance matrices of the corresponding estimating functions, respectively.
By using the delta method, one can then derive the asymptotic covariance matrices Σ and Σ^C of β̂_n and β̂C_n, where c_1(θ) and p_12(θ) are defined in the Appendix. To see the relationship between Σ and Σ^C, define D = Σ^C − Σ. Then β̂_n is more efficient than β̂C_n when D is positive definite. Also, note from Theorem 4.2 that the choice of v = 1/(1 + d) yields the optimal rate of convergence n^{d/{2(1+d)}}, which equals n^{1/3} when d = 2 and improves as d increases.
For inference about β, we need to estimate the covariance matrix of β̂_n. A simple and direct approach would be to treat the problem as a parametric one and then use, for example, the EM method (Louis 1982). However, this would not give valid estimation since the standard maximum likelihood theory does not apply here. One could also derive a consistent estimator of the asymptotic covariance matrix, but as seen from the Supplementary Material, this would not be straightforward. Given these considerations, the following simple bootstrap procedure will be used (Efron and Tibshirani 1993), and the numerical studies below indicate that it works well. Let B be a prespecified positive integer and for each b = 1, . . . , B, draw a simple random sample O^(b) = {O^(b)_i; i = 1, . . . , n} of size n with replacement from the observed data O = {O_i; i = 1, . . . , n}. Let β̂^(b)_n denote the proposed estimator of β based on the resampled data set O^(b). Then one can estimate the covariance matrix of β̂_n by the sample covariance matrix (B − 1)^{−1} Σ_{b=1}^{B} (β̂^(b)_n − β̄_n)(β̂^(b)_n − β̄_n)⊤, where β̄_n = B^{−1} Σ_{b=1}^{B} β̂^(b)_n.
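The bootstrap step can be written generically as follows, assuming a user-supplied `fit` function that returns the estimate of β for a given data set (the names and interface are hypothetical, not from the paper):

```python
import numpy as np

def bootstrap_cov(data, fit, B=20, seed=0):
    """Nonparametric bootstrap covariance of beta-hat.

    Resample the n subjects with replacement B times, refit on each
    resample via the user-supplied `fit`, and return the sample covariance
    matrix of the B estimates."""
    rng = np.random.default_rng(seed)
    n = len(data)
    betas = np.array([fit([data[i] for i in rng.integers(0, n, n)])
                      for _ in range(B)])
    return np.cov(betas, rowvar=False)  # normalised by B - 1
```

Resampling whole subjects (rather than individual observation times) preserves the within-subject dependence of the U_ij's, which is why this simple scheme is valid here.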

A simulation study
A simulation study was performed to evaluate the empirical performance of the pairwise pseudo-likelihood estimation procedure proposed in the previous sections. It was assumed that there exist two covariates, Z1 and Z2, following the Bernoulli(0.5) distribution and the uniform distribution over (0, 1), respectively. The failure times were generated from the semi-parametric transformation model (1) with G(x) = r^{−1} log(1 + rx) (r ≥ 0) and Λ0(t) = t. Also we generated the underlying truncation times A* from either the exponential distribution or the uniform distribution, with the mean depending on the truncation rate. For a given sample size n, realisations of (A*, T*, Z*) were generated until n subjects satisfied the sampling constraint A* ≤ T*; these n retained subjects constitute the observations (A, T, Z).
For the generation of the censoring intervals or observation times U_ij for each subject, we first assumed that there exists a sequence of pre-specified observation time points t_1 < · · · < t_k common to all subjects, as in most medical follow-up studies. Then it was assumed that each subject is observed, independently, with probability p at each of the pre-specified time points t_j, and for subject i, the U_ij's were defined to be the time points at which the subject was actually observed.
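The data-generating scheme just described can be sketched as follows (our own illustration; with Λ0(t) = t, inverting the survival function gives T* = G^{−1}(−log U)/exp(β⊤Z) with G^{−1}(y) = (e^{ry} − 1)/r, and the helper name is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def gen_subject(beta, r, trunc_mean, k=10, gap=0.1, p=0.8):
    """Generate one left-truncated, interval-censored subject under model (1)
    with Lambda0(t) = t and G(x) = log(1 + r x)/r (G(x) = x when r = 0).

    Settings mirror the set-up in the text; the helper itself is a sketch."""
    while True:
        z = np.array([rng.integers(0, 2), rng.random()])   # Z1 ~ Ber(0.5), Z2 ~ U(0, 1)
        eta = np.exp(beta @ z)
        y = -np.log(rng.random())                          # -log U ~ Exp(1)
        t = (np.expm1(r * y) / r if r > 0 else y) / eta    # T* = G^{-1}(-log U)/eta
        a = rng.exponential(trunc_mean)                    # underlying truncation time A*
        if a <= t:                                         # keep only untruncated subjects
            break
    # pre-specified visits t_j = A + 0.1 j, each attended with probability p
    visits = a + gap * np.arange(1, k + 1)
    u = np.concatenate(([a], visits[rng.random(k) < p]))
    delta = (u[:-1] < t) & (t <= u[1:])                    # interval-censoring indicators
    return a, t, u, delta, z
```

The rejection loop implements the sampling constraint A* ≤ T* directly: candidate triples are discarded until an untruncated one is obtained, so the accepted draws follow the conditional distribution of (A, T, Z).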
For the simulation results given below, it was assumed that t_j = A + 0.1j, j = 1, . . . , k = 10, and p = 0.8, and they are based on B = 20 and m = [n^{1/4}] with 500 replications. Also we set n = 400 with the truncation rate being 30%, 50%, 70% or 85% in the case of the truncation time following the exponential distribution, and n = 400 and 600 with the truncation rate being 30%, 50% or 70% in the case of the truncation time following the uniform distribution. To assess the proposed method, we calculated the empirical bias (Bias), given by the average of the proposed estimates minus the true value, the sample standard error (SSE) of the estimates, the mean of the estimated standard errors (ESE), the 95% empirical coverage probability (CP), and the relative efficiency (RE), defined as the ratio of the sample variance based on the conditional method to that based on the proposed method. Table 1 provides the results on estimation of β given by the proposed method with the true value of (β1, β2) being (0.5, 0.5), A* following the exponential distribution, and r = 0, 0.5 and 1. For comparison, we also applied the conditional method and include the obtained results in the table. One can see from Table 1 that the proposed estimator seems to be unbiased and the bootstrap variance estimation procedure appears to work well. Also the CP results suggest that the normal approximation to the distribution of the estimator seems to be appropriate, and as expected, the proposed method is more efficient than the
conditional method. In addition, the relative efficiency gain of the proposed estimator over the conditional method increased as the truncation rate increased. Figure 1 displays the estimated baseline cumulative hazard function Λ̂_n for the situations considered in Table 1 with truncation rates of 30%, 50% and 70%, respectively; it shows that the Bernstein polynomial approximation seems to work well. The same conclusions are given by Table 2, which presents the estimation results obtained under the same set-up as Table 1 except that A* follows the uniform distribution, with different sample sizes. In addition, Table 2 indicates that, as expected, the proposed estimator performed better as the sample size increased.
Note that we considered a two-dimensional β in the set-ups above. Following the suggestion of a reviewer, we also studied a four-dimensional β with true value β0 = (0.5, −0.5, 1, −1)⊤ under the exponential truncation and the uniform truncation, respectively, along with Z = (Z1, Z2, Z3, Z4)⊤. Here Z1 ∼ Ber(0.5), Z2 ∼ U(0, 1), and Z3 and Z4 follow standard normal distributions with covariance 0.2. Table 3 presents the obtained results with the other set-ups being the same as in Tables 1 and 2 and n = 400. One can see that they give similar conclusions and again indicate that the proposed method works well.
A question of practical interest is the consequence or impact of ignoring the truncation. To see this, we repeated the study behind Table 1 but calculated only the empirical bias, for both the proposed method and the same method applied while ignoring the truncation, and present the results in Table 4. Furthermore, Figure 2 shows the empirical biases for the estimation of β1 and β2 under an 85% truncation rate and various values of the transformation parameter r. Both show that omitting the left truncation can lead to biased estimates.

An example
In this section, the proposed method is applied to a set of left-truncated and interval-censored data arising from a cohort of hemophiliacs concerning AIDS diagnosis, discussed by Kim, De Gruttola, and Lagakos (1993) and Wang et al. (2021) among others. It consists of 257 subjects with Type A or B hemophilia who were at risk for infection by HIV through the contaminated blood factor that they received for their treatment. Among them, 188 were found to be infected with the virus and 41 were subsequently diagnosed with AIDS during the study. For each subject, the occurrence of the HIV infection was observed either exactly or within a narrow interval, and in the latter case, a common simplification is to treat the midpoint of the observed interval as the actual HIV-1 infection time (Kim et al. 1993; Wang et al. 2021). Furthermore, for the AIDS diagnosis time, only interval-censored data are available, with left truncation by the HIV infection time.
For the analysis below, as in other studies, we will focus on the 188 HIV-infected subjects and are mainly interested in assessing the effects of the level of hemophilia treatment received and of age, described below, on the development of AIDS-related symptoms.
The study includes two factors or covariates whose effects on the AIDS diagnosis time were of interest. One is the treatment group indicator: the patients were divided into two groups, heavily and lightly treated, based on the amount of the contaminated blood factor received. The other is the age indicator, younger than 20 or not at the time of HIV infection. For the application of the proposed estimation procedure, define T_i to be the AIDS diagnosis time of patient i measured from 1 January 1978, the beginning of the study, with the time scale being half a year. Also define Z_{1i} = 0 if patient i belongs to the lightly treated group and 1 otherwise, and Z_{2i} = 0 if the ith patient's age was below 20 and 1 otherwise.
To apply the proposed method, as in the simulation study, we set G(x) = r^{−1} log(1 + rx) (r ≥ 0) in model (1) and considered several combinations of r and m.
It turned out that the optimal combination was r = 0, corresponding to the proportional hazards model, and m = 4, and the obtained results are given in Table 5 with B = 20, 100 or 200 for the variance estimation.
The results include the estimated covariate effects, their estimated standard errors, and the p-values for testing each covariate effect being zero. For comparison, we also obtained and include in the table the results given by the conditional method. One can see from the table that all analysis results are consistent and suggest that there was no significant relationship between age and the AIDS diagnosis time. In contrast, it appears that the patients in the heavily treated group had a significantly shorter time to, or higher risk of, being diagnosed with AIDS than those in the lightly treated group. Following the suggestion of a reviewer, we also performed the analysis by applying the proposed method under the Weibull model, a parametric model, as well as under the proportional odds model. Table 6 provides a summary of the results with B = 20; they give similar conclusions as before, except for the proportional odds model, which does not provide a clear indication of the significance of the treatment extent. To see the results graphically, Figure 3 displays the estimated survival functions given by the proposed method for the subjects in the four groups defined by the two covariates, and Figure 4 presents the nonparametric maximum likelihood estimator (NPMLE) of the survival function given by the corrected Turnbull estimator proposed in Frydman (1994). The estimated survival functions again demonstrate that the covariate 'Age' had little influence, while the survival functions for the two treatment groups show a non-ignorable difference.

Conclusion
In this paper, we considered the estimation problem for semi-parametric transformation models when one faces left-truncated and case K interval-censored failure time data, for which there did not seem to exist an established inference procedure. As mentioned above, a key feature or advantage of these models is their flexibility, as they allow one to model various types of covariate effects and include many commonly used models, such as the proportional hazards and odds models, as special cases. For the problem, an effective pseudo-likelihood estimation method was proposed with the use of the sieve approach. The asymptotic properties of the resulting estimators were established, and the conducted simulation study indicated that the proposed method works well in practical situations. Compared with the traditional conditional approach, the proposed method makes use of the relevant information in the marginal likelihood of the truncation times and does not require estimation of the unknown truncation distribution.

Figure 3. Estimates of the survival functions for different treatment groups and ages: lightly treated group with age below 20 (solid), lightly treated group with age over 20 (dashed), heavily treated group with age below 20 (dash-dot) and heavily treated group with age over 20 (dotted).

In the previous sections, for simplicity, we have assumed that covariates do not have effects on the left truncation variable, and it is apparent that this may not be true in practice. To address this, one could employ model (1) or other commonly used regression models for the truncation variable, and an estimation procedure similar to that proposed above could be developed. The same is true for the censoring mechanism or the observation process that generates the U_ij's. Another assumption used above is that both the left truncation variable and the observation process are independent of the failure time of interest. This could sometimes be violated; that is, one may face dependent or informative interval censoring (Sun 2006). As a future research direction, it would be useful to generalise the proposed method to these situations, although this would not be straightforward since much more complicated modelling would be needed in general. Another direction is to develop goodness-of-fit tests for the selection of a particular transformation function or model in practice.

Funding
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 11801212, 12071176).