A destructive shifted Poisson cure model for interval censored data and an efficient estimation algorithm

Abstract In this paper, we consider a competing risks scenario in which the initial number of competing risks, assumed to follow a shifted Poisson distribution, is subject to a destructive process. In such settings, both the initial number of risks and the number of risks remaining active after destruction represent missing data. Assuming the population of interest to have a cure component and the data to be interval censored, we extend the destructive shifted Poisson cure model with Weibull lifetimes to accommodate interval-censored data and develop an efficient expectation maximization (EM) algorithm for this model. By using the conditional distributions of the missing data, the conditional expected complete log-likelihood function is decomposed into simpler functions which are then maximized independently. An extensive simulation study demonstrates the performance of the proposed estimation method through the calculated bias, root mean square error, and coverage probability of the asymptotic confidence intervals. Also considered is the efficacy of the proposed EM algorithm as compared to direct maximization of the observed log-likelihood function. Finally, data from a crime recidivism study are analyzed for illustrative purposes.


Introduction
Conventional survival analysis methods assume that all individuals are susceptible to the event of interest given sufficient follow-up time. However, due to advancements in medical science, in observing a cohort of patients with a certain type of cancer it may be seen that a proportion of the patients experience no recurrence of the disease for the duration of a long follow-up time and beyond. This group of patients is described in the literature as cured or non-susceptible. The remaining patients are subject to recurrence of the disease and are described as susceptible. Thus, the population may be viewed as a mixture of susceptible and non-susceptible patients. Survival models accommodating a cure fraction are called cure rate models. Such cure models are instrumental in reliability and biomedical science and are commonly used in cancer clinical trials.
The first cure rate model, developed by Boag (1949), proposed a cure component to represent the proportion of patients not susceptible to recurrence of disease. Known in the literature as the mixture cure rate model, the population survival function of a time-to-event variable $Y$ is given by $S_{\text{pop}}(y) = p_0 + (1 - p_0)S_{\text{susc}}(y)$, where $p_0$ is the cure rate and $S_{\text{susc}}(\cdot)$ is a proper survival function of the susceptible patients only. For a book-length account of the mixture cure rate model, interested readers may refer to the research monographs by Maller and Zhou (1996) and Peng and Yu (2021). While the mixture cure rate model is the most widely used, Chen, Ibrahim, and Sinha (1999) proposed a Bayesian approach for a new cure rate model that assumes an event of interest (e.g., relapse of cancer) to be related to a number of carcinogenic cells (competing risks) left active after an initial treatment. The authors assumed the latent number of competing risks to follow a Poisson distribution, and the new model is known in the literature as the promotion time or Poisson cure rate model. In the promotion time cure rate model, the population survival function is given by $S_{\text{pop}}(y) = e^{-\eta(1 - S(y))}$, with $\eta$ representing the mean number of competing risks and $S(\cdot)$ representing the common survival function corresponding to the promotion time of each competing risk. Rodrigues et al. (2009) developed the Conway-Maxwell-Poisson (COM-Poisson) cure rate model, which can account for both over-dispersion and under-dispersion and includes well-known cure rate models, such as those proposed by Boag (1949) and Chen, Ibrahim, and Sinha (1999), as special cases; see also Wang and Pal (2022) and Pal and Balakrishnan (2017a). Balakrishnan and Pal (2012, 2013, 2015a, 2015b, 2016) developed the steps of the expectation maximization (EM) algorithm for the determination of the maximum likelihood estimates (MLEs) of the model parameters of the flexible cure rate model of Rodrigues et al. (2009) based on non-informative right-censored data and different parametric assumptions on the susceptible lifetime distribution; see also Balakrishnan et al. (2016). The destructive cure rate model, developed by Rodrigues et al. (2011), introduces to the COM-Poisson cure rate model a natural destructive process that acts on the original number of competing risks. Borges, Rodrigues, and Balakrishnan (2012) extended the destructive cure rate model by incorporating a biological dependence between the initiated cells, and Gallardo, Bolfarine, and Pedroso-de Lima (2016) included the effect of a bivariate random variable (U, V), where U is related to relapse times of the disease caused by non-destroyed cells and V is associated with the cure rate corresponding to a particular clinic under study. While Rodrigues et al. (2011) utilized direct maximization of the (observed) log-likelihood function for maximum likelihood estimation of the parameters of the destructive cure rate model, alternative estimation methods have been explored to address convergence problems of the classical method that depend on the choice of initial parameter values. The EM algorithm has been developed for the destructive cure rate model using varied distributional assumptions for the competing risk variable; see Pal and Balakrishnan (2016, 2017b, 2017c, 2018), Pal, Majakwara, and Balakrishnan (2018), and Majakwara and Pal (2019). Gallardo, Bolfarine, and Pedroso-de Lima (2016) developed a variation of the EM algorithm for the destructive weighted Poisson cure rate model in which the complete log-likelihood function is split into simpler functions to be maximized independently.
Though traditional survival models, as well as the cure rate models discussed previously, assume the data to be right censored, it is not uncommon in practice for data to be interval censored. In a scenario where subjects are not monitored continuously but rather observed at discrete follow-up times, an observed lifetime may be said only to lie in an interval. The mixture cure rate model was extended to accommodate interval-censored data while accounting for spatial correlation, with Markov chain Monte Carlo methods used to develop Bayesian inference (Banerjee and Carlin 2004). Kim and Jhun (2008) introduced the effect of covariates in the mixture cure rate model under interval censoring and developed the EM algorithm for this framework. Pal and Balakrishnan (2017d) developed the steps of the EM algorithm for the determination of the MLEs of the COM-Poisson cure rate model parameters with interval censoring; see also Wiangnak and Pal (2018).
This paper aims to extend the destructive cure rate model developed by Rodrigues et al. (2011) to accommodate interval-censored data, assuming the initial competing risks follow a shifted Poisson distribution. The shifted Poisson distribution is preferable to the (standard) Poisson distribution in settings where no cure component is present prior to a destructive process imposed by intervention. Applications exist in biomedical studies, such as the study of tumor recurrence in cancer patients, where all individuals are susceptible to the event of interest prior to treatment. In criminal recidivism applications, a group of inmates is susceptible to committing crime at the time of arrest; however, rehabilitative measures such as counseling or vocational training may take place during incarceration with the goal of preventing a relapse of crime. By using the identity weight function, the shifted Poisson distribution allows for modeling the scenario where at least one initial risk exists, without introducing an additional parameter. Noting that both right-censored and left-censored data may be considered special cases of interval-censored data, this work provides a generalization of the destructive cure rate model. Further, the EM algorithm proposed by Gallardo, Bolfarine, and Pedroso-de Lima (2016) is implemented in the interval-censored setting, and its performance is examined through a simulation study.
The rest of this paper is organized as follows. In Sec. 2, we describe the destructive shifted Poisson cure rate model under interval censoring and present the complete log-likelihood function. In Sec. 3, we develop the steps of the EM algorithm for the model in Sec. 2. Section 4 presents two Monte Carlo simulation studies, the first evaluating the performance of the proposed EM algorithm in parameter recovery and the second comparing the performance of the EM algorithm with that of direct maximization of the observed log-likelihood function under three initial parameter settings. In Sec. 5, the EM algorithm is applied to real data from a study on crime recidivism. Finally, in Sec. 6, we summarize and suggest areas for future related research.

Destructive shifted Poisson cure rate model under interval censoring
Let us assume the unobserved number of competing risks $M$ to follow a weighted Poisson distribution with probability mass function (pmf)

$$p(m; \theta, \phi) = \frac{w(m; \phi)\, p^*(m; \theta)}{E_\theta[w(M; \phi)]}, \quad m = 0, 1, 2, \ldots, \qquad (1)$$

where $w(m; \phi)$ is a non-negative weight function with parameter $\phi$, $p^*(m; \theta)$ is the pmf of a Poisson distribution with parameter $\theta > 0$, and $E_\theta[\cdot]$ implies that the expectation is taken with respect to $p^*(m; \theta)$. Taking $w(m; \phi) = m$, the form of (1) becomes

$$P(M = m) = \frac{e^{-\theta}\theta^{m-1}}{(m-1)!}, \quad m = 1, 2, \ldots; \qquad (2)$$

thus $(M - 1)$ follows a Poisson distribution with parameter $\theta$, and $M$ is said to follow a shifted Poisson distribution. Now suppose that an intervention takes place, resulting in a quantity $D (\le M)$ of competing risks remaining active, each of which could result in a detectable tumor with probability $p$.
To each of the $M$ competing risks we can associate a Bernoulli random variable $X_j$ such that $P(X_j = 1) = p$. The remaining number of competing risks not destroyed can be modeled as $D = X_1 + X_2 + \cdots + X_M$.

Proposition 2.1. For the cure rate model with the pmf of the number of competing risks as in (2), the distribution of $D$ is given by

$$P(D = d) = e^{-\theta p}\left[(1 - p)\frac{(\theta p)^d}{d!} + p\frac{(\theta p)^{d-1}}{(d-1)!}\right], \quad d = 0, 1, 2, \ldots, \qquad (3)$$

with the convention that the second term vanishes for $d = 0$.

A proof of Proposition 2.1 follows from the results in Rodrigues et al. (2011) and Rodrigues et al. (2012). We note that the random variable $D$ may be obtained as the sum of two independent random variables having Poisson and Bernoulli distributions with respective means $\theta p$ and $p$; thus we will write $D \sim \text{Pois}(\theta p) + \text{Bern}(p)$. Let $W_a$ be the time taken for the $a$th active risk to produce a detectable disease (event of interest). The waiting times $W_a$, $a = 1, 2, \ldots$, are assumed to be independently and identically distributed with a common distribution function $F(t; \lambda)$, where $\lambda$ is an unknown set of parameters, and are also independent of $D$. Because the number of active competing risks $D$ and the lifetime $W_a$ associated with a given risk are latent variables, one generally observes only the time taken by the first active risk to produce the event. To accommodate the presence of a cured proportion, the time-to-event or lifetime is defined as

$$T = \min\{W_0, W_1, \ldots, W_D\},$$

where $W_0$ is such that $P[W_0 = \infty] = 1$. After some algebra, the population survival function can be expressed as (Rodrigues et al. 2011)

$$S_{\text{pop}}(t; \theta, p, \lambda) = P(T > t) = \exp\{-\theta p F(t; \lambda)\}\left(1 - p F(t; \lambda)\right).$$

For the sake of simplicity, we will use $F(\cdot)$ instead of $F(\cdot; \lambda)$ and, similarly, $S(\cdot)$ instead of $S(\cdot; \lambda)$. We note that $S_{\text{pop}}(\cdot)$ is an improper survival function with corresponding cure fraction $p_0 = S_{\text{pop}}(\infty; \theta, p, \lambda) = e^{-\theta p}(1 - p)$. To study the effects of covariates, we propose to use two sets of covariates, $z_1$ and $z_2$, where $z_1$ is related to the initial number of competing risks and $z_2$ is related to the activation probability for non-destroyed risks, with corresponding link functions

$$\theta = e^{z_1'\beta_1}, \qquad p = \frac{e^{z_2'\beta_2}}{1 + e^{z_2'\beta_2}},$$

where $\beta_1$ and $\beta_2$ represent vectors of regression coefficients. We note that $z_1$ and $z_2$ share no common elements, and an intercept term is excluded either from $\beta_1$ or from $\beta_2$ to avoid identifiability problems (Li, Taylor, and Sy 2001). Let $\psi = (\beta_1, \beta_2, \lambda)$ denote the vector of unknown parameters.

Form of data and likelihood function
We consider a scenario where the true lifetimes are not exactly observed but are subject to interval censoring. Let $T_i$ denote the $i$th patient's true failure time (unobserved), for $i = 1, 2, \ldots, n$, with $n$ denoting the number of patients in the study. To develop an interval censoring scheme, let us assume the $i$th patient is observed at times $Y_i = (Y_{0(i)}, Y_{1(i)}, \ldots, Y_{j-1(i)}, Y_{j(i)})$, with $0 < Y_{0(i)} < Y_{1(i)} < \cdots < Y_{j(i)} < \infty$. In a practical setting, observation occurs only until either the event of interest has occurred or the lifetime is right censored. In the case that the lifetime is observed, the interval $(Y_{j-1(i)}, Y_{j(i)})$ is known to contain $T_i$, and we consider the lifetime to be interval censored. If the $i$th patient's failure time did not occur prior to the last observation time, we consider the lifetime right censored and can only conclude that the lifetime is contained in the interval $(Y_{j(i)}, \infty)$. The observed data are denoted by $D_{\text{obs}} = (l, r, \delta)$, where $l = (l_1, \ldots, l_n)$ and $r = (r_1, \ldots, r_n)$ collect the left and right endpoints of the observed intervals and $\delta = (\delta_1, \ldots, \delta_n)$ collects the censoring indicators, with $\delta_i = 1$ if the $i$th lifetime is interval censored and $\delta_i = 0$ if it is right censored (in which case $r_i = \infty$). Based on the observed data $(l, r, \delta)$, the observed likelihood function under non-informative censoring is given by

$$L(\psi; D_{\text{obs}}) \propto \prod_{i=1}^{n}\left[S_{\text{pop}}(l_i) - S_{\text{pop}}(r_i)\right]^{\delta_i}\left[S_{\text{pop}}(l_i)\right]^{1-\delta_i},$$

while the observed log-likelihood function is given by

$$l(\psi; D_{\text{obs}}) = \text{constant} + \sum_{i=1}^{n}\left\{\delta_i \log\left[S_{\text{pop}}(l_i) - S_{\text{pop}}(r_i)\right] + (1 - \delta_i)\log S_{\text{pop}}(l_i)\right\}. \qquad (4)$$

The complete data are denoted by $D_{\text{comp}} = (l, r, \delta, M, D)$, which includes the observed data and the missing data, where $M = (M_1, M_2, \ldots, M_n)$ and $D = (D_1, D_2, \ldots, D_n)$ are the missing data. Following Yakovlev and Tsodikov (1996), the joint distribution of the complete data can be expressed as the product of the conditional distribution of the observed data given $(M_i, D_i)$, the conditional pmf of $D_i$ given $M_i$, and the marginal pmf of $M_i$. Noting that the second and third terms in the product are well defined, we focus on the derivation of the first.

Proposition 2.2. For the cure rate model with the pmf of the number of active risks as in (3), the joint distribution of $(l_i, r_i, \delta_i)$ and $D_i$ is available in closed form; a proof of Proposition 2.2 is provided in the supplemental material.

Using Proposition 2.2, the complete-data likelihood function may be defined, and the complete-data log-likelihood function can then be written as

$$l_c(\psi; D_{\text{comp}}) = l_{1c}(\beta_1) + l_{2c}(\beta_2) + l_{3c}(\lambda) + K, \qquad (5)$$

where $l_{1c}$, $l_{2c}$, and $l_{3c}$ involve only $\beta_1$, $\beta_2$, and $\lambda$, respectively, and $K$ is a constant independent of the model parameters.
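A minimal Python sketch of the observed log-likelihood under this interval-censoring scheme, assuming Weibull lifetimes (the function names, toy data, and parameter values are ours, not the paper's):

```python
import numpy as np

def S_pop(t, theta, p, shape, scale):
    """Improper population survival function of the destructive shifted Poisson model."""
    F = 1 - np.exp(-(np.asarray(t, dtype=float) / scale) ** shape)  # Weibull cdf
    return np.exp(-theta * p * F) * (1 - p * F)

def obs_loglik(l, r, delta, theta, p, shape, scale):
    """Observed-data log-likelihood: an interval-censored subject (delta = 1)
    contributes S_pop(l_i) - S_pop(r_i); a right-censored subject (delta = 0,
    r_i = infinity) contributes S_pop(l_i)."""
    Sl = S_pop(l, theta, p, shape, scale)
    Sr = S_pop(r, theta, p, shape, scale)
    delta = np.asarray(delta)
    return np.sum(delta * np.log(Sl - Sr) + (1 - delta) * np.log(Sl))

# toy data: two interval-censored subjects and one right-censored subject
l = np.array([1.0, 2.0, 6.0])
r = np.array([2.0, 4.0, np.inf])
delta = np.array([1, 1, 0])
ll = obs_loglik(l, r, delta, theta=2.0, p=0.4, shape=4.65, scale=5.46)
print(ll)
```

Because $S_{\text{pop}}$ is improper, the right-censored contribution remains strictly positive even as $l_i$ grows, reflecting the cured fraction.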

EM algorithm
In this section, we present the construction of the EM algorithm to produce estimates of the parameters of the destructive shifted Poisson cure rate model under interval censoring. The first step in the implementation of the EM algorithm (McLachlan and Krishnan 2008) requires taking the conditional expectation of a complete-data log-likelihood function, such as (5), given some proposed parameter values. The second step requires maximizing the conditional expected complete-data log-likelihood function. In the traditional approach, this may involve maximizing a complicated function that involves all model parameters. Such an approach is computationally less efficient and may lack robustness with respect to initial values, specifically in the presence of a large set of unknown parameters. The implementation of the EM algorithm proposed here is motivated by the work of Gallardo, Bolfarine, and Pedroso-de Lima (2016) and uses the conditional distributions of both the initial competing risks and the active competing risks to decompose the conditional expectation of $l_c(\psi; D_{\text{comp}})$ into three simpler functions that can each be maximized independently. This makes the entire formulation of the EM algorithm much more efficient.

Let $\psi^{(k)} = (\beta_1^{(k)}, \beta_2^{(k)}, \lambda^{(k)})$ be the estimate of $\psi$ at the $k$th iteration, and let $Q(\psi|\psi^{(k)})$ denote the conditional expectation of $l_c(\psi; D_{\text{comp}})$ given the observed data and $\psi^{(k)}$. Then, by (5), $Q(\psi|\psi^{(k)})$ splits into three functions, (7)-(9), involving only $\beta_1$, $\beta_2$, and $\lambda$, respectively, where $D_i^{(k)} = E(D_i|D_{\text{obs}}, \psi^{(k)})$, $M_i^{(k)} = E(M_i|D_{\text{obs}}, \psi^{(k)})$, and $K^*$ is a constant independent of $\psi$. The following results will be needed to compute the required conditional expectations.
Proposition 3.1. For the cure rate model with the pmf of the number of competing risks as in (2), the conditional distribution of $M_i - \delta_i$ given the observed data under interval censoring is available in closed form; a proof of Proposition 3.1 is provided in the supplemental material.

Proposition 3.2. For the cure rate model with the pmf of the number of active risks as in (3), the conditional distribution of $D_i - \delta_i$ given the observed data under interval censoring is likewise available in closed form; a proof of Proposition 3.2 is provided in the supplemental material.

Applying the results of Propositions 3.1 and 3.2, we can compute $D_i^{(k)}$ and $M_i^{(k)}$, and the functions (7)-(9) can then be maximized with respect to $\beta_1$, $\beta_2$, and $\lambda$, respectively. This can be done using readily available optimization routines in R, such as the "optim()" function. In this regard, interested readers may also look at the new optimization techniques studied by Pal and Roy (2020, 2021). The steps of the EM algorithm can be summarized as follows:

Step 1 (Expectation step or E-step): For $i = 1, \ldots, n$, compute $D_i^{(k)}$ and $M_i^{(k)}$.

Step 2 (Maximization step or M-step): Maximize each of (7)-(9) in relation to $\beta_1$, $\beta_2$, and $\lambda$, respectively, to obtain an improved estimate $\psi^{(k+1)}$.

Step 3 (Iterative step): The E-step and M-step are repeated until a suitable convergence criterion is met. For this purpose, we use the relative difference in successive values of the estimates, $\left|\frac{\psi^{(k+1)} - \psi^{(k)}}{\psi^{(k)}}\right|$, as the stopping criterion with a tolerance value of $10^{-4}$.

Note that in the work of Pal and Balakrishnan (2018), an EM algorithm was developed to estimate the parameters of the destructive length-biased Poisson cure rate model when the data are right censored. In the construction of that EM algorithm, the authors considered the latent cured statuses to be the missing data. This resulted in one complicated objective function (containing all model parameters) to be maximized in the M-step of the EM algorithm. Such an approach is certainly not computationally efficient, and the computational complexity increases with the covariate dimension. In this paper, we develop an EM algorithm for estimating the parameters of the destructive length-biased Poisson cure rate model when the data are interval censored. In this case, we take the latent number of initial competing risks ($M$) and the latent number of active competing risks ($D$) to be the missing data. Interestingly, this approach, motivated by the work of Gallardo, Bolfarine, and Pedroso-de Lima (2016), results in an objective function that can be split into simpler functions. Each simple function can be maximized independently of the others, since these functions share no common model parameters. This makes the entire computation much more efficient and also suitable for incorporating more covariates.
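To see why the decomposition simplifies the M-step, consider the $\beta_1$ piece. From the shifted Poisson pmf in (2) with $\theta_i = e^{z_{1i}'\beta_1}$, the part of the complete log-likelihood involving $\beta_1$ is $\sum_i [(M_i - 1)\log\theta_i - \theta_i]$, so after the E-step replaces $M_i$ by $M_i^{(k)}$, the update is a Poisson-regression-type maximization involving no other model parameters. The Python sketch below illustrates this with synthetic values of $M_i^{(k)}$ standing in for the E-step output (the exact form of (7) is given in the paper; this only mirrors its structure):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# synthetic covariates and E-step expectations M_i^(k)
n = 500
z1 = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])   # intercept + group
Mk = 1 + rng.poisson(np.exp(z1 @ np.array([0.3, -0.5])))      # stands in for E(M_i | D_obs)

def neg_Q1(b1):
    """Negative beta_1 piece of the Q-function:
    Q1(b1) = sum_i [(M_i^(k) - 1) * z1_i'b1 - exp(z1_i'b1)] + const."""
    eta = z1 @ b1
    return -np.sum((Mk - 1) * eta - np.exp(eta))

def neg_Q1_grad(b1):
    """Analytic gradient: -sum_i (M_i^(k) - 1 - exp(z1_i'b1)) z1_i."""
    return -(z1.T @ (Mk - 1 - np.exp(z1 @ b1)))

res = minimize(neg_Q1, x0=np.zeros(2), jac=neg_Q1_grad, method="BFGS")
b1_hat = res.x
print(b1_hat)

# at the maximum, the score equation sum_i (M_i^(k) - 1 - theta_i) z1_i = 0 holds
score = z1.T @ (Mk - 1 - np.exp(z1 @ b1_hat))
print(score)
```

In the full algorithm, analogous independent maximizations update $\beta_2$ and $\lambda$, and the three updates together constitute one M-step.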

Simulation study
This section presents the results of two simulation studies. The first study evaluates the performance of the proposed EM algorithm in recovering parameter values, while a comparison of the EM algorithm with direct maximization of the observed log-likelihood function is considered in the second study.

Parameter recovery
In this section, we evaluate the performance of the proposed EM algorithm in recovering parameter values for a simulated data set. This empirical study partially mimics the real melanoma data set which was used for illustrative purposes by Rodrigues et al. (2011), using covariates of treatment group (1: treatment, 0: placebo) and tumor thickness (in mm).

Data generation
To simulate covariate data, we first generated treatment group, denoted by $z_1$, from a Bern(0.5) distribution. Noting that tumor thickness values in the melanoma data set range from 0.1 to 17.42 mm, we generated tumor thickness values, denoted by $z_2$, from a uniform $U(0.1, 20)$ distribution. We link parameter $\theta$ to treatment group only, using $\theta = e^{\beta_{11} z_1}$, and parameter $p$ to tumor thickness only, using $p = \frac{\exp(\beta_{20} + \beta_{21} z_2)}{1 + \exp(\beta_{20} + \beta_{21} z_2)}$. Note that only one regression parameter corresponds to parameter $\theta$ in order to circumvent problems with identifiability in the sense of Li, Taylor, and Sy (2001). A higher cure rate, and consequently a smaller value of $\theta$, is expected for the treatment group, while a lower cure rate and larger value of $\theta$ are expected for the placebo group. A negative value of $\beta_{11}$ is consistent with this expectation, and we select $\beta_{11} = -0.5$. To select regression parameters for $p$, we posit that the activation probability will be proportionally higher for larger values of tumor thickness and select the values $\beta_{20} = -1$, $\beta_{21} = 0.1$ in accordance with this observation. We shall assume the waiting time $W$ to follow a Weibull distribution with shape parameter $1/\lambda_1$ and scale parameter $1/\lambda_2$, and density function given by

$$f(w; \lambda) = \frac{\lambda_2}{\lambda_1}(\lambda_2 w)^{\frac{1}{\lambda_1} - 1}\exp\left\{-(\lambda_2 w)^{\frac{1}{\lambda_1}}\right\}, \quad w > 0.$$

Random censoring was introduced through a censoring time $C$ following an exponential distribution with rate $\alpha = 0.05$. To generate interval-censored lifetime data $(l_i, r_i, \delta_i)$, $i = 1, 2, \ldots, n$, we execute the following steps:

Step 1: Generate censoring time $C_i$, competing risks $M_i \sim \text{Pois}(\theta) + 1$, and non-destroyed risks $D_i | M_i = m_i \sim \text{Bin}(m_i, p)$.

Step 2: If $D_i = 0$, the subject is cured; the lifetime is right censored at $C_i$, and data generation is complete.

Step 3: If $D_i > 0$, generate times to event due to each non-eliminated risk, $W_j$, $j = 1, 2, \ldots, D_i$, from the considered Weibull distribution with parameter $\lambda$, and set $T_i = \min\{W_1, \ldots, W_{D_i}\}$. If $T_i > C_i$, the lifetime is right censored at $C_i$; otherwise, generate $l_{1i}$ from a $U(0, 1)$ distribution and gap widths $l_{2i}$ from a $U(0.2, 0.7)$ distribution, construct the intervals $(0, l_{1i}], (l_{1i}, l_{1i} + l_{2i}], \ldots$, and record the interval containing $T_i$.

All simulations are done using the R statistical software (version 4.0.5), and all results are based on $M = 1000$ Monte Carlo runs. The choice of the lifetime parameter $\lambda$ was obtained by equating the mean and variance of the underlying Weibull distribution to fixed values. For this purpose, we considered two different choices of the variance, 1.5 and 3, for the same fixed choice of mean, 5, which gives two possible choices of $(\lambda_1, \lambda_2)$: $(0.215, 0.183)$ and $(0.316, 0.179)$. Sample sizes of both $n = 100$ and $n = 200$ are used in order to observe the performance of the algorithm under small and moderate sample sizes. This study applies the EM algorithm proposed in Section 3 to interval-censored data from the destructive shifted Poisson cure rate model, simulated using the parameters and data construction methods outlined above.
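The data-generation steps above can be sketched in Python as follows (the paper's own simulations are in R; the interval-grid construction in Step 3 is our reconstruction of the truncated text and should be read as an assumption):

```python
import numpy as np

rng = np.random.default_rng(7)

def generate_interval_censored(n, b11, b20, b21, lam1, lam2, cens_rate=0.05):
    """Generate (l_i, r_i, delta_i) from the destructive shifted Poisson
    cure model with Weibull lifetimes, following Steps 1-3."""
    shape, scale = 1.0 / lam1, 1.0 / lam2            # Weibull shape 1/lam1, scale 1/lam2
    z1 = rng.binomial(1, 0.5, n)                     # treatment group
    z2 = rng.uniform(0.1, 20.0, n)                   # tumor thickness
    theta = np.exp(b11 * z1)
    p = 1.0 / (1.0 + np.exp(-(b20 + b21 * z2)))      # logistic link
    C = rng.exponential(1.0 / cens_rate, n)          # exponential censoring times
    M = rng.poisson(theta) + 1                       # shifted Poisson risks
    D = rng.binomial(M, p)                           # non-destroyed risks
    l = np.zeros(n)
    r = np.zeros(n)
    delta = np.zeros(n, dtype=int)
    for i in range(n):
        T = scale * rng.weibull(shape, D[i]).min() if D[i] > 0 else np.inf
        if T > C[i]:                                 # cured or censored before the event
            l[i], r[i], delta[i] = C[i], np.inf, 0
            continue
        # inspection grid: first cut ~ U(0, 1), later gaps ~ U(0.2, 0.7)
        left, right = 0.0, rng.uniform(0.0, 1.0)
        while right < T:
            left, right = right, right + rng.uniform(0.2, 0.7)
        l[i], r[i], delta[i] = left, right, 1        # interval containing T
    return l, r, delta

l, r, delta = generate_interval_censored(500, b11=-0.5, b20=-1.0, b21=0.1,
                                         lam1=0.215, lam2=0.183)
print(delta.mean())    # fraction of interval-censored lifetimes
```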

Parameter estimation
To assess the performance of this method in parameter recovery, we report the empirical bias and root mean square error (RMSE) of the estimates, as well as the coverage probabilities (CP) of the asymptotic confidence intervals at the 95% confidence level, under the assumption that the estimators are asymptotically normally distributed. The results are presented in Table 1.
Computational code is available from the first author upon request. Initial values of the parameter estimates for a given Monte Carlo run were selected as follows: for a given parameter with true value $\gamma$, the initial guess $\gamma_{\text{init}}$ is generated as $\gamma_{\text{init}} = \gamma + U(0, 0.1)\,|\gamma|$. We observe that in all cases the EM algorithm produces parameter estimates with high accuracy. The empirical coverage probabilities are close to the nominal level for all model parameters, and in all cases the bias and RMSE decrease with an increase in sample size, which is consistent with the large-sample properties.
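The summaries in Table 1 can be reproduced from Monte Carlo output with a small helper; the sketch below (Python, with synthetic normal estimates rather than the paper's simulation output) shows the bias, RMSE, and coverage computations:

```python
import numpy as np

def mc_summary(estimates, ses, true_value, z=1.96):
    """Empirical bias, RMSE, and coverage probability of the 95%
    normal-approximation interval: estimate +/- z * SE."""
    est = np.asarray(estimates)
    se = np.asarray(ses)
    bias = est.mean() - true_value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    cover = np.mean((est - z * se <= true_value) & (true_value <= est + z * se))
    return bias, rmse, cover

# toy check: estimates drawn from the estimator's assumed normal distribution
rng = np.random.default_rng(0)
est = rng.normal(loc=-0.5, scale=0.2, size=10_000)
bias, rmse, cp = mc_summary(est, np.full(10_000, 0.2), true_value=-0.5)
print(round(bias, 3), round(rmse, 3), round(cp, 3))
```

With correctly specified standard errors, the empirical coverage sits near the nominal 95% level, which is the pattern Table 1 exhibits.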

Comparison with direct maximization of log-likelihood function
In this study, we compare the performance of the proposed EM algorithm to that of direct maximization of the observed log-likelihood function using three different settings, $\psi_0$, $\psi_1$, and $\psi_2$, for the initial parameter estimates, constructed by perturbing the true parameter values by random amounts $\epsilon_i \sim U(0, 0.5)$, $i = 1, \ldots, 5$, one for each of the five model parameters. For each true-value and initial-value parameter setting, $M = 1000$ convergent Monte Carlo runs were performed with $n = 100$ using both the proposed EM algorithm and direct maximization of the observed log-likelihood function with the "optim()" function in R. The median (Med) for each parameter under all settings was computed. To identify atypical estimate values, we compute $Q_1$, $Q_3$, and $IQR = Q_3 - Q_1$, the first quartile, third quartile, and interquartile range, respectively, and report the percentage of times that the estimate falls outside the interval $(Q_1 - 1.5 \times IQR, Q_3 + 1.5 \times IQR)$, denoted by PT. Results are presented in Table 2. With the exception of the median of $\hat\beta_{11}$, the medians of estimates from the EM algorithm are either comparable or closer to the true values than the medians of estimates obtained from direct maximization of the log-likelihood function. In comparing the PT values, however, the percentages of atypical values obtained through the EM algorithm are consistently smaller, most notably in estimating $\beta_{11}$. To compare the divergence rate of direct maximization of the observed log-likelihood function with that of the proposed EM algorithm, we compute the percentage of divergent runs out of $M = 1000$ Monte Carlo runs of each method using data simulated with true parameter values as in Sec. 4.1.1, sample sizes of both $n = 100$ and $n = 200$, and initial parameter estimates $\psi_0$, $\psi_1$, and $\psi_2$. We note that parameter setting 1 is $\psi = (-0.5, -2, 0.1, 0.215, 0.183)$ and parameter setting 2 is $\psi = (-0.5, -2, 0.1, 0.305, 0.179)$. Additionally, in order to provide the best basis for comparison, the same simulated data sets and initial parameter estimates were used for both estimation methods under each sample size and parameter setting. The percentages of divergent runs, denoted by Div (%), are presented in Table 3.
Direct maximization of the observed log-likelihood function produces divergent runs in all settings, with the lowest divergence percentage being 16% and percentages increasing to 40% and higher when initial estimates deviate from the true parameter values. The EM algorithm, however, converged in all settings.
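The PT measure used above is the usual boxplot-fence rule; a small Python helper (with hypothetical input values) makes it explicit:

```python
import numpy as np

def pct_atypical(estimates):
    """Percentage of estimates outside (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    x = np.asarray(estimates, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    outside = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    return 100.0 * outside.mean()

print(pct_atypical([1, 2, 3, 4, 100]))   # the value 100 lies outside the fences
```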

Illustration using data from a crime recidivism study
In this section, we apply the proposed EM algorithm to real data on crime recidivism. The data set, available in the "penPHcure" package in R, includes 432 inmates who were released from Maryland state prisons between October 1971 and June 1973. Subjects were followed for one year after release, with the event of interest being rearrest. The aim of the original study, conducted by Rossi, Berk, and Lenihan (1980), was to investigate the relationship between the time to first arrest after release and some covariates observed during the follow-up period. The data contain the continuous covariates age at release (Age) and number of prior convictions, and the binary covariates race (black or other), marriage status at release, whether the subject received financial aid after release, whether the subject had full-time work experience prior to incarceration, and whether the individual was working full time during the observation period (EMP, 1: Employed and 0: Not Employed). It bears mentioning that while the employment status of some subjects was observed to vary over time, for the purpose of this analysis we take EMP as the employment status during the last observation interval. To mimic the covariate configuration in Section 2, we considered regression models linking each potential continuous and binary covariate pair to the model parameters $\theta$ and $p$. The proposed algorithm produced MLEs for models using each potential covariate pair, and results from all models are available from the first author upon request; for illustrative purposes, a subset of these models is presented here. While the aim of this study is not model selection, we compare AIC and BIC values for the above models to identify the model that best fits the data by first selecting appropriate initial parameter estimates, applying the proposed EM algorithm to obtain the MLEs, and computing the observed log-likelihood function (4) at the MLEs. The process by which initial parameter estimates were selected is described in the supplemental material, and values for the observed log-likelihood
function (Obs log-lik), AIC, and BIC are reported in Table 4. We choose the model producing the lowest AIC and BIC values, Model 1.1, as our working model. Kaplan-Meier curves stratified by EMP, as shown in Figure 1, level off at non-zero proportions, which supports the presence of a cure component in the data. The stratified Kaplan-Meier curves do not intersect and demonstrate a trend of higher survival times for those subjects who were employed during the last observation interval. Table 5 presents the estimates and standard errors of the parameters of the working model, as well as p-values and 95% asymptotic confidence intervals. The negative sign of the estimate of $\beta_{11}$ agrees with the stratified Kaplan-Meier plot in Figure 1, and the predictor EMP is significant at a 5% level of significance. The estimate $\hat\beta_{21} < 0$ indicates that older individuals tend to relapse later, but the predictor Age fails to be significant at a 5% level of significance.
Though the modeling framework differs from the structure adopted in this work, we find it interesting to compare the regression estimates reported herein with the analysis performed by Beretta and Heuchenne (2021) using the methodology developed in the R package "penPHcure". The semiparametric proportional hazards (PH) cure model of Sy and Taylor (2000) was extended to accommodate time-varying covariates by using each covariate's time-weighted average over all observation periods, and a PH cure model was fit to the crime recidivism data with all explanatory variables included in both the latency and incidence components. Confidence intervals created using the basic bootstrap method identified only one covariate, EMP, as statistically significant at a 5% level of significance. The regression coefficient for (time-varying) EMP was statistically significant in the latency component with a negative value, which implies that among susceptible individuals, working full time after release is associated with a lower risk of rearrest. While the model described in Sec. 2 does not separate the covariates' effects on the incidence and the latency, and the regression developed in this section does not consider the time-varying nature of EMP, our analysis is consistent with the conclusion of Beretta and Heuchenne in finding EMP to have a statistically significant effect associated with a reduced incidence of rearrest. Figure 2 shows the predicted survival probabilities for subjects with ages of 18, 23, 27, and 30.9 years, which correspond to the 5th, 50th, 75th, and 95th percentiles of Age, stratified by employment status. Note that the survival probability for employed subjects is higher across all values of age. While it may be seen by comparing the plots fixing age at 18 and 30.9 years that the survival probability is greater for greater values of age, this effect is more fully observed in Figure 3, where the cure rate is seen to increase in a nearly linear fashion as age increases.
We inspect the goodness of fit of our model using normalized randomized quantile residuals (Dunn and Smyth 1996). Figure 4 presents the quantile-quantile plot, with each point corresponding to the median of five sets of ordered residuals. The linearity depicted in this plot supports the conclusion that the destructive shifted Poisson cure rate model with Weibull lifetimes provides a good fit to the crime recidivism data. Finally, the assumption of normality of the residuals is tested using the Kolmogorov-Smirnov test. The resulting p-value of 0.9946 strongly supports the assumption of normality of the residuals.
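For interval-censored data, a standard way to build normalized randomized quantile residuals is to draw $u_i$ uniformly over the portion of the fitted population cdf assigned to the observed interval and set $r_i = \Phi^{-1}(u_i)$; under a correctly specified model these residuals are standard normal. The Python sketch below checks this on data simulated from the model itself (the specific randomization is the usual Dunn-Smyth construction, which we assume; parameter values are hypothetical):

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(3)
theta, p, shape, scale = 2.0, 0.4, 4.65, 5.46

def F_pop(t):
    """Improper population cdf 1 - S_pop(t) under the (true) model."""
    F = 1 - np.exp(-(t / scale) ** shape)
    return 1 - np.exp(-theta * p * F) * (1 - p * F)

n = 2000
M = rng.poisson(theta, n) + 1
D = rng.binomial(M, p)
resid = np.empty(n)
for i in range(n):
    T = scale * rng.weibull(shape, D[i]).min() if D[i] > 0 else np.inf
    # inspection grid independent of T, up to an administrative end of follow-up
    g = [rng.uniform(0.0, 1.0)]
    while g[-1] < 10.0:
        g.append(g[-1] + rng.uniform(0.2, 0.7))
    if T > g[-1]:                                  # right censored: u ~ U(F_pop(g_K), 1)
        u = rng.uniform(F_pop(g[-1]), 1.0)
    else:                                          # interval censored in a grid cell
        j = int(np.searchsorted(g, T))
        left = 0.0 if j == 0 else g[j - 1]
        u = rng.uniform(F_pop(left), F_pop(g[j]))
    resid[i] = norm.ppf(u)                         # normalized quantile residual

pv = kstest(resid, "norm").pvalue
print(pv)
```

With data generated from the fitted model, the Kolmogorov-Smirnov test should not reject normality, mirroring the residual check reported for the recidivism data.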

Conclusion
In this paper, we have generalized the destructive shifted Poisson cure rate model to accommodate interval censoring and, motivated by the work of Gallardo, Bolfarine, and Pedroso-de Lima (2016), developed likelihood inference based on an implementation of the EM algorithm that splits the complete log-likelihood function into simpler functions to be maximized independently. A detailed study examining the performance of the EM algorithm for this model has been completed. Through Monte Carlo simulations, we demonstrate that the proposed method produces accurate parameter estimates and performs favorably compared to direct maximization of the observed log-likelihood function, with respect to both a lower incidence of atypical estimates and higher rates of convergence. In the real data analysis, the proposed algorithm converged to parameter estimates that were consistent with prior findings.
While the objective of this study was to develop the EM algorithm for the destructive cure rate model under interval censoring and to perform a detailed analysis of its performance for the destructive shifted Poisson cure rate model, a valuable future work will be to extend this approach to other destructive weighted Poisson cure rate models. For example, taking the weight function $w(m; \phi) = \Gamma(\phi^{-1} + m)$ in (1) yields the destructive negative binomial model. Additionally, considering a generalized family of lifetime distributions, for instance the generalized gamma distribution, would allow us to perform model discrimination for the lifetime distribution. Recent applications of the stochastic EM algorithm using varied distributional assumptions for the competing risk variable demonstrate performance superior to that of the EM algorithm under right censoring (see Davies, Pal, and Siddiqua 2021; Pal 2021). It is of particular interest to investigate the stochastic EM algorithm in an interval-censored setting and to compare its performance with the EM algorithm developed herein. We hope to report on these problems in future manuscripts.

Figure 2. Predicted survival probabilities stratified by employment status for subjects of different ages.

Figure 3. Cure rate against age stratified by employment status.

Table 1. Estimates, standard errors (SE), bias, root mean square error (RMSE), and 95% coverage probabilities (95% CP).

Table 3. Comparison of the divergence of the EM algorithm with direct maximization of the observed log-likelihood function.

Table 4. Model discrimination.

Figure 1. Kaplan-Meier plot of survival curves stratified by employment status.

Table 5. Estimation results corresponding to the working model for the crime recidivism data.