Generalized log-logistic proportional hazard model: a non-penalty shrinkage approach

This paper considers the pretest and shrinkage estimation methods for estimating regression parameters of the generalized log-logistic proportional hazard (PH) model. This model is a simple extension of the log-logistic model, which is closed under the PH relationship. The generalized log-logistic PH model also has attributes similar to those of the Weibull model. We consider this model for right-censored data when some parameters shrink to a restricted subspace. This subspace information on the parameters is used to shrink the unrestricted model estimates toward the restricted model estimates. We then optimally combine the unrestricted and restricted estimates in order to define pretest and shrinkage estimators. Although this estimation procedure may increase bias, it also reduces the overall mean squared error. The efficacy of the proposed model and estimation techniques is demonstrated using a simulation study as well as an application to real data. We also compare the performance of generalized log-logistic, Weibull, and Cox PH models for unimodal and increasing hazards. The shrinkage estimator poses less risk than the maximum likelihood estimator when the shrinkage dimension exceeds two; this is shown through simulation and real data applications.


Introduction
Proportional hazard (PH) models are a class of survival models similar in concept and interpretation to the Cox PH model. They are used to investigate the effects of several covariates on survival time. These models are increasingly found in biomedical and health services research, where they are used to analyze covariates associated with the occurrence and timing of events, such as hospital admission or death. They assume that the relationship between the hazard rate and the log-linear function of covariates is multiplicative. An appealing feature of PH models is that the regression coefficients have a relative risk interpretation, which many clinicians prefer for their quantitative results. The fundamental assumption of these models is that the hazard ratio remains roughly constant over time across different covariate levels. For example, if we are comparing a new treatment with the standard treatment, it is assumed that the hazard ratio for a patient on the new versus the standard treatment remains constant over time. A common way to handle a violation of this assumption is to use time-dependent covariates and to fit extended or stratified Cox models [1]. The accelerated failure time (AFT) model is a parametric survival model that can be used as an alternative to PH models [2]; however, it is not our main focus in this paper.
The Cox PH model is widely used in survival analysis and models the relationship between a survival time and a set of covariates. This model does not focus on survival times; rather, it compares the hazard functions of outcomes [3]. It may be preferred over others as it requires no assumption about the survival time probability distribution; in other words, it is a semi-parametric model. On the other hand, a fully parametric PH model does require the specification of a probability distribution for survival times [4,5]. This also leads to the added requirement of checking the validity of the chosen distribution. Efron [6] and Oakes [7] found that, under certain circumstances, parametric PH models give more precise estimates for analyzing survival data than the Cox PH model. In fact, the standard errors obtained from a parametric model are smaller than those obtained from a semi-parametric model [8], such as the Cox PH model.
The log-logistic, log-normal, and Weibull distributions are the most commonly used parametric time-to-event models. The log-logistic and log-normal distributions belong to the AFT family, which is particularly useful for modeling non-monotone hazard rates [5]. Note that the log-logistic distribution also accommodates decreasing hazard functions. However, only a few parametric models are closed under the PH assumption. The Weibull distribution exclusively accommodates monotone hazard functions and uniquely falls under both the AFT and PH families [4]. In this paper, we employ a straightforward extension of the log-logistic model which is closed under the PH relationship. The generalized log-logistic model is a three-parameter distribution that shares similarities with the original log-logistic model. Furthermore, it converges to the Weibull distribution in the limit. These characteristics make it capable of effectively handling both monotone and non-monotone (unimodal) hazard functions.
The main objective of this paper is to use the pretest and shrinkage methods to estimate regression parameters for the generalized log-logistic PH model. We are interested in the case where a subset of the covariates does not significantly contribute to the overall fit of the model. To select the subset of insignificant covariates, we can apply the backward elimination procedure or evaluate the corresponding log-likelihood value. These insignificant covariates are treated as auxiliary information and are used to form a restricted model by shrinking the unrestricted model estimate. The pretest and shrinkage estimators optimally combine the restricted and unrestricted estimators to obtain more efficient estimators. This occurs when the shrinkage is adaptive and based on the Euclidean distance between the subspace and the full space. It is important to remember that insignificant covariates are not removed altogether from the model. Instead, the information from these covariates is used to estimate the significant covariates' coefficients efficiently via the pretest and shrinkage methods.
Several research works have established the efficiency of the pretest and shrinkage techniques for different survival regression models. Ahmed et al. [9] compared LASSO to shrinkage estimators for Weibull regression when the model contained several inactive covariates. Hussein et al. [10] proposed James-Stein shrinkage estimators for the nonparametric regression coefficients in Aalen's additive model under a general linear hypothesis about the coefficients. Shrinkage and penalty methods were applied to the Cox PH regression model when it was suspected that some of the parameters may be restricted to a subspace [11]. Hossain & Howlader [12] used the log-normal regression model to evaluate the effectiveness of shrinkage estimators. Shrinkage methods were also applied to the exponential Weibull regression model by Hossain & Khan [13].
The generalized log-logistic PH model is described in Section 2, followed by a discussion of the maximum likelihood and shrinkage estimation procedures. A simulation study and its results are presented in Section 3. Real data are used to illustrate the usefulness of the proposed method in Section 4. Section 5 provides a brief summary of the results and findings of this paper.

Generalized log-logistic PH model
The generalized log-logistic PH model described in this section is summarized from Khan & Khosa [14]. The hazard function for a random variable T under the generalized log-logistic distribution can be written as

h_0(t; φ) = κρ(ρt)^{κ−1} / (1 + (γt)^κ),    (1)

where κ > 0, γ > 0, ρ > 0 and φ = (κ, γ, ρ)'. Note that the hazard function (1) is the same as the log-logistic hazard function [5] when γ = ρ. However, (1) is closed under the PH relationship when γ does not depend on ρ [14]. When κ > 1, this hazard function is unimodal, and it is monotonically decreasing when κ ≤ 1. Furthermore, as γ → 0, the hazard function (1) approaches the Weibull hazard. Let z = (z_1, z_2, . . ., z_p)' be a vector of covariates for a PH model, where p is the number of covariates. The hazard function when z = 0 is written as h_0(t; φ) and is known as the baseline hazard function; it is determined by the parameters in φ. The resulting hazard function is

h(t; φ, β | z) = h_0(t; φ) exp(z'β),    (2)

where the regression coefficients are denoted by β = (β_1, β_2, . . ., β_p)'.
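To make the limiting behaviour concrete, here is a small numerical sketch of the baseline hazard under the parametrization h_0(t) = κρ(ρt)^{κ−1}/(1 + (γt)^κ) used above; the parameter values are arbitrary, chosen only for illustration.

```python
import numpy as np

def gll_hazard(t, kappa, gamma, rho):
    # Baseline hazard of the generalized log-logistic distribution:
    # h0(t) = kappa * rho * (rho*t)**(kappa-1) / (1 + (gamma*t)**kappa)
    return kappa * rho * (rho * t) ** (kappa - 1) / (1 + (gamma * t) ** kappa)

def ph_hazard(t, kappa, gamma, rho, z, beta):
    # PH form (2): the baseline hazard scaled by exp(z' beta)
    return gll_hazard(t, kappa, gamma, rho) * np.exp(z @ beta)

t = np.linspace(0.1, 5.0, 50)
# gamma = rho recovers the ordinary log-logistic hazard, while
# gamma -> 0 recovers the Weibull hazard kappa * rho * (rho*t)**(kappa-1):
weibull = 1.5 * 0.7 * (0.7 * t) ** 0.5
gap = np.max(np.abs(gll_hazard(t, 1.5, 1e-9, 0.7) - weibull))
```

For small γ the gap to the Weibull hazard is numerically negligible, matching the stated limit.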

Maximum likelihood estimation
Model parameters can be estimated using the maximum likelihood (ML) approach. Consider data from a censored random sample. Let t_i denote the survival or censoring time of the i-th individual according to whether δ_i = 1 or 0, respectively. A vector of covariates for individual i is z_i = (z_{i1}, z_{i2}, . . ., z_{ip})'. The data can be written as (t_i, δ_i, z_i), where i = 1, 2, . . ., n. The log-likelihood function under the generalized log-logistic PH model is

ℓ(θ) = m log κ + mκ log ρ + Σ_{i=1}^n δ_i [(κ − 1) log t_i + z_i'β − log(1 + b_i)] − (ρ/γ)^κ Σ_{i=1}^n a_i log(1 + b_i),    (3)

where m = Σ_{i=1}^n δ_i, a_i = exp(z_i'β), b_i = (γt_i)^κ and θ = (φ', β')'. The ML estimate of the unknown parameter vector is obtained from the score equations, which set the first derivatives of the log-likelihood function (3) with respect to each parameter equal to zero. Setting κ* = log κ, γ* = log γ and ρ* = log ρ gives φ* = (κ*, γ*, ρ*)' and removes the range restrictions on the parameters. This improves the convergence of iterative procedures for ML estimation and makes large-sample methods more accurate [14]. The maximum likelihood estimate of θ* = (φ*', β')' can then be obtained by solving the score equations evaluated at φ = exp(φ*). These score equations are solved iteratively to obtain the ML estimates, yielding the unrestricted ML estimate (UMLE) θ̂ = (φ̂', β̂')'. If they cannot be solved analytically, a numerical nonlinear optimization algorithm such as the Newton-Raphson method may be used. The second partial derivatives of the log-likelihood function form the observed information matrix, I(θ̂*). Under some regularity conditions, the asymptotic distribution of θ̂* is approximately a (p+3)-variate normal distribution with mean θ* and variance-covariance matrix I^{−1}(θ̂*), where I(θ̂*) is the (p + 3) × (p + 3) observed information matrix. Furthermore, by the same regularity conditions and the multivariate delta method, √n(θ̂ − θ) has an asymptotically (p+3)-variate normal distribution with mean vector 0 and variance-covariance matrix D I^{−1}(θ̂*) D, where D is the diagonal matrix diag(φ̂, 1, 1, . . ., 1) and φ̂ = exp(φ̂*) [4, chap. 5].
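As a rough sketch of the ML step, the negative log-likelihood below assumes the GLLPH baseline hazard h_0(t) = κρ(ρt)^{κ−1}/(1 + (γt)^κ), whose cumulative hazard is (ρ/γ)^κ log(1 + (γt)^κ), and works on the log scale for (κ, γ, ρ) as described above. The data are synthetic placeholders, not from the paper, and the optimizer choice is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, t, delta, Z):
    # params = (log kappa, log gamma, log rho, beta_1, ..., beta_p)
    kappa, gamma, rho = np.exp(params[:3])
    beta = params[3:]
    a = np.exp(Z @ beta)                      # a_i = exp(z_i' beta)
    b = (gamma * t) ** kappa                  # b_i = (gamma * t_i)^kappa
    log_h = (np.log(kappa) + kappa * np.log(rho)
             + (kappa - 1) * np.log(t) - np.log1p(b) + Z @ beta)
    cum_h = a * (rho / gamma) ** kappa * np.log1p(b)   # cumulative hazard
    return -(np.sum(delta * log_h) - np.sum(cum_h))

# synthetic right-censored data (illustrative only)
rng = np.random.default_rng(1)
n, p = 200, 2
Z = rng.standard_normal((n, p))
t = rng.exponential(1.0, size=n)
delta = (rng.uniform(size=n) < 0.7).astype(float)

fit = minimize(neg_loglik, x0=np.zeros(3 + p), args=(t, delta, Z),
               method="Nelder-Mead")
```

A derivative-free method is used here only to keep the sketch short; in practice one would supply the score equations to a Newton-type routine, as the paper does.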
In the unrestricted model, we assume a moderate number of covariates, p, some of which contain redundant information. A backward elimination procedure can be used to obtain the restricted model. Let the p × 1 regression coefficient vector of the unrestricted model be partitioned as β = (β_1', β_2')', such that p = p_1 + p_2. Here, β_1 is the p_1 × 1 regression coefficient vector of the restricted model and β_2 is the p_2 × 1 vector of the remaining coefficients. The restricted model is the unrestricted model subject to the constraint β_2 = 0. We partition θ as θ = (θ_1', θ_2')', where θ_1 = (φ', β_1')' and θ_2 = β_2 are of orders (p_1 + 3) × 1 and p_2 × 1, respectively. The observed information matrix can be partitioned conformably as

I(θ) = [ I_11  I_12 ; I_21  I_22 ],

where I_11 is (p_1 + 3) × (p_1 + 3) and I_22 is p_2 × p_2. The restricted ML estimate (RMLE), θ̃, under the constraint θ_2 = 0, can be obtained by maximizing the restricted log-likelihood function

ℓ_R(θ, λ) = ℓ(θ) + λ'θ_2,

where λ is a p_2 × 1 vector of Lagrange multipliers. We can test H_0 : θ_2 = 0 using the likelihood ratio statistic

Λ̂ = 2[ℓ(θ̂) − ℓ(θ̃)],    (4)

which is asymptotically χ² with p_2 degrees of freedom under H_0 for large sample size n [4, chap. 3].
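The likelihood-ratio decision in (4) is mechanical once the two maximized log-likelihoods are in hand; the numbers below are made up purely for illustration.

```python
from scipy.stats import chi2

loglik_unrestricted = -180.4   # hypothetical maximized log-likelihoods
loglik_restricted = -183.1
p2 = 11                        # number of coefficients constrained to zero

lam = 2 * (loglik_unrestricted - loglik_restricted)   # LR statistic (4)
crit = chi2.ppf(0.95, df=p2)                          # chi-square critical value
reject_H0 = lam > crit        # with these values, H0: theta_2 = 0 is not rejected
```

Here Λ̂ = 5.4 falls well below the 5% critical value of the χ² distribution with 11 degrees of freedom, so the restriction would be retained.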

Pretest and shrinkage estimation
The pretest estimator is a discontinuous function of θ̂ and θ̃ and depends on the choice of the level of significance α. If I(·) is the indicator function and c_{n,α} is the upper 100α% critical value of the test statistic Λ̂, the pretest estimator can be written as

θ̂^PT = θ̂ − (θ̂ − θ̃) I(Λ̂ ≤ c_{n,α}).    (5)

A smoother version of this estimator is obtained using a shrinkage estimator. The shrinkage estimator is a linear combination of θ̂ and θ̃ and is defined as

θ̂^SE = θ̃ + (1 − (p_2 − 2)/Λ̂)(θ̂ − θ̃),  p_2 ≥ 3.    (6)

There are scenarios where the shrinkage estimator may reverse the sign and over-shrink the unrestricted estimator. The problem arises when (p_2 − 2)/Λ̂ > 1 and can be alleviated by taking the positive part of the shrinkage factor. This makes it not only a shrinkage estimator but also a thresholding estimator. Thus, the positive shrinkage estimator can be written as

θ̂^PSE = θ̃ + (1 − (p_2 − 2)/Λ̂)^+ (θ̂ − θ̃),    (7)

where (z)^+ = max(0, z). For example, if (1 − (p_2 − 2)/Λ̂) < 0, the positive shrinkage estimator sets the factor to zero. This avoids a sign reversal and makes the estimate easier to interpret.

Asymptotic analysis of proposed estimators
This section investigates how the proposed estimators behave asymptotically under the following sequence of local alternatives:

K_(n) : θ_2 = θ_{2(n)} = δ/√n,    (8)

where δ ∈ R^{p_2}. In this framework, θ_2 is the true value of the parameter, 0 is a centering value and δ is a localizing parameter. Using (8), we specify that the centering value 0 lies within the restricted parameter space. The distance between the parameter vector θ_2 and the centering value 0 is controlled by the localizing parameter δ and the sample size n. Note that under the local alternative (8) and the usual regularity conditions [15], as n → ∞, the likelihood ratio statistic (4) converges to a non-central χ²_{p_2}(Δ) distribution with non-centrality parameter Δ = δ'(I_22 − I_21 I_11^{−1} I_12)δ [13]. As all the other estimators are functions of θ̂ and θ̃, we can derive the asymptotic joint normality of these estimators under the sequence of local alternatives (8), which can then be used to derive the asymptotic distributional risk (ADR) of each estimator. These results are useful for evaluating the relative efficiency of the proposed estimators. To determine the ADR of the pretest and shrinkage estimators, the following theorem is used.
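The non-centrality parameter Δ = δ'(I_22 − I_21 I_11^{−1} I_12)δ is a quadratic form in the Schur complement of the partitioned information matrix; a toy numeric check (all values invented) is:

```python
import numpy as np

# Partitioned information matrix (hypothetical values). In the paper I_11 is
# (p1 + 3) x (p1 + 3); here it is 1 x 1 only to keep the sketch small.
I = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.8]])
I11, I12 = I[:1, :1], I[:1, 1:]
I21, I22 = I[1:, :1], I[1:, 1:]
delta = np.array([0.5, -0.5])          # localizing parameter, p2 = 2

# Delta = delta' (I22 - I21 I11^{-1} I12) delta
schur = I22 - I21 @ np.linalg.inv(I11) @ I12
Delta = delta @ schur @ delta
```

Since the Schur complement of a positive-definite information matrix is positive definite, Δ > 0 whenever δ ≠ 0, which is why the limiting χ² distribution is non-central exactly when the restriction fails.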

Theorem 3.1:
Consider the matrix (O, B), where O is a p_2 × (p_1 + 3) matrix of zeros and B is a p_2 × p_2 identity matrix. Under the local alternative (8) and the usual regularity conditions [15], the joint asymptotic normality of the proposed estimators holds, with mean and covariance expressions determined by δ and the partitioned information matrix.

Proof: See Appendix.
To evaluate the performance of the proposed estimators, we consider the quadratic loss function

L(θ*; W) = n(θ* − θ)' W (θ* − θ),

where W is a symmetric non-negative definite matrix and θ* is any one of the proposed estimators. For the reader's convenience, the mean squared error (MSE) of an estimator under this quadratic loss is its expected loss, and the ADR is the limit of this expectation under the local alternatives.

Theorem 3.2: Under the local alternatives K_(n) in (8) and the regularity conditions, as n → ∞, the risk functions of θ̂, θ̃, θ̂^PT, θ̂^SE and θ̂^PSE can be expressed in closed form in terms of the non-centrality parameter Δ, where χ²_{g,α} denotes the α-level critical value of the central χ²-distribution with g degrees of freedom.
Proof: See Appendix.
This theorem gives the theoretical risks of the proposed estimators θ̂, θ̃, θ̂^PT, θ̂^SE and θ̂^PSE under the local alternatives. Under H_0 : θ_2 = 0, the estimator θ̃ is the best choice as it strongly dominates θ̂. However, the performance of θ̃ depends on the validity of H_0. When the alternative hypothesis is true and δ moves away from the null vector, the ADR of θ̃ increases monotonically and becomes unbounded as δ'Wδ → ∞. Under H_0, the ADR of θ̂^PT is strictly smaller than that of θ̂, so the gain in risk from using θ̂^PT instead of θ̂ is substantial when H_0 is correct. However, when the alternative hypothesis is true, the risk of θ̂^PT approaches the ADR of θ̂ after achieving a maximum value. This suggests that there is no clear preference for one of these estimators over the other. For a suitable choice of the matrix W, θ̂^SE is expected to perform at least as well as θ̂ for all values of Δ, and strictly better for some values of Δ ≥ 0; hence θ̂^SE outperforms θ̂. Moreover, the risk of θ̂^PSE is asymptotically superior to that of θ̂^SE over the entire parameter space induced by Δ, so θ̂^PSE is also superior to θ̂. Overall, this comparison establishes a hierarchy of estimator performance in terms of ADR: θ̂^SE is better than θ̂, and θ̂^PSE is better than θ̂^SE. Next, we present a simulation study that compares the numerical performance of these estimators for different numbers of insignificant covariates and sample sizes.

Simulation study
To demonstrate the performance of the proposed estimators relative to the UMLE, we conduct a thorough simulation study. After considering both the underlying hazard rate shape and the effect of censoring proportions, we choose the estimator which accurately predicts the survival time and obtains the smallest mean squared error.

Generating data and simulation designs
Eight to sixteen covariates are considered in a PH regression framework for this simulation study. Of the possible sixteen covariates, fourteen are continuous and two are binary. The continuous covariates are generated from the standard normal distribution, while the binary covariates are generated from the Bernoulli(0.5) distribution. The values of the PH regression coefficients are chosen to be β = (0.80, −0.80, 1.2, −1.1, 0')', where the dimension of the zero vector varies from 4 to 12. This corresponds to the covariate vector z = (z_1, z_2, . . ., z_16)'. To assess the performance of the generalized log-logistic PH (GLLPH) model in comparison to the Weibull PH (WPH) and Cox PH (COXPH) models, survival times are generated from the generalized Weibull distribution, so that neither the GLLPH nor the WPH model is correctly specified. The probability density function of this distribution is

f(t; φ) = κρ(ρt)^{κ−1} (1 − γ(ρt)^κ)^{1/γ − 1},

where κ > 0, −∞ < γ < ∞, ρ > 0 and φ = (κ, γ, ρ)'. The hazard function of the generalized Weibull distribution is (a) monotone increasing for κ ≥ 1 and γ ≥ 1, (b) monotone decreasing for 0 < κ ≤ 1 and γ ≤ 1, and (c) unimodal for κ > 1 and γ < 0. This simulation study investigates the effectiveness of the proposed methodology when employing the GLLPH and WPH models. The rationale behind such a setting is that if the GLLPH or WPH performs well under misspecification, we are more confident in generalizing our results. We explore simulation scenarios based on both unimodal and increasing shapes of the hazard function. Results from the log-logistic AFT (LLAFT) model are not included in the simulation study as it performs similarly to the GLLPH model.
To investigate the performance of the RMLE, PT, SE and PSE with respect to the UMLE, we define three models as follows:

• For the simulation model, we assume the true distributional parameters are φ = (1.8, −0.1, 0.2)' and denote the true parameter vector by θ_sim.
• For the unrestricted model, we set θ = (θ_1', θ_2')', where θ_1 = θ_sim and θ_2 = β_2 is a vector of regression coefficients for p_2 additional covariates. We assume θ_2 = (0, 0, . . ., 0)', creating p_2 insignificant covariates that provide auxiliary information which may help to improve the estimates of the coefficients of the significant covariates. Three different scenarios are considered while holding p_1, the number of significant covariates, fixed at four: p_2 = 4, 8 and 12.
• The restricted model is simply the unrestricted model subject to the constraint θ_2 = 0. If θ_2 = 0 holds exactly, the proposed estimators do not need to be investigated as this falls under the null hypothesis. We therefore study the behaviour of the proposed estimators when the restricted model differs from the unrestricted model. To do so, we consider Δ = ||θ − θ_0||² = d, where d is a positive constant and θ_0 = (θ_1', d, 0, 0, . . ., 0)'. In other words, Δ represents how far the restricted model is from the unrestricted model in the spirit of the local alternative H_a : Δ = d.

All simulation results for the RMSE and MSE analyses, as well as the model comparison, are summarized in Figures 1-5 for combinations of n, p_2, the censoring proportion cp, and Δ. Note that the RMSEs for the UMLE, RMLE, PT, SE and PSE are displayed by the solid (black), long dash (grey), dotted long dash (navy), dotted long dash (blue), and long dash (red) lines, respectively. The RMSE values for p_2 = 4, 8 and 12 are distinguished by the curves with squares, circles and triangles, respectively. The RMSE and MSE curves are displayed in Figures 1 and 4 for n = 100, and in Figures 2 and 5 for n = 200, respectively. The model comparison results are shown in Figure 3.
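Simulated right-censored PH data of this kind can be drawn by inverse-transform sampling of the cumulative hazard. The paper generates times from the generalized Weibull distribution; the sketch below instead uses the GLLPH baseline, assuming its cumulative hazard H_0(t) = (ρ/γ)^κ log(1 + (γt)^κ), so it illustrates the mechanism rather than reproducing the paper's exact design. The censoring scale is tuned ad hoc.

```python
import numpy as np

rng = np.random.default_rng(42)

def rgllph(n, kappa, gamma, rho, Z, beta):
    """Draw PH survival times with a GLLPH baseline by solving
    H0(T) * exp(z'beta) = -log(U) for T, U ~ Uniform(0, 1)."""
    u = rng.uniform(size=n)
    a = np.exp(Z @ beta)
    inner = -np.log(u) * (gamma / rho) ** kappa / a
    return np.expm1(inner) ** (1 / kappa) / gamma

n = 500
beta = np.array([0.8, -0.8, 1.2, -1.1])      # significant coefficients
Z = rng.standard_normal((n, 4))
t = rgllph(n, kappa=1.8, gamma=0.1, rho=0.2, Z=Z, beta=beta)

c = rng.exponential(np.quantile(t, 0.8), n)  # censoring times (ad-hoc scale)
time, delta = np.minimum(t, c), (t <= c).astype(int)
```

Varying the scale of the censoring distribution is one simple way to target a desired censoring proportion cp such as 20% or 40%.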
Figures 1-2 demonstrate that the RMLE (grey curves) consistently outperforms all the other estimators when Δ is at or near 0, irrespective of n, p_2 and cp. This superiority is due to the RMLE's unbiased nature at Δ = 0. However, as Δ increases, the risk associated with the RMLE also increases and becomes unbounded. This is the case for all three PH models: GLLPH, WPH and COXPH. The performance of the PT (navy curves) is best for Δ close to zero; it becomes a high-risk estimator as Δ moves away from zero, so its performance depends heavily on how far Δ is from zero. For large values of Δ, however, the RMSE of the PT approaches 1 from below. This characteristic suggests that neither the PT nor the RMLE is uniformly better than the other when Δ > 0. Both the SE (blue curves) and PSE (red curves) have lower risk (higher RMSE) compared to the UMLE regardless of the accuracy of the selected restricted estimator at hand. This means the shrinkage estimators are suitable even if the RMLE is not correctly specified, which is consistent with what has been observed in the shrinkage method literature [12]. Aside from some sampling fluctuations, we see from Figures 1 and 2 that the PSE has lower risk than the SE for almost all combinations of n, p_2 and cp. Furthermore, the PSE outperforms the UMLE over the entire parameter space. For fixed n and cp, the SE and PSE gain the highest accuracy when p_2 is large (e.g., p_2 = 12; see the curves with triangles at each Δ value) and Δ is zero (no substantial difference between the unrestricted and restricted models).
To further investigate the effect of p_2 on the RMLE, PT, SE and PSE, we compare the RMSE curves for n = 100 (Figure 1) and n = 200 (Figure 2) with cp fixed. The RMSEs for p_2 = 4, 8 and 12 are displayed by the curves with squares, circles and triangles, respectively. After increasing p_2 from 4 to 12, it is interesting to note that the RMSEs increase for all estimates under the GLLPH, WPH and COXPH models. Furthermore, Figures 1 and 2 show that the RMSE for p_2 = 4 with n = 100 is larger than that for p_2 = 4 with n = 200. For example, a comparison of the RMSE curves for the GLLPH model in Figures 1 and 2 shows more improvement due to shrinkage for p_2 = 4 with n = 100 than for p_2 = 4 with n = 200. The other two PH models show similar results.
To compare the performance of the GLLPH and WPH models, we generated data from the generalized Weibull distribution with (a) φ = (1.8, −0.1, 0.2)' for data exhibiting a unimodal hazard shape, and (b) φ = (1.8, 1.25, 0.2)' for data exhibiting an increasing hazard shape (Figure 3). For the unimodal hazard, Figure 3(a) shows that the WPH estimators have higher MSEs than the GLLPH estimators; that is, the RMSEs of the RMLE, PT, SE and PSE, each with respect to the UMLE, are higher for the GLLPH than for the WPH model. As expected, the GLLPH outperforms the WPH model when the hazard shape is unimodal, whereas we see slightly more support for the WPH when the hazard shape is increasing, but no significant difference (Figure 3(b)).
MSE curves are used to explore the effect of cp and to quantify the performance of the estimators for cp = 20% and 40% (Figures 4 and 5). For fixed p_2 and n, we see that for each model studied, the larger the value of cp (cp = 40%), the less accurate the estimates. This is a well-known property of survival models.
An increase in the sample size n, while holding p_2, cp and Δ constant, reduces the bias and MSE values, increasing the accuracy of the estimates. Given the covariates and cp, we examine the GLLPH, WPH and COXPH models using simulated survival times. When comparing model performance in our simulation study, we must consider that the Cox model is robust and tends to fit the data well no matter which parametric model is appropriate. The MSE values for all estimates are highest for the WPH model, compared to the GLLPH and COXPH models, with the COXPH estimates having the lowest MSE values overall.

Application to primary biliary cirrhosis data
Primary biliary cirrhosis (PBC) is an incurable liver disease which eventually results in cirrhosis, as it severely damages the bile ducts in the liver. The Mayo Clinic conducted a trial between 1974 and 1984 in which a total of 424 patients with PBC met the eligibility criteria [16]; however, only 312 of those patients chose to participate in the study. The data include 276 patients with complete information, and the event of interest is either death or transplantation, whichever occurs first. Of these patients, 111 died or received a transplant within the study period and 165 remained alive by the end of the study. This results in a censoring percentage of approximately 60%. The data are obtained from the R package JM [17].
The regression analysis initially considered 15 covariates: serBilir = log(serum bilirubin in mg/dl), albumin = log(albumin in gm/dl), prothrombin = prothrombin time in seconds, age = age at registration in years, histologic24 = I(stage 1 or 2 of the disease), histologic34 = I(stage 3 of the disease), ascites = I(presence of ascites), spiders = I(blood vessel malformations in the skin), sex = I(male), serChol = log(serum cholesterol in mg/dl), alkaline = log(alkaline phosphatase in U/litre), sgot = log(aspartate aminotransferase in U/ml), platelets = log(platelets per cubic ml/1000), hepatomegaly = I(presence of hepatomegaly or enlarged liver) and trt = I(D-penicillamine). We consider the unrestricted model (2) with all 15 covariates. The backward elimination procedure is used to fit the restricted model, resulting in a model which includes only significant covariates. The corresponding log-likelihood and Akaike information criterion (AIC) values obtained from this procedure are summarized in Table 1 for each of the models studied [18]. This table reinforces that the restricted model is an appropriate alternative to the unrestricted model. The procedure eliminates 11 covariates from the unrestricted model, denoted θ_2R = (β_5, β_6, β_7, β_8, β_9, β_10, β_11, β_12, β_13, β_14, β_15)'. Several different models are considered; however, each distribution under study yields the same final model. The resulting model includes the remaining four covariates (serBilir, albumin, prothrombin and age) and is fitted under the constraint θ_2 = 0. The PT, SE and PSE are obtained using this model, and the estimates are calculated using (5) with α = 0.05, (6) and (7), respectively. Bootstrapping is used to calculate the estimates and standard errors.
The GLLPH, WPH, LLAFT and COXPH models are fitted using the PBC data. We include the LLAFT model in our real data analysis as the generalized log-logistic and log-logistic models belong to the same family and differ by only one parameter. As seen in Table 1, the COXPH model has the smallest AIC value, indicating that it is the best-fitting model, as expected. The other three models have higher AIC values, all of which are very similar to each other. From the analysis of the AIC values alone and excluding the COXPH model, there is no obvious superiority of one model over another. The GLLPH and WPH residual plots of the restricted model in Figure 6 look very similar; however, the GLLPH residuals are in fact slightly smaller than those obtained under the WPH model (exact values omitted here). When analyzing residual plots, interest lies in the shape of the residuals: a good fit is evident when the residuals form a straight line with unit slope. For the most part, the residuals lie close to the unit slope line under each of the models studied, indicating goodness of fit to the data. However, the residuals under the LLAFT model lie closest to the unit slope line compared to the other models. This shows that, in this case, the LLAFT model may be superior to the other models under study.

Figure 6. Residual plots using restricted estimates of each of the model parameters for PBC data.
The κ̂ value obtained from the GLLPH model is 1.561, while the other parameter estimates are approximately equal to 0, that is, ρ̂ ≈ 0 and γ̂ ≈ 0. As mentioned earlier, κ > 1 with γ ≈ 0 models a monotonically increasing hazard function. Since γ̂ ≈ 0, the fitted hazard function approximates that of the Weibull distribution. Both the GLLPH and WPH models produced the parameter estimate κ̂ = 1.561, while the LLAFT model obtained κ̂ = 2.01. Therefore, each of these three models has an increasing hazard rate.
The numerical results obtained from the analysis of the PBC data are summarized in Table 2. We use the relative efficiency (or relative mean squared error, RMSE) with respect to the UMLE to compare these estimates, calculated as described in the simulation study. As expected, the COXPH model achieves the highest RMSE values, although some of its standard errors are double those obtained under the LLAFT model.
On the other hand, the LLAFT model obtains the lowest standard errors overall, but it also produces the lowest RMSE values. For the GLLPH, WPH and COXPH models, the RMLE is the most efficient estimator, as this model excludes the insignificant covariates. Interestingly, the shrinkage and positive shrinkage estimators are equally efficient for the GLLPH, WPH and LLAFT models, while the positive shrinkage estimator is slightly more efficient than the shrinkage estimator under the COXPH model. In all cases, the shrinkage and positive shrinkage estimators outperform the pretest estimator. While the GLLPH estimators have only slightly smaller RMSE values than the WPH estimators, the GLLPH generally has smaller standard error estimates for the covariate coefficients.
Based on this tradeoff, the GLLPH is more efficient compared to the other models. The coefficient estimates of the LLAFT model are negative because in a PH model we measure the association between a covariate and the risk of an event, whereas in an AFT model we measure the association between a covariate and the time to the occurrence of an event. Thus, a positive coefficient in a PH model implies that the hazard rate increases and hence the survival time is shortened. In such a case, the corresponding coefficient in an AFT model must be negative; that is, an increase in the covariate value is associated with a reduction in survival time, or equivalently an increase in the hazard rate.
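The relative mean squared error used throughout (RMSE of an estimator with respect to the UMLE, so that values above 1 indicate lower risk than the UMLE) can be computed from bootstrap draws roughly as follows; the draws below are purely illustrative and are not the PBC results.

```python
import numpy as np

def relative_mse(boot_est, boot_umle, theta_true):
    """Relative MSE of an estimator with respect to the UMLE over bootstrap
    draws (rows = replicates, columns = coefficients)."""
    mse_est = np.mean(np.sum((boot_est - theta_true) ** 2, axis=1))
    mse_umle = np.mean(np.sum((boot_umle - theta_true) ** 2, axis=1))
    return mse_umle / mse_est

rng = np.random.default_rng(0)
theta = np.zeros(4)                                  # toy "true" coefficients
boot_umle = rng.normal(theta, 1.0, size=(5000, 4))   # toy bootstrap UMLEs
boot_shrunk = 0.7 * boot_umle                        # a toy estimator shrunk toward 0
rmse = relative_mse(boot_shrunk, boot_umle, theta)   # > 1: shrinking helps here
```

By construction the shrunk estimator has smaller squared error around the zero truth, so its relative MSE exceeds 1, mirroring how the SE and PSE columns of Table 2 are read.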

Conclusion
The Cox PH model is one of the most commonly used models in survival analysis, as no assumptions are required about the survival time probability distribution. This model, which is semi-parametric, is robust in that it usually fits the data well no matter which parametric model is appropriate. However, it may produce larger standard errors of the estimates if we know that the data come from a parametric model (an AFT model, for example) [8]. We propose pretest and shrinkage estimators for the regression parameters of the generalized log-logistic PH model, specifically when a subset of the covariates is insignificant. The estimation of parameters is performed using the ML approach. While the significant covariates are obtained using the backward elimination procedure, auxiliary information from the insignificant covariates is used to obtain the pretest and shrinkage estimates. The efficiency of the estimators under study (restricted, pretest, shrinkage and positive shrinkage) is analyzed with respect to the unrestricted estimator.
The simulation study in Section 3 shows that the WPH model produces the highest mean squared error values regardless of the sample size n. However, as the sample size increases, the mean squared error values decrease for each of the models studied (GLLPH, WPH, COXPH). As expected, the COXPH model performs well, since it tends to be an adequate fit to the data even if a parametric model is more appropriate. An application to real data is also used to evaluate the performance of the pretest and shrinkage estimators. These estimators outperform the unrestricted estimators for each of the models studied (GLLPH, WPH, LLAFT, COXPH). The GLLPH model performs well among the four models when both relative efficiency and standard error are considered. Similar to the simulation study, the WPH model produced higher RMSE values; however, it also obtained standard errors that were higher than those produced by the GLLPH model. These findings confirm our expectations.
This paper evaluates the performance of pretest and shrinkage estimators for the estimation of GLLPH regression parameters for survival data. Both the simulation study and the real data analysis indicate that these estimators are most efficient for the GLLPH model compared to the other models studied. However, they also show that the pretest and shrinkage estimators are an efficient approach for the WPH and COXPH models.

Figure 1. Simulation results: RMSE curves for RMLE, PT, SE and PSE estimates (with respect to UMLE) for light to moderate censoring, Δ ≥ 0 and n = 100.

Figure 2. Simulation results: RMSE curves for RMLE, PT, SE and PSE estimates (with respect to UMLE) for light to moderate censoring, Δ ≥ 0 and n = 200.

Figure 3. A performance comparison between GLLPH and WPH in terms of RMSE with 20% censoring, n = 100, and p_2 = 8, where (a) the generalized Weibull is the true model with κ = 1.8, γ = −0.1 and ρ = 0.2 so that the hazard shape is unimodal, and (b) the generalized Weibull is the true model with κ = 1.8, γ = 1.25 and ρ = 0.2 so that the hazard shape is monotone increasing.

Table 1. Log-likelihood and AIC values for each model using PBC data.

Table 2. Estimate (first row) and standard error (second row) of each significant covariate for the PBC data.
Notes: The UMLE and RMLE are the actual estimates, while the PT, SE and PSE are obtained using 5000 bootstrap simulations. The relative mean squared error (RMSE) for each estimator, computed with respect to the UMLE, is also based on 5000 bootstrap simulations.