Multilevel mixed effects parametric survival models using adaptive Gauss–Hermite quadrature with application to recurrent events and individual participant data meta‐analysis

Multilevel mixed effects survival models are used in the analysis of clustered survival data, such as repeated events, multicenter clinical trials, and individual participant data (IPD) meta‐analyses, to investigate heterogeneity in baseline risk and covariate effects. In this paper, we extend parametric frailty models including the exponential, Weibull and Gompertz proportional hazards (PH) models and the log logistic, log normal, and generalized gamma accelerated failure time models to allow any number of normally distributed random effects. Furthermore, we extend the flexible parametric survival model of Royston and Parmar, modeled on the log‐cumulative hazard scale using restricted cubic splines, to include random effects while also allowing for non‐PH (time‐dependent effects). Maximum likelihood is used to estimate the models utilizing adaptive or nonadaptive Gauss–Hermite quadrature. The methods are evaluated through simulation studies representing clinically plausible scenarios of a multicenter trial and IPD meta‐analysis, showing good performance of the estimation method. The flexible parametric mixed effects model is illustrated using a dataset of patients with kidney disease and repeated times to infection and an IPD meta‐analysis of prognostic factor studies in patients with breast cancer. User‐friendly Stata software is provided to implement the methods. Copyright © 2014 John Wiley & Sons, Ltd.


Introduction
The occurrence of clustered survival data is commonplace in medical research, where event times are clustered within groups of the same or similar individuals, which means event times from the same group are likely to be correlated. In this paper, we are interested in situations where a researcher wants to make an average inference across all clusters while accounting for any heterogeneity between clusters. For example, in an individual participant data (IPD) meta-analysis, individuals are clustered within different studies, and researchers are interested in synthesizing the IPD to estimate the average (treatment) effect across studies and the between-study heterogeneity in the effect [1][2][3][4]. This form of meta-analysis is growing in use for survival modeling, as it avoids reliance on published results, allows adjustment for confounders, and permits the investigation of non-PH [4,5]. A second area in which clustered survival data occur is multicenter clinical trials [6]. Although random treatment effects are rarely used in this context, because of greater control over inclusion/exclusion criteria, mixed models can be used to explore between-center heterogeneity in the treatment effect, and some authors have considered center and treatment-by-center effects to be random in this setting [7][8][9]. A further common area in which clustered survival data occur is recurrent event data. Patients can experience an event of interest multiple times throughout the follow-up period, and the inherent correlation between events within patients can be accounted for by using a frailty term [10]. In this article, we use the term frailty model to mean a model with only a random intercept, to distinguish it from a mixed effects model, which can have multiple (possibly correlated) random effects and of which a frailty model is a special case.
Frailty survival models have received great attention in the literature, with a variety of estimation methods and frailty distributions having been proposed. The extensive frailtypack package in R uses nonparametric penalized likelihood estimation for Cox-based frailty models [11] and can provide smooth estimates of the baseline hazard function using cubic M-splines; alternatively, a Weibull or piecewise exponential hazard can be assumed. Models available include a standard shared frailty model [12], a nested frailty model [13], a joint frailty model [14], and a model with two random effects [15]. The two random effects model of Rondeau et al. (2008) allows for two normally distributed (possibly correlated) random effects (i.e., a random intercept and a random coefficient) and was applied to the setting of an IPD meta-analysis, investigating heterogeneity in baseline risk and the treatment effect. Further estimation techniques include the use of an expectation-maximization algorithm to estimate both frailty Cox models and mixed effect Cox models [16][17][18], and penalized partial likelihood [19], available in R in the coxph and coxme packages [20]. Finally, Tudur-Smith et al. (2005) and Bowden et al. (2011) considered the use of restricted maximum likelihood within the random effects Cox model framework and provided SAS and R code, respectively, to implement the methodology [2,3].
Despite the Cox model remaining the most popular survival model choice [21], there is growing interest in parametric survival models [22]. Through a parametric approach, we can directly model the baseline hazard function to obtain absolute measures of risk. In particular, there is growing use of the Royston-Parmar flexible parametric survival model, which uses restricted cubic splines to model the log baseline cumulative hazard function, providing a highly flexible framework to capture complex underlying hazard functions [23][24][25][26]. Many authors have shown the benefits of undertaking a flexible parametric analysis, as hazard ratio estimates closely match those from a Cox model, while gaining the advantages of undertaking a parametric approach, where in particular, the flexible modeling of the baseline hazard function and of time-dependent effects are particularly appealing aspects of the modeling framework [27,28]. Knowledge of the baseline hazard or survival allows absolute risk predictions over time (e.g., in prognostic models) and enables hazards ratios to be translated back to the absolute scale (e.g., useful for calculating the number needed to treat).
Liu and Huang proposed the use of Gaussian quadrature (both adaptive and nonadaptive) for estimation in PH models with a frailty and a piecewise constant baseline hazard function [29]. They illustrated, through simulation and application, the excellent performance of adaptive Gaussian quadrature in this framework, showing benefits over more complex and difficult-to-implement estimation methods, such as penalized partial likelihood. Kong et al. [30] illustrated this approach further, describing standard PH choices such as the Weibull and Gompertz distributions, and accelerated failure time (AFT) models such as the log logistic and log normal. The approach has also been extended by using mixed effects Poisson regression for the one-stage meta-analysis of IPD [4]. However, assuming a piecewise constant baseline hazard (or a piecewise function for time-dependent effects) can limit the flexibility for both modeling and prediction and is also potentially computationally intensive because of the splitting of the data into time intervals.
In this article, we extend parametric frailty models to incorporate any number of random effects, allowing random coefficients to be included in a parametric survival analysis, estimated using adaptive or nonadaptive Gauss-Hermite quadrature. In Section 2, we describe parametric distributions including the exponential, Weibull, and Gompertz PH models and the log logistic, log normal, and generalized gamma AFT models. We then extend the flexible parametric survival model of Royston and Parmar, allowing for mixed effects, which provides a highly flexible and fully parametric modeling framework to capture complex baseline hazard functions, and also extend to non-PH. In Section 3, we conduct two simulation studies to evaluate the estimation routine under a set of realistic scenarios: the first representing a multicenter trial setting and the second an IPD meta-analysis. In Section 4, we apply the flexible parametric mixed effects model to two clinical datasets: the first in patients with kidney disease measuring the time to repeated infection and the second an IPD meta-analysis of prognostic studies in patients with breast cancer. Finally, we conclude the paper with a discussion in Section 5.

2. Mixed effects parametric survival models
We begin with some notation. We define i = 1, ..., N clusters (e.g., trials or centers), with each cluster having j = 1, ..., n_i patients. Let S_ij be the true survival time of the jth patient in the ith cluster, and T_ij = min(S_ij, C_ij) the observed survival time, with C_ij the censoring time. Define an event indicator d_ij, which takes the value 1 if S_ij <= C_ij and 0 otherwise.

Proportional hazards parametric survival models
We define the PH mixed effects survival model on the log-hazard scale as

log h_ij(t | b_i) = log h_0(t) + X_ij β + Z_i b_i,    (1)

where h_0(t) is the baseline hazard function of a parametric distribution, such as the exponential, Weibull, or Gompertz. We define design matrices X_ij and Z_i for the fixed (β) and random (b_i) effects, respectively, and assume b_i ~ MVN(0, V). The fixed effect design matrix, X_ij, has dimensions 1 × n_f, where n_f is the number of covariates included as fixed effects. The random effect design matrix, Z_i, has dimensions 1 × n_r, where n_r is the number of random effects. Z_i can potentially contain covariates and a vector of 1's to allow a random intercept. When Z_i = 1, Equation (1) reduces to a standard frailty PH model, with only a single random effect acting on the baseline hazard function, h_0(t). AFT mixed effects models are described in Appendix A.

Flexible parametric model
A more flexible alternative to commonly used parametric distributions is the flexible parametric model of Royston and Parmar [23]. This is modeled on the log-cumulative hazard scale, and we can incorporate mixed effects as follows:

log H_ij(t | b_i) = log H_0(t) + X_ij β + Z_i b_i,    (2)

where H_0(t) is the cumulative baseline hazard function. The spline basis for this specification is derived from the log-cumulative hazard function of a Weibull PH model, whose linear relationship with log time is relaxed through the use of restricted cubic splines. The restricted nature of the splines forces the function to be linear beyond the boundary knots, k_min and k_max. The default choice for the boundary knots is the minimum and maximum of the uncensored survival times. The spline functions are forced to join at the m internal knots, k_1, ..., k_m, with continuous first and second derivatives. Internal knot locations are generally based on centiles of the uncensored survival times. Further details can be found in Royston and Parmar [23] and Lambert and Royston [31]. We can therefore write a restricted cubic spline function of log(t), with knot vector k_0, letting u = log(t),

s(u | γ, k_0) = γ_0 + γ_1 v_1(u) + γ_2 v_2(u) + ... + γ_{m+1} v_{m+1}(u),

with parameter vector γ and derived variables v_j(u) (known as the basis functions), where v_1(u) = u and, for j = 2, ..., m+1,

v_j(u) = (u − k_{j−1})³₊ − λ_j (u − k_min)³₊ − (1 − λ_j)(u − k_max)³₊,

where (u − k)³₊ is equal to (u − k)³ if the value is positive and 0 otherwise, and

λ_j = (k_max − k_{j−1}) / (k_max − k_min).

This spline function is now substituted for the log-cumulative baseline hazard in Equation (2).
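To make the basis construction concrete, the following is a minimal sketch in Python (not the paper's Stata implementation) of the restricted cubic spline basis described above; the function name rcs_basis and the example knot values are illustrative assumptions.

```python
import numpy as np

def rcs_basis(u, knots):
    """Restricted cubic spline basis of u = log(t), in the Royston-Parmar
    form: v_1(u) = u, and one cubic term per internal knot, restricted to
    be linear beyond the boundary knots (first/last entries of `knots`)."""
    u = np.asarray(u, dtype=float)
    kmin, kmax = knots[0], knots[-1]
    cols = [u]  # v_1(u) = u
    for kj in knots[1:-1]:  # internal knots k_1, ..., k_m
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(np.maximum(u - kj, 0.0) ** 3
                    - lam * np.maximum(u - kmin, 0.0) ** 3
                    - (1.0 - lam) * np.maximum(u - kmax, 0.0) ** 3)
    return np.column_stack(cols)

# Example: basis at three log-times with one internal knot (hypothetical knots)
B = rcs_basis(np.log([0.5, 1.0, 2.0]),
              knots=[np.log(0.1), np.log(1.0), np.log(5.0)])
```

The spline function s(u | γ, k_0) is then γ_0 plus the basis matrix multiplied by the remaining parameter vector.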
We can now transform to the survival and hazard scales:

S_ij(t | b_i) = exp[−exp{s(log(t) | γ, k_0) + X_ij β + Z_i b_i}]

h_ij(t | b_i) = (1/t) s′(log(t) | γ, k_0) exp{s(log(t) | γ, k_0) + X_ij β + Z_i b_i},

where s′(u | γ, k_0) = ds(u | γ, k_0)/du. Given the fully parametric nature of the model specification, the derivatives of the spline function required in the definition of the hazard function are easily calculated. Knot location by default is based on the centiles of the uncensored survival times, which has recently been shown to provide excellent approximations to complex hazard functions, with parameter estimates robust to knot location [32]. Under Equation (2), we assume proportional cumulative hazards; however, this in fact implies PH, as in the models described in Section 2.1.

Non-proportional hazards.
The occurrence of non-PH (or time-dependent effects) is common in the analysis of survival data. For example, treatment effects can vary over time [33], and in registry-based studies where follow-up is often over many years, covariate effects can be found to vary in magnitude over the duration of follow-up [25].
Within the flexible parametric framework, non-proportional cumulative hazards can be incorporated by interacting covariates with spline functions of log time and including them in the linear predictor [31]. The model of Equation (2) then becomes

log H_ij(t | b_i) = s(log(t) | γ, k_0) + X_ij β + Z_i b_i + Σ_{p=1}^{P} X_ijp s(log(t) | δ_p, k_p),

where X_ijp is the pth covariate interacted with spline function s(log(t) | δ_p, k_p), with associated coefficient vector δ_p. The number of spline variables used for each time-dependent effect depends on the number of knots, k_p. Often, fewer knots can be used than chosen for the baseline cumulative hazard function, k_0.

Likelihood and estimation
We can now define the likelihood for the ith cluster under the mixed effects survival modeling framework, with parameter vector θ:

L_i(θ) = ∫ [ Π_{j=1}^{n_i} h_ij(T_ij | b_i)^{d_ij} S_ij(T_ij | b_i) ] φ(b_i | 0, V) db_i,    (10)

where, under a hazard scale parametric model, h_ij(· | b_i) is defined as the exponential of Equation (1), and under the flexible parametric survival model (assuming PH), the hazard and survival functions are those given in Section 2.2. We assume the random effects follow a multivariate normal distribution, so φ(b_i | 0, V) is the q-dimensional multivariate normal density with variance-covariance matrix V, where q is the number of random effects. The (possibly multidimensional) integral in Equation (10) is analytically intractable, requiring numerical techniques to evaluate. In this article, we propose to use M-point nonadaptive or fully adaptive Gauss-Hermite quadrature [29,34,35]. Note that it is important to refit with an increasing value of M to check the consistency of results and to ensure that suitable convergence has been achieved. This is described further in Section 3. As noted in Liu and Huang [29], estimation of frailty survival models requires reasonable starting values for efficient convergence of the estimation process. We adopt a two-stage estimation process as the default choice. To begin, starting values for the fixed effects are obtained from fitting the standard fixed effect parametric survival model, and values of 1 and 0 are taken for the variances and covariances, respectively, of any random effects. The first stage of the estimation is then undertaken to refine the starting values, using nonadaptive Gauss-Hermite quadrature, and proceeds for two full Newton-Raphson iterations. The current parameter estimates are then taken as starting values for the second stage of the estimation process, switching to fully adaptive Gauss-Hermite quadrature, which proceeds until standard convergence criteria are met. The first and second derivatives are estimated numerically, as implemented in the ml command in Stata [36]. We obtain variance estimates using the inverse of the negative Hessian matrix evaluated at the maximum likelihood estimates.
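As a toy illustration of the quadrature rules just described (a sketch, not the paper's Stata implementation), the following Python computes E[g(b)] for a scalar random effect b ~ N(0, σ²) by nonadaptive Gauss-Hermite quadrature, plus an adaptive version in which the nodes are recentred and rescaled. The mode mu_hat and scale tau_hat would, in a real implementation, come from an inner per-cluster optimisation; here they are supplied directly as assumed inputs.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_expectation(g, sigma, M=15):
    """Nonadaptive M-point Gauss-Hermite approximation to E[g(b)],
    b ~ N(0, sigma^2), via the change of variable b = sqrt(2)*sigma*x."""
    x, w = hermgauss(M)
    return np.sum(w * g(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)

def adaptive_gh_expectation(g, sigma, mu_hat, tau_hat, M=15):
    """Adaptive version: nodes recentred at mu_hat and rescaled by tau_hat
    (mode and curvature of the integrand, estimated per cluster in a real
    implementation), with the e^{x^2} correction for the reweighting."""
    x, w = hermgauss(M)
    b = mu_hat + np.sqrt(2.0) * tau_hat * x
    phi = np.exp(-b**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return np.sqrt(2.0) * tau_hat * np.sum(w * np.exp(x**2) * g(b) * phi)

# E[exp(b)] with b ~ N(0, 0.5^2) has the closed form exp(0.125)
na = gh_expectation(np.exp, 0.5)
ad = adaptive_gh_expectation(np.exp, 0.5, mu_hat=0.25, tau_hat=0.5)
```

When the integrand's mode (here b = σ² = 0.25) and curvature are matched, the adaptive rule is accurate with far fewer nodes than the nonadaptive rule needs for sharply peaked integrands, which is why the second estimation stage switches to the adaptive form.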
In the associated Stata software, four choices are available for the variance-covariance structure of the random effects, namely, independent, identity, unstructured, and exchangeable.

A note on the impact of the number of participants per cluster.
When clusters are particularly large, that is, contain many participants, the fact that the numerical integration must be conducted on the likelihood function, rather than the log-likelihood function, with contributions then combined at the cluster level, means that cluster-level likelihood contributions can become vanishingly small and numerically unstable. In such cases, some rescaling of the timescale may be required. For example, in the analysis described in Section 4.2, where one cluster contained 36.61% of all patients (2722 of 7435), the timescale had to be divided by a large constant (e.g., 500) to allow model convergence.

Stata software
The methodology described earlier is implemented as user-friendly Stata software, which can be downloaded within Stata by typing ssc install stmixed. All the models described earlier are available, with adaptive Gauss-Hermite quadrature, the default numerical integration technique. A range of prediction options are also available, including empirical Bayes estimates of the random effects.

Simulation studies
In this section, we conduct two simulation studies to assess the performance of adaptive and nonadaptive quadrature in estimating mixed effects survival models. As is often the case with methods that use numerical integration, estimation difficulties can arise with challenging datasets. To accommodate this within the simulation studies, we use five levels of model fitting, with each successive model applied only if the previous one fails to converge.

Simulation study 1: proportional hazards with a frailty
Liu and Huang [29] illustrated excellent performance of adaptive Gauss-Hermite quadrature in their simulation study but chose only one frailty standard deviation of 1 with a normally distributed frailty. Here, we replicate one of their simulation scenarios but vary the magnitude of the frailty standard deviation. We assume a multicenter trial scenario, with 100 centers and six patients in each center. A binary center-level covariate X_1i (such as treatment, if individual centers were randomized to a single treatment group) is generated from Bin(1, 0.5), and a patient-specific covariate (such as a biomarker level) X_2ij is generated from U(0, 1), with associated fixed effects of {1, −1}, respectively. We therefore simulate from the following model,

h_ij(t | b_0i) = λγt^{γ−1} exp(β_1 X_1i + β_2 X_2ij + b_0i),

assuming a Weibull baseline hazard with scale λ = 1 and shape γ = 2. We have a hazard ratio of exp(β_1) = 2.718 for a one-unit increase in X_1i, assuming X_2ij is held constant, and exp(β_2) = 0.368, our hazard ratio for a one-unit increase in X_2ij given X_1i is held constant. Censoring times are generated from U(0, 2). A normal frailty is incorporated as b_0i ~ N(0, σ²), with σ ∈ {0.2, 0.5, 1}. In each scenario, we conduct 1000 replications. A Weibull survival model with frailty is applied to each simulated dataset. Results are presented in Table I, including the number of simulations that converged at each model fitting stage and in total. Bias, percentage bias, and coverage are calculated on the parameterized scales, using estimates of the log-hazard ratios, β_1 and β_2, and the logs of the scale and shape parameters of the baseline hazard and of the heterogeneity standard deviation, that is, log(λ), log(γ), and log(σ). When σ = 1, we observe entirely consistent results to those of Liu and Huang [29], showing excellent performance of the estimation method, with minimal bias in all parameter estimates and coverage probabilities close to the optimum of 95%.
When σ = 0.5, we again find minimal bias and optimum coverage probabilities across parameters, with only some minor underestimation of the frailty standard deviation, σ. Both baseline scale and shape parameters, λ and γ, are captured with minimal bias and optimum coverage probabilities across scenarios 1 and 2. However, when σ = 0.2, 92 of the 1000 simulations failed to converge across all five of the models applied. Of the 908 that did converge, we observe minimal bias in the log-hazard ratios β_1 and β_2, and coverage levels of 86.5%, indicating some underestimation of standard errors. In particular, we observe substantial bias and percentage bias in estimates of log(σ); however, this is because the mean calculation of bias is skewed by some large negative estimates (or, equivalently, estimates of σ close to 0). Using the median to summarize estimates of σ is much more informative in this case, because σ is bounded by zero; we then observe a median estimate of 0.191, close to the true value of 0.2. This result should be interpreted with caution given the amount of nonconvergence.
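The data-generating mechanism of simulation study 1 can be sketched as follows (a hypothetical Python reimplementation, assuming the standard inversion method for Weibull survival times; the paper's simulations were run in Stata):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_center_trial(n_centers=100, n_per_center=6,
                          lam=1.0, gam=2.0, beta=(1.0, -1.0), sigma=0.5):
    """One dataset from the Weibull frailty model of simulation study 1:
    h_ij(t) = lam*gam*t^(gam-1) * exp(beta1*X1i + beta2*X2ij + b_0i),
    returning (observed time, event indicator, X1, X2) tuples."""
    rows = []
    for _ in range(n_centers):
        x1 = rng.binomial(1, 0.5)       # center-level binary covariate
        b0 = rng.normal(0.0, sigma)     # normal frailty on the log-hazard
        for _ in range(n_per_center):
            x2 = rng.uniform(0.0, 1.0)  # patient-level covariate
            eta = beta[0] * x1 + beta[1] * x2 + b0
            # Invert S(t) = exp(-lam * t^gam * exp(eta)) at u ~ U(0,1)
            t = (-np.log(rng.uniform()) / (lam * np.exp(eta))) ** (1.0 / gam)
            c = rng.uniform(0.0, 2.0)   # censoring time ~ U(0, 2)
            rows.append((min(t, c), int(t <= c), x1, x2))
    return rows

data = simulate_center_trial()
```

Each replication would then be passed to the frailty model fitting routine, with the whole procedure repeated 1000 times per value of σ.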

Simulation study 2: proportional trial effects with a random treatment effect
We now conduct a simulation study to assess the performance of the estimation method in scenarios representing IPD meta-analyses of survival data, incorporating a random coefficient, that is, a treatment effect with heterogeneity. It is well known that maximum likelihood estimation leads to downwardly biased estimates of variance parameters in small samples, which, in a meta-analysis, means when the number of studies is small. This has been shown in other simulation studies, for example of mixed effects logistic regression and Poisson regression models [4,37]. We investigate this further by varying the magnitude of the heterogeneity variance and applying mixed effect Weibull PH models. We simulate from the following model,

h_ij(t | b_1i) = λγt^{γ−1} exp(β_0i + (β_1 + b_1i) X_1ij),

assuming a Weibull distribution, with shape and scale parameters of γ = 1.276 and λ = 3.121, respectively. We incorporate proportional trial effects, β_0i, for i = 2, ..., N, drawn from N(0, 0.25²), with β_01 constrained to be zero for the reference trial; therefore, exp(β_0i) represents the proportional effect on the baseline hazard because of the ith trial, for i = 2, ..., N. We generate a binary covariate, X_1ij ~ Bin(1, 0.5), to represent treatment group, where β_1 represents the mean log-hazard ratio for a population of treatment effects, with b_1i the deviation of the log-hazard ratio of the ith trial from the population mean. We therefore assume (β_1 + b_1i) ~ N(−0.663, τ²), with τ ∈ {0.25, 0.5, 1}. Administrative censoring is applied at 0.24 units. We assume 15 trials with 500 patients in each. The number of patients represents an approximate average trial size for the example described in Section 4.2, which also informs the random treatment effect mean and standard deviation and the baseline parameters of the Weibull distribution, to provide plausible simulation scenarios. The model above is then applied to each simulated dataset.
Results are presented in Table II, including the number of simulations that converged at each of the five nested model stages. Once again, bias, percentage bias, and coverage are calculated on the parameterized scales, using estimates of the log-hazard ratio, β_1, and the logs of the scale and shape parameters of the baseline hazard and of the heterogeneity standard deviation, that is, log(λ), log(γ), and log(τ). When τ = 1 or τ = 0.5, we observe very good performance of the estimation method, with only minor underestimation of the heterogeneity standard deviation, τ. Estimates of the mean treatment effect, β_1, are generally unbiased, with slightly sub-optimum coverage probabilities of 91.2% and 91.0% in scenarios 1 and 2, respectively. Estimates of the baseline parameters, λ and γ, were essentially unbiased and had optimum coverage probabilities. However, when τ = 0.25, some issues with convergence were observed, with 55 of the 1000 simulations failing to converge across all five of the model fitting stages. Of those that did converge, we generally observe unbiased estimates of the average treatment effect, but moderate underestimation of the heterogeneity standard deviation (median of estimates = 0.208). As in simulation study 1, mean bias and percentage bias of log(τ) were substantial because of some large negative estimates on the log scale, equivalent to some estimates of τ close to 0. Of course, this result must be interpreted with caution given the moderate amount of nonconvergence.

Kidney data-repeated times to infection
In this section, we consider a reanalysis of a dataset given in McGilchrist and Aisbett (1991) [19]. The dataset consists of 38 patients with kidney disease, where the event of interest is infection at the catheter insertion point. Each patient has two possible recurrence times, recorded from initial insertion. A total of 58 failures were observed. The clinical question of interest is to examine the effect of age and sex on the hazard of infection, accounting for the potentially recurrent events for each individual (in this case, the clusters are the individual patients themselves). Here, we wish to fit a flexible parametric frailty model. In preliminary modeling, with no covariates or frailty, we can use the AIC to guide the choice of the number of degrees of freedom (DOF) required to capture the baseline hazard function [32], with results presented in Table III. From Table III, we see that the AIC selects three DOF as the optimum choice for the baseline hazard function, with the estimated function shown in Figure 1. Figure 1 indicates the presence of two turning points in the underlying hazard function, which would clearly be missed if fitting a standard parametric frailty model such as the Weibull. We can now fit a frailty flexible parametric survival model, which can equivalently be written on the hazard scale for simplicity as

h_ij(t | b_0i) = h_0(t) exp(b_0i),

where b_0i ~ N(0, σ²). To give an indication of the heterogeneity in the underlying hazard across individuals, we can calculate patient-specific predictions of the hazard function, incorporating empirical Bayes predictions of the frailty effect [35], shown in Figure 2. Figure 2 shows substantial variation in the baseline hazard across patients. We can now incorporate covariates under the following model,

h_ij(t | b_0i) = h_0(t) exp(β_1 X_1ij + β_2 X_2ij + b_0i),

adjusting for age (years), X_1ij, and sex (male as the reference group), X_2ij, with associated log-hazard ratios β_1 and β_2, respectively. Results are presented in Table IV, alongside results from a Cox frailty model for comparison.
From Table IV, results from the flexible parametric frailty model show a statistically significant difference between women and men, with a hazard ratio of 0.231 (95% CI: 0.088, 0.605) for women compared with men of the same age. We also find a nonstatistically significant hazard ratio of 1.007 (95% CI: 0.982, 1.034) for a one-unit increase in age, assuming the same sex. We observe a heterogeneity standard deviation of 0.800 (95% CI: 0.415, 1.542), indicating that across individuals, the hazard function is highly heterogeneous. It is important to establish the consistency of the numerical integration by using an increasing number of quadrature points and comparing estimates [35]. In this case, we obtained identical estimates up to four decimal places with 9 and 10 adaptive quadrature points.
We also compared the flexible parametric frailty model with the Cox frailty model. Results from Table IV show similar estimates in both the hazard ratios and associated 95% CIs and the estimated frailty standard deviation. The penalized partial likelihood approach used to fit the Cox frailty model does not provide a CI for frailty parameters.
One could potentially use the fitted flexible parametric model to make predictions over time for a new individual with a given age and sex, as we have the average baseline risk and the hazard ratios for age and sex. The Cox model does not give the average baseline risk and so could not be used for this purpose.
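The AIC-guided choice of DOF used in this section can be sketched as follows. The log-likelihood values below are hypothetical, as is the assumption that a model with d DOF has d + 1 baseline parameters (spline terms plus intercept); in practice, each value would come from fitting the flexible parametric model with that many DOF.

```python
def select_dof_by_aic(log_likelihoods):
    """Given a mapping {dof: maximized log-likelihood} from preliminary
    fits with no covariates or frailty, return the (dof, aic) pair
    minimizing AIC = 2*k - 2*logL, assuming k = dof + 1 parameters."""
    best = None
    for dof, ll in log_likelihoods.items():
        aic = 2 * (dof + 1) - 2 * ll
        if best is None or aic < best[1]:
            best = (dof, aic)
    return best

# Hypothetical log-likelihoods for 1-5 DOF, constructed so the fit improves
# sharply up to 3 DOF and only marginally thereafter
dof, aic = select_dof_by_aic({1: -340.2, 2: -337.9, 3: -335.1,
                              4: -334.9, 5: -334.8})
```

Here the marginal gains beyond three DOF do not offset the extra parameters, mirroring how three DOF were selected for the kidney data.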

Breast cancer-individual participant data meta-analysis of prognostic biomarkers
We now apply the mixed effect flexible parametric survival model to an IPD meta-analysis of prognostic studies in patients with breast cancer [38]. IPD was obtained from 15 studies, with a total of 7435 patients, 2042 (27.48%) of which died.
For illustration purposes, we are interested in the effect of hormone receptor status on the hazard of death over time, coded −1/2 for negative or unknown and +1/2 for at least one positive. Hormone receptor status is determined by two measurements, namely, estrogen receptor and progesterone receptor, dichotomized into low and high. When a random effect is placed on hormone receptor status, this group coding assumes equal variability in the log-hazard rate for both groups [2].
Commonly used one-stage meta-analysis models will assume either proportional study effects or stratify by study membership. Within a parametric modeling framework, stratifying by study membership allows the estimation of separate baseline hazard functions for each study, which is often more plausible than assuming proportional study effects. We can assess this through preliminary fixed effect models and using model selection criteria. Applying a flexible parametric survival model to the IPD with proportional study effects and using the AIC to select the number of DOF for the reference study (three DOF in this case), we obtain an AIC of 11 348.37. Comparing this to a flexible parametric model stratified by study, with the DOF for each study's baseline hazard selected using the AIC, we obtain an AIC of 11 342.978, indicating an improved fit when stratifying by study. The estimated baseline hazard functions are shown in Figure 3. Of course, within the mixed effects framework, we can assume a random baseline, that is, random study effects; however, this assumes that the studies used in the analysis are a random sample of a distribution of studies. This approach has been advocated by some authors [15,39], particularly as it is clinically useful to obtain an estimate of the average risk (across reference groups), in which case, a parametric approach is preferable.
In our example, we proceed by stratifying by study and investigate heterogeneity in the effect of hormone receptor status, under the following flexible parametric model,

log H_ij(t | b_1i) = log H_0i(t) + (β_1 + b_1i) X_1ij,

where h_0i(t) is the baseline hazard function for the ith study, with the baseline hazard for each study modeled using restricted cubic splines within the flexible parametric framework, X_1ij is hormone receptor status, and β_1 is the average log-hazard ratio for a distribution of covariate effects, with b_1i the deviation of the ith study from this average effect. For comparison, we also fit the equivalent stratified Cox mixed effect model. Results are presented in Table V. From Table V, we observe a statistically significant average effect of hormone receptor status, with a hazard ratio of 0.515 (95% CI: 0.426, 0.624) and an estimated heterogeneity standard deviation of τ = 0.257. It is also often of interest to adjust for patient-level covariates, particularly to see whether any heterogeneity in, for example, a treatment effect is explained by covariates. Within the proposed framework, any number of covariates can be included as fixed or indeed random effects. To illustrate this, we now include age as a fixed effect, with results presented in Table VI. We can see that, after adjusting for age, the heterogeneity in the effect of hormone receptor status has reduced slightly from τ = 0.257 to τ = 0.249, and we obtain a hazard ratio of 1.013 (95% CI: 1.010, 1.017) for a 1-year increase in age. We once again obtained consistent estimates to four decimal places with five and six adaptive quadrature points. We again find very similar estimates from the stratified Cox mixed effect model.
Following a random effects meta-analysis, we can calculate a prediction interval to provide a range for the predicted parameter in a new study. For the hazard ratio of hormone receptor status, this gives a 95% prediction interval of (0.277, 0.878), calculated on the natural log scale and then exponentiated [40,41]. As the prediction interval does not contain 1, this reveals that in at least 95% of settings, hormone receptor status will have prognostic value (in the same direction). This finding is masked when ignoring heterogeneity and when focusing on only the 95% CI for the summary effect. Therefore, despite the heterogeneity, there is strong consistent evidence here that hormone receptor status is a prognostic factor for breast cancer patients across the settings covered by these studies.
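The prediction interval calculation can be sketched as follows (a minimal Python illustration; the log-hazard ratio and its standard error are back-calculated from the rounded Table V results and so are approximate, and a normal quantile is used for simplicity where references [40,41] recommend a t-distribution with k − 2 degrees of freedom for k studies, so the result only approximates the reported interval):

```python
import math
from statistics import NormalDist

def prediction_interval_hr(log_hr, se, tau, level=0.95):
    """Approximate prediction interval for the hazard ratio in a new
    study: log_hr +/- z * sqrt(tau^2 + se^2), then exponentiate."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(tau ** 2 + se ** 2)
    return math.exp(log_hr - half_width), math.exp(log_hr + half_width)

# Illustrative inputs: log(0.515) ~ -0.664; se ~ 0.097 recovered from the
# width of the reported 95% CI; tau = 0.257 as estimated in Table V
lo, hi = prediction_interval_hr(-0.664, 0.097, 0.257)
```

The key point is that the predictive uncertainty combines the between-study heterogeneity τ² with the estimation uncertainty of the summary effect, so the interval is necessarily wider than the 95% CI for the summary hazard ratio.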
Further extensions such as treatment-covariate interactions, non-PH, and multiple random effects all follow naturally within the framework and are available within the associated Stata package.

Discussion
In this article, we developed methodology for the analysis of clustered survival data, incorporating mixed effects into the parametric survival analysis framework using maximum likelihood. This extends parametric frailty survival models, allowing any number of normally distributed random effects to be used, including the exponential, Weibull, and Gompertz PH models and the log logistic, log normal, and generalized gamma AFT models. Furthermore, we extended the flexible parametric survival modeling framework of Royston and Parmar (2002), modeled on the log-cumulative hazard scale using restricted cubic splines, to include random effects, and also allowing for non-PH (time-dependent effects). This framework has the potential to be used in a variety of settings, such as the analysis of repeated event data, the one-stage IPD meta-analysis of survival data, the analysis of multicenter trial data, and the development of prognostic models using data from multiple clusters.
There is growing use of parametric survival models, in particular the class of Royston–Parmar flexible parametric models [27,28]. By incorporating mixed effects into this framework, we demonstrate a useful modeling approach to capture the complex hazard functions often observed in real datasets. Furthermore, we extend to time-dependent effects in flexible parametric mixed effects survival models. In the example applications, we used model selection criteria, such as the AIC, to select the DOF for the baseline hazard function. These, combined with sensitivity analyses for knot location, have been shown to work well in model selection [24,26,32]. Through the flexible parametric approach, we can model the baseline and time-dependent effects in continuous time, allowing smooth predictions both in and out of sample, compared with other piecewise approaches [4,29].
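The spline machinery underlying the Royston–Parmar model can be sketched as follows (a standard restricted cubic spline basis in Python; the knot positions are illustrative, not those used in the applications). The "restriction" forces the fitted function to be linear beyond the boundary knots, which stabilizes extrapolation of the log-cumulative hazard:

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis of Royston-Parmar form.

    Returns one column per interior knot; together with an intercept
    and x itself, these span a cubic spline constrained to be linear
    beyond the boundary (min/max) knots.
    """
    x = np.asarray(x, dtype=float)
    kmin, kmax = knots[0], knots[-1]
    cols = []
    for k in knots[1:-1]:
        lam = (kmax - k) / (kmax - kmin)
        v = (np.maximum(x - k, 0.0) ** 3
             - lam * np.maximum(x - kmin, 0.0) ** 3
             - (1.0 - lam) * np.maximum(x - kmax, 0.0) ** 3)
        cols.append(v)
    return np.column_stack(cols)

# Example: 5 knots on the log-time scale (illustrative positions)
knots = [-2.0, -1.0, 0.0, 1.0, 2.0]
Z = rcs_basis(np.linspace(-4, 4, 81), knots)
```

In the flexible parametric model, x is log time and the basis columns enter the log-cumulative hazard linearly; interacting them with covariates yields the time-dependent effects discussed above.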
We conducted two simulation studies in this article to assess the performance of the estimation routine in a set of clinically realistic scenarios, including a multicenter clinical trial and an IPD meta-analysis. In simulation study 1, chosen to reflect a multicenter clinical trial context, we found excellent performance in estimating the frailty variance when moderate and large values of the variance were chosen. However, when small values were chosen, some convergence issues arose, with some bias and suboptimal coverage probabilities. Log-hazard ratios and baseline parameters were essentially unbiased, with coverage probabilities at approximately 95%. In simulation study 2, we simulated data to represent an IPD meta-analysis, similar to the prognostic study meta-analysis analyzed in Section 4.2. In agreement with the findings of [4], some underestimation of random effect standard deviations was observed in this simulation setting, although the magnitude of the bias appears to decrease as the magnitude of the standard deviation increases. Baseline hazard parameters of the Weibull distributions used were also captured with minimal bias and close-to-nominal coverage probabilities.
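A data-generating mechanism of the kind used in such simulation studies can be sketched by inverting the Weibull survival function, with a normally distributed random intercept shared within each cluster. This is an illustrative Python sketch with arbitrary parameter values, not the paper's simulation protocol:

```python
import numpy as np

def simulate_cluster(n, lam, gamma, beta, tau, rng):
    """Simulate Weibull PH survival times for one cluster of size n.

    The cluster shares a random intercept b ~ N(0, tau^2) on the
    log-hazard scale; times are generated by inverting
    S(t) = exp(-lam * t^gamma * exp(x*beta + b)) at U ~ Uniform(0, 1).
    """
    b = rng.normal(0.0, tau)                # cluster-level random effect
    x = rng.integers(0, 2, size=n)          # binary covariate, e.g. treatment
    u = rng.uniform(size=n)
    t = (-np.log(u) / (lam * np.exp(x * beta + b))) ** (1.0 / gamma)
    return t, x

rng = np.random.default_rng(1)
times, x = simulate_cluster(200, lam=0.1, gamma=1.3, beta=-0.5, tau=0.25, rng=rng)
```

Repeating this across clusters, applying administrative censoring, and fitting the mixed effects model to each replicate gives bias and coverage estimates of the kind reported in the two simulation studies.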
In the application to the kidney data, we highlighted the ability of the flexible parametric frailty model to capture complex hazard functions, which detected two turning points in the underlying hazard function. If interest lies in measures of absolute risk such as the hazard rate, which are clinically meaningful from an epidemiological perspective, it is crucial to correctly model the baseline. This is especially important for prognostic models, which need to predict absolute risk over time.
It is generally accepted that one-stage models are preferable for the meta-analysis of survival data, but they are often described as computationally formidable [42]. In the example analyses shown in this article, the maximum computation time was less than 5 minutes on a standard laptop computer with 4 GB of RAM, and that was for the meta-analysis of 7435 patients, indicating that this framework is entirely feasible for application to clinical datasets. Some authors have proposed incorporating a random effect on the baseline risk, rather than including proportional or separate baseline hazards for each study [2,15]. Indeed, this may be preferable in situations with large numbers of studies, to speed up estimation. More importantly, it is useful if a meta-analysis can provide the average baseline risk across all trials (across all placebo/reference groups), which falls directly into the parametric framework described in this article; relative effects can then be easily transferred to an absolute scale. Schmid et al. (2004) advocate this approach [39]: 'The assumption of random trial effects in [2] implies that results of trials with similar measured characteristics will vary because of unmeasured characteristics. In essence, each trial is a random sample from a population of potential trials. It is more common in multicenter trials for which inference is desired about the particular centers to treat the study terms as fixed effects [29]. We would argue that the random effects formulation is more appropriate to meta-analysis because consumers wish to generalize results to future trials or to different settings. However, this view is by no means universal [30].' A final benefit of undertaking a parametric approach is in the area of economic decision modeling [43], where survival often must be extrapolated to a lifetime horizon, beyond the range of the data.
Through a parametric approach, uncertainty in baseline risk can be directly incorporated into the economic model.
Further work is needed to evaluate the estimation methods in clinically realistic scenarios, investigating how well they perform for varying effect sizes, heterogeneity standard deviations, numbers of clusters, and numbers of participants per cluster. Previous work has also investigated the use of restricted maximum likelihood within the Cox mixed effects framework [3,44], which could be extended to the parametric mixed effects framework described here, with the aim of reducing the underestimation of heterogeneity parameters under a maximum likelihood approach.
To facilitate the use of the methods in practice, user-friendly Stata software has been developed by the first author. All the models described in this article are available, with a variety of prediction options. Further possible extensions to the framework include allowing for higher levels of clustering and further simulation studies to establish robustness to underlying assumptions, such as normally distributed random effects.

A.1. Accelerated failure time models
Mixed effects can also be incorporated into an accelerated failure time (AFT) framework. These models are included in the associated Stata package, and therefore we briefly describe them here.
A.1.1. Log logistic. We define the survival and density functions for the log-logistic AFT model as

S(T_{ij}) = \left[ 1 + \left\{ \exp(-\mu_{ij}) T_{ij} \right\}^{1/\gamma} \right]^{-1}, \qquad f(T_{ij}) = \frac{\exp(-\mu_{ij}/\gamma) \, T_{ij}^{1/\gamma - 1}}{\gamma \left[ 1 + \left\{ \exp(-\mu_{ij}) T_{ij} \right\}^{1/\gamma} \right]^{2}},

where \mu_{ij} = X_{ij} \beta + Z_{ij} b_i is the linear predictor containing the fixed and random effects, and \gamma is treated as an ancillary parameter.
A.1.2. Log normal. We define the survival and density functions for the log-normal AFT model as

S(T_{ij}) = 1 - \Phi\left( \frac{\log T_{ij} - \mu_{ij}}{\sigma} \right), \qquad f(T_{ij}) = \frac{1}{T_{ij} \sigma \sqrt{2\pi}} \exp\left\{ -\frac{(\log T_{ij} - \mu_{ij})^{2}}{2\sigma^{2}} \right\},

where \Phi(\cdot) is the standard normal CDF, \mu_{ij} = X_{ij} \beta + Z_{ij} b_i, and \sigma is treated as an ancillary parameter. The likelihood contributions then follow, with f(T_{ij}) and S(T_{ij}) defined above.
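As a numerical sanity check on the AFT definitions above, the density should equal minus the derivative of the survival function. The Python sketch below uses the same parameterizations, with the linear predictor fixed at a scalar μ for illustration:

```python
import math

def loglogistic_S(t, mu, gamma):
    """Log-logistic AFT survival: S(t) = [1 + (exp(-mu)*t)^(1/gamma)]^(-1)."""
    return 1.0 / (1.0 + (math.exp(-mu) * t) ** (1.0 / gamma))

def loglogistic_f(t, mu, gamma):
    """Log-logistic AFT density: u / (gamma * t * (1 + u)^2), u = (exp(-mu)*t)^(1/gamma)."""
    u = (math.exp(-mu) * t) ** (1.0 / gamma)
    return u / (gamma * t * (1.0 + u) ** 2)

def lognormal_S(t, mu, sigma):
    """Log-normal AFT survival: S(t) = 1 - Phi((ln t - mu)/sigma)."""
    z = (math.log(t) - mu) / sigma
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_f(t, mu, sigma):
    """Log-normal AFT density."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def numeric_density(S, t, eps=1e-6, **kw):
    """f(t) = -dS/dt, approximated by central differences."""
    return -(S(t + eps, **kw) - S(t - eps, **kw)) / (2.0 * eps)
```

In the mixed effects likelihood, these conditional f and S contributions are evaluated at μ_{ij} = X_{ij}β + Z_{ij}b_i and then integrated over the random effects by the quadrature described in the main text.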