Flexible parametric joint modelling of longitudinal and survival data

The joint modelling of longitudinal and survival data is a highly active area of biostatistical research. The submodel for the longitudinal biomarker usually takes the form of a linear mixed effects model. We describe a flexible parametric approach for the survival submodel that models the log baseline cumulative hazard using restricted cubic splines. This approach overcomes limitations of standard parametric choices for the survival submodel, which can lack the flexibility to effectively capture the shape of the underlying hazard function. Numerical integration techniques, such as Gauss–Hermite quadrature, are usually required to evaluate both the cumulative hazard and the overall joint likelihood; however, by using a flexible parametric model, the cumulative hazard has an analytically tractable form, providing considerable computational benefits. We conduct an extensive simulation study to assess the proposed model, comparing it with a B‐spline formulation, illustrating insensitivity of parameter estimates to the baseline cumulative hazard function specification. Furthermore, we compare non‐adaptive and fully adaptive quadrature, showing the superiority of adaptive quadrature in evaluating the joint likelihood. We also describe a useful technique to simulate survival times from complex baseline hazard functions and illustrate the methods using an example data set investigating the association between longitudinal prothrombin index and survival of patients with liver cirrhosis, showing greater flexibility and improved stability with fewer parameters under the proposed model compared with the B‐spline approach. We provide user‐friendly Stata software. Copyright © 2012 John Wiley & Sons, Ltd.


Introduction
The joint modelling of longitudinal and time-to-event data has received remarkable attention in the literature over the past 15 years [1,2], with the ability to investigate the inter-relationships between the joint processes being advocated in ever widening fields of study [3,4]. Extensions to the now standard single longitudinal response and single time-to-event joint model include incorporation of multiple longitudinal markers, both classically [5] and using a Bayesian approach [6], extension to the competing risks setting [7], investigation of a cure proportion [8], and a variety of time-to-event submodels [3,9]. Tsiatis and Davidian [10] and Yu et al. [11] provide extensive overviews of the field.
The form of joint model that has dominated the literature assumes that shared random effects characterise the association between the time-to-event and longitudinal marker, and we adopt this approach. To estimate such models, computationally intensive numerical integration techniques, such as Gauss–Hermite quadrature [12], are required to evaluate both the cumulative hazard function and the full joint likelihood.
We propose to use restricted cubic splines to model the log baseline cumulative hazard function, to provide a highly flexible framework to capture complex hazard functions. Royston and Parmar [13] first proposed this form of survival model by expanding log time into a restricted cubic spline basis.

Longitudinal submodel
We assume a linear mixed effects model for the continuous longitudinal process. Therefore, we observe
\[
Y_{ij} = W_{ij} + e_{ij}, \qquad W_{ij} = X_i \beta + Z_i b_i + \delta u_i, \qquad e_{ij} \sim N(0, \sigma_e^2), \tag{1}
\]
with design matrices X_i and Z_i for the fixed (β) and random (b_i) effects, respectively, consisting of time variables. Furthermore, we also have a vector of time-invariant baseline covariates, u_i ⊆ U_i, with corresponding regression coefficients, δ. We assume the errors are independent of the random effects and that cov(e_ij, e_ik) = 0 (where j ≠ k). We can incorporate flexibility in the longitudinal submodel through the use of fractional polynomials of time, for example, which will often be sufficient to capture the longitudinal trajectory [20].

Survival submodel
We define the proportional cumulative hazards time-to-event submodel:
\[
\log H_i(t) = \log H_0(t) + \phi v_i + \alpha W_i(t), \tag{3}
\]
where H_0(t) is the cumulative baseline hazard function, α denotes the association parameter, and φ is a set of regression coefficients associated with a set of baseline covariates, v_i, again a subset of U_i. In this formulation, we assume the association is based on the current value of the longitudinal response, W_i(t). A useful discussion regarding the choice of association measure can be found in Rizopoulos and Ghosh [6]. We derive the spline basis for this specification from the log cumulative hazard function of a Weibull proportional hazards model; the linear relationship with log time is relaxed through the use of restricted cubic splines. Further details can be found in Royston and Parmar [13] and Lambert and Royston [16]. We can therefore write a restricted cubic spline function of log(t), with knot vector k_0, as s{log(t) | γ, k_0}. For example, with K knots k_1 < … < k_K and letting x = log(t), we can express a restricted cubic spline function as
\[
s(x \mid \gamma, k_0) = \gamma_0 + \gamma_1 x + \sum_{j=2}^{K-1} \gamma_j v_j(x),
\]
where, for each internal knot k_j,
\[
v_j(x) = (x - k_j)_+^3 - \lambda_j (x - k_1)_+^3 - (1 - \lambda_j)(x - k_K)_+^3, \qquad \lambda_j = \frac{k_K - k_j}{k_K - k_1},
\]
with (x − a)_+ = max(0, x − a). This is now substituted for the log cumulative baseline hazard in Equation (3).
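To make the construction concrete, the restricted cubic spline basis of log time can be computed in a few lines. The following Python sketch is our own illustration (function and variable names are not part of the stjm software); it builds one basis term per internal knot, with linearity enforced beyond the boundary knots.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis of x = log(t) in the Royston-Parmar
    style: a linear term plus one term per internal knot, constrained to
    be linear beyond the boundary (first and last) knots."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    kmin, kmax = knots[0], knots[-1]
    cols = [x]                                    # linear term in x
    for kj in knots[1:-1]:                        # internal knots only
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(np.clip(x - kj, 0.0, None) ** 3
                    - lam * np.clip(x - kmin, 0.0, None) ** 3
                    - (1.0 - lam) * np.clip(x - kmax, 0.0, None) ** 3)
    return np.column_stack(cols)

# The log baseline cumulative hazard is then modelled as
#   gamma_0 + rcs_basis(np.log(t), knots) @ gamma[1:].
```

With K knots this yields K − 1 columns, so together with the intercept γ_0 there are K parameters, matching the degrees-of-freedom counting used in the simulation study.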
We can now transform to the hazard and survival scales through
\[
S_i(t) = \exp\{-H_i(t)\}, \qquad h_i(t) = \frac{\mathrm{d}H_i(t)}{\mathrm{d}t}.
\]
Given the fully parametric nature of the model specification, the derivatives of the spline function required in the definition of the hazard function are easily calculable. These functions are of course specific to using the current value as the measure of association and can be adjusted according to the form of association under investigation.

Full joint likelihood
We can now construct the full likelihood for the joint model:
\[
L(\theta) = \prod_{i=1}^{n} \int_{-\infty}^{\infty} \left[ \prod_{j} f(y_{ij} \mid b_i, \theta) \right] f(T_i, d_i \mid b_i, \theta)\, f(b_i \mid V)\, \mathrm{d}b_i, \tag{9}
\]
where θ is the parameter vector, f(y_ij | b_i, θ) is the normal density of the longitudinal responses, f(T_i, d_i | b_i, θ) = h_i(T_i)^{d_i} S_i(T_i) is the time-to-event contribution with event indicator d_i, and f(b_i | V) is the multivariate normal density of the random effects with variance–covariance matrix V. We can evaluate Equation (9) using m-point non-adaptive or fully adaptive Gauss–Hermite quadrature [12,21]. Fully adaptive quadrature iteratively places the nodes at the optimum positions for each patient, resulting in a much reduced number of nodes required to obtain reliable estimates, providing substantial computational benefits.
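To illustrate why adaptive placement of the nodes matters, the minimal Python sketch below (our own illustration, not the stjm implementation) contrasts non-adaptive Gauss–Hermite quadrature with an adaptive rule that recentres and rescales the nodes; the mode and spread of the integrand are supplied directly here, whereas in practice they are located per subject at each iteration.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_nonadaptive(g, sigma, m):
    """m-point Gauss-Hermite estimate of E[g(b)], b ~ N(0, sigma^2):
    nodes sit around zero regardless of where g concentrates."""
    x, w = hermgauss(m)
    return np.sum(w * g(math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi)

def gh_adaptive(g, sigma, m, mu_hat, tau_hat):
    """Adaptive rule: recentre at mu_hat and rescale by tau_hat, the mode
    and curvature-based spread of the integrand g(b) * N(b; 0, sigma^2)."""
    x, w = hermgauss(m)
    b = mu_hat + math.sqrt(2.0) * tau_hat * x
    phi = np.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return math.sqrt(2.0) * tau_hat * np.sum(w * np.exp(x ** 2) * g(b) * phi)
```

For a sharply peaked integrand far from zero, mimicking one subject's likelihood contribution, e.g. g(b) = exp{−50(b − 2)²} against a standard normal density, five adaptive nodes centred at the mode recover the integral essentially exactly, whereas five non-adaptive nodes return approximately zero because no node falls under the peak.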

Simulation study
We undertook a simulation study to assess the performance of the proposed joint model for finite sample sizes, comparing with the model of Rizopoulos et al. [14]. Under each scenario, we apply both the proposed joint model with five degrees of freedom, plus an intercept, resulting in six parameters to capture the baseline cumulative hazard function, and a B-spline function of degree 3 (cubic splines for consistency) with two internal knots, again resulting in six parameters to capture the baseline hazard, providing a fair comparison. We also apply the proposed joint model with one degree of freedom (equivalent to a Weibull-based joint model). We implement our proposed model in the stjm command in Stata [22], and fit the B-spline model using the JM package [23] in R. Furthermore, what has often been neglected in the literature is an assessment of the number of quadrature nodes required to obtain consistent parameter estimates and adequate coverage probabilities. For each scenario, we use non-adaptive Gauss–Hermite quadrature to evaluate the joint likelihood of both the proposed model and the B-spline model, with 5 and 15 nodes for each random effect dimension to assess performance. We also implement five-point fully adaptive quadrature for the proposed model for comparison.
For each scenario, we included 300 patients in each of 500 repetitions. We generated the true longitudinal profile from W_ij = β_0i + β_1i t_ij + δ u_i, with β_0i ~ N(0, 1), β_1i ~ N(0, 0.25²) and correlation between (β_0i, β_1i) of 0.25. We then generated the observed longitudinal measurements from Y_ij ~ N(W_ij, 0.5²). We fixed the times of measurement at (0, 1, 2, 3, 4). We generated survival times under the two-component mixture Weibull model described in Section 3.1.
We applied administrative censoring at 5 years. We generated a binary treatment group variable from u_i ~ Bin(1, 0.5). We fixed the direct treatment effect on the longitudinal response, δ, at 0.25, the direct treatment effect on the time-to-event, φ, at 0.25, and varied the association parameter, α, between {−0.25, 0.25}.
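The longitudinal data-generating mechanism above can be reproduced directly; this Python sketch is our own illustration using the stated simulation parameters (function and argument names are ours), drawing correlated random intercepts and slopes and adding measurement error.

```python
import numpy as np

def simulate_longitudinal(n=300, times=(0.0, 1.0, 2.0, 3.0, 4.0),
                          delta=0.25, sd0=1.0, sd1=0.25, corr=0.25,
                          sd_e=0.5, seed=1):
    """Draw true profiles W_ij = b0_i + b1_i * t_ij + delta * u_i and
    observed responses Y_ij = W_ij + e_ij, e_ij ~ N(0, sd_e^2)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[sd0 ** 2, corr * sd0 * sd1],
                    [corr * sd0 * sd1, sd1 ** 2]])
    b = rng.multivariate_normal([0.0, 0.0], cov, size=n)   # (b0_i, b1_i)
    u = rng.binomial(1, 0.5, size=n)                       # treatment arm
    t = np.asarray(times)
    W = b[:, [0]] + b[:, [1]] * t + delta * u[:, None]     # true trajectory
    Y = W + rng.normal(0.0, sd_e, size=W.shape)            # measurement error
    return Y, W, b, u
```

The returned subject-level quantities (b, u and the trajectory W) are exactly those needed to generate the linked survival times in the next section.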

Generating survival times
Simulation studies often generate survival times from an exponential distribution, which assumes a constant baseline hazard function. In many situations, this may lack biological plausibility. For example, the method of Rizopoulos et al. [14] was evaluated in a simulation study with survival times generated from an exponential distribution (although the primary motivation of that simulation study was to evaluate the Laplacian estimation method, not the survival submodel).
Under standard survival models, Bender et al. [24] described an efficient algorithm to generate survival times under a variety of parametric choices for the baseline hazard function; however, when incorporating a time-varying biomarker, this produces an equation that cannot be directly solved for T, where T is the generated survival time. Furthermore, an assumption of a constant baseline hazard could be considered too simplistic to fully assess the performance of a model. To fully assess our approach in capturing complex baseline hazard functions, with turning points, we generate survival times from a two-component mixture Weibull distribution [25], with baseline survival function
\[
S_0(t) = p \exp(-\lambda_1 t^{\gamma_1}) + (1 - p)\exp(-\lambda_2 t^{\gamma_2}), \tag{15}
\]
where 0 ≤ p ≤ 1, λ_1 and λ_2 are scale parameters, and γ_1 and γ_2 are shape parameters. We now add the linear predictor for the association and time-independent covariates, on the log cumulative hazard scale:
\[
\log H_i(t) = \log\{-\log S_0(t)\} + \phi v_i + \alpha W_i(t).
\]
Following the formulation of Bender et al. [24], S(t) = 1 − F(t), where F ~ U(0, 1), and using Equation (15), we can generate survival times by solving
\[
\exp\{-H_i(t)\} = U, \qquad U \sim U(0, 1). \tag{18}
\]
Equation (18) is analytically intractable and so cannot be directly rearranged to find t; however, methods to overcome this include Newton–Raphson iterations or nonlinear least squares. This approach could be used in a variety of settings to better assess survival models. Hence, although we do not fit the true model from which we simulate, we use a sufficiently complex underlying shape to truly assess the proposed model specification.
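The inversion step can be sketched as follows. This Python illustration is ours: it uses simple bisection in place of the Newton–Raphson iterations mentioned above, assumes the subject's cumulative hazard is increasing over the search interval, and treats times beyond t_max as administratively censored; parameter names mirror the notation of this section.

```python
import math
import random

def generate_survival_time(u_trt, b0, b1, delta=0.25, phi=0.25, alpha=0.25,
                           p=0.9, lam1=0.1, gam1=1.5, lam2=0.1, gam2=0.5,
                           t_max=500.0, tol=1e-8, rng=random):
    """Draw one survival time by numerically inverting exp(-H_i(t)) = U
    for the mixture-Weibull joint model, i.e. solving
    log H_i(t) = log(-log U) by bisection."""
    U = rng.random()
    target = math.log(-math.log(U))

    def log_H(t):
        # two-component mixture Weibull baseline survival, Equation (15)
        S0 = (p * math.exp(-lam1 * t ** gam1)
              + (1.0 - p) * math.exp(-lam2 * t ** gam2))
        W = b0 + b1 * t + delta * u_trt       # current trajectory value
        return math.log(-math.log(S0)) + phi * u_trt + alpha * W

    lo, hi = 1e-12, t_max
    if log_H(hi) < target:                    # event beyond t_max
        return t_max
    while hi - lo > tol:                      # bisection on monotone log_H
        mid = 0.5 * (lo + hi)
        if log_H(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is slower per draw than Newton–Raphson but cannot overshoot, which is convenient when the log cumulative hazard is steep near the origin.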
We chose three scenarios of baseline parameters: a standard Weibull with increasing hazard function, {λ_1 = 0.1, γ_1 = 1.5, p = 1}; a mixture Weibull with a single turning point in the baseline hazard function, {λ_1 = 0.1, γ_1 = 1.5, λ_2 = 0.1, γ_2 = 0.5, p = 0.9}; and finally a Weibull distribution with {λ_1 = 1 × 10⁻⁵, γ_1 = 6.1, p = 1}. The final scenario is to assess the validity of our approach when the hazard is essentially zero for a portion of the follow-up time.

Results
Tables I–III present bias and coverage estimates from all simulations generated under the three baseline hazard functions. Under the three scenarios, survival submodel parameter estimates from the proposed model, that is, the direct treatment effect on survival (φ) and the association parameter (α), appear to be unbiased, and coverage probabilities very closely approximate the desired 95% in all scenarios when using restricted cubic splines, even with a small number of non-adaptive quadrature nodes. For the longitudinal submodel parameters, we observe generally unbiased estimates; with respect to the variance parameters, however, this holds only when the number of non-adaptive quadrature nodes is at least 15 or when using fully adaptive quadrature. Under non-adaptive quadrature, coverage estimates are generally below the desired 95%, indicating a marked underestimation of the standard errors, compared with optimum coverage probabilities across scenarios when fully adaptive quadrature is used. Further simulations, not shown here, illustrated that 35 non-adaptive quadrature nodes were required to provide optimum coverage probabilities. Standard errors of variance parameters are not available in R, so coverage could not be assessed for all parameters in the B-spline models. Our proposed model also produces moderate bias in the variance estimate of the slope parameter when we use five-point non-adaptive quadrature; however, this bias is eliminated under both 15-point non-adaptive and 5-point adaptive quadrature. Comparing across degrees of freedom, we observe almost identical estimates of bias and coverage probabilities between models. Table II presents bias and coverage estimates for simulations generated from the two-component mixture Weibull baseline hazard described in Section 3.1. Results appear entirely consistent with those found when generating under a standard Weibull distribution.
The underestimation of the standard errors of the longitudinal parameters remains a problem when we use an insufficient number of quadrature nodes. Despite generating data from a complex baseline hazard, the joint models fitted with only one degree of freedom appear to estimate all parameters, specifically the three treatment effects, just as effectively as those with five degrees of freedom. This can perhaps be expected since, as is often the case, hazard ratios can be insensitive to the specification of the baseline hazard function [26].
We discuss the implications of the choice of the number of quadrature nodes and the insensitivity to the baseline hazard in Section 5.

Analysis of liver cirrhosis data set
In this section, we apply the proposed joint model to the data set introduced in Section 1, where primary interest is the effect of treatment on the time to all-cause death after adjusting for the repeatedly measured prothrombin index. A total of 488 patients had their prothrombin index measured at baseline, with further scheduled measurements at 3, 6, and 12 months, and annually thereafter. The median number of measurements was 6 (range: 1–17). Two hundred and ninety-two (59.8%) patients died during the study. Patients were randomised to two treatment groups, namely prednisone and placebo. For further details regarding the data set, we refer the reader elsewhere [19]. Figure 1 provides an initial exploration of the relationship between prothrombin trajectory and the time (in years) to death by plotting the observed longitudinal responses against observation time, where we adjust the timescale by subtracting the observed censoring/event time. We overlay a lowess smoother. From Figure 1, it is apparent that patients who experienced the event, compared with patients who were censored, had decreasing levels of the biomarker during the 2–3-year period before death. If we assume that the association between the longitudinal and survival models is based on the current value parameterisation discussed in Section 2.2, we would expect a negative association, indicating that a lower value of prothrombin index is associated with an increased risk of death. This form of plot can be a useful exploratory tool in the analysis of joint longitudinal and survival data.
We now apply the joint model described in Section 2 to the liver cirrhosis data set. In the longitudinal submodel, we assume a random intercept with random effect of log(time) and also adjust for the interaction between treatment and time. In preliminary analysis, log(time) showed an improved fit compared with a linear effect of time. In the survival submodel, we adjust for the direct effect of treatment. We model the association between prothrombin index and time to death through the current value parameterisation. We use five degrees of freedom to model the baseline cumulative hazard, equivalent to four internal knots. Boundary knots are placed at the 0th and 100th percentiles of the uncensored log survival times. For comparison, we also apply the model of Rizopoulos et al. Under the B-spline model, we use cubic splines with two internal knots to provide a comparison of model fit with the same number of parameters used to model the baseline cumulative hazard function. As adaptive quadrature is not available for the B-spline model, we apply both models using 35-point non-adaptive quadrature.
In Table IV, comparing our proposed approach with the B-spline model, we generally observe similar parameter estimates; in particular, both models show a negative association between prothrombin index and time to death. For example, under our approach, we observe an association of −0.038 (95% CI: −0.045, −0.031), indicating that a lower value of prothrombin index increases the risk of death. We observe a non-statistically significant direct effect of treatment on survival, with a log hazard ratio of 0.210 (95% CI: −0.038, 0.457).
We now return to the primary motivation of our approach, which is to effectively capture complex hazard functions. We compare the fitted marginal survival functions across models with the Kaplan–Meier estimates for the liver cirrhosis data set, shown in Figure 2. It is evident from Figure 2 that the restricted cubic spline approach provides an improved fit compared with the B-spline approach, using the same number of parameters to model the baseline cumulative hazard function. Indeed, in Figure 3, we show the marginal survival function with an increased number of internal knots under the B-spline approach, highlighting that five internal knots are needed to achieve a function that fits as closely as the restricted cubic spline approach. In other words, nine parameters are required under the B-spline approach compared with only six under the restricted cubic spline approach to achieve a well-fitting function.

Predictions
To illustrate the prognostic benefits of the joint modelling framework, we can tailor conditional survival predictions at the individual level using the empirical Bayes predictions of the random effects; appropriate sampling schemes to calculate accurate standard errors for these predictions have been proposed by Rizopoulos [17]. We adapt the approach of Rizopoulos to calculate conditional survival predictions for two patients with similar baseline values of prothrombin index, using the fitted restricted cubic spline-based joint model, shown in Figure 4. Given the negative association between prothrombin index and the risk of death, we see from Figure 4 that patient 98 has a sharply increasing pattern of prothrombin index across follow-up time, resulting in higher survival probabilities, conditional on survival at the time of final measurement, when compared with patient 253. Patient 253 maintains lower values of prothrombin index, resulting in lower survival predictions. We discuss the reliance of these predictions on accurately specifying the baseline hazard in Section 5.
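The conditional survival predictions described here reduce to a ratio of survival functions. A minimal sketch of that step (ours; H stands for any subject-specific cumulative hazard function, e.g. the fitted spline-based one evaluated at the empirical Bayes modes of the random effects):

```python
import math

def conditional_survival(H, t_obs, horizon):
    """P(T >= t_obs + horizon | T >= t_obs)
    = S(t_obs + horizon) / S(t_obs)
    = exp(-(H(t_obs + horizon) - H(t_obs))),
    where H is the subject-specific cumulative hazard function."""
    return math.exp(-(H(t_obs + horizon) - H(t_obs)))

# e.g. a Weibull-type cumulative hazard for one hypothetical subject
H_example = lambda t: 0.1 * t ** 1.5
```

The sampling schemes of Rizopoulos [17] would then be layered on top of this quantity to obtain interval estimates.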

Sensitivity to location and number of knots
In our experience, we have found that the default knot locations, based on the distribution of uncensored event times, provide the most sensible approach to modelling using spline formulations, as was also found by Rizopoulos et al. [14]. This allows the data to be modelled more accurately in the areas of greatest density of events. Table V contains the parameter estimates across models with differing knot choices, illustrating once again the robustness of parameter estimates when compared with the original results in Table IV, with only minor differences observed in the third decimal place. Similarly, the left plot in Figure 5 shows very stable predicted marginal survival curves across knot choices. Furthermore, the right plot in Figure 5 illustrates the fitted marginal survival function when using two, three and five internal knots (with locations based on equally spaced quantiles of the distribution of uncensored survival times), illustrating the stability of our proposed model. In comparison with Figure 3, we observe much more variability in the marginal survival predictions when using B-splines with a varying number of knots.

Discussion
We have described a highly flexible joint model for a single longitudinal continuous biomarker and an event of interest. The restricted cubic spline basis for the log cumulative baseline hazard function provides a flexible framework where often the time-to-event is of primary interest. We can incorporate flexibility in the longitudinal submodel through the use of fixed and/or random fractional polynomials of time, which can capture a variety of shapes [20]. The simulation study conducted to assess the proposed joint model raised three important issues. Firstly, we observed consistent underestimation of the association parameter, α, under the B-spline approach. This bias was eliminated when using restricted cubic splines, with both one and five degrees of freedom. Secondly, the choice of the number of quadrature nodes can have a marked impact on both parameter estimates and the associated standard errors. If interest lies purely in the time-to-event, then we can use a lower number of quadrature nodes and obtain unbiased estimates with optimum coverage levels; however, if the longitudinal submodel is of interest, then the choice of quadrature method and number of nodes is crucial. For example, in studies where quality of life is the longitudinal marker of interest [28], the longitudinal response profile can be of direct interest for inclusion in an economic decision model, where reliable estimates of the associated standard errors can be pivotal in assessing cost-effectiveness and thus health policy decisions [29]. The simulation study highlighted the superiority of fully adaptive Gauss–Hermite quadrature in the joint model setting. The use of adaptive quadrature means we can use a much reduced number of quadrature nodes, resulting in substantial computational benefits. Finally, the simulation study showed in general how the estimates of covariate effects were insensitive to the specification of the baseline hazard.
This can of course be beneficial; however, one of the key benefits of the joint model framework is the predictions that can be obtained. These predictions rely heavily on the accuracy of the model in estimating the baseline hazard function. We illustrate this in Figure 6, whereby data are simulated under a two-component mixture Weibull baseline hazard function with a turning point. We apply joint models to the single simulated data set, firstly with one degree of freedom (equivalent to a Weibull model) and then with five degrees of freedom. We then predict the marginal survival function and compare it with the Kaplan–Meier survival curve. It is evident from Figure 6 that only with a sufficient number of degrees of freedom can the baseline survival function be adequately captured.
In application to the liver cirrhosis data set, we found that the restricted cubic spline approach provided improved flexibility in capturing complex baseline hazard functions when compared with a B-spline formulation with the same number of parameters, implying that we can obtain greater flexibility with fewer parameters. Of course, B-spline functions of other degrees may in fact provide well-fitting models; however, our results have shown that they can produce unstable fitted functions.
There are a multitude of extensions to this joint model framework. For example, we can incorporate a cure fraction by exploiting the restricted (linear) basis of the spline function beyond the final knot. Imposing the constraint that the spline function is constant beyond the last knot has been implemented to allow for a cure fraction in population-based cancer studies within the flexible parametric framework [30]. Furthermore, extension to the competing risks setting by modelling cause-specific hazards can be accommodated, introducing cause-specific association parameters. We can adapt the generalised linear mixed effects framework for the longitudinal submodel to handle categorical responses [6]. Finally, we could investigate and contrast a Bayesian approach to the proposed model [31].
In application to the liver cirrhosis data set, a single term of observation time provided sufficient flexibility to capture the shape of subject-specific longitudinal trajectories; however, further flexibility could be investigated through the use of splines [6,32].
A reviewer and an associate editor raised concerns about ensuring the monotonicity of the cumulative hazard function. In our experience, including all scenarios of the simulation study, this is not a practical issue. If at any point in the estimation process the hazard function goes negative, the algorithm will fail; this was not observed in any simulation, ensuring that we estimated valid cumulative hazard and, subsequently, survival functions. We facilitate implementation of the model through user-friendly Stata software [22], developed by the first author. Three choices of association are available, namely the current value association discussed earlier, the first derivative of the longitudinal submodel, and random coefficients such as the random intercept. Both non-adaptive and fully adaptive Gauss–Hermite quadrature are available. A range of other joint models can be fitted, with a variety of extensions under development, including those discussed earlier.