Simulating biologically plausible complex survival data

Simulation studies are conducted to assess the performance of current and novel statistical models in pre‐defined scenarios. It is often desirable that chosen simulation scenarios accurately reflect a biologically plausible underlying distribution. This is particularly important in the framework of survival analysis, where simulated distributions are chosen for both the event time and the censoring time. This paper develops methods for using complex distributions when generating survival times to assess methods in practice. We describe a general algorithm involving numerical integration and root‐finding techniques to generate survival times from a variety of complex parametric distributions, incorporating any combination of time‐dependent effects, time‐varying covariates, delayed entry, random effects and covariates measured with error. User‐friendly Stata software is provided. Copyright © 2013 John Wiley & Sons, Ltd.


Introduction
Simulation studies are conducted to assess the performance of current and novel statistical models in pre-defined scenarios. The quality and reporting of simulation studies vary considerably, which has led to the establishment of general guidelines for the development and reporting of simulation studies in medical research [1]. In order to establish certain properties, such as bias and coverage, or robustness to deviations from underlying assumptions, it is often desirable that chosen simulation scenarios accurately reflect a biologically plausible distribution. This is particularly important in the framework of survival analysis, where distributions are chosen for both the event time and the censoring time.
Previous studies have introduced a highly flexible framework to simulate survival data for Cox proportional hazards models [2,3]. Bender et al. noted that many simulation studies that generated survival data assumed an exponential distribution for the distribution of event times. Although many recent studies have gone beyond the standard exponential choice to a slightly more complex Weibull distribution [4,5], these choices are often not flexible enough to fully reflect the underlying distributions observed in clinical data.
Often in clinical trials [6] and population-based studies [7,8], at least one turning point is observed in the underlying hazard function. Although hazard ratios can be insensitive to a poorly specified baseline hazard function, when interest lies in measures of absolute risk, it is vital to accurately capture the baseline [9]. Through fully flexible parametric models, we can both accurately model complex hazard functions and also simulate biologically plausible survival data. Such methods are becoming more commonplace as the benefits of a parametric approach, such as the reporting of measures of absolute risk, become recognised in applied research [10]. In order to assess such parametric approaches, we require methods to simulate survival data from a variety of complex distributions, beyond standard distributions such as the exponential, Weibull and Gompertz.
Furthermore, a variety of extensions to the standard survival analysis framework, such as incorporation of time-dependent effects (non-proportional hazards), the occurrence of time-varying covariates, random covariate effects, and covariates measured with error, all require suitable simulation techniques to assess statistical models developed for each setting. A further, often observed, phenomenon in survival analysis is the presence of informative censoring. Standard survival models make the assumption of no dependence between the survival and censoring mechanisms. Assessing the robustness of methods to deviations from this assumption is a key question in survival analysis [11].
In this paper, we describe a general algorithm for the simulation of survival times. In Section 2, we introduce a motivating data set, which exhibits turning points in the underlying hazard function, which cannot be captured by standard parametric distributions. In Section 3, we briefly describe the methods of Bender et al. to simulate data from standard parametric distributions with an analytically tractable and invertible cumulative hazard function, which forms the basis for our simulation framework. In Sections 4 and 5, we describe a range of simulation scenarios, culminating in our general simulation algorithm to simulate survival data from complex distributions using root-finding techniques with nested numerical integration. In Section 6, we describe how to incorporate time-dependent effects, both with standard, and complex parametric distributions. In Section 7, we describe how to incorporate both binary and continuous time-varying covariates. In Section 8, we describe how to incorporate delayed entry, whereas in Section 9, we describe how the techniques can be applied to incorporate dependent censoring. The methods are illustrated using the publicly available survsim package in Stata [12,13]. Finally, in Section 10, we conclude the paper with a discussion.

Motivating data set
To illustrate various aspects of the simulation techniques, we use a data set of 686 women diagnosed with breast cancer in Germany [14], with 246 patients randomised to receive hormonal therapy and 440 to receive a placebo. The outcome is recurrence-free survival, with 299 patients experiencing the event of interest. We apply a Weibull proportional hazards model and a proportional hazards flexible parametric survival model with 5 degrees of freedom, adjusting for treatment in each. The flexible parametric model uses restricted cubic splines on the log cumulative hazard scale to obtain a smooth but highly flexible parametric form that can incorporate turning points [9]. Figure 1 shows the fitted survival curves from each model overlaid on the Kaplan-Meier curves, by treatment group, showing the much improved fit of the more flexible model. Furthermore, Figure 2 shows the fitted hazard functions, by treatment group, for the Weibull and flexible parametric survival models, illustrating a marked difference in the estimated underlying shapes and clearly showing that the Weibull model is inappropriate.

Simulating from standard parametric distributions
A pivotal paper by Bender et al. [3] described a highly efficient and easy-to-implement technique to generate survival times from a variety of parametric distributions. We briefly describe the methods of Bender et al., as they form the basis for the extensions discussed later. The hazard function of a proportional hazards model can be expressed as

h(t|X) = h_0(t) exp(Xβ) (1)

where h_0(t) is the baseline hazard function specified by some parametric distribution and X is a vector of time-independent covariates with corresponding regression coefficients, β. The corresponding cumulative hazard, H(t), survival, S(t), and cumulative distribution, F(t), functions are obtained as follows:

H(t|X) = ∫_0^t h(u|X) du (2)

S(t|X) = exp[-H(t|X)] (3)

F(t|X) = 1 - exp[-H(t|X)] (4)

If we let T be the simulated survival time, Bender et al. [3] showed that, by letting U = S(T|X), with U ~ U(0,1), or equivalently,

-log(U) = H_0(T) exp(Xβ) (5)

we can write

T = H_0^{-1}[-log(U) exp(-Xβ)] (6)

Thus, if h_0(T) > 0, then Equation (5) can be re-arranged and directly solved for T, as long as H_0(t) can be directly inverted.
The data-generating process then only requires draws from a uniform distribution, followed by application of Equation (5). The three standard choices for h 0 .T / are the exponential, Weibull and Gompertz distributions. These three choices can be considered restrictive in terms of the shapes of the baseline hazard function that can be generated. However, these distributions remain appealing to researchers conducting simulation studies, perhaps because they all have invertible cumulative hazard functions, allowing the direct application of Equation (6). Example 1 in the Appendix illustrates Stata code for the simulation of proportional hazards data under an exponential, Gompertz or Weibull distribution.
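The inversion step can be sketched in code. The following Python snippet is an illustrative translation of this approach (the paper's software is in Stata); the scale and shape parameters `lam` and `gamma` are hypothetical.

```python
import numpy as np

def simulate_ph_times(u, x, beta, dist="weibull", lam=0.1, gamma=1.5):
    """Invert the cumulative baseline hazard: T = H0^{-1}[-log(U) exp(-X*beta)].

    Illustrative sketch, not the paper's Stata implementation; lam and
    gamma are hypothetical scale/shape parameters.
    """
    # Target value of the baseline cumulative hazard at the survival time
    h_target = -np.log(u) * np.exp(-x * beta)
    if dist == "exponential":  # H0(t) = lam * t
        return h_target / lam
    if dist == "weibull":      # H0(t) = lam * t**gamma
        return (h_target / lam) ** (1.0 / gamma)
    if dist == "gompertz":     # H0(t) = (lam / gamma) * (exp(gamma * t) - 1)
        return np.log(1.0 + gamma * h_target / lam) / gamma
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(1)
n = 10_000
u = rng.uniform(size=n)                # U ~ U(0, 1)
x = rng.binomial(1, 0.5, size=n)       # binary treatment covariate
t = simulate_ph_times(u, x, beta=np.log(0.7))
```

Because the cumulative baseline hazard is available in closed form for all three distributions, no iterative methods are needed; substituting the simulated times back into H(t|X) recovers -log(U) exactly.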

A general framework for simulation of survival data
We now extend the method of Bender et al. [3] to allow for more complex simulation studies where either the integral to obtain the cumulative hazard in Equation (2) is intractable or H_0(t) is non-invertible. Figure 3 shows a schematic flow diagram illustrating the general framework for simulating survival data from a defined hazard or cumulative hazard function.

Scenario 1
Scenario 1 involves the setting of Bender et al. [3] described in Section 3.1, where the cumulative hazard function has a closed-form expression and can be directly inverted to solve for T, the simulated survival time.

Scenario 2
Scenario 2 arises when we wish to use a more complex baseline hazard function to simulate data under a proportional hazards model. In this case, the cumulative hazard function still has a closed-form expression; however, with a more complex hazard, and consequently cumulative hazard function, we can no longer directly invert the cumulative hazard function and therefore cannot directly solve for T, the simulated survival time. In this situation, we proceed by applying root-finding techniques. We describe this in more detail in Section 5.1.

Scenario 3
Finally, scenario 3 arises when we wish to define a complex hazard function, which cannot be integrated analytically to obtain the cumulative hazard function. To accommodate this setting, we can use numerical integration techniques such as Gauss-Legendre quadrature. Following this, we once again have a cumulative hazard function, which cannot be directly inverted to solve for the simulated survival time, T , therefore requiring root-finding techniques as in scenario 2. This results in a general two-stage algorithm involving numerical integration nested within an iterative root-finding procedure. We describe this in more detail in Section 5.3.

Root finding
The first extension we describe involves the situation where we wish to use a more complex baseline hazard function, to simulate data under a proportional hazards model. In this case, we still assume that the cumulative hazard can be evaluated analytically for a given hazard function.
The step between Equations (5) and (6) is reliant on being able to directly re-arrange Equation (5) to solve for T , the simulated survival time. When this condition fails, we require iterative techniques to find the root of Equation (5). We illustrate this situation through an example.

Example: Two-component mixture Weibull distribution.
We now begin to introduce some complexity in the parametric distribution used to generate survival times. Motivation for going beyond the standard parametric approaches described in Section 3.1 originates from the often observed situation of a turning point in a data set's baseline hazard function, illustrated in Section 2. One such approach is to use a mixture distribution.
Here, we define the overall baseline survival function of a two-component parametric mixture model. Finite mixture survival models of this form have been used in standard survival analysis [15], and in mixture and non-mixture cure models to obtain improved estimates of statistical cure [16]. We define parametric components additive on the survival scale

S_0(t) = p S_01(t) + (1 - p) S_02(t) (7)

where S_01(t) and S_02(t) are the survival functions of any standard parametric distributions, and p represents the mixing parameter, with 0 ≤ p ≤ 1. For illustrative purposes, we proceed by assuming a two-component mixture Weibull distribution, with

S_0(t) = p exp(-λ_1 t^{γ_1}) + (1 - p) exp(-λ_2 t^{γ_2}) (8)

where {λ_1, λ_2} and {γ_1, γ_2} are scale and shape parameters, respectively. Transforming to the cumulative hazard scale,

H_0(t) = -log[p exp(-λ_1 t^{γ_1}) + (1 - p) exp(-λ_2 t^{γ_2})] (9)

and differentiating with respect to t, we obtain the baseline hazard function

h_0(t) = [p λ_1 γ_1 t^{γ_1 - 1} exp(-λ_1 t^{γ_1}) + (1 - p) λ_2 γ_2 t^{γ_2 - 1} exp(-λ_2 t^{γ_2})] / [p exp(-λ_1 t^{γ_1}) + (1 - p) exp(-λ_2 t^{γ_2})] (10)

Proportional hazards can then be induced through

h(t|X) = h_0(t) exp(Xβ) (11)

where X is a vector of time-independent covariates with associated regression coefficients, β. This model can be used to simulate survival data from a variety of functions with turning points, to better reflect observed clinical data sets. We illustrate some examples in Figure 4, based on those seen in real data sets [6,14,17]. Equation (11) can be directly integrated with respect to t to obtain the cumulative hazard function and subsequently the survival function. However, we are still left with a survival function that, when substituted into Equation (5), produces an equation that cannot be directly solved for t. We now describe two root-finding techniques to accommodate this situation. We do note, however, that in this example, we could simulate from the two-component mixture distribution by generating a categorical latent variable and then drawing from the appropriate mixture component, thus avoiding iterative techniques.

Brent's univariate root-finding method.
To generate our survival times, we need to solve for t the equation

g(t) = log(U) + H(t|X) = 0 (12)

An efficient method to calculate the simulated survival times is to use Brent's univariate root finder. This algorithm combines the bisection method with linear or quadratic interpolation [18]. The algorithm is executed until a desired tolerance (we use a default of 1E-08) is met.
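A minimal sketch of this root-finding step, using SciPy's implementation of Brent's method in place of the paper's Stata routines; the mixture parameter values follow the example simulation study in this section, and the upper bracket of 50 years is an assumption.

```python
import numpy as np
from scipy.optimize import brentq

# Two-component mixture Weibull baseline survival:
#   S0(t) = p*exp(-l1*t**g1) + (1-p)*exp(-l2*t**g2)
# Cumulative hazard under proportional hazards: H(t|X) = -log(S0(t)) * exp(X*beta)
p, l1, g1, l2, g2 = 0.3, 0.3, 2.5, 0.025, 1.9

def cum_haz(t, x, beta):
    s0 = p * np.exp(-l1 * t**g1) + (1 - p) * np.exp(-l2 * t**g2)
    return -np.log(s0) * np.exp(x * beta)

def simulate_one(u, x, beta, upper=50.0):
    # Root of g(t) = log(U) + H(t|X): g(0+) = log(U) < 0 and g increases in t,
    # so the root is bracketed by (0, upper) for any plausible follow-up.
    return brentq(lambda t: np.log(u) + cum_haz(t, x, beta), 1e-12, upper,
                  xtol=1e-8)

rng = np.random.default_rng(2)
times = [simulate_one(u, x, np.log(0.7))
         for u, x in zip(rng.uniform(size=200), rng.binomial(1, 0.5, size=200))]
```

Each returned time satisfies H(t|X) = -log(U) to within the root-finder tolerance.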

Newton-Raphson root finder.
An alternative to Brent's root finder is to use Newton-Raphson iterations, which use the first two terms of the Taylor series expansion of g(t), our objective function. In our experience, we have found Brent's method to be far superior in terms of reliability and accuracy compared with Newton-Raphson iterations, which often have convergence problems.
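For completeness, a sketch of the Newton-Raphson alternative, exploiting the fact that the derivative of g(t) = log(U) + H(t|X) with respect to t is simply the hazard h(t|X). A Weibull baseline with hypothetical parameters is used so that the iterations can be checked against the closed-form inverse; the step-halving guard keeping t positive is our own assumption.

```python
import numpy as np

# Newton-Raphson for g(t) = log(U) + H(t|X) = 0.  Since dH/dt = h(t|X),
# the update is t <- t - g(t)/h(t|X).  Weibull baseline: H0(t) = lam*t**gam.
lam, gam = 0.2, 1.3   # hypothetical scale and shape

def newton_survival_time(u, x, beta, t0=1.0, tol=1e-8, max_iter=100):
    t = t0
    for _ in range(max_iter):
        eta = np.exp(x * beta)
        g = np.log(u) + lam * t**gam * eta        # objective g(t)
        dg = lam * gam * t**(gam - 1) * eta       # derivative = hazard h(t|X)
        t_new = max(t - g / dg, t / 2.0)          # guard: keep t > 0
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

t_nr = newton_survival_time(0.4, 1, np.log(0.7))
# Closed-form inverse for comparison: T = (-log(U) exp(-X*beta) / lam)**(1/gam)
t_closed = (-np.log(0.4) * np.exp(-np.log(0.7)) / lam) ** (1.0 / gam)
```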

Example simulation study
We illustrate the root-finding technique described in Section 5.1, applied to the two-component mixture Weibull distribution. In each of 1000 repetitions, we generate 1000 survival times from a two-component mixture Weibull baseline hazard function, with parameters λ_1 = 0.3, γ_1 = 2.5, λ_2 = 0.025, γ_2 = 1.9 and p = 0.3. These parameter values are chosen to closely approximate the observed hazard function seen in the preceding example data set. We also include a binary treatment variable, drawn from X_i ~ Bin(1, 0.5), with associated hazard ratio exp(β) = 0.7, and apply administrative censoring at 5 years. Computation time to generate the 1000 data sets was 34 s on a standard 2.5 GHz desktop computer. To each simulated data set, we apply a Weibull survival model and the two-component mixture Weibull model [19], monitoring estimates of the log hazard ratio. Furthermore, we monitor estimates of the survival probability and hazard rate in the reference group (X_i = 0), at time points t = {1, 2, 3, 4, 5}.
Results are presented in Table I. The mixture Weibull model produces unbiased estimates and good coverage probabilities in the log hazard ratio, and the estimates of survival and hazard, indicating its ability to capture the complex underlying hazard function. In comparison, estimates from the Weibull model indicate minor bias in the log hazard ratio, with large bias observed in the estimates of survival and hazard, across the five time points, indicating its inability to effectively capture the underlying distribution shape.
Example 2 in the Appendix illustrates Stata code used to generate proportional hazards data from the described two-component mixture Weibull model.

Numerical integration
We now describe the scenario where we define a more general functional form for the hazard function, which requires numerical integration techniques to evaluate the cumulative hazard function, followed by the root-finding techniques described earlier. We once again illustrate through example.

Example: Fractional polynomial baseline hazard function.
Fractional polynomials provide a flexible way of modelling non-linearity in a continuous covariate [20]. In this case, the continuous 'covariate' of interest is survival time. Here, we use fractional polynomials to define the log hazard function,

h(t|X) = exp[log{h_0(t)} + Xβ] (13)

where h_0(t) is any general function that satisfies h_0(t) > 0 for t > 0. Here, we expand log(h_0(t)) into a fractional polynomial function with two turning points, in this case an FP3 model with powers {1, 0.5, 0.5},

log(h_0(t)) = γ_0 + γ_1 t + γ_2 t^{0.5} + γ_3 t^{0.5} log(t) (14)

where, following the standard fractional polynomial convention, the repeated power 0.5 introduces a log(t) multiplier.
The assumed hazard function is shown in Figure 5. This provides a reasonable fit to the example data set described in Section 2. The next step to simulate survival times from this underlying hazard function is to calculate the cumulative hazard function; however, when we substitute Equation (14) into Equation (13) and attempt to integrate, we obtain an analytically intractable integral, therefore requiring numerical techniques.

Gaussian quadrature.
Numerical integration techniques, such as Gaussian quadrature [21], provide an approximation to an analytically intractable integral. Gaussian quadrature turns an integral into a weighted summation of a function evaluated at a set of pre-defined points called nodes. For example, to integrate a hazard function

∫_0^t h(u) du (15)

we first need to undertake a change of interval from [0, t] to [-1, 1],

∫_0^t h(u) du = (t/2) ∫_{-1}^{1} h[(t/2)(1 + z)] dz (16)

We can now numerically integrate, using for example Gauss-Legendre quadrature, resulting in

∫_0^t h(u) du ≈ (t/2) Σ_{i=1}^{m} w_i h[(t/2)(1 + z_i)] (17)

where w_i and z_i are sets of weights and node locations, respectively. Under Gauss-Legendre quadrature, the weight function is w(z) = 1. The numerical accuracy of the approximation depends on the number of nodes, m.
In our experience, we have found that 30 nodes are often sufficient. The accuracy can be assessed by setting a seed and simulating survival times with an increasing number of nodes. Now that we can calculate the cumulative hazard, we apply one of the root-finding procedures described in Section 5.1. In this case, however, the iterative algorithm has multiple steps, with numerical integration nested within either Newton-Raphson updates or Brent's method.
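The nested algorithm, quadrature inside a root search, can be sketched as follows; the FP3-style log hazard and its coefficients are hypothetical, not those fitted in the paper.

```python
import numpy as np
from scipy.optimize import brentq

# Gauss-Legendre approximation of H(t) = integral of h(u) over [0, t],
# nested inside Brent's method.  Coefficients below are illustrative.
nodes, weights = np.polynomial.legendre.leggauss(30)

def log_h0(t):
    # FP3-style log hazard with powers {1, 0.5, 0.5} (hypothetical values)
    return -1.0 + 0.1 * t + 0.5 * np.sqrt(t) - 0.4 * np.sqrt(t) * np.log(t)

def cum_haz(t):
    # Change of interval [0, t] -> [-1, 1]: u = (t/2)(1 + z), du = (t/2) dz
    u = 0.5 * t * (nodes + 1.0)
    return 0.5 * t * np.sum(weights * np.exp(log_h0(u)))

def simulate_one(u01, upper=50.0):
    # Root of log(U) + H(t) = 0, with H evaluated by quadrature at each trial t
    return brentq(lambda t: np.log(u01) + cum_haz(t), 1e-8, upper, xtol=1e-8)
```

Re-running with a larger number of nodes and checking that the simulated times are unchanged is the accuracy check described above.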

Example simulation study
We now illustrate the algorithm by simulating survival data from the baseline hazard function defined in Equation (14). In this case, we can use the flexible parametric survival model, which models the baseline log cumulative hazard function using restricted cubic splines with 5 degrees of freedom, to assess how well it captures complex hazard functions. For comparison, we also apply a Weibull proportional hazards model. For each of 1000 repetitions, we simulate 1000 survival times, incorporating a binary and a continuous covariate, representing gender, X_1i ~ Bin(1, 0.5), and age, X_2i ~ N(65, 12), with associated log hazard ratios of β_1 = 0.5 and β_2 = 0.02, respectively. We assume administrative censoring at 5 years. Computation time to generate the 1000 data sets was 144 s on a standard 2.5 GHz desktop computer. In each repetition, we monitor estimates of the log hazard ratios for the effects of gender and age. Furthermore, we assess estimates of survival probabilities and hazard rates at t = {1, 2, 3, 4, 5}, estimated at X_1i = 0 and X_2i = 65.
Results are presented in Table II. Under the Weibull model, we observe substantial bias of 0.082 for β_1, the effect of gender, compared with unbiased estimates under the flexible parametric survival model. Example 3 in the Appendix shows Stata code used to generate data from the described fractional polynomial baseline hazard function.

Simulating time-dependent effects
The presence of non-proportional hazards, that is, time-dependent effects, is commonplace in the analysis of time to event data [22]. This is often observed in the analysis of registry-based data sources where follow-up time can be over many years [23]. Furthermore, evidence is often found of time-dependent treatment effects [24].

Standard parametric distributions
Under standard parametric distributions, the inclusion of time-dependent effects can be conducted so as to ensure an analytically tractable and invertible cumulative hazard function. For example, under an exponential distribution or a Gompertz distribution, covariates can be interacted with time, to result in a baseline hazard function, which can still be directly integrated, and subsequently directly solved for the simulated survival time, t . Similarly, under a Weibull distribution, an interaction can be formed between covariates and log time.
For example, consider a binary covariate, X_1, which takes values 0 or 1. Under a Gompertz baseline hazard function, we can invoke non-proportional hazards by interacting X_1 with linear time, t:

h(t|X_1) = λ exp(γt) exp[X_1(β_1 + β_2 t)] (18)

Equation (18) can be re-arranged and integrated to obtain the cumulative hazard function

H(t|X_1) = λ exp(β_1 X_1) [exp{(γ + β_2 X_1)t} - 1] / (γ + β_2 X_1) (19)

We therefore have, from Equation (5),

-log(U) = λ exp(β_1 X_1) [exp{(γ + β_2 X_1)T} - 1] / (γ + β_2 X_1) (20)

which can be inverted and solved for T, the simulated survival time,

T = log[1 - (γ + β_2 X_1) log(U) / {λ exp(β_1 X_1)}] / (γ + β_2 X_1) (21)

This of course can be extended to multiple time-dependent effects; however, if we wish to use a more complex distribution, we once again have analytically intractable and non-invertible cumulative hazard functions.
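A sketch of this closed-form case with hypothetical parameter values; the re-arranged cumulative hazard is inverted directly, so no numerical integration or root finding is required.

```python
import numpy as np

# Gompertz baseline with a time-dependent effect of a binary covariate:
#   h(t|X1) = lam * exp(gam*t) * exp(X1*(b1 + b2*t))
# integrates to
#   H(t|X1) = lam*exp(b1*X1) * (exp((gam + b2*X1)*t) - 1) / (gam + b2*X1)
# which inverts in closed form.  All parameter values are illustrative.
lam, gam, b1, b2 = 0.1, 0.05, -0.5, 0.1

def simulate_tde_gompertz(u, x1):
    rate = gam + b2 * x1
    scale = lam * np.exp(b1 * x1)
    # T = log(1 - rate*log(U)/scale) / rate
    return np.log(1.0 - rate * np.log(u) / scale) / rate

def cum_haz(t, x1):
    rate = gam + b2 * x1
    return lam * np.exp(b1 * x1) * (np.exp(rate * t) - 1.0) / rate
```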

Complex hazard functions
Incorporating time-dependent effects when simulating from more complex hazard functions returns us to the scenario where we require both numerical integration and iterative root-finding procedures. For example, this arises when including a time-dependent effect in the two-component mixture Weibull model,

h(t|X_1) = h_0(t) exp[X_1 β_1(t)]

where h_0(t) is the two-component mixture Weibull baseline hazard, and β_1(t) is a function of time, t, such as a simple linear term, or something more complex such as a fractional polynomial or spline function.

Example simulation study
We now conduct a simulation study assessing the performance of the Weibull and flexible parametric models under proportional hazards, when simulating a time-dependent, diminishing treatment effect. We further apply a flexible parametric model allowing for a time-dependent treatment effect. In all models, we assess estimates of the hazard and survival functions in each treatment group. Survival times are simulated from the baseline hazard function shown in Equation (14). For each of 1000 repetitions, we simulate 1000 survival times, incorporating a binary and a continuous covariate, representing treatment, X_1i ~ Bin(1, 0.5), and age, X_2i ~ N(65, 12), and assume administrative censoring at 5 years. We simulate a diminishing time-dependent treatment effect, together with a proportional age effect of β_2 = 0.02. The true hazard ratio of the treatment effect at 1, 2 and 5 years is 0.502, 0.668 and 0.994, respectively. Computation time to generate the 1000 data sets was 173 s on a standard 2.5 GHz desktop computer. In each repetition, we also monitor estimates of the log hazard ratio for the effect of age. Results are presented in Table III. As in Section 5.4, we observe very poor performance when using a Weibull model, with large bias in estimates of the hazard and survival functions in both treatment groups. We observe improved performance under the proportional hazards flexible parametric model; however, some bias and poor coverage are seen in estimates of the hazard and survival functions, particularly in the treatment group. As this model assumes proportional hazards and the true model has non-proportional hazards, this is to be expected. Under the flexible parametric model allowing for a time-dependent treatment effect, we observe much reduced bias and improved coverage probabilities, indicating that the model has captured the complex time-dependent effect, even though we do not fit the true underlying model.

Simulating time-varying covariates
Time-varying covariates occur frequently in medical research. In cancer clinical trials, the occurrence of treatment switching or non-compliance occurs when a patient switches from, for example, the standard therapy to the new treatment, often around the time of progression. An area of increasing interest in the biostatistical literature is the joint modelling of longitudinal and survival data, where we observe a repeatedly measured biomarker and wish to investigate the association of this time-varying biomarker and survival.
Recently, Austin [25] extended the methods of Bender et al. to simulate time-varying covariates of three types: first, a dichotomous time-varying covariate that can change at most once; second, a continuous time-varying covariate; and third, a dichotomous time-varying covariate where subjects can switch groups multiple times. Austin derived closed-form expressions, including time-independent covariates, under the exponential, Weibull and Gompertz distributions.
Under our simulation framework, we generalise the approach of Austin to incorporate any combination of time-varying covariates, with a user-defined baseline hazard function to incorporate more biologically realistic hazard functions.

Simulating treatment switching
In this scenario, we wish to simulate a time-varying binary covariate. We define X_1 to represent the initial treatment a patient is randomised to, with treatment A (X_1 = 0) and treatment B (X_1 = 1). We assume patients were randomised to treatment arms at t = 0. For simplicity, we allow patients to switch arm at most once. We also include a binary covariate representing disease severity, X_2, with each patient having a 40% chance of having a bad prognosis (X_2 = 1), which increases a patient's event rate. Under a general baseline hazard function, h_0(t), we have

h(t|X_1, X_2) = h_0(t) exp(β_1[X_1 I(t ≤ t_S) + (1 - X_1) I(t > t_S)] + β_2 X_2)

where β_1 is the log hazard ratio for the effect of treatment, which in this case we assume is the same regardless of whether patients switched. In this example, we assume that a patient initially randomised to treatment (X_1 = 1) has a treatment effect of exp(β_1) until their switching time, t_S, after which their hazard ratio is 1. Thus, the time dependence is introduced through the indicator functions I(t ≤ t_S) and I(t > t_S). The switching times need to be generated: in the example code included in the Appendix, we generate the potential switching times from a uniform distribution, which is dependent on disease severity (X_2). Endogenous/non-ignorable treatment switching can be created if the variable X_2 is deleted and not available for analysis. This scenario can be easily extended to allow for any number of switches. Alternatively, we could first generate a vector of survival times, t_S, to represent time to progression.
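A sketch of the switching mechanism, with a Weibull baseline and hypothetical parameters; the cumulative hazard is accumulated piecewise over [0, t_S] and (t_S, t] before the root search.

```python
import numpy as np
from scipy.optimize import brentq

# Treatment switching sketch: hazard
#   h(t) = h0(t) * exp(b1*[X1*I(t<=tS) + (1-X1)*I(t>tS)] + b2*X2)
# with Weibull h0(t); all parameter values are illustrative.
lam, gam, b1, b2 = 0.1, 1.4, np.log(0.7), np.log(1.5)

def H0(t):
    return lam * t**gam

def cum_haz(t, x1, x2, t_s):
    before = H0(min(t, t_s)) * np.exp(b1 * x1)               # up to the switch
    after = max(H0(t) - H0(t_s), 0.0) * np.exp(b1 * (1 - x1))  # after the switch
    return (before + after) * np.exp(b2 * x2)

def simulate_one(u, x1, x2, t_s, upper=200.0):
    return brentq(lambda t: np.log(u) + cum_haz(t, x1, x2, t_s), 1e-10, upper)

# Potential switching time depending on disease severity (hypothetical scheme)
rng = np.random.default_rng(3)
x2 = rng.binomial(1, 0.4)
t_s = rng.uniform(0.5, 2.0 + x2)
```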

Simulating a continuous time-varying biomarker and survival incorporating random effects and covariates measured with error
Here, we wish to simulate a continuous biomarker exhibiting a linear trend, under the following model:

W_i(t) = (β_0 + u_{0i}) + (β_1 + u_{1i}) t + Z_i δ (26)

where u_i = (u_{0i}, u_{1i})' ~ N(0, Σ) are subject-specific random effects and Z_i is a vector of baseline covariates with associated coefficients δ. By including our trajectory function, W_i(t), in the linear predictor of our survival model, multiplied by an association parameter α, we use the simulation algorithm described in Section 5.3 to directly simulate survival times under a joint model framework [26]:
h_i(t) = h_0(t) exp[α W_i(t) + v_i φ] (27)

where h_0(t) is our user-defined baseline hazard function and v_i is a vector of baseline covariates with associated coefficients φ.
Following the simulation of survival times, we can then construct any measurement schedule we desire for the longitudinal outcome, using Equation (26), and subsequently calculate the observed longitudinal measurements. To complete the joint model framework, measurement error in the longitudinal outcome can be incorporated simply by drawing the observed longitudinal values from N(W(t), σ_e²), where σ_e² is the measurement error variance. This example further illustrates the ease with which random covariate effects can be incorporated. We include example Stata code in the Appendix.
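The joint-model generation can be sketched end to end: draw random effects, solve for the event time with quadrature nested in Brent's method, then construct error-prone biomarker measurements at scheduled visits. All parameter values, the Gompertz-type baseline, and the annual visit schedule are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

# Joint-model sketch: subject-specific trajectory W_i(t) = (b0+u0) + (b1+u1)*t
# enters the log hazard multiplied by association alpha:
#   h_i(t) = h0(t) * exp(alpha * W_i(t)),  h0(t) = lam * exp(gam*t)
nodes, weights = np.polynomial.legendre.leggauss(30)
b0, b1, alpha = 0.5, 0.2, 0.25
lam, gam = 0.05, 0.1

def cum_haz(t, u0, u1):
    z = 0.5 * t * (nodes + 1.0)               # interval change [0,t] -> [-1,1]
    w_traj = (b0 + u0) + (b1 + u1) * z        # trajectory at the quadrature nodes
    h = lam * np.exp(gam * z) * np.exp(alpha * w_traj)
    return 0.5 * t * np.sum(weights * h)

rng = np.random.default_rng(4)
u0, u1 = rng.normal(0, 0.3), rng.normal(0, 0.1)   # random intercept and slope
u = rng.uniform()
t_event = brentq(lambda t: np.log(u) + cum_haz(t, u0, u1), 1e-8, 200.0)

# Observed biomarker at annual visits before the event, with measurement error
visits = np.arange(0.0, t_event, 1.0)
obs = (b0 + u0) + (b1 + u1) * visits + rng.normal(0, 0.2, size=len(visits))
```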

Delayed entry
It is becoming increasingly popular in epidemiological settings to use age as the timescale, as this is often an improved way of controlling for age compared with including it as a baseline covariate in a standard survival analysis [27]. Equation (17) can be extended to incorporate delayed entry (left truncation) using the following transformation:

∫_{t_0}^{t} h(u) du ≈ {(t - t_0)/2} Σ_{i=1}^{m} w_i h[{(t - t_0)/2} z_i + (t_0 + t)/2] (29)

where t_0 denotes the time of entry. Equation (29) can be used as before to simulate survival times, accounting for a user-specified entry time. We provide example code in the Appendix (Example 7), where the user generates entry times from a normal distribution, N(30, 3), to represent age at entry. We can of course include any combination of time-dependent effects and time-varying covariates within our defined hazard function, h(t).
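A sketch of delayed entry, where quadrature is performed over [t_0, t] so that simulated times are conditional on survival to the entry time; the Weibull-type hazard and the entry time of 30 are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

# Delayed entry: solve log(U) + integral of h(u) over [t0, t] = 0, so the
# simulated time is conditional on survival to t0.  Hazard is illustrative:
# h(t) = lam * gam * t**(gam-1), a Weibull with lam = 0.1, gam = 1.3.
nodes, weights = np.polynomial.legendre.leggauss(30)

def haz(t):
    return 0.1 * 1.3 * t**0.3

def cum_haz_interval(t0, t):
    half = 0.5 * (t - t0)
    z = half * nodes + 0.5 * (t0 + t)   # interval change [t0, t] -> [-1, 1]
    return half * np.sum(weights * haz(z))

def simulate_delayed(u, t0, upper=500.0):
    return brentq(lambda t: np.log(u) + cum_haz_interval(t0, t),
                  t0 + 1e-10, upper)

t_sim = simulate_delayed(0.5, t0=30.0)   # e.g. entry at age 30
```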

Simulating a censoring distribution
In the previous examples, we have assumed an administrative censoring time, that is, a maximum follow-up time beyond which no patient is observed. In practice, we often observe intermittent censoring, which we may wish to simulate. All of the scenarios and techniques described earlier can be used to generate censoring times. By simulating a set of event times and a second set of censoring times for each patient, we can simply take the minimum of the two to obtain the observed survival time and, consequently, the event indicator. Furthermore, by making the censoring distribution dependent on covariates (be they baseline covariates, covariates with time-dependent effects, or time-varying covariates), we can incorporate informative censoring [11]. Alternatively, we could simulate the survival times and draw censoring times from a uniform distribution between the minimum and maximum follow-up times to achieve a desired censoring fraction.
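A sketch of generating a covariate-dependent censoring distribution alongside the event distribution and taking the minimum; both distributions and all parameter values are hypothetical.

```python
import numpy as np

# Draw an event time and a censoring time per subject (both Weibull here,
# with the censoring rate depending on X, giving informative censoring),
# then take the minimum together with administrative censoring at 5 years.
rng = np.random.default_rng(5)
n = 1000
x = rng.binomial(1, 0.5, size=n)

def inv_weibull(u, lam, gam):
    # T = (-log(U) / lam)**(1/gam), the closed-form Weibull inverse
    return (-np.log(u) / lam) ** (1.0 / gam)

t_event = inv_weibull(rng.uniform(size=n),
                      lam=0.2 * np.exp(np.log(0.7) * x), gam=1.3)
t_cens = inv_weibull(rng.uniform(size=n),
                     lam=0.1 * np.exp(0.5 * x), gam=1.0)

t_obs = np.minimum(np.minimum(t_event, t_cens), 5.0)
died = (t_event <= t_cens) & (t_event <= 5.0)   # event indicator
```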

Discussion
We have described a general framework for the generation of survival data, incorporating any combination of complex baseline hazard functions, time-dependent effects, time-varying covariates, delayed entry, random effects and covariates measured with error. This centres on scenarios where we cannot define the simulated survival time in a closed-form expression. Previous work on the simulation of survival data includes using standard baseline distributions, such as the exponential, Weibull and Gompertz, with time-invariant covariates [3]. Mackenzie and Abrahamowicz described techniques to allow for time-dependent effects, and allowed specification of the marginal distribution of event times and covariate distributions [28]. A recent paper by Austin provided closed-form expressions to incorporate three types of time-varying covariates, which built on work by Leemis et al. [2], who described techniques to invert the cumulative hazard function with a single time-varying covariate. Furthermore, Sylvestre and Abrahamowicz described two algorithms (permutation based and binomial model based) to generate survival times with time-varying covariates [29]. Finally, Royston [30] provided a method to simulate from parametric models that use restricted cubic splines on the log cumulative hazard scale.
Our approach relies on numerical integration to evaluate analytically intractable cumulative hazard functions. In our experience, 15 to 20 Gauss-Legendre quadrature nodes are often sufficient to provide accurate generation of survival times; however, we use 30 nodes, as the computation time remains minimal. As with any method that utilises numerical techniques, the accuracy of the generation process can be assessed by defining a seed, changing the tolerance of the root finder and/or the number of quadrature nodes, and establishing that the generated survival times do not change.
We have illustrated our simulation approach through a variety of simulation studies and examples. In particular, by simulating from a complex underlying distribution, we have shown that substantial bias can be observed in estimates of the log hazard ratio for a treatment effect, when fitting a standard Weibull proportional hazards model.
Although in this article we have extolled the benefits of simulating from distributions beyond the standard choices, it must be stated that in many settings, a simpler distribution may be adequate. For example, if fitting Cox models under proportional hazards and only the hazard ratio is of interest, then the baseline distribution used is inconsequential, and therefore a simpler distribution should take preference. However, as described earlier, if evaluating parametric methods or incorporating time-dependent effects, then using a more complex distribution can provide much more realistic scenarios in order to fully assess the methods being evaluated.
To facilitate the use of the methods described in this article, user-friendly Stata software has been developed [12,13]. For each of the examples described in this article, we include example Stata code to simulate the survival times in the Appendix. Further extensions not shown in this article include incorporating a cure proportion. This can be easily achieved by defining a mixture or non-mixture cure hazard function. This framework can also be applied in the generation of competing risks data, be it through cause-specific hazards or the approach of Beyersmann et al. [31].
Given the inherent requirement of simulation studies to assess the statistical properties and performance of current and novel methods, we believe this framework can play an important role in allowing the generation of more biologically realistic survival data, incorporating sufficiently complex scenarios. For example, the two-component mixture distribution described in Section 5.1.1 has recently been used to simulate joint model data from a baseline (cumulative) hazard function, to assess the use of splines to capture complex baseline hazard functions [26]. Although we have concentrated on parametric survival models in this article, the framework is entirely applicable to examining the performance of the Cox model in any of the scenarios described [32].

Example 5: User-defined log hazard function using fractional polynomials, with a binary time-varying covariate
We simulate 1000 survival times from a user-defined log hazard function using fractional polynomials of time, incorporating a proportional effect of age = 0.02, a binary time-varying treatment group and a binary disease severity indicator. Treatment switching is dependent on disease severity. Administrative censoring is assumed at 5 years through use of the maxt() option.

Example 6: User-defined log hazard function using fractional polynomials, with a continuous time-varying covariate
We simulate 1000 survival times from a user-defined log hazard function using fractional polynomials of time, incorporating a continuous time-varying biomarker. We further construct the observed biomarker values incorporating measurement error. Administrative censoring is assumed at 5 years through use of the maxt() option.

Example 7: User-defined log hazard function with delayed entry
We simulate 1000 survival times from a user-defined log hazard function using fractional polynomials of time, conditional on a vector of entry times, incorporating a proportional effect of treatment = 0.5. Administrative censoring is assumed at 5 years through use of the maxt() option.