Four alternative methodologies for simulated treatment comparison: How could the use of simulation be re‐invigorated?

Simulated treatment comparison (STC) is an established method for performing population adjustment for the indirect comparison of two treatments, where individual patient data (IPD) are available for one trial but only aggregate level information is available for the other. The most commonly used method is what we call ‘standard STC'. Here we fit an outcome model using data from the trial with IPD, and then substitute mean covariate values from the trial where only aggregate level data are available, to predict what the first trial's outcomes would have been if its population had been the same as the second's. However, this type of STC methodology does not involve simulation and can result in bias when the link function used in the outcome model is non‐linear. An alternative approach is to use the fitted outcome model to simulate patient profiles in the trial for which IPD are available, but in the other trial's population. This stochastic alternative presents additional challenges. We examine the history of STC and propose two new simulation‐based methods that resolve many of the difficulties associated with the current stochastic approach. A virtue of the simulation‐based STC methods is that the marginal estimands are then clearly targeted. We illustrate all methods using a numerical example and explore their use in a simulation study.


Highlights
What is already known
• Simulated treatment comparison (STC) is an established method for performing population-adjusted indirect treatment comparisons.
• It is typically used where the results from two trials are compared, one of which only provides aggregate-level data.
• The most commonly used approach, which we call 'standard STC', does not in fact involve simulation.
• The other existing approach is to simulate patient profiles.
• Both of these approaches have statistical problems.

What is new
• We propose two new simulation-based methods for performing STC.
• These new approaches resolve most of the difficulties with the current simulation-based approach.
Potential impact for research synthesis methods readers
• Much of the existing literature on STC is non-specific.
• By fully describing the steps involved in this type of analysis, we aim to make the corresponding statistical theory more tangible.
• In addition to presenting two new statistical methods, we examine the history of STC and provide a statistically rigorous account of this methodology.
• This article is intended to provide an accessible introduction for readers who may be unfamiliar with STC, and to develop this statistical methodology further.

| INTRODUCTION
With an increasing number of new treatments, it becomes ever more important for clinicians, patients and regulators to compare the available treatments using comparable estimates and outcomes. In medical statistics, an indirect treatment comparison is frequently used to compare treatment effects from different studies when there is no evidence from head-to-head trials. Standard methods for indirect treatment comparisons [1][2][3][4][5] either assume no important differences in the trial populations, incorporate these differences as between-study heterogeneity, or try to explain these differences using meta-regression. 5 When making indirect comparisons, adjusting for population differences between trials is often desirable to make this comparison more equitable. A variety of statistical methods for population-adjusted indirect comparisons have been developed for the case where we compare active treatments from two different trials, where individual patient data (IPD) are available for one trial but only aggregate level data are available for the other. [8][9][10] These aggregate level data are typically provided in published papers and reports. We distinguish between anchored and unanchored analyses by considering whether a common comparator treatment is present in both trials. For the anchored case, a common comparator, such as a placebo, exists in each trial, but this is not the case in the unanchored scenario. 9 In the anchored case, we only adjust for population differences in effect modifiers (covariates that have an impact on treatment efficacy) but do not adjust for solely prognostic covariates. This is because the indirect comparison is performed by comparing estimated treatment effects, and any imbalances in the prognostic covariates have already been accounted for by within-trial randomisation, as mentioned by Phillippo et al. 9 However, in the unanchored case, it is necessary to adjust for both effect modifiers and prognostic variables, 9 because the indirect treatment comparison is then performed by comparing average outcomes. In this work, we will consider the anchored case, where patients in the company's trial randomly receive treatment A or treatment B, and patients in the competitor's trial randomly receive treatment A or treatment C. Hence treatment A is the common comparator. For brevity, we will also refer to the company's trial as the 'AB trial' and the competitor's trial as the 'AC trial'. The aim is to adjust the AB trial patient outcomes to match the AC trial population, and so estimate the relative effect of treatment B versus A in the AC trial's population. The indirect treatment comparison of treatments B and C can then be performed using standard methods for indirect treatment comparisons. 1 This adjusted indirect comparison is performed in the AC trial population because we only have access to aggregate level information for the AC trial, and so we cannot make individual level adjustments for this trial.
Simulated treatment comparison (STC) is an established approach for performing this type of statistical analysis. The history of STC starts with Caro and Ishak, 7 who simulate data in an additional arm within a trial without any indirect treatment comparison involving another trial. Ishak et al. 11 build upon this idea, proposing to fit a model in the company's trial where IPD are available (referring to this as the 'index trial') and using this model to predict what patient outcomes would have been if the trial had been performed in the population of the competitor's trial. By using these predicted outcomes in the indirect treatment comparison, a population-adjusted analysis can then be performed in the population of the competitor's trial. Two approaches for making the required predictions are discussed in section 4.2 of Ishak et al. 11: we either 'plug in' the mean values of the competitor's trial or simulate patient outcomes. Both approaches have statistical problems. Ishak et al. discuss bias with the first when the link function used in the model is not linear. For the second, Phillippo et al. 9 raise concerns about 'correctly quantifying the uncertainty in the resulting indirect treatment comparison', and there is also the additional difficulty of simulating correlated patient covariates (as discussed in section 4.2 of Ishak et al., an issue is that the correlation structure of the covariates is typically unknown for the aggregate level data study because this is not usually reported, so covariate correlations will need to be 'borrowed' from the IPD study). In addition, there is the problem that the simulation-based method may be subject to an unacceptable amount of Monte Carlo error and may not be reproducible, because the results could be very sensitive to the random seed used. The former, more straightforward 'plug in the means' approach is used in the worked example in NICE DSU Technical Support Document 18: Methods for population-adjusted indirect comparisons in submissions to NICE 9 and, in our experience, has become the default 'standard STC' because it is so straightforward to implement. However, it is essential to recognise that the two possibilities of plugging in the mean values from the competitor's trial and simulating from the model fitted using the data from the IPD trial have always co-existed in the literature.
Matching-adjusted indirect comparison (MAIC) and multilevel network meta-regression (ML-NMR) are two alternative methods for population-adjusted indirect comparisons. [13][14] MAIC is a propensity score reweighting method that calculates weights for the IPD in the company's trial to match the moments of covariates from the competitor's trial. ML-NMR is an extension of standard network meta-analysis (NMA). 2,5,15,16 In the ML-NMR model, we specify the individual level model and obtain aggregate level models by integrating over distributions of covariates. Each method has its advantages and disadvantages. For example, MAIC and STC have only been developed to compare two treatments, and both can be used for anchored and unanchored cases. MAIC assumes that the distributions of the covariates overlap between trials and can perform poorly when the population distributions are substantially different. 13 Regression-based methods such as STC do not require overlapping covariates because we can use the regression equation to extrapolate. However, this extrapolation may be sensitive to the model specification and may raise concerns in practice. As mentioned above, STC methods that do not use simulation may lead to bias when the link function used in the outcome model is non-linear. 11 Furthermore, it is unclear whether conventional STC targets a marginal or a conditional estimand. 17 These are the main reasons we wish to re-invigorate simulation in STC: it resolves these two concerns with the conventional approach. If Monte Carlo simulation is not used, STC may appear to be a misnomer, unless one interprets simulation in the context of 'virtual' outcomes; however, this is not the usual meaning of the term simulation in statistics. We can apply ML-NMR to any connected network of evidence and obtain estimates in any target population, but methodological development is required for ML-NMR to be applied to time-to-event data.
In this article, we will carefully re-examine the two existing STC methods already in the literature and discussed by both Ishak et al. 11 and Phillippo et al. 9 We will propose two new ways to use simulation in STC, in which we can quantify the uncertainty of treatment effect estimates using principled statistical methods. Our two new methods also greatly reduce concerns about reproducibility and the magnitude of the Monte Carlo error. The rest of this article is structured as follows. Section 2 describes Bucher's method for performing indirect treatment comparisons without population adjustment, and its limitations. In Section 3, we describe four different methods for STC. These methods include the standard STC method, with no simulation, and the existing 'single imputation method'. We also develop the new 'multiple imputation' and 'infinite population' methods to resolve most of the difficulties with the current simulation-based approach. Section 4 applies all four STC methods to the numerical example in NICE DSU Technical Support Document 18. 9 A simulation study is conducted in Section 5, followed by a discussion in Section 6.

| BUCHER'S METHOD
An unadjusted indirect treatment comparison of two randomised controlled trials (Bucher's method 1) is conducted without population adjustment. This method is a special case of network meta-analysis, but it has the potential to produce biased results if the distributions of effect modifiers are not balanced across studies. To estimate the relative effect of treatment C versus treatment B, we use the formula

$\hat{\Delta}_{BC} = \hat{\Delta}_{AC(AC)} - \hat{\Delta}_{AB(AB)}$,   (1)

with standard error $\sqrt{\hat{\sigma}^2_{AB(AB)} + \hat{\sigma}^2_{AC(AC)}}$, where $\hat{\Delta}_{AB(AB)}$ and $\hat{\sigma}_{AB(AB)}$ are the estimated relative effect of B versus A, and its standard error, in the AB trial, and $\hat{\sigma}_{AC(AC)}$ is the standard error of $\hat{\Delta}_{AC(AC)}$ in the AC trial; we assume the latter is available from the published report of the AC trial or can be calculated from other published quantities, such as a confidence interval or p value.
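As a concrete illustration, Bucher's comparison is simple arithmetic on the two published estimates. In the sketch below the log odds ratios and standard errors are invented for illustration:

```python
import math

# Hypothetical published estimates (log odds ratio scale)
d_AB, se_AB = -1.5, 0.2   # B vs A, estimated in the AB trial
d_AC, se_AC = -1.9, 0.3   # C vs A, estimated in the AC trial

# Bucher's indirect estimate of C vs B and its standard error
d_BC = d_AC - d_AB
se_BC = math.sqrt(se_AB ** 2 + se_AC ** 2)

# 95% confidence interval on the log odds ratio scale
lo, hi = d_BC - 1.96 * se_BC, d_BC + 1.96 * se_BC
print(d_BC, se_BC, (lo, hi))
```

Note that the two estimates are treated as independent, which is justified because they come from different trials; the variances therefore simply add.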
An unadjusted analysis is based on the implicit assumption that the marginal relative effects are similar across different populations. Random effects models for network meta-analysis include between-study heterogeneity. However, estimating between-study heterogeneity based on only two studies is not feasible. In our study, we assume that the effect modifiers account for any between-study heterogeneity. Consequently, when variations exist in the distribution of effect modifiers across trials, the relative effect may no longer remain constant. As a result, using the unadjusted method may lead to biased estimates of the relative effect. To address this, we need to consider the influence of effect modifiers carefully in our analysis to ensure more equitable results. In the next section we present four versions of STC that can be used for this purpose.

| FOUR DIFFERENT METHODS FOR STC
An essential step of any STC is to fit an outcome model to the company's trial with IPD available. As explained in the introduction, we consider the anchored case, so the model should include all effect modifiers. Typically, this model will be a generalised linear model of the form

$g(\theta_t) = \beta_0 + \beta_1^T x_{AB} + (\gamma_B + \beta_2^T x_{AB}) I(t = B)$,   (2)

where $\theta_t$ is the mean outcome on treatment $t$, for $t \in \{A, B\}$, and $g$ is an appropriate link function. Here $x_{AB}$ denotes the effect modifiers, the term $\beta_1^T x_{AB}$ describes the main effect of the effect modifiers, and $\beta_2^T x_{AB}$ describes the interaction of the effect modifiers with treatment group. The term $I(t = B)$ is an indicator that the treatment $t$ is the active treatment B, so that $\gamma_B$ is the treatment effect of treatment B, relative to treatment A, for a patient with $x_{AB} = 0$. Model (2) is essentially also given as eq. (8) of Phillippo et al., 9 which is also fitted to the 'AB trial', where we have not included any prognostic variables. We use the subscript AB in $x_{AB}$ to emphasise the fact that this model is applied to the AB trial data, whereas Phillippo et al. 9 place this emphasis on the model parameters, but this is a cosmetic difference.
Phillippo et al. 9 state that adding other prognostic covariates might increase precision and improve model fit (see their supplementary materials). Hence, instead of model (2), we could fit the slightly more general model

$g(\theta_t) = \beta_0 + \beta_1^T x_{AB} + \beta_p^T x_{AB,p} + (\gamma_B + \beta_2^T x_{AB}) I(t = B)$,   (3)

where the term $\beta_p^T x_{AB,p}$ is the effect of the additional prognostic variables in the model.

| The standard 'no simulation' method (STC)
In the standard 'no simulation' method, we predict what the AB study's estimated treatment effect would have been in the AC trial's population, by 'plugging in' the mean covariate values of the AC trial's population into the model fitted to the AB trial data. Furthermore, we may determine how the predicted treatment effect depends on the properties of the population of the competitor's trial by substituting other representative values of $\bar{x}_{AC}$. When we use the fitted model (2) or (3), the predicted treatment effect will be $\hat{\gamma}_B + \hat{\beta}_2^T \bar{x}_{AC}$. Hence the required prediction, and its variance, would need to be calculated from this linear combination. As explained in the supplementary materials of Phillippo et al., 9 the necessary calculations are simplified by re-parameterising model (2) or (3) so that the covariates in the company's trial $x_{AB}$ are centred at the corresponding competitor's trial means. Defining $z_{AB}$ to be these centred values, that is, $z_{AB} = x_{AB} - \bar{x}_{AC}$, instead of fitting model (2), we equivalently fit the model

$g(\theta_t) = \beta_0 + \beta_1^T z_{AB} + (\gamma_B + \beta_2^T z_{AB}) I(t = B)$   (4)

to the company's trial's data. Note that there is a clash of notation between Equations (2) and (4); for example, the intercepts $\beta_0$ take different values when using the transformed data $z_{AB}$. We then substitute the covariate means of the aggregate data from the AC trial into the outcome model and also use the regression model parameter estimates to predict the outcomes of the competitor's trial. This is straightforward when using the transformed covariates because, by definition, the transformed covariate means of the population of the competitor's trial are $\bar{z}_{AC} = 0$.
This results in the prediction

$\hat{\Delta}_{AB(AC)} = \hat{\gamma}_B$.   (5)

Equation (5) is a straightforward formula for prediction, where $\hat{\gamma}_B$ is the predicted effect of treatment B, relative to A, in the AC trial population. The advantage of using the transformed $z_{AB}$ now becomes clear, because the predicted treatment effect $\hat{\gamma}_B$ is directly an estimated model coefficient that can simply be extracted from the model fit reported by statistical software. We define the notation $\hat{\Delta}_{AB(AC)}$ to emphasise that this is the estimated treatment effect of treatment B, relative to treatment A, but predicted for the AC trial population.
Finally, we estimate the required effect of treatment C, relative to treatment B, in the AC trial population by performing a standard indirect treatment comparison 1

$\hat{\Delta}_{BC(AC)} = \hat{\Delta}_{AC(AC)} - \hat{\Delta}_{AB(AC)}$,   (6)

where $\hat{\Delta}_{AC(AC)}$ is the published estimate of the relative treatment effect of C, relative to A, from the competitor's trial. Note that this requires that the link function $g$ relates to the outcome measure used in the indirect comparison. For example, if the outcome is a mean difference, the identity function would be used. However, a binary outcome is most commonly modelled on the log odds scale, in which case $g$ is the logit function. The standard error is calculated as

$\hat{\sigma}_{BC(AC)} = \sqrt{\hat{\sigma}^2_{AB(AC)} + \hat{\sigma}^2_{AC(AC)}}$,   (7)

where $\hat{\sigma}^2_{AC(AC)}$ is either available from the aggregate level data published for the competitor's trial or can be calculated from other published quantities.
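To make the centring trick concrete, the following Python sketch fits model (2) with and without centring and confirms that the plug-in prediction $\hat{\gamma}_B + \hat{\beta}_2^T \bar{x}_{AC}$ from the uncentred fit equals the coefficient $\hat{\gamma}_B$ read directly off the centred fit. (The analyses in this paper use R; the synthetic data, true coefficients and AC mean age below are invented purely for illustration.)

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_logistic(X, y, iters=50):
    """Maximum likelihood for logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])       # observed information
        beta += np.linalg.solve(H, X.T @ (y - p))    # Newton step
    return beta

# Synthetic AB-trial IPD with one effect modifier (age); hypothetical truth
n = 500
age = rng.uniform(45, 75, n)
trt = rng.integers(0, 2, n)                          # 0 = A, 1 = B
eta = 0.8 - 0.06 * age + (-2.1 + 0.05 * age) * trt
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

xbar_AC = 50.0  # mean age reported by the competitor's (AC) trial

# Uncentred fit of model (2): columns are 1, age, I(t=B), age*I(t=B)
b = fit_logistic(np.column_stack([np.ones(n), age, trt, age * trt]), y)
plug_in = b[2] + b[3] * xbar_AC   # gamma_B-hat + beta_2-hat * xbar_AC

# Centred fit of model (4): z = age - xbar_AC; gamma_B-hat is now simply
# the treatment coefficient
z = age - xbar_AC
bc = fit_logistic(np.column_stack([np.ones(n), z, trt, z * trt]), y)

print(plug_in, bc[2])  # identical up to numerical precision
```

Because the centred model is a linear reparameterisation of the uncentred one, the two predictions agree exactly at the maximum likelihood estimates; the benefit of centring is purely practical, in that the prediction and its standard error are reported directly by the software.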

| The single imputation method (STC-SI)
The standard STC method described in Section 3.1 does not involve any simulation. Instead, we plug the covariate means of the competitor's trial into the outcome model fitted to the company's trial to make the required predictions of what results the company's trial would have produced if it had been performed in its competitor's population. It is simple to implement, but it may be biased when the link function $g$ is non-linear. 11 Ishak et al. 11 propose a second approach for STC using simulation, and in this section we describe this alternative approach. We follow Ishak et al. in simulating a trial of the same size as the competitor's trial. They use the terms 'index trial' and 'comparator group' when referring to the trials for which we do, and do not, have IPD available and say, 'It is important to maintain the sample size of the comparator group when generating virtual event times to ensure the variance estimate is a reasonable approximation'. Hence we simulate a trial where the covariate distributions and the sample size are the same as the competitor's trial but where the company's treatment is the active treatment. We therefore simulate what the company's trial's results would have been if these important features had been the same as its competitor's. However, we suggest that it is open to debate whether it is most appropriate to simulate a trial of the same size as the company's or the competitor's trial, and we return to this in the discussion.
A difficulty is that we do not know all aspects of the population of the competitor's trial, and so we do not immediately have the information required to perform the simulation. Typically we will only have the means and standard deviations for continuous covariates, and sample proportions for binary covariates, reported in the publication or trial report for the competitor's trial. We therefore simulate from the competitor's trial population as accurately as we can, given the information available to us. We simulate continuous covariates with their reported means and standard deviations, and binary covariates with their reported proportions, from the competitor's trial, but using correlations and distributional assumptions from a different source of information. If the competitor's trial does not report standard deviations, these will also be obtained from another source of information. Typically this information will come from the IPD from the company's trial, and we will use this throughout. We describe the steps of this approach as follows:

1. Simulate patient covariates for a trial that is the same size as the competitor's trial with:
a. Means and standard deviations reported by the competitor's trial for continuous covariates.
b. Proportions reported by the competitor's trial for binary covariates.
c. Appropriate properties for other types of covariates; for example, categorical or count data could also be simulated using reported information from the competitor's trial.
d. Correlations and any distributional assumptions (e.g., for continuous covariates) from a different information source where this information is available. Typically this information will come from the IPD in the company's trial.
The main difficulty is simulating correlated covariates of different types, for example, continuous and binary. This is discussed by Ishak et al., 11 who say that 'individual characteristics should be generated from a multivariate normal distribution', but they also note that 'complications arise for categorical predictors since a normal distribution will generate a continuous value rather than 0/1 value'. For computational convenience we will simulate correlated covariates of different types using the 'SimCorrMix' package 18 in R. 19 More specifically, the 'corrvar2' function from this package was used to simulate correlated variables (which may be continuous, binary or count) using the properties reported in the aggregate level competitor's trial information and the required correlation matrix computed from the company's trial. By using method = 'polynomial' and specifying both the required mean and standard deviation of continuous covariates, we have found that 'corrvar2' provides simulated continuous random variables that appear nicely normally distributed and with the correct correlations with other variables. However, a limitation of our use of the 'corrvar2' function is that we have to use a normal distribution in situations where continuous covariates are not truly normally distributed, even after a transformation, for example where a uniform distribution is required. In the next section, we will discuss this in the context of our example, where the covariate age is known to follow a uniform distribution. In situations where all covariates are continuous, it is simpler to simulate directly from a multivariate normal distribution, possibly using transformed covariates to aid normality. When simulating covariates in steps (a-d) above, suitable aggregate-level information from the competitor's trial is required. The methods described by Wier et al. 20 may be useful in situations where some values are not reported.
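The 'corrvar2' approach is implemented in R. As a rough Python illustration of the general idea (a simple Gaussian copula, not SimCorrMix's algorithm), one can draw correlated standard normals, rescale one to the reported mean and standard deviation, and dichotomise the other at the normal quantile matching the reported proportion. All target values below are invented:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)

# Hypothetical aggregate-level targets for the competitor's trial,
# plus a latent correlation 'borrowed' from the IPD trial
mean_age, sd_age = 50.0, 3.0
prop_female = 0.8
latent_corr = 0.3
n = 100_000

# Correlated standard normals on the latent scale
cov = [[1.0, latent_corr], [latent_corr, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Continuous covariate: rescale to the reported mean and SD
age = mean_age + sd_age * z[:, 0]

# Binary covariate: threshold so that P(female) matches the proportion
threshold = NormalDist().inv_cdf(prop_female)
female = (z[:, 1] < threshold).astype(int)

print(age.mean(), age.std(), female.mean())
```

Note that dichotomising attenuates the correlation on the observed scale relative to the latent scale, which is one reason purpose-built packages such as SimCorrMix instead solve for the latent correlation that achieves a target observed correlation.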
2. Generate the treatment group t for each set of simulated patient covariates produced in step 1. Here we allocate simulated patients to the two treatment groups (A and B) in the company's trial. If the treatment allocation ratios are not 1:1 in either trial, either allocation ratio could be justified here, and we return to this issue in the discussion. We now let the notation $(x_{AC'}, t)$ denote a patient's simulated covariates from step 1 and treatment group from this step; the subscript $AC'$ emphasises that the estimation is performed in the simulated AC trial population.

3. Fit an outcome model to the data from the AB trial.
An appropriate parametric model should be used, such that data can easily be simulated from it, for example model (2) or (3). In the context of a time-to-event outcome, this model could be a parametric survival model that includes the main effects and interactions of effect modifiers with the treatment group, and also the main effects of any baseline covariates that are solely prognostic. We suggest that it is better to include the main effects of prognostic variables, as in model (3), if there are not so many such variables as to render this unfeasible.

4. Compute the estimated linear predictors using the fitted model from step 3 and the simulated data from steps 1 and 2. For example, if we fit model (3) to the data from the AB trial in step 3, we compute the linear predictor

$\hat{\eta}(x_{AC'}, t) = \hat{\beta}_0 + \hat{\beta}_1^T x_{AC'} + \hat{\beta}_p^T x_{AC',p} + (\hat{\gamma}_B + \hat{\beta}_2^T x_{AC'}) I(t = B)$,   (8)

where $g^{-1}(\hat{\eta}(x_{AC'}, t))$ is the probability of the Bernoulli distribution if the outcome is binary.

5. Complete the simulation procedure by simulating the outcome data $y(x_{AC'}, t)$. We simulate this outcome data using the $(x_{AC'}, t)$ simulated in steps 1 and 2, and the estimated linear predictors from step 4 (and any additional parameters estimated in the model fitted at step 3 and required for simulation; for example, the residual variance if this is a linear model or the baseline hazard if this is a Cox model).

6. Having simulated the outcome data of the AB trial, as if it had instead been conducted in the AC trial population with the same size as the AC trial, we can now produce the population-adjusted analysis. We use the notation $\hat{\Delta}_{AB(AC')}$ to denote the estimated treatment effect of treatment B relative to treatment A, using the simulated data from steps 1-5 for the AC trial population. We compute $\hat{\Delta}_{AB(AC')}$ and its variance $\hat{\sigma}^2_{AB(AC')}$ by fitting the required type of regression model on the treatment group variable using the simulated data (e.g., logistic regression for binary outcomes).

7.
Finally, we perform the indirect comparison in the same way as for the previous standard STC method. We compute the population-adjusted treatment effect of treatment C relative to treatment B as

$\hat{\Delta}_{BC(AC')} = \hat{\Delta}_{AC(AC)} - \hat{\Delta}_{AB(AC')}$,   (9)

with standard error

$\hat{\sigma}_{BC(AC')} = \sqrt{\hat{\sigma}^2_{AB(AC')} + \hat{\sigma}^2_{AC(AC)}}$,   (10)

where, as in the standard STC method in Section 3.1, $\hat{\Delta}_{AC(AC)}$ is the published estimate of the relative treatment effect of C, relative to A, and $\hat{\sigma}^2_{AC(AC)}$ is either available or calculable from the aggregate level data published for the competitor's trial. Equations (9) and (10) are similar to (6) and (7), but in the former pair of equations $AC'$ is used in the subscript to emphasise that the estimation is based upon simulated data in the AC trial population. Since we only simulate the $AC'$ data once, and we conceptualise the predicted results that would have been obtained if the AB trial had been conducted in the AC trial population as missing data, we refer to this approach as the 'single imputation method'.
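The steps above can be sketched end to end for a binary outcome. Everything below is illustrative: the AB-trial IPD are synthetic with hypothetical coefficients, a single effect modifier (age) is used, and the published AC-trial estimate is invented; the paper's own analyses use R.

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_logistic(X, y, iters=50):
    """Maximum likelihood for logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# --- Synthetic AB-trial IPD (hypothetical truth, for illustration) ---
n_AB = 500
age_AB = rng.uniform(45, 75, n_AB)
trt_AB = rng.integers(0, 2, n_AB)                          # 0 = A, 1 = B
eta = 0.8 - 0.06 * age_AB + (-2.1 + 0.05 * age_AB) * trt_AB
y_AB = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Step 3: fit outcome model (2) to the AB trial IPD
X_AB = np.column_stack([np.ones(n_AB), age_AB, trt_AB, age_AB * trt_AB])
b = fit_logistic(X_AB, y_AB)

# Steps 1-2: simulate covariates matching the AC trial's reported
# aggregates (mean/sd invented here) and allocate treatment 1:1
n_AC = 300
age_sim = rng.normal(50, 3, n_AC)
trt_sim = rng.integers(0, 2, n_AC)

# Steps 4-5: linear predictors, then simulate binary outcomes
X_sim = np.column_stack([np.ones(n_AC), age_sim, trt_sim, age_sim * trt_sim])
y_sim = rng.binomial(1, 1 / (1 + np.exp(-np.clip(X_sim @ b, -30, 30))))

# Step 6: marginal B-vs-A log odds ratio in the simulated AC' population
b_marg = fit_logistic(np.column_stack([np.ones(n_AC), trt_sim]), y_sim)
d_AB_ACp = b_marg[1]

# Step 7: anchored indirect comparison against the published AC estimate
d_AC_AC = -1.9   # hypothetical published C-vs-A log odds ratio
d_BC = d_AC_AC - d_AB_ACp
print(d_AB_ACp, d_BC)
```

Note that the final regression in step 6 conditions only on treatment, so the resulting estimate is explicitly marginal; this is one of the virtues of the simulation-based methods mentioned in the abstract.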

| The multiple imputation method (STC-MI)
The single imputation method in Section 3.2 resolves concerns about bias when the link function $g$ is not linear, but at the cost of introducing new issues. One issue is that, in the single imputation method, we only simulate the data once, which is not robust: if a different random seed is used, then substantively different results may be obtained. Another, related issue is that the results may only be reproducible if the random seed is judiciously specified prior to analysis. Finally, the single imputation method fails to account for parameter uncertainty, because this is not allowed for in step 4, where the estimated linear predictor is used as if it were the truth.
Repeating the single imputation method multiple times, giving rise to what we call the 'multiple imputation method', is a more advanced (but also more complicated and computationally intensive) approach. We propose using a resampling approach to produce robust results and accommodate parameter uncertainty. Here we resample patients in the AB trial, within treatment groups and with replacement, to produce a new representative sample. By creating $N_{draw}$ resamples in this way and applying the single imputation method to each of these in the way described in the previous section, we produce adjusted estimates $\hat{\Delta}^{(n)}_{AB(AC')}$, for $n = 1, \ldots, N_{draw}$. These $N_{draw}$ estimates are then averaged to provide a final $\hat{\Delta}_{AB(AC')}$ that can be used in Equation (9) in exactly the same way as in the single imputation approach, to produce the adjusted estimate. By using a relatively large $N_{draw}$, for example, $N_{draw} = 50$ or $N_{draw} = 100$, or an even larger value, we obtain results that are robust to the random seed used. If the random seed (or the seed for each resample) is specified prior to analysis, the analysis can be made reproducible. By resampling the AB trial multiple times and obtaining different estimated regression parameters for each simulated dataset in step 4 in Section 3.2, we accommodate parameter uncertainty in the outcome model fitted to the AB trial.
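The resample-and-average loop can be sketched as follows, again with entirely synthetic data and in Python rather than the paper's R; the single-imputation step is condensed into a helper function:

```python
import numpy as np

rng = np.random.default_rng(11)

def fit_logistic(X, y, iters=50):
    """Maximum likelihood for logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

def single_imputation(age, trt, y, n_AC=300):
    """One run of steps 1-6: fit the outcome model to (resampled) AB data,
    simulate an AC'-sized trial, return the marginal B-vs-A log OR."""
    X = np.column_stack([np.ones(len(y)), age, trt, age * trt])
    b = fit_logistic(X, y)
    age_s = rng.normal(50, 3, n_AC)          # hypothetical AC aggregates
    trt_s = rng.integers(0, 2, n_AC)
    X_s = np.column_stack([np.ones(n_AC), age_s, trt_s, age_s * trt_s])
    y_s = rng.binomial(1, 1 / (1 + np.exp(-np.clip(X_s @ b, -30, 30))))
    return fit_logistic(np.column_stack([np.ones(n_AC), trt_s]), y_s)[1]

# Synthetic AB-trial IPD (hypothetical truth)
n_AB = 500
age = rng.uniform(45, 75, n_AB)
trt = rng.integers(0, 2, n_AB)
eta = 0.8 - 0.06 * age + (-2.1 + 0.05 * age) * trt
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Multiple imputation: resample AB within arms, impute, and average
N_draw = 50
idx_A, idx_B = np.flatnonzero(trt == 0), np.flatnonzero(trt == 1)
draws = []
for _ in range(N_draw):
    idx = np.concatenate([rng.choice(idx_A, len(idx_A), replace=True),
                          rng.choice(idx_B, len(idx_B), replace=True)])
    draws.append(single_imputation(age[idx], trt[idx], y[idx]))
d_AB_ACp = np.mean(draws)   # averaged population-adjusted B-vs-A estimate
print(d_AB_ACp)
```

Averaging over many resamples makes the point estimate stable with respect to the random seed, while the variation of the fitted coefficients across resamples reflects parameter uncertainty in the outcome model.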
We call this the 'multiple imputation method' because, as in the single imputation method, we conceptualise the predicted results, that would have been obtained if the AB trial had been conducted in the AC trial population, as missing data.However, now we simulate multiple realisations of this missing data, analyse them as if they were observed and average the resulting estimates, as in Rubin's Rules for multiple imputation.
However, a difficulty is that we must obtain a valid $\hat{\sigma}^2_{AB(AC')}$ to use in Equation (10), which is now the variance of the average population-adjusted estimate of treatment B relative to A across the $N_{draw}$ replications. Rubin's rules provide an estimated variance formula that allows for within and between imputation variation of estimators, 21 but this is for the case where a more conventional type of statistical analysis is performed, that is, a single statistical model is adopted and some data used for model fitting are missing. Here a much more unusual statistical procedure is used, where, upon fitting a statistical model at step 3 in Section 3.2, we then use this to make predictions for an entirely new set of patients at step 5 and fit a substantively different statistical model at step 6 to provide our estimator. In particular, step 6, where an additional statistical model is used, does not align with the assumptions of Rubin's rules. Hence, rather than attempting to use or modify the usual Rubin's rules variance formula, 21 non-parametric bootstrapping was used. Here patients are resampled, with replacement and within each treatment group, to produce a bootstrap sample. The multiple imputation method is then implemented for each bootstrap sample. Finally, $\hat{\sigma}^2_{AB(AC')}$ is obtained as the sample variance of the estimates from all bootstrap replications. One consequence is that the resamples made for the bootstrap replications are then resamples of resamples of the original data (because the bootstrap samples are themselves resamples). This can have implications for the stability of the standard error, as we will illustrate in our simulation study in Section 5. This method is very similar to a double bootstrap approach, which has previously been used, for example, to estimate the within-study correlation used as an input in the multivariate meta-analysis of mixed outcomes. 22,23

| The infinite population method (STC-IP)

Finally, we consider a third STC method that involves simulation. This approach is conceptually different from the first two methods, because it translates what Equation (9) implies for the estimated effect of treatment B, relative to treatment A, in the AC trial population, where this population is of infinite size. Populations are usually conceptualised as being of infinite size in statistics, and this elegantly resolves questions about the size of the counterfactual AB study in the AC population that should be simulated (see also Section 3.2, where we discuss Ishak's recommendation in this regard). This is because we no longer simulate realisations of the AB trial as if its key characteristics had been those of the AC trial, where previously this had included both its covariate distributions and its size. Instead, we now directly translate Equation (9) to the AC trial population, which is regarded as infinite.
To apply this method, we follow the single imputation method in steps 1-6, but in the first step we simulate a very large sample of, say, 100,000 patients to approximate the infinite AC trial population. An even larger sample is even better if it is computationally feasible, and we use 1:1 randomisation in step 2. This gives us a point estimate of $\hat{\Delta}_{AB(AC')}$ for our adjusted analysis in step 6. We use non-parametric bootstrapping to compute $\hat{\sigma}^2_{AB(AC')}$ in Equation (10), and so propagate the uncertainty in the model fitted to the IPD from the AB trial. Here we create sufficient bootstrap samples by resampling patients of the AB trial with replacement within each treatment group. For each bootstrap sample, we obtain the point estimate $\hat{\Delta}_{AB(AC')}$ by applying the single imputation method to the resampled AB dataset and the large sample of $AC'$ simulated earlier. Then $\hat{\sigma}^2_{AB(AC')}$ is calculated as the sample variance of all point estimates from the bootstrap replications.
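The infinite population method can be sketched as follows, with synthetic AB data and invented AC aggregates as before (a moderately large simulated population and a small number of bootstrap replications are used here purely to keep the sketch fast):

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_logistic(X, y, iters=50):
    """Maximum likelihood for logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# Synthetic AB-trial IPD (hypothetical truth, as before)
n_AB = 500
age = rng.uniform(45, 75, n_AB)
trt = rng.integers(0, 2, n_AB)
eta = 0.8 - 0.06 * age + (-2.1 + 0.05 * age) * trt
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Step 1: one large simulated AC' population, fixed across bootstraps
N_big = 20_000                       # stands in for the 'infinite' population
age_s = rng.normal(50, 3, N_big)     # hypothetical AC aggregates
trt_s = np.tile([0, 1], N_big // 2)  # 1:1 randomisation in step 2
X_s = np.column_stack([np.ones(N_big), age_s, trt_s, age_s * trt_s])
ones_trt = np.column_stack([np.ones(N_big), trt_s])

def stc_ip_estimate(a, t, out):
    """Fit the outcome model to (resampled) AB data, simulate outcomes in
    the large AC' population, return the marginal B-vs-A log OR."""
    b = fit_logistic(np.column_stack([np.ones(len(out)), a, t, a * t]), out)
    y_s = rng.binomial(1, 1 / (1 + np.exp(-np.clip(X_s @ b, -30, 30))))
    return fit_logistic(ones_trt, y_s)[1]

point = stc_ip_estimate(age, trt, y)

# Bootstrap the AB trial (within arms) to propagate model uncertainty
idx_A, idx_B = np.flatnonzero(trt == 0), np.flatnonzero(trt == 1)
boots = []
for _ in range(30):
    idx = np.concatenate([rng.choice(idx_A, len(idx_A), replace=True),
                          rng.choice(idx_B, len(idx_B), replace=True)])
    boots.append(stc_ip_estimate(age[idx], trt[idx], y[idx]))
se = np.std(boots, ddof=1)
print(point, se)
```

Because the large $AC'$ population is held fixed, nearly all of the bootstrap variability reflects uncertainty in the outcome model fitted to the AB IPD, rather than the size of the counterfactual trial.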

| APPLICATION TO A NUMERICAL EXAMPLE
In this section, we apply all four STC methods described above to the numerical example from NICE DSU Technical Support Document 18. 9 This is a simulated example, but it has been used to illustrate methodological work in the past. 24

4.1 | Data generation
In this example, 500 patients in the company's trial (the AB trial) are randomised to treatment A (a common comparator such as a placebo) or treatment B in equal proportions of 0.5.
Similarly, 300 patients in the competitor's trial (the AC trial) are randomised to treatments A and C in equal proportions of 0.5.
The covariates considered are age and sex. The age of the patients in the company's trial is uniformly distributed between 45 and 75, and the age of the patients in the competitor's trial is uniformly distributed between 45 and 55. In the company's trial, 64% of patients are female, while in the competitor's trial, 80% of patients are female. The IPD are available from the company's trial, and only aggregate data are available from the competitor's trial. The outcome variable of interest is binary and is simulated from the logistic model

$$\text{logit}(p_{it}) = \beta_0 + \beta_{age}\,\text{age}_i + \beta_{sex}\,\text{sex}_i + (\beta_t + \beta_{EM}\,\text{age}_i)\, I(t \neq A), \qquad (11)$$

where $p_{it}$ is the probability of an event when individual patient $i$ receives treatment $t$, for $t \in \{A, B, C\}$; $\beta_B = -2.1$ if the individual patient receives treatment B; $\beta_C = -2.5$ if the individual patient receives treatment C; and $I(t \neq A)$ is an indicator function for $t \neq A$. The true relative effect $\Delta_{BC}$, that is, the log OR for treatment C versus B, is conditional (on the covariates age and sex) and is calculated as $\beta_C - \beta_B = -2.5 - (-2.1) = -0.4$. Thus treatment C performs better than treatment B if the outcome is taken as an adverse event. It is worth noting that, from Equation (11), the conditional relative effect of treatment C versus B is the same at all ages because it does not depend on age. However, the conditional relative effects of B versus A and of C versus A both depend on the individual's age, through the treatment-by-age interaction term.
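The data-generating mechanism above can be sketched in Python as follows. Only the treatment effects $\beta_B = -2.1$ and $\beta_C = -2.5$ come from the example; the intercept, covariate and interaction coefficients (`B0`, `B_AGE`, `B_SEX`, `B_EM`) are hypothetical placeholders, since the actual values are set in the R code accompanying the NICE DSU Technical Support Document.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical intercept/covariate coefficients for illustration only;
# beta_B = -2.1 and beta_C = -2.5 are taken from the example.
B0, B_AGE, B_SEX, B_EM = -0.85, 0.01, 0.5, 0.02
BETA = {"A": 0.0, "B": -2.1, "C": -2.5}

def simulate_trial(n, age_lo, age_hi, p_female, treatments, rng):
    """Simulate one two-arm trial with 1:1 randomisation."""
    age = rng.uniform(age_lo, age_hi, n)
    female = (rng.random(n) < p_female).astype(int)
    treat = np.repeat(treatments, n // 2)
    not_a = (treat != "A").astype(float)          # I(t != A)
    beta_t = np.array([BETA[t] for t in treat])
    lin = B0 + B_AGE*age + B_SEX*female + (beta_t + B_EM*age)*not_a
    y = (rng.random(n) < 1.0/(1.0 + np.exp(-lin))).astype(int)
    return {"age": age, "female": female, "treat": treat, "y": y}

ab = simulate_trial(500, 45, 75, 0.64, ["A", "B"], rng)  # company's IPD
ac = simulate_trial(300, 45, 55, 0.80, ["A", "C"], rng)  # competitor's trial

# Only aggregates of the AC trial would be reported in practice:
agg = {"mean_age": ac["age"].mean(), "sd_age": ac["age"].std(ddof=1),
       "prop_female": ac["female"].mean(),
       "events": {t: int(ac["y"][ac["treat"] == t].sum()) for t in ("A", "C")}}
```

The final dictionary mimics the reduction of the competitor's trial to aggregate-level data described below.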
The covariate age is an effect modifier, and the patients enrolled in the competitor's trial are older than those in the company's trial. Hence we require population adjustment in the indirect comparison to make it more equitable. The patients of the AC trial were simulated at the IPD level but then aggregated, so that only the number of events in each treatment group, the proportion of male patients and the mean age (and its standard deviation) are available for analysis.9

4.2 | Marginal and conditional estimands
There are two main ways to define the true value of the relative effect of treatment C versus B in the AC trial population, $\Delta_{BC(AC)}$: the marginal relative effect and the conditional relative effect. The question of which estimand is most appropriate to target has been debated, particularly by Remiro-Azócar et al.17 and Phillippo et al.25 The conditional relative effect implied by model (11) is calculated at the individual level, because the treatment effect $\beta_t$ is a logistic regression coefficient and so is conditional on the effects of the covariates. The marginal relative effect is the average treatment effect at the population level in the AC trial population. The conditional and marginal effects differ when the effect measure is not collapsible,26,27 as is the case here.
Which type of treatment effect standard STC targets is unclear, because the calculation of its relative effect $\hat{\Delta}_{BC(AC)}$ involves plugging the covariate means into a multivariable regression fitted to the AB trial (so that the resulting estimated relative effect is conditional) and comparing this estimated relative treatment effect to an unadjusted, and so marginal, estimate from the AC trial. The simulation-based STC methods (STC-SI, STC-MI and STC-IP), however, clearly target the marginal relative effect, because we simulate the AC trial population and estimate marginal treatment effects from it.
This article uses the conditional relative effect $\Delta_{BC} = \beta_C - \beta_B = -0.4$ (the difference of conditional log odds ratios) as the true effect, both in this numerical example and in the simulation study that follows in Section 5. This conditional relative effect is based on the model parameters in (11). One way to calculate the marginal relative effect is to use a Monte Carlo approach to compute an otherwise complicated integral. More specifically, we simulate a very large sample of AC trial covariates, randomly assign patients to receive treatment B or C, calculate the probability of an event using Equation (11), simulate the outcomes and calculate the unadjusted log OR comparing treatments B and C. We simulated 1,000,000 patients' covariates and outcomes in this way and obtained a marginal relative effect of -0.3796. The marginal relative effect therefore differs only slightly from the conditional relative effect. Hence we take -0.4 as the true effect, both in this section and in the simulation study in Section 5.
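The Monte Carlo calculation of the marginal relative effect can be sketched as follows (in Python rather than the paper's R). The intercept and covariate coefficients below are hypothetical placeholders, so the resulting marginal log OR illustrates the attenuation only qualitatively; only $\beta_B$ and $\beta_C$ are taken from the example.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Treatment effects from the example; other coefficients are placeholders.
b_B, b_C = -2.1, -2.5
b0, b_age, b_sex, b_em = -0.85, 0.01, 0.5, 0.02

n = 1_000_000
age = rng.uniform(45, 55, n)                  # AC trial age distribution
female = (rng.random(n) < 0.80).astype(float) # 80% female in the AC trial
treat_C = rng.random(n) < 0.5                 # randomise B vs C 1:1

b_t = np.where(treat_C, b_C, b_B)
# Every patient is on B or C, so the I(t != A) indicator equals 1 throughout
lin = b0 + b_age*age + b_sex*female + (b_t + b_em*age)
y = rng.random(n) < 1/(1 + np.exp(-lin))

def log_odds(events, total):
    return np.log(events/(total - events))

# Unadjusted (marginal) log OR of C versus B in this very large sample
mar = (log_odds(y[treat_C].sum(), treat_C.sum())
       - log_odds(y[~treat_C].sum(), (~treat_C).sum()))
```

Under non-collapsibility the marginal log OR is attenuated towards zero relative to the conditional value of -0.4, mirroring the small difference (-0.3796 versus -0.4) reported in the example.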

4.3 | Statistical methods
In this section, we describe how we apply to our numerical example all four STC methods described more generally in Section 3. We will compare their results to an unadjusted indirect comparison (Bucher's method1).

4.3.1 | The unadjusted method
To perform the unadjusted analysis with a binary outcome, we obtain the point estimate $\hat{\Delta}_{AB(AB)}$ and its variance $\sigma^2_{AB(AB)}$ by fitting a logistic regression model to the IPD of the AB trial on treatment group only, so calculating an unadjusted treatment effect. The standard error $\sigma_{BC}$ of the estimate $\hat{\Delta}_{BC}$ is calculated as in Equation (1), assuming that $\sigma^2_{AC(AC)}$ is available from the published report of the AC trial.
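As a sketch of the unadjusted (Bucher) comparison, the log ORs and their variances can be computed directly from the 2x2 tables; the event counts below are illustrative, not the example's data.

```python
import numpy as np

def log_or_and_var(e1, n1, e0, n0):
    """Unadjusted log odds ratio (group 1 vs group 0) and its variance,
    using the standard 1/a + 1/b + 1/c + 1/d formula."""
    a, b = e1, n1 - e1
    c, d = e0, n0 - e0
    return np.log(a*d/(b*c)), 1/a + 1/b + 1/c + 1/d

# Illustrative event counts (hypothetical, not the paper's data)
d_ab, v_ab = log_or_and_var(e1=60, n1=250, e0=120, n0=250)  # B vs A, AB trial
d_ac, v_ac = log_or_and_var(e1=25, n1=150, e0=55, n0=150)   # C vs A, AC trial

# Bucher's anchored indirect comparison: difference of the two log ORs,
# with variances summed because the trials are independent
d_bc = d_ac - d_ab
se_bc = np.sqrt(v_ab + v_ac)
```

In practice $\hat{\Delta}_{AC(AC)}$ and its variance would be taken from the published report of the AC trial rather than computed from IPD, as the text above notes.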

4.3.2 | The no simulation method (STC)
For the no simulation method, Equation (2) was used to produce the adjusted estimate, where the outcome is binary, $g$ is the logit link function, $\theta_t$ is the mean outcome and $x_{AB}$ is the covariate age, which is the effect modifier. This version includes no prognostic variables, but we apply it because it is the standard and primary analysis in the supplementary materials of the NICE DSU Technical Support Document.9 The relative effect $\hat{\Delta}_{AB(AC)}$ is the estimated model coefficient, which can simply be extracted from the model fitted as in Equation (4). Then we perform the indirect comparison as described in Equation (6) and calculate the standard error as in Equation (7), where $\hat{\sigma}^2$ can be extracted directly from the model fitted to the company's trial.
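A minimal sketch of standard STC with a logit link is given below, using a hand-rolled Newton-Raphson logistic fit so the example is self-contained; the AB dataset and the AC mean age are hypothetical, and the paper's own analysis is in R.

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_logistic(X, y, iters=25):
    """Plain Newton-Raphson (IRLS) logistic regression; returns the
    coefficients and their variance-covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1/(1 + np.exp(-X @ beta))
        W = p*(1 - p)
        H = X.T @ (X * W[:, None])                 # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.linalg.inv(H)

# Hypothetical AB IPD (illustrative, not the example's data)
n = 500
age = rng.uniform(45, 75, n)
treat = np.repeat([0.0, 1.0], n // 2)              # 0 = A, 1 = B
p_true = 1/(1 + np.exp(-(-0.5 + 0.02*age - 2.1*treat)))
y = (rng.random(n) < p_true).astype(float)

mean_age_ac = 50.0                                 # mean age reported by AC trial

# Standard STC: centre the effect modifier at the AC trial mean, so the
# treatment coefficient is the adjusted estimate of Delta_{AB(AC)}.
X = np.column_stack([np.ones(n), treat, age - mean_age_ac])
beta, vcov = fit_logistic(X, y)
d_ab_ac, se = beta[1], np.sqrt(vcov[1, 1])
```

Centring the covariates at the AC means is equivalent to the 'plug-in means' step: no simulation is involved, which is exactly the feature of this method discussed in the text.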

4.3.3 | The single imputation method (STC-SI)
For the single imputation method, we follow the steps in Section 3.2. In the first step, we simulate the patient covariates with the mean and standard deviation estimated from the AC trial, but with the correlation between age and sex estimated from the AB trial, because this correlation is not reported in the aggregate-level AC trial information. We allocate half the patients to treatment A and half to treatment B to maintain the 1:1 randomisation, as discussed in step 2. Then we follow step 3 to fit the outcome model of Equation (3) to the IPD of the AB trial, where $x_{AB,p}$ is an indicator for sex and $x_{AB}$ is age. In step 4, we estimate the mean outcome $\hat{\theta}_t(x_{AC'})$ for the simulated patient covariates using the fitted model. Here $\hat{\theta}_t(x_{AC'})$ is the mean of a Bernoulli distribution because the outcome is binary. In step 5, the individual outcomes are then simulated from Bernoulli distributions with these estimated means. In step 6, the relative effect $\hat{\Delta}_{AB(AC')}$ of treatment B versus A in the simulated AC' trial is estimated by fitting a logistic regression on treatment. Finally, the indirect comparison is performed using Equation (9) in step 7, with the standard error calculated as in Equation (10).
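Steps 1-6 above can be sketched as follows. This Python sketch stands in for the paper's R/SimCorrMix implementation: the fitted model coefficients, the assumed age-sex correlation of 0.1 and the AC aggregates are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical fitted AB outcome model (for illustration only):
# logit(theta) = b0 + b_treat*I(B) + b_sex*female + b_age*age
b0, b_treat, b_sex, b_age = -0.5, -2.1, 0.4, 0.02

# Step 1: simulate AC' covariates from the reported AC aggregates, with
# the age-sex correlation (here 0.1, assumed) borrowed from the AB IPD.
n_ac = 300
mean_age, sd_age, p_female, rho = 50.0, 2.89, 0.80, 0.1
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n_ac)
age = mean_age + sd_age*z[:, 0]
female = (z[:, 1] < 0.8416).astype(int)   # 0.8416 = Phi^{-1}(0.80)

# Step 2: 1:1 randomisation to treatments A (0) and B (1)
treat = rng.permutation(np.repeat([0, 1], n_ac // 2))

# Steps 4-5: predicted event probabilities, then simulated binary outcomes
theta = 1/(1 + np.exp(-(b0 + b_treat*treat + b_sex*female + b_age*age)))
y = (rng.random(n_ac) < theta).astype(int)

# Step 6: for a treatment-only logistic model the coefficient equals the
# 2x2-table log odds ratio of B versus A in the simulated AC' trial
e1, n1 = y[treat == 1].sum(), (treat == 1).sum()
e0, n0 = y[treat == 0].sum(), (treat == 0).sum()
d_ab_acp = np.log((e1*(n0 - e0))/((n1 - e1)*e0))
```

Thresholding a correlated latent normal is one simple way to induce correlation between a continuous and a binary covariate; SimCorrMix uses a more sophisticated version of the same normal-transformation idea.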
In this example, age is uniformly distributed, but a normal distribution is used to simulate the covariates in the AC trial. This is a limitation of using the SimCorrMix R package. In practice, the IPD from the AC trial are not observed, so we would not know that this is necessarily an issue, although we might suspect it from the corresponding covariate distribution in the AB trial, or even from common-sense reasoning. We are content to simulate from a normal distribution with the mean and standard deviation implied by the uniform distribution, and so a normal distribution that resembles the uniform distribution as closely as possible in terms of its first two moments, but this does involve an element of pragmatism. We advocate that analysts use the most sophisticated simulation procedures available to them, and we regard the SimCorrMix package as impressive but not a panacea.
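The moment matching described above amounts to equating the normal distribution's mean and standard deviation with those of a Uniform(a, b) distribution:

```python
import math

# Moment-matching a Normal to a Uniform(a, b): same mean and SD.
a, b = 45, 55                  # AC trial age range in the example
mean = (a + b)/2               # mean of Uniform(a, b): 50.0
sd = (b - a)/math.sqrt(12)     # SD of Uniform(a, b): ~2.89
```

In practice, of course, only the reported mean and standard deviation of age would be available, and the underlying uniform shape would remain unknown.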

4.3.4 | The multiple imputation method (STC-MI)
We apply the STC-SI method multiple times for the multiple imputation method, to allow for parameter uncertainty and make the results more robust. Here we draw $N_{draw} = 50$ resamples of the AB trial with replacement, within each treatment group, to maintain the randomisation ratio of the trial (see Section 3.3). For each resample, we apply the STC-SI method to obtain a point estimate of $\hat{\Delta}_{AB(AC')}$ (see Section 4.3.3), and then take the average of these point estimates to produce the final estimate of $\hat{\Delta}_{AB(AC')}$. As explained in Section 3.3, we use bootstrapping to calculate the corresponding standard error. Here we simulate $N_{bs} = 30$ bootstrap replications by resampling the patients of the AB trial with replacement, within each treatment group. Specified seeds were set for every stochastic step (simulation, resampling, bootstrapping) to ensure reproducibility.
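The STC-MI resampling structure, including the 'resamples of resamples' used for the standard error, can be sketched as follows; `stc_si` is a stub returning a noisy estimate, purely so the skeleton runs.

```python
import numpy as np

rng = np.random.default_rng(3)

def stc_si(ab, rng):
    """Placeholder for one STC-SI analysis of an AB dataset; stubbed with
    a noisy log OR so the resampling skeleton below is runnable."""
    return -0.4 + rng.normal(scale=0.2)

def resample_within_arms(ab, rng):
    """Resample patients with replacement separately within each arm,
    preserving the randomisation ratio."""
    idx = np.concatenate([rng.choice(np.flatnonzero(ab["treat"] == t),
                                     size=int((ab["treat"] == t).sum()))
                          for t in np.unique(ab["treat"])])
    return {k: v[idx] for k, v in ab.items()}

ab = {"treat": np.repeat([0, 1], 250), "y": np.zeros(500)}
n_draw, n_bs = 50, 30

# Point estimate: average the STC-SI estimates over N_draw resamples
est = np.mean([stc_si(resample_within_arms(ab, rng), rng)
               for _ in range(n_draw)])

# Standard error: bootstrap the whole procedure, which takes resamples
# of resamples (the source of the occasional instability noted later)
boots = [np.mean([stc_si(resample_within_arms(bs, rng), rng)
                  for _ in range(n_draw)])
         for bs in (resample_within_arms(ab, rng) for _ in range(n_bs))]
se = np.std(boots, ddof=1)
```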

4.3.5 | The infinite population method (STC-IP)
When applying the infinite population method, we repeat steps 1-6 of the single imputation method (Section 4.3.3) to calculate the point estimate $\hat{\Delta}_{AB(AC')}$, except that we simulate a substantial sample (100,000) of patients to approximate the AC trial population. Since we do not maintain the AC trial's sample size when simulating patients, the variance cannot simply be extracted from the model fitted to the simulated AC' trial. Instead, we use non-parametric bootstrapping to propagate the uncertainty in the parameters of model (2) or (3), and so compute $\sigma^2_{AB(AC')}$; see Section 3.4. We create $N_{bs} = 30$ bootstrap samples by resampling patients of the AB trial with replacement within each treatment group. A random seed was also set prior to analysis to ensure reproducible results. However, in this method a very large sample is simulated, and the only other stochastic step is the bootstrapping used to obtain standard errors, so seed setting is less essential.

4.4 | Results
The resulting five estimates of $\Delta_{BC}$ (using the four different methods for STC and also the unadjusted indirect treatment comparison using Bucher's method), and the corresponding 95% confidence intervals, are summarised in Table 1. These estimates and confidence intervals are visualised in Figure 1. The results show that all STC methods performed better than the unadjusted comparison, in the sense that they reversed its incorrect directional estimated effect. However, all STC methods result in similar estimates and confidence intervals, and all five confidence intervals contain -0.4. It is therefore difficult to determine which method is best on the basis of this single example. In order to compare the performance of these five methods more meticulously, a simulation study was performed, and this is described in the next section.

5 | SIMULATION STUDY

In the previous section, we found that all four methods for STC perform very similarly for the simulated dataset used in the NICE DSU Technical Support Document.9 In this simulation study, this scenario is replicated 1000 times so that their performance can be assessed more accurately. The same code used by Phillippo et al.9 was used to produce the replications, so that each replication has exactly the same properties as the numerical example in the previous section. As in Phillippo et al.,9 and so in the example in Section 4, the simulated data from the competitor AC trial were reduced to the aggregate level, so that only the number of events in each treatment group, the proportion of patients who are male and the mean age (and its standard deviation) are available for analysis. The five methods used in Section 4 were applied to each replication, so that their properties across all 1000 realisations of the simulated datasets could be assessed. The main quantities of interest are the bias of the estimates of $\Delta_{BC(AC)}$ (the difference between the average $\hat{\Delta}_{BC(AC)}$ and -0.4), the coverage probability of the corresponding 95% confidence intervals and the average confidence interval length.
The main results of the simulation study are summarised in Table 2, where biases are shown with Monte Carlo standard errors in parentheses. The main conclusion is that all four STC methods are effective in eliminating the bias of the unadjusted analysis. The STC-MI and STC-SI biases are very similar, as expected, because these two methods are very closely related: the former is simply the repeated use of the latter on resampled datasets. The Monte Carlo standard error of STC-MI is, however, slightly smaller than that of STC-SI, which is likely because of the greater stability of an estimate based on multiple resamples. One consequence of this slightly smaller Monte Carlo standard error is that the lower boundary of the corresponding 95% confidence interval for the bias is only slightly less than zero, so a ramification of the greater precision of STC-MI is that a little bias from it may be harder to rule out. The standard STC method performed well, suggesting that any issues relating to bias due to a non-linear link function $g$ are unimportant here. STC-IP also performs well, and our simulation study provides proof of concept that all four methods for STC are effective in removing the bias of an unadjusted analysis.
The bias resulting from the unadjusted analysis has undesirable consequences for the coverage probability of its confidence interval, which is substantially below the nominal value of 0.95. The coverage probability for STC-SI is also too low. This is because only one simulated AB trial, in the AC trial population, is produced, and the uncertainty in the model fitted to the IPD from the company's trial is not taken into account. One way to improve this coverage probability would be to propagate the uncertainty in this model in the same way as in STC-IP, but the resulting hybrid approach would then be STC-IP in a finite population of the same size as the competitor's trial. The other three STC methods, including the conventional approach, have coverage probabilities close to the nominal value.
The poor coverage probability of the unadjusted approach is also due to its shorter average confidence interval length (Table 2). All the other methods perform population adjustment, and there is some loss of precision when performing this type of adjustment, so narrower unadjusted confidence intervals were expected. The poor coverage probability of STC-SI appears to be entirely due to its average confidence interval length being shorter than those of the other STC methods, again as expected because this method does not account for parameter uncertainty. The other three STC methods all perform well, and the standard STC method results in the shortest average confidence interval length. Hence this very simple and standard approach is adequate in this scenario, as might be expected because the marginal and conditional treatment effects are so similar, suggesting that the non-linearity of the link function $g$ is not a serious issue.
In Figure 2, we show boxplots describing the distributions of the widths of the confidence intervals for all five methods across all 1000 simulations, which reveal additional insights. The STC-MI method occasionally produces very wide 95% confidence intervals, for example, intervals that are wider than 3. Although all methods occasionally produce wider confidence intervals, STC-MI appears to be especially prone to this. On closer inspection of the simulated datasets, this was found to be a consequence of the resamples of resamples required in the bootstrapping: as explained in Section 3.3, we resample to obtain the point estimate, and the bootstrapping procedure involves taking further resamples. In some resamples of resamples, the event rate in one of the study arms was occasionally very low, resulting in quite extreme bootstrap replications and hence a large estimated standard error. We can therefore explain this phenomenon, and it might reasonably be regarded as a disadvantage of this method.
To summarise, all methods have performed as one might expect. In particular, the unadjusted analysis has performed poorly. The STC-SI method performed well in terms of removing bias but less well in terms of its coverage probability. The other three STC methods all performed well, including the standard approach. However, these results may not generalise to other settings, for example, to situations where the marginal and conditional relative effects differ more. If this is the case, then we expect to find more apparent differences in results between the standard STC method and the simulation-based STC methods. A more extensive simulation study investigating all STC methods, and also other methods for population adjustment, in different scenarios would be a helpful and more complete guide for applying population adjustment methods.

6 | DISCUSSION
As presented in this article, there are at least four different ways to conduct STC, all with their strengths and drawbacks. The commonly used 'plug-in means' approach is convenient for calculating estimates without simulation, despite the fact that the STC method's concept originated with simulation. In this article, we hope to put the 'S' for simulation back into the STC method for two main reasons: first, to avoid the bias that can be caused by a non-linear link function; and second, because simulation-based STC methods clearly target marginal estimands. There are still, however, a number of limitations and imperfections in the methods in their current form, and a lack of consensus in their use.
In practice, one of the most challenging aspects of performing STC, and of any other statistical method for population adjustment, is choosing appropriate covariates to adjust for. In our simulated numerical example, we know that age is an effect modifier whereas sex is prognostic, but this type of information will not be known with certainty in real applications. Clinical input from experts who understand the application area will often be crucial in determining which covariates should be adjusted for. Statistical criteria may also be used for covariate selection, for example, descriptive statistics and statistical modelling using the individual patient data from the company's trial. Published aggregate-level information, such as subgroup analysis results, may also be valuable. In practice, covariate selection will likely depend on both clinical input and statistical results. Despite this, uncertainty concerning which covariates to include will be almost inevitable. Sensitivity analyses that explore the implications of using different covariates will therefore often be necessary in practice. Further complications may arise when covariate information is not available, or when there is a desire to adjust for more variables than the available data allow.
Another challenge in using STC is that there are different ways of applying it in practice. The size of the trial to simulate is an open question. The STC-SI method follows the suggestion of Ishak et al.11 to maintain the size of the AC trial when simulating the individual profiles. However, STC-IP uses an infinite population, where we simulate a substantial sample of patients and propagate model uncertainty using bootstrapping. When we simulate the treatment groups, 1:1 randomisation is the most straightforward and efficient choice, but if the trials have other randomisation ratios, then these could be used instead to make the simulated trial more realistic.
We have used the 'SimCorrMix' package18 to simulate correlated variables in R. This package's advantage is that it can generate correlated continuous, binary or ordinal variables. It generates variables using a normal distribution and then transforms them to the desired distributions. However, the age covariate in our numerical example and the simulation study is uniformly distributed. In practice, we do not know the true distribution of covariates in the AC trial. Other simulation techniques that overcome this limitation should be considered, and we are currently exploring the use of such techniques. One issue is that the function 'corrvar2' requires a random seed as an input (with default = 1234). This is undesirable because we usually set the seed only once, at the beginning of a stochastic statistical analysis, to ensure reproducibility. We resolved this issue by setting a different seed in each call of the 'corrvar2' function.

FIGURE 2 Boxplot of widths of 95% confidence intervals of estimates using the unadjusted method and STC methods in the simulation study.
There are two main ways to calculate the true value of the relative effect of treatment C versus B in the AC trial population, $\Delta_{BC(AC)}$: the marginal relative effect and the conditional relative effect. It is clear that MAIC methods target marginal relative effects. See Remiro-Azócar et al.17 and the response of Phillippo et al.25 for discussion of the type of conditional treatment effects that ML-NMR targets. STC methods that involve simulation target marginal treatment effects, but it is not clear what type of effect the standard 'no simulation' method targets. By putting the simulation back into STC, one consequence of our work is to help resolve this particular concern.
A standard network meta-analysis can incorporate, but not account for, the differences between the trial populations as between-study heterogeneity; however, this approach usually requires a substantially larger number of studies than two. Alternatively, these discrepancies can be accounted for using meta-regression,5 where we can use estimated study means as covariates, but this also requires a larger number of studies. Our investigations show that the standard STC method performs well for our example, which was taken from the NICE DSU Technical Support Document,9 because all STC methods perform similarly in our simulation study. However, this need not be the case more generally. Furthermore, in many applications, it is only by applying all four methods that we are able to confirm that they give similar results. We expect simulation-based STC to perform better than the standard STC method under certain circumstances, for example, when the marginal and conditional estimands differ more. Future work should explore all four methods in different situations.
Different statistical methodologies have their own advantages and disadvantages. For example, MAIC can deal with both anchored and unanchored pairwise comparisons, but it relies on the strong assumption that the covariate distributions of the two studies overlap. ML-NMR can compare treatments from multiple studies as long as they are connected in the network, but this method cannot currently be used for time-to-event data. We are not advocating STC intrinsically, and we acknowledge that there are different ways to perform this type of analysis, but we hope that this article will help to make the methods more accessible to analysts.
To summarise, we have reviewed the standard STC and STC-SI methods, and have proposed two new simulation-based methods (STC-MI and STC-IP). A familiar numerical example has been used to illustrate the application of the STC methods, for which R code is provided in the supplementary materials to assist the applied analyst. Results have been compared with the unadjusted method. All four STC methods perform better than the unadjusted method in our explorations, because there are important differences between the populations of our trials. The simulation study provides additional information about the properties of the STC methods and provides proof of concept that STC is a fully viable approach. We hope to see STC receive greater attention in both applied and methodological statistical work, and that our article will facilitate this.

TABLE 1 Summary of $\hat{\Delta}_{BC}$ for the unadjusted analysis and $\hat{\Delta}_{BC(AC)}$ for all four STC methods, with 95% confidence intervals. The results of the unadjusted method and standard STC are already given in the supplementary materials of Phillippo et al.9

FIGURE 1 Comparison of $\hat{\Delta}_{BC}$ for the unadjusted analysis and $\hat{\Delta}_{BC(AC)}$ for all four STC methods.
TABLE 2 Bias and Monte Carlo standard error of $\hat{\Delta}_{BC}$ for the unadjusted analysis and $\hat{\Delta}_{BC(AC)}$ for all four STC methods in the simulation study. Note: The estimated coverage probability and the average length of the 95% confidence intervals are also reported.