A Comparison of Frequentist and Bayesian Model Based Approaches for Missing Data Analysis: Case Study with a Schizophrenia Clinical Trial

Missing data are common in clinical trials and can lead to biased estimation of treatment effects. The National Research Council (NRC) report suggests that sensitivity analysis of the missing data mechanism should be a mandatory component of the primary reporting of findings from clinical trials, and regulatory agencies are requesting more thorough sensitivity analyses from sponsors. However, a recent literature review showed that missing data are almost always inadequately handled. This is partially due to the lack of standard software packages and a straightforward implementation platform. With the recent availability of flexible Bayesian software packages such as WinBUGS, SAS Proc MCMC, and Stan, it is relatively simple to develop Bayesian methods that address complex missing data problems while incorporating the associated uncertainty. In this article, we present a case study from the DIA Bayesian Scientific Working Group (BSWG) on Bayesian approaches for missing data analysis. We illustrate how to use Bayesian approaches to fit several commonly used frequentist missing data models. The properties, advantages, and flexibility of the Bayesian analysis methods are discussed using a case study based on a schizophrenia clinical trial. Supplementary materials for this article are available online.


Introduction
Missing data are inevitable in most longitudinal clinical trials. Extensive research has been devoted to statistical methods for handling missing data in clinical trials. Traditionally, single imputation methods such as last observation carried forward and baseline observation carried forward have been used to handle missing data. A single imputation method may provide a conservative estimate of the mean response, but it may not be conservative for estimating and testing the treatment difference. Maximum likelihood (ML) or multiple imputation (MI) based methods are recommended in more recent research papers (e.g., Carpenter and Kenward 2007; EMA guideline 2010; NRC Report 2010). One assumption for ML- or MI-based methods is that the missing data are missing at random (MAR). This assumption cannot be verified from the observed data. Therefore, sensitivity analysis is suggested for evaluating the robustness of the analysis results against departures from the MAR assumption. This is recommended in both the European Medicines Agency guideline on missing data in confirmatory trials (EMA 2010) and the U.S. Food and Drug Administration-commissioned panel report, The Prevention and Treatment of Missing Data in Clinical Trials, issued by the National Research Council of the National Academies (NRC 2010).

© American Statistical Association, Statistics in Biopharmaceutical Research, February 2016, Vol. 8, No. 1, DOI: 10.1080/19466315.2015
A great amount of research has been devoted to statistical methods for handling missing data in the last two to three decades. Extensive literature and details may be found in comprehensive textbooks (Little and Rubin 2002; Molenberghs and Kenward 2007; Daniels and Hogan 2008). Although there is no universally best method for handling missing data, many methods have been proposed and applied in clinical trials, including ML, MI, and fully Bayesian (FB) approaches. However, a recent literature review showed that missing data are almost always inadequately handled (Sterne et al. 2009). This is partially due to the lack of standard software packages and a straightforward implementation platform.
The Bayesian method provides a natural approach to modeling the uncertainty of missing data in longitudinal trials and has been considered in the statistical literature (Best et al. 1996; Carpenter, Pocock, and Lamm 2002; Daniels and Hogan 2008; Hogan, Daniels, and Hu 2014; Ibrahim et al. 2005). In fact, there is a close connection between Bayesian methods and other popular missing data methods such as ML and MI (Chen and Ibrahim 2013). With a large sample size, implementing the Bayesian method with noninformative priors on all parameters will lead to the ML estimates. The imputation step in MI is based on sampling from a posterior predictive distribution. Therefore, Bayesian methods are generally considered more powerful for dealing with various missing data problems, including missing data in both covariates and response (Ibrahim et al. 2005). However, the application of Bayesian methods in real clinical trials is still not common due to the lack of computational software and regulatory considerations. With significant advancements in computation and statistical software, the Bayesian method provides a feasible alternative approach for the analysis of longitudinal clinical trials. To investigate the potential application of Bayesian methods with missing data, a subteam was formed within the Drug Information Association Bayesian Scientific Working Group (DIA BSWG) to tackle the problem. Considering that a wide variety of methods are readily available for missing data analysis (e.g., Little and Rubin 2002; Daniels and Hogan 2008), the subteam decided to work on several case studies with real datasets from clinical trials to investigate and demonstrate the application of Bayesian approaches. In this article, we illustrate Bayesian approaches along with commonly used frequentist methods for handling missing data for a continuous endpoint measured in a real schizophrenia clinical trial. The properties of the analysis methods are discussed using this case study dataset.
Descriptions of the case study data and the prespecified analysis methods are given in Section 2. Section 3 presents common frequentist sensitivity analysis models, including the selection model, shared parameter model, and pattern mixture model. In Section 4, we demonstrate how to use Bayesian methods to fit similar sensitivity analysis models with additional flexibility to modify the model specification. Some discussion is provided in Section 5.

Study Design and Missing Data
The case study data are from a multicenter, randomized, double-blind clinical trial on patients with schizophrenia. The study had three treatment groups: test, active, and placebo with a 2:1:2 randomization ratio. A total of about 200 patients were enrolled from four eastern European countries. The primary endpoint for efficacy was the mean change from baseline in the Positive and Negative Syndrome Scale (PANSS) total score. The PANSS was measured at baseline (end of placebo lead-in period), Day 4, and Weeks 1, 2, 3, and 4. The primary time point for treatment comparison was at Week 4.
In the study, 19%-33% of the patients dropped out from the study depending on the treatment group prior to Week 4. The majority of the patients dropped out due to lack of efficacy, especially in the test drug and placebo groups. A Kaplan-Meier curve for time to discontinuation is given in Figure 1. A summary of patient dropout by reason is provided in Table 1.
To make the dataset easier for fitting different models using available programs, we deleted four patients with intermittent missing data (Patient IDs: 21, 65, 82, and 180). The remaining missing data are monotone as a result of dropout. This simplified dataset is used for all analyses in this article. Graphical summaries of mean profiles by time of dropout are given in Figure 2 for the active and placebo groups and in Figure 3 for the test drug and placebo groups. The numbers of dropout patients at each time point are included in the plot labels. The mean estimates in the plots are based on the observed data at each given time point. In general, for all dropouts the mean change from baseline showed a worsening trend, consistent with the majority of dropouts being due to lack of efficacy as shown in Table 1.
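The distinction between intermittent missingness and monotone dropout described above can be checked programmatically. A minimal sketch, using a made-up response matrix in which NaN marks a missing PANSS change score (the function name and toy data are illustrative, not part of the study's analysis code):

```python
import numpy as np

def is_monotone(Y):
    """Check that missingness is monotone: once a patient's value is
    missing at some visit, all later visits are missing too."""
    miss = np.isnan(Y)  # patients x visits missingness indicators
    # cumulative OR along visits: once missing, stays missing
    return bool(np.all(miss == np.maximum.accumulate(miss, axis=1)))

# toy data: 3 patients x 5 visits; patient 2 drops out after visit 3,
# patient 3 has an intermittent missing value at visit 2
Y = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [0.5, 1.5, 2.5, np.nan, np.nan],
              [0.2, np.nan, 1.0, 2.0, 3.0]])

print(is_monotone(Y))      # False: patient 3 breaks monotonicity
print(is_monotone(Y[:2]))  # True: dropout only
```

Deleting the intermittent-missingness rows (as done for the four patients in the case study) is what makes the remaining pattern monotone.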
In real applications, several methods may be considered to handle the intermittent missing data. One approach proposed in the literature (e.g., Carpenter, Roger, and Kenward 2013) is to multiply impute the intermittent missing data under the MAR assumption, apply the missing data analysis methods (e.g., a selection model), and combine the results using the multiple imputation approach. With the FB approach, the intermittent missing data can be incorporated in the posterior sampling automatically in recent software such as WinBUGS (Lunn et al. 2009), Stan (2012), and Proc MCMC in SAS/STAT 13.2 (2014).

Notations
With general notation, we assume that there are N patients in a clinical trial, Y_i = (Y_i1, ..., Y_in)' is a vector of n repeated measurements to be collected for patient i, and X_i = (X_i1, ..., X_ip)' is a (p × 1) vector of covariates. Let Y = {Y_1, ..., Y_N} and X = {X_1, ..., X_N}. A likelihood-based method assumes a model for the distribution of Y given X with a collection of unknown parameters γ, f(Y|X, γ). Assuming patients are independent, the full-data (i.e., no missing data) likelihood function is

L(γ | Y, X) = ∏_{i=1}^{N} f(Y_i | X_i, γ).

When there are missing data, let R_ij be the missing-data indicator for Y_ij, taking the value 1 if Y_ij is observed and 0 if Y_ij is missing, and let R_i = (R_i1, ..., R_in)'. The vector Y_i^o denotes the set of observed values for patient i, and Y_i^m the set of missing values.

With incomplete data, we need to specify a joint distribution of Y and R with density f(R, Y|X, θ) and parameters θ = (γ, φ); γ contains the parameters for the distribution of the full-data response Y, and φ contains the parameters for the missing data mechanism. The complete-data likelihood is

L(θ | Y, R, X) = ∏_{i=1}^{N} f(R_i, Y_i | X_i, θ).
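Under MAR with distinct parameters (ignorability), the observed-data likelihood for a multivariate normal response is obtained simply by subsetting the mean vector and covariance matrix to each patient's observed visits. A minimal numpy-only sketch of this marginalization (toy values for mu, Sigma, and Y; this is an illustration, not the study's analysis code):

```python
import numpy as np

def mvn_logpdf(y, mu, Sigma):
    """Log-density of a multivariate normal, computed directly."""
    k = y.size
    diff = y - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

def observed_loglik(Y, mu, Sigma):
    """Observed-data log-likelihood under MAR/ignorability: missing
    components are marginalized out by subsetting mu and Sigma to the
    observed visits (a property of the multivariate normal)."""
    total = 0.0
    for y in Y:
        obs = ~np.isnan(y)
        if obs.any():
            total += mvn_logpdf(y[obs], mu[obs], Sigma[np.ix_(obs, obs)])
    return total

# toy example: 2 visits; patient 1 complete, patient 2 missing visit 2
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
Y = np.array([[0.2, 1.1], [-0.3, np.nan]])
print(observed_loglik(Y, mu, Sigma))
```

This subsetting is exactly what direct-likelihood methods such as MMRM do implicitly for monotone or intermittent missing data.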

The Original Analysis
The primary analysis as specified in the protocol was based on a mixed model for repeated measures (MMRM) (Mallinckrodt et al. 2008). The model included factors for baseline PANSS total score, treatment, week, country, treatment-by-week and baseline-by-week interactions, where week, country, and treatment were treated as categorical variables. An unstructured covariance matrix was used for modeling the intra-subject correlation over the repeated measures. To further simplify the analyses, we removed country from our analysis model as it was not a significant factor. Specifically, in the MMRM model we assumed that the longitudinal data Y_i follow a multivariate normal distribution with unstructured covariance matrix Σ, that is, Y_i ~ N(μ_i, Σ), with a separate mean parameter for each treatment-by-week combination and a baseline-by-week adjustment. All data from the three treatment groups were included in the original analysis model specified in the protocol. To make the analysis results more comparable with the other sensitivity analysis models presented in the following sections, we analyzed the data in two separate models: one for active versus placebo, and another for test drug versus placebo, each with the same mean structure restricted to the two groups. The analysis results based on these separate MMRM models are given in the first column of Table 2. The results from the original MMRM model with all three treatment groups (shown in Table 4) were very similar to those from the separate MMRM models, except for a slight increase in the variance of the mean estimate for the contrast between the active and placebo groups. The conclusions are that the test drug showed no evidence of activity. The point estimate for the active control showed some activity, but the effect was not statistically significant.
One assumption for the MMRM method is that the missing data are MAR. This assumption cannot be verified from the observed data. Therefore, sensitivity analysis is suggested for evaluating the robustness of the analysis results against departures from the MAR assumption. It will be interesting to see whether different models/approaches generate "similar" results.

Some Frequentist Approaches for Sensitivity Analysis
We first applied a few conventional methods to handle the missing data and analyze the case study. The methods are frequentist based and include complete-case analysis, selection models, shared parameter models, and pattern mixture models. Brief descriptions of these methods are given in this section. More details can be found in the recent paper by the DIA missing data working group (Mallinckrodt et al. 2013) and references therein.

Complete-Case Analysis
With the complete-case analysis, the observations from patients who dropped out are ignored. This analysis is valid only when the missing data are missing completely at random (MCAR). The analysis consists of 147 completers. The results from the separate MMRM models for the completers are shown in the second column of Table 2. The results are clearly quite different from those of the separate MMRM analysis with all available data. In particular, for the test drug, the completers showed larger treatment effects than the separate MMRM analysis with all available data. This implies that the missing data may not be missing completely at random. In fact, the majority of dropouts were due to lack of efficacy. Therefore, the probability of dropout likely depends on the previously observed outcomes and/or potentially the missing outcomes. Figures 2 and 3 plot the mean change from baseline by treatment group and missing data pattern as defined by the time of dropout. For those who dropped out early, the observed mean change values at the time of dropout are consistently larger (worse) than the previous mean changes, and larger than 0 (worse than baseline) in most cases. For the completers, the test drug showed no treatment difference except for a small difference at the last time point. Therefore, the analysis of completers had a positive treatment effect at the last time point. In the MMRM based on all available data, the test drug had a smaller treatment effect because, for those who dropped out early, the test drug showed a negative effect (worse than placebo) as seen in Figure 3.
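The bias mechanism described above is easy to demonstrate by simulation: when dropout before the final visit depends on an earlier (observed) outcome, the complete-case mean of the final outcome is biased even though the full-data mean is not. A small sketch with made-up parameters (the 0.5 offset in the dropout model is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y1 = rng.normal(0.0, 1.0, n)        # early outcome (always observed)
y2 = y1 + rng.normal(0.0, 1.0, n)   # later outcome, correlated with y1

# dropout before y2 is more likely when y1 is high (worsening): MAR,
# since the mechanism depends only on the observed y1
p_drop = 1 / (1 + np.exp(-(y1 - 0.5)))
kept = rng.random(n) > p_drop       # patients with y2 observed

full_mean = y2.mean()               # estimand: approximately 0
cc_mean = y2[kept].mean()           # complete-case estimate: biased low
print(round(full_mean, 3), round(cc_mean, 3))
```

Because completers are systematically the patients with better early outcomes, the complete-case mean understates the population mean, mirroring the case study, where completers overstate the test drug's effect.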

Selection Models
Selection models specify the joint distribution of R_i and Y_i through a model for the marginal distribution of Y_i and a model for the conditional distribution of R_i given Y_i,

f(R_i, Y_i | X_i, θ) = f_Y(Y_i | X_i, γ) f_{R|Y}(R_i | Y_i, X_i, φ),

where γ and φ are distinct parameters for the full-data response and the missing data mechanism, respectively. We consider the parametric selection model of Diggle and Kenward (1994), in which the first part of the likelihood, f_Y(Y_i | X_i, γ), is the same as in the MMRM model. For this case study, we used the SAS macro developed by the DIA missing data working group (see http://missingdata.lshtm.ac.uk). We performed the analyses separately for test drug versus placebo and for active versus placebo because the macro can only fit data from two treatment groups at a time. Specifically, for the analysis of test drug versus placebo, the mean response in the first part of the likelihood is modeled as in the separate MMRM model. For the second part of the likelihood, f_{R|Y}(R_i | Y_i, X_i, φ), the following logistic model is used for the missing data mechanism:

logit(q_ij) = φ_1 + φ_2 D_i + (φ_3 + φ_4 D_i) Y_{i,j−1} + (φ_5 + φ_6 D_i) Y_{ij},
where D_i is the indicator for the drug group, q_ij is the probability that patient i drops out at time j, and v_i equals n for completers and equals the first visit number with a missing response for patients who dropped out of the study. When φ_5 = φ_6 = 0, this model implies that missingness depends only on previously observed data (i.e., MAR). When both models are specified with parametric distributions, all the parameters including φ_5 and φ_6 can typically be identified (Diggle and Kenward 1994). Using the DIA macro, the results from fitting these selection models are given in the third column of Table 2. Compared to the analysis based on MMRM, the selection model result for test versus placebo is similar, but the result for active versus placebo is somewhat different, although the conclusion is the same. This difference may result from the logistic model specification for the missing data. We discuss more details in Section 4.3.
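The role of φ_5 and φ_6 can be seen directly by evaluating a Diggle-Kenward-style logistic dropout hazard: with those parameters set to zero, the hazard no longer depends on the current (possibly unobserved) outcome. A minimal sketch, assuming one common parameterization of the hazard (the exact form used by the DIA macro may differ, and the φ values are made up):

```python
import numpy as np

def dropout_prob(phi, d, y_prev, y_curr):
    """Logistic dropout hazard of Diggle-Kenward type (a sketch; the
    macro's exact parameterization may differ). phi = (phi1..phi6),
    d is the treatment indicator, y_curr is possibly unobserved."""
    phi1, phi2, phi3, phi4, phi5, phi6 = phi
    lin = (phi1 + phi2 * d
           + (phi3 + phi4 * d) * y_prev
           + (phi5 + phi6 * d) * y_curr)
    return 1 / (1 + np.exp(-lin))

# with phi5 = phi6 = 0, the hazard ignores the current outcome -> MAR
phi_mar = (-2.0, 0.3, 0.1, 0.0, 0.0, 0.0)
p_low = dropout_prob(phi_mar, d=1, y_prev=5.0, y_curr=-10.0)
p_high = dropout_prob(phi_mar, d=1, y_prev=5.0, y_curr=30.0)
print(p_low == p_high)  # hazard unchanged by the unobserved outcome

# a nonzero phi5 makes the mechanism MNAR
phi_mnar = (-2.0, 0.3, 0.1, 0.0, 0.2, 0.0)
print(dropout_prob(phi_mnar, 1, 5.0, -10.0)
      != dropout_prob(phi_mnar, 1, 5.0, 30.0))
```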

Shared Parameter Models
With the shared parameter model, the outcome process Y_i and the dropout process R_i depend on a shared latent variable U_i. Given U_i, R_i and Y_i are assumed to be independent:

f(Y_i, R_i | X_i, U_i) = f(Y_i | X_i, U_i) f(R_i | X_i, U_i).

To fit this model, parametric forms are assumed for the response and dropout models. For example, linear and quadratic functions are considered in the DIA macro (see http://missingdata.lshtm.ac.uk). As with the selection model, the DIA macro can only fit data from two treatment groups at a time. Here, we consider a quadratic model to allow a potentially nonlinear response profile for this case study. Specifically, the response model for test drug versus placebo is quadratic in time with patient-level random intercept, slope, and curvature,

Y_ij = (β_0 + β_1 D_i) + (β_2 + β_3 D_i) w_j + (β_4 + β_5 D_i) w_j^2 + U_i0 + U_i1 w_j + U_i2 w_j^2 + ε_ij,

where w_j is the time in weeks for the jth visit, (U_i0, U_i1, U_i2)' ~ N(0, Σ_U) are shared random effects, and ε_ij ~ N(0, σ²). For this case study, w_j = 0.57, 1, 2, 3, and 4 for visits 1 through 5, respectively. The results from fitting these quadratic shared random-effects models are provided in the fourth column of Table 2. Compared to the analysis based on MMRM, the shared parameter model results are somewhat different for both the active versus placebo and test versus placebo comparisons, although the conclusions are similar. The difference may be due to the quadratic functional form used for the response and shared random-effects models. Note that in the MMRM, a cell-mean model was used for the response, so no restriction is imposed on the mean parameters over time.
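The conditional-independence structure of the shared parameter model can be made concrete with a small data-generating sketch: a latent quadratic trend U drives both the response and the dropout hazard, and given U the two processes are simulated independently. All parameter values here (Sigma_U, sigma_eps, drop_coef) are illustrative, and the fixed effects are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = np.array([0.57, 1.0, 2.0, 3.0, 4.0])  # visit times from the text

def simulate_patient(Sigma_U, sigma_eps, drop_coef):
    """Shared-parameter data-generating sketch: the latent quadratic
    trend U feeds both the response model and the dropout model; given
    U, the two draws below are independent."""
    U = rng.multivariate_normal(np.zeros(3), Sigma_U)   # (U0, U1, U2)
    trend = U[0] + U[1] * weeks + U[2] * weeks ** 2
    y = trend + rng.normal(0.0, sigma_eps, weeks.size)  # response model
    p_drop = 1 / (1 + np.exp(-drop_coef * trend))       # dropout model
    dropped = rng.random(weeks.size) < p_drop
    return y, dropped

y, dropped = simulate_patient(np.eye(3) * 0.2, 1.0, 0.8)
print(y.shape, dropped.shape)
```

Because dropout depends on Y only through U, patients with steeper latent worsening both respond worse and drop out more often, which is the MNAR behavior this model class encodes.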

Pattern Mixture Models
Pattern mixture models were first proposed by Little (1993). This approach appeals to many scientists because it is transparent about how the observed data and the missing data are modeled. For this case study, all patients had observations at Day 4. We define the dropout pattern based on the maximum visit number before a patient dropped out. For example, pattern s = 1 contains patients who dropped out after Week 1, pattern s = 2 contains patients who dropped out after Week 2, and so on; finally, pattern s = 5 contains all completers.
To identify the parameters in the model, we consider the following three commonly used constraints in building the imputation models for missing data:
• Complete case missing value (CCMV): f(y_j | s = k, y_1, ..., y_{j−1}) = f(y_j | s = 5, y_1, ..., y_{j−1}), k < j, which uses only data from completers to fit the imputation models.
• Neighboring case missing value (NCMV): f(y_j | s = k, y_1, ..., y_{j−1}) = f(y_j | s = j, y_1, ..., y_{j−1}), k < j, which imputes missing data y_j using a model built from the data of pattern s = j, that is, all patients with y_j observed who dropped out after visit j.
• All case missing value (ACMV): f(y_j | s = k, y_1, ..., y_{j−1}) = f(y_j | s ≥ j, y_1, ..., y_{j−1}), k < j, which uses all available data to fit the imputation model. That is, the imputation model for missing data y_j is built from the data of all patterns s ≥ j, i.e., all patients with y_j observed regardless of time of dropout.
ACMV is equivalent to MAR under monotone missingness (Molenberghs et al. 1998). For potentially MNAR data, CCMV and NCMV provide sensitivity analyses in the sense that the imputation models are built using completers and neighboring cases only, respectively. In this case study, we apply the ACMV, NCMV, and CCMV approaches. For each approach, the missing data are imputed under the given constraint using regression models with factors for baseline PANSS total score, treatment, and the previously observed PANSS change from baseline total scores.
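The three identifying restrictions differ only in which patterns donate data to the imputation model for y_j. That bookkeeping can be sketched in a few lines (a hypothetical helper, not part of the SAS implementation used in the study):

```python
def donor_patterns(constraint, j, max_pattern=5):
    """Patterns whose data feed the imputation model for y_j under each
    identifying restriction, with monotone dropout and pattern s equal
    to the last observed visit (s = max_pattern for completers)."""
    if constraint == "CCMV":   # completers only
        return [max_pattern]
    if constraint == "NCMV":   # the neighboring pattern only
        return [j]
    if constraint == "ACMV":   # every pattern with y_j observed
        return list(range(j, max_pattern + 1))
    raise ValueError(f"unknown constraint: {constraint}")

print(donor_patterns("CCMV", 3))   # [5]
print(donor_patterns("NCMV", 3))   # [3]
print(donor_patterns("ACMV", 3))   # [3, 4, 5]
```

The shrinking donor pools explain the precision results reported below: NCMV borrows from a single, often small, neighboring pattern, so its imputations (and standard errors) are the most variable.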
The multiply imputed datasets are analyzed using a regression model for the response at the last time point (i.e., Week 4) with factors for baseline PANSS total score, dropout pattern, treatment, and pattern-by-treatment interaction. The treatment effects and the differences between treatment groups, weighted by sample size across the patterns, are obtained (e.g., using the OM option in the LSMESTIMATE statement of the SAS PROC MIXED procedure). Because the standard error from the regression model treats the proportion of patients in each missing data pattern as fixed, we also adjusted the variance using the approach proposed by Hedeker and Gibbons (2006).
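After each imputed dataset is analyzed, the per-imputation estimates are pooled with Rubin's rules (the standard MI combining step; the Hedeker-Gibbons adjustment mentioned above is a separate, additional correction). A minimal sketch with made-up estimates and variances:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine point estimates and within-imputation variances from M
    multiply imputed datasets using Rubin's rules."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = est.size
    qbar = est.mean()                        # combined point estimate
    within = var.mean()                      # average within-imputation var
    between = est.var(ddof=1)                # between-imputation variance
    total = within + (1 + 1 / m) * between   # Rubin's total variance
    return qbar, total

# toy values: 3 imputations of one treatment-difference estimate
qbar, total = rubin_combine([1.0, 1.2, 0.9], [0.04, 0.05, 0.04])
print(round(qbar, 4), round(total, 4))
```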
In the case study analysis, the first two dropout patterns are combined because only eight patients dropped out at the second visit. The analysis results using the pattern mixture model approach are given in Table 3. The results from ACMV are very close to those from the MMRM analysis, which confirms that ACMV is equivalent to MAR under monotone missingness. In CCMV, because the imputation models for all missing data are built from the completers, the estimated treatment differences from CCMV lie between those of the MMRM with all data and those of the MMRM based on completers. Notably, the NCMV result for test drug versus placebo was quite different from that of the MMRM or the completers. This is because the neighboring case missing value approach may impute higher (worse) values for the test drug than for placebo: as shown in Figure 3, with the imputation model built from the observed responses of those who dropped out after visit 3 or visit 4, the imputed values for the test drug could be much higher (worse) than for placebo. The standard errors from the NCMV analyses were also much larger than those from the ACMV or CCMV analyses because the imputation models in NCMV were built from neighboring cases with much smaller sample sizes.
In all the above analysis models, the variance-covariance matrix was assumed to be the same for the two treatment groups in the analysis. To assess sensitivity to this assumption, we also ran another PMM analysis with ACMV in which the imputation was done for each treatment group separately, so that a different variance-covariance matrix was obtained for each treatment group in the imputation model. The results are somewhat different from the MMRM and are included in the last column of Table 2 for comparison with the results from the Bayesian approach (described in the next section). In fact, these results were similar to those from the MMRM analyses (results not shown) when we used a different covariance matrix for each treatment group.

Bayesian Approaches for Sensitivity Analysis
In this section, we provide Bayesian analogues to the frequentist models, including MMRM models under MAR, and selection models, shared parameter models, and pattern mixture models under MNAR. We also illustrate the flexibility of the Bayesian method in a selection model with a different parameterization.
In all Bayesian analyses, the missing data are treated as unknown parameters and sampled from the specified joint distribution of the repeated measures. We used SAS Proc MCMC (SAS/STAT 13.2) or Stan for the analyses; both sample the missing data automatically from the specified distribution. Stan uses a variant of Hamiltonian Monte Carlo. For complex models, Stan usually converges more quickly than software using simpler MCMC algorithms such as random-walk Metropolis (Proc MCMC) and Gibbs sampling (WinBUGS) (Han et al. 2014).

Bayesian MMRM and Complete-Case Analysis
We first used the Bayesian approach to compute the posterior distribution of the parameters in the MMRM model and the complete-case analysis. The MMRM approach assumes MAR (or MCAR for the complete-case analysis). Here, we are using the Bayesian approach as a computational tool to obtain parameter estimates. Hence, we assume vague (or noninformative) priors on all the model parameters to confirm that the Bayesian approach produces similar results to the likelihood methods. Specifically, we assume the following priors for the parameters in the MMRM specified in Section 2.3: α_j, η_j, β_j, γ_j ~ N(0, 100²), j = 1, 2, ..., n, and Σ ~ invWishart(I_n, n), where n is the number of visits and I_n is the identity matrix of size n.
The posterior summaries are shown in the bottom section of the first two columns in Table 2. For both models, the results are fairly close to those from the corresponding frequentist methods. For a study with moderate sample size, this is as expected because the Bayesian models use noninformative priors so that the posterior distribution is similar to the likelihood function from the frequentist method.
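The automatic handling of missing data in these samplers rests on a data-augmentation step: at each iteration, the missing components of a patient's multivariate normal response are drawn from their conditional distribution given the observed components and the current parameter values. A numpy-only sketch of that single step (toy mu, Sigma, and y; real samplers alternate this with parameter updates):

```python
import numpy as np

def impute_conditional_normal(y, mu, Sigma, rng):
    """One data-augmentation draw: sample the missing components of a
    multivariate normal from their conditional distribution given the
    observed components."""
    miss = np.isnan(y)
    if not miss.any():
        return y.copy()
    obs = ~miss
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(miss, obs)]
    S_mm = Sigma[np.ix_(miss, miss)]
    w = np.linalg.solve(S_oo, y[obs] - mu[obs])
    cond_mean = mu[miss] + S_mo @ w
    cond_cov = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    out = y.copy()
    out[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return out

rng = np.random.default_rng(2)
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.6],
                  [0.3, 0.6, 1.0]])
y = np.array([0.5, np.nan, np.nan])  # visit 1 observed, visits 2-3 missing
print(impute_conditional_normal(y, mu, Sigma, rng))
```

Averaging analysis results over such draws is what propagates the missing-data uncertainty into the posterior, rather than fixing imputed values once.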

Bayesian Selection, Shared Parameter, and PMM Models
The Bayesian approach can also be used to fit the selection and shared parameter models specified in Sections 3.2 and 3.3. For the selection model, we assume a vague prior, N(0, 100²), for all regression parameters α_j, β_j, and γ_j in the MMRM model, and Σ ~ invWishart(I_n, n), where n is the number of visits and I_n is the identity matrix of size n. For the missing data mechanism model, we use φ_1, φ_2 ~ logistic(0, 1) and φ_3, φ_4, φ_5, φ_6 ~ N(0, 5²). These vague prior distributions are chosen so that they bear minimal weight on the posterior distributions. For particular situations, appropriate modifications might be made to these prior specifications to speed up convergence without affecting the resulting posterior distribution. For example, with our case study, we also tried φ_3, φ_4, φ_5, φ_6 ~ N(0, 0.8²); the posterior sample means and standard deviations for the analysis parameters were largely unchanged (data not shown). To ensure that the MCMC sampling converges to the target posterior distribution, multiple independent MCMC chains may be run. Convergence can be effectively examined by visual inspection of trace plots (see some selected trace plots in the online supplemental materials), and the diagnosis may be aided by statistical measures such as the Gelman and Rubin diagnostic (Gelman and Rubin 1992).
The results for the selection and shared parameter models are presented in the third and fourth columns of Table 2. The posterior means (and standard deviation) of the treatment effect of test drug versus placebo and for active drug versus placebo are somewhat different from those of the corresponding frequentist models. For this case study, however, the conclusions are the same with respect to statistical significance for testing the treatment effect between groups.
We also used the Bayesian method to fit the PMM with the ACMV restriction. Similar to the frequentist PMM analysis, the first two patterns are combined. The details of the model specification are given in Appendix A. We assumed that the pattern indicator follows a multinomial distribution, that is, S ~ Mult(φ), φ = (φ_1, φ_2, φ_3, φ_4), with φ_s = P(S = s) for s ∈ {1, 2, 3, 4} and Σ_s φ_s = 1. The prior for φ is a Dirichlet distribution with all parameters set to one. For all standard deviations, we assign a uniform prior with support (0, 50). For the other regression parameters, a normal prior with mean zero and standard deviation 100 is assumed. Separate models are specified for the different treatment groups. The results are shown in the last column of Table 2 and are very similar to the corresponding PMM ACMV analysis based on the frequentist multiple imputation approach (where the imputation is done for each treatment group separately).

Bayesian Selection Model With a Modified Dropout Probability
In the selection model described in Section 3.2, the parameters for the probability of dropout do not depend on visit j but do depend on the treatment group. This may impose a restriction on the parameters because the proportion of patients who dropped out can differ by visit. It is also noted that only eight patients dropped out from the active group, which may produce unstable estimates for the six parameters in the missing data probability model for the active group in the selection model as specified in Section 3.2. In a double-blind clinical trial, we can expect that the dropout probability depends mostly on the efficacy outcome (observed or unobserved). Conditional on the efficacy outcome, the dropout probability may be less dependent on the treatment group (i.e., dropout may not depend on treatment after adjusting for efficacy outcomes because the treatment assignment is blinded during the study). Therefore, we modify the dropout probability model as follows:
logit(q_ij) = ζ_j + γ_0 Y_ij + γ_1 Y_{i,j−1} + γ_2 Y_{i,j−2} + · · ·

In this model, the different probabilities of dropout by visit are captured by the intercept parameters ζ_j, and the parameters γ_k estimate the dependence of dropout on the efficacy outcome k visits before the time of dropout. When γ_0 = 0, the model reduces to MAR. Depending on the level of dependence of the dropout probability on the previous outcomes, we may set some of the higher-level γ_k = 0. For example, assuming γ_k = 0 for k > 1 corresponds to the dropout probability not depending on any previous outcome except the one just prior to the time of dropout. In this one-level dependence model, there are only six parameters in the missing data model, which helps to estimate the parameters from the total of 53 patients who dropped out early (see Figures 2 and 3 for the number of dropout patients by visit). We used SAS PROC MCMC to fit this model and assumed vague priors as above for the location and covariance parameters (see, e.g., Chen 2013). The MCMC process converged well for these models (some selected trace and posterior density plots are provided in the online supplemental materials). The results from this Bayesian selection model are summarized in Table 4. The results from the primary analysis MMRM model are also included for easy comparison.
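The modified hazard can be sketched numerically to show the MAR/MNAR switch carried by γ_0. All parameter values below are illustrative, not estimates from the trial:

```python
import numpy as np

def modified_dropout_prob(zeta_j, gamma, y_hist):
    """Sketch of the modified dropout hazard: a visit-specific intercept
    zeta_j shared across treatment groups, plus dependence on the
    outcome history. y_hist = (y_curr, y_prev, ...) is aligned with
    gamma = (gamma_0, gamma_1, ...); y_curr may be unobserved."""
    lin = zeta_j + sum(g * y for g, y in zip(gamma, y_hist))
    return float(1 / (1 + np.exp(-lin)))

# one-level dependence: gamma_k = 0 for k > 1 keeps only gamma_0 (the
# MNAR part, on the possibly missing current outcome) and gamma_1
p = modified_dropout_prob(-2.0, [0.05, 0.10], [4.0, 6.0])

# setting gamma_0 = 0 removes dependence on the current outcome -> MAR:
# the hazard is unchanged however the unobserved value varies
p1 = modified_dropout_prob(-2.0, [0.0, 0.10], [10.0, 6.0])
p2 = modified_dropout_prob(-2.0, [0.0, 0.10], [-10.0, 6.0])
print(0.0 < p < 1.0, p1 == p2)
```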
From these analysis models, we can see a few interesting results:

Discussion
Although Bayesian methods for missing data have existed in the statistical literature for quite a while, their applications in real clinical trials are still not common. This is partially because software to fit such Bayesian models has not been available in commonly used packages such as SAS, one of the most widely used statistical analysis packages considered validated and accepted by regulatory agencies. With recent advancements in computational algorithms, Bayesian analysis tools such as MCMC with missing data handling have become available in new versions of SAS software (SAS/STAT 12.3 or later). This brings opportunities for pharmaceutical statisticians to start using Bayesian methods in analyzing data from clinical trials with missing data.
In this article, we used a real clinical trial as a case study to explore the application of Bayesian methods to the analysis of continuous repeated measures from a longitudinal trial with missing data, specifically sensitivity analyses under missing not at random (MNAR) scenarios. We considered some common frequentist sensitivity analysis approaches, including selection models, shared parameter models, and pattern mixture models. Except for the MMRM models for all available data and the complete-case analysis, these models allow additional specifications for missing data under MNAR. For each of these methods, we applied a corresponding Bayesian approach. In most cases, the results from the Bayesian methods with vague (or noninformative) priors were similar to those from the corresponding frequentist methods. This supports applied statisticians in considering the Bayesian approach with vague (or noninformative) priors as a computational tool for fitting complicated statistical models.
Most of the frequentist sensitivity analysis methods require special software to implement. For the selection models and shared parameter models, we used the SAS macros developed by the DIA missing data working group (available at http://missingdata.lshtm.ac.uk). For the pattern mixture models, we applied the new multiple imputation procedure in SAS 9.4. In general, these available macros or software have restrictions that may preclude applied statisticians from applying the methods in real clinical trials. For example, the SAS macros developed by the DIA missing data working group can only handle two treatment groups, which may produce different results than modeling all data together in a study with more than two treatment groups. The case study illustrated that the Bayesian approach can be an alternative for fitting these sensitivity analysis models. Under noninformative (or vague conjugate) priors, the results are similar to those of the corresponding frequentist methods. One advantage of the Bayesian approach is its flexibility: users can implement different models via Markov chain Monte Carlo and obtain the posterior distribution of the parameters of interest. Some sample analysis code is available in the online supplemental materials.
For this case study, the conclusions from the different methods were consistent, although the estimated treatment effects differed. The consistent conclusions demonstrate some robustness of the analysis results for this study. However, we should note some limitations of the case study dataset. Because of the relatively small sample sizes, both the test drug and active groups showed nonsignificant results compared to placebo across all methods. Therefore, we assessed the sensitivity of the analysis results by examining differences in the estimated treatment effects and corresponding standard errors. Compared to the MMRM analysis, the selection model, shared parameter model, and pattern mixture model with ACMV all produced somewhat different results. Each sensitivity analysis model makes specific assumptions about the missing data. Based on the plots of mean response by treatment group and missing data pattern (i.e., time of dropout), we would anticipate that MCAR is unlikely and that a linear or quadratic functional form may not fit the profiles well. Hence, we are less concerned about the different results from the completers or shared parameter models. Among the selection models, the model with different parameters for the treatment groups but no differences across visits may impose an unrealistic restriction. The small number of dropouts at some visits may also cause excessive variation in the estimates from NCMV. To address some of these concerns, we considered a selection model with a modified parameterization that allows different parameters over time but the same parameters across treatment groups. The estimated treatment effect and its standard error from this modified selection model were similar to those of the originally planned mixed model analyses.
The analysis indicated that a worsening of response might have led to a higher probability of dropout, which is consistent with the observation that lack of efficacy was the main cause of discontinuation in this trial.
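The bias induced by such outcome-dependent dropout can be seen in a small simulation. The sketch below is purely illustrative (the scale, sample size, and logistic dropout model are assumptions, not the trial's data): when the probability of discontinuation rises with the current, possibly unobserved outcome, a complete-case mean understates the true severity of the full cohort.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical latent outcomes at a post-baseline visit (higher = worse,
# as with a PANSS-type scale). Simulated values, not trial data.
y_true = rng.normal(loc=90.0, scale=12.0, size=2000)

# MNAR dropout: probability of discontinuation rises with the (possibly
# unobserved) current outcome -- worsening drives dropout.
logit = -8.0 + 0.08 * y_true
p_drop = 1.0 / (1.0 + np.exp(-logit))
dropped = rng.random(y_true.size) < p_drop

observed_mean = y_true[~dropped].mean()
true_mean = y_true.mean()
# Completers look healthier than the full cohort, so the complete-case
# mean is biased downward under this mechanism.
print(true_mean, observed_mean)
```

Selection models formalize exactly this structure: a joint model for the outcome and a dropout process that may depend on the unobserved response.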
Although software for Bayesian methods allows more flexibility in model specification, it is always important to understand the assumptions and interpretation of each parameterization. Convergence of the MCMC algorithm is another important aspect that needs attention, since models such as selection models may involve many parameters, which can cause slow mixing in the MCMC process. It is essential to check convergence using trace plots and/or diagnostic measures.
In this article, we only considered Bayesian implementations of a few common sensitivity analysis approaches using the case study dataset. Other sensitivity analysis methods have been proposed in recent publications, including control-based imputation models and tipping point analysis; readers may refer to Carpenter and Kenward (2007), Carpenter, Roger, and Kenward (2013), Ratitch, O'Kelly, and Tosiello (2013), Mallinckrodt et al. (2013), and Teshome et al. (2014). Their corresponding Bayesian approaches could be topics for future research. In the analysis of the case study dataset, we mostly used noninformative or vague priors for all model parameters. Further discussion of sensitivity analysis and the use of informative priors in Bayesian models is beyond the scope of this article; readers can refer to Daniels and Hogan (2008) and the missing data handbook (2014).
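To make the tipping point idea concrete, the following simplified sketch (synthetic data; real tipping point analyses typically track a p-value rather than the point estimate) shifts the MAR-imputed values in the treatment arm by an increasing penalty delta until the estimated benefit disappears. The delta at which the conclusion flips is the tipping point; if that delta is clinically implausible, the result is judged robust.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical completer outcomes (change from baseline; more negative =
# better) and MAR-style imputed values for treatment-arm dropouts. Synthetic.
trt_obs = rng.normal(-8.0, 6.0, size=40)
trt_imp = rng.normal(-8.0, 6.0, size=10)   # imputed under MAR
pbo = rng.normal(-4.0, 6.0, size=50)

def effect(delta):
    """Treatment-vs-placebo mean difference after shifting the imputed
    treatment values by delta (delta > 0 worsens them)."""
    trt = np.concatenate([trt_obs, trt_imp + delta])
    return trt.mean() - pbo.mean()

# Walk delta upward until the estimated benefit disappears: the tipping point.
delta = 0.0
while effect(delta) < 0.0 and delta < 50.0:
    delta += 0.5
print(delta, effect(delta))
```

A Bayesian version would repeat this over posterior draws of the imputation model, propagating the uncertainty in the imputed values into the tipping point itself.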

Appendix: Bayesian Model Specification for Pattern Mixture Model Under ACMV
For the case study, the first two patterns are combined because very few patients fall in the first pattern, so that we have four patterns in total. The first pattern contains patients who dropped out at visit 2 or 3; the second and third patterns contain patients who dropped out at visit 4 or 5, respectively; and the fourth pattern contains the completers. Within each pattern there are nonidentifiable elements (i.e., model parameters for the missing observations); the nonidentified components are marked by $*$. The complete model is specified as follows for each treatment:
$$Y_1 \mid S = k \sim N\!\left(\mu^{(k)} + \gamma_1\, bl,\; \sigma^{(k)}\right), \quad k = 1, \ldots, 4$$
$$*\; Y_2 \mid Y_1, S = 1 \sim N\!\left(\alpha_1 Y_1 + \gamma_2\, bl,\; \tau_2^{(1)}\right)$$