Bayesian Design of Superiority Trials: Methods and Applications

Abstract In this article, we lay out the basic elements of Bayesian sample size determination (SSD) for the Bayesian design of a two-arm superiority clinical trial. We develop a flowchart of the Bayesian SSD that highlights the critical components of a Bayesian design and provides a practically useful roadmap for designing a Bayesian clinical trial in real world applications. We empirically examine the amount of borrowing, the choice of noninformative priors, and the impact of model misspecification on the Bayesian Type I error and power. A formal and statistically rigorous formulation of conditional borrowing within the decision rule framework is developed. Moreover, by extending the partial borrowing power priors, a new borrowing-by-parts power prior for incorporating historical data is proposed. Computational algorithms are also developed to calculate the Bayesian Type I error and power. Extensive simulation studies are carried out to explore the operating characteristics of the proposed Bayesian design of a superiority trial.


Introduction
The Complex Innovative Trial Design (CID) Pilot Meeting Program was initiated by the U.S. Food and Drug Administration (FDA) in 2018 to support the goal of facilitating and advancing the use of complex adaptive, Bayesian, and other novel clinical trial designs (U.S. Food and Drug Administration 2018). The first study selected by the FDA for the CID Pilot Meeting Program is the DYSTANCE 51 clinical trial sponsored by Wave Life Sciences, a global Phase 2/3, multicenter, randomized, double-blind, placebo-controlled clinical trial that evaluates the efficacy and safety of suvodirsen in ambulatory boys between 5 and 12 years of age (inclusive) with a genetically confirmed diagnosis of Duchenne muscular dystrophy (DMD) amenable to exon 51 skipping (Lake et al. 2021). DMD is a rapidly progressive form of muscular dystrophy that occurs primarily in males and manifests before the age of six years, affecting approximately 1 in 3600 to 9300 male births worldwide (Mah et al. 2014). As Wave Life Sciences shared from its experience with the CID Pilot Meeting Program, the DYSTANCE 51 clinical trial incorporates the capability to augment the placebo arm with historical data using Bayesian methods (Lake et al. 2021). Natural history studies can be used to support the development of safe and effective drugs and biological products for rare diseases. There has been a joint effort by the DMD communities and worldwide regulatory agencies to assess the appropriateness of natural history data to supplement clinical development programs. The suitability of borrowing historical DMD data has also been supported by studies (Goemans et al. 2020).
The literature on Bayesian sample size determination (SSD) has been growing due to recent advances in Bayesian computation and Markov chain Monte Carlo sampling. Joseph, Wolfson, and Berger (1995), Lindley (1997), Rubin and Stern (1998), Katsis and Toman (1999), and Inoue, Berry, and Parmigiani (2005) are the Bayesian SSD articles cited in the FDA guidance for the use of Bayesian statistics in medical device clinical trials (U.S. Food and Drug Administration 2010). The early literature on Bayesian SSD includes Rahme and Joseph (1998), Simon (1999), Wang and Gelfand (2002), Spiegelhalter, Abrams, and Myles (2004), De Santis (2004), De Santis (2007), M'Lan, Joseph, and Wolfson (2006), M'Lan, Joseph, and Wolfson (2008), Lee and Liu (2008), and Reyes and Ghosh (2013). Campbell (2011) and Berry et al. (2010) provided a list of Bayesian papers up to 2011. Gamalo-Siebers et al. (2016) gave an excellent review of Bayesian methods for the design and analysis of noninferiority trials. Chen et al. (2011) developed a new Bayesian design methodology with a focus on controlling the Type I error and power for noninferiority trials. Chen et al. (2014) extended the methodology of Chen et al. (2011) to the Bayesian design of superiority clinical trials for recurrent events data. Li et al. (2015) developed the Bayesian design of noninferiority clinical trials with co-primary endpoints and multiple dose comparison, and Li et al. (2018) proposed a Bayesian design via the Bayes factor. Bayesian methods for incorporating historical data include the power prior (Ibrahim and Chen 2000; Ibrahim et al. 2015), the hierarchical prior (Chen et al. 2011), the commensurate prior (Hobbs et al. 2011; Hobbs, Sargent, and Carlin 2012), the meta-analytic-predictive (MAP) prior (Neuenschwander et al. 2010), the robust MAP prior (Schmidli et al. 2014), the covariate-adjusted hierarchical model-based prior (Han et al. 2017), and their respective variations.
Recent review papers of these methods include Schmidli et al. (2020), Hall et al. (2021), Ghadessi et al. (2020), and van Rosmalen et al. (2018).
Motivated by the DYSTANCE 51 clinical trial, we explore different aspects of borrowing historical data within the Bayesian framework. Using the DMD natural history aggregate data, we develop a formal formulation of the conditional borrowing of Allocco et al. (2010) via the decision rule. By extending the partial borrowing power priors (Ibrahim et al. 2012; Chen et al. 2014; Ibrahim et al. 2015), we propose a new borrowing-by-parts power prior for incorporating historical data. In this article, we also address several critical issues in Bayesian SSD, namely, the amount of borrowing from the historical data, the choice of noninformative priors, and the impact of model misspecification on the Bayesian Type I error and power.
The remaining part of the article is organized as follows. In Section 2, we present the critical elements and necessary steps, including the computational algorithm, of a Bayesian design of a superiority trial and develop a flowchart of Bayesian SSD. In Section 3, we analytically examine the properties of the Bayesian Type I error and power when the variances are known and empirically investigate the choice of noninformative priors and the impact of model misspecification on the Bayesian Type I error and power. Section 4 presents a comprehensive treatment of leveraging historical data in Bayesian SSD. In this section, we first introduce the DMD natural history aggregate data, then present two commonly used priors, namely, the power prior and the robust mixture prior, to leverage the historical data, and further investigate the impact of the amount of borrowing on the Bayesian Type I error and power. In the same section, the formal and statistically rigorous formulations of conditional borrowing and the borrowing-by-parts power prior are developed, and extensive simulation studies are conducted to examine the empirical performance of the proposed methodology. We conclude the article with a brief discussion in Section 5.

Bayesian Design of a Superiority Trial
We consider designing a randomized, double-blind, placebo-controlled clinical trial to evaluate the superiority of a drug candidate over placebo with a continuous primary endpoint. Given the large amount of historical data available, we also consider borrowing the historical data to augment the placebo control in the clinical trial design. Let y_t = (y_t1, y_t2, ..., y_tn_t)′ and y_c = (y_c1, y_c2, ..., y_cn_c)′ be the primary endpoint data of the test drug and the placebo control, with sample sizes n_t and n_c, respectively.
We assume y_t and y_c are independent, with

y_ti ~ N(μ_t, σ_t^2) iid and y_ci ~ N(μ_c, σ_c^2) iid. (1)

The parameter of interest is the difference between the mean effects of the test drug and the placebo control, namely, δ = μ_t − μ_c. The hypotheses of the superiority trial are H_0: δ ≤ 0 versus H_1: δ > 0, or equivalently, H_0: μ_t ≤ μ_c versus H_1: μ_t > μ_c.
Let y^(n) = (y_t′, y_c′)′ denote the current data, where n = n_t + n_c, and let T(y^(n)) be a test statistic. We define a decision rule based on T(y^(n)) to reject H_0 as

T(y^(n)) ≥ T_0, (2)

where T_0 is a critical value, which depends only on the study design, not on the data y^(n). We define the Bayesian power function as

β_s^(n) = E_{y^(n)} [ 1{T(y^(n)) ≥ T_0} ], (3)

where the expectation is taken with respect to the marginal distribution of y^(n) under the sampling prior π^(s)(θ). Based on the approach of Chen et al. (2011), we take

T(y^(n)) = P(δ > 0 | y^(n), π^(f)), (4)

where the probability is computed with respect to the posterior distribution given the data y^(n) and the fitting prior π^(f)(θ). Let T_0 = γ be a Bayesian credible level, where 0 < γ < 1. Then the Bayesian power function in (3) reduces to

β_s^(n) = E_{y^(n)} [ 1{P(δ > 0 | y^(n), π^(f)) ≥ γ} ], (5)

where the indicator function 1{A} takes a value of "1" if A is true and "0" otherwise. The analytical evaluation of (3) or (5) is often not available. The following computational algorithm can be used for β_s^(n):

Step 0. Set n_t, n_c, γ, and N (the number of simulated datasets);
Step 1. Generate θ ~ π^(s)(θ);
Step 2. Generate y^(n) ~ f(y^(n) | θ);
Step 3. Calculate T(y^(n)) = P(δ > 0 | y^(n), π^(f));
Step 4. Check whether T(y^(n)) ≥ γ or not;
Step 5. Repeat Steps 1-4 N times;
Step 6. Compute the proportion of {T(y^(n)) ≥ γ} in these N runs, which gives an estimate of β_s^(n) in (3) or (5).

The Bayesian Type I error β_{s0}^(n) is obtained by computing (3) or (5) under a sampling prior π_0^(s)(θ) supported on the null parameter space, and the Bayesian power β_{s1}^(n) under a sampling prior π_1^(s)(θ) supported on the alternative parameter space. For given α_0 > 0 and α_1 > 0, we compute n_{α_0} = min{n : β_{s0}^(n) ≤ α_0} and n_{α_1} = min{n : β_{s1}^(n) ≥ 1 − α_1}. The Bayesian sample size is then given by n_B = max{n_{α_0}, n_{α_1}}. Common choices of α_0 and α_1 are α_0 = 0.025 and α_1 = 0.20. With the Bayesian sample size n_B, the Type I error rate is intended to be less than or equal to 0.025 and the power is intended to be at least 0.80.
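In the known-variance case with a uniform fitting prior (treated analytically in Section 3), Step 3 has the closed form P(δ > 0 | y^(n)) = Φ((ȳ_t − ȳ_c)/(σ_t^2/n_t + σ_c^2/n_c)^{1/2}), and the algorithm can then be sketched in a few lines of Python. This is an illustrative sketch with a point-mass sampling prior, not the paper's code; the function name is ours.

```python
from statistics import NormalDist
import random

def bayesian_power(n_t, n_c, mu_t, mu_c, sigma_t, sigma_c,
                   gamma=0.975, N=100_000, seed=1):
    """Monte Carlo estimate of the Bayesian power function (Steps 0-6),
    assuming known variances, a uniform fitting prior, and a point-mass
    sampling prior at (mu_c, mu_t)."""
    rng = random.Random(seed)
    phi = NormalDist()
    se = (sigma_t**2 / n_t + sigma_c**2 / n_c) ** 0.5
    hits = 0
    for _ in range(N):
        # Steps 1-2: the sample means are sufficient, so generate them directly.
        ybar_t = rng.gauss(mu_t, sigma_t / n_t**0.5)
        ybar_c = rng.gauss(mu_c, sigma_c / n_c**0.5)
        # Step 3: posterior probability that delta > 0 (closed form here).
        T = phi.cdf((ybar_t - ybar_c) / se)
        # Step 4: apply the decision rule T >= gamma.
        hits += T >= gamma
    # Step 6: the proportion of rejections estimates the Bayesian power.
    return hits / N
```

With mu_t = mu_c the call estimates the Bayesian Type I error (about 1 − γ here), and with mu_t − mu_c = δ_1 > 0 it estimates the power.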
We summarize the above Bayesian SSD process in Figure 1. Every Bayesian SSD starts with the trial specification, which primarily consists of the type of trial (superiority, noninferiority, or equivalence) based on the objective of the study, the number of treatment arms, and the sample size allocation in each arm. The next step of Bayesian SSD shown in Figure 1 includes the specification of a statistical model for the current data, the derivation of the corresponding likelihood, and the mathematical formulation of the scientific hypotheses according to the chosen model. According to (1), the next step in the flowchart is to specify a fitting prior and then to derive the posterior. One of the key components in Bayesian SSD is to construct a test statistic T(y^(n)) in (2), which leads to the formulation of a decision rule. The key design quantity, that is, the Bayesian power function β_s^(n) in (3), can be evaluated either analytically or numerically via a Monte Carlo method under a given sampling prior π^(s)(θ), which eventually leads to the final determination of the Bayesian sample size. One additional component in Figure 1 is the historical data, when available, which can be incorporated in the fitting priors via the decision rule. The inclusion of the historical data typically leads to a reduction of the Bayesian sample size, while at the same time it may increase the Type I error. The technical details and potential issues in using a noninformative fitting prior or an informative prior by leveraging historical data are discussed in detail in the next two sections.

Theoretical Properties of the Bayesian Power Function with Known Variances
In this section, we assume that the variances σ_t^2 and σ_c^2 are known. Under this assumption, θ = (μ_c, μ_t)′. The null parameter space is Θ_0 = {θ : μ_t ≤ μ_c} and the alternative parameter space is Θ_1 = {θ : μ_t > μ_c}. We assume a noninformative uniform prior for the fitting prior, namely, π^(f)(θ) ∝ 1. Then, the joint distribution of y^(n) is given by the product of the normal likelihoods in (1). From (1), the posterior distribution of δ = μ_t − μ_c is N(ȳ_t − ȳ_c, σ_t^2/n_t + σ_c^2/n_c), where ȳ_t and ȳ_c are the sample means, so that

T(y^(n)) = P(δ > 0 | y^(n), π^(f)) = Φ( (ȳ_t − ȳ_c) / (σ_t^2/n_t + σ_c^2/n_c)^{1/2} ), (6)

where Φ is the standard normal N(0, 1) cumulative distribution function (cdf). The decision rule in (2) then rejects H_0 when Φ(·) ≥ γ. Applying Φ^{−1}, the inverse function of Φ, to both sides of (6) gives

ȳ_t − ȳ_c ≥ Z_γ (σ_t^2/n_t + σ_c^2/n_c)^{1/2}, (7)

where Z_γ = Φ^{−1}(γ). Using (7), the critical function in (2) reduces to

1{ ȳ_t − ȳ_c ≥ Z_γ (σ_t^2/n_t + σ_c^2/n_c)^{1/2} }. (8)

Assume the sampling prior π^(s)(θ) is degenerate at a fixed θ = (μ_c, μ_t)′. Using (8), the Bayesian power function (5) then becomes

β_s^(n) = Φ( (μ_t − μ_c) / (σ_t^2/n_t + σ_c^2/n_c)^{1/2} − Z_γ ). (9)

Taking π^(s) = π_0^(s) supported on Θ_0 and using (9), the Bayesian Type I error is given by

β_{s0}^(n) = E_{π_0^(s)} [ Φ( δ / (σ_t^2/n_t + σ_c^2/n_c)^{1/2} − Z_γ ) ], δ = μ_t − μ_c ≤ 0. (10)

It is easy to see that the maximum Type I error is attained at the boundary δ = μ_t − μ_c = 0 by specifying the sampling prior π_0^(s)(θ) as a degenerate distribution at δ = 0, denoted by 1_{δ=0}, that is, P(δ = 0) = 1. Furthermore, we take a point mass sampling prior

π_1^(s)(θ) = 1_{δ=δ_1}, (11)

which is a degenerate distribution at δ = δ_1, where δ_1 > 0.
Using (9) and (11), the Bayesian power is given by

β_{s1}^(n) = Φ( δ_1 / (σ_t^2/n_t + σ_c^2/n_c)^{1/2} − Z_γ ). (12)

We have the following results for the Bayesian Type I error and power.
Result 1. The maximum Bayesian Type I error is 1 − γ and is attained at the boundary of the parameter space corresponding to the null hypothesis.
Result 2. Using the point mass sampling prior in (11) and taking γ = 1 − α_0, for any choice of δ = δ_1 ∈ Θ_1, the Bayesian power is equal to the frequentist power given by

Φ( δ_1 / (σ_t^2/n_t + σ_c^2/n_c)^{1/2} − z_{1−α_0} ), (13)

where 0 < α_0 < 1 is the Type I error and z_{1−α_0} = Φ^{−1}(1 − α_0).
Suppose we specify the maximum Type I error at α_0 = 0.025, the power at 1 − α_1 = 0.8, and the randomization ratio n_t : n_c = 2 : 1. Assume that σ_t^2 = σ_c^2 = 25 and δ_1 = 3.5. Solving

δ_1 / (σ_t^2/n_t + σ_c^2/n_c)^{1/2} ≥ z_{1−α_0} + z_{1−α_1}

with n_t = 2n_c and rounding up, the required sample sizes are n_t = 50 and n_c = 25.
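Because Result 2 equates the Bayesian and frequentist powers when γ = 1 − α_0, the known-variance Bayesian sample size can be verified with the standard frequentist formula. A minimal sketch (the function name is ours):

```python
from math import ceil
from statistics import NormalDist

def superiority_ss(delta1, var_t, var_c, ratio=2.0, alpha0=0.025, alpha1=0.20):
    """Smallest n_c (and n_t = ratio * n_c) such that the one-sided level-alpha0
    test has power at least 1 - alpha1 at delta = delta1, variances known."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha0) + z(1 - alpha1)) / delta1
    n_c = ceil(k**2 * (var_t / ratio + var_c))
    return int(ratio * n_c), n_c

# For alpha0 = 0.025, power 0.80, 2:1 randomization, variances 25, delta1 = 3.5:
print(superiority_ss(3.5, 25.0, 25.0))  # (50, 25)
```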

Choice of Noninformative Priors and Model Misspecification
Under the unequal variances assumption, θ = (μ_c, μ_t, σ_c^2, σ_t^2)′, while under the equal variances assumption, θ = (μ_c, μ_t, σ^2)′ with σ^2 = σ_c^2 = σ_t^2. For each case, the fitting prior is specified as a power (σ^2)^{−m} of the variance parameters, where m = 0 corresponds to a uniform prior, m = 1 corresponds to a reference prior, and m = 3/2 corresponds to Jeffreys's prior. The detailed derivation of the Bayesian power function in (5) under each of these priors is given in Appendix A, supplementary materials. We set n_t = 50, n_c = 25, μ_c = 0, γ = 0.975, and N = 10^6 in the computational algorithm in Section 2 for calculating β_s^(n) in the following calculations. The Bayesian Type I errors and powers with the three noninformative fitting priors are given in Tables 1 and 2 under the models assuming unequal and equal variances, respectively. Several interesting observations emerge from these two tables. First, we see from the right block labeled "Assuming Equal Variances" of Table 1 that (a) when σ_c^2 > σ_t^2 (or σ_c^2 < σ_t^2), the Type I errors are 0.0281, 0.0325, and 0.0275 (or 0.0195, 0.0164, and 0.0199) under the uniform prior; (b) when σ_c^2 = σ_t^2, the Type I errors are 0.0236, 0.0235, and 0.0236, respectively, under the uniform prior; and (c) similar results are obtained under the reference prior and Jeffreys's prior, respectively. Thus, when the fitted model is misspecified, the Type I errors are greater than 0.025 when one arm ("control") has a smaller sample size coupled with a larger variance than the other arm ("test"). Conversely, the Type I errors are less than 0.025 when one arm ("test") has a larger sample size coupled with a larger variance than the other arm ("control").
Second, we see from the left block labeled "Assuming Unequal Variances" of Table 1 that the Type I errors are smaller than, slightly smaller than, and almost at 0.025 under the uniform prior, the reference prior, and Jeffreys's prior, respectively, when we fit the model with unequal variances while the true model has equal variances. For example, the Type I errors are 0.0195, 0.0235, and 0.0257 under these three priors, respectively, when we fit the model with unequal variances while the true model has σ_c^2 = σ_t^2 = 20.25. With noninformative priors, a decreased Type I error typically corresponds to a lower power, while an increased Type I error is associated with a higher power. Tables 1 and 2 show exactly these patterns. For example, the powers are 87.80%, 88.25%, and 88.50% under the uniform prior, the reference prior, and Jeffreys's prior, respectively, when we fit the model with equal variances while the true model has σ_c^2 = 25 and σ_t^2 = 16. For σ_c^2 = 25 and σ_t^2 = 16, the powers are 81.62%, 83.83%, and 84.73%, respectively, under these three priors when we fit the model with unequal variances, which are lower than those when we fit the model with equal variances. Note that in this case, the model assuming equal variances leads to increased Type I errors, as discussed earlier. Thus, when the true variances are unequal, the misspecified model assuming equal variances could lead to a substantial decrease or increase of the Type I error and power, depending on whether σ_c^2 < σ_t^2 or σ_c^2 > σ_t^2. Moreover, different noninformative priors lead to different Type I errors and powers. From Tables 1 and 2, we see that the Type I errors and powers under the uniform prior and the reference prior are lower than those under Jeffreys's prior, even when the fitted model is correctly specified. Furthermore, the Type I errors and powers under Jeffreys's prior are very close to those under the frequentist method.
Therefore, the Jeffreys's prior is a more desirable noninformative prior than the other two noninformative priors we consider. Subsequently, the Jeffreys's prior will be used as a default noninformative initial prior in constructing the informative priors.
Using the frequentist method, we use simulation to calculate the Type I error and power under the models with equal and unequal variances. We first introduce the following notation: let

S_c = [ (1/(n_c − 1)) Σ_{i=1}^{n_c} (y_ci − ȳ_c)^2 ]^{1/2} and S_t = [ (1/(n_t − 1)) Σ_{j=1}^{n_t} (y_tj − ȳ_t)^2 ]^{1/2}

denote the sample standard deviations, respectively, for the control and test arms.
Also let

se(ȳ_t − ȳ_c) = { [ (n_c − 1)S_c^2 + (n_t − 1)S_t^2 ] / (n_c + n_t − 2) · (1/n_c + 1/n_t) }^{1/2} and se(ȳ_t − ȳ_c) = ( S_c^2/n_c + S_t^2/n_t )^{1/2}

denote the standard errors, respectively, under the models with equal and unequal variances, where ȳ_c and ȳ_t are defined in Section 3.1. Denote the critical value by t_{1−α_0} such that F_{t,df}(t_{1−α_0}) = 1 − α_0, where 0 < α_0 < 1 and F_{t,df} is the cdf of a central t distribution with degrees of freedom df, which is taken as n_t + n_c − 2 for the model with equal variances or

df = ( S_c^2/n_c + S_t^2/n_t )^2 / { S_c^4/[n_c^2(n_c − 1)] + S_t^4/[n_t^2(n_t − 1)] }

for the model with unequal variances.
Using the corresponding df and se(ȳ_t − ȳ_c) for the model assuming equal variances or the model assuming unequal variances, the simulation algorithm is given as follows:

Step 1. Generate random samples y_c from N(μ_c, σ_c^2) and y_t from N(μ_c, σ_t^2) for the Type I error or N(μ_t, σ_t^2) for the power;
Step 2. Calculate ȳ_c, S_c, ȳ_t, S_t, and se(ȳ_t − ȳ_c) for the corresponding model;
Step 3. Calculate the test statistic T* = (ȳ_t − ȳ_c) / se(ȳ_t − ȳ_c) and its corresponding degrees of freedom df; compare T* with t_{1−α_0} for the corresponding model;
Step 4. Repeat Steps 1-3 10^7 times. The Type I error or power is the proportion of T* ≥ t_{1−α_0}.
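The simulation can be sketched as follows. Since the Student-t quantile is not in the Python standard library, this sketch uses a Cornish-Fisher series approximation for t_{1−α_0}, accurate to about three decimals at the degrees of freedom involved here; the function names are ours.

```python
import random
from math import sqrt
from statistics import NormalDist

def t_quantile(p, df):
    """Cornish-Fisher approximation to the Student-t quantile (df >= 10 or so)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5*z**5 + 16*z**3 + 3*z) / 96
    return z + g1 / df + g2 / df**2

def reject_rate(n_t, n_c, mu_t, mu_c, var_t, var_c,
                equal_var=True, alpha0=0.025, N=20_000, seed=7):
    """Simulated rejection rate of the one-sided two-sample t-test (Steps 1-4):
    Type I error when mu_t = mu_c, power when mu_t > mu_c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(N):
        y_t = [rng.gauss(mu_t, sqrt(var_t)) for _ in range(n_t)]
        y_c = [rng.gauss(mu_c, sqrt(var_c)) for _ in range(n_c)]
        mt, mc = sum(y_t) / n_t, sum(y_c) / n_c
        s2t = sum((y - mt)**2 for y in y_t) / (n_t - 1)
        s2c = sum((y - mc)**2 for y in y_c) / (n_c - 1)
        if equal_var:  # pooled standard error, df = n_t + n_c - 2
            df = n_t + n_c - 2
            s2p = ((n_t - 1) * s2t + (n_c - 1) * s2c) / df
            se = sqrt(s2p * (1 / n_t + 1 / n_c))
        else:          # Welch-Satterthwaite standard error and df
            se = sqrt(s2t / n_t + s2c / n_c)
            df = (s2t / n_t + s2c / n_c)**2 / (
                (s2t / n_t)**2 / (n_t - 1) + (s2c / n_c)**2 / (n_c - 1))
        hits += (mt - mc) / se >= t_quantile(1 - alpha0, df)
    return hits / N
```

Under the correctly specified unequal-variance model the rejection rate at mu_t = mu_c stays near 0.025, as in Figure 2(a).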
For the case of α_0 = 0.025, n_t = 50, n_c = 25, μ_c = 0, σ_c^2 = 25, and δ = 3.5, Figure 2(a) shows the Type I error under the model assuming equal variances. When the true σ_t^2 is different from σ_c^2, the Type I error increases in σ_c^2 − σ_t^2 when σ_t^2 < σ_c^2 = 25 and decreases when σ_t^2 > σ_c^2 = 25 under the wrong model. The Type I errors under the model assuming unequal variances stay at 0.025 since the model is correctly specified.

Leveraging Historical Data
Historical data can be incorporated into the fitting priors in various ways, including but not limited to full borrowing (Ibrahim and Chen 2000), partial borrowing (Ibrahim et al. 2012; Chen et al. 2014; Ibrahim et al. 2015), dynamic borrowing (Viele et al. 2014; Pan, Yuan, and Xia 2017; Lim et al. 2018), conditional borrowing (Allocco et al. 2010), and propensity score matching (Wang et al. 2019). Suppose that the historical data from a natural history study consist of a sample size of 44, a sample mean of −0.18, and a sample standard deviation of 3.38 for the primary endpoint. We next discuss how to leverage the historical data in Bayesian SSD via informative priors.
We first examine the effect of the amount of borrowing of the historical data on the gain in power within the frequentist SSD framework. For a randomized controlled trial, suppose the starting sample size ratio of the test arm versus the concurrent control arm is 2:1. When the concurrent control is augmented using historical data by 50% or 100% of the concurrent control sample size, the augmented sample size ratios of the test arm versus the concurrent control arm become 2:1.5 or 2:2, respectively. Set δ = 3.5 and σ_t^2 = 25. Table 3 shows the frequentist powers for σ_c^2 = 11.42, 16, 20.25, and 25; n_t = 32, 40, 44, and 52; and n_t:n_c = 2:1, 2:1.5, and 2:2. Note that we assume (a) all random samples for the control arm in Table 3 are from the same normal distribution N(μ_c, σ_c^2) and (b) the model with unequal variances is used in all power calculations. From Table 3, we observe that (a) when σ_c^2 = 11.42, the powers are 79.8%, 86.5%, and 89.7% for n_t:n_c = 2:1, 2:1.5, and 2:2, respectively, and the gains in power are 6.7% and 3.2% when the sample size on the control arm increases from 16 to 24 and from 24 to 32, respectively; and (b) the results are similar for the other three values of σ_c^2. Thus, in all four cases, most of the gain in power is achieved by the first 50% increase in sample size on the control arm, that is, when n_t:n_c = 2:1 increases to n_t:n_c = 2:1.5.
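The diminishing return can be reproduced qualitatively with a quick normal-approximation power calculation (it ignores the t-correction, so the absolute values sit a point or two above the exact powers in Table 3, but the pattern of gains is the same; the function name is ours):

```python
from statistics import NormalDist

def approx_power(n_t, n_c, delta, var_t, var_c, alpha0=0.025):
    """Normal-approximation power of the one-sided two-sample test."""
    nd = NormalDist()
    se = (var_t / n_t + var_c / n_c) ** 0.5
    return nd.cdf(delta / se - nd.inv_cdf(1 - alpha0))

# Augmenting a 2:1 trial (n_t = 32, n_c = 16) to 2:1.5 and 2:2, sigma_c^2 = 11.42:
powers = [approx_power(32, n_c, 3.5, 25.0, 11.42) for n_c in (16, 24, 32)]
gains = [b - a for a, b in zip(powers, powers[1:])]
# The first 50% augmentation of the control arm buys more power than the second.
```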
We next explore the amount of borrowing within the Bayesian framework. The power prior and the robust mixture prior are considered to leverage the historical data. Let y_0 = (y_01, y_02, ..., y_0n_0)′ be the historical data with sample size n_0 from the control arm, and assume y_0i ~ N(μ_c, σ_c^2) iid. Let ȳ_0 and S_0c^2 denote the sample mean and the sample variance of y_0. Write the historical data as D_0 = (n_0, ȳ_0, S_0c^2). Denote the likelihood of the historical data by L(μ_c, σ_c^2 | D_0). Under the unequal variances assumption, θ = (μ_c, μ_t, σ_c^2, σ_t^2)′. The power prior (Ibrahim and Chen 2000) is given by

π(μ_c, σ_c^2 | D_0, a_0) ∝ L(μ_c, σ_c^2 | D_0)^{a_0} π_0^(f)(μ_c, σ_c^2), (14)

where π_0^(f)(μ_c, σ_c^2) is an initial prior and 0 ≤ a_0 ≤ 1 is the discounting parameter, which determines the amount of borrowing. The robust mixture prior (Greenhouse and Wasserman 1995; Ye et al. 2020) given p_0 is defined by

π(μ_c, σ_c^2 | D_0, p_0) = p_0 π_1(μ_c, σ_c^2 | D_0) + (1 − p_0) π_sk(μ_c, σ_c^2), (15)

where π_1(μ_c, σ_c^2 | D_0) is the power prior in (14) with a_0 = 1 and the weight 0 ≤ p_0 ≤ 1 determines the amount of borrowing. In (15), the skeptical prior π_sk, obtained when p_0 = 0, is specified via the normal distribution N(0, 1000σ_c^2) for μ_c given σ_c^2 and the inverse gamma distribution IG(0.001, 0.001) for σ_c^2. We further assume that (μ_c, σ_c^2) and (μ_t, σ_t^2) are independent a priori, so that π^(f)(θ) factors into π^(f)(μ_c, σ_c^2) π^(f)(μ_t, σ_t^2). Since the historical data are available only for the control arm, π^(f)(μ_c, σ_c^2) is taken as the power prior in (14) or the robust mixture prior in (15), and a noninformative prior is specified for (μ_t, σ_t^2). The Type I errors under the power prior are ..., 0.017, respectively, at μ_c = −0.18, and 0.025, 0.030, and 0.038, respectively, at μ_c = 0.52. In the above discussion, we consider μ_c = −0.51, −0.18, and 0.52, since these values of μ_c lead to a negative bias, no bias, and a positive bias, respectively, compared with the historical mean. In Figure 3(a), the maximum values of μ_c such that the Type I error is less than 0.025 are 0.27 and 0.15, respectively, for a_0 = 0.18 and 0.36.
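To see how a_0 in (14) controls the amount of borrowing, consider the simplified case of known σ_c^2 and a flat initial prior: raising the historical likelihood to the power a_0 acts exactly like adding a_0·n_0 pseudo-observations from the historical control. This is a sketch of that special case, not the unknown-variance model analyzed in the paper; the function name is ours.

```python
def power_prior_posterior(n_c, ybar_c, n0, ybar0, var_c, a0):
    """Posterior mean and variance of mu_c under the power prior with known
    variance var_c and a flat initial prior: the historical data enter with
    effective sample size a0 * n0."""
    n_eff = n_c + a0 * n0
    post_mean = (n_c * ybar_c + a0 * n0 * ybar0) / n_eff
    post_var = var_c / n_eff
    return post_mean, post_var
```

With a_0 = 0 the historical data are ignored; with a_0 = 1 they are fully pooled with the concurrent control; intermediate values of a_0 interpolate between the two.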
These results indicate that (a) borrowing the historical data may lead to a gain in power and a reduction in Type I error at the same time; for example, the power increases but the Type I error decreases in a_0 when μ_c = −0.51; (b) borrowing the historical data may lead to a gain in power and an increase in Type I error; for example, both the power and the Type I error increase in a_0 when μ_c = 0.52; (c) the Type I error can still be less than 0.025 (a prespecified significance level) even when the historical data and the data from the concurrent control arm are not similar; for example, the Type I errors are less than 0.025 when μ_c < 0.27 and a_0 = 0.18; and (d) when a_0 = 0, the Type I errors are around 0.025 and the powers are about 0.80, which may be due to the fact that a noninformative prior is specified as the initial prior. Figure 4(a) and (b) show the plots of the Type I error and power using the robust mixture prior in (15) for p_0 = 0, 0.18, 0.36, 0.5, and 1. The Type I errors for p_0 = 0, 0.18, 0.36, 0.5, and 1 are 0.025, 0.022, 0.019, 0.017, and 0.009, respectively, at μ_c = −0.51; 0.025, 0.024, 0.022, 0.022, and 0.018, respectively, at μ_c = −0.18; and 0.025, 0.031, 0.038, 0.043, and 0.060, respectively, at μ_c = 0.52. The powers for p_0 = 0, 0.18, 0.36, 0.5, and 1 are 79.82%, 81.92%, 84.13%, 85.83%, and 91.82%, respectively, at μ_c = −0.51; 79.79%, 82.56%, 85.29%, 87.39%, and 95.08%, respectively, at μ_c = −0.18; and 79.79%, 83.13%, 86.52%, 89.14%, and 98.59%, respectively, at μ_c = 0.52.
Compared to the power prior, (a) the robust mixture prior leads to less gain in power when p_0 = a_0 ≤ 0.36; (b) the powers using the robust mixture prior with p_0 = 0.5 are lower than those using the power prior with a_0 = 0.36; (c) the Type I errors are closer to 0.025 using the robust mixture prior than the power prior when p_0 = a_0 ≤ 0.36; and (d) the skeptical prior given in (15) is quite noninformative in the sense that the Type I errors are around 0.025 and the powers remain around 80% for all μ_c.
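The self-adjusting behavior of the robust mixture prior in (15) is easiest to see in the known-variance case, where the posterior is again a two-component mixture and the weight on the informative component is updated by each component's marginal likelihood of the observed control mean. A sketch under these simplifying assumptions (the function name is ours; the N(0, 1000σ_c^2) vague component follows the skeptical prior above):

```python
from math import sqrt
from statistics import NormalDist

def robust_map_posterior(ybar_c, n_c, var_c, p0, n0, ybar0, vague_var=1000.0):
    """Posterior over mu_c under the two-component mixture prior
    p0 * N(ybar0, var_c / n0) + (1 - p0) * N(0, vague_var * var_c),
    with var_c known. Returns [(weight, post_mean, post_var), ...] with the
    informative component listed first."""
    components = [(p0, ybar0, var_c / n0),
                  (1 - p0, 0.0, vague_var * var_c)]
    updated = []
    for w, m, v in components:
        # Marginal density of the observed control mean under this component.
        lik = NormalDist(m, sqrt(v + var_c / n_c)).pdf(ybar_c)
        # Conjugate normal update of the component.
        post_var = 1 / (1 / v + n_c / var_c)
        post_mean = post_var * (m / v + n_c * ybar_c / var_c)
        updated.append((w * lik, post_mean, post_var))
    total = sum(w for w, _, _ in updated)
    return [(w / total, m, v) for w, m, v in updated]
```

When the concurrent control mean agrees with the historical mean, nearly all posterior weight shifts to the informative component; when they conflict, the vague component absorbs the weight and borrowing is discounted automatically.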

Borrowing-by-Parts Power Priors
Let D = (n_t, ȳ_t, S_t^2, n_c, ȳ_c, S_c^2) denote the current data, where ȳ_c and ȳ_t are the sample means and S_c^2 and S_t^2 are the sample variances, respectively, for the concurrent control and the test arm. In (14), the historical data are borrowed all together via the power prior. A new variation of the power prior is the borrowing-by-parts power prior with distinct discounting parameters a_01 and a_02, given by

π(μ_c, σ_c^2 | D_0, a_01, a_02) ∝ L_1(μ_c, σ_c^2 | D_0)^{a_01} L_2(σ_c^2 | D_0)^{a_02} π_0^(f)(μ_c, σ_c^2), (17)

where L_1 and L_2 are the parts of the historical likelihood carrying the sample mean ȳ_0 and the sample variance S_0c^2, respectively, and 0 ≤ a_01, a_02 ≤ 1. In (17), the distinct discounting parameters a_01 and a_02 are used in borrowing the mean and the variance, respectively. In the case when the mean of the concurrent control is consistent with that of the historical data but the variance of the concurrent control is not, a_01 > 0 and a_02 = 0 can be specified to borrow the mean part but not the variance part. Compared with full borrowing, where a_01 = a_02 > 0, borrowing only the mean part with a_01 > 0 and a_02 = 0 allows for achieving a desirable power while controlling the Type I error at the same time. Table 4 reports the Type I errors and powers using the borrowing-by-parts power prior with consistent and inconsistent variances of the concurrent control compared with that of the historical control for n_t = 32, n_c = 16, n_0 = 44, ȳ_0 = −0.18, S_0c^2 = 11.42, μ_c = 0, and Jeffreys's prior as the initial prior. When σ_t^2 = σ_c^2 = 25, we see from Table 4 that (a) the Type I error decreases and the power increases in a_01 when a_02 = 0; (b) both the Type I error and the power increase in a_02 when a_01 = 0; and (c) the Type I errors are less than 0.025 while the power increases considerably as a_01 = a_02 gets larger.
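Concretely, the normal historical likelihood factors into a part carrying ȳ_0 (the mean) and a part carrying S_0c^2 (the variance), and (17) raises each factor to its own power. A sketch of the log kernel under that factorization, up to additive constants (the function name is ours):

```python
from math import log, pi

def log_bbp_prior(mu_c, var_c, n0, ybar0, s2_0, a01, a02):
    """Log kernel of the borrowing-by-parts power prior: the ybar0 factor is
    discounted by a01 and the S0^2 factor by a02 (flat initial prior omitted)."""
    # Mean part: ybar0 | mu_c, var_c ~ N(mu_c, var_c / n0).
    log_mean_part = (-0.5 * log(2 * pi * var_c / n0)
                     - n0 * (ybar0 - mu_c)**2 / (2 * var_c))
    # Variance part: (n0 - 1) * S0^2 / var_c ~ chi-square(n0 - 1),
    # viewed as a function of var_c.
    log_var_part = (-0.5 * (n0 - 1) * log(var_c)
                    - (n0 - 1) * s2_0 / (2 * var_c))
    return a01 * log_mean_part + a02 * log_var_part
```

Setting a_01 > 0 and a_02 = 0 borrows only the mean part, as discussed above, while a_01 = a_02 = a_0 recovers the ordinary power prior up to a normalizing constant.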
When σ_t^2 = 25 and σ_c^2 = 11.42, we also see from Table 4 that (a) the Type I errors are controlled at 0.025, and the powers under various values of a_01 and a_02 such that a_01 + a_02 > 0 (borrowing) are greater than those under a_01 = a_02 = 0 (no borrowing); and (b) the gain in power is incremental when a_02 > 0 and a_01 = 0. These results indicate that borrowing only the mean part of the historical data, or the whole historical data, is more effective in increasing the power than borrowing only the variance part. The results with μ_c = −1 and μ_c = 1 are given in Appendix C, supplementary materials. Figure 6(a) plots the Type I error against a_01 and a_02, and Figure 6(b) plots the Type I error against a_01 = a_02 = a_0 for σ_t^2 = σ_c^2 = 25. When a_01 = a_02 = a_0, the Type I error stays at or below 0.025 as a_0 increases. When a_02 = 0, the Type I error first decreases and then increases slightly as a_01 increases. When a_01 = 0, the Type I error increases as a_02 increases.

Discussion
In this article, we develop a roadmap of Bayesian SSD as shown in Figure 1. We analytically explore the properties of the Bayesian Type I error and power with noninformative priors when the variances are known. We also examine the impact of model misspecification and the choice of noninformative priors on the Bayesian Type I error and power. Although it is common practice to assume equal variances for the test and control groups under normal distributions in a superiority trial, we empirically show that under a misspecified model the Type I error and power can be increased or decreased depending on the relationship between σ_c^2 and σ_t^2. The consequences of model misspecification are consistent for both frequentist and Bayesian methods. The choice of prior also matters, even among noninformative priors.
For a 2:1 randomized controlled trial, we show that the first half of the amount of borrowing leads to more power gain than the second half for both frequentist and Bayesian methods. This is worth considering from both economic and practical points of view. We further demonstrate the risks and benefits of conditional borrowing. The Type I error can be protected by conditional borrowing; however, the power is lowered at the same time. We note that the conditional borrowing approach can be extended to the case in which multiple historical datasets are available (see the detailed elaboration in Appendix D of the supplementary materials).
We develop borrowing-by-parts power priors for incorporating the historical data in Bayesian SSD. The likelihood function is partitioned into the part for the parameter of primary interest and the part for the nuisance parameter, which are the mean and the variance, respectively, in the normal distribution case. By using separate discounting parameters a_01 and a_02, the historical data can be borrowed through the mean part, the variance part, or both. Although the borrowing-by-parts power priors are developed under normal models, these priors can also be constructed under more general normal regression models or even more complex joint longitudinal and survival models such as those considered in Zhang et al. (2014, 2017) and Sheikh et al. (2021). As shown in Figure 3 and Table 4, borrowing the historical data can lead to inflation of the Type I error when the concurrent control mean exceeds the sample mean of the historical data by a certain magnitude or when the variance of the future outcomes in the concurrent control arm differs substantially from the sample variance of the historical data. The conditional borrowing approach discussed in Section 4.2 is quite effective in preventing inflation of the Type I error; however, it also leads to a much smaller gain in power. Embedding the robust mixture prior and the borrowing-by-parts power priors into the conditional borrowing framework may yield a more promising approach with better control of the Type I error and, at the same time, more gain in power.
Although we assume that the historical data are available only for the control arm, the proposed methodology can be extended to a more general case, in which the historical data are available for both the investigated product and control arms, as considered in Chen et al. (2014). In this case, the borrowing-by-parts power priors may be even more attractive, as they allow us to leverage different parts of the historical data within and between the investigated product arm and the control arm. When the historical effect, such as the mean of the historical control, is different from the mean of the concurrent control, the proposed conditional borrowing approach automatically takes this into consideration by essentially leveraging a smaller amount of the historical data. The recently proposed empirical profile approach (Wu, Hui, and Deng 2020) and scale transformed power prior (Nifong, Psioda, and Ibrahim 2021) may be potentially more effective in dealing with different effects for the historical control compared with the concurrent control. These approaches can be integrated into our proposed borrowing-by-parts power priors and the conditional borrowing framework, which is another useful extension for future research.

Supplementary Materials
The supplementary materials consist of the derivations in Section 3.2, the derivations in Section 4.3, more cases in Section 4.3, conditional borrowing for multiple historical datasets, and Type I errors and powers with the power prior and the robust mixture prior in Section 4.1. (online_supplement.pdf).