Sample size reestimation and Bayesian predictive probability for single-arm clinical trials with a time-to-event endpoint using Weibull distribution with unknown shape parameter

ABSTRACT This manuscript addresses two topics. First, we explore the utility of the internal pilot study (IPS) approach for reestimating the sample size at an interim stage when a reliable estimate of the nuisance shape parameter of the Weibull distribution for modeling survival data is unavailable during the planning phase of a study. Although the IPS approach can help rescue the study power, the adjusted sample size can be as much as twice the initially planned sample size, which may impose substantial practical constraints on continuing the study. Second, we discuss the Bayesian predictive probability for conducting interim analyses to obtain preliminary evidence of efficacy or futility of an experimental treatment that would warrant early termination of a clinical trial. In the context of single-arm clinical trials with time-to-event endpoints following the Weibull distribution, we present the calculation of the Bayesian predictive probability when the shape parameter of the Weibull distribution is unknown. Based on the data accumulated at the interim, we propose two approaches, which rely on either the posterior mode or the entire posterior distribution of the shape parameter. To account for uncertainty in the shape parameter, we recommend incorporating its entire posterior distribution in the calculation.


Introduction
Single-arm clinical trials are often carried out in the early phases of oncology drug development to evaluate safety and to obtain preliminary evidence of therapeutic effect of new cancer treatments (Evans 2010; Rubinstein 2014). In such trials, tumor response rate (TRR) or objective response rate (ORR) has commonly been used as the primary endpoint to identify any potential biological drug activity, assessed in terms of tumor shrinkage (Rubinstein 2014; U.S. FDA 2018). As noted by Rubinstein (2014), many phase II clinical trials are now being designed to assess the promise of molecularly targeted agents which may not necessarily improve TRR or ORR, but instead yield an improvement in other time-to-event (TTE) endpoints such as progression-free survival (PFS) or overall survival (OS). This manuscript deals with some unaddressed planning aspects concerning single-arm phase II clinical trials with TTE endpoints.
A limited number of options based on the log-rank test and its weighted versions are available in the literature for designing single-arm clinical trials with TTE endpoints. Some of the existing approaches include the ones proposed by Finkelstein et al. (2003), Sun et al. (2011), Kwak and Jung (2014), Wu (2015), and Phadnis (2019). Among these approaches, the method proposed by Phadnis (2019) is appropriate when the subject survival times are assumed to follow the Weibull distribution, and it can be used for calculating the required sample size while adjusting for administrative censoring along with an ad-hoc inflation for random loss to follow-up. Most recently, Waleed et al. (2021) proposed a parametric maximum likelihood estimate (MLE) test, based on the asymptotic approximation of the scale parameter of the Weibull distribution, whose variance component can account for the expected loss to follow-up rate and different accrual patterns (early, late, or uniform accrual).
It is worth mentioning that both methods (Phadnis 2019;Waleed et al. 2021) assume that a reliable estimate of the shape parameter of the Weibull distribution is known from historical studies.
When reliable estimates of any nuisance parameters, such as the shape parameter of the Weibull distribution for modeling survival data, are unavailable during the planning phase of a study, adaptations to the sample size can be incorporated using the estimates of nuisance parameters obtained from the data accumulated at an interim stage (Friede and Kieser 2006; U.S. FDA 2019; Wittes and Brittain 1990). Besides other advantages, such adaptive features in the study design enhance the statistical efficiency of a clinical trial (U.S. FDA 2019). In this manuscript, we aim to build upon the framework developed by Waleed et al. (2021) by considering the scenario in which adequate historical data are not available to obtain a reasonably accurate estimate of the shape parameter.
Due to ethical and practical considerations, single-arm oncology trials are often conducted via Simon's two-stage approach, which allows researchers to obtain early evidence of futility of an experimental treatment and, consequently, to terminate the study in consultation with the Data Safety Monitoring Board (DSMB) overseeing the clinical trial (Jennison and Turnbull 2000; Kunz and Kieser 2012; Simon 1989). Alternatively, stochastic curtailment (SC) methods can be employed to decide whether to continue or 'curtail' sampling beyond an interim analysis based on the likelihood of a positive or negative outcome if the trial were to continue to its pre-planned end (Dmitrienko and Koch 2017; Jennison and Turnbull 2000; Kunz and Kieser 2012). Conditional power (Andersen 1987; Lan et al. 1982), predictive power (Spiegelhalter et al. 1986), and Bayesian predictive probability (Dmitrienko and Wang 2006; Geisser 1992; Herson 1979) are the most widely used SC methods. These methods have been well studied for normal and binary endpoints, and implemented in various statistical software including R (2017) and SAS (2017). Very recently, Waleed et al. (2021) studied these SC methods in the context of single-arm oncology trials with TTE primary endpoints. More specifically, they presented the mathematical development of these methods when the parametric Weibull model is appropriate for modeling survival data derived from such studies, and different censoring mechanisms and accrual patterns are under consideration. A limitation of the work by Waleed et al. (2021) is that a reliable estimate of the shape parameter of the Weibull distribution is assumed to be known, which may not hold true, for example, in studies related to rare diseases. To address this limitation in Waleed et al. (2021), we will discuss the calculation of Bayesian predictive probability when the shape parameter of the Weibull distribution is unknown.
In summary, the objective of this manuscript is two-fold: first, we discuss adaptation to the sample size for single-arm phase II trials with TTE endpoints via implementation of the internal pilot study (IPS) approach proposed by Wittes and Brittain (1990) and, secondly, we present calculation of the Bayesian predictive probability (BPP) for efficacy or futility testing based on the data accumulated at the interim.
This manuscript is organized in the following order. After presenting a brief review of the fixed sample design of Waleed et al. (2021) in Section 3, we discuss sample size reestimation at a prospectively planned interim stage, and calculation of the Bayesian predictive probability in single-arm phase II clinical trials with TTE endpoints following the Weibull distribution with unknown shape parameter. We present some simulation studies and examples to demonstrate the proposed approaches in Section 4. Finally, in Section 5, we present a discussion on the contents presented in this manuscript.

Motivating example
Recently, Phadnis (2019) described a phase II oncology clinical trial to investigate whether the use of novel combination therapies leads to an improvement in the progression-free survival (PFS) among patients suffering from chemotherapy refractory advanced metastatic biliary cholangiocarcinoma, a "rare" but aggressive neoplasm. Such patients have metastatic disease and undergo an initial treatment followed by a second-line treatment which has a PFS rate of 5-10% by 1 year. Based on historical control studies, it is understood that such patients have a median PFS of 2.5 months with an interquartile range (IQR) of around 2-5 months. Researchers believed that a consistent improvement in PFS for all quantiles of the survival curve of the historical controls by a factor of 1.5 would warrant further evaluation of new combination therapies in future large sample studies. For design purposes, the Weibull distribution was considered an appropriate choice for analyzing the resulting survival data, and its shape parameter was estimated from the historical controls to be 1.25 (increasing hazard) following the method outlined in Wu (2015). Researchers envisioned conducting a fixed sample study with an accrual period of 2 years and a follow-up period of 3 years. In addition, the random loss to follow-up rate was projected to be around 15-20%. Using the above design parameters, the method proposed by Phadnis (2019) yielded a required sample size of 28 subjects when the Type-I error rate and power were assumed to be 5% and 80%, respectively.
Due to practical considerations, it would have been reasonable to design the above study in a manner that permits investigators to conduct an interim analysis to obtain early evidence of efficacy or futility of the new combination therapies. To do so, one may employ popular SC approaches such as conditional power, predictive power, or Bayesian predictive probability. Waleed et al. (2021) studied these three methods when the underlying survival data follow the Weibull distribution with a known shape parameter. When a reliable estimate of the shape parameter is unavailable from historical studies, such as in the case of studies related to rare diseases, the uncertainty in the value of the shape parameter should be accommodated while performing any calculations at an interim stage. Since we anticipate encountering similar studies with different design features (such as accrual patterns, loss to follow-up, etc.) and an unknown shape parameter in the future, it is worthwhile to explore the utility of sample size reestimation via an internal pilot study, and the calculation of Bayesian predictive probability. For the sake of exposition, we shall use simulated data sets to illustrate the methods presented in this manuscript.

Notation and preliminaries
Suppose that a total of $n$ subjects are accrued during the enrollment period of a single-arm phase II clinical trial with a TTE endpoint. Due to practical constraints, administrative censoring is incorporated at a pre-specified calendar time $\tau$, when all active subjects in the study are censored and the resulting data are analyzed. For the $i$th subject, suppose $E_i$ denotes the calendar time of accrual into the study; $Y_i$ denotes the amount of time from $E_i$ to the calendar time of event; $C_i$ denotes the amount of time from $E_i$ to the time of loss to follow-up; and $Z_i = \min(C_i, \tau - E_i)$ denotes the amount of time to being lost to follow-up or administratively censored. We assume that the loss to follow-up is unrelated to the event of interest, that is, non-informative of the survival process, and that $\{(Y_i, Z_i);\ i = 1, \ldots, n\}$ are independent and identically distributed. In summary, we have $n$ pairs of data $(X_i, \delta_i)$, where $X_i = \min(Y_i, Z_i)$ is the subject's survival time and $\delta_i = I(Y_i \le Z_i)$ is the corresponding survival status. The event time $Y_i$ is assumed to follow the Weibull distribution having shape parameter $\kappa$ and scale parameter $\theta$ with the probability density function (pdf) expressed as below:
$$f(y \mid \kappa, \theta) = \frac{\kappa}{\theta}\left(\frac{y}{\theta}\right)^{\kappa - 1}\exp\left\{-\left(\frac{y}{\theta}\right)^{\kappa}\right\}, \quad y > 0.$$
The Weibull distribution is flexible in the sense that it allows us to handle different shapes of the underlying hazard function. More specifically, the hazard function is constant, increasing or decreasing when the shape parameter $\kappa$ is equal to, greater than, or less than 1, respectively (Klein and Moeschberger 2003). In their proposed method, Waleed et al. (2021) assume that a reasonably accurate estimate of the shape parameter $\kappa$ of the Weibull$(\kappa, \theta)$ distribution is known from some historical studies. The random loss to follow-up time $C_i$ also follows the Weibull distribution having the same shape parameter $\kappa$ and scale parameter $\eta$. To accommodate an anticipated loss to follow-up rate $\upsilon$, it can be conveniently verified, following the method in Wan (2017), that $\eta = \theta\left(\frac{1 - \upsilon}{\upsilon}\right)^{1/\kappa}$ ensures the loss to follow-up rate $\upsilon$.
Suppose that $\omega$ represents the maximum calendar time of accrual into the study. The accrual time $E_i$ is assumed to follow a rather general form of the uniform distribution, with an additional power parameter $\varphi$, having the following pdf (at a realized value $e$ of $E_i$):
$$g(e \mid \varphi, \omega) = \frac{\varphi\, e^{\varphi - 1}}{\omega^{\varphi}}, \quad 0 \le e \le \omega.$$
In addition to incorporating a uniform accrual pattern with $\varphi = 1$, the above choice of accrual distribution is flexible in the sense that it enables us to incorporate very early (late) accrual patterns by choosing $\varphi$ that is very small (large) in magnitude.
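The data-generating mechanism described above can be illustrated with a short simulation. The following is only a minimal sketch in Python (the manuscript's own computations were done in R); the function name `simulate_trial` and its arguments are hypothetical, and the accrual draw assumes the power-function form with cdf $(e/\omega)^{\varphi}$ on $[0, \omega]$.

```python
import math
import random


def simulate_trial(n, kappa, theta, upsilon, omega, phi, tau, rng):
    """Simulate (x_i, delta_i, e_i) for one single-arm trial (illustrative sketch).

    kappa, theta : Weibull shape and scale of the event times Y_i
    upsilon      : target loss-to-follow-up rate (0 disables loss to follow-up)
    omega, phi   : maximum accrual time and accrual power parameter
    tau          : calendar time of administrative censoring
    """
    # Scale of the loss-to-follow-up distribution implied by the target rate.
    if upsilon > 0:
        eta = theta * ((1.0 - upsilon) / upsilon) ** (1.0 / kappa)
    data = []
    for _ in range(n):
        # Inverse-cdf draw of the accrual time, assuming cdf (e / omega) ** phi.
        e = omega * rng.random() ** (1.0 / phi)
        # random.weibullvariate takes (scale, shape) in that order.
        y = rng.weibullvariate(theta, kappa)
        c = rng.weibullvariate(eta, kappa) if upsilon > 0 else math.inf
        z = min(c, tau - e)          # time to loss to follow-up or administrative censoring
        x = min(y, z)                # observed survival time
        delta = 1 if y <= z else 0   # 1 = event observed, 0 = censored
        data.append((x, delta, e))
    return data


# Example: uniform accrual over 3 years, analysis at calendar year 7.
rng = random.Random(2021)
trial = simulate_trial(n=62, kappa=1.0, theta=2.0, upsilon=0.15,
                       omega=3.0, phi=1.0, tau=7.0, rng=rng)
events = sum(delta for _, delta, _ in trial)
```

With a very large `tau`, administrative censoring is effectively switched off, so the share of censored subjects should approach the target rate `upsilon`, which is a quick way to check the expression for $\eta$.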

Fixed sample design
Before discussing sample size reestimation using the IPS approach, we present a brief overview of the fixed sample design, proposed by Waleed et al. (2021), for designing single-arm phase II clinical trials with TTE endpoints. Since covariates are commonly introduced (Klein and Moeschberger 2003) into parametric survival models through the scale parameter as $\theta = \exp(\gamma^{T} x)$, where $x$ and $\gamma$ are the vectors of $k + 1$ covariates and the corresponding parameters, respectively, we can define the two alternatives as $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. When no covariates other than the experimental treatment administered to the subjects are introduced into the model, the scale parameter can be expressed as $\theta = \exp\{\gamma\}$. Thus, our hypotheses can be equivalently expressed as:
$$H_0: \gamma \le \gamma_0 \quad \text{versus} \quad H_1: \gamma > \gamma_0. \tag{4}$$
It is straightforward to verify that the MLE of $\gamma$, denoted by $\hat{\gamma}$, is given as:
$$\hat{\gamma} = \frac{1}{\kappa}\log\left(\frac{\sum_{i=1}^{n} X_i^{\kappa}}{\sum_{i=1}^{n} \delta_i}\right),$$
where $\sum_{i=1}^{n} \delta_i$ is the number of observed events.
Since it appears analytically intractable to obtain the exact distribution of $\hat{\gamma}$ due to the underlying correlation between a subject's survival time and the corresponding survival status, Waleed et al. (2021) relied on asymptotic calculations to construct a parametric MLE-based statistic for testing the hypotheses in Eq. (4). They have shown that $\hat{\gamma}$ is asymptotically normally distributed, with an asymptotic variance $\sigma^2$ (Eq. (6)) that depends on the shape parameter $\kappa$. Under the null hypothesis, the Wald test statistic (Eq. (7)) standardizes $\hat{\gamma} - \gamma_0$ by its asymptotic standard deviation. For a given Type-I error rate $\alpha$, we reject the null hypothesis when the observed test statistic exceeds the upper $\alpha$ quantile of the standard normal distribution. For sample size calculations, researchers specify a clinically meaningful difference $\varepsilon > 0$ that they are interested in detecting under the alternative hypothesis $\gamma_1 = \gamma_0 + \varepsilon$. The required sample size to detect the difference $\varepsilon$ using the Wald test statistic in Eq. (7) with Type-I error $\alpha$ and power $1 - \beta$ is obtained by solving the corresponding power equation, in which $\sigma_1^2$ denotes the asymptotic variance under $H_1$ and $\Phi(\cdot)$ denotes the cumulative distribution function (cdf) of the standard normal distribution. To compute the required sample size, numerical integration can be used to calculate $\sigma_1^2$.
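For a known shape parameter, the MLE of $\gamma$ follows from maximizing the censored Weibull likelihood and has the closed form $\hat{\gamma} = \kappa^{-1}\log\left(\sum_i X_i^{\kappa} / \sum_i \delta_i\right)$. The sketch below computes it in Python; the function names are hypothetical. The standard error shown is the plain observed-information approximation $1/(\kappa\sqrt{d})$, with $d$ the number of events; it is included only for illustration and is not the asymptotic variance expression of Waleed et al. (2021), which additionally accounts for accrual and loss to follow-up.

```python
import math
import random


def gamma_mle(times, events, kappa):
    """MLE of gamma = log(theta) for right-censored Weibull data with known shape."""
    d = sum(events)
    if d == 0:
        raise ValueError("at least one observed event is required")
    return math.log(sum(t ** kappa for t in times) / d) / kappa


def observed_info_se(events, kappa):
    """Observed-information standard error of the MLE (illustrative approximation).

    The negative second derivative of the log-likelihood in gamma, evaluated
    at the MLE, equals kappa**2 * d, where d is the number of events.
    """
    return 1.0 / (kappa * math.sqrt(sum(events)))


# Example with uncensored data generated under gamma = 0.5 and kappa = 1.25.
rng = random.Random(1)
kappa = 1.25
times = [rng.weibullvariate(math.exp(0.5), kappa) for _ in range(200)]
events = [1] * len(times)
g_hat = gamma_mle(times, events, kappa)
se = observed_info_se(events, kappa)
z = (g_hat - 0.0) / se   # Wald-type statistic against gamma_0 = 0 (illustrative)
```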

Sample size reestimation
A limitation of the fixed sample design, proposed by Waleed et al. (2021), is that a reliable estimate of the shape parameter of the Weibull distribution is assumed to be known from historical studies. When adequate historical data are unavailable, this assumption may not hold true for studies related to small populations, such as in the case of rare diseases. It has been demonstrated that gross misspecification of the shape parameter can have an adverse effect on the study power (Waleed et al. 2021).
To tackle this limitation, it is worthwhile to consider adaptation of the study sample size. Since the variance of $\hat{\gamma}$ provided in Eq. (6) depends on the shape parameter, we discuss the implementation of the internal pilot study (IPS) approach, proposed by Wittes and Brittain (1990), to readjust the sample size at a prospectively planned time point (Friede and Kieser 2006; Jennison and Turnbull 2000; Wittes and Brittain 1990). The IPS approach is carried out by adjusting the desired sample size based on an estimate of the nuisance shape parameter obtained using the data available at the pre-specified interim stage (Wittes and Brittain 1990). More specifically, it requires the following steps in our context:
(1) Based on the best estimate of the shape parameter $\kappa$ available during the planning phase, we obtain an initial estimate of the required sample size $n$.
(2) Let $p$ denote the proportion of events, or of complete observations (which include events as well as censored observations), at which we intend to adjust the study sample size. At that time point, we obtain an estimate $\hat{\kappa}_{new}$ of the shape parameter from the accumulated data, and subsequently obtain a new estimate of the desired sample size, say $n_{new}$, using the variance evaluated at $\hat{\kappa}_{new}$ as an estimator of $\sigma^2$ under $H_1$. The number of additional subjects to be enrolled, say $n_{add}$, is $n_{add} = \max(n_{new} - n, 0)$, so that the total sample size is $N = n + n_{add}$.
(3) The final analysis is conducted using the data for all $N$ subjects.
It must be noted that the above IPS approach is restricted in the sense that it only permits upward adjustment to the sample size (Friede and Kieser 2006;Wittes and Brittain 1990).In our context, this restriction seems reasonable because the TTE data for all subjects, regardless of their survival status, contributes to the estimation of the shape parameter during an interim analysis (Dmitrienko and Koch 2017).
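The steps above can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions: the interim estimate of $\kappa$ is taken to be the Weibull MLE from right-censored data (obtained by solving the profile score equation with bisection), and `sample_size_fn` is a hypothetical user-supplied function that returns the fixed-design sample size for a given shape parameter. Neither name comes from the manuscript, and the toy sample size function in the example is a placeholder, not the formula of Waleed et al. (2021).

```python
import math
import random


def weibull_shape_mle(times, events, lo=1e-3, hi=50.0, iters=100):
    """MLE of the Weibull shape from right-censored data via the profile score."""
    d = sum(events)
    if d == 0:
        raise ValueError("at least one observed event is required")
    t_max = max(times)
    # Rescaling the times leaves the score unchanged and avoids overflow.
    scaled = [t / t_max for t in times]
    logs = [math.log(s) for s in scaled]
    mean_event_log = sum(l for l, e in zip(logs, events) if e) / d

    def score(kappa):
        w = [s ** kappa for s in scaled]
        weighted = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted - 1.0 / kappa - mean_event_log

    # The score is increasing in kappa, so bisection locates its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ips_total_sample_size(n_initial, kappa_new, sample_size_fn):
    """Restricted IPS rule: the sample size may only be adjusted upward."""
    n_new = math.ceil(sample_size_fn(kappa_new))
    n_add = max(n_new - n_initial, 0)
    return n_initial + n_add, n_add


# Example with a purely illustrative sample size function.
toy_sample_size = lambda kappa: 60.0 / kappa ** 2
rng = random.Random(3)
times, events = [], []
for _ in range(30):
    y = rng.weibullvariate(2.0, 0.75)
    c = rng.weibullvariate(4.0, 0.75)
    times.append(min(y, c))
    events.append(1 if y <= c else 0)
kappa_new = weibull_shape_mle(times, events)
total_n, n_add = ips_total_sample_size(60, kappa_new, toy_sample_size)
```

Because the rule is restricted, an interim estimate larger than the planning value leaves the sample size at its initial value rather than reducing it.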

Bayesian predictive probability
Bayesian predictive probability is a fully Bayesian SC method which can be utilized to calculate the predictive probability of obtaining a positive trial outcome if the clinical trial were to continue to its pre-planned end, conditional on the data accumulated at the interim stage (Dmitrienko and Koch 2017; Dmitrienko and Wang 2006). In our context, suppose that $n$ subjects are enrolled in a single-arm phase II clinical trial designed to test the hypotheses $H_0: \gamma \le \gamma_0$ vs. $H_1: \gamma = \gamma_1$, where $\gamma_1 = \gamma_0 + \varepsilon$, and $\varepsilon > 0$ is a clinically meaningful effect to be detected under $H_1$. At the interim stage $k$, suppose that the survival data corresponding to $n - m$ subjects are fully observed, that is, $n - m$ subjects had already experienced an event or were censored, and the remaining $m$ subjects were still active participants in the study. Let $X_k = (X_1, \ldots, X_{n-m})$ and $D_k = (\delta_1, \ldots, \delta_{n-m})$, respectively, denote the vectors of survival times and the corresponding survival statuses for the $n - m$ subjects fully observed at stage $k$. Similarly, $X_{K-k} = (X_{n-m+1}, \ldots, X_n)$ and $D_{K-k} = (\delta_{n-m+1}, \ldots, \delta_n)$ denote the vectors of survival times and statuses for the active subjects that are to be observed between stage $k$ and the trial end stage $K$. Since $X_{K-k}$ and $D_{K-k}$ are not observable at the interim stage $k$, suppose that $\tilde{X}_{K-k}$ and $\tilde{D}_{K-k}$ denote the corresponding predicted vectors. When the shape parameter of the Weibull distribution is known, the Bayesian predictive probability of a successful trial outcome, $P_k$, is expressed as in Eq. (8), where $\eta^*$ is some pre-specified threshold level of probability of a successful trial outcome, and $\pi(\gamma \mid \kappa, \hat{\gamma}_k)$ is the posterior distribution of $\gamma$ based on the data accumulated at the interim stage $k$.
When the shape parameter $\kappa$ is assumed to be known, a fully simulation-based algorithm, outlined by Waleed et al. (2021), can be easily implemented to calculate the Bayesian predictive probability. In practice, Dmitrienko and Wang (2006) recommended setting the threshold level $\eta^*$ between 0.90 and 0.975. After consultation with the DSMB overseeing the clinical trial, the trial can be terminated to conclude efficacy if $P_k \ge \zeta$ for some pre-specified $\zeta \in [0.8, 1]$, or to conclude futility if $P_k \le \zeta_0$ for some $\zeta_0 \in [0, 0.2]$.
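The stopping rule just described amounts to comparing $P_k$ with two cut-offs. A minimal Python sketch is given below; the function name `interim_decision` and the returned labels are hypothetical, and the default cut-offs 0.80 and 0.20 are simply the boundary values of the ranges quoted above.

```python
def interim_decision(p_k, zeta=0.80, zeta_0=0.20):
    """Map a Bayesian predictive probability to an interim recommendation."""
    if not 0.0 <= zeta_0 < zeta <= 1.0:
        raise ValueError("cut-offs must satisfy 0 <= zeta_0 < zeta <= 1")
    if not 0.0 <= p_k <= 1.0:
        raise ValueError("p_k must be a probability")
    if p_k >= zeta:
        return "stop for efficacy"
    if p_k <= zeta_0:
        return "stop for futility"
    return "continue"


decisions = [interim_decision(p) for p in (0.05, 0.50, 0.93)]
```

In practice the output would be one input to the DSMB's deliberation rather than an automatic decision.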

Calculating Bayesian predictive probability when κ is unknown
When a reliable estimate of $\kappa$ is not available from historical studies, an independent joint prior specification of $\gamma$ and $\kappa$ is typically considered (Ibrahim et al. 2001). For this purpose, we assume the prior distributions $\gamma \sim \text{Normal}(\mu_0, \sigma_0^2)$ and $\kappa \sim \text{Gamma}(\alpha_0, \beta_0)$. Suppose that $d$ denotes the total number of observed events, that is, $d = \sum_{i=1}^{n} \delta_i$; then the joint posterior distribution of $(\gamma, \kappa)$ can be expressed as
$$\pi(\gamma, \kappa \mid \text{Data}) \propto \kappa^{d + \alpha_0 - 1} \exp\left\{-d\kappa\gamma + (\kappa - 1)\sum_{i=1}^{n}\delta_i \log X_i - e^{-\kappa\gamma}\sum_{i=1}^{n} X_i^{\kappa} - \beta_0\kappa - \frac{(\gamma - \mu_0)^2}{2\sigma_0^2}\right\}.$$
Although it is analytically intractable to obtain a closed form for the joint posterior distribution of $(\gamma, \kappa)$, it can be conveniently verified that the conditional posterior distributions $\pi(\gamma \mid \kappa, \text{Data})$ and $\pi(\kappa \mid \gamma, \text{Data})$ are log-concave, and Gibbs sampling can be implemented using statistical software, such as the R2OpenBUGS package available in R (Ibrahim et al. 2001; R Core Team 2017).
When a reliable estimate of the shape parameter is not available, we may consider two approaches to calculate the Bayesian predictive probability of a successful trial outcome, in Eq. (8), using the simulation-based approach in Waleed et al. (2021). The first approach is to update the shape parameter $\kappa$ used at the design stage with the mode, say $\kappa_{mode}$, of the posterior distribution of $\kappa$ generated at the interim. This updated value of $\kappa$ is subsequently used to generate a large number of predicted data sets for the active subjects remaining in the study, and then the fully observed and predicted data are used to calculate our desired quantity, namely $P_k$ of Eq. (8) evaluated at $\kappa = \kappa_{mode}$. The second approach to obtain the predictive probability of a successful trial outcome is to calculate it as a weighted average over the entire posterior distribution $\pi(\kappa \mid \hat{\kappa}_k)$ as:
$$\int_0^{\infty} P_k \, \pi(\kappa \mid \hat{\kappa}_k)\, d\kappa,$$
where $P_k$, viewed as a function of $\kappa$, is defined in Eq. (8).
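To make the joint posterior concrete, the sketch below evaluates its logarithm (up to an additive constant) for right-censored Weibull data with $\theta = \exp\{\gamma\}$, a Normal prior on $\gamma$ and a Gamma (shape, rate) prior on $\kappa$. For sampling it uses a plain random-walk Metropolis step, which is a swapped-in technique chosen to keep the example dependency-free; it is not the Gibbs sampler in OpenBUGS used in the manuscript. All function names, default prior values, and tuning constants are assumptions made for illustration.

```python
import math
import random


def log_posterior(gamma, kappa, times, events,
                  mu0=0.0, sigma0=10.0, alpha0=11.0, beta0=10.0):
    """Log joint posterior of (gamma, kappa), up to an additive constant."""
    if kappa <= 0.0:
        return -math.inf
    d = sum(events)
    sum_event_logs = sum(math.log(t) for t, e in zip(times, events) if e)
    sum_powers = sum(t ** kappa for t in times)
    log_lik = (d * math.log(kappa) - d * kappa * gamma
               + (kappa - 1.0) * sum_event_logs
               - math.exp(-kappa * gamma) * sum_powers)
    log_prior = (-(gamma - mu0) ** 2 / (2.0 * sigma0 ** 2)
                 + (alpha0 - 1.0) * math.log(kappa) - beta0 * kappa)
    return log_lik + log_prior


def metropolis(times, events, n_iter=5000, burn_in=1000, step=0.08, seed=1):
    """Random-walk Metropolis draws of (gamma, kappa); illustrative only."""
    rng = random.Random(seed)
    gamma, kappa = 0.0, 1.0
    current = log_posterior(gamma, kappa, times, events)
    draws = []
    for i in range(n_iter):
        g_new = gamma + rng.gauss(0.0, step)
        k_new = kappa + rng.gauss(0.0, step)
        proposal = log_posterior(g_new, k_new, times, events)
        # Symmetric proposal: accept with probability min(1, ratio).
        if math.log(1.0 - rng.random()) < proposal - current:
            gamma, kappa, current = g_new, k_new, proposal
        if i >= burn_in:
            draws.append((gamma, kappa))
    return draws


# Example: uncensored data generated with gamma = 0.5 and kappa = 1.25.
rng = random.Random(7)
times = [rng.weibullvariate(math.exp(0.5), 1.25) for _ in range(200)]
events = [1] * len(times)
draws = metropolis(times, events)
kappa_draws = [k for _, k in draws]
```

The retained `kappa_draws` play the role of a sample from the marginal posterior of the shape parameter at the interim.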
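The difference between the two approaches can be shown with a few lines of Python. In this sketch, `p_k_given_kappa` is a hypothetical callable standing in for the simulation-based calculation of $P_k$ at a fixed value of $\kappa$, and `draws` is assumed to be a sample from the interim posterior of $\kappa$. The histogram-based mode estimate is a simple choice made for illustration; a kernel density estimate would serve equally well.

```python
def posterior_mode(draws, bins=30):
    """Crude posterior mode: midpoint of the fullest histogram bin."""
    lo, hi = min(draws), max(draws)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for k in draws:
        counts[min(int((k - lo) / width), bins - 1)] += 1
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width


def bpp_plug_in(draws, p_k_given_kappa, bins=30):
    """First approach: evaluate P_k at the posterior mode of kappa."""
    return p_k_given_kappa(posterior_mode(draws, bins))


def bpp_averaged(draws, p_k_given_kappa):
    """Second approach: Monte Carlo average of P_k over posterior draws of kappa."""
    return sum(p_k_given_kappa(k) for k in draws) / len(draws)


# Toy illustration with a made-up, monotone P_k(kappa).
toy_p_k = lambda kappa: min(1.0, kappa / 4.0)
toy_draws = [1.0] * 5 + [2.0] * 2 + [3.0]
plug_in = bpp_plug_in(toy_draws, toy_p_k, bins=4)
averaged = bpp_averaged(toy_draws, toy_p_k)
```

When the posterior of $\kappa$ is skewed or diffuse, the two quantities can differ noticeably, which is the motivation for averaging over the whole posterior rather than plugging in a single value.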

Prior elicitation for the shape parameter
We briefly discuss elicitation of appropriate Gamma(shape $= \alpha_0$, rate $= \beta_0$) priors for the shape parameter $\kappa$. For this purpose, we consider two approaches similar to the ones proposed by Mayo and Gajewski (2004) for eliciting appropriate beta priors in the context of the beta-binomial model. Let $F(\kappa \mid \alpha_0, \beta_0)$ denote the cumulative distribution function of the Gamma prior, and let $W_{100(1-x)}$, determined by the quantiles of $F$, denote the required width of the $100(1-x)\%$ probability interval for the Gamma prior. We may choose $x = 5\%$ or $10\%$, as available from the historical data. Using the best available knowledge about $\kappa$, statisticians may consider the following approaches to elicit prior parameters:
(1) Mode method: Suppose that the best available estimate of the shape parameter, say $\kappa_{prior}$, is considered to be the mode of the prior distribution. For a specified width $W_{100(1-x)}$, the unknown parameters $\alpha_0$ and $\beta_0$ can be obtained by simultaneously solving the mode equation $(\alpha_0 - 1)/\beta_0 = \kappa_{prior}$ and the requirement that the $100(1-x)\%$ probability interval of the prior has width $W_{100(1-x)}$.
(2) Mean method: This approach differs from the first approach in the sense that it assumes $\kappa_{prior}$ to be the mean of the prior distribution. Therefore, the unknown parameters $\alpha_0$ and $\beta_0$ can be obtained by simultaneously solving the mean equation $\alpha_0/\beta_0 = \kappa_{prior}$ and the same width requirement.
It must be noted that small values of the specified width $W_{100(1-x)}$ yield a more informative prior for the shape parameter. Since closed form solutions are not possible using either approach, we need to implement numerical methods to obtain the values of $\alpha_0$ and $\beta_0$.
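A numerical solution of either system can be sketched as follows. This Python fragment assumes that the probability interval is the equal-tailed one, so that its width is $F^{-1}(1 - x/2) - F^{-1}(x/2)$; that choice, the function names, and the search bounds are assumptions made for illustration. The Gamma cdf is computed from the standard series for the regularized incomplete gamma function, and both the quantiles and the prior shape $\alpha_0$ are found by bisection.

```python
import math


def gamma_cdf(x, shape, rate):
    """Gamma(shape, rate) cdf via the series for the regularized incomplete gamma."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)


def gamma_quantile(p, shape, rate):
    """Quantile by bisection on the cdf."""
    lo, hi = 0.0, (shape + 10.0 * math.sqrt(shape) + 10.0) / rate
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interval_width(shape, rate, x=0.10):
    """Width of the equal-tailed 100(1 - x)% interval (an assumed convention)."""
    return gamma_quantile(1.0 - x / 2.0, shape, rate) - gamma_quantile(x / 2.0, shape, rate)


def elicit_gamma_prior(kappa_prior, width, x=0.10, method="mode"):
    """Solve for (alpha_0, beta_0) so the prior has the given mode or mean and width."""
    def rate_for(shape):
        if method == "mode":
            return (shape - 1.0) / kappa_prior   # mode = (alpha_0 - 1) / beta_0
        return shape / kappa_prior               # mean = alpha_0 / beta_0

    lo = 1.0 + 1e-3 if method == "mode" else 0.05
    hi = 2000.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if interval_width(mid, rate_for(mid), x) > width:
            lo = mid   # prior too diffuse: a larger shape is needed
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    return shape, rate_for(shape)


# Round trip: start from a Gamma(11, 10) prior, which has unit mode.
w = interval_width(11.0, 10.0, x=0.10)
alpha0, beta0 = elicit_gamma_prior(1.0, w, x=0.10, method="mode")
```

The round trip at the end recovers a prior close to Gamma(11, 10), the unit-mode prior used later in the examples, which serves as a basic consistency check of the solver.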

Simulations and examples
We present some simulation studies and examples to demonstrate the methods discussed in this paper. Statistical software R (Version 3.6.3) was used to perform all computations and simulations presented. Due to their computationally intensive nature, Bayesian predictive probability calculations were done using the high-performance computing (HPC) facilities operated by the Center for Research Computing at the University of Kansas.

Sample size reestimation using the IPS approach
In recent times, PFS and OS are being used as the primary endpoints of interest in early phase oncology trials, such as those related to lung cancer, and an improvement in median PFS or OS by a factor ranging from 1.25 to 3 has been reported where the standard control arm had PFS/OS values ranging from 6 to 24 months (see Ajimizu et al. 2021 as an example). Suppose that a fixed sample study is being designed, using the method in Section 3.2, to test the following alternatives about the median survival time: $H_0: M \le 1$ year vs. $H_1: M > 1.5$ years, with nominal Type-I error rate and power set to be 5% and 90%, respectively. Suppose that the enrollment period will span a total of 3 years, that is, $\omega = 3$ years. For this example, varying values of the administrative censoring time ($\tau = 7, 9$ years), loss to follow-up rate ($\upsilon = 0\%, 10\%, 20\%, 30\%$), and power parameter of the accrual distribution ($\varphi = 0.1$ for early, $1.0$ for uniform, and $5.0$ for late accrual) were considered. In the absence of a reliably accurate estimate of the shape parameter, suppose that the fixed sample study was originally designed assuming exponential survival times, that is, $\kappa = 1$, and the corresponding sample sizes for all scenarios are reported in Table 1.
To examine the performance of the asymptotic test statistic by Waleed et al. (2021) in terms of empirical Type-I error and power, a total of 10,000 simulations were performed after computing the required sample sizes under the assumption of exponential survival times. We note that the Type-I error rate remains preserved even if the shape parameter was misspecified (i.e., true value of the shape parameter $\kappa = 0.75$ or $1.25$) at the design stage. On the other hand, empirical power is substantially affected by the misspecification of the shape parameter. More specifically, the fixed sample study tends to be under-powered (over-powered) if the true $\kappa$ was in fact smaller (larger) than one.
To address this issue, suppose that the researchers plan to adjust the sample size using the IPS approach of Wittes and Brittain (1990), outlined in Section 3.3, after a proportion of the $n$ enrolled subjects, say $p_E$, have experienced an event. For the sake of demonstration, we assume $p_E = 30\%$ and $50\%$, and study the properties of our design in terms of expected sample size, empirical Type-I error rate and power. The corresponding results based on a total of 10,000 simulations are reported in Table 2. We summarize our findings as below:
(1) When the desired proportion of events needed for interim analysis is relatively smaller (i.e., $p_E = 30\%$) and participants enroll very early or very late during the accrual period (i.e., $\varphi = 0.1$ or $5.0$), we observe that the mean calendar time of interim analysis, denoted by $E(T)$, increases with an increase in the shape parameter $\kappa$. Otherwise, for a fixed combination of $\varphi$, $\upsilon$, $\tau$ and $p_E$, we observe that $E(T)$ generally decreases with an increase in $\kappa$. As expected, we also note that the values of $E(T)$ corresponding to larger values of $\upsilon$ or $p_E$ are generally greater than those corresponding to smaller values of these parameters.
(2) If the true shape parameter was equal to 0.75 (decreasing hazard), the expected sample size obtained using the IPS approach is almost twice the sample size needed for a fixed study design. Such a large increase in the expected sample size is likely to put substantial practical constraints on carrying out the remainder of the study. We note that the expected sample size is also greater than the original fixed sample size $n$ even if the assumption of exponential survival times ($\kappa = 1$) holds true. This increase in the expected sample size, however, is much more achievable from a practical perspective. It is worth pointing out that this increase occurred because the estimated value $\hat{\kappa}$ of $\kappa$ at the interim stage for some of the simulation runs was less than $\kappa = 1$, which subsequently led to $n_{new} > n$ and hence an expected sample size greater than $n$. Finally, we observe a minimal increase in the expected sample size when the true shape parameter was 1.25. As one might anticipate, the expected sample size with $p_E = 50\%$ is slightly smaller than that for $p_E = 30\%$.
(3) As expected for the IPS approach (Friede and Kieser 2006; Wittes and Brittain 1990), the empirical Type-I error rate tends to be slightly inflated in almost all of the scenarios. In our context, this inflation in the Type-I error rate is more pronounced when subjects are accrued very late in the enrollment period (that is, $\varphi = 5.0$), where it can be as much as twice the desired level in some cases.
(4) The desired threshold for the empirical power is achieved with the adjustment of study sample size using the IPS approach. When the true shape parameter was 0.75, the study tends to be over-powered due to very large expected sample sizes.
(5) For a majority of scenarios, the empirical Type-I error rate and power corresponding to $p_E = 50\%$ are slightly larger than those for $p_E = 30\%$. A possible explanation is that, as we implement the IPS approach after observing more events, we get a smaller but more accurate estimate of the variance, which results in a slightly greater number of rejections under the null and alternative hypotheses.

[Table 2: results for testing $H_0: M \le 1$ year vs. $H_1: M > 1.5$ years, when sample size reestimation is done using the IPS approach after $p_E\%$ of the $n$ enrolled subjects experienced the event of interest.]

Note: The fixed sample study was initially designed assuming exponential survival times ($\kappa = 1$). In this simulation study, we used nominal Type-I error = 5%, nominal power = 90%, maximum accrual time $\omega = 3$ years, administrative censoring time $\tau = 7, 9$ (in years), and varying values of the power parameter $\varphi$ and random loss to follow-up rate $\upsilon$. Results are based on a total of 10,000 simulations.
We also investigated our design properties when the IPS approach is implemented after a proportion of the $n$ enrolled subjects, say $p_O$, is fully observed. That is, $p_O$ includes all those subjects who have either experienced an event or were lost to follow-up. The corresponding results are presented in Table 3. As one might expect, the mean calendar times of interim analysis in this case are relatively smaller than those observed when interim analysis is performed upon achieving a certain proportion of events $p_E$. In addition, we note that the expected sample sizes are slightly larger in this case, but the overall trends in the empirical Type-I error rate and power are very similar to those observed using $p_E$.

Calculation of Bayesian predictive probability for hypothetical single-arm trials
Suppose that a fixed sample study is designed to evaluate whether an experimental treatment yields an improvement in the median PFS time (in months). More specifically, suppose that researchers are interested in testing the hypotheses $H_0: M \le 2.50$ months vs. $H_1: M > 3.75$ months with nominal Type-I error rate and power set to be 5% and 80%, respectively. For this hypothetical example, investigators anticipate a non-uniform accrual pattern ($\varphi = 1.25$) with maximum enrollment time $\omega = 3$ months, and administrative censoring time $\tau = 12$ months. The expected loss to follow-up rate is $\upsilon = 15\%$. Assuming exponentially distributed survival times, $n = 50$ subjects are enrolled to conduct this fixed sample study with the given characteristics.
On the recommendation of the DSMB overseeing this study, suppose that an interim analysis based on Bayesian predictive probability is to be conducted 3 months after the conclusion of the accrual period (that is, at a calendar time of 6 months). Suppose that we use threshold level $\eta^* = 95\%$, and the following decision rules for the Bayesian predictive probability $P_k$: conclude efficacy (futility) if $P_k \ge 0.80$ ($P_k \le 0.20$), or decide to continue the trial if $P_k \in (0.20, 0.80)$. For the sake of demonstration, we generated a sample of size $n = 50$ from each of the Weibull distributions with different underlying median parameter ($M = 2.00, 2.50, 3.25, 3.75, 4.50$) and true shape parameter ($\kappa = 0.50, 1.00, 1.50$). For this example, a non-informative normal prior was used for $\gamma$, and $\kappa \sim \text{Gamma}(11, 10)$, which has a unit mode. For calculating BPP using the simulation-based approach, a total of 1000 predicted data sets were generated for active subjects at the interim, and the corresponding results are summarized in Table 4. We make the following observations:
(1) As anticipated, the number of observed events at the interim analysis, denoted by $n_{e,k}$, is the largest for the data generated assuming the smallest median survival time (2.00 months), and it decreases as the assumed median survival time increases. The converse holds true for the number of active subjects $n_{a,k}$. The number of censored subjects, $n_{c,k}$, exceeds what the expected loss to follow-up rate $\upsilon = 15\%$ alone would imply because it also includes those who were administratively censored by the interim stage.
(2) In all cases, the data dominates the assumed prior of κ in the sense that the mode of the posterior distribution is pulled towards the true value of κ used for the underlying data distributions.
(3) When the true shape parameter is 0.50 (decreasing hazard), our results suggest that the trial can be stopped for futility for each of the 5 data sets. On the other hand, if the true shape parameter is 1.00 (constant hazard) or 1.50 (increasing hazard), the trial can be (i) terminated to conclude futility (efficacy) for the data generated from distributions with underlying median $M \le 2.50$ ($M = 4.50$), and (ii) continued to its pre-planned end in the remaining cases.
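The interim decision rules stated above can be sketched as a small function. The thresholds 0.20 and 0.80 are those given in the text; the function name, the returned labels, and the example inputs are illustrative assumptions.

```python
def interim_decision(bpp, lower=0.20, upper=0.80):
    """Map a Bayesian predictive probability to an interim action.

    Efficacy if bpp >= upper, futility if bpp <= lower,
    otherwise continue the trial to its pre-planned end.
    """
    if not 0.0 <= bpp <= 1.0:
        raise ValueError("bpp must lie in [0, 1]")
    if bpp >= upper:
        return "efficacy"
    if bpp <= lower:
        return "futility"
    return "continue"

# Illustrative inputs only; these are not values from Table 4.
decisions = [interim_decision(p) for p in (0.05, 0.50, 0.95)]
```

Keeping the thresholds as arguments makes it easy to examine how sensitive the interim conclusion is to the chosen cut-offs.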
For the hypothetical example discussed above, suppose that an interim analysis based on Bayesian predictive probability was to be conducted at a calendar time of 9 months (instead of 6 months).
The corresponding results are presented in Table S1 of the online supplementary materials. As expected, the general trend in the number of observed events ($n_{e,k}$), number of active subjects ($n_{a,k}$), and number of censored subjects ($n_{c,k}$) is similar to the one observed earlier. Compared with the interim analysis conducted at 6 months, we observe a higher number of observed events and a lower number of active subjects when the interim analysis is conducted at 9 months. We also observe a similar trend in the values of the Bayesian predictive probabilities as observed at 6 months.

$M > 1.5$ years, when sample size reestimation is done using the IPS approach after $p_O\%$ of the $n$ enrolled subjects have either experienced the event of interest or were censored.

Note: The fixed sample study was initially designed assuming exponential survival times ($\kappa = 1$). In this simulation study, we used nominal Type-I error = 5%, nominal power = 90%, maximum accrual time $\omega = 3$ years, administrative censoring time $\tau = 7, 9$ (in years), and varying values of the power parameter $\varphi$ and random loss to follow-up rate $\upsilon$. Results are based on a total of 10,000 simulations.
We considered another hypothetical study in which investigators are interested in testing the hypotheses $H_0: M \le 1$ year vs. $H_1: M > 1.5$ years at a 5% level of significance with 90% power. For this example, we assume a uniform accrual pattern ($\varphi = 1.0$) during the accrual phase spanning $\omega = 3$ years, and a follow-up period of 9 years (i.e., $\tau = 12$ years). Assuming exponential survival times ($\kappa = 1$), the sample size needed for the fixed study design is $n = 62$ subjects when the expected rate of loss to follow-up is $\upsilon = 15\%$. Suppose that an interim analysis using BPP is requested at a calendar time of 4 years. We generated a sample of size $n = 62$ subjects from each of the Weibull distributions with different underlying median parameter ($M = 0.75, 1.00, 1.25, 1.50, 2.00$) and true shape parameter ($\kappa = 0.75, 1.00, 1.25$). We consider the same threshold $\eta^{*}$ and decision rules as used in the previous example. The corresponding results are presented in Table 5 and summarized below.
(1) Comparing these results to our previous example, we can make similar observations regarding subject status, the mode of $\pi(\kappa \mid \gamma, \hat{\kappa}_k)$, and BPP as stated earlier.
(2) Irrespective of the true shape parameter $\kappa$, the predictive probability of a positive trial outcome calculated using either approach is close to 1 for the data sets generated with underlying median equal to 2.00 years, which suggests concluding efficacy of the experimental treatment. We can draw appropriate conclusions for the remaining scenarios in accordance with the decision rules specified above.
It is worth noting that there are some circumstances in which the two approaches lead to different conclusions at the interim stage. As an example, for the data simulated with an underlying median of 0.75 years and true shape parameter of 1.25, the BPP calculated using the mode and the entire posterior distribution of $\kappa$ is 0.0406 (conclude futility) and 0.3012 (continue trial), respectively. Since the mode of $\pi(\kappa \mid \gamma, \hat{\kappa}_k)$ is 1.3268, suggesting increasing hazard (that is, shorter survival times), it would be reasonable to take a rather conservative approach if ethically permissible, and decide to continue the trial instead of terminating it at the interim. This example demonstrates that it is vitally important to consider all relevant factors before reaching a final conclusion.
(3) In general, the predictive probability calculated by incorporating the entire conditional posterior distribution $\pi(\kappa \mid \gamma, \hat{\kappa}_k)$ tends to be greater than the one computed using its mode. However, in the results reported in Table 5, we observe some scenarios where the BPP calculated using the mode-based approach is larger. For instance, for the data generated assuming the true $\kappa$ to be 1.25 and underlying median equal to 1.25 years, the BPP is 0.9243 and 0.8364 for the mode-based and full-distribution approaches, respectively. This discrepancy can be explained by the fact that the BPP corresponding to some $\kappa$ greater than the mode (1.3282) of $\pi(\kappa \mid \gamma, \hat{\kappa}_k)$ was smaller than 0.9243 (the BPP corresponding to the posterior mode). As a consequence, a smaller value of the BPP was obtained when a weighted average of the predictive probabilities was computed over the entire posterior distribution of $\kappa$. Since these calculations were conducted by simulating only 1000 predicted data sets for the active subjects at the interim, it is recommended to generate a larger number of predicted data sets to minimize the effect of such randomness on the results. Even though the BPP values under the two approaches are different, they qualitatively suggest the same decision of stopping the trial due to a high predictive probability of a successful trial outcome.
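The averaging argument in observation (3) can be illustrated with a small numerical sketch. The full-distribution BPP is treated here as a posterior-weighted average of the predictive probabilities over a grid of $\kappa$ values, so it can fall below the mode-based value whenever the predictive probabilities at other plausible values of $\kappa$ are lower. The grid, weights, and probabilities below are hypothetical and chosen only to show the mechanics; they are not taken from Table 5.

```python
def averaged_bpp(bpp_by_kappa, weights):
    """Posterior-weighted average of predictive probabilities over a kappa grid."""
    if len(bpp_by_kappa) != len(weights):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    return sum(p * w for p, w in zip(bpp_by_kappa, weights)) / total

# All three lists are hypothetical illustrations.
kappa_grid = [1.1, 1.2, 1.3, 1.4, 1.5]
posterior_weights = [0.10, 0.25, 0.30, 0.25, 0.10]   # largest weight at 1.3
bpp_given_kappa = [0.97, 0.95, 0.92, 0.80, 0.60]     # drops for larger kappa

# Mode-based approach: use only the grid point with the largest weight.
mode_index = max(range(len(posterior_weights)),
                 key=posterior_weights.__getitem__)
bpp_mode = bpp_given_kappa[mode_index]

# Full-distribution approach: average over the whole grid.
bpp_full = averaged_bpp(bpp_given_kappa, posterior_weights)
```

In this toy setting the averaged value is below the mode-based value because the predictive probabilities decline for $\kappa$ above the mode, which mirrors the explanation given in the text.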
We also calculated Bayesian predictive probabilities for an interim analysis conducted at a calendar time of 6 years (instead of 4 years), and the results are summarized in Table S2 of the online supplementary materials.
In both examples, we assumed $\kappa \sim \mathrm{Gamma}(11, 10)$, which has a unit mode and an equal-tailed 95% probability interval of width 1.2899. We also studied the impact of different priors for $\kappa$ on our calculations. For this purpose, we consider five different gamma priors with a unit mode and with the width of the equal-tailed 95% probability interval set to 0.1, 0.5, 1.0, 2.0, and 5.0. These priors are denoted by $P_i$ ($i = 1, \ldots, 5$), where $P_1$ is the most informative gamma prior, having the smallest width (= 0.1), and $P_5$ is the least informative gamma prior, having the largest width (= 5.0). The parameters for the

Note: For this study, we assume a uniform accrual pattern ($\varphi = 1.00$), maximum accrual time $\omega = 3$ years, administrative censoring time $\tau = 12$ years, and loss to follow-up rate $\upsilon = 15\%$. The fixed sample study was designed assuming exponential survival times, and $n = 62$ subjects were enrolled during the accrual phase. Bayesian predictive probabilities were calculated based on a total of 1000 simulated data sets.
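As a hedged sketch of how unit-mode gamma priors with a prescribed interval width might be obtained, the code below assumes the shape-rate parametrization, under which the mode is (shape − 1)/rate, so a unit mode forces rate = shape − 1 (consistent with Gamma(11, 10) above). It then searches for the shape whose equal-tailed 95% interval has the requested width. The function names, the search bracket, and the assumption that the width decreases monotonically in the shape are ours, not the manuscript's; only the standard library is used, so the gamma CDF is evaluated through its power series.

```python
import math

def gamma_cdf(t, shape, rate):
    """P(T <= t) for T ~ Gamma(shape, rate), via the power series of the
    regularized lower incomplete gamma function."""
    x = rate * t
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (shape + n)
        total += term
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))

def gamma_quantile(q, shape, rate):
    """Invert the CDF by bisection. By Cantelli's inequality, mean + 10 sd
    lies above the 97.5% quantile, so it is a valid upper bracket."""
    lo, hi = 0.0, (shape + 10.0 * math.sqrt(shape)) / rate
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def interval_width(shape):
    """Width of the equal-tailed 95% interval of a unit-mode gamma prior."""
    rate = shape - 1.0  # unit mode: (shape - 1) / rate = 1
    return gamma_quantile(0.975, shape, rate) - gamma_quantile(0.025, shape, rate)

def unit_mode_gamma(width, lo=1.01, hi=5000.0):
    """Return (shape, rate) with unit mode and the requested interval width.
    Assumes the width shrinks monotonically as the shape grows."""
    for _ in range(50):
        mid = math.sqrt(lo * hi)  # bisection on the log scale
        if interval_width(mid) > width:
            lo = mid
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, shape - 1.0

check_width = interval_width(11.0)   # compare with the 1.2899 quoted above
shape, rate = unit_mode_gamma(1.0)   # prior with interval width 1.0
```

Calling `unit_mode_gamma` with each of the five widths listed above would give candidate parameters for the priors $P_1, \ldots, P_5$ under these assumptions; the smallest width takes noticeably longer because the series needs more terms for large shapes.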

Discussion
The fixed sample method, proposed by Waleed et al. (2021), for designing single-arm phase II clinical trials with TTE endpoints assumes that a reliable estimate of the shape parameter is known from some historical studies. Recently, Phadnis et al. (2020) demonstrated that a reasonable estimate of the shape parameter can be obtained from historical studies with at least 50 subjects and a censoring rate close to 20%. There are real-life situations, such as studies involving rare diseases, where adequate historical data may not be available for obtaining an estimate of the shape parameter. When no prior information about the shape parameter is available, we could design fixed sample studies assuming exponential survival times, as is done with the traditional methods currently available, and subsequently consider an adjustment to the study sample size using the data accumulated at some pre-specified stage. In this manuscript, we explored the utility of the IPS approach, proposed by Wittes and Brittain (1990), for sample size reestimation at an interim stage. It was demonstrated that the power of the study is indeed rescued in our context. We noted that the adjusted sample size using the IPS approach can be more than twice the initially planned sample size if the shape parameter is grossly misspecified at the design stage, and this may impose serious practical constraints on continuing the remainder of the study. In the future, it would be of interest to compare different sample size reestimation procedures, such as those based on conditional power or Bayesian predictive probability.
Phase II single-arm clinical trials with PFS as an endpoint are being conducted with increasing frequency. For instance, PFS is being used as an endpoint in a phase II single-arm study conducted to evaluate the safety and efficacy of brentuximab vedotin as consolidation therapy after autologous stem cell transplant (ASCT) in participants with cluster of differentiation antigen 30 (CD30) positive expressing peripheral T-cell lymphomas (PTCLs). See Hoffmann (2021) for more details about this study. For this trial, the method proposed by Phadnis (2019) was used to calculate the sample size based on the hypothesis that an improvement will be observed in PFS in comparison to the historical control. Future successful execution of similarly designed phase II single-arm clinical trials would benefit from futility testing at an interim time point over the course of the trial. In this context, stochastic curtailment methods such as conditional power, predictive power, and Bayesian predictive probability would be beneficial for such designs (Jennison and Turnbull 2000).
In this manuscript, we also discussed Bayesian predictive probability as a useful method for conducting interim analysis for single-arm phase II clinical trials with TTE endpoints following a Weibull distribution with unknown shape parameter. Based on the data accumulated at the interim stage, we propose to generate posterior distributions for both parameters of the Weibull distribution. The predictive probability of a successful trial outcome can be calculated by using either the posterior mode or the entire posterior distribution of the shape parameter. Although we observed that the BPP values calculated using the two approaches tend to differ quantitatively, they yield the same qualitative conclusion at the interim stage in most scenarios. It is worth pointing out that the mode-based approach may not be appropriate in some circumstances, for instance when the posterior of the shape parameter is relatively flat or has heavy tails. Therefore, to appropriately account for uncertainty in the shape parameter, it is recommended to incorporate its entire posterior distribution in our calculations.
Bayesian predictive probabilities are advantageous in the sense that they can be interpreted easily, and they can be incorporated in fixed sample designs in a post-hoc manner without an explicit adjustment for repeated significance testing (Dmitrienko and Koch 2017; Dmitrienko and Wang 2006; Saville et al. 2014). Although Bayesian predictive probability can be utilized to evaluate either efficacy or futility at an interim stage, we feel it is important to mention that early stopping for efficacy in early phase studies should be considered with utmost care. This is because phase III studies are often designed based on the estimated effect size from phase II studies, and early stopping for efficacy in a phase II study may not necessarily yield an accurate estimate of the effect size that can be used for a subsequent phase III study design.
In this manuscript, we have used a gamma prior for the shape parameter, as suggested by Ibrahim et al. (2001) in the context of the Weibull model. A statistician should utilize any available historical data, and work closely with the clinicians to identify appropriate priors applicable in their area of research. Interested readers can find a discussion on the choice of suitable weak or aggressive priors in Dmitrienko and Wang (2006).
In comparison to other stochastic curtailment (SC) methods such as conditional power and predictive power, a limitation of Bayesian predictive probabilities is that they require much more computationally intensive calculations due to repeated sampling of the predicted survival data, for the active subjects at the interim, from the posterior predictive distribution (Saville et al. 2014). These calculations are becoming increasingly manageable with the advent of sophisticated high-performance computing capabilities, and therefore Bayesian predictive probabilities can be utilized to better inform decision-making at an interim stage.
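To give a rough sense of the trade-off between computational effort and simulation noise, the sketch below treats a simulation-based BPP as a proportion estimated from independent predicted data sets, so that its Monte Carlo standard error is approximately $\sqrt{p(1-p)/N}$. This binomial approximation and the function names are our assumptions for illustration; the manuscript does not report Monte Carlo errors.

```python
import math

def mc_standard_error(p, n_sims):
    """Approximate Monte Carlo standard error of a proportion p
    estimated from n_sims independent predicted data sets."""
    return math.sqrt(p * (1.0 - p) / n_sims)

def sims_for_target_se(target_se, p=0.5):
    """Smallest number of predicted data sets giving a standard error
    of at most target_se; p = 0.5 is the worst case."""
    return math.ceil(p * (1.0 - p) / target_se ** 2)

se_1000 = mc_standard_error(0.5, 1000)   # worst-case error with 1000 data sets
n_needed = sims_for_target_se(0.005)     # data sets needed for an error of 0.005
```

Because the standard error shrinks only with the square root of the number of predicted data sets, halving it requires roughly four times as many simulations, which is where the computational burden noted above comes from.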

Table 1. Effect of misspecification of the shape parameter $\kappa$ on the empirical Type-I error and power for the MLE-based test for a fixed sample study, to test the hypotheses $H_0: M \le 1$ year vs. $H_1: M > 1.5$ years, which was designed assuming exponential survival times ($\kappa = 1$) but the true $\kappa = 0.75$ (decreasing hazard) or $1.25$ (increasing hazard).
Note: In this simulation study, we used nominal Type-I error = 5%, nominal power = 90%, maximum accrual time $\omega = 3$ years, administrative censoring time $\tau = 7, 9$ (in years), and varying values of the power parameter $\varphi$ and random loss to follow-up rate $\upsilon$. Results are based on a total of 10,000 simulations.

Table 2. Properties of a study designed to test $H_1$:

Table 3. Properties of a study designed to test $H_1$:

Table 4. Comparison of the Bayesian predictive probability calculated at a look time of 6 months for hypothetical data sets simulated using varying values of the median survival time and true shape parameter $\kappa$ for a study designed to test the hypotheses $H_0: M \le 2.50$ months vs. $H_1: M > 3.75$ months.
Note: For this study, we assume a non-uniform accrual pattern ($\varphi = 1.25$), maximum accrual time $\omega = 3$ months, administrative censoring time $\tau = 12$ months, and loss to follow-up rate $\upsilon = 15\%$. The fixed sample study was designed assuming exponential survival times, and $n = 50$ subjects were enrolled during the accrual phase. Bayesian predictive probabilities were calculated based on a total of 1000 simulated data sets.

Table 5. Comparison of the Bayesian predictive probability calculated at a look time of 4 years for hypothetical data sets simulated using varying values of the median survival time and true shape parameter $\kappa$ for a study designed to test the hypotheses $H_0: M \le 1.00$ year vs. $H_1: M > 1.50$ years.