Sample Size Calculation and Timing of Dose Selection in a Multiple-Dose Clinical Trial

ABSTRACT In a two-stage design for a multiple-dose clinical trial, the dose with the highest observed response rate is often chosen to carry to the next stage and then compared against the control group. The observed response rate can overestimate the true response rate of the selected dose. If it is used to compare with the response rate of the control group without any adjustment, the Type I error of this statistical inference will not be controlled. In this article, the Stepwise Overcorrection method, which was shown to control the Type I error, is used to derive a formula to calculate the sample size needed for a two-stage multiple-dose design. A maximum-minimum strategy is proposed to ensure that the sample size for each group will provide the desired statistical power for detecting the difference in response rates between the selected dose and the control group, assuming at least one of the experimental groups has the desired treatment effect versus the control group.


Introduction
Adaptive Phase III clinical trial designs with a dose selection step have been investigated by many researchers. How to select the right dose and how to control the Type I error of the comparison between the selected dose and the control group are two major concerns when using this kind of adaptive trial. Whitehead (1985) applied Bayesian methods to the problem of comparing several potential new treatments with a control group. He provided an optimal strategy to maximize the probability of identifying an effective treatment, compared with the standard approach of separate Phase II and Phase III studies (Whitehead 1986). Dunnett (1984) used decision theory to obtain an optimal sample size for a fixed significance level and power. Thall, Simon, and Ellenberg (1989) further extended Dunnett's approach by allowing early termination of a less promising dose. They proposed a two-stage design in which patients are first randomized among the experimental treatments to identify the treatment with the highest observed success rate. If this highest rate falls below a fixed cutoff, the trial is terminated. Otherwise, the "best" treatment is compared to the control at a second stage. However, the true drug effect of the selected dose might be overestimated by this "pick the winner" approach when there are multiple doses to select from: the highest observed success rate is a biased estimate of the true drug effect of the selected dose. Shen (2001) quantified this bias and derived a method called stepwise over-correction (SOC) to adjust for the bias and to construct the confidence interval of the difference between the selected dose group and the control group. Shen's two-stage approach selects a dose group by picking the winner and incorporates the information from the dose selection stage into the confirmatory stage for the statistical inference against the control group.
The SOC method was shown to have better statistical power than the Bonferroni method when there are one or more inferior doses.
Besides Shen (2001), many other approaches have been proposed for the adaptive design of seamless Phase II/III studies. Two major methods are the combination test of two-stage p-values by Bauer (1989) and Bauer and Köhne (1994) and the conditional error function approach by Proschan and Hunsberger (1995), both developed for an adaptive trial with only one experimental group and one control group. Posch et al. (2005) used the combination test of two-stage p-values at the dose selection stage and the confirmatory stage and applied the closed testing procedure to control the Type I error of multiple dose tests against the control group. The closed testing procedure needs to reject all intersection hypotheses that include the selected dose in the null hypothesis. The method of Posch et al. is flexible in how a dose is chosen and does not require the pick-the-winner rule for dose selection.
Although the methods described above control the Type I error and give us similar statistical power, there is no proper sample size formula for a two-stage multiple-dose clinical trial in which, at the design stage, we know only a range into which the response rates of each dose group and the control group may fall. The sample size per group needs to ensure that the statistical test comparing the selected dose group and the control group has the desired power, if we assume at least one dose group has the assumed treatment effect over the control group. In Section 2, the stepwise over-correction method is used to derive a formula for the sample size needed in a two-stage multiple-dose design when we assume that the response rates of the experimental groups and the control group are fixed. In Section 3, we extend this to scenarios in which the response rates of the experimental groups vary in a range, but at least one experimental group, although we do not know which one, has a response rate superior to that of the control group by the desired amount. In addition, dose selection can take place early, late, or in the middle of the trial. Our goal is to find the smallest sample size that can provide the desired power for the comparison of the selected dose group with the control group among these scenarios. A maximum-minimum strategy is proposed to ensure that we obtain such an optimal sample size. The results of Sections 2 and 3 are extended to clinical trials with more than two doses in Section 4. In Section 5, the power of the adaptive Phase II/III trial design described in Sections 2 and 3 is compared with the overall power of separate Phase II and Phase III trials, in which the data from the Phase II trial cannot be combined into the analysis of the Phase III trial.
An example illustrates that, under the same total sample size obtained from the sample size formula and the maximum-minimum strategy of Sections 2 and 3, the power of the adaptive Phase II/III design analyzed by the SOC method is higher than that of the traditional design with separate Phase II and Phase III trials. Discussion and conclusions can be found in the final section.

Sample Size Calculation When the Response Rates Are Fixed at the Potential Values
We start with a clinical trial with two dose groups and one control group to derive the sample size formula by the SOC method. Assume that two doses of a test drug are to be compared to the control group. The true response rates of dose 1, dose 2, and the control are p_1, p_2, and p_c, respectively. In this section, we assume the response rates of dose 1 and dose 2 could each be one of two potential values, but we do not know which dose corresponds to which value. Without loss of generality, if we set p_1 > p_2 ≥ p_c, we assume that one dose is better than the control group and that neither dose is worse than the control group. We depend on the observed response rates to decide which dose we believe has the higher response rate. Let H_1: p_1 = p_c and H_2: p_2 = p_c denote the corresponding null hypotheses. The challenge is how to reject either hypothesis using the whole Type I error α without a pre-specified testing order. Shen (2001) proposed a pick-the-winner approach to select the dose and provided a formula to estimate the bias of the maximum observed response rate relative to the true response rate of the selected dose. Let p̂_1 and p̂_2 be the proportions of successes in dose 1 and dose 2, respectively. We denote by S a random variable giving the index of the selected dose group. Let p_S be the underlying true response rate of the selected dose, with mean E(p_S) = P(p̂_1 ≥ p̂_2) p_1 + P(p̂_1 < p̂_2) p_2. One candidate estimate of p_S is p̂_S = max(p̂_1, p̂_2); however, this is a biased estimator. We estimate var(p̂_i) by p̂_i(1 − p̂_i)/N_i and hence assume that we know the true variances, written σ²_1 and σ²_2, where N_i is the sample size, i = 1 or 2. Then p̂_1 ~ N(p_1, σ²_1) and p̂_2 ~ N(p_2, σ²_2) hold approximately for large N_i p_i(1 − p_i). The limitation of the normal approximation of p̂_1 and p̂_2 when p_1 and p_2 are very close to 0 or 1 will be discussed in Section 6.
Shen (2001) showed that p̂_S is a biased estimator of p_S by investigating the distribution of p̂_S − p_S, where both p̂_S and p_S are random variables. The bias can be estimated based on the expectation of p̂_S − p_S. The first and second moments of p̂_S − p_S have a straightforward mathematical expression given in Shen (2001), where φ(.) and Φ(.) are the probability density function and cumulative distribution function of the standard normal distribution, respectively. Therefore, Shen (2001) proposed a method called stepwise over-correction (SOC) that uses p̂_S − b̂_12(γ) to estimate p_S and correct the bias of p̂_S, where b̂_12(γ) is an estimate of the bias with a tuning parameter γ that determines the width of each step of the bias correction; γ = 2 is recommended per Shen (2001) to keep the Type I error rate of the hypothesis test in his article just below 0.025.
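Under the normal approximation above, the expected bias E(p̂_S − p_S) has a simple closed form, σφ((p_1 − p_2)/σ) with σ² = σ²_1 + σ²_2; Shen's b̂_12(γ) is a stepwise estimate of this quantity. The sketch below (an illustration, not Shen's estimator itself) checks the closed form against a Monte Carlo draw from the approximating normals:

```python
import math
import random

def norm_pdf(x):
    # standard normal density phi(x)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def max_selection_bias(p1, p2, n):
    """Closed-form E[p_S_hat - p_S] under the normal approximation:
    sigma * phi((p1 - p2) / sigma), sigma^2 = var(p1_hat) + var(p2_hat)."""
    s = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return s * norm_pdf((p1 - p2) / s)

# Monte Carlo check: draw (p1_hat, p2_hat) from the approximating normals,
# pick the winner, and average p_S_hat - p_S over many replications.
random.seed(1)
p1, p2, n = 0.85, 0.80, 100
s1 = math.sqrt(p1 * (1 - p1) / n)
s2 = math.sqrt(p2 * (1 - p2) / n)
trials = 200_000
acc = 0.0
for _ in range(trials):
    x1 = random.gauss(p1, s1)
    x2 = random.gauss(p2, s2)
    acc += (x1 - p1) if x1 >= x2 else (x2 - p2)  # p_S_hat - p_S
mc_bias = acc / trials

print(round(max_selection_bias(p1, p2, n), 4), round(mc_bias, 4))
```

The bias is always positive, which is why the unadjusted maximum overstates the selected dose's response rate.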
The estimate of var(p̂_S − p_S) can be obtained by substituting p̂_1 for p_1 and p̂_2 for p_2.
Obviously, dose selection does not have to wait until all patients finish the trial. In a two-stage design, we assume that a fraction of the N patients (say, ηN patients) in each group will be used for dose selection. Using the data from the dose selection stage to obtain the estimated response rates p̂_1 and p̂_2 of the two dose groups, the stepwise over-correction estimate of the bias of p̂_S − p_S is still denoted by b̂_12(γ), with the variances σ̂²_1 and σ̂²_2 given by p̂_1(1 − p̂_1)/(ηN) and p̂_2(1 − p̂_2)/(ηN), respectively. The variance of p̂_S − p_S is denoted by var_bef as given by formula (2), and its estimate is denoted by v̂ar_bef. After dose selection, (1 − η)N patients per group are enrolled in the selected dose group and the control group. We then calculate p̂_aft, the proportion of successes in the selected dose group from the dose selection to the end of the clinical trial. The variance of the estimated response rate of the selected dose group can be estimated by σ̂²_aft = p̂_aft(1 − p̂_aft)/((1 − η)N). Thus, the estimate of the response rate of the selected dose group can be constructed as p̂_t = ηp̂_S − ηb̂_12(γ) + (1 − η)p̂_aft. The estimated variance of (p̂_t − p_S) equals η² v̂ar_bef + (1 − η)² σ̂²_aft if we ignore the variance of b̂_12(γ), where v̂ar_bef is the estimate of var(p̂_S − p_S) given in Equation (2) using p̂_1 for p_1 and p̂_2 for p_2, with σ̂²_1 and σ̂²_2 computed only from the data at the dose selection stage. The SOC 95% confidence interval for the difference in response rates between the selected dose group and the control group, p_S − p_c, can be expressed as

(p̂_t − p̂_c) ± 1.96 √(η² v̂ar_bef + (1 − η)² σ̂²_aft + σ̂²_c),   (3)

where σ̂²_c = p̂_c(1 − p̂_c)/N is the estimated variance of the control group's response rate. We can reject the null hypothesis that the selected dose has the same response rate as the control group if the lower bound of the above 95% confidence interval is greater than 0.
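The combined estimate p̂_t and the interval of Equation (3) can be sketched in code. This is an illustrative simplification, not Shen's exact procedure: the closed-form bias σφ(θ) stands in for b̂_12(γ), and v̂ar_bef is approximated by the selected arm's binomial variance rather than formula (2):

```python
import math

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def soc_ci(ph1, ph2, ph_aft, ph_c, n, eta, z=1.96):
    """Sketch of the SOC-style confidence interval for p_S - p_c.

    ph1, ph2 : stage-1 response rates of the two doses (eta*n patients each)
    ph_aft   : stage-2 response rate of the selected dose ((1-eta)*n patients)
    ph_c     : response rate of the control group (n patients)

    Simplifying assumptions (not Shen's exact formulas):
    - bias correction uses sigma*phi(theta) in place of b_12(gamma)
    - var_bef uses the selected arm's binomial variance in place of formula (2)
    """
    n1 = eta * n
    v1 = ph1 * (1 - ph1) / n1
    v2 = ph2 * (1 - ph2) / n1
    ph_s = max(ph1, ph2)                                 # pick the winner
    sigma = math.sqrt(v1 + v2)
    bias = sigma * norm_pdf((ph1 - ph2) / sigma)         # stand-in for b_12(gamma)
    p_t = eta * (ph_s - bias) + (1 - eta) * ph_aft       # combined estimate
    var_bef = ph_s * (1 - ph_s) / n1                     # simplified var(p_S_hat - p_S)
    var_aft = ph_aft * (1 - ph_aft) / ((1 - eta) * n)
    var_c = ph_c * (1 - ph_c) / n
    se = math.sqrt(eta ** 2 * var_bef + (1 - eta) ** 2 * var_aft + var_c)
    diff = p_t - ph_c
    return diff - z * se, diff + z * se

lo, hi = soc_ci(0.85, 0.75, 0.84, 0.70, n=200, eta=0.5)
print(round(lo, 3), round(hi, 3))  # reject H0 if the lower bound exceeds 0
```

With the hypothetical observed rates above, the lower bound is comfortably positive, so the null hypothesis would be rejected.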
The sample size formula can be derived from the two-sided 95% confidence interval for the difference in the response rates as given in Equation (3). The sample size N per arm (or the total sample size N*) is related to the expected treatment effect, p_1, p_2, and p_c, and the Type II error rate β. In this multiple-dose two-stage setting, the sample size depends not only on the differences between p_1 and p_c and between p_2 and p_c, but also on the difference between p_1 and p_2. This is because either dose group 1 or 2 might be selected in stage 1; the selected dose group is then compared with the control group at stage 2.
We assume that under an alternative hypothesis, the difference in treatment effect between dose 1 and the control is Δ_1 (i.e., p_1 − p_c) and the difference between dose 2 and the control is Δ_2 (i.e., p_2 − p_c). The sample size needs to satisfy the following equation so that superiority is declared with probability (1 − β), with Z = (p̂_t − p̂_c − Δ_1)/√v̂ar when dose 1 is selected and Z = (p̂_t − p̂_c − Δ_2)/√v̂ar when dose 2 is selected. In either case, Z approximately follows a standard normal distribution under the alternative hypothesis according to Shen (2001) when the sample size of a confirmatory clinical trial is under consideration. If we replace the observed response rates with the assumed response rates, v̂ar in the above expression is replaced by its true value var = η² var_bef + (1 − η)² σ²_aft + σ²_c, and we conclude that the sample size needs to satisfy

1 − β = [1 − Φ(1.96 − Δ_1/√var)] P(p̂_1 ≥ p̂_2) + [1 − Φ(1.96 − Δ_2/√var)] P(p̂_1 < p̂_2),   (4)

where var_bef, σ²_aft, and σ²_c depend only on the true response rates p_1, p_2, and p_c along with η and N, and Φ(x) is the cumulative distribution function of the standard normal distribution. By solving Equation (4), we can obtain the sample size N per arm and the total sample size N* = (2 + η)N. Again, η is the fraction of the number of patients in each group at the dose selection stage. As we can see, Equation (4) is more complicated than the sample size equation of a Phase III trial with a single dose group versus a control group. Tables 1 and 2 show that sample sizes obtained from Equation (4) achieve the desired power whether both doses or only one dose is superior to the control group. The number of patients per group at the final analysis, as well as the total number of patients in the three groups combined (in parentheses), are presented in Tables 1 and 2. In addition, under a nominal power of 90%, we can optimize the total sample size N* through Equation (4) in each column of Tables 1 and 2 among the four typical timings of dose selection (η = 1/4, 1/2, 3/4, and 1).
We observed that when both dose groups are 15% superior in response rate, as in Table 1, earlier dose selection (η = 1/4) yields a smaller total sample size. When only one dose group is superior in response rate, as in Table 2, dose selection in the middle of enrollment yields a smaller total sample size.
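Equation (4) can be solved for N by a simple search over increasing sample sizes. The sketch below makes one simplifying assumption: var_bef is replaced by the selected dose's binomial variance, under which the variance for dose i collapses to (p_i(1 − p_i) + p_c(1 − p_c))/N; the resulting sample sizes track Table 3 closely (roughly 220 per arm for p_1 = 0.85, p_2 = 0.75, p_c = 0.7, η = 1/4):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_dose(n, eta, p1, p2, pc, z=1.96):
    """Approximate power of Equation (4) for per-arm sample size n.

    Assumption: var_bef (Shen's formula (2)) is approximated by the selected
    dose's binomial variance, so the variance for dose i reduces to
    (p_i(1-p_i) + p_c(1-p_c)) / n.
    """
    n1 = eta * n  # patients per arm at the dose selection stage
    s = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n1)
    p_sel_1 = norm_cdf((p1 - p2) / s)  # P(dose 1 wins stage 1)
    power = 0.0
    for p_i, p_sel in ((p1, p_sel_1), (p2, 1 - p_sel_1)):
        var_i = (p_i * (1 - p_i) + pc * (1 - pc)) / n
        power += p_sel * (1 - norm_cdf(z - (p_i - pc) / math.sqrt(var_i)))
    return power

def sample_size(eta, p1, p2, pc, target=0.90):
    """Smallest per-arm n with approximate power >= target."""
    n = 10
    while power_two_dose(n, eta, p1, p2, pc) < target:
        n += 1
    return n

n = sample_size(eta=0.25, p1=0.85, p2=0.75, pc=0.70)
print(n, round(power_two_dose(n, 0.25, 0.85, 0.75, 0.70), 3))
```

The search could equally be done by bisection; a linear scan is used here only for transparency.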
To demonstrate the properties of both sample size calculations, we conducted simulations under a range of assumptions. Table 1 shows that the sample sizes derived from Equations (4) and (5) achieve the target 90% power when both test dose groups are 15% better than the control group, but the sample size according to the naïve method of Equation (5) is clearly larger than the sample size from Equation (4), particularly when the dose selection is conducted later in the trial.
It is interesting that when one dose group is 15% better than the control group while the other dose group has the same response rate as the control, Equations (4) and (5) generate almost the same sample sizes, as shown in Table 2. The simulated powers in Table 2 are slightly below the 90% nominal power. This is likely because Shen's (2001) stepwise over-correction estimator slightly overcorrects the bias. Nevertheless, the sample size from Equation (4) still gives good guidance on the number of patients to enroll. The results of Tables 1 and 2 show that Equation (4) provides a reasonably good estimate of the sample size, whereas Equation (5) can sometimes be inaccurate.

How to Determine the Sample Size When the Response Rates Vary in a Range
We have shown that the sample size based on Equation (4) provides the desired power when we know the response rates of the experimental groups are a set of predetermined values relative to the response rate of the control group. However, in a real multiple-dose clinical trial, we may only be able to assume that the response rates of all dose groups fall into a range and that at least one of them achieves the top of the range. For example, we often face an assumption that the control group has a response rate of 0.7, one dose group of the test drug is expected to have a response rate of 0.85, and the other dose group has a response rate somewhere between 0.7 and 0.85. In such situations, the sample size calculation cannot be done just once for one particular set of alternative response rates, say 0.75 and 0.85. The sample size needs to be large enough to cover all possible alternative scenarios. In addition, as we see from Tables 1 and 2, if we select a dose group later in a trial, the total sample size N* will be larger, indicating that the sample size varies with the timing of dose selection. Therefore, proper timing of dose selection can reduce the required total sample size N*. We need the smallest sample size over dose selection timings that achieves the desired power for all alternative scenarios. We use the above example to illustrate our procedure. The problem is that we do not know which dose has the 0.85 response rate or where the response rate of the other dose group falls. Table 3 gives four typical scenarios of dose-response relationships we might encounter in an adaptive dose selection design. The sample sizes were obtained from the method in Section 2 to achieve 90% power and verified by simulation in Table 3. If we assume that these four scenarios of response rates are equally likely to happen and we do not know which one is true, it is natural to select the biggest total sample size N* in each row (i.e., among the four scenarios for a given timing of dose selection) to secure the statistical power in all four scenarios. The optimal sample size over dose selection timings is then the smallest N* among these row maxima. We call this procedure the maximum-minimum approach: it finds the smallest N* that provides 90% power for all four scenarios of response rates.

Table 3. Sample size required for 90% power when the response rate is fixed for one dose group and the control group, but varies for the other dose group. (p_c = 0.7 and p_1 = 0.85 in every column; p_2 = 0.70, 0.75, 0.80, 0.85; for η = 1/4, N (N*) = 184 (414), 220 (495), 210 (473), …)
In the setting of Table 3, we first choose the maximum total sample size in each row, which gives total sample sizes of 495, 485, 509, and 540 to cover the four dose-response scenarios at dose selection timings of 1/4, 1/2, 3/4, and the end of the trial, respectively. Then we choose the minimum among those numbers to find the best timing of dose selection. Among them, the smallest total sample size is 485, with dose selection occurring at 1/2 of the trial (194 patients per group in the two remaining groups plus 97 patients in the dose group that is dropped in the middle of the trial). By this maximum-minimum approach, we can ensure 90% power at the final analysis for all four scenarios. Table 3 considers only four response rates of the second dose, ranging between the response rates of the first dose and the control group. The same maximum-minimum strategy can be applied to a continuous response rate of one dose group from 0.7 to 0.85 while the response rates of the other dose and the control group are fixed at 0.85 and 0.7, respectively. The total sample sizes N* based on Equation (4) are shown in Figure 1. The maximum total sample sizes at dose selection timings of 1/4, 1/2, 3/4, and the end of the trial are 510, 485, 510, and 540, respectively. The minimum of them is 485, when dose selection takes place at 1/2 of the seamless Phase II/III trial. The maximum-minimum strategy based on Figure 1 gives the same total sample size and suggests the same dose selection timing as Table 3.

Table 4. Sample size required for 90% power when at least one dose is equivalent to the control group. (Columns: p_c = p_1 = 0.5, 0.6, 0.7, 0.8, each with p_2 = p_3 = 0.5; rows give N from Equation (6) for η = 1/4, …)
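The maximum-minimum selection itself is a two-line computation; the sketch below applies it to the worst-case totals quoted above (495, 485, 509, and 540):

```python
def max_min_sample_size(n_star):
    """Maximum-minimum strategy: for each dose selection timing eta, take the
    largest total sample size N* over all response-rate scenarios; then pick
    the timing whose worst-case N* is smallest."""
    worst_case = {eta: max(sizes) for eta, sizes in n_star.items()}
    best_eta = min(worst_case, key=worst_case.get)
    return best_eta, worst_case[best_eta]

# Worst-case totals per timing from the Table 3 setting (p_c = 0.7,
# p_1 = 0.85, p_2 ranging over 0.70-0.85); each list would normally hold the
# N* of every scenario, collapsed here to the row maxima stated in the text.
n_star = {0.25: [495], 0.50: [485], 0.75: [509], 1.00: [540]}
eta, total = max_min_sample_size(n_star)
print(eta, total)  # dose selection halfway through, 485 patients in total
```

In practice each list would be filled by solving Equation (4) for every response-rate scenario at that timing.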

Extension to More than Two Doses in a Two-Stage Design That Tests for Noninferiority
We can extend the results of Sections 2 and 3 to a multiple-dose clinical trial with more than two doses using the formulas given in Shen (2001). As an illustration, we consider a three-dose trial.
A dose of the test drug is to be selected by the best observed cure rate at the time when ηN patients on each arm are ready for evaluation. The maximum of the three observed response rates, p̂_S = max(p̂_1, p̂_2, p̂_3), is an estimate of the true response rate of the selected dose group, p_S, where S = i if p̂_i = p̂_S. As stated in Shen (2001), the mean of p̂_S − p_S is the bias, and it is estimated by a stepwise over-correction estimator built from the pairwise corrections b̂_ij(γ) over the index set Ω = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}, where Φ(.) is the cumulative distribution function of the standard normal distribution.
Here b̂_ij(γ) is given in Shen (2001), and the variance of p̂_S − p_S follows from the corresponding density function. When dose selection happens at a fraction η of the sample size, the estimate of the response rate of the selected dose group is constructed as p̂_t = ηp̂_S − ηb̂(γ) + (1 − η)p̂_aft, with estimated variance η² v̂ar_bef + (1 − η)² σ̂²_aft. The sample size must satisfy the following equation:

1 − β = [1 − Φ(1.96 − (Δ + Δ_1)/√v̂ar)] P(p̂_1 ≥ p̂_2 and p̂_1 ≥ p̂_3)
      + [1 − Φ(1.96 − (Δ + Δ_2)/√v̂ar)] P(p̂_2 ≥ p̂_1 and p̂_2 ≥ p̂_3)
      + [1 − Φ(1.96 − (Δ + Δ_3)/√v̂ar)] P(p̂_3 ≥ p̂_1 and p̂_3 ≥ p̂_2).   (6)
Equation (6) is derived from the requirement that the lower bound of the 95% confidence interval for the difference between the selected dose group and the control group is greater than the noninferiority margin, −Δ. For the superiority test, Δ = 0; for the noninferiority test, the margin is set at a prespecified Δ > 0. Δ_1, Δ_2, and Δ_3 are the true differences in response rate between each dose group and the control group. In the sample size calculation, v̂ar in Equation (6) is replaced by var = η² var_bef + (1 − η)² σ²_aft + σ²_c. The probability P(p̂_i − p̂_j ≥ 0 and p̂_i − p̂_k ≥ 0), (i, j, k) ∈ Ω, can be calculated from the bivariate normal distribution of p̂_i − p̂_j and p̂_i − p̂_k. Table 4 illustrates the effectiveness of the sample size in a three-dose versus control trial whose objective is to show that one of the dose groups is noninferior to the control group. We target 90% power when at least one dose is equivalent to the control group, with a margin of 15%. The sample sizes obtained from Equation (6) and the powers obtained via simulation are presented in Table 4. As we can see, the desired 90% power is secured by the sample size derived from Equation (6).
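The selection probabilities entering Equation (6) can also be checked by simulation; the sketch below draws each p̂_i from its approximating normal distribution instead of evaluating the bivariate normal cdf:

```python
import math
import random

def selection_probs_mc(p, n1, trials=200_000, seed=7):
    """Monte Carlo estimate of P(p_i_hat >= p_j_hat for all j), i = 1..k,
    drawing each p_i_hat from its approximating normal N(p_i, p_i(1-p_i)/n1),
    where n1 is the per-arm sample size at the dose selection stage.
    (Equivalently, these probabilities come from the bivariate normal
    distribution of p_i_hat - p_j_hat and p_i_hat - p_k_hat.)"""
    rng = random.Random(seed)
    sd = [math.sqrt(pi * (1 - pi) / n1) for pi in p]
    wins = [0] * len(p)
    for _ in range(trials):
        x = [rng.gauss(pi, s) for pi, s in zip(p, sd)]
        wins[x.index(max(x))] += 1  # pick the winner
    return [w / trials for w in wins]

# Sanity check: with all three doses equal, each dose should win about 1/3
# of the time.
probs = selection_probs_mc([0.5, 0.5, 0.5], n1=50)
print([round(q, 3) for q in probs])
```

For production use, an exact bivariate normal cdf (e.g., from a statistics library) would replace the Monte Carlo loop.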

An Example of Seamless versus Separate Phase II and Phase III Design
In a clinical trial where a new antibiotic is to be compared with an active existing drug for the treatment of an infectious disease, two doses of the experimental drug are considered to have potential response rates from 0.70 to 0.85, at least one of the two doses has a 0.85 response rate, and the control group has a known response rate of 0.70. If we use the adaptive seamless Phase II/III design as described above, the total sample size should be 485 (194 patients per group in the two remaining groups plus 97 patients in the dose group that is dropped in the middle of the trial). The power of the test of the final selected dose group versus the control group is 90% if the described assumptions are true. This sample size takes the worst case of the assumed response rates into consideration. If we use a traditional separate Phase II trial to select a dose and a Phase III trial to confirm the superiority of the selected dose over the control group, we need to determine the number of patients treated in the Phase II part and the number of patients randomized in the Phase III part. The patients in Phase II cannot be used in the analysis of the Phase III trial. To have a fair comparison, a total of 485 patients are allowed for the Phase II and Phase III trials together. If we select the correct dose, say dose 1 with p_1 = 0.85 without loss of generality, 158 patients per group are needed in the Phase III trial to achieve 90% power against the 0.70 response rate of the control group. The remaining 142 patients will be evenly assigned to the two dose groups in the Phase II trial. The response rate of the other dose group, p_2, is assumed to be 0.70, 0.75, 0.80, or 0.85. We select the dose with the highest observed response rate in the Phase II trial. Table 5 shows the probability of selecting dose 1 or dose 2, and the probability of achieving statistical significance when comparing the selected dose to the control group.
The final statistical power is the sum, over doses, of the product of the probability of being selected and the probability of success. Table 5 shows that with 71 patients per group in the Phase II trial, we have a 21.6% chance of selecting the less efficacious dose when p_1 = 0.85 and p_2 = 0.8. If dose 2 is selected, the probability of statistical significance for that dose is only 0.301. As a result, the overall power for the final selected dose is only 0.771. In the seamless Phase II/III trial, 97 patients per group can be used in the dose selection, and the chance of selecting dose 2 is reduced to 10.3%. Furthermore, the number of patients per group in the Phase III analysis increases to 194 after combining the patients from Phase II and Phase III in the final analysis. Owing to these two factors, the final test with proper bias adjustment in the seamless adaptive design keeps the power at 90% in this situation. Only when both doses have 0.85 response rates does the separate Phase II and Phase III design achieve 90% power, and as we see in Table 3, the seamless design with the SOC method in that same situation needs at most 417 patients in total (139 patients per group) for 90% power. Therefore, the seamless Phase II/III design has either better power or a smaller sample size than the traditional design with separate Phase II and Phase III trials.
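The overall power of the separate design is this selection-weighted sum. The sketch below reproduces the 21.6% probability of selecting the inferior dose with 71 patients per group; note that it uses an unpooled normal approximation for the Phase III test, so the per-dose success probabilities (and hence the overall power) may differ somewhat from the Table 5 values:

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def separate_phase_power(p1, p2, pc, n_phase2, n_phase3, z=1.96):
    """Overall power of the separate Phase II then Phase III design:
    sum over doses of P(dose selected in Phase II) * P(Phase III success).
    Phase III success uses an unpooled normal approximation for the
    two-proportion z-test (an assumption, possibly not the paper's exact
    calculation)."""
    # Probability that dose 1 wins the Phase II pick-the-winner comparison
    s = math.sqrt(p1 * (1 - p1) / n_phase2 + p2 * (1 - p2) / n_phase2)
    p_sel_1 = norm_cdf((p1 - p2) / s)
    total = 0.0
    for p_i, p_sel in [(p1, p_sel_1), (p2, 1 - p_sel_1)]:
        se = math.sqrt(p_i * (1 - p_i) / n_phase3 + pc * (1 - pc) / n_phase3)
        total += p_sel * norm_cdf((p_i - pc) / se - z)
    return p_sel_1, total

p_sel_1, power = separate_phase_power(0.85, 0.80, 0.70, n_phase2=71, n_phase3=158)
print(round(1 - p_sel_1, 3), round(power, 3))
```

The first printed number is the chance of carrying the inferior dose into Phase III, which matches the 21.6% quoted above.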

Discussion and Conclusion
In Section 3, we assumed that the alternative scenarios are equally likely to occur. The total sample size of the maximum-minimum approach is driven up in order to cover the worst scenario, the one with less efficacious doses. If we can assume a probability distribution over the response rates of the experimental doses, we can take this distribution into consideration in the sample size calculation. For example, we can calculate the total sample size from Equation (4), adjust it by the probability of each response-rate scenario, and then apply the maximum-minimum strategy to the adjusted sample sizes. The sample size formula in this article is based on the SOC method, and the validation of power by simulation also used the SOC method for the hypothesis test. In fact, the method of Posch et al. (2005) was also used to verify that the sample size and dose selection determined by the method of this article are proper. A weighted inverse normal combination function was used because it is commonly used, and a closed testing procedure was applied to show that all intersection hypotheses containing the selected dose group are rejected at the same α level. We repeated the simulations of Tables 1 and 2 and concluded that Posch's method and the SOC method have almost the same power to demonstrate the efficacy of the selected dose group versus the control group.
The asymptotic normal approximation of observed response rates is assumed throughout this article. When the sample size is small, the approximation may not be appropriate, as mentioned by Ristl et al. (2018). When the probability of an event p is very close to 0 or 1, the asymptotic normal approximation does not work well even with a large sample size N. As a commonly used rule, the normal distribution is generally considered a satisfactory approximation to the binomial distribution when Np ≥ 5 and N(1 − p) ≥ 5. However, for extreme situations where p < 0.05 or p > 0.95, it would be better not to use the normal approximation regardless of sample size. Given these limitations of approximating the distribution of a response rate by a normal distribution, we suggest that the above rule be followed when deciding whether the sample size calculation in this article can be used in a trial design. Fortunately, for most Phase III trials with a response rate endpoint, the sample size per group is usually at least 100 and the response rate is not close to the extremes 0 or 1, so the asymptotic normal approximation can usually be applied.
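The rule of thumb above is easy to encode as a design-stage check (the thresholds are exactly those stated in the text, not part of Shen's method):

```python
def normal_approx_ok(n, p):
    """Rule of thumb from the text: the normal approximation to the binomial
    is considered adequate when n*p >= 5 and n*(1-p) >= 5, and p is not
    extreme (0.05 <= p <= 0.95)."""
    return n * p >= 5 and n * (1 - p) >= 5 and 0.05 <= p <= 0.95

# Typical Phase III setting, an extreme response rate, and a small sample:
print(normal_approx_ok(100, 0.70), normal_approx_ok(100, 0.97), normal_approx_ok(10, 0.30))
```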
In summary, the proposed sample size calculation is built on Shen's stepwise over-correction method. Based on the simulations in this article under the sample sizes of the proposed method, the desired power of the test of the selected dose versus the control group is achieved. When we design a clinical trial with multiple doses, it is important to compare the sample size and power derived from a range of response rates of the multiple doses. Multiple-dose clinical trials are useful when we want to incorporate the Phase II dose-finding stage into a Phase III trial. If we know the best dose in advance, we can use a standard Phase III trial of a single dose versus a control group. But when we cannot determine the best dose at the time of designing a Phase III clinical trial, a two-stage, multiple-dose adaptive design has the advantage of examining the efficacy of each dose in the dose selection stage and carrying that data into the confirmatory stage to evaluate the efficacy of the selected dose seamlessly. In terms of the total sample size (dose selection and confirmation stages combined) and the power of the trial, the adaptive multiple-dose design is better than the design with separate Phase II and Phase III studies.
Because we need to consider a variety of alternative hypotheses for the tested doses, a maximum-minimum approach for the sample size is proposed. We first choose the maximum total sample size over a range of possible response rates of the competing doses; then we determine the timing of dose selection by picking the minimum of those maximum total sample sizes. The sample size from this method guarantees the desired statistical power over a broad range of response rates of the doses compared with the control group. The usual sample size calculations for a Phase III clinical trial or for a Phase II clinical trial do not provide an accurate sample size for the adaptive design; the sample size calculation method in this article properly solves this problem.