A Bayesian phase I–II clinical trial design to find the biological optimal dose on drug combination

ABSTRACT In recent years, combined therapy shows expected treatment effect as they increase dose intensity, work on multiple targets and benefit more patients for antitumor treatment. However, dose -finding designs for combined therapy face a number of challenges. Therefore, under the framework of phase I–II, we propose a two-stage dose -finding design to identify the biologically optimal dose combination (BODC), defined as the one with the maximum posterior mean utility under acceptable safety. We model the probabilities of toxicity and efficacy by using linear logistic regression models and conduct Bayesian model selection (BMS) procedure to define the most likely pattern of dose–response surface. The BMS can adaptively select the most suitable model during the trial, making the results robust. We investigated the operating characteristics of the proposed design through simulation studies under various practical scenarios and showed that the proposed design is robust and performed well.


Introduction
In the traditional clinical trial, it is assumed that both toxicity and efficacy increase monotonically with dose level.To begin with, maximum tolerated dose (MTD) through phase I is determined and followed by phase II trial to further determine efficacy.In recent years, significant innovations have been made in tumour treatment, and new agents such as biological response modulators and monoclonal antibodies have been developed to address the multiple treatment pathways and strong variability of cancer cells.Unlike cytotoxic agents, novel agents such as immunotherapy and targeted therapies may follow a non-monotonic pattern in their dose-efficacy curve.Therefore, compared with the traditional framework, a phase I-II, clinical trial framework that simultaneously monitors toxicity and efficacy is more appropriate for these innovative drugs.Wages and Tait (2015) outlined a method to identify the optimal biological dose (OBD) combining features of the continual reassessment method (O'Quigley et al. 1990) and order restricted inference.Zang and Lee (2017) proposed a twostage seamless phase I-II design, which used the Bayesian model averaging continual reassessment (BMA-CRM) (Yin and Yuan 2009) method to monitor the toxicity outcomes and adopt an isotonic regression based on efficacy to guide dose escalation in the first stage.Lin and Yin (2017) extended the dose -finding method of Bayesian optimal interval (BOIN) (Yuan et al. 2016) from MTD to OBD in phase I-II trials.Li et al. (2020) proposed two model-assisted designs, which incorporated the toxicity and efficacy outcomes simultaneously and identified a dose having high probability of acceptable efficacy with manageable toxicity.
In recent years, drug combination therapies have received increasing attention, since it is advantageous over the use of a single drug.First, the combination therapies enhances the therapeutic effect by increasing the dose intensity if two drugs do not produce a toxic overlap.Second, it can weaken the drug resistance that occurs in single-drug therapy.In addition, combination therapy can benefit more patients by working on multiple targets.However, at the same time, it poses new challenges to traditional clinical trial designs.For instance, combination therapies have more candidate doses and a larger space for dose exploration than single drug requiring large sample size.Furthermore, due to the complex interaction mechanisms between drugs, we only able to order the toxicity of drug combinations locally, the dose of one increased while the other decreased resulting in nondetermination of dose-toxicity relationships.This shows that the trial design for toxicity exploration of single agent cannot be directly used for dose-toxicity of combination drugs.Under the phase I-II framework, several drug combination trial designs were proposed.Yuan and Yin (2011) proposed a Bayesian seamless two-stage design, which determined the safe dose set in the first stage and explored the most effective dose combination through adaptive randomization in the second stage.Based on the former, Yada and Hamada (2018) proposed a two-stage dose combination design for phase I-II, using Bayesian hierarchical model to the imputed data to estimate the probabilities associated with toxicity and efficacy.Cai et al. (2014) proposed a novel dose-finding algorithm, which encouraged full exploration of untried dose combinations in two-dimensional space.Considering that biologically optimal dose combination (BODC) may be outside the candidate dose range or sandwiched between existing dose combinations, Guo et al. (2015) proposed a dose-insertion design based on toxicity and efficacy and adopted adaptive model selection for single drug trials.On this basis, Lyu et al. (2018) extended the idea of dose-insertion design to two drugs, and proposed an adaptive cohort division.Considering efficacy end point often requires a relatively long time to assess, Riviere et al. (2015) proposed a novel hazard model for efficacy, which accounted for the plateau in the novel molecularly targeted agent dose-efficacy curve.Tighiouart (2018) estimated safe dose combination regions along the MTD curve with a desired level of efficacy and used continuous dose levels.
Project Optimus is an FDA initiative aimed at improving the efficiency of dose-finding clinical trials for cancer therapies.It recommends a paradigm shift towards model-based dose-finding, which involves the use of mathematical models to analyze data from multiple patients and dose levels to determine the most optimal dose.This approach allows for more efficient and accurate dose-finding and can accelerate drug development and improve patient outcomes.In this paper, the phase I-II trials stated above employed a pre-specified dose-efficacy model and their expected performance depended on the coincidence of the true case and the designated model.The diversity and unpredictability of dose-efficacy relationships pose challenges for the construction of efficacy models.In addition, the sparse data from early small sample trials can lead to an improper model having a greater impact on the performance of design.A reasonable approach is to choose an appropriate model based on the data collected during the trials.In order to address the complex dose-response relationship and the problem of model designation, a Bayesian model selection (BMS) approach is taken to determine a dose efficacy model that best fits the currently collected data by comparing the posterior probability of each candidate model.To begin with, multiple dose-efficacy candidate models were defined to characterize the possible pattern of dose-response curve.For example, the linear logistic regression model is used to consider two cases where the efficacy increases monotonically with dose, then tends to plateau.While the logistic regression model with quadratic terms can describe the case where the efficacy increases first and then decreases with dose.As the trial progresses, an adaptive selection is made among multiple candidate models based on the fit performance.The BMS method automatically and adaptively selects the model with the maximum posterior probability.After a new cohort of patients enters the trial, in light of the most recent observations, we update the dose-efficacy models using the BMS approach.As a result, patients can be treated at more appropriate doses.
In addition, the proposed design quantifies the trade-off between toxicity and efficacy using a utility function and defines the dose combination with the maximum mean utility as BODC under acceptable safety.The use of posterior mean utility as a dose selection criterion is not new and is on record (Houede et al. 2010;Thall et al. 2017).The design of this paper is divided into two stages: in phase I, we escalate doses along the diagonal of the dose combination matrix for a rapid dose climb; in phase II, based on all toxicity and efficacy data observed, we continuously update the posterior estimates of toxicity and efficacy and assign patients to the most appropriate dose combination.At the end of the trials, dose combination with dose limiting toxicity (DLT) rate not exceeding toxicity upper limit with the maximum mean posterior utility as BODC was selected.Extensive simulation studies showed proposed design has ideal performance under various dose-efficacy relationships.
The remainder of this paper is organized as follows.In Section 2, probability models are introduced.In Section 3, we describe the process of trial design.In Section 4, we investigate the operating characteristics of the proposed design through a simulation study.Discussion along with conclusion is mentioned in Section 5.

Modelling toxicity and efficacy
Consider a drug combination doses finding trial.Suppose that agent A has J doses level, denoted byx a;1 < . . .< x a;J , agent B has K doses level, denoted byx b;1 < . . .< x b;K .Without loss of generality, we assume J � K and that the dosage values of thex a;j and x b;k have been standardized to have mean 0 and standard deviation 0.5.Let x jk ¼ x a;j ; x b;k À � denote the combination of dose levels j and k of agent A and agent B, and let p x jk À � and q x jk À � denote probability of toxicity and efficacy of the dose combination x a;j ; x b;k À � , respectively, for j ¼ 1; 2; . . .; J; k ¼ 1; 2; . . .; K.

Dose-toxicity model
In modelling dose-toxicity relationships for biological agents, we consider the basic feature that the dose-toxicity curve initially increases at low doses and then plateau at high doses.In this paper, we use a logistic dose-toxicity model to capture this profile of the combined agents: where α ¼ α 0 ; α 1 ; α 2 ð Þ 0 is an unknown parameter vector.To sure toxicity probability initially increases with the doses of A and B, it is assumed that α 1 > 0 andα 2 > 0. In the background of drug combination trials, according to Wang and Ivanova (2005), Riviere et al. (2015), they found that a model without interaction performed as well as a model with interaction for dose finding , the reliable estimation of such an interaction term requires a large sample size (e.g., a few hundred).To illustrate including the interaction term is not beneficial, we compared the performances of the designs under two different toxicity models (results are shown in Table S1).For dose-toxicity relationship, we don't pursue to model it completely but aim to capture sufficient toxicity characteristics to facilitate dose escalation and step down.Therefore, for simplicity, the interaction between the two drugs was not considered in dose-toxicity model.

dose-efficacy model
The influence of pharmacology and mechanisms of action on the efficacy of drugs may vary with dose.The number and form of models to be included in the selection procedure are flexible.In practice, statisticians should communicate with clinicians in a timely manner to adjust efficacy models according to the characteristics of different drug.In this study, three common dose-efficacy models are considered, denoted by M l ; l ¼ 1; 2; 3.
Consider the situation in which the effect of one drug does not depend on the presence or absence of the other drug, meaning that the two drugs act independently and the combined effect is simply the sum of the individual effects.It is appropriate when the dose-response relationship for each drug is linear and there is no interaction between the drugs.Therefore, we consider using the linear logistic model to describe the efficacy rateq x ð Þ.Here, β 1 ¼ β 10 ; β 11 ; β 12 À � 0 is an unknown parameter vector.
Logistic models with interaction terms allow for the possibility that the effect of one drug may be influenced by the presence or absence of another drug.That is, the two drugs interact, and the combined effect is not simply the sum of the individual effects.Interaction terms allow for the modelling of these complex relationships and the possibility that the dose-effect relationship may differ depending on the combination of drugs doses used.They are appropriate when there is evidence of a significant interaction between the drugs.Introducing the interaction term on the basis of M 1 , where β 2 ¼ β 20 ; β 21 ; β 22 ; β 23 À � 0 is an unknown parameter vector, β 23 denotes the interaction of efficacy of drugs A and B.
Logistic models with quadratic terms allow for the possibility of non-linear dose-response relationships.In case where the efficacy of drug combinations increase first and then decrease with the dose, i.e. the dose-efficacy curve is bell-shaped, the most effective dose is located in the middle of the dose range, the lower or above middle dose are termed less effective.Quadratic terms can capture these non-linear relationships and are useful when the effect of a drug is complex and not easily predictable.Therefore, we consider to express the non-monotonic shape of the dose-efficacy surface by introducing quadratic terms.Assuming that the efficacy probability q x ð Þ follows the logistic model with quadratic terms, where β 3 ¼ β 30 ; β 31 ; β 32 ; β 33 ; β 34 À � 0 is an unknown parameter vector.
In summary, linear logistic model is sufficient to cover all possible linear dose-effect relationships, logistic models with interaction terms are appropriate when there is evidence of a significant interaction between the drugs, and the logistic model with quadratic terms is sufficient to cover all possible dose-effect relationships that involve curvilinear dose-response curves.However, it is important to choose the appropriate model based on the nature of the dose-effect relationship and the specific drugs involved in the combination therapy.

Adaptive model selection
dose-efficacy curves may be monotonic or non-monotonic, depending on the influence of many factors, such as pharmacology and action of drugs.To this end, we propose BMS procedure that allows adaptive model selection throughout the trial.
In general, we suppose that there are L models of interest indexed by l ¼ 1; . . .; L. Furthermore, prior information about the plausibility of each model is taken into account and introduce unavailable, we proceed in the same way by setting equal probability for each model.As described in Section 2.1.2,three candidate dose-efficacy models are considered and hence use p M l ð Þ ¼ 1=3.In the model selection, we calculate p M l jD ð Þ, the posterior probability of each model, and select the one with the maximum posterior probability as the dose-efficacy model.The posterior probability of model M l is given by the following equation: Thinking that the posterior distribution of the parameters to be estimated does not have a closed form, this paper uses the Monte Carlo numerical integration method and approximates it with the harmonic mean of the likelihood (Kass and Raftery 1995).Assuming that denotes a Markov chain Monte Carlo (MCMC) sample derived from the posterior distribution of β l under M l .We can show that

Utility function
The utility function U y ð Þ is used to assess the desire for each dose combination, and its value should be obtained from clinicians to reflect the desirability of each potential outcome in terms of toxicity and efficacy.In this paper, we consider toxicity and efficacy outcomes as binary, but the function can be generalized to multilevel or multiple outcomes easily (see the Table S2 in the supplementary for an example of multiple endpoints).There are four bivariate dichotomous outcomesy The score of the least expected outcome is usually fixed at 0 with U 1; 0 ð Þ ¼ 0 (i.e.DLT and no response), and the score of the most desired outcome is 100 with U 0; 1 ð Þ ¼ 100 (i.e.no DLT and response), which clinicians are asked to use as a reference to score the other two which falling in between, U 0; 0 ð Þ and U 1; 1 ð Þ.The purpose of scoring 0 and 100 for the least desirable and most desirable outcomes, respectively, is to establish a fixed utility scale and obtain a commonly used 0-100 scoring system.Clinicians are only required to give two utility values, U 0; 0 ð Þ and U 1; 1 ð Þ, which reflect the risk-benefit trade-off for their medical decisions.For example, if a reduction in toxicity is preferred over an increase in efficacy, a larger value will be assigned to U 0; 0 ð Þ, and a smaller value to U 1; 1 ð Þ.Conversely, if the need to improve efficacy is more urgent than reducing toxicity, a higher value will be assigned to U 1; 1 ð Þ.From the dose-finding perspective, setting U 0; 0 ð Þ � U 1; 1 ð Þ usually leads to a more thorough exploration of the dose space than setting To simplify display, x jk and θ l , are omitted and the true mean utility is written as

Likelihood and prior
Suppose at a certain moment in the trial, there are n jk patients treated at the same dose combination x jk , y jk and z jk denote the numbers of patients who have experienced dose limiting toxicity and efficacy events at dose level x jk respectively, for j ¼ 1; 2; . . .J, k ¼ 1; 2; . . .K. The likelihood function of the observed data D ¼ y jk ; z jk ; n jk À � under model M l can be written as where the subscript l ¼ 1; 2; 3 denote three dose-efficacy models.π T α ð Þ and π E ðβ l jM l Þ denote the priors for α and β l .Assuming that the priors of α and β l are independent of each other, the joint posterior distribution under model M l can be written as For the dose-toxicity model (1), we apply the weakly informative prior Cauchy 0; 10 ð Þ for interceptα 0 .We assign α 1 and α 2 independent gamma prior distributions with shape parameter 0.5 and rate parameter 0.5 to ensure monotonicity before the dose-toxicity surface reached a plateau (Cai et al. 2014).For the prior specification of the efficacy model ( 2)-( 4), we also consider using a weakly informative prior for β 1 , β 2 and β 3 , recommended by Gelman et al. (2008).To be specific,β �0 Cauchy 0; 10 ð Þ (β �0 stands for β 10 , β 20 and β 30 , the same as below), β �1 ; . . .; β �4 Cauchy 0; 2:5 ð Þ.The priors are weakly informative and appropriately regularized to improve the stability of the estimates while ensuring the data domination (Gelman et al. 2008).

Trial design
In this study, we divide the dose-finding design into two stages.Stage I is a run-in period aimed at quickly exploring the dose combination space and collecting preliminary data to reliably estimate the parameters of the probability models proposed in Stage II for systematic dose-finding .Beginning with Stage I of the design, patients in the first cohort are treated at the lowest dose combination x 11 ¼ x a;1 ; x b;1 À � and then the dose is increased along the diagonal of the dose combination matrix.If the dose matrix is not square ðJ > KÞ, the dose escalation is carried out along the diagonal to explore x a;1 ; x b;1 À � x a;2 ; x b;2 À � ; . . .; x a;K ; x b;K À � and then keep the dose level of drug B unchanged at k and improve the dose level of drug A from x a;K ; x b;K À � to x a;Kþ1 ; x b;K À � , until it reach the highest dose combination x a;J ; x b;K À � .The simulation results in Section 4.2 reveal that the percentage of patients assigned to overdose is not large, with a maximum of only 28%, indicating rapid dose space exploration along the diagonal did not result in many patients exposed to toxic doses.In addition, we summarize the average number of enrollment and the average number of toxic events during the first stage (Table S3).
Considering the sparse data in the early stage of the trial, the regression model may not work well, we assess the security using a beta-binomial model rather than the proposed dose-toxicity model (1) in the first stage to guide an early rapid dose climbing.To ensure the dominance of the trial data in the posterior distribution, we assume that the toxicity probability p jk follows a beta distribution,Beta 0:1; 0:2 ð Þ, and the number of toxic events y jk follows a binomial distribution Binomial n jk ; p jk À � .Under the beta-binomial model, the posterior probability isPr When the dose combination violates safety, i.e.Pr p x jk À � > p T jD À � > δ, or reach the highest dose combination x a;J ; x b;k À � , we stop stage I and move to stage II.Also, we collect efficacy data in phase I, but do not used it to guide dose climbing.
In Stage II, we perform systematic dose exploration by using toxicity and efficacy probability models to derive inferences and evaluate dose combinations with utility functions based on the data accumulated from the trials.Specifically, once a cohort of patients has completed the trial, we update the collected outcome data and generate MCMC posterior samples for the parameters under models Assuming that model M l � is selected, MCMC posterior sample θ l � is obtained.At the current dose combination x jk , we identify A 1 as the one-degree admissible dose set, whose dose levels are different from x jk no more than one level and satisfy the safety requirement, i.e.
. The dose exploration rules for stage II are as follows: If x jk is acceptable, subjects are assigned as per following rules.
(1) Based on the accumulated trial data, identify A 1 as the set of one-degree safe neighbourhood of x jk .
(2) Among the dose combinations in A 1 , we calculate the posterior mean utility of each combina- x j 0 k 0 2 A 1 , and then identify the dose combination x j � k � , which has the highest posterior mean utility and meets the safety condition, (3) If no patients have yet been assigned to the dose combination x j � k � , or if all of the dose combinations in A 1 have been tried, we assign the next cohort of patients to x j � k � .However, if x j � k � has tried and there are untried dose in A 1 , the next cohort of patients will be assigned to x j � k � only if the following condition is met where U 0 is the lowest acceptable utility value, n 2 is the total number of patients that have been enrolled in stage II and ω is a pre-specified tuning parameter controlling the stringency of the threshold value.If the above conditions are not met, exclude x j � k � from dose set A 1 and return to Step 2.
If x jk is unsafe, lower the dose to an untried dose x jÀ 1;k ¼ x a;jÀ 1 ; x b;k À � or x j;kÀ 1 ¼ x a;j ; x b;kÀ 1 À � , which is one level below the current one.If both dose combinations exist and neither has been tried, patients in the next cohort are assigned to the combination which has a higher posteriori probability.If only one dose combination exists and has not been tried, the next cohort of patients is assigned to this dose combination.If both dose combinations exist but both have been tried, the next cohort is assigned to the dose with the maximum posterior mean utility among all existing acceptable doses.
The dose assignment rule (5) adaptively assigns patients to untried doses to avoid dose exploration trapped in subtherapeutic doses and balances two objectives of the trial.At the beginning of the trial, patients are assigned to different dose combinations in order to quickly obtain efficacy and fully explore dose matrix.After obtaining a certain efficacy and a relatively stable estimation, patients are centrally assigned to the optimal dose combination, which met the ethical requirements of the trial.
The two-stage trial design for the BODC search is organized as follows: At the beginning of the trial, the first cohort of patients is treated with the lowest dose x 11 or specified by the clinician.Assuming the current dose combination is x jk .

Stage I Run-in period
I1 If the current dose combination x a;j ; x b;k À � is safe, escalate the dose along the diagonal and treat the next cohort of patients at x a;jþ1 ; Identify A 1 as the set of one-degree safe neighbourhood of x jk ; b) In A 1 , identify the dose x j � k � ¼ x a;j � ; x b;k � À � that meets the safety condition and has the maximum posterior mean utility; c) If n j � k � ¼ 0 or n r;s �0; "x r;s PA 1 , treat the next cohort at dose from A 1 and go to step b; II3 If x jk is too toxic, deescalate to an untried one-degree dose x a;jÀ 1 ; x b;k À � or x a;j ; x b;kÀ 1 À � ; II4 If no dose is recommended in II2-II3, the next cohort of patients is treated with the highest posterior mean utility among all safe doses; II5 Repeat II1-II4 until the maximum sample size N is reached; II6 Select the dose having highest posterior mean utility among all tested safety doses as the BODC.

Simulation setting
We investigated operating characteristics of the proposed design through extensive simulation studies.For comparison, three candidate dose-efficacy models to estimate the efficacy rate mentioned in Section 2.1 were included.These individual dose-efficacy models are referred to as Model 1, Model 2 and Model 3. We considered a trial combining two biological agents A and B, in which drug A has three dose levels and drug B has five dose levels.The maximum sample size was 60 with 3 patients in each cohort.We evaluated eight different scenarios shown in Table 1.Toxicity scenarios were constructed arbitrarily, with a roughly monotonic increasing trend at each drug level, taking into account the different locations of MTD in different scenarios.Among the Efficacy Rate Scenarios 1-6 are approximate fit to three candidate dose-efficacy models, where Scenarios 1-2 roughly corresponded to Model 1, Scenarios 3-4 are close to Model 2, and Scenarios 5-6 approximated to Model 3, Scenarios 7-8 considered other dose-efficacy surface and BODC locations.To protect patients from toxic and/or ineffective dose combinations, we set the toxicity upper limit p T ¼ 0:3 and the efficacy lower limit p E ¼ 0:14, respectively.Other values can also be assigned to them based on specific practical clinical requirements.Based on the utility function formula provided in Section 2.3, we can calculate the lowest acceptable utility value U 0 ¼ 30.We set the probability threshold for the practical rule and safety requirement δ ¼ 0:95 and the tuning parameter ω ¼ 2. The standard random walk metropolis-Hastings algorithm was used for MCMC calculation.For each chain, 4000 MCMC samples are drawn with a burn-in size of 1000 iterations.Under each scenario, we carried out 5000 simulated trials for each of the methods.

Operating characteristics
The simulation results for comparing the candidate models with proposed approach under each scenario are summarized in Table 2, including the selection percentage of the BODC, the average number of patients allocated to the BODC and the percentage of patients assigned to overly toxic doses.Table 3 shows the probability of selecting candidate models under the BMS procedure in Scenarios 1-8.The detailed simulation results in Scenarios 1-8 for our proposed design and single model are provided in the supplementary materials (Table S4-Table S7), including the selection percentage of each dose combination and the number of patients assigned to each dose combination.
In Scenario 1, the toxicity rate and the efficacy rate increase monotonically with the dose of the two drugs, and only part of the dose combination is within the acceptable range, the BODC is a 1 ; b 5 ð Þ. Scenario 2 has the similar dose-toxicity and dose-efficacy profile with Scenario 1, where optimal dose combination.In such cases, Model 1 had the highest selection percentage of 43% for the BODC in both scenarios, and the average patients allocated to the target dose combination were 16.59 and 11.46 in Scenarios 1 and 2, respectively.In addition, Model 1 had the highest probability of being selected at 37% when using the model selection method.
In Scenario 3, the dose-toxicity surface increases with the dose levels of agents A and B, while the probability of efficacy decreases with the both dose levels monotonically, the target BODC is a 1 ; b 1 ð Þ. Scenario 4 shares the same dose-toxicity profile with Scenario 3 but has a different shape of the doseefficacy surface, where a 1 ; b 5 ð Þ is the target BODC.In these two scenarios, there is a strong interaction between the two drugs, thus Model 2 had the highest selection rate of BODC among multiple models, 76% and 42%, and the patients allocated to the target dose combination were 27.25 and 12.63, respectively.Reasonably, in this case, under the BMS method, the selection percentages of Model 2 were also slightly higher than that of Model 1 and Model 3 with 41% and 43% in sScenarios 3 and 4, respectively.In addition, the BODC selection percentage using the proposed BMS-based design was the second best among these designs.
In Scenario 5, the efficacy rate increases first and then decreases with the dose of drug B and increases monotonically with the dose of drug A with a 3 ; b 3 ð Þ as the BODC.In this case, the flexible Model 3 performed best with the selection percentages of the BODC, 48% and the patients allocated to the target dose combination, 14.48.In Scenario 6, the dose-efficacy surface increases first and then decreases with the dose levels of the two drugs, the target BODC is a 2 ; b 3 ð Þ.Both Model 1 and Model 2 performed poorly, and neither of them selected BODC.However, the Model 3 selected the BODC to the tune of 34% and allocated 16.5 patients on an average.That good-performing Model 3 was the highlight of the BMS procedure, and it lifted the BODC selection percentage of the proposed design to a level close to the optimum of 30%.When employing the BMS procedure, there is a 73% probability of selecting Model 3 as the dose-efficacy model.
In Scenarios 7 and 8, again the proposed BMS-based design was robust, with the BODC selection percentage close to the best-conducted model.Evaluating relative model performances showed that typically one or two models did not perform well.In Scenario 8, in particular, the dose-efficacy curve approximately plateaus at the lowest dose level of the targeted agent B, and the BODC is the combination a 3 ; b 1 ð Þ.The proposed design selected the BODC 38% of and allocated on average 8.81 patients.Considering the safety perspective under any model, it is observed from the Table 2 that the mean percentage of patients assigned to the overtoxic dose combinations was not more than 30%.The three models have different application scenarios.Although they performed very well in the scenario that conforms to the nature of their own, they have certain limitations.The advantage of BMS is that its results are more robust in terms of dose selection probabilities.The proposed design typically cannot perform as well as the best single dose-efficacy model, but its performance is always quite close to that of the best single dose-efficacy model and can be much better than that of the worst single doseefficacy model.The robustness of BMS is critical since dose-efficacy relationships are often unpredictable in practice.

Sensitivity analysis
We also conducted sensitivity analyses to evaluate the performance of the proposed design under different utility specifications, different correlations between toxicity and efficiency and different cohort sizes.Details are reported in supplementary materials Table S8-Table S10.In the above simulation studies, we set U 1; 0 , implying that patients are willing to tolerate toxicity in return for efficacy.We then considered the case where , indicating that having both toxicity and efficacy is inferior to having no effect.As a result, such a utility specification favors a more conservative dose escalation.According to the simulation results shown in Figure 1, it was found that the performance of the proposed design was undermined with a conservative utility specification in most scenarios, on which the BODCs is defined.In Scenario 3, the BODC is a 1 ; b 1 ð Þ, indicating that the optimal dose combination can be easily found without too much dose exploration resulting in better performance of the conservative setting.We performed another sensitivity analysis.Starting from c = 0, we hypothesized that toxicity and efficacy outcomes were independent of each other.To assess the robustness of the design for the dependence between toxicity and efficacy, a sensitivity analysis of the relevant toxic-efficacy data was also done.Based on the bivariate Gumbel model, we generated toxicity and efficacy data with positive correlation, and the correlation coefficient c = 2 was set.Simulation results are shown in Figure 2. It revealed that the percentage of correctly selected BODC and the mean number of patients received treatment under BODC had little difference under different correlation coefficients, showing that the operational characteristics of the proposed design are usually robust to different toxicity and efficacy correlations.
We then considered the case a cohort size of two.By comparing the simulation results of different cohort sizes, we find that the proposed design is fairly robust to different cohort sizes, with a few scenarios where a cohort size of two is slightly more than a cohort size of three in terms of the number of patients assigned to BODC(s); however, overall, a cohort size of three is lightly greater than a cohort size of two.A cohort size of three patients for general trials is more recommended as using a cohort size of two may add additional complexity in implementation and logistics.Simulation results are shown in Figure 3.

Discussion
This study proposes phase I-II clinical trial based on BMS to identify the dose combination with optimal toxicity-efficacy trade-off within the acceptable toxicity range.Considering the diversity of the dose-efficacy relationships, where efficacy first increases and then plateaus or first increases and then decreases, it is often impossible to determine dose-efficacy relationships before dose exploration.However, most of the existing model-based designs have fixed dose-efficacy curves, which greatly limits the application of such trial designs, it is necessary to consider multiple efficacy relationships when constructing the model.In this paper, we pre-specify several candidate dose-efficacy models and propose an adaptive model selection method.It allows the dose-response model to vary between the monotone pattern and non-monotone pattern as trial data accumulate, thereby flexibility and reducing the limitations of a single model adverse effects on operational performance, improving the robustness of model estimation.In addition, utility function is used to measure the toxicityefficacy trade-off, which can not only reflect the clinical experience in treating the disease, but also more intuitively reflect the pros and cons of patient outcomes.The simulation results show that our method can identify the BODC in different scenarios.In terms of safety, this method has a low probability of selecting the overdoses as BODC, and the number of patients treated at overdoses is much smaller than the number of patients treated at safety doses.
Nevertheless, there are several possible extensions of the proposed design.The time gap between safety and efficacy data availability for therapies is an important area for future research and improvement in oncology clinical development.There are many designs already available.Riviere et al. (2015) model efficacy as a time-to-event variable rather than a binary outcome, thus naturally incorporating efficacy data from incomplete assessments as censored observations in decisions of dose assignment.Jin et al. (2014) treated late-onset outcomes as missing data, used data augmentation to infer missing outcomes from posterior predictive distributions computed from partial follow-up times and complete outcome data.Another direction for future exploration could be the development of biomarkers or other predictive tools that can help identify patients who are most likely to respond to a particular therapy.For example, DAUD et al. ( 2016) used CD8 + T-cells to predict response of anti-PD-1 therapy and advocated the use of immune response to predict the likelihood of achieving various clinical outcomes with PD-1 pathway inhibition.In addition, other reasonable alternative utilities can be used to determine the target BODC.For example, Mozgunov and Jaki (2020) proposed a toxicity and efficacy trade-off function based on information entropy to weigh current data against targets.Lyu et al. (2018) chose utility as a multiplication of a linear truncated (safety utility) and an exponential function (efficacy).The whole function is divided into two parts, where the safety utility is a decreasing function of the toxicity rate and efficacy utility is an increasing function of the efficacy rate.

Figure 1 .Figure 2 .
Figure 1.Percentage of correctly selected BODCs and the average number of patients treated in BODCs under different utility specifications.

Figure 3 .
Figure 3. Percentage of correctly selected BODCs and the average number of patients treated in BODCs under different cohort sizes.
Note: The BODCs with maximum utility values are in bold.Considering the fact that finding a dose very close to BODC is still practically useful, we consider doses within 3 points of desirability from maximum desirability as BODC.

Table 2 .
Simulation study comparing the three candidate models and BMS for Scenarios 1-8.

Table 3 .
Simulation study for the selection percentage of BMS to candidate models.