Isotonic design for phase I cancer clinical trials with late-onset toxicities

ABSTRACT This article addresses the problem of identifying the maximum tolerated dose (MTD) in Phase I dose-finding clinical trials with late-onset toxicities. The main design challenge is how best to adaptively allocate study participants to tolerable doses when the evaluation window for the toxicity endpoint is long relative to the accrual rate of new participants. We propose a new design framework based on order-restricted statistical inference that addresses this challenge in sequential dose assignments. We illustrate the proposed method on real data from a Phase I trial of bortezomib in lymphoma patients and apply it to a Phase I trial of radiotherapy in prostate cancer patients. We conduct extensive simulation studies to compare our design’s operating characteristics to existing published methods. Overall, our proposed design demonstrates good performance relative to existing methods in allocating participants at and around the MTD during the study and accurately recommending the MTD at the study conclusion.


Introduction
Historically, the primary objective of Phase I trials is to identify the maximum tolerated dose (MTD) among a set of dose levels. The MTD is defined as the highest dose that can be administered to patients with an acceptable (target) level of toxicity. It is often the recommended Phase II dose (RP2D), assuming that higher doses are more effective. The primary safety endpoint is a binary indicator of dose-limiting toxicity (DLT) based on protocol-specific adverse event definitions. Sequential dose assignments have traditionally been guided by DLTs in cycle 1 (i.e., 28 days) of treatment. MTDs are the highest tolerated dose from cycle 1, even though patients are administered therapy for several cycles. This approach may be appropriate for chemotherapy, which generally causes DLTs to be observed early on in the treatment course.
One of the biggest challenges in Phase I clinical trials is identifying an appropriate RP2D when relevant toxicity events occur in later cycles of therapy. These are common occurrences in novel treatment strategies, such as targeted therapies and immune-oncology agents (Weber et al. 2017), although late events are also common in other modalities such as radiation therapy (Normolle and Lawrence 2006). In a systematic review of 2084 patients treated on 54 Phase I trials of targeted therapies, 298 of the 599 patients (49.7%) who had a DLT developed their first DLT-qualifying grade 3 or higher adverse event after the first cycle. Additionally, a survey of 93 study investigators concluded that an overwhelming majority of experts favored accounting for DLTs observed beyond cycle 1 in dose allocation and the refinement of the RP2D (Paoletti et al. 2014). Rodon (2014) supported these findings in an editorial.
As an alternative to using only early DLT data, designs can extend the DLT evaluation window to a longer period to allow late-onset DLTs to be counted in allocation decisions. However, this can lead to increased trial duration if completely observed DLT information for each participant is required before dosing decisions can be made. A logistical challenge arises if the DLT outcome is not captured soon enough relative to the accrual rate. Several existing methods utilize data from patients who have been partially followed for DLT in the estimation of outcome probabilities, weighting each entered patient by the portion of the full DLT observation window for which they have been followed. Among the most commonly used designs is the Rolling 6 (R6) algorithm (Skolnik et al. 2008), even though its operating characteristics are inferior to those of model-based strategies (Zhao et al. 2011). Available model-based methods include the time-to-event continual reassessment method (TITE-CRM; Cheung and Chappell 2000) and its extensions (Braun 2006; Braun et al. 2003), the fractional CRM (Yin, Zheng, and Xu 2013), and the data-augmentation continual reassessment method (DA-CRM; Liu et al. 2013). These designs rely on the specification of a statistical model to sequentially update estimates for the DLT probabilities at each of the dose levels and assign participants to doses based on which of these estimates is closest to a target DLT rate that defines the MTD. Recently, a class of model-assisted designs (MAD; Lin and Yuan 2020), including the TITE Bayesian Optimal Interval (TITE-BOIN; Yuan et al. 2018) method, has been proposed to offer simpler alternatives to the TITE-CRM with the aim of more frequent implementation in clinical practice. These designs use a model only to derive a pre-specified set of escalation and de-escalation rules similar to those of the 3 + 3 algorithm and to estimate the MTD at the study conclusion.
Unlike model-based designs, they do not adaptively fit a model based on accumulating DLT data across all dose levels after the accrual of each new cohort. Allocation decisions during the trial conduct are guided only by data observed 'locally' at the current dose and the pre-determined algorithmic rules.
In recent studies of the operating characteristics of adaptive dose-finding methods, extensions of the Conaway, Dunbar, and Peddada (CDP; 2004) method have performed well in more complex dose-finding settings, such as drug combinations and patient heterogeneity, when compared to both model-based and model-assisted designs (Hirakawa et al. 2015; Conaway and Wages 2017; Wages et al. 2016). The CDP method is a model-based method that has not yet been adapted to the late-onset toxicity problem. We hypothesize that a TITE-CDP approach to late-onset toxicity will perform well in terms of the accuracy of MTD recommendation when compared to the TITE-CRM and TITE-BOIN methods. The proposed method offers late-onset toxicity designs with (1) good statistical properties when compared to competing methods, (2) mathematical simplicity resulting in fast computation and straightforward execution, and (3) easily understood interpretations that can be explained to clinical colleagues. The proposed design framework combines the advantages of model-based and model-assisted designs. It relies on a similar set of simple pre-trial specifications to model-assisted methods. Yet, like model-based methods, it can share DLT information across all dose levels using order-restricted inference techniques (Robertson et al. 1988), increasing efficiency. The details of our design follow in Section 2.

Estimation
We propose a design framework that uses order-restricted statistical inference for smoothing estimated DLT probabilities across dose levels (Ivanova and Flournoy 2009; Leung and Wang 2001; Wages and Conaway 2018). We assume that the studied doses are to be administered over a prespecified DLT evaluation window $\tau$. A DLT is defined as any adverse event meeting a protocol-specific DLT definition that occurs within the evaluation window. We assume that a patient who experiences a DLT at any point before the completion of the evaluation window is considered to have reached the endpoint and is taken 'off study' at that point. Suppose that there are $J$ dose levels $d_1, \ldots, d_J$ being studied. Let $\theta_j$, $j = 1, \ldots, J$, denote the true probability of DLT at dose level $j$. Let $x_i$ denote the binary DLT outcome so that $x_i = 1$ if participant $i$ experiences a DLT within the evaluation window $(0, \tau)$ and $x_i = 0$ if participant $i$ does not experience a DLT. Denote the time to DLT for participants with $x_i = 1$ as $t_i$, where $0 \le t_i \le \tau$. At any point in the trial, suppose that $y_j = \sum_{i=1}^{n_j} x_i$ DLTs have been observed in $n_j$ participants who have been evaluated for toxicity at dose level $j$. To model $\theta_j$ at each dose level, we assume a beta-binomial model, $y_j \mid \theta_j \sim \text{Binomial}(n_j, \theta_j)$ with $\theta_j \sim \text{Beta}(\alpha_j, \beta_j)$, where $\text{Beta}(\alpha_j, \beta_j)$ is a beta distribution with parameters $\alpha_j$ and $\beta_j$. When there is a potential for late-onset DLTs and/or fast accrual, the design challenge is that there will be participants for whom $x_i$ has not yet been observed at the time a dosing decision needs to be made for a newly accrued participant. Following the notation of Lin and Yuan (2020), the observed data for participants $i = 1, \ldots, n_j$ indicate whether participant $i$ has experienced a DLT by the decision time; if no DLT has been observed and follow-up is incomplete, then $x_i$ could be 0 or 1. Let $\delta_i$ be an indicator variable for whether the DLT outcome $x_i$ has been determined ($\delta_i = 1$) or is still pending ($\delta_i = 0$) for participant $i$ at the dosing decision time for the next accrual.
Let $u_i \le \tau$ denote the follow-up time for participant $i$ at this time. The observed interim data at dose level $j$ are $D_j = \{(x_i, \delta_i), i = 1, \ldots, n_j\}$, and the likelihood (1), derived by Lin and Yuan (2020), is expressed in terms of the following quantities: $\tilde{y}_j = \sum_{i=1}^{n_j} \delta_i x_i$ is the number of participants who have experienced DLT at the time of the next dose assignment, $m_j = \sum_{i=1}^{n_j} \delta_i (1 - x_i)$ is the number of participants who have completed the DLT evaluation window $\tau$ without experiencing DLT, and $w_i$ is a weight indicating the amount of information participant $i$ is contributing to the likelihood. If participant $i$ has experienced a DLT at any time prior to the current decision time or has completed the DLT evaluation window $\tau$ without experiencing DLT, then $w_i = 1$. If participant $i$ has not experienced a DLT and has not yet completed the DLT evaluation window (i.e., $u_i < \tau$), then $w_i$ is a function of the time participant $i$ has been followed at the time of the next accrual. Lin and Yuan (2020) derived and evaluated an approximation to likelihood (1) in the development of the TITE-Keyboard method so that they could enumerate all possible escalation and de-escalation decisions prior to the beginning of the trial. The TITE-Keyboard method uses the approximate likelihood to compute the estimated DLT probability at the current dose level, which is the only dose level used in decision-making in their method. We examine the use of the exact formulation (1) in adaptively estimating DLT probabilities at all dose levels for guiding decision-making.
Based on $g(\theta_j) = \text{Beta}(\alpha_j, \beta_j)$, the posterior distribution of $\theta_j$ is given by $f(\theta_j \mid D_j) \propto L(D_j \mid \theta_j)\, g(\theta_j)$. Based on this distribution, the updated DLT probabilities are given by the posterior mean of $\theta_j$, $\hat{\theta}_j = E(\theta_j \mid D_j)$.
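The posterior-mean update can be sketched numerically. The sketch below is illustrative only and rests on an assumption that is not reproduced from this article: it takes the likelihood to have the form $\theta^{\tilde{y}}(1-\theta)^{m}\prod_{i:\,\delta_i=0}(1 - w_i\theta)$, which is our reading of the exact formulation in Lin and Yuan (2020). The function names are hypothetical. Under that assumption the posterior is a finite mixture of beta kernels, so the mean has a closed form once the product over pending participants is expanded as a polynomial.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pending_polynomial(weights):
    """Coefficients c_k such that prod_i (1 - w_i * t) = sum_k c_k * t**k."""
    coeffs = [1.0]
    for w in weights:
        nxt = coeffs + [0.0]
        for k, c in enumerate(coeffs):
            nxt[k + 1] -= w * c
        coeffs = nxt
    return coeffs


def posterior_mean(alpha, beta, y_tilde, m, pending_weights):
    """Posterior mean of theta under a Beta(alpha, beta) prior.

    ASSUMED likelihood: theta**y_tilde * (1 - theta)**m
                        * prod over pending participants of (1 - w_i * theta).
    """
    a = alpha + y_tilde          # prior plus observed DLTs
    b = beta + m                 # prior plus completed non-DLTs
    coeffs = pending_polynomial(pending_weights)
    ref = log_beta(a, b)         # common scale factor, cancels in the ratio
    num = sum(c * math.exp(log_beta(a + k + 1, b) - ref)
              for k, c in enumerate(coeffs))
    den = sum(c * math.exp(log_beta(a + k, b) - ref)
              for k, c in enumerate(coeffs))
    return num / den
```

With no pending participants the function reduces to the usual conjugate mean $(\alpha + \tilde{y})/(\alpha + \beta + \tilde{y} + m)$, and a pending participant with weight 1 behaves exactly like a completed non-DLT. The alternating sums may lose precision when many participants are pending, so this is a sketch for small cohorts rather than a production routine.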
To impose monotonicity with respect to the dose-toxicity relationship and borrow information across dose levels, we apply the pool adjacent violators algorithm (PAVA; Robertson et al. 1988) to the posterior means $\hat{\theta}_j$, denoting the isotonic estimates by $\tilde{\theta}_j$. This algorithm replaces adjacent DLT probability estimates that violate the monotonicity assumption with their weighted average, where the weights are the current sample size at each dose level. We use the following PAVA implementation as described by Berry et al. (2011). Let $c = (c_1, \ldots, c_J)$ be a set of indices that indicate the pooling of adjacent values. Any doses with matching indices $c_j$ are pooled.
Step 2: If $V = \emptyset$, then stop the iteration. Otherwise, select the first violator, where the weights are the current sample size at each dose level.
Step 3: Set $c_i = v$, $i \in W$, and replace $\tilde{\theta}_i = m_w$, $i \in W$. Repeat from Step 1.
Based on the interim DLT data available at the time a dosing decision is to be made, these DLT probability estimates can be used to make allocation decisions according to the following algorithm described by Conaway, Dunbar, and Peddada (CDP; 2004), which has been studied extensively in previous work in the early-onset DLT setting (Conaway and Wages 2017; Wages and Fadul 2020).
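The pooling of adjacent violators described in this section can be sketched as follows. This is a generic stack-based weighted PAVA, not necessarily the index bookkeeping used by Berry et al. (2011); the function name is hypothetical, and the sketch assumes strictly positive weights (for example, sample sizes at tried doses).

```python
def pava(estimates, weights):
    """Weighted non-decreasing isotonic fit by pooling adjacent violators.

    estimates: posterior means, ordered from lowest to highest dose
    weights:   positive weights (e.g., current sample size per dose)
    """
    # Each block holds [pooled mean, total weight, number of doses pooled].
    blocks = []
    for est, w in zip(estimates, weights):
        blocks.append([est, w, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, k2 = blocks.pop()
            m1, w1, k1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, k1 + k2])
    # Expand pooled blocks back to one value per dose level.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, estimates (0.1, 0.3, 0.2, 0.4) with equal weights have a single violation between the second and third doses, which are replaced by their average.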

Allocation
Our proposed design can be considered a time-to-event extension of the CDP method, so we designate it as the TITE-CDP method. Let $A$ denote the set of doses that have been tried thus far in the trial such that $A = \{d_j : n_j > 0\}$.
(2) Let $l_{\min} = \min_{d_j \in A} L_j(\tilde{\theta}_j, \theta^*)$, and let $H$ be the set of doses with losses equal to the minimum observed loss so that $H = \{d_j : L_j(\tilde{\theta}_j, \theta^*) = l_{\min}\}$.
(3) If $H$ contains more than one dose, then we choose from among them according to the rules:
(4) If the suggested dose level has an estimated DLT probability that is less than the target, then the next highest dose level will be chosen if it has not yet been tried.
We adaptively update the likelihood function (1) to produce new estimates of the DLT probabilities across all doses. This sequential estimate-allocate process continues until a maximum sample size has been enrolled.
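A minimal sketch of one allocation step is given below, under two labeled assumptions. First, the loss is taken to be the absolute distance $|\tilde{\theta}_j - \theta^*|$, which is the comparison used in the rigidity illustration later in the article. Second, ties within $H$ are broken toward the lowest dose; this tie-breaking rule is a placeholder, because the article's own rules for step (3) are not reproduced here. Dose levels are indexed from 0 and the function name is hypothetical.

```python
def next_dose(iso_estimates, n_per_dose, target):
    """Suggest the index of the next dose level (0-based).

    iso_estimates: isotonic DLT probability estimates, one per dose
    n_per_dose:    number of participants treated so far at each dose
    target:        target DLT rate
    """
    # Set A: doses that have been tried so far.
    tried = [j for j, n in enumerate(n_per_dose) if n > 0]
    # ASSUMPTION: absolute-distance loss.
    losses = {j: abs(iso_estimates[j] - target) for j in tried}
    l_min = min(losses.values())
    # Set H: tried doses attaining the minimum loss.
    candidates = [j for j in tried if losses[j] == l_min]
    # ASSUMPTION: placeholder tie-break toward the lowest dose.
    suggested = min(candidates)
    # Step (4): escalate to the next dose if it is untried and the
    # suggested dose is estimated to be below the target.
    higher = suggested + 1
    if (iso_estimates[suggested] < target
            and higher < len(n_per_dose)
            and n_per_dose[higher] == 0):
        return higher
    return suggested
```

Keeping the untried-dose escalation as a separate final check mirrors the ordering of steps (2) through (4) in the text and makes it easy to swap in a different loss or tie-breaking rule.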
The recommended dose at study conclusion is the estimated MTD. If at any time in the accrual process $d_1$ is deemed too toxic, then the trial stops for safety and no dose is recommended as the MTD. Based on the posterior distribution $f(\theta_1 \mid D_1)$ and the pre-specified target DLT rate $\theta^*$, we calculate the posterior probability that $d_1$ is too toxic and compare this probability to an upper probability cutoff. If $\Pr(\theta_1 > \theta^* \mid D_1) > p_T$, we say that $d_1$ is not a safe dose and the trial terminates early for safety. Appropriate cutoff values $p_T$ are typically in the range 0.80 to 0.95 and can be tuned via simulation studies (Yuan et al. 2017).
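The safety rule can be sketched for the simplified case in which every participant at $d_1$ has a fully observed outcome, so that the posterior is the conjugate $\text{Beta}(\alpha_1 + y_1, \beta_1 + n_1 - y_1)$. This simplification is an assumption of the sketch; with pending participants the posterior $f(\theta_1 \mid D_1)$ is no longer a single beta distribution. The function names are hypothetical, and the tail probability is computed with composite Simpson's rule to avoid external dependencies.

```python
import math


def prob_exceeds(target, a, b, n_steps=2000):
    """Pr(theta > target) for theta ~ Beta(a, b); requires b >= 1, 0 < target < 1."""
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def pdf(x):
        return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)

    # Composite Simpson's rule on [target, 1] with an even number of steps.
    h = (1.0 - target) / n_steps
    total = pdf(target) + pdf(1.0)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * pdf(target + k * h)
    return total * h / 3.0


def stop_for_safety(y1, n1, alpha, beta, target, cutoff):
    """True if Pr(theta_1 > target | data) exceeds the cutoff p_T.

    ASSUMPTION: all n1 outcomes at dose level 1 are fully observed.
    """
    return prob_exceeds(target, alpha + y1, beta + n1 - y1) > cutoff
```

In practice the cutoff would be tuned by simulation, as the text notes, rather than fixed in advance.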

Prior specifications
The beta prior distributions are used as smoothing parameters in the isotonic estimation. The elicitation of suitable priors relies upon practical guidelines for Bayesian adaptive clinical trial design (Thall and Simon 1994). We ask investigators to specify the expected value of the DLT probability $\theta_j$ at each dose level and an upper bound $v_j$ such that they are 95% certain that the DLT probability will not exceed $v_j$. Based on the expected value of $\theta_j$ and a 95% upper limit $v_j$ on the DLT probability, the equations are solved to obtain prior specifications for $\alpha_j + \beta_j$. In the absence of prior information, a practical prior specification can be acquired by setting the prior mean equal to the target DLT rate $\theta^*$ and setting the 95% upper limit $v_j$ equal to $2 \times \theta^*$ at each dose level. This prior specification is recommended to avoid the problem of rigidity (Cheung 2002), in which allocation can become confined to a sub-optimal dose level regardless of the ensuing observed data. With a smaller effective sample size (ESS, $\alpha_j + \beta_j$) and a low target DLT rate $\theta^*$, early DLTs can heavily impact the dose assignment algorithm, potentially preventing the design from ever returning to doses with only 1/1 DLTs observed, for example. Specifically, suppose we are targeting a DLT rate of 0.20. A Beta(0.4, 1.6) prior (ESS = 2) would yield a posterior mean equal to $(y_j + 0.4)/(n_j + 1.6 + 0.4)$. Suppose that the first participant on dose level 1 does not experience a DLT, so that $y_1 = 0$, $n_1 = 1$, and the posterior mean is $\hat{\theta}_1 = (0 + 0.4)/(1 + 1.6 + 0.4) = 0.4/3 = 0.13$. According to the dose assignment algorithm described above, the trial would escalate to dose level 2. If a DLT is observed on the first participant accrued to dose level 2 with this prior specification (i.e., $y_2 = n_2 = 1$), then the posterior mean is $\hat{\theta}_2 = (1 + 0.4)/(1 + 1.6 + 0.4) = 1.4/3 = 0.47$. At this point, the trial will return to dose level 1, and the estimate at dose level 2 will remain 0.47, meaning $|\tilde{\theta}_2 - 0.20| = 0.27$, because the data collected at dose level 1 below will not affect the estimation of $\hat{\theta}_2$ (unless dose level 1 becomes very toxic, at which point the trial would terminate for safety). Consequently, $|\tilde{\theta}_1 - 0.20| < 0.27$ and the trial will stay at dose level 1 indefinitely. A similar illustration could be provided if we encountered 2 DLTs on the first 2 participants accrued to dose level 2. Therefore, in general we want the prior to satisfy the following conditions at each dose level $j$ to avoid rigidity.
This approach to prior elicitation has been studied extensively in a variety of settings for early-onset DLTs (Conaway and Wages 2017; Wages and Fadul 2020) and has demonstrated robust operating characteristics over a broad range of scenarios.
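One way to turn an elicited prior mean and 95% upper bound into beta parameters is sketched below. It assumes the two conditions are $\alpha_j/(\alpha_j + \beta_j) = \mu_j$ and $\Pr(\theta_j \le v_j) = 0.95$, and it solves for the effective sample size by bisection; this is an illustrative reading of the elicitation described above, not the authors' code. The function names are hypothetical, and the search is restricted to $\alpha_j \ge 1$ so that the density is bounded at zero.

```python
import math


def beta_cdf(x, a, b, n_steps=2000):
    """Pr(theta <= x) for theta ~ Beta(a, b), 0 < x < 1, via Simpson's rule."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t):
        if t <= 0.0:
            # Density at zero: finite only when a == 1; treated as 0 otherwise.
            return math.exp(log_norm) if a == 1.0 else 0.0
        return math.exp(log_norm + (a - 1) * math.log(t)
                        + (b - 1) * math.log(1.0 - t))

    h = x / n_steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return total * h / 3.0


def elicit_beta_prior(mean, upper, level=0.95, max_ess=1000.0):
    """Return (alpha, beta) with the given mean and Pr(theta <= upper) = level.

    ASSUMPTION: the solution has alpha >= 1, i.e. ESS >= 1 / mean.
    """
    lo, hi = 1.0 / mean, max_ess
    for _ in range(60):  # bisection on the effective sample size
        ess = 0.5 * (lo + hi)
        if beta_cdf(upper, mean * ess, (1.0 - mean) * ess) < level:
            lo = ess   # prior too diffuse: increase ESS
        else:
            hi = ess   # prior too concentrated: decrease ESS
    ess = 0.5 * (lo + hi)
    return mean * ess, (1.0 - mean) * ess
```

For the default specification in the text, one would call `elicit_beta_prior(target, 2 * target)`.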

Weight functions
Like other methods in the area, the modeling approach above relies upon specification of weights $w_i$ to incorporate the time-to-event outcomes of each participant. The weight function represents the amount of information available from participant $i$ when a new participant is to be accrued to the study. There are a variety of specifications that can be applied within our modeling framework. First, we employed a simple linear scheme (Cheung and Chappell 2000) that assigns weight proportional to the length of follow-up $u_i$ so that $w_i = \min(u_i/\tau, 1)$. This simple weighting scheme yields exceptionally robust performance, a fact also reported by Cheung and Chappell (2000) in the original TITE-CRM paper. Based on their results, and many others, the linear weighting scheme is recommended as the default for generalized use in practice. We also evaluated an adaptive weight function (Cheung and Chappell 2000) that adjusts for the number of observed DLTs. In its definition, $z$ is the total number of DLT outcomes, the times-to-DLT $t_{(k)}$ are ordered such that $0 \equiv t_{(0)} < t_{(1)} \le \cdots \le t_{(z)} < t_{(z+1)} \equiv \tau$, and $\kappa(u) = \max\{k \in [0, z] : u \ge t_{(k)}\}$. The operating characteristics of the proposed methodology using adaptive weights are provided in Supplemental Material.
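The linear weighting scheme is simple enough to state in a few lines. The sketch below covers only the linear scheme and the rule that participants with an observed DLT receive full weight; the function names are hypothetical, and the adaptive weight function is not implemented.

```python
def linear_weight(follow_up, window):
    """Linear weight min(u / tau, 1) for a participant followed for time u."""
    return min(follow_up / window, 1.0)


def participant_weights(follow_ups, had_dlt, window):
    """Weights for a cohort: 1 for observed DLTs, linear in follow-up otherwise."""
    return [1.0 if dlt else linear_weight(u, window)
            for u, dlt in zip(follow_ups, had_dlt)]
```

For example, with a 126-day window, participants followed for 100 and 53 days without DLT receive weights 100/126 and 53/126, while a participant with an observed DLT receives weight 1.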

Illustration using real data
As an illustration, we consider a dose-finding study of five escalating doses of bortezomib in combination with standard chemotherapy as the first-line treatment for lymphoma patients from a published study (Leonard et al. 2005). DLTs were defined by grade 3 or more severe neuropathy, low platelet count, and symptomatic non-neurologic or non-hematologic toxicity. The goal of the study was to locate the MTD, defined as the dose with an estimated DLT rate closest to the target rate of $\theta^* = 25\%$. Each patient may receive up to six 21-day cycles of treatment for a total DLT assessment window of 126 days. If only DLTs observed in the first cycle are counted, the toxicity of bortezomib could potentially be underestimated. Conversely, it is not feasible to wait until each patient has been observed for 126 days before making sequential dosing decisions. The data in Table 1 are taken from Figure 1.1 in Cheung (2011) and provide the actual outcomes for the first eight accrued patients in the study. To illustrate the model estimation, suppose we have specified a $\text{Beta}(2.34, 7.02)$ prior at all dose levels based on the 25% target rate. The study began on dose level 3, and the first DLT was observed in patient 7 at dose level 4 on day 43. At this point in the study, we have complete data on three participants without DLT (#1, #2, and #3) and on one participant who experienced DLT (#7). We have four participants who have not experienced a DLT but have yet to complete the entire DLT evaluation window. These are partially observed DLT outcomes, and for ease of illustration, we will use a linear function to assign weights. The data available at dose level 3 are $n_3 = 4$, $\tilde{y}_3 = 0$, $m_3 = 3$, $w_{i3} = (1, 1, 1, 123/126)$, and $\delta_{i3} = (1, 1, 1, 0)$. Using these data and our modeling approach above, we can calculate estimates for the DLT probability at dose level 3. Based on (2), the posterior mean of $\theta_3$ is $\hat{\theta}_3 = 0.175$.
The data available at dose level 4 are $n_4 = 3$, $\tilde{y}_4 = 1$, $m_4 = 0$, $w_{i4} = (100/126, 53/126, 1)$, and $\delta_{i4} = (0, 0, 1)$. The posterior mean of $\theta_4$ is $\hat{\theta}_4 = 0.289$. The data available at dose level 5 are $n_5 = 1$;

Summary of operating characteristics
We examined the operating characteristics of the proposed approach, the TITE-CRM, and the TITE-BOIN method through simulation studies. Operating characteristics for the TITE-CRM were simulated with the dfcrm package in R. We considered six study dose levels over 50 scenarios of hypothesized DLT probabilities for each dose level. The scenarios were randomly generated from the Conaway and Petroni (2019) family of dose-toxicity curves and provide a variety of locations for the hypothesized MTD, as well as a mix of steep and shallow dose-toxicity curves (Figure 1). The scenarios were generated in a way that ensures that the number of scenarios with the MTD located at each dose level is approximately the same. The prior distribution for our proposed method was Beta(2.07, 4.83) at each dose level, chosen according to the specifications described above. For TITE-CRM, the set of skeleton values of the working model for the probability of a DLT over the evaluation window was specified as $(0.05, 0.10, 0.20, 0.30, 0.50, 0.70)$, which were chosen in the original TITE-CRM paper (Cheung and Chappell 2000). The prior distribution used on the model parameter was $N(0, 1.34)$, which is common to Bayesian CRM designs and a default distribution utilized in available CRM software (Cheung 2011). The default settings were used for the TITE-BOIN method, and simulation results for the method were generated using R code available at https://github.com/ruitaolin/TITE-MAD. The maximum sample size for each trial was $n = 36$ patients, accrued in cohorts of size 1. The overall DLT evaluation window was 3 months with patients accruing at a rate of 3 patients per month according to a fixed process. We assumed that the time to toxicity followed a uniform distribution as in Cheung and Chappell (2000). We first determine whether a patient has a DLT response. If so, we generate a failure time uniformly on the interval $(0, 3)$. The target DLT rate that defines the MTD for the study is $\theta^* = 30\%$.
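The outcome-generation step described above (first a DLT indicator, then a uniform failure time within the window) and the fixed accrual process can be sketched as follows. The function names and the use of Python's `random` module are illustrative choices for this sketch; the article's own simulations were run in R.

```python
import random


def simulate_patient(p_dlt, window, rng):
    """Draw one participant's outcome.

    Returns (has_dlt, time_to_dlt); time_to_dlt is None when there is no DLT.
    """
    has_dlt = rng.random() < p_dlt
    # Failure time is uniform on the evaluation window, only if a DLT occurs.
    time_to_dlt = rng.uniform(0.0, window) if has_dlt else None
    return has_dlt, time_to_dlt


def arrival_times(n_patients, rate_per_month):
    """Fixed-process accrual: evenly spaced arrival times, in months."""
    return [i / rate_per_month for i in range(n_patients)]


# Example: 36 participants, 3 per month, 3-month window, true DLT rate 0.30.
rng = random.Random(2024)
arrivals = arrival_times(36, 3)
outcomes = [simulate_patient(0.30, 3.0, rng) for _ in arrivals]
```

Seeding the generator makes a simulated trial reproducible, which is convenient when comparing designs on identical patient sequences.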
Over 1000 simulated trials for each scenario, the percentage of trials in which each dose was recommended as the MTD and the number of participants treated at the true MTD during each trial were tabulated. The percentage of correct selection (PCS) is the percentage of simulated trials in which the correct MTD was selected by the design under the hypothesized scenario. The operating characteristics for each method, summarized over 50 scenarios, are reported in Figure 2. The proposed approach demonstrates good performance in terms of the probability of selecting the correct MTD. The mean PCS over all scenarios is 44.3% for the proposed method, 43.7% for the TITE-CRM, and 39.3% for the TITE-BOIN. These results are consistent with the 10,000 simulated trials for each of 50,000 randomly generated dose-toxicity curves under various design conditions conducted by Yuan et al. (2018). That study demonstrated that, on average, the correct dose is selected as the MTD 38.1% of the time using the Rolling 6 algorithm, 46.0% of the time using the TITE-CRM, and 44.2% of the time using the TITE-BOIN method. While not included in Figure 2 for the sake of brevity, we also simulated the TITE-IR (Chapple et al. 2019) method over the same 50 curves using the titeIR package in R. The average PCS for the TITE-IR method was 41.4%, indicating that the proposed methodology provides an improvement over this approach. In terms of the average number of participants treated at the MTD, the TITE-CRM yielded an average of 11.2 participants, the TITE-BOIN yielded an average of 10.0 participants, and the TITE-CDP method yielded an average of 12.3 participants. The accuracy index is a summary measure used to assess the performance of a method by incorporating information on dose selection at all dose levels into one number. This measure penalizes the method for selecting doses further from the true MTD. For $J$ dose levels, the accuracy index $A_n$ described by Cheung (2011) for a trial of $n$ participants combines the probability of selecting each dose $j$ with a distance measure $\rho_j$. The distance measure $\rho_j$ is a description of the true dose-toxicity curve, with the denominator of $A_n$ representing the extent to which the DLT probabilities deviate from the target rate. Therefore, steeper curves will have larger values of $\sum_{j=1}^{J} \rho_j$ than will flatter curves. In order to assess method performance for participant allocation, the accuracy index can be used by substituting the probability of selecting dose $j$ with the proportion of participants allocated to dose $j$. The maximum value of $A_n$ is 1, with larger values indicating that the method has high accuracy. Over the 50 scenarios considered, the average accuracy index for dose selection is 0.547 for TITE-BOIN, 0.594 for TITE-CRM, and 0.614 for TITE-CDP. The average accuracy index for participant allocation is 0.246 for TITE-BOIN, 0.390 for TITE-CRM, and 0.358 for TITE-CDP. While these results evaluate the methods under a particular set of assumptions, Figure 2 provides evidence supporting the performance of the method described in this paper.
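The accuracy index can be computed with a short function. The sketch assumes the commonly stated form $A_n = 1 - J \sum_j \rho_j P_j / \sum_j \rho_j$, where $P_j$ is the probability of selecting dose $j$, and takes $\rho_j = |\theta_j - \theta^*|$ as the distance measure. Both the factor $J$ and the absolute-distance choice are assumptions of this sketch based on our recollection of Cheung (2011), not formulas reproduced from this article; the function name is hypothetical.

```python
def accuracy_index(true_probs, selection_probs, target):
    """Accuracy index under the ASSUMED form 1 - J * sum(rho * P) / sum(rho).

    true_probs:      true DLT probabilities at each dose
    selection_probs: probability of selecting each dose (or the proportion
                     of participants allocated to each dose)
    target:          target DLT rate
    """
    n_doses = len(true_probs)
    # ASSUMPTION: absolute distance from the target as the distance measure.
    rho = [abs(p - target) for p in true_probs]
    penalty = sum(r * s for r, s in zip(rho, selection_probs))
    return 1.0 - n_doses * penalty / sum(rho)
```

Under this form, always selecting a dose whose DLT probability equals the target gives an index of 1, and selecting doses uniformly at random gives an index of 0, which matches the interpretation that larger values indicate higher accuracy.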

Application to a phase I trial in prostate cancer
We apply our proposed method to a Phase I/II trial (Wages et al. 2021) designed to evaluate hypofractionated (HypoFX) radiation therapy (RT) schedules for daily salvage prostate bed RT to identify the shortest schedule with acceptable gastrointestinal (GI) and genitourinary (GU) toxicity. The trial studied four dose levels (DL) of increasingly smaller number of fractions (in Gy), meaning that 'escalation' to a higher dose level meant accrual to a shorter dose-fractionation schedule. The primary objective of Phase I was to determine the shortest dose-fractionation schedule (i.e., MTD) with acceptable toxicity for evaluation in Phase II. The primary endpoint of Phase I was the incidence of grade $\ge 3$ acute GU or GI toxicity assessed by CTCAE v5.0. A DLT was defined as any treatment-related grade $\ge 3$ GU or GI toxicity within a DLT evaluation window of $\tau = 3$ months of treatment. The MTD is defined as the highest dose (i.e., shortest fractionation schedule) with a DLT rate closest to the target DLT rate of $\theta^* = 20\%$. The upper probability cutoff used to define the safety stopping rule was $p_T = 0.90$. The study accrued participants in cohorts of size 1, and the starting dose level was dose level 1 [treatment course $2.5\ \text{Gy} \times 26 = 65\ \text{Gy}$ (5.2 weeks)].
We generated the operating characteristics for the TITE-CDP and TITE-BOIN methods over 10 scenarios of hypothesized DLT probabilities for each dose level. We did not include the TITE-CRM in this simulation study because we wanted to evaluate the impact of safety stopping rules on the operating characteristics of the methods, and the dfcrm package does not have the ability to incorporate such rules. The hypothesized scenarios were randomly generated from the Conaway and Petroni (2019) family of dose-toxicity curves and provide a variety of locations for the hypothesized MTD, as well as a mix of steep and shallow dose-toxicity curves (Figure 3). For the proposed method, we used a Beta(2.6, 10.4) prior distribution at each dose level, chosen according to the specifications described above. In Supplemental Material, we assessed the operating characteristics of the TITE-CDP under three alternative prior specification settings. The three sets of specifications were computed by adjusting the values of $v_j$ and the upper limit confidence level in the calculations provided in Section 2.3. Overall, the percentages of MTD selection are very close to those reported in Table 2, demonstrating robustness of the TITE-CDP to reasonable prior specifications chosen according to the algorithm in Section 2.3. The default settings were used for the TITE-BOIN method, and simulation results for the method were generated using R code available at https://github.com/ruitaolin/TITE-MAD. The maximum sample size for each trial was $n = 24$ patients, accrued in cohorts of size 1. The overall DLT evaluation window was 3 months with patients accruing at a rate of 2 patients per month according to a fixed process. We assumed that the time to toxicity followed a uniform distribution as in Cheung (2002).
Over 1000 simulated trials for each scenario, the percentage of trials in which each dose was recommended as the MTD and the number of participants treated at each dose level during each trial were tabulated. The results are reported in Tables 2 and 3. As a sensitivity analysis, we considered alternative settings in which time-to-DLT is sampled from a Weibull distribution, with 50% of DLTs occurring in the second half of the assessment window. The results of a simulation study comparing TITE-CDP to TITE-BOIN under this time-to-DLT distribution are provided in Supplemental Material. In Scenario 1, the TITE-CDP method outperforms the TITE-BOIN in terms of the percentage of correctly selecting the true MTD (dose level 1) (TITE-CDP 69.4% vs. TITE-BOIN 59.0%, respectively). The two methods treat a similar number of participants on average at this dose (13.3 vs. 13.1), while the TITE-BOIN method stops early for safety with no MTD recommendation in a higher percentage of trials than the TITE-CDP method (26.6% vs. 15.8%, respectively). In Scenario 2, the methods are similar in terms of percentage of correct MTD selection with slightly better performance for the TITE-BOIN method (48.7% vs. 50.3%), while the TITE-BOIN method treats a higher number of participants at the true MTD (12.0 vs. 9.2). In Scenario 3, the TITE-CDP method recommends the true MTD in a slightly higher percentage of simulated trials than the TITE-BOIN method (56.2% vs. 53.4%, respectively), and the two methods treat a similar number of participants on average at this dose (8.7 vs. 8.6). In Scenario 4, the performance of the two competing approaches is very close. Both methods correctly recommend dose level 3 as the MTD in approximately 70% of trials. A similar number of participants on average are treated at the MTD in Scenario 4 (10.9 vs. 10.8). In Scenario 5, the TITE-CDP method outperforms the TITE-BOIN in terms of the percentage of correctly selecting the true MTD (dose level 3) (TITE-CDP 54.8% vs. TITE-BOIN 49.8%, respectively), and the TITE-CDP treats a higher number of participants on average at this dose (8.9 vs. 7.4). In Scenarios 6-8, the TITE-CDP method displays better operating characteristics than the TITE-BOIN method by recommending the true MTD in a higher percentage of trials, with gains ranging from approximately 6-10%, while treating approximately 2.3 to 4.5 more participants on average at the true MTD. Finally, both methods correctly stop the trial early for safety without recommending an MTD in scenarios with high toxicity (Scenarios 9 and 10). Across scenarios in which there is a true MTD in the dose range (Scenarios 1-8), the average percentage of correct MTD selection is 58.3% for the TITE-CDP and 52.8% for the TITE-BOIN. The average number of patients treated at the true MTD is 10.5 for the TITE-CDP and 9.4 for the TITE-BOIN. Overall, Tables 2 and 3 indicate that the TITE-CDP method is a practical alternative for designing and conducting Phase I dose-finding trials that must account for late-onset DLTs.
Table 2. MTD selection percentages for the proposed TITE-CDP and TITE-BOIN over 10 representative dose-toxicity scenarios generated from the Conaway and Petroni (2019) family of curves. The target DLT rate that defines the MTD is 0.20. The underlying distribution of the time to toxicity outcomes is uniform. For TITE-BOIN, the underlying distribution of patient arrival time is uniform. For TITE-CDP, inter-patient arrival is a fixed process. The accrual rate is two patients per month.

Conclusions
This article has presented a novel adaptive strategy that accounts for late-onset DLTs in early-phase trials. We illustrated the method on actual data from a Phase I trial of bortezomib in lymphoma patients. We applied it to a Phase I trial of hypofractionated post-prostatectomy radiotherapy for prostate cancer. Simulation studies were performed to evaluate the operating characteristics of the design. The simulation results in Tables 2 and 3 and Figure 2 demonstrate the method's ability to effectively recommend desirable doses, defined by acceptable toxicity, in a high percentage of trials with manageable sample sizes. We also studied the design's operating characteristics in a simulation study of six study dose levels over 50 randomly generated dose-toxicity curves. We compared the design to the TITE-CRM and TITE-BOIN methods, and operating characteristics were favorable for the proposed approach in the scenarios considered. Moreover, the proposed method is simple, which should facilitate its use in practice. Software in the form of R code for simulation of design operating characteristics is available at the request of the first author. At present, late-onset DLTs are not often used in guiding initial dose allocation within the design nor in the dose recommendation after the study. Frequent use of variants of the 3 + 3 algorithm has continued into studies of new agents with the potential for pertinent late toxicities. In the presence of these late events, DLTs are undercounted, and the resulting recommended dose is weighted in favor of higher doses that appear safer than they are (Lee et al. 2016; Roda et al. 2016). Consequently, recommended doses result in a higher than anticipated toxicity rate. The FDA registers many novel therapies at doses different from those identified by Phase I trials (Roda et al. 2016). Many patients are treated at overly toxic or subtherapeutic levels throughout the development process.
Early toxicity does not provide a complete representation of tolerability. New agents are often administered over extended periods, resulting in DLTs occurring outside of a short-term evaluation window, leading to premature treatment discontinuation. In most settings, such as metastatic disease, the goal is to continue the drug until progression or unacceptable toxicity, so there is now a much greater incentive to give more tolerable doses so that patients can continue therapy longer.
New agents in oncology have a high failure rate (approximately 95%) in the drug development process (Printz 2015). Even the most highly effective cancer therapy, with a well-understood mechanism of action, cannot benefit patients unless that agent successfully navigates the drug development process. Work by Conaway and Petroni (2019) shows that an emphasis on more efficient designs for Phase I trials has a much more significant impact on the likelihood of success for an agent than the size of the Phase III trial, illustrating the importance of selecting a well-performing dose-finding design. More effective and practical dose-finding methods are needed, and, in this paper, we have provided a simple alternative with good operating characteristics. The changing landscape of oncology drug development, with an increasing number of targeted and immunotherapy agents being tested over multiple cycles, has significantly departed from the historical paradigm of early-phase clinical trial design. Hence, there is a growing need to implement new study designs that address the clinical realities and statistical considerations arising from new treatment paradigms.