Interim Futility Analysis for Longitudinal Data With Adaptive Timing and Error Rate Preservation

ABSTRACT Many clinical trials use longitudinal endpoints, with the primary endpoint quite often based either on the rate of change or on the change from baseline at a specific long-term follow-up time point. When such trials are monitored, interim futility analyses may be planned so that the trial can be terminated early if the treatment does not benefit patients. For such trials, subjects with incomplete follow-up pose challenges for the timing, analysis, and decision making at the interim futility look. We propose an efficient interim futility analysis based on the slope of a linear regression, which incorporates all the data available at the interim analysis. Our approach has the added advantage of providing a data-driven decision on triggering the interim analysis once sufficient information has been collected, so that the desired properties of the established futility rule are guaranteed. The construction of interim futility rules and the timing of the interim analysis are discussed, and the method is illustrated with an example involving a placebo-controlled comparison of longitudinal proteinuria measurements. Supplementary materials for this article are available online.


Introduction
There are many randomized clinical trials in which repeated measurements of a continuous endpoint are taken longitudinally over time. Repeated measurements serve many purposes, including monitoring the trial, allowing for interim analyses, determining the onset of treatment effect, accounting for possible measurement errors, and estimating the pattern of response over time, as discussed by Shih and Quan (1999). When patients enter the study sequentially (e.g., according to a Poisson process) and a response variable is measured for each patient at successive visits, measurements from the same patient are not statistically independent.
In many clinical trials, the primary endpoint is the time until the occurrence of an event, such as the remission of disease, which is defined by the level of a marker measurement. For such trials, time-to-event analysis allows information about the endpoint to be used for subjects who have been followed only for a short period and have no event yet. However, time-to-event analysis may not be appropriate in all contexts, in particular where the treatment effect involves reversible events or where the durability of treatment response is important, in which case the primary endpoint is quite likely to be defined at a specific long-term follow-up point. In such cases, only subjects who complete the prespecified long-term follow-up provide information for the primary endpoint. If an interim analysis is based only on such a primary endpoint, the analysis could potentially make use of only a small proportion of the accrued subjects. Marschner and Becker (2001) discussed a method to formally incorporate short-term information into interim analyses with reversible binary endpoints defined at a long-term follow-up time, exploiting the fact that the short- and long-term responses may be statistically associated and thereby permitting an increase in the efficiency of the analyses. Galbraith and Marschner (2003) proposed an extension of this method suitable for continuous endpoints, based on a multivariate normal model in which the planned visit is treated as a factor and the mean response at each visit is modeled so that all the data collected can contribute to the interim analysis.
To use the longitudinal data more efficiently, the analysis of such endpoints is often conducted using linear mixed-effects models, as proposed in Laird and Ware (1982). The inclusion of an interim analysis adds another layer of complexity to the design and interpretation of such trials. Lee and DeMets (1991) investigated a group sequential method for comparing the rates of change between two treatment groups with repeated measurement data while controlling the overall Type I error rate. Kittelson, Sharples, and Emerson (2005) proposed using within-subject summary statistics, such as the average, the slope, and the area under the curve, in a group sequential manner.
In this article, we investigate the case where an interim futility analysis is planned with repeated measurement data, collected for the purpose of comparing the slopes (rates of change in the endpoint) between a new treatment and a control group. The main endpoint is the slope difference over the study period, but the sponsor wishes to monitor the trial with early data to make sure the new treatment works. If the new therapy appears ineffective, the trial can be terminated early, saving resources; otherwise the trial continues to the end. No early stopping for efficacy is planned, since treatment over the whole study period is desired. In addition, the interim analysis may support informed decision making for subsequent trials.
The challenges in this type of data include the following: subjects' entry into the trial is staggered, so at the interim analysis each subject may have been observed on a different number of occasions; the planned visits may be unequally spaced; and missing data may be present. In addition, at the design stage it is challenging to construct futility decision rules with appropriate properties, that is, rules for which the probability of continuing the study under the null or alternative hypothesis at the interim can be controlled at desired levels. Another challenge is the timing of the futility interim analysis. An interim timing rule such as "when x% of the subjects have completed the month-n visit" will not work well: owing to the staggered entry of patients and depending on the enrollment rate, the data available at such an interim analysis may differ substantially from what was hypothesized, so the operating characteristics of the decision rule are not guaranteed to match those specified by the protocol.
Unlike the methods using within-subject summary statistics proposed in Kittelson, Sharples, and Emerson (2005), the proposed method uses both within- and between-subject information to construct an interim futility decision rule with the desired false continuation and false stopping properties. The proposed method also leads to an enrollment-driven mechanism that triggers the interim analysis when sufficient information has been collected, so that the desired properties of the established futility rule are guaranteed.
The estimation of the slopes and slope difference in the linear mixed effects model is reviewed in Section 2. In Section 3, the construction of interim futility rules with desired properties is discussed. The mechanism of triggering the interim analysis is presented in Section 4. The method is illustrated with a simulation example from a longitudinal clinical trial in Section 5.

Results from the Linear Mixed Effects Model
In this section, we present the analytical result about the variance of the slopes in the linear mixed effects model, which is required by our proposed procedure.
Let y_i = (y_{i1}, . . . , y_{i m_i})' be the response vector from subject i, where m_i is the actual number of visits completed by the subject. Similarly, let x_i = (x_{i1}, . . . , x_{i m_i})' be the vector of visit times for subject i. Let x̄_i and ȳ_i be the mean values of visit time and response for subject i, respectively. Note that at the interim analysis the m_i's may differ across subjects, while at the end of the study all m_i's are the same if there are no missing data. Assuming a compound-symmetric variance-covariance structure, for each individual i (= 1, . . . , n) and visit j (= 1, . . . , m_i),

    y_{ij} = α + β x_{ij} + s_i + e_{ij},

where α is the intercept, β is the slope, s_i is the random subject effect distributed as N(0, σ_s²), and e_{ij} is the random noise (within-subject error) distributed as N(0, σ²). Let γ = σ_s²/σ². The covariance matrix of y_i is then

    V(y_i) = σ² (I_{m_i} + γ J_{m_i}),

where I_{m_i} is the identity matrix of dimension m_i and J_{m_i} is the matrix of 1's of dimension m_i.
The covariance matrix of all responses y is then block diagonal, V(y) = diag{V(y_1), . . . , V(y_n)}. Let x be the design matrix (of fixed effects) linking β = (α, β)' to y. The generalized least-squares estimate of β is then

    β̂ = (x' V(y)⁻¹ x)⁻¹ x' V(y)⁻¹ y,  with  V(β̂) = (x' V(y)⁻¹ x)⁻¹.

Typically, at the data-analysis stage the parameters σ and σ_s are estimated via maximum likelihood or restricted maximum likelihood, and these estimates are substituted for the true parameter values to compute estimates of β and V(β̂). For design purposes, V(β̂) is critical: if, for given σ and σ_s, V(β̂) (or, more specifically, the variance of the slope estimate) can be determined, and a target effect size (e.g., the desired difference in slopes) is specified, then any probability calculation (e.g., false continuation, false stopping) can be carried out for any decision rule.
For subject i, if we fit a linear regression with only the data from that subject, the information for the slope is

    I_i(β) = Σ_j (x_{ij} − x̄_i)² / σ².   (1)

Going through the matrix algebra, the Fisher information for the slope (the reciprocal of its variance) in the full sample turns out to be

    I(β) = Σ_i Σ_j (x_{ij} − x̄_i)² / σ² + Σ_i w_i (x̄_i − x̄_w)²,   (2)

where w_i = 1/(σ²/m_i + σ_s²) and x̄_w = Σ_i w_i x̄_i / Σ_i w_i. This information comprises two components, corresponding to the within-subject (β̂_w) and between-subject (β̂_b) estimators of the slope. The overall within-subject estimate β̂_w is the inverse-variance weighted average of the β̂_i's, and therefore the information for β̂_w is

    I(β̂_w) = Σ_i Σ_j (x_{ij} − x̄_i)² / σ².   (3)

The between-subject estimate β̂_b is obtained by fitting a linear regression to the mean data (x̄_i, ȳ_i) from each subject. The variance of ȳ_i is

    V(ȳ_i) = σ²/m_i + σ_s² = 1/w_i.

Using the same formula as (1), the information for the between-subject estimate β̂_b is

    I(β̂_b) = Σ_i w_i (x̄_i − x̄_w)².   (4)

The final estimate of the slope is the inverse-variance weighted combination of β̂_w and β̂_b, and therefore the final Fisher information for the slope is the sum of the information for the between- and within-subject slopes, as in (2).
Note that if each subject completes exactly the same planned visits, (4) becomes zero, since the x̄_i's are all equal. At the interim analysis, however, depending on the design, the enrollment pattern, the data variability, and the interim timing, the contribution from the between-subject variability can be relatively significant, and including this information improves efficiency, as demonstrated by the simulation in Section 5.
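The decomposition above can be checked numerically. The following sketch (an illustration under the compound-symmetry model, not the authors' code; all function names are ours) computes the within- and between-subject information components for staggered visit data and verifies that their sum equals the slope information from the full GLS calculation:

```python
import numpy as np

def slope_information(x_list, sigma2, sigma2_s):
    """Slope information as the sum of the within-subject component (3)
    and the between-subject component (4)."""
    # Within-subject: pooled sum of squared visit-time deviations / sigma^2
    within = sum(((x - x.mean()) ** 2).sum() for x in x_list) / sigma2
    # Between-subject: weighted regression of subject means, weight 1/Var(ybar_i)
    w = np.array([1.0 / (sigma2 / len(x) + sigma2_s) for x in x_list])
    xbar = np.array([x.mean() for x in x_list])
    xw = (w * xbar).sum() / w.sum()
    between = (w * (xbar - xw) ** 2).sum()
    return within + between

def gls_slope_information(x_list, sigma2, sigma2_s):
    """Slope information from the full GLS calculation, 1 / V(beta_hat)."""
    info = np.zeros((2, 2))
    for x in x_list:
        m = len(x)
        X = np.column_stack([np.ones(m), x])                 # intercept + visit time
        V = sigma2 * np.eye(m) + sigma2_s * np.ones((m, m))  # compound symmetry
        info += X.T @ np.linalg.solve(V, X)
    return 1.0 / np.linalg.inv(info)[1, 1]

# Staggered follow-up: subjects have completed different numbers of visits
visits = [np.array([0., 4., 8., 12.]), np.array([0., 4.]), np.array([0., 4., 8.])]
i_decomp = slope_information(visits, 0.6356, 0.5464)
i_gls = gls_slope_information(visits, 0.6356, 0.5464)
```

The two quantities agree, reflecting the fact that the GLS slope estimate combines the within- and between-subject estimators with precision weights.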

Construction of Interim Futility Rules
Let β_A and β_P be the slopes for active treatment and placebo in a parallel design, and let V(β̂_A) and V(β̂_P) be their variances, whose values depend on the number of subjects n and the number of completed visits m_i of each subject, i = 1, . . . , n. Let

    θ = β_P − β_A  and  V(θ̂) = V(β̂_A) + V(β̂_P).

Let θ̂ be the observed value of θ at the interim analysis. For any decision rule of the form "continue the study if θ̂ > δ; otherwise stop," the probability of continuing the study is

    P(θ̂ > δ) = 1 − Φ((δ − θ_t)/√V(θ̂)),   (5)

where θ_t is the assumed difference of the slopes and Φ is the standard normal distribution function. When θ_t is set to the value under the null hypothesis H_0, say θ_0 (often zero), (5) gives the probability of a false continuation (FC) declaration at the interim, as

    FC = 1 − Φ((δ − θ_0)/√V(θ̂)),  that is,  δ = θ_0 + z_FC √V(θ̂),   (6)

where z_FC is the upper FC percentile of the standard normal distribution. When θ_t takes the value under the alternative hypothesis H_1, say θ_1, the probability of a false stopping (FS) decision, that is, the probability of making a Type II error at the interim, can be derived from (5) as

    FS = Φ((δ − θ_1)/√V(θ̂)),  that is,  δ = θ_1 − z_FS √V(θ̂),   (7)

where z_FS is the upper FS percentile of the standard normal distribution. Therefore, for any given θ_0, θ_1, and target FC and FS rates, the futility boundary δ and the required V(θ̂) at the interim (denoted V_I) can be derived as

    √V_I = (θ_1 − θ_0)/(z_FC + z_FS),  δ = θ_0 + z_FC √V_I.   (8)

This is directly analogous to the traditional sample-size/power calculation for comparing two population means. In that setting, however, V is determined solely by the sample size n for a given data variance, whereas in our setting there is no such simple relationship between V and n. In adaptive designs, the ratio of the interim sample size to the total sample size is used to measure the information time (r) of the interim analysis. In our case, the ratio of the Fisher information for the slope difference at the interim analysis to the Fisher information with the full sample size can similarly be used as the measure of information time.
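For a given design, Equation (8) is a two-line calculation. The following sketch (our illustration, using only the Python standard library; the function name is ours) solves for the futility boundary δ and the required interim standard error √V_I, and verifies that the resulting rule attains the specified FC and FS rates:

```python
from statistics import NormalDist

def futility_design(theta0, theta1, fc, fs):
    """Solve (8): futility boundary delta and required interim
    standard error sqrt(V_I) for given FC and FS rates."""
    nd = NormalDist()
    z_fc = nd.inv_cdf(1.0 - fc)   # upper FC percentile of the standard normal
    z_fs = nd.inv_cdf(1.0 - fs)   # upper FS percentile of the standard normal
    se_I = (theta1 - theta0) / (z_fc + z_fs)
    delta = theta0 + z_fc * se_I
    return delta, se_I

# Illustrative values in the spirit of Section 5: theta1 = 0.009/week, FC = 75%, FS = 5%
delta, se_I = futility_design(0.0, 0.009, 0.75, 0.05)
```

By construction, the rule "continue if θ̂ > δ" then has continuation probability FC under θ_0 and stopping probability FS under θ_1.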

Monitoring of Interim Analysis Timing
Once the cutoff value of V(θ̂) for the given FC and FS has been identified (as V_I), the timing of the interim futility analysis can be determined by continuously updating V(θ̂) via Equation (2) with the available x_{ij}; the interim analysis is triggered when V(θ̂) ≤ V_I. Note that, given σ and σ_s (or γ), or estimates of them from the data, V(θ̂) is completely determined by the visit times available from each subject (the x_{ij}). Therefore, with unblinded monitoring, the calculation of V(θ̂) is straightforward based on (2). For blinded monitoring, with sample sizes n_A and n_P, the within-subject components of V(β̂_A) and V(β̂_P) can be estimated using the following recipe:

1. Calculate the total within-subject sum of squares

   WSS_T = Σ_i Σ_j (x_{ij} − x̄_i)².

2. Split the total within-subject sum of squares between the two groups according to the randomization ratio as

   WSS_l = WSS_T n_l/(n_A + n_P),  with l = A or P.

3. Estimate the within-subject components of V(β̂_A) and V(β̂_P) as σ²/WSS_A and σ²/WSS_P, respectively, following (3).

To estimate the between-subject components for each treatment arm, consider a matrix with elements (x̄_i − x̄_k)² and dimension n_A0 + n_P0, where n_A0 and n_P0 are the numbers of subjects providing data at the interim analysis. The total number of off-diagonal elements is (n_A0 + n_P0)(n_A0 + n_P0 − 1). The between-subject components of V(β̂_A) and V(β̂_P) can then be estimated as follows:

1. Calculate the total between-subject sum of squares

   BSS_T = Σ_{i≠k} (x̄_i − x̄_k)².

2. Calculate the between-subject sums of squares for the two groups according to the randomization ratio as

   BSS_l = BSS_T n_l0(n_l0 − 1)/[(n_A0 + n_P0)(n_A0 + n_P0 − 1)] ≈ BSS_T [n_l0/(n_A0 + n_P0)]²,  with l = A or P.

3. Estimate the between-subject components of V(β̂_A) and V(β̂_P) by applying (4) with a common weight w̄ = 1/(σ²/m̄ + σ_s²), where m̄ is the average number of completed visits per subject; since Σ_{i∈l} (x̄_i − x̄_l)² ≈ BSS_l/(2 n_l0), the between-subject information for arm l is approximately w̄ BSS_l/(2 n_l0).

With response data y, σ and σ_s can also be reestimated in a blinded manner by fitting a linear mixed-effects model with a common slope. This gives the study team a more realistic estimate of the timing of the interim futility analysis when confidence in the values of σ and σ_s used at the design stage is low.
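Putting the recipe together, blinded monitoring of V(θ̂) might be sketched as follows (an illustration in our notation; the common-weight approximation for the between-subject component and all function names are ours). The interim analysis is triggered once the blinded estimate of V(θ̂) drops to the required V_I:

```python
import numpy as np

def blinded_v_theta(x_list, n_A, n_P, sigma2, sigma2_s):
    """Blinded estimate of V(theta_hat) = V(beta_A) + V(beta_P) from the
    visit times x_list of all subjects on study (treatment labels unknown)."""
    n0 = len(x_list)
    # Total within-subject sum of squares (to be split by randomization ratio)
    wss_t = sum(((x - x.mean()) ** 2).sum() for x in x_list)
    # Total between-subject sum of squares over all ordered pairs of subject means
    xbar = np.array([x.mean() for x in x_list])
    bss_t = ((xbar[:, None] - xbar[None, :]) ** 2).sum()
    # Common between-subject weight based on the average number of visits
    m_bar = np.mean([len(x) for x in x_list])
    w_bar = 1.0 / (sigma2 / m_bar + sigma2_s)
    v = 0.0
    for n_l in (n_A, n_P):
        frac = n_l / (n_A + n_P)
        info_within = frac * wss_t / sigma2        # within-subject information
        n_l0 = frac * n0                           # subjects with data in arm l
        bss_l = frac ** 2 * bss_t                  # split of BSS_T
        info_between = w_bar * bss_l / (2.0 * n_l0)
        v += 1.0 / (info_within + info_between)
    return v

# Trigger rule: run the interim futility analysis once blinded_v_theta(...) <= V_I
```

As more subjects enroll and more visits accumulate, the blinded estimate of V(θ̂) decreases toward the trigger threshold V_I.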
The timing of the interim futility analysis in this approach is adaptive to the enrollment rate. In other words, the timing will differ under different enrollment patterns, while the required FC and FS probabilities are always maintained.
The result in (2) can be extended to any variance-covariance structure although it may be difficult to derive a closed analytical form. Statistical software such as SAS or R could be used to help with the calculation.

Simulations
We use the design of a double-blind, placebo-controlled, two-year trial in idiopathic membranous nephropathy (iMN) patients to demonstrate our methodology. The primary endpoint is the remission rate after 2 years of treatment, where remission is defined as a reduction in proteinuria below a prespecified level. The study was designed to have a total of n = 94 patients according to D'Agostino, Chase, and Belanger (1988), which provides 90% power with a 2.5% two-sided Type I error rate in the final analysis, assuming 35% and 70% remission rates for placebo and active treatment, respectively. The 2.5% two-sided Type I error rate was the result of negotiation with regulatory agencies, owing to the rareness of the disease and the fact that only one pivotal clinical trial was considered feasible.
The sponsor would like to make an interim futility decision as early as possible during the trial, to protect patients from a potentially ineffective therapy. However, basing an interim analysis on the remission rate and waiting until, for example, 50% of the patients have completed the 2-year treatment would likely lead to an interim timing so late, in terms of total sample size, as to be ineffectual. It was therefore decided to base the futility assessment on the rate of change in proteinuria.
In Fervenza et al. (2010), historical proteinuria data are available on 20 rituximab-treated patients. Proteinuria was measured over 2 years (at 0, 3, 6, 9, 12, 18, and 24 months) and showed a linear trend on the log scale. A linear mixed-effects regression model based on the log-transformed proteinuria data led to estimates of σ̂² = 0.6356 and σ̂_s² = 0.5464. The mean baseline proteinuria value was e^2.107 g/24 h. These results were used in the design and simulation of our trial.
Based on the baseline level and the data variability in the historical data, a slope of −0.0168/week on the log scale of proteinuria would lead to a 70% remission rate by year 2, and a slope of −0.0078/week would lead to 35% remission. Therefore, the target difference in the proteinuria reduction rate, θ_1 (control − active), was set to 0.009/week on the log scale. The study team decided that a 5% FS rate should be maintained if the true effect is 0.009/week; this low FS minimizes the power loss in the final analysis at the end of the study. The team was prepared to be flexible with regard to the interim FC rate.
In the simulation, it was assumed that β_P = −0.0078, β_A = −0.0168, and σ² = 0.6356. To investigate the impact of incorporating the between-subject component, simulations were performed with the four combinations of parameter values listed below.

Case 1: The random subject effect σ_s² = 0.5464, but σ_s is ignored in the calculation of the interim analysis timing, as in Kittelson, Sharples, and Emerson (2005).
Case 2: The random subject effect σ_s² = 0.5464, and it is taken into account in the calculation of the interim analysis timing.
Case 3: The random subject effect σ_s² = 0.2732, but σ_s is ignored in the calculation of the interim analysis timing, as in Kittelson, Sharples, and Emerson (2005).
Case 4: The random subject effect σ_s² = 0.2732, and it is taken into account in the calculation of the interim analysis timing.

To further demonstrate that the timing of the interim futility analysis in our approach is adaptive to the enrollment rate, simulations were conducted with the following three recruitment models.

Model 1 (medium): Recruitment ramps up over 6 weeks and then remains at 2 patients/week throughout the rest of the study.
Model 2 (fast): Recruitment ramps up immediately to 2.5 patients/week, stays at 2.5 patients/week for 14 weeks, and then ramps down to 1 patient/week over an 80-week period.
Model 3 (slow): Recruitment is constant throughout the study at 1 patient/week.
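The three recruitment models can be represented as simple piecewise rate functions. The sketch below (our illustration; the exact shapes within the ramp-up and ramp-down periods are assumptions, since only the rates and durations are stated above) returns the week by which n patients have enrolled under each model:

```python
def rate(model, week):
    """Weekly enrollment rate (patients/week) in the given week."""
    if model == 1:                 # medium: linear ramp-up over 6 weeks, then 2/week
        return min(2.0, 2.0 * week / 6.0)
    if model == 2:                 # fast: 2.5/week for 14 weeks, then linear
        if week <= 14:             # ramp-down to 1/week over 80 weeks
            return 2.5
        return max(1.0, 2.5 - 1.5 * (week - 14) / 80.0)
    return 1.0                     # slow: constant 1/week

def enrollment_week(model, n):
    """Week by which n patients have enrolled (deterministic approximation)."""
    total, week = 0.0, 0
    while total < n:
        week += 1
        total += rate(model, week)
    return week
```

For the trial's n = 94 patients this gives the expected ordering (fast model finishes enrollment first, slow model last); in the actual simulations, entry times would typically be drawn stochastically, for example from a Poisson process with these rates.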
To study the impact of the number of visits in the study design, simulations were carried out for an 18-visit design, with planned visits at 0, 4, 8, 12, 16, 20, 24, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, and 104 weeks, and for a 9-visit design, with planned visits at 0, 12, 28, 44, 52, 68, 80, 92, and 104 weeks. For any fixed values of FC, FS, and the target effect θ_1, the critical value δ and the standard error of the slope difference, √V_I, can be solved for using (8). In addition, the information time can be calculated from the Fisher information at the interim (the reciprocal of V_I) and the Fisher information with full data at the end of the study. Some examples can be found in Table 1, where the information time at the interim is denoted r_18 and r_9 for the designs with 18 and 9 visits, respectively. Note that in this specific example the information time at the interim is relatively small, since the sample size of the study was based on the binary primary endpoint and the study was consequently overpowered for detecting the slope difference.
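The information-time calculation can be illustrated as follows (our sketch; a 1:1 allocation with 47 patients per arm and the example values FC = 75%, FS = 5%, θ_1 = 0.009 are assumptions used for illustration):

```python
import numpy as np
from statistics import NormalDist

# 18-visit schedule (weeks); with complete data every subject shares it,
# so the between-subject component (4) vanishes and only (3) contributes
x = np.array([0, 4, 8, 12, 16, 20, 24, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 104.0])
sigma2, n_per_arm = 0.6356, 47
ssx = ((x - x.mean()) ** 2).sum()

# V(theta_hat) with full data: sum of the two per-arm slope variances
v_full = 2 * sigma2 / (n_per_arm * ssx)

# Required interim variance V_I from (8)
nd = NormalDist()
se_I = 0.009 / (nd.inv_cdf(0.25) + nd.inv_cdf(0.95))

# Information time: interim information (1/V_I) over full-data information
r = v_full / se_I ** 2
```

Under these assumptions the resulting r is small, consistent with the observation above that the study is overpowered for detecting the slope difference.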
For each scenario, 2000 trials were simulated, and the results are summarized in Table 2 for the case where the required FC = 75% and FS = 5% at the interim futility analysis.
The simulation results confirm that, while the timing of the interim analysis changed under the different recruitment models as expected, the desired FC and FS properties were maintained. The overall power of the final analysis based on the remission rate is slightly less than 90% because of the interim futility analysis. As expected, the faster the enrollment, the earlier the interim analysis is triggered. When there are more visits (e.g., 18 visits vs. 9 visits), the information accumulates faster for the same enrollment rate and the interim analysis can occur much earlier. Compared with Kittelson, Sharples, and Emerson (2005) (Cases 1 and 3), the contribution of including the between-subject information component depends on many factors, such as the number of planned visits, the magnitude of the between-subject variability, and the enrollment speed. When enrollment is fast, the interim occurs earlier, and at that time the average visit times x̄_i in (2) are more similar than under slow enrollment, so the contribution from the between-subject component is smaller with fast enrollment. In our example, the simulation shows that with the 18-visit design, inclusion of the between-subject component (Case 2 vs. Case 1) moves the interim timing 1.6 weeks earlier with fast enrollment and 1.8 weeks earlier with slow enrollment. The difference is bigger with the 9-visit design, where the interim timing moves 1.7 weeks earlier with fast enrollment and 2.5 weeks earlier with slow enrollment. This is because with fewer visits the proportion of information from the between-subject component is larger than in the design with more planned visits. When the between-subject variability is smaller (Cases 3 and 4), the information from the between-subject component becomes relatively larger and the benefit of including it is more pronounced, moving the interim timing 2.5 to 3.6 weeks earlier in the simulation.
Simulations with other FC values (25% and 50%) show a similar trend.

Discussion
We have proposed a method for enrollment-driven futility analysis, based on the linear mixed effect model, to compare the rates of change in a longitudinally measured continuous endpoint between two treatment groups.
Instead of using only the within-subject information, as proposed in the current literature, our method fully uses all the information available at the interim analysis, even when the data are unbalanced between treatment groups and measured at different time points owing to the staggered entry of subjects.
The proposed method is simple to implement since all the estimates needed for the interim futility assessment can be obtained by using existing algorithms in standard statistical packages. This also makes exploration of the operating characteristics of candidate interim decision rules straightforward to implement.
Most importantly, the method guarantees preservation of the predetermined rates of false-continuation and false-stopping decisions by triggering the interim analysis adaptively, whatever the recruitment rate, the most uncertain of trial characteristics. Simulation results further demonstrate that, while preserving the predetermined rates at the interim, including the between-subject information in the calculation allows the interim analysis to occur noticeably earlier than when only the within-subject information is used.
Researchers investigating novel therapeutic interventions, often with very little relevant clinical data, face a conundrum. Biological theory supports the intervention, but the probability that this theory is true (in the sense that it will translate into a measurable clinical advantage in the target patient population) remains low (most early, experimental trials are negative). Investigating the theory requires a clinical trial, yet no one wishes to expose more patients than are needed to rule the theory out, particularly in rare and extreme disorders. Hence the increased frequency of futility rules within clinical trials.
We believe that our recruitment-driven method, by guaranteeing error-rate preservation in interim decision making, represents a significant addition to the methodological toolbox for such researchers. We have shown that it is possible to include a futility analysis that is triggered when the data themselves suggest the time is right, while maintaining the predetermined FC and FS rates at the interim.