End of induction patient reported outcomes predict clinical remission and endoscopic improvement with vedolizumab and adalimumab in ulcerative colitis

Abstract Background Patient-reported outcomes (PROs) are increasingly emphasized as endpoints in clinical trials of ulcerative colitis (UC). However, the prognostic value of early improvement in PROs for long-term outcomes remains unclear. Methods This was a post-hoc analysis of 611 vedolizumab-treated or adalimumab-treated patients in the VARSITY trial (ClinicalTrials.gov: NCT02497469). Stool frequency (SF) and rectal bleeding score (RBS), as reported in the Mayo score post-induction (weeks 6 and 14), were assessed for their association with one-year endoscopic improvement (EI), defined as Mayo endoscopic subscore <2; histo-endoscopic mucosal improvement (HEMI), defined as EI and Geboes highest grade <3.2; clinical remission (CR), defined as total Mayo score ≤2; and PRO-2 remission, defined as RBS of 0 and SF ≤1. Multivariable logistic regression models adjusted for confounders assessed the relationships between post-induction PROs and outcomes of interest at one year. Results Patients with severe SF at week 6 were significantly less likely to achieve one-year EI compared to those with non-severe SF [aOR 0.40 (95% CI: 0.24–0.68), p < .001]. Absence of rectal bleeding at week 6 was associated with greater odds of achieving EI at one year [aOR 2.21 (95% CI: 1.58–3.09), p < .001]. These findings were consistent across comparisons at week 14. Similar findings were observed for the outcomes of one-year HEMI, CR and PRO-2 remission. No difference was observed between the modified partial Mayo score and modified PRO-2 score. Conclusions Post-induction PROs strongly predict the odds of CR and EI in UC, and simplified evaluations can be used to assess early response to UC therapies.


Introduction
Ulcerative colitis (UC) is a type of inflammatory bowel disease (IBD) characterized by mucosal ulceration of the colon, with symptoms that include diarrhea and rectal bleeding. Its disease course varies both endoscopically and symptomatically. Endoscopic improvement (EI) is an important treatment target in UC, supported by studies demonstrating favorable long-term outcomes, including reduced risk of disease progression requiring colectomy [1].
Symptom relief is arguably more important for patients than EI, and therapies for UC have demonstrated efficacy in achieving both targets. Historically, endpoints in clinical trials of UC have relied solely on the improvement of symptoms including rectal bleeding and stool frequency (SF). More recently, therapeutic targets have evolved to include EI, often defined as Mayo endoscopic subscore <2, and symptom resolution [2,3]. The Mayo score is a tool that is used in clinical trials and routine practice to assess disease severity and response to therapy, and which considers endoscopic findings, physician's assessment of disease activity, and patient-reported symptoms of rectal bleeding and SF. All four sub-scores are scored from 0-3 and summed to derive a total score that ranges from 0-12, with higher scores indicating greater disease severity. The modified partial Mayo score is a simplified adaptation of the Mayo score, which only considers the rectal bleeding score (RBS) and SF, and represents the interim PRO tool used in clinical trials of UC [4,5]. There is increasing interest in PRO-based endpoints as regulatory bodies are moving towards integration of PROs along with EI in clinical trials of UC [6].
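As an illustration of the scoring arithmetic described above, the following is a minimal sketch, not the trial's actual scoring software. The class and function names (`MayoSubscores`, `total_mayo`, `modified_partial_mayo`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MayoSubscores:
    """The four Mayo subscores, each an integer from 0 to 3 (hypothetical container)."""
    stool_frequency: int       # patient-reported
    rectal_bleeding: int       # patient-reported
    endoscopy: int             # Mayo endoscopic subscore
    physician_assessment: int  # physician's assessment of disease activity

    def __post_init__(self):
        # Reject any subscore outside the 0-3 range described in the text.
        for name, value in vars(self).items():
            if value not in (0, 1, 2, 3):
                raise ValueError(f"{name} must be an integer from 0 to 3, got {value!r}")


def total_mayo(s: MayoSubscores) -> int:
    """Total Mayo score: sum of all four subscores (range 0-12)."""
    return (s.stool_frequency + s.rectal_bleeding
            + s.endoscopy + s.physician_assessment)


def modified_partial_mayo(s: MayoSubscores) -> int:
    """Modified partial Mayo score: rectal bleeding plus stool frequency only (range 0-6)."""
    return s.rectal_bleeding + s.stool_frequency


# Example with made-up subscores (not study data).
example = MayoSubscores(stool_frequency=2, rectal_bleeding=1,
                        endoscopy=3, physician_assessment=2)
print(total_mayo(example), modified_partial_mayo(example))
```

Higher values indicate greater disease severity in both scores; the modified partial score simply drops the two components that are not patient-reported.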
Despite evolution in treatment targets, an increasing body of evidence demonstrates discordance between symptoms of UC and underlying mucosal inflammation [7]. A recent meta-analysis of five studies of moderate-severe UC reported that only 40% of those in endoscopic remission had normalization of SF [8]. Similar findings were also demonstrated among trial participants with normalization of SF and rectal bleeding [9]. It has also been shown that symptoms persist despite achievement of EI, which suggests complete resolution of symptoms may not be a feasible target [10]. A study of patients with moderate-severe UC reported that healing of ulceration does not guarantee subsequent symptom relief [11]. However, it remains unclear whether early improvement in PROs is useful for predicting which patients will have better long-term outcomes in UC. This analysis examines the relationship between post-induction PROs and outcomes at one year.

Study design
Data from the VARSITY study were obtained through Vivli (protocol #0007158) and by permission from Takeda Inc. The VARSITY study was a phase 4 clinical trial comparing two biologic therapies, vedolizumab and adalimumab, in patients with moderate to severely active UC (ClinicalTrials.gov identifier NCT02497469) [2]. The Hamilton Integrated Research Ethics Board determined that local ethics review was not required as data used for this analysis were previously collected and deidentified.

Data availability statement
Data can be made available upon request to Vivli and with permission from Takeda Inc.

Participants
The design and eligibility criteria of the VARSITY study have been described previously [2]. In short, adults with moderate to severe UC as determined by a total Mayo score of 6-12, including a Mayo endoscopic subscore of ≥2, were eligible. Participants were randomized to receive standard doses of vedolizumab or adalimumab for a duration of 52 weeks. Dose escalation was not permitted in this trial. While most participants were anti-TNF naïve, prior anti-TNF exposure was allowed in up to 25% of participants. Endoscopic assessments with biopsies were performed at baseline, week 14 (post-induction), and week 52 (post-maintenance). All endoscopies and histologic assessments were centrally read.

Variables
A total of 769 participants were randomized in the VARSITY study and received at least one dose of treatment. We restricted this analysis to the 611 participants with complete Mayo scores (including the Mayo endoscopic subscore) at weeks 6, 14, and 52. For this analysis, post-induction was defined as week 6. Acknowledging the possibility of delayed symptom improvement, sensitivity analyses were planned to evaluate PROs at week 14.

Patient-reported outcomes
In our analysis, we evaluated SF and RBS, which are two PROs included in the Mayo score. SF is relative to a patient's baseline and scored as normal (score of 0), 1-2 stools per day more than normal (score of 1), 3-4 stools per day more than normal (score of 2), and >4 stools per day more than normal (score of 3). RBS is scored as none (score of 0), visible blood less than half the time (score of 1), visible blood half of the time or more (score of 2), and passing blood alone (score of 3). As a sensitivity analysis, in order to evaluate the gradient of effect between PROs and the outcomes of interest, we categorized SF and RBS as severe vs. not (score of 3 vs. <3), moderate-severe vs. not (score ≥2 vs. <2), and normal vs. not (score of 0 vs. >0).
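The three dichotomizations used in the sensitivity analysis can be sketched as follows. This is an illustrative example only; the function name `dichotomize` and the dictionary keys are hypothetical and do not come from the study's analysis code.

```python
from typing import Dict


def dichotomize(score: int) -> Dict[str, bool]:
    """Apply the three cut-points to a single SF or RBS subscore (0-3).

    severe          -> score of 3 vs. <3
    moderate_severe -> score >=2 vs. <2
    normal          -> score of 0 vs. >0
    """
    if score not in (0, 1, 2, 3):
        raise ValueError(f"subscore must be an integer from 0 to 3, got {score!r}")
    return {
        "severe": score == 3,
        "moderate_severe": score >= 2,
        "normal": score == 0,
    }


# Show how each possible subscore falls under the three cut-points.
for s in range(4):
    print(s, dichotomize(s))
```

Applying all three cut-points to the same subscore is what allows a gradient of effect to be examined: a score of 2, for example, counts as moderate-severe but not as severe.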

Modified partial Mayo score
The partial Mayo score (PMS) includes the SF, RBS, and physician's global assessment (PGA) subscores, for a total score ranging from 0 to 9. In the absence of endoscopy, the PMS is used to assess disease activity in clinical trials of UC. More recently, the modified PMS (without the PGA) has been applied in clinical trials. Therefore, we evaluated the modified PMS and its ability to predict outcomes at week 52 compared to the total Mayo score, with both treated as continuous variables.

Outcomes
The primary objective of this study was to determine whether post-induction PROs at week 6 were associated with one-year EI, defined as Mayo endoscopic subscore <2. This definition is commonly used to assess EI in clinical trials of UC and was chosen based on international consensus statements [12]. Additionally, we evaluated histo-endoscopic mucosal improvement (HEMI), defined as Mayo endoscopic subscore <2 and Geboes highest score <3.2, as well as clinical remission (CR), defined as total Mayo score ≤2, and PRO-2 remission, defined as RBS of 0 and SF ≤1. We also planned to evaluate a more stringent definition of HEMI, defined as Mayo endoscopic subscore <2 and Geboes highest score <3.1. Exploratory analyses were planned to evaluate early PROs at week 2 for the outcome of one-year EI to determine if earlier PRO timepoints could be equally prognostic as PROs at post-induction. Week 2 was chosen as it was the only available timepoint preceding week 6. Outcomes are also presented using an alternative classification of UC PROs, referred to herein as the modified PRO-2 score, specifically absence or presence of rectal bleeding and normal, mild/moderate, or severe SF score. The decision to include these alternative classifications was based on the lack of discriminative association with the outcomes as presented in Table 2 and Supplementary Table 2.
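The outcome definitions can be expressed as simple predicates. The sketch below is illustrative only: the function names are hypothetical, the thresholds for CR (total Mayo score ≤2) and PRO-2 remission (RBS of 0 and SF ≤1) follow the abstract, and the Geboes highest grade is assumed to be encoded as a number whose ordering matches the grading scale, which may not reflect how the trial database stores it.

```python
def endoscopic_improvement(mayo_endoscopic_subscore: int) -> bool:
    """EI: Mayo endoscopic subscore <2."""
    return mayo_endoscopic_subscore < 2


def hemi(mayo_endoscopic_subscore: int, geboes_highest: float,
         geboes_cutoff: float = 3.2) -> bool:
    """HEMI: EI plus Geboes highest grade below the cutoff.

    Assumption: geboes_highest is a numeric encoding (e.g. 3.1) that
    preserves the ordering of Geboes grades. Pass geboes_cutoff=3.1 for
    the more stringent definition.
    """
    return endoscopic_improvement(mayo_endoscopic_subscore) and geboes_highest < geboes_cutoff


def clinical_remission(total_mayo_score: int) -> bool:
    """CR: total Mayo score <=2."""
    return total_mayo_score <= 2


def pro2_remission(rbs: int, sf: int) -> bool:
    """PRO-2 remission: rectal bleeding score of 0 and stool frequency score <=1."""
    return rbs == 0 and sf <= 1


# Made-up week-52 values for a single hypothetical participant.
print(endoscopic_improvement(1), hemi(1, 3.1), clinical_remission(2), pro2_remission(0, 1))
```

A participant with an endoscopic subscore of 1 and a Geboes highest grade of 3.1 would meet the primary HEMI definition but not the more stringent one, which is the distinction the second cutoff is meant to capture.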

Statistical analyses
This was a post-hoc analysis of data from the VARSITY study. For each comparison, descriptive statistics were provided. Continuous variables were described as means with standard deviations (SD) or medians with inter-quartile ranges (IQR), and dichotomous variables were presented as proportions or percentages and compared using the Chi-square test of trend. Multivariable logistic regression models evaluated the relationship between PROs and achievement of outcomes at one year. Unadjusted odds ratios (ORs) and adjusted ORs (aORs) for achieving the outcomes of interest were calculated; aORs were adjusted for known confounders of symptoms, including prior anti-TNF use, disease duration, treatment allocation, and concomitant corticosteroid use. All ORs and aORs were presented along with 95% confidence intervals (CIs) and associated p-values. In addition, receiver operating characteristic (ROC) curve analyses were performed to compare the ability of the PMS and total Mayo score to predict CR and EI at week 52. Comparisons with objective markers of disease (i.e. C-reactive protein [CRP] and fecal calprotectin) as categorical and continuous variables were also performed. The accuracy of each assessment was evaluated using the area under the curve (AUC) of the ROC and categorized as poor (AUC 0.5-0.7), fair (0.7-0.8), good (0.8-0.9), and excellent (0.9-1.0). AUCs were compared using the method described by DeLong et al. [13]. The statistical significance level was set to a Bonferroni-corrected threshold of 0.002, as opposed to the conventional value of 0.05. Data were analyzed using Stata/IC version 15.0.
Table 1 demonstrates the baseline characteristics of the 611 participants included in this analysis. Overall, these were consistent with the overall VARSITY study cohort. The mean age was 40.7 years (SD 13.5) and 356 participants (58.3%) were male. The median disease duration was 4.8 years (IQR 2.1-9.3) and 327 (53.5%) participants received vedolizumab. A total of 114 (19.5%) participants had prior anti-TNF exposure and 213 (34.8%) had concomitant corticosteroid use. There was objective evidence of disease activity at baseline, as the median CRP was 4.0 mg/L (IQR 1.4-10.0), the median fecal calprotectin was 1412 mg/L (IQR 575-3091), and 371 (60.7%) participants had severe endoscopic disease. RBS and SF at baseline were similarly elevated.
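Some of the quantities named under Statistical analyses can be illustrated with a short sketch. It is not the study's Stata code: it computes only an unadjusted OR with a Wald 95% CI from a 2x2 table (the adjusted ORs in the paper come from multivariable logistic regression, which is not reproduced here), the counts are invented for illustration, and the handling of AUC category boundaries (lower bound inclusive) is an assumption because the stated ranges share endpoints.

```python
import math

BONFERRONI_ALPHA = 0.002  # significance threshold stated in the methods


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def auc_category(auc: float) -> str:
    """Map an AUC to the accuracy label (assumed lower-bound-inclusive bins)."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("AUC expected in the range 0.5-1.0")
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "fair"
    if auc < 0.9:
        return "good"
    return "excellent"


def is_significant(p_value: float) -> bool:
    """Compare a p-value with the Bonferroni-corrected threshold."""
    return p_value < BONFERRONI_ALPHA


# Invented counts, NOT study data: 10/30 exposed and 30/70 unexposed reach the outcome.
or_, lo, hi = odds_ratio_wald(10, 20, 30, 40)
print(f"OR {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
print(auc_category(0.75), is_significant(0.001))
```

With a single binary predictor, the OR from the 2x2 table equals the one a logistic regression would estimate; adjustment for confounders such as prior anti-TNF use requires fitting the full model.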

PROs and outcomes at week 52
The proportions of participants achieving the outcomes of interest at one year, stratified by PRO severity, are presented in Table 2. Overall, symptoms improved from baseline to weeks 6 and 14. Of the 611 participants included in our analysis, 378 (61.9%) had moderate-severe RBS at baseline, which was reduced to 79 (13%) by week 6 and further reduced to 54 (8.8%) by week 14. At baseline, 507 (83.0%) participants had moderate-severe SF, which was reduced to 221 (36.2%) by week 6 and further reduced to 164 (26.8%) by week 14. Supplementary Table 1 demonstrates the proportion of participants attaining outcomes at one year stratified by simplified categories of RBS and SF at baseline, week 6, and week 14.

PROs and endoscopic improvement at week 52
The primary purpose of our analysis was to evaluate the relationship between PROs at week 6 and EI at one year. Participants with severe SF at week 6 were significantly less likely to achieve EI compared to those with non-severe SF [aOR 0.40 (95% CI: 0.24–0.68), p < .001] (Table 3).

PROs and clinical remission at week 52

PROs and PRO-2 remission at week 52
The relationship between PROs and PRO-2 remission at one year was also investigated (Supplementary Table 2 and Table 4).
Table note: Mild rectal bleeding score: visible blood with stool less than half the time. Moderate rectal bleeding score: visible blood with stool half of the time or more. Severe rectal bleeding score: passing blood alone. Mild stool frequency score: 1-2 stools/day more than normal. Moderate stool frequency score: 3-4 stools/day more than normal. Severe stool frequency score: >4 stools/day more than normal.

PROs at week 2 and endoscopic improvement at week 52
Additionally, we evaluated PROs at week 2 to determine the relative prognostic value of PROs at timepoints earlier than post-induction. As demonstrated in Supplementary

Discussion
UC is a chronic inflammatory disorder of the colon characterized by bloody diarrhea and a relapsing and remitting course. It negatively impacts patients' health-related quality of life and general life satisfaction [14]. Our study found that UC PRO improvements at weeks 6 and 14 were predictive of one-year CR and EI. This highlights the reliability of using PROs as a way to measure early response to biologic therapy in UC.
As demand for virtual healthcare increases, there is a need for reliable symptom-based tools for assessment of UC patients [15]. Symptom-based strategies can also increase patient empowerment, reduce health care costs, and increase treatment adherence [16,17]. Owing to long-term improvement in outcomes and association with improvement in quality of life, the STRIDE consensus statements recommend resolution of PROs (such as rectal bleeding [RB] and SF) in addition to mucosal healing as treatment goals in patients with UC [18]. Concern over the bias imposed by the PGA on the total Mayo score has led to a shift by the US FDA towards adopting the adapted Mayo score, which excludes the PGA [19]. However, in our study, we observed that the total Mayo score outperforms the modified partial Mayo score, modified PRO-2 score, and individual PROs. This may suggest that the combination of PROs, endoscopy, and PGA produces a synergistic effect when predicting long-term outcomes, and that reliance on one component in isolation is not an optimal strategy. Nevertheless, PROs clearly provide important prognostic information, particularly when both PROs (RB and SF) are evaluated, as demonstrated by the accuracy of the modified partial Mayo score and modified PRO-2 score compared to each PRO in isolation. Thus, our findings support use of a simplified approach to evaluating PROs as a surrogate predictor for long-term outcomes.
Prior studies have examined the cross-sectional relationship of PROs and EI in UC. Overall, RB and SF have a moderate to strong correlation with endoscopic activity; however, the absence of RB was more sensitive than normalization of SF [8,20]. Thus, normalization of the PROs is useful in evaluation of response to therapy in that it correlates with both current and future EI. This provides further support for the use of PROs within a treat-to-target strategy for management of UC [18].
The US FDA now mandates that IBD clinical trials include PRO-based endpoints [21,22]. These are acceptable to patients and also provide prognostic value. In our study, the early (week 6/14) absence of RB and normalization of SF predicted both CR and EI at week 52. In contrast, although post-induction PROs in Crohn's disease (CD) are also predictive of longer-term CR, they do not seem to have an association with one-year ER of CD. This may reflect previous observations that the correlation between symptoms and endoscopic activity in CD was low [23-25].
In the absence of endoscopic assessments, clinical trials of UC often use the modified PMS to assess disease severity. However, the question remains whether the modified PMS has similar predictive value to the total Mayo score for long-term outcomes. Our analyses demonstrate that the modified PMS has lower predictive value than the total Mayo score for outcomes of CR and EI at week 52. This calls into question whether the modified PMS should be used as a means to assess treatment response.
Compared to traditional categorization for RBS and SF, each with four categories of severity, our analyses demonstrate that a simplified approach can adequately capture the likelihood of achieving long-term outcomes at one year. Our findings suggest that the absence or presence of rectal bleeding may suffice. Similarly, the SF can be simplified into three categories (normal, mild/moderate, and severe). Compared to the modified PMS, the modified PRO-2 score had similar ability to predict outcomes at one year. Simplification of the traditional PRO categories can benefit both patients and clinicians by offering a more direct assessment strategy. This simplified approach may also benefit clinical trials by increasing the frequency of assessments with minimal additional effort by patients and trialists. Further validation in external cohorts of simplified approaches to evaluating PROs should be performed before incorporation into clinical practice or trials.
While rapid improvement in symptoms is important for patients, whether PRO improvement earlier than post-induction is similarly predictive of long-term outcomes remains unclear. In our analysis, we observed that patients with severely elevated PROs at week 2 were less likely to achieve EI at one year after adjustment for known confounders. This has important implications for patients and clinicians, as evaluation of PROs as early as two weeks after starting therapy can be used to guide treatment decision-making. However, patients may have a delayed response to therapy and may have symptoms that persist for reasons unrelated to UC, such as irritable bowel syndrome, bile acid malabsorption, viral infections, or hemorrhoids. Therefore, treatment decisions should be made given the totality of evidence, with consideration given to early changes in PROs.
Strengths of this study include the use of data from a randomized, double-blind, multicenter clinical trial with centrally read endoscopy. However, we also acknowledge some limitations. First, there is the potential for treatment bias, since all patients enrolled in the study received an active comparator (either vedolizumab or adalimumab) that could influence PROs. Second, the patient population in our study was relatively young (mean age 40.7 years) with short disease duration; their PROs are likely to be reliable, but the applicability of our findings to the elderly population or those with longer disease duration is uncertain. There may also be some selection bias in that patients enrolled in the VARSITY study had moderate to severe disease, so it is possible these findings may not apply to those with milder or more severe disease. Dose intensification of vedolizumab and adalimumab was not permitted in the VARSITY trial, which may limit the generalizability of our findings. However, as non-response at the end of induction is a negative predictor for one-year outcomes, our findings support the role of early drug optimization as a strategy to improve outcomes for patients. Lastly, multiple comparisons were performed to identify potential trends or associations; however, we adjusted the p-value threshold to account for multiple comparisons. Small sample sizes in subgroups, particularly among patients with severe rectal bleeding, are a notable limitation of our analysis and limit further extrapolation.
The results of our study emphasize the need for incorporating PROs in the assessment of UC patients undergoing induction treatment with biologics in routine clinical practice. This patient-centered approach correlates with clinical improvement and EI. Our analyses demonstrate that a simplified approach to assessing symptoms, such as the absence or presence of rectal bleeding, can predict long-term outcomes and be used by clinicians and patients to assess disease activity. Further research is needed to examine the role of PROs with other non-invasive parameters such as early biomarkers to provide a personalized approach for UC patients in predicting long-term CR and EI.

Author contributions
ECLW: acquisition and compilation of data; statistical analysis; drafting of the manuscript; BH: drafting of the manuscript; PSD: study concept and design; data interpretation; drafting of the manuscript; JKM: study design; drafting of the manuscript; WR: study concept and design; acquisition and compilation of data; data interpretation; drafting of the manuscript; NN: study concept and design; acquisition and compilation of data; statistical analysis; data interpretation; drafting of the manuscript.