Modeling Postoperative Mortality in Older Patients by Boosting Discrete-Time Competing Risks Models

Abstract Elderly patients are at high risk of postoperative death. Personalized strategies to improve their recovery after intervention are therefore urgently needed. A popular way to analyze postoperative mortality is to develop a prognostic model that incorporates risk factors measured at hospital admission, for example, comorbidities. When building such models, numerous issues must be addressed, including censoring and the presence of competing events (such as discharge from hospital alive). Here we present a novel survival modeling approach to investigate 30-day inpatient mortality following intervention. The proposed method accounts for both grouped event times, for example, measured in 24-hour intervals, and competing events. Conceptually, the method is embedded in the framework of generalized additive models for location, scale, and shape (GAMLSS). Model fitting is performed using a component-wise gradient boosting algorithm, which allows for additional regularization steps via stability selection. We used this new modeling approach to analyze data from the Peri-interventional Outcome Study in the Elderly (POSE), a recent cohort study that enrolled 9862 elderly inpatients undergoing intervention under anesthesia. Application of the proposed boosting algorithm yielded six important risk factors (including both clinical variables and interventional characteristics) that contributed to the hazard of death, to the hazard of discharge from hospital alive, or to both. Supplementary materials for this article are available online.


Introduction
Each year more than 300 million patients undergo surgery worldwide. According to the 2016 Global Burden of Disease study, ischemic heart disease (17.3%), stroke (10.1%) and death due to surgery (7.7%) constitute the major causes of death (Nepogodiev et al. 2019). Hence, there is an urgent need to minimize the risk of complications in postoperative care. In their review of 87 studies reporting on about 20 million anesthetic administrations, Bainbridge et al. (2012) showed that postoperative and anesthetic-related mortality has declined over the past 50 years, but still remains 2-3 times higher in developing countries than in developed ones. Haynes et al. (2009) suggested establishing a surgical checklist that improves team communication and consistency of care to reduce deaths associated with surgery.
Regarding postoperative complications, it is particularly necessary to focus on elderly patients, as risks leading to adverse effects accumulate with advancing age (Olotu et al. 2019). Here, we propose a novel strategy to analyze postoperative death in the Peri-interventional Outcome Study in the Elderly (POSE), which is a large-scale prospective study comprising a cohort of about 10,000 patients older than 80 years of age undergoing all surgical and nonsurgical interventions under anesthesia. Statistical methods used for modeling postoperative death are manifold. Examples include, among others, logistic regression (Lemeshow et al. 1988; Silber et al. 2002), Cox regression (Heuschmann et al. 2004; Reis et al. 2020), random survival forests (Friedel et al. 2013), and artificial neural networks (Fritz et al. 2019). All of these methods need to deal with the challenges that (i) censored event times occur in the data, (ii) patients might experience a competing event first, and (iii) measurements of event times are usually grouped (i.e., there are tied observations with the same event time). The latter is also the case for the data considered here, because the time until postoperative death in POSE was recorded using 24-hour intervals.
To address the aforementioned issues, we propose a time-to-event modeling approach that explicitly assumes the event times to be measured on a discrete time scale (Willett and Singer 1993; Tutz and Schmid 2016). More specifically, we consider event times taking values t = 1, 2, …, k (with k = 30 when considering 30-day mortality), and we account for the fact that event occurrence is affected by one or more competing events (Austin, Lee, and Fine 2016; Lee, Feuer, and Fine 2018). In our analysis of the POSE data the main competing event for postoperative death is discharge from hospital alive. As outlined in Section 3, our modeling strategy combines discrete competing-risks modeling with state-of-the-art approaches for predictive modeling and variable selection: We will use gradient boosting (Bühlmann and Hothorn 2007) to fit models for location, scale, and shape (GAMLSS, Rigby and Stasinopoulos 2005), and we will apply stability selection to determine subsets of risk factors that are associated with postoperative death (Meinshausen and Bühlmann 2010).
Throughout this article, we denote by T ∈ {1, 2, …, k} the discrete time-to-event outcome and by C ∈ {1, 2, …, k} a discrete censoring time. The censoring process is assumed to be (i) independent, in the sense that the hazard rates of the individuals under observation are representative of the respective hazard rates in the study population at all times t (eqs. (1.9) and (6.5) of Kalbfleisch and Prentice 2002), and (ii) noninformative, in the sense that the densities of T and C share no common parameters (chap. 6, p. 195 of Kalbfleisch and Prentice 2002). The objective is to fit the discrete cause-specific hazard functions λ_j(t|X) = P(T = t, ε = j | T ≥ t, X), j = 1, …, J, where ε ∈ {1, …, J} denotes the type of event (death or discharge alive) and X = (X_1, …, X_p) is a set of p time-constant covariates (measured, for example, at the time of intervention). The cause-specific hazard function λ_j(t) quantifies the probability of experiencing an event of type j at time point t given survival up to t (Schmid and Berger 2021). In the following sections we will introduce a novel gradient boosting algorithm for modeling postoperative mortality that is able to select the influential covariates contributing to the cause-specific hazard functions λ_j(t). Key advantages of the proposed algorithm are (i) its flexibility in specifying the functional forms of the covariate effects and (ii) its built-in mechanism to select potentially different sets of covariates in each of the functions λ_j(t).
The remainder of the article is organized as follows: Section 2 describes the basic concepts of discrete cause-specific hazard modeling. Section 3 introduces the proposed boosting approach and discusses its characteristics. Related approaches are briefly sketched in Section 4. In Section 5 we present the results of several simulations comparing the boosting approach to alternative methods. Finally, the analysis of postoperative 30-day inpatient mortality based on the POSE data is presented in Section 6. The article concludes with a summary and discussion of the presented methods (Section 7).

Methodology
Discrete cause-specific hazard models. For modeling the discrete cause-specific hazards λ_j we consider the class of multinomial logistic regression models defined by

\[
\lambda_j(t \mid X) = \frac{\exp\big(\eta_j(t, X)\big)}{1 + \sum_{l=1}^{J} \exp\big(\eta_l(t, X)\big)}, \qquad j = 1, \ldots, J, \tag{1}
\]

where η_j(·) ∈ ℝ, j = 1, …, J, is a set of risk functions specific for each competing event. Note that the J cause-specific hazard functions are linked by (1), implying that all competing events are considered jointly. The risk functions are related to time t and the covariates X. Hence, the conditional probability that an event (of any type) occurs later than at time t is given by

\[
P(T > t \mid T \ge t, X) = \frac{1}{1 + \sum_{l=1}^{J} \exp\big(\eta_l(t, X)\big)}, \tag{2}
\]

and the overall discrete hazard is obtained by λ(t|X) = P(T = t | T ≥ t, X) = Σ_{j=1}^{J} λ_j(t|X). A quantity of interest when modeling competing risks is the cumulative incidence function (CIF) for an event of type j, which is defined by

\[
F_j(t \mid X) = P(T \le t, \epsilon = j \mid X) = \sum_{s=1}^{t} \lambda_j(s \mid X) \prod_{u=1}^{s-1} \big(1 - \lambda(u \mid X)\big). \tag{3}
\]

Importantly, F_j(t|X) depends on all cause-specific hazards and there is no one-to-one relationship between F_j(t|X) and λ_j(t|X); see Schmid and Berger (2021).
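To make the relationship in (3) concrete, the following small R sketch (with J = 2 as in the POSE application; object names are illustrative) converts cause-specific hazard vectors into CIFs:

```r
## Compute discrete-time CIFs F_j(t) from cause-specific hazards as in (3).
## haz1, haz2: vectors of lambda_1(t), lambda_2(t) for t = 1, ..., k.
cif_from_hazards <- function(haz1, haz2) {
  overall  <- haz1 + haz2                                      # lambda(t) = lambda_1(t) + lambda_2(t)
  surv_lag <- cumprod(c(1, 1 - overall))[seq_along(overall)]   # prod_{u < s} (1 - lambda(u))
  list(F1 = cumsum(haz1 * surv_lag),                           # F_1(t)
       F2 = cumsum(haz2 * surv_lag))                           # F_2(t)
}

## Toy example with constant hazards over k = 5 time points:
cif_from_hazards(rep(0.05, 5), rep(0.20, 5))
```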
A common assumption is that the risk functions in Model (1) have the simple linear form

\[
\eta_j(t, X) = \gamma_{0tj} + X^{\top} \gamma_j .
\]

Under this assumption each risk function contains time-dependent intercepts γ_{01j}, …, γ_{0,k−1,j} (referred to as baseline coefficients) as well as a linear combination of the covariates with coefficients γ_j = (γ_{1j}, …, γ_{pj})^⊤ ∈ ℝ^p not depending on t. The baseline coefficients can then be treated as an additional factor variable in the model. When the number of time points is large relative to the sample size, it is often useful to consider an additive model where the baseline coefficients γ_{0tj} = f_{0j}(t) are represented by smooth (possibly nonlinear) functions of time. Furthermore, to weaken the linearity assumption on the effects of the covariates, one may consider the additive risk functions

\[
\eta_j(t, X) = f_{0j}(t) + \sum_{s=1}^{p} f_{sj}(X_s),
\]

where the effects of the covariates are also determined by smooth nonlinear functions (Berger and Schmid 2018). For example, estimation of f_{0j}(·), …, f_{pj}(·) can be carried out by representing each function as a polynomial spline and by using penalty terms that enforce smoothing over the range of covariate values (Eilers and Marx 1996; Wood, Pya, and Säfken 2016; Wood 2020).
In linear and semiparametric models it is usually assumed that the hazard functions λ_1, …, λ_J depend on the same set of covariates X_1, …, X_p. A more flexible approach proposed here is to assume that the J hazard functions are determined by different (possibly lower-dimensional) sets of covariates. To this purpose, we consider a semiparametric model with risk functions of the form

\[
\eta_j\big(t, X^{[j]}\big) = f_{0j}(t) + \sum_{s=1}^{p_j} f_{sj}\big(X_s^{[j]}\big), \tag{4}
\]

where X^{[j]} = (X_1^{[j]}, …, X_{p_j}^{[j]}) ⊆ (X_1, …, X_p), j = 1, …, J, are sets of p_j covariates specific for each competing event. For example, as demonstrated in the analysis of the POSE data in Section 6, the anesthesia technique (e.g., regional anesthesia) solely contributes to the risk function for death, whereas the urgency of intervention (e.g., elective surgery) solely contributes to the risk function for discharge alive.
Estimation and model fitting. Consider an iid sample of right-censored time-to-event data with n observations (T̃_i, Δ_i, ε_i, x_i), i = 1, …, n, where T̃_i := min(T_i, C_i) is the observed time, Δ_i := I(T_i ≤ C_i) is the status indicator, and ε_i ∈ {1, …, J} denotes the type of event. For the POSE data we define J = 2 with possible events death and discharge from hospital alive. With these specifications, the log-likelihood of Model (1) can be written as

\[
\ell = \sum_{i=1}^{n} \sum_{t=1}^{\tilde{T}_i} \left[ \sum_{j=1}^{J} y_{itj} \log \lambda_j(t \mid x_i) + y_{it0} \log\left(1 - \sum_{j=1}^{J} \lambda_j(t \mid x_i)\right) \right], \tag{5}
\]

where the binary values y_{it0}, y_{itj} for individual i at time t are defined by

\[
y_{itj} = I\big(\tilde{T}_i = t,\, \Delta_i = 1,\, \epsilon_i = j\big), \quad j = 1, \ldots, J, \qquad y_{it0} = 1 - \sum_{j=1}^{J} y_{itj}, \tag{6}
\]

see Tutz and Schmid (2016). By incorporating both λ_1 and λ_2 in the multinomial likelihood function (5), estimation of the two cause-specific hazards is performed jointly, thereby accounting for possible dependencies between the occurrence of the death and discharge events. The results in (5) and (6) imply that Model (1) can be fitted using software for multinomial logistic regression with outcome categories j ∈ {0, 1, …, J}, where j = 0 denotes the reference category (equivalent to the censoring event). Before model fitting, the original data have to be converted into an augmented data matrix containing the binary outcome values. In the augmented data matrix, each individual is represented by T̃_i rows, where the vector of covariates is repeated row-wise. The overall augmented data matrix, which is obtained by "gluing" the individual augmented data matrices together, has ñ = Σ_{i=1}^{n} T̃_i rows. For details on data preparation and maximum likelihood estimation for discrete cause-specific hazard models we refer to chapter 8 of Tutz and Schmid (2016).
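A minimal sketch of this augmentation step for J = 2, assuming a data frame with illustrative columns time (observed discrete time) and status (0 = censored, 1 = death, 2 = discharge alive); dedicated implementations of such transformations are also available, for example, in the R package discSurv:

```r
## Convert short-format survival data into the augmented ("long") format:
## one row per individual and period at risk, with multinomial outcome y.
augment_discrete <- function(dat, time = "time", status = "status") {
  rows <- lapply(seq_len(nrow(dat)), function(i) {
    Ti  <- dat[[time]][i]
    out <- dat[rep(i, Ti), , drop = FALSE]   # one row per period at risk
    out$t <- seq_len(Ti)                     # discrete time index
    out$y <- 0L                              # reference category: no event yet
    if (dat[[status]][i] > 0) out$y[Ti] <- dat[[status]][i]  # event type in last row
    out
  })
  do.call(rbind, rows)
}

## Toy example: one patient dying on day 3, one censored on day 2.
augment_discrete(data.frame(time = c(3, 2), status = c(1, 0), age = c(84, 91)))
```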

Optimization of the Log-Likelihood
To maximize the log-likelihood function (5), we propose applying a component-wise gradient boosting approach (Friedman 2001; Bühlmann and Hothorn 2007). Generally, gradient boosting is designed to minimize the expectation of an objective function ρ(Y, η(X)) over an unknown vector-valued function η, provided that ρ is differentiable with respect to each component of η. In practice, minimization is usually performed by considering the empirical mean \( n^{-1} \sum_{i=1}^{n} \rho(y_i, \eta(x_i)) \) instead of the theoretical expectation.
For the class of discrete cause-specific hazard models the gradient boosting algorithm can be applied by setting ρ equal to the negative of the log-likelihood (5) and by defining the J-dimensional function η(t, X) = (η_1(t, X^{[1]}), …, η_J(t, X^{[J]}))^⊤. Accordingly, one solves the optimization problem

\[
\hat{\eta} = \operatorname*{argmin}_{\eta} \; \sum_{i=1}^{n} \sum_{t=1}^{\tilde{T}_i} \rho\big(y_{it}, \eta(t, x_i)\big). \tag{7}
\]

The proposed algorithm (described in detail below) originates from the approach by Schmid et al. (2010), who introduced a boosting algorithm for classification with multiple outcome categories. The algorithm iteratively computes the partial derivatives of the log-likelihood function (5) with respect to each of the risk functions η_j, and determines the covariate (i.e., either t or one component of X) that best fits the negative partial derivative for each competing risk. Simple univariable models, called base-learners, are used to fit the negative partial derivatives to each of the covariates by least squares or penalized least squares estimation (for details, see below), and one chooses the base-learner that minimizes the residual sum of squares to update the estimate of η at each iteration.
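To fix ideas, the following is a minimal, self-contained sketch of the generic component-wise scheme with simple linear least-squares base-learners and a one-dimensional risk function; it is illustrative only and not the authors' CSHM implementation (which handles J risk functions and the multinomial loss):

```r
## Generic component-wise gradient boosting with linear least-squares
## base-learners. X: n x p design matrix (ideally standardized), y: response,
## ngrad: negative gradient of the loss, nu: step length.
boost_componentwise <- function(X, y, ngrad, mstop = 100, nu = 0.1) {
  n <- nrow(X); p <- ncol(X)
  beta <- numeric(p)                 # current coefficient estimates
  eta  <- numeric(n)                 # current values of the risk function
  for (m in seq_len(mstop)) {
    u <- ngrad(y, eta)               # negative gradient at the current fit
    fits <- rss <- numeric(p)
    for (s in seq_len(p)) {          # fit each base-learner to u by least squares
      fits[s] <- sum(X[, s] * u) / sum(X[, s]^2)
      rss[s]  <- sum((u - fits[s] * X[, s])^2)
    }
    best <- which.min(rss)           # base-learner with smallest residual sum of squares
    beta[best] <- beta[best] + nu * fits[best]   # weak update of one component
    eta <- drop(X %*% beta)
  }
  beta
}

## e.g., for squared-error loss the negative gradient is the residual:
# beta_hat <- boost_componentwise(X, y, ngrad = function(y, f) y - f)
```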
Cyclical and noncyclical fitting procedure. With the strategy proposed by Schmid et al. (2010), the algorithm cycles through all J risk functions at each iteration (in a "cyclical" way), and each risk function is updated successively while the current estimates of the other cause-specific hazards are used as offset values. The main tuning parameter is the vector of stopping values m_stop = (m_stop,1, …, m_stop,J), which determines the number of boosting iterations to be performed until the algorithm is stopped. Schmid et al. (2010) recommended using a multidimensional strategy to determine the values of the stopping iterations, as the components of η may require different degrees of regularization. In POSE, for example, different stopping iterations may be necessary to optimize the risk functions for death and discharge alive. Importantly, this strategy allows for selecting the individual sets of p_j influential covariates X^{[j]} for each of the cause-specific hazards.
An alternative strategy, which we will consider in this article, is to update only one component of η in each boosting iteration (in a "noncyclical" way). First, for each cause-specific hazard the best performing base-learner is selected (as with the cyclical algorithm). In the next step, the potential improvement in the negative log-likelihood is compared across all selected base-learners, that is, over all cause-specific hazards. Finally, only the risk function with the highest improvement (with regard to the selected base-learner) is updated. This noncyclical algorithm requires the specification of only one stopping iteration m_stop, which can be optimized over a one-dimensional grid. As a consequence, tuning strategies for m_stop (discussed in more detail below) are considerably more efficient than the multidimensional strategy. Note that one-dimensional tuning implies the possibility that some of the risk functions may not be selected for any of the updates (neither with regard to X nor t). In this case, the resulting cause-specific hazard estimates become independent of the covariates and constant over the whole range of t, that is, η̂_j(t, X^{[j]}) ≡ const. Technically, the algorithm can be embedded into the gamboostLSS framework proposed by Mayr et al. (2012) for fitting models for location, scale, and shape (GAMLSS, Rigby and Stasinopoulos 2005). To obtain coefficient estimates we make use of the eponymous R add-on package (Hofner et al. 2022), which also implements the noncyclical version of gamboostLSS (Thomas et al. 2018). The corresponding self-implemented family function, called CSHM(), an acronym for Cause-Specific Hazard Modeling, is available on GitHub. A detailed description of the algorithm is given in Section 1 of the supplementary material.
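For orientation, a hedged sketch of how the noncyclical algorithm could be invoked via gamboostLSS; CSHM() is the authors' GitHub implementation, and the formula and response specification shown below are illustrative assumptions rather than its documented interface:

```r
## Hedged sketch: noncyclical boosting within gamboostLSS. CSHM() is the
## authors' self-implemented family (available on GitHub); the response and
## formula shown here are illustrative assumptions, not its documented API.
library(gamboostLSS)   # also loads mboost

# dat_long: augmented data (one row per individual and period at risk),
# with outcome y in {0, 1, ..., J}, time index t and covariates x1, x2.
# fit <- gamboostLSS(y ~ bols(t) + bols(x1) + bols(x2),
#                    data = dat_long, families = CSHM(),
#                    method = "noncyclic",   # one update per iteration (Thomas et al. 2018)
#                    control = boost_control(mstop = 1000, nu = 0.1))
# cvr <- cvrisk(fit)    # tune the single stopping iteration on a 1-d grid
# fit[mstop(cvr)]       # set mstop to the optimized value
```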
Cause-specific variable and model selection. In the literature, various approaches to optimizing the number of iterations of boosting algorithms have been proposed. These include cross-validation, bootstrapping, and early stopping based on information criteria (Bühlmann and Hothorn 2007). An alternative strategy, which will be used in the analysis of the POSE data, is stability selection (Meinshausen and Bühlmann 2010; Shah and Samworth 2013). The idea of stability selection, when applied to gradient boosting algorithms, is to repeatedly run the algorithm on subsamples of the original data and to stop it when a pre-specified number of base-learners has been selected for an update (in any of the J risk functions). Afterwards, for each base-learner associated with one of the risk functions one computes the fraction of subsamples in which the base-learner was selected, and includes all base-learners with a selection frequency higher than a pre-specified threshold in the model.
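Within the mboost ecosystem this procedure is available through stabsel(); a hedged sketch for a fitted boosting object fit as above (argument values are illustrative; q and the cutoff correspond to the selection parameter and the threshold π_thr):

```r
## Hedged sketch of stability selection for a fitted boosting object `fit`.
## q base-learners are selected per subsample before early stopping; the
## cutoff plays the role of the selection-frequency threshold pi_thr.
library(mboost)   # provides the stabsel() method for boosting objects

# sel <- stabsel(fit, q = 15, cutoff = 0.7, B = 50, sampling.type = "SS")
# plot(sel)        # selection frequencies per base-learner
# selected(sel)    # base-learners exceeding the cutoff
```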
Following the approach by Hofner et al. (2011), we control the complexity of the base-learners by fixing their degrees of freedom (df, defined in terms of the base-learners' hat matrices). Note that the functions f_sj(X_s) in Model (4) generally represent smooth effects of continuous covariates (including linear effects as a special case) but can also represent effects of binary or categorical covariates, f_sj(X_s) = Z_s β_sj, where Z_s denotes the matrix of corresponding dummy variables. In the former case, we decompose the smooth effect into a linear function (using a least squares base-learner with 1 df) and a 1-df deviation from this linear function (derived from a centered P-spline base-learner with second-order differences), both of which are subject to selection. In the analysis of the POSE data this decomposition will be considered for time t and the covariate age. For categorical covariates (e.g., anesthesia technique) we specify one least squares base-learner per covariate (with df equal to the number of categories minus one). Thus, all dummy variables belonging to the same categorical covariate are either included in or excluded from the model simultaneously. After stability selection, the final model including the selected variables can be fitted using ML estimation, as implemented in the R package VGAM. Additionally, as part of post-selection inference, (1 − α) confidence intervals can be obtained by bootstrapping, for example, by application of the percentile bootstrap (calculating the α/2 and (1 − α/2) percentiles of the bootstrapped coefficient estimates).
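The decomposition can be expressed with standard mboost base-learners; a minimal runnable sketch with a Gaussian toy response that only serves to illustrate the syntax (in practice the continuous covariate would typically be mean-centered):

```r
## Decomposition of a smooth effect into a 1-df linear base-learner and a
## 1-df centered P-spline deviation (second-order differences), following
## Hofner et al. (2011); a categorical covariate gets one least-squares
## base-learner so that all its dummy variables enter or leave together.
library(mboost)
set.seed(1)
toy <- data.frame(y   = rnorm(200),
                  age = runif(200, 80, 95),
                  anesthesia = factor(sample(c("general", "regional", "combined"),
                                             200, replace = TRUE)))
fit_toy <- gamboost(y ~ bols(age, intercept = FALSE) +               # linear part, 1 df
                        bbs(age, center = TRUE, df = 1,              # deviation from linearity
                            differences = 2) +
                        bols(anesthesia),                            # categorical covariate
                    data = toy, control = boost_control(mstop = 50))
table(selected(fit_toy))   # iteration-wise base-learner selections
```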

Related Approaches
To carry out variable selection in discrete cause-specific hazard models, two alternatives to the proposed boosting approach are (i) penalized maximum-likelihood (ML) estimation (Tutz, Pößnecker, and Uhlmann 2015; Möst, Pößnecker, and Tutz 2016), and (ii) tree-based modeling (Berger et al. 2019). Both approaches, briefly outlined below, will be considered in the simulation study in Section 5.
For ML estimation, Möst, Pößnecker, and Tutz (2016) considered linear risk functions and used two penalty terms in the log-likelihood (5), yielding the penalized log-likelihood

\[
\ell_{\mathrm{pen}} = \ell - \zeta_1 P_1(\gamma_0) - \zeta_2 P_2(\gamma),
\]

where γ_0 = (γ_01, …, γ_0J)^⊤ is the full vector of baseline coefficients and γ = (γ_1, …, γ_J)^⊤ is the full vector of covariate effects. The first term P_1 with tuning parameter ζ_1 penalizes the differences between adjacent baseline coefficients to ensure that hazards vary smoothly over time. This is analogous to the idea of penalized spline modeling described previously. The second term P_2 with tuning parameter ζ_2 shrinks the covariate effects toward zero, which enforces selection of covariates. Möst, Pößnecker, and Tutz (2016) proposed to set

\[
P_2(\gamma) = \sum_{s=1}^{p} \sqrt{J} \, \sqrt{\gamma_s^{\top} \gamma_s}
\]

to ensure that all coefficients γ_s = (γ_s1, …, γ_sJ)^⊤ associated with the same covariate are simultaneously shrunk toward zero. This is analogous to the idea of the group LASSO (Yuan and Lin 2006). To allow for different degrees of regularization in the J risk functions, one has to weaken this assumption and needs to specify a separate LASSO penalty for each cause-specific risk function, that is, penalty terms of the form

\[
\zeta_{2j} \sum_{s=1}^{p} \lvert \gamma_{sj} \rvert, \qquad j = 1, \ldots, J.
\]

With regard to the analysis of the POSE data (with J = 2) this means that the tuning parameters need to be optimized on a two-dimensional grid.
Model (4) is able to capture both linear and nonlinear effects of the covariates on the hazards. However, its additive structure may be inappropriate in the presence of (higher-order) interactions. A popular approach to address this issue is tree-based modeling; see, for example, Moradian et al. (2022) for a recent overview of tree-based methods for discrete-time hazard modeling and extensions for dynamic predictions. Berger et al. (2019) proposed a discrete cause-specific hazard model of the form λ_j(t|X) = g_j(t, X), j = 1, …, J, t = 1, …, k − 1, where the functions g_j(·) are determined by a classification tree for categorical outcomes with categories j ∈ {0, 1, …, J} based on the CART algorithm (Breiman et al. 1984). For tree building, the covariates X_1, …, X_p, as well as time t (coded as an ordinal variable), are considered as candidates for splitting. When a tree has been constructed, each terminal node q of the tree corresponds to a subset of the covariate space and to a time interval. Estimates of the cause-specific hazards λ̂_{1q}, …, λ̂_{Jq} are then given by the (Laplace-corrected) proportions of the outcome values in the node. We refer to Berger et al. (2019) for further details on splitting criteria and tuning parameter selection.
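The exact form of the correction is given in Berger et al. (2019); a plausible sketch of the node-wise estimates, assuming a standard add-one Laplace correction over the J + 1 outcome categories (illustrative, not the authors' implementation):

```r
## Hedged sketch of node-wise hazard estimates in the tree approach:
## Laplace-corrected proportions of the outcome categories {0, 1, ..., J}
## among the augmented-data rows falling into terminal node q.
node_hazards <- function(y_node, J = 2) {
  counts <- tabulate(y_node + 1L, nbins = J + 1L)    # counts for categories 0, ..., J
  ((counts + 1) / (length(y_node) + J + 1))[-1L]     # drop reference category 0
}
node_hazards(c(0, 0, 0, 1, 0, 2, 0))   # toy node with events of both types
```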

Empirical Evaluations
Before considering the POSE data, we present the results of a simulation study based on 100 Monte Carlo replications that illustrates the properties of the proposed boosting approach. The aims of the study were (i) to compare different strategies for variable selection, (ii) to compare the boosting approach to alternative methods, and (iii) to evaluate how the performance of the boosting approach is affected by the number of noninformative covariates.
Experimental design. We generated data of sample size n = 250 from the discrete cause-specific hazard model (1) with two competing events ε_i ∈ {1, 2} and discrete event times T_i ∈ {1, …, 11}. We further considered eight independent standard normally distributed covariates X_1, …, X_8 ~ N(0, 1). The two risk functions had the form

\[
\eta_1(t, X) = \gamma_{0t1} + \gamma_{11} X_1 + \gamma_{21} X_2 + \gamma_{31} X_3 + \gamma_{41} X_4, \tag{8}
\]

\[
\eta_2(t, X) = \gamma_{0t2} + \gamma_{12} X_5 + \gamma_{22} X_6 + \gamma_{32} X_7 + \gamma_{42} X_8, \tag{9}
\]

where the baseline coefficients were randomly drawn from a uniform distribution, γ_{0t1}, γ_{0t2} ~ U[−3, −2]. The regression coefficients were set to γ_1 = γ_2 = (0.8, −0.8, 0.6, −0.6)^⊤. Thus, each cause-specific hazard function was determined by a different set of four informative covariates. The censoring times were obtained by drawing random numbers from a discrete distribution whose probability mass function was governed by a parameter b ∈ ℝ⁺ determining the degree of censoring. We set b = 0.9, resulting in a censoring rate of about 30%. In addition to the eight informative covariates, we considered 10 (low-dimensional), 200 (moderate-dimensional) and 1000 (high-dimensional) noninformative, standard normally distributed variables. In each replication we generated (i) a learning sample used to fit the models, (ii) a validation sample to determine the tuning parameters, and (iii) a test sample to evaluate the different approaches using the criteria described below. The following modeling approaches were compared: (i) the proposed boosting approach, referred to as GB, using the R package gamboostLSS (Hofner et al. 2022) with the self-implemented CSHM() family function; (ii) gradient boosting without cause-specific variable selection (GBg), using the R package mboost (Hothorn et al. 2022) with the Multinomial() family function; (iii) penalized ML estimation with a separate LASSO penalty on each of the coefficients (i.e., with different degrees of regularization for each cause), referred to as PL, implemented in the R package MRSP (Pößnecker 2014); (iv) the penalized ML approach by Möst, Pößnecker, and Tutz (2016) with simultaneous variable selection (PLg), implemented in MRSP by setting the argument group.classes = TRUE; (v) the tree-based approach by Berger et al. (2019) (Tree) using the Gini impurity for split selection, of which an implementation is available from GitHub; and (vi) the cause-specific hazard model without any covariates, except for the time-dependent baseline coefficients (Null), fitted using ML estimation implemented in the R package VGAM (Yee 2022).
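For concreteness, a condensed R sketch of one way to generate such data; the censoring distribution below (a truncated geometric) is a stand-in because the exact probability mass function is not reproduced here:

```r
## Sketch of the simulation design: discrete event times from Model (1)
## with two competing events and risk functions (8) and (9).
set.seed(42)
n <- 250; k <- 11
X  <- matrix(rnorm(n * 8), n, 8)
g1 <- runif(k, -3, -2); g2 <- runif(k, -3, -2)   # baseline coefficients
gam <- c(0.8, -0.8, 0.6, -0.6)
Tt <- eps <- integer(n)
for (i in 1:n) {
  Tt[i] <- k; eps[i] <- 0L
  for (t in 1:k) {
    e1 <- exp(g1[t] + sum(gam * X[i, 1:4]))      # risk function (8)
    e2 <- exp(g2[t] + sum(gam * X[i, 5:8]))      # risk function (9)
    p  <- c(1, e1, e2) / (1 + e1 + e2)           # P(no event), lambda_1, lambda_2
    ev <- sample(0:2, 1, prob = p)
    if (ev > 0) { Tt[i] <- t; eps[i] <- ev; break }
  }
}
C <- pmin(1 + rgeom(n, prob = 0.1), k)           # stand-in censoring times
obs_time <- pmin(Tt, C); status <- ifelse(Tt <= C, eps, 0L)
mean(status == 0)                                # realized censoring rate
```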
In addition, we computed all evaluation criteria using the true underlying cause-specific hazards (True). The tuning parameters to be determined were the number of boosting iterations m_stop (GB and GBg), the penalty parameters ζ_1 and ζ_2 (PL and PLg), and the minimum node size (Tree). All tuning parameters were optimized by maximization of the log-likelihood on the validation samples. For the GB algorithm we also considered variable selection by stability selection as described in Section 3, which was performed using the stabsel() function of the R package mboost (Hothorn et al. 2022).
Evaluation criteria. To evaluate the model fits, we calculated a set of performance measures proposed by Heyard, Timsit, and Held for the COMBACTE-MAGNET consortium (2020) for discrete cause-specific hazard models. These were the discrete-time concordance index (C-index), defined as a weighted average of the cause-specific C-index values (weighted by the event proportions in the test sample), and the cause-specific time-dependent prediction error defined by

\[
\mathrm{PE}_j(t) = \frac{1}{n} \sum_{i=1}^{n} w_{it} \left( I\big(\tilde{T}_i \le t,\, \epsilon_i = j\big) - \hat{F}_j(t \mid x_i) \right)^2,
\]

where F̂_j(t|x_i) is the predicted cumulative incidence function for an event of type j, I(·) denotes the indicator function, and w_it are inverse-probability-of-censoring weights; see van der Laan and Robins (2003) and chapter 4 of Tutz and Schmid (2016). We also computed the cause-specific integrated prediction error defined by

\[
\mathrm{IPE}_j = \sum_{t=1}^{k} \mathrm{PE}_j(t) \cdot P\big(\tilde{T} = t,\, \epsilon = j\big),
\]

where the marginal probabilities P(T̃ = t, ε = j) were obtained from fitting a covariate-free null model.
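As an illustration of the prediction error above, a hedged sketch (Fhat: n × k matrix of predicted CIF values for event type j; w: matrix of precomputed IPC weights; all object names are assumptions):

```r
## Cause-specific time-dependent prediction error PE_j(t) for t = 1, ..., k.
## obs_time, status: observed times and event types as generated above.
pe_j <- function(Fhat, obs_time, status, j, w) {
  k <- ncol(Fhat)
  sapply(seq_len(k), function(t) {
    ind <- as.numeric(obs_time <= t & status == j)  # observed cause-j event by t
    mean(w[, t] * (ind - Fhat[, t])^2)              # IPC-weighted squared error
  })
}
```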
Comparison of variable selection strategies. We first consider the results obtained from the proposed GB algorithm. Supplementary Table S1 presents the mean values and standard deviations of the C-index and the integrated prediction error (across 100 simulation runs) when determining m_stop by maximization of the log-likelihood in the validation sample (referred to as logL) and when applying stability selection (StabS). For the latter we used 50 subsamples of size n_stab = 125 with selection parameter q = √(10p) (i.e., the algorithm stopped when √(10p) base-learners had been selected) and threshold parameter π_thr ∈ {0.6, 0.7, 0.8, 0.9} (i.e., only base-learners with a selection frequency higher than π_thr were used in the final model fit), which is the range recommended by Meinshausen and Bühlmann (2010).
From Table S1 it is seen that the performance of the boosting approach clearly depends on the chosen variable selection strategy. While the values of the C-index were rather similar in the low-dimensional scenario, they deteriorated in the moderate- and high-dimensional scenarios when using stability selection with π_thr = 0.9. This finding might be explained by the presence of some falsely included noninformative variables, which might have prevented a part of the informative variables from reaching the high inclusion threshold. With regard to the integrated prediction error, GB with stability selection (StabS) consistently performed better than GB with optimization of m_stop (logL). This could be due to StabS selecting sparser models (with a lower number of falsely included noninformative variables) than logL. The standard deviations of the integrated prediction error did not reveal substantial differences between the variable selection strategies. Overall, stability selection with π_thr ∈ {0.6, 0.7} (i.e., with a rather soft threshold) performed best. The following evaluations of the GB algorithm will therefore be based on either logL or StabS with π_thr = 0.7.
Comparison to alternative methods. The C-index values obtained from the various modeling approaches are shown in Figure 1. It can be observed that the boosting approaches (GB with logL and StabS) and the penalized ML approach with cause-specific variable selection (PL) achieved the highest discriminatory power (mean C-index in the low-dimensional scenario = 0.807, 0.810, and 0.809, respectively). In particular, these methods resulted in higher C-index values than their grouped counterparts (GBg and PLg, mean C-index in the low-dimensional scenario = 0.794 and 0.804), demonstrating the benefits of cause-specific variable selection in risk functions depending on different sets of covariates (such as (8) and (9)). In the moderate- and high-dimensional scenarios, GB with StabS was the best-performing approach (mean C-index = 0.809 and 0.783). The tree approach exhibited the worst performance by far, which might reflect a potentially high variability in the selected splits (not uncommon in tree modeling) as well as the inability of the tree structure to capture the data-generating process (characterized by the linear risk functions (8) and (9)).
Figure 2 presents the time-dependent prediction error PE_j(t) obtained from the various modeling approaches. The observed patterns were similar for type 1 and type 2 events. As already indicated by the values of the integrated prediction error in Table S1, GB with StabS outperformed GB with logL. While the tree-based approach and the null model were clearly inferior, the GB approaches and the penalized ML approaches performed comparably well across all scenarios. A close examination of Figure 2 shows that GB with StabS yielded the smallest prediction error in most cases. The means and standard deviations of the integrated prediction error are given in supplementary Table S2. They confirm the previous findings, indicating that GB with StabS and PL performed similarly and resulted in smaller prediction errors than the other approaches.
Additional simulations. For sensitivity analysis, we considered three alternative scenarios, which differed from the above simulation design in the following aspects: Additional scenario (a): We set b = 1.8, which increased the censoring rate to 60%. Additional scenario (b): We considered equi-correlated normally distributed covariates with zero mean and pairwise Pearson correlations of 0.4 (including the noninformative covariates). Additional scenario (c): We generated risk functions with nonlinear effects of the covariates and coefficients γ_1 = γ_2 = (1.25, −1.8, 0.6, −0.6)^⊤. Here, we additionally used P-spline base-learners (1-df deviations from the linear functions) and evaluated the ability of the GB algorithm to detect nonlinear effects. The findings obtained from the additional scenarios are presented in Section 2 of the supplementary material. In summary, our simulations yielded the following results: (a) GB with StabS performed best with a soft threshold, that is, π_thr ∈ {0.6, 0.7}. (b) Using this threshold, GB with StabS outperformed GB with logL, both in terms of discrimination and prediction error. (c) The benefit of cause-specific variable selection was clearly seen when informative covariate sets varied across cause-specific hazards. (d) Compared to the alternative approaches, GB with StabS was among the best-performing methods, particularly in terms of IPE_j. (e) In the additional scenarios, the performance of GB with StabS was equally strong. Although PL had a comparable performance in the scenarios with increased censoring rate and correlated covariates (additional scenarios (a) and (b)), the results obtained from additional scenario (c) clearly demonstrate the benefits of including nonlinear effect terms in the GB models (supplementary Figures S5 and S6, supplementary Tables S5 and S6).

Peri-Interventional Outcome Study in the Elderly
POSE is a multi-center cohort study that enrolled 9862 interventional patients aged 80 years or older from 177 hospitals across 20 European countries. Recruitment took place between 10/2017 and 12/2018. The main objective of POSE was to assess 30-day mortality after intervention and to determine mortality-related risk factors. Here, we investigate risk factors that are individually associated with an increased risk of postoperative death, as well as with an increased chance of discharge alive.
Patients were included in the analysis if they were undergoing any kind of intervention under anesthesia, whether surgical or nonsurgical, elective or nonelective, and in-patient (requiring overnight hospitalization) or out-patient. Exclusion of 365 ineligible patients resulted in an analysis dataset of 9497 patients. Data collection comprised measurements of clinical variables (e.g., comorbidities), lifestyle factors (e.g., smoking) and interventional characteristics (e.g., anesthesia technique). Another important risk factor is frailty, which is a recognized syndrome in the field of care for older adults (without consensus on its exact definition) and is associated with many age-related functional impairments (Oakland et al. 2016). For details on the study design (enrollment, follow-up, etc.) we refer to POSE-Study Group (2022).
Within 30 days after intervention, 295 patients had died in-hospital, 8874 patients were discharged home or to a rehabilitation facility, 302 patients were still in the ward and alive, and 26 patients were lost to follow-up. There were missing values in the data of 1137 patients (11.9%). The largest part of the missing values occurred in limited mobility according to the TUG test, which is due to some patients not being able to perform the test (assessing muscle strength, joint function and body balance). A test on nonrandom missingness ("MAR+" assumption, Potthoff et al. 2006) did not indicate any violations of the missing-at-random assumption (minimum p-value = 0.451; for details on the test see Section 3 of the supplementary material). Therefore, unlike in POSE-Study Group (2022), where missing values were imputed using multiple imputation techniques (as predefined in the statistical analysis plan), we only considered complete cases. After the exclusion of patients with missing values, we obtained an analysis dataset with n = 8360 patients. Summary statistics of the observed time spans T̃_i and the covariates included in our analysis are given in supplementary Table S7. The value T̃_i = 0 refers to an event within the first day, and T̃_i = 30 to an event within the 31st day after intervention. In total, we incorporated 14 covariates (one continuous variable, seven binary variables and six multicategorical variables with up to eight categories). In the first part of our analysis we randomly split the analysis dataset into one learning and one test sample. The learning sample with n = 5574 observations (containing two thirds of the data) was used to fit the model and to perform variable selection, and the test sample with n = 2786 observations (containing one third of the data) was subsequently used for model evaluation.
We applied stability selection based on 300 subsamples of size n_stab = 5574/2 = 2787 with selection parameter q = 15. Supplementary Figure S7 shows the selection frequencies for the 25 most frequently selected base-learners. Using the threshold value π_thr = 0.7, the model included time t and six different covariates. The GB algorithm selected a smooth nonlinear effect of t on the cause-specific hazard for the event discharge alive (see Figure 3), but no time-dependent intercept for the event death. The covariates anesthesia technique, transfusion of plasma, type of intervention, and frailty contributed to the cause-specific hazard function of death, whereas transfusion of plasma, type of intervention, frailty, urgency, and premedication contributed to the cause-specific hazard function of discharge alive. Hence, the selection was consistent for three of the covariates, but differed for the other three covariates, which were selected for only one of the two risk functions each. In the next step, we fitted the discrete cause-specific hazard model to the learning data (using VGAM), including the covariates selected previously by GB with StabS. The estimated baseline coefficients for discharge alive (presented in Figure 3) show that the conditional probability of being discharged increased until day 10, decreased strongly until day 20, and then leveled off at a low level until day 30. This finding is of high clinical interest, as it suggests that postoperative day 10 marks a threshold for considering discharge alive of elderly patients. Table 1 presents the coefficient estimates with 95% confidence intervals obtained from the final model fit. It is important to note that, due to the small number of in-hospital deaths, the confidence intervals for the effects on death are rather wide. The results revealed several clinically relevant associations: The need for transfusion of plasma during intervention and being frail were estimated to significantly increase the hazard of postoperative death and, at the same time, to significantly decrease the hazard of being discharged alive (i.e., to prolong the length of stay in the hospital). This result is in line with the findings of Ming et al. (2020), who found transfusion of fresh frozen plasma to be associated with higher rates of mortality, and with a recent meta-analysis and systematic review which revealed that frailty assessed by the Clinical Frailty Scale is strongly associated with mortality (Aucoin et al. 2020). Patients undergoing "ear, nose, throat, and ophthalmic," "gynaecologic and urological" or "orthopaedic, trauma and plastic" interventions had a significantly reduced hazard of death (compared to patients with abdominal interventions). Urgent and emergency interventions significantly decreased the hazard of being discharged alive, that is, prolonged the hospital stay (compared to elective interventions). In comparison, the large EuSOS study (Pearse et al. 2012) revealed increased in-hospital death after urgent (5%) and emergency (10%) surgery. Of note, EuSOS included younger patients (average age 56.7 years), excluded high-risk surgeries and did not assess the length of hospital stay. The estimated cumulative incidence functions F̂_j(t|X) of a randomly selected patient ("elective" and "gynaecologic and urological" intervention, no transfusion of plasma, not frail, general anesthesia technique, no premedication) are depicted in Figure 4. For this patient, the probability of postoperative death within 30 days was very low (<1%). This probability would have increased to more than 3% after 30 days if the patient had been frail (Figure 4(a)), and to 1.5% if the patient had undergone an urgent or emergency intervention (Figure 4(c)). The probability of being discharged alive was above 80% after seven days (Figures 4(b) and (d)), but would have decreased in case of frailty or in case of an urgent or emergency intervention.
Next, we assessed the calibration of the cause-specific hazard model. To this end, we used the graphical tool proposed by Heyard, Timsit, and Held for the COMBACTE-MAGNET consortium (2020) and applied it to the test sample with n = 2786 observations. Figure 5 presents the resulting calibration plots, which are based on splits of the augmented test data into subsets defined by the percentiles of the predicted cause-specific hazards. Although the predicted hazards for death showed more variation, both plots reveal strong agreement between the empirical hazards ȳ_{g,j} and the average predicted hazards λ̄_{g,j}, indicating satisfactory calibration.
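A hedged sketch of this type of calibration check on the augmented test data (grouping by percentiles of the predicted hazard and comparing group means; object names are illustrative):

```r
## Calibration check as used for Figure 5: split the augmented test data
## into G groups by percentiles of the predicted cause-j hazard and compare
## mean observed outcomes with mean predicted hazards per group.
calib_points <- function(pred_haz, y_obs, G = 22) {
  brks <- unique(quantile(pred_haz, probs = seq(0, 1, length.out = G + 1)))
  grp  <- cut(pred_haz, breaks = brks, include.lowest = TRUE, labels = FALSE)
  data.frame(predicted = tapply(pred_haz, grp, mean),  # average predicted hazard
             observed  = tapply(y_obs, grp, mean))     # empirical hazard
}
# cp <- calib_points(pred, y); plot(cp); abline(0, 1)  # points near diagonal = good calibration
```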
In the second part of our analysis we repeatedly split the analysis dataset into 100 pairs of learning and test samples and used the C-index to evaluate discrimination. Furthermore, we analyzed calibration by fitting logistic recalibration models to the test data (Heyard, Timsit, and Held for the COMBACTE-MAGNET consortium 2020) and compared the GB approach to the alternative methods considered in Section 5. For a detailed presentation of the results we refer to Section 3 of the supplementary material. Briefly, the second part of the analysis showed that GB with StabS had a reduced discriminatory power compared to the alternative methods (reducing, for instance, the average C-index from 0.885 (PL) to 0.824). On the other hand, it showed the best calibration (agreement between predicted and observed hazards), which adds to the conclusion drawn from the calibration plots in Figure 5.
In conclusion, the application of GB with StabS to the POSE data resulted in a sparse and well-calibrated model that yielded six important risk factors for death and discharge alive. These risk factors contributed either to one of the two cause-specific hazards or to both. Importantly, the present analysis confirmed previous clinical findings on the effects of frailty, transfusion of plasma, and urgency of the intervention. The high level of interpretability, however, came at the price of reduced (but still qualitatively high) discriminatory power compared to alternatives with a less rigorous selection of covariates.

Concluding Remarks
Motivated by the recent completion of an international cohort study on elderly hospital patients (POSE-Study Group 2022), we developed a modeling approach to investigate in-patient mortality up to 30 days after intervention. The proposed approach accounts for several important characteristics of longitudinally collected in-hospital data, for example, interval-censored measurements recorded on a discrete time scale and the occurrence of competing events (such as discharge alive). We applied a combination of state-of-the-art methods for predictive modeling and variable selection, namely GAMLSS, stability selection, and likelihood optimization within the framework of discrete cause-specific hazard models. Although explored only relatively recently, the aforementioned methods rely on well-established theory, including asymptotic results on consistency and normality (Rigby and Stasinopoulos 2005), results on error control for the number of falsely selected variables (Meinshausen and Bühlmann 2010; Shah and Samworth 2013), and likelihood construction derived from counting process theory (chap. 8 of Kalbfleisch and Prentice 2002). As a consequence, the proposed approach is able to overcome a number of limitations frequently encountered in clinical applications involving in-hospital data, namely the use of unweighted binary regression (e.g., logistic regression) to model postoperative death, which effectively ignores censoring (Lemeshow et al. 1988; Klein et al. 2016), and the consideration of a single event only, which induces a possible bias in the presence of competing risks (Segev et al. 2010; Reis et al. 2020). Our analysis of the POSE data demonstrates that a combination of discrete cause-specific hazard modeling and stability selection is able to identify cause-specific sets of risk factors while, at the same time, resulting in well-calibrated models with an easy interpretation.
The methodology considered in this work may be extended in various ways: (i) First, although we developed our approach to estimate postoperative mortality, the presented methodology is, in principle, applicable to any dataset that involves longitudinally collected data measured on a discrete time scale. Clearly, the validity of the approach depends on the correct specification of the censoring process. (ii) An alternative approach to model the CIF for an event of interest is the subdistribution hazard model by Fine and Gray (1999), which has recently been extended to discrete time measurements (Berger et al. 2020) and which can be combined with stability selection in a similar way as the approach presented here. Note, however, that variable selection in the subdistribution hazard model is not designed to identify cause-specific sets of covariates in a simultaneous fashion, but rather focuses on a single CIF for a specific event of interest. (iii) Analogous to GAMLSS, it is possible to combine discrete cause-specific hazard modeling with alternative methods for variable selection, for example, stepwise selection based on the generalized Akaike information criterion (Ramires et al. 2021). Here we performed variable selection by combining stability selection with a gradient boosting algorithm, making use of independent test samples to perform valid calibration and discrimination assessments. This strategy may be extended by a large variety of alternative modeling options (e.g., interaction terms or spatial effects), which may be represented by sets of suitably specified base-learners in the employed gradient boosting algorithm (Kneib, Hothorn, and Tutz 2009). We acknowledge that the use of independent test samples in Section 6 was feasible mainly because of the sufficiently large number of patients and events in the POSE data. In smaller studies that may not allow for a single sample split, other split-sample strategies might be used to assess the performance of variable selection techniques in discrete hazard models (Meinshausen, Meier, and Bühlmann 2009).

Figure 1. Results of the simulation study. The boxplots visualize the C-index values that were obtained from the various modeling approaches. As expected, the C-index was close to 0.5 for the covariate-free model (Null). In each panel, the eighth boxplot contains the C-index values obtained from the true underlying cause-specific hazards. The best-performing approaches are indicated by the dashed horizontal lines.

Figure 2. Results of the simulation study. Time-dependent prediction errors PE_j(t) obtained from the various modeling approaches (see Section 5).

Figure 3. Analysis of the POSE data. Time-dependent baseline coefficients for discharge alive with pointwise 95% confidence intervals obtained from fitting the discrete cause-specific hazard model to the learning sample of the POSE data (n = 5574, variable selection by GB with StabS, π_thr = 0.7).

Figure 4. Analysis of the POSE data. The figure shows the estimated cumulative incidence functions with pointwise 95% confidence bands for death (panels (a) and (c)) and discharge alive (panels (b) and (d)), referring to the covariates of a randomly selected patient ("elective" and "gynaecologic and urological" intervention, no transfusion of plasma, not frail, general anesthesia technique, no premedication). Panels (a) and (b) refer to the frailty of the patient; panels (c) and (d) refer to different degrees of urgency of the intervention.

Figure 5. Analysis of the POSE data. The calibration plots for death (left panel) and discharge alive (right panel) refer to the test sample of size n = 2786, using G = 22 subsets D_g, g = 1, …, 22, according to the rule suggested by Berger and Schmid (2022). Note the strong difference between the axis ranges of the two panels.

Table 1. Analysis of the POSE data. The table presents the coefficient estimates and 95% confidence intervals (based on 300 bootstrap samples) obtained from fitting a discrete cause-specific hazard model to the learning sample (n = 5574; γ̂ = coefficient estimate, CI = confidence interval). Covariates were selected by GB with StabS (π_thr = 0.7, Figure S7).