Efficient Estimation in the Fine and Gray Model

Abstract
Direct regression for the cumulative incidence function (CIF) has become increasingly popular since the Fine and Gray model was suggested (Fine and Gray 1999), due to its more direct interpretation on the probability risk scale. We here consider estimation within the Fine and Gray model using the theory of semiparametric efficient estimation. We show that the Fine and Gray estimator is semiparametrically efficient in the case without censoring. In the case of right-censored data, however, we show that the Fine and Gray estimator is no longer semiparametrically efficient, and we derive the semiparametrically efficient estimator. This estimation approach involves complicated integral equations, and we therefore also derive a simpler estimator as an augmented version of the Fine and Gray estimator with respect to the censoring nuisance space. While the augmentation term involves the CIF of the competing risk, it also leads to a robustness property: the proposed estimators remain consistent even if one of the models for the censoring mechanism or the CIF of the competing risk is misspecified. We illustrate this robustness property using simulation studies, comparing the Fine-Gray estimator and its augmented version. When the competing cause has a high cumulative incidence, we see a substantial gain in efficiency from adding the augmentation term, at a very reasonable computation time. Supplementary materials for this article are available online.


Introduction
The competing risks model considers the first occurrence of several competing types of events. Typically one might be interested in death from a specific disease, but this event will only be observed if the subject does not die of other causes. Regression modeling of the probability of dying from the disease of interest was first proposed by Fine and Gray (1999). They suggested modeling the effect of covariates on the cumulative incidence function (CIF) of the disease of interest (denoted $F_1$) directly, instead of through cause-specific hazard functions. The latter approach has been considered for many different cause-specific hazard models, such as the Cox model (see, e.g., Andersen et al. 1993; Cheng, Fine, and Wei 1998), but the advantage of the Fine and Gray (FG) approach is that it gives a more direct description of the effect of covariates on the CIF. Moreover, their approach does not require any model for the CIF of the competing events (denoted $F_2$). Fine and Gray (1999) derived estimating equations for a particular regression model (log-log link) that are similar to those of the semiparametric Cox model (Cox 1972). The FG-model has proven useful to address regression issues in the competing risks context, in particular as a prediction model, due to its nice numerical properties that resemble those of the Cox model. A drawback of the model is that the regression parameters are not so straightforward to interpret; see Fine (1999) and Eriksson et al. (2015). This may be remedied by using instead, for example, the logit-link, see […] mean when all models are correctly specified will still lead to a valid estimator, but with a different asymptotic variance. Intuitively, one should choose the estimating equation that is least sensitive to the value of the nuisance parameters. This should result in a less variable estimator or, if the model for the nuisance parameters is misspecified, a less biased estimator. Semiparametric inference theory shows that choosing the optimal augmentation term is equivalent to projecting the estimating equation onto a specific space.
Based on these results, we show that the FG-estimator is semiparametrically efficient in the absence of censoring (Section 4) but not semiparametrically efficient in the more practically relevant setting of right-censoring (Section 5.1). To the best of our knowledge, this last result has not been established earlier. The results in Mao and Lin (2017) are based on assuming a specific structure for $F_2(t, X)$, and their results are therefore not the same as showing that the Fine-Gray estimator is not semiparametrically efficient. We then derive the efficient influence function, which in principle paves the way to semiparametrically efficient estimation for the FG-model in the case of right-censored data (Section 5.2). In Section 6, we propose a simpler estimator that is an augmented version of the FG-estimator and also has improved efficiency compared to the FG-estimator. This augmented FG-estimator is easy to compute but is not equal to the semiparametrically efficient estimator. A further advantage of the two proposed augmented estimators is that they are robust, in the sense that they are both consistent if either $G_c$ or $F_2$ is correctly specified. Large sample properties and inference for the augmented FG-estimator are also given. Simulation studies reveal that the efficiency gain depends strongly on the occurrence of the other cause and can be substantial when the other cause is frequent, and they confirm the robustness of the augmented FG-estimator (Section 7). Section 8 contains a worked example on data from multiple myeloma patients treated with allogeneic stem cell transplantation from the Center for International Blood and Marrow Transplant Research (CIBMTR). The two competing events were relapse and treatment-related mortality, defined as death without relapse. Closing remarks, including software implementation, are given in Section 9. All technical computations are relegated to the Appendix, supplementary materials.

Notation and Data-Generating Model
Let $T$ be an event time, $C$ a censoring time, and $X$ a set of covariates taking values in the covariate space $\mathcal{X}$. Let $\epsilon \in \{1, 2\}$ denote the failure type, such that $\epsilon = 1$ indicates the event of interest and $\epsilon = 2$ indicates competing events. We allow for right-censoring and let $\tilde T = T \wedge C$, $\delta = I(T \le C)$, and $\tilde\epsilon = \epsilon(1 - \delta_c)$ with $\delta_c = I(C < T)$, so that the observed data are $D = (\tilde T, \tilde\epsilon, X)$. We let $N_c(t) = I(\tilde T \le t, \tilde\epsilon = 0)$ denote the counting process for the censoring mechanism and $Y(t) = I(t \le \tilde T)$ the at-risk indicator. Let $N_1(t) = I(T \le t, \epsilon = 1)$ be the counting process for the cause of interest, and define the at-risk indicator with respect to the cause of interest as $Y_1(t) = 1 - N_1(t-)$; these two quantities relate to the full data and may therefore not be fully observed due to right-censoring. We assume conditionally independent censoring, that is, $(T, \epsilon)$ is independent of $C$ given $X$. We are interested in the cumulative incidence function $F_1(t, X) = P(T \le t, \epsilon = 1 \mid X)$ for $t \in [0, \tau]$, with $\tau$ a fixed finite follow-up time. Our generative model for the CIF of cause 1 is the FG-model

$$F_1(t, X) = 1 - \exp\{-\Lambda_1(t)\exp(X^T\beta_0)\}, \qquad (1)$$

where $\Lambda_1(t) = \int_0^t \lambda_1(s)\,ds$ is an increasing cumulative baseline function that is left unspecified, and $\beta_0$ is a set of regression coefficients. The key interest is to estimate $\beta_0$ and the baseline function $\Lambda_1(\cdot)$. In contrast to Mao and Lin (2017), we do not make specific assumptions about the CIF of the other cause, $F_2(t, X) = P(T \le t, \epsilon = 2 \mid X)$, which is left unspecified. However, it must satisfy $F_1(t, x) + F_2(t, x) \le 1$ for all $x \in \mathcal{X}$, and we also require that $F_1(\infty, x) + F_2(\infty, x) = 1$ for all $x$. To satisfy the constraint we parameterize $F_2(t, X)$ as

$$F_2(t, X) = \{1 - F_1(\infty, X)\}\{1 - G_2(t, X)\},$$

where $G_2(t, X)$ is the survival function of $T$ given $\epsilon = 2$ and $X$, with hazard $\lambda_2(t, X)$, which models the occurrence of cause 2, that is, what happens when cause 1 does not occur. The key feature of this parameterization is that the parameter $\lambda_2(t, X)$ varies freely. We introduce the all-cause survival $S(t, X) = P(T > t \mid X) = 1 - F_1(t, X) - F_2(t, X)$ and the censoring distribution $G_c(t, X) = P(C > t \mid X)$ with hazard function $\lambda_c(t, X)$. Both $\lambda_c(t, X)$ and $\lambda_2(t, X)$ have corresponding cumulative hazard functions, denoted by $\Lambda_c(t, X)$ and $\Lambda_2(t, X)$, respectively. The martingale (increment) for the censoring mechanism will be denoted $dM_c(t, X) = dN_c(t) - Y(t)\,d\Lambda_c(t, X)$. Finally, we let the density of $X$ be denoted by $f(X)$, which is left unspecified. We observe a random sample of $n$ individuals: $(\tilde T_i, \tilde\epsilon_i, X_i)$, $i = 1, \dots, n$.
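As a concrete illustration of this parameterization, the Python sketch below evaluates $F_1$, $F_2$ and $S = 1 - F_1 - F_2$ for a given covariate vector. The regression coefficients match the values $\beta_1 = (0, -0.1)$ and $\beta_2 = (-0.5, 0.3)$ used in Section 7, but the bounded baseline $\Lambda_1(t) = \rho_1(1 - e^{-t})$ and the proportional hazards form $G_2(t, x) = \exp\{-\rho_2 t\, e^{x^T\beta_2}\}$ are hypothetical choices made only for this sketch. A bounded $\Lambda_1$ keeps $F_1(\infty, x) < 1$, so cause 2 has positive probability, and the constraints $F_1 + F_2 \le 1$ and $F_1(\infty, x) + F_2(\infty, x) = 1$ then hold by construction.

```python
import numpy as np

def Lambda1(t, rho1=0.4):
    # Hypothetical bounded cumulative baseline, so that F1(inf, x) < 1.
    return rho1 * (1.0 - np.exp(-np.asarray(t, dtype=float)))

def Lambda2(t, rho2=1.0):
    # Hypothetical linear cumulative hazard for the cause-2 survival G2.
    return rho2 * np.asarray(t, dtype=float)

def F1(t, x, beta1):
    """Cause-1 CIF under the FG-model (1): 1 - exp{-Lambda1(t) exp(x'beta1)}."""
    return 1.0 - np.exp(-Lambda1(t) * np.exp(x @ beta1))

def F2(t, x, beta1, beta2):
    """Cause-2 CIF {1 - F1(inf, x)}{1 - G2(t, x)} with G2(t, x) = exp{-Lambda2(t) exp(x'beta2)}."""
    p_cause2 = 1.0 - F1(np.inf, x, beta1)          # P(epsilon = 2 | x)
    G2 = np.exp(-Lambda2(t) * np.exp(x @ beta2))   # survival of T given epsilon = 2
    return p_cause2 * (1.0 - G2)

def S(t, x, beta1, beta2):
    """All-cause survival S(t, x) = 1 - F1(t, x) - F2(t, x)."""
    return 1.0 - F1(t, x, beta1) - F2(t, x, beta1, beta2)

# Example: one covariate profile on a few time points
beta1, beta2 = np.array([0.0, -0.1]), np.array([-0.5, 0.3])
x = np.array([1.0, 0.0])
t = np.array([0.0, 1.0, 6.0])
print(F1(t, x, beta1), F2(t, x, beta1, beta2), S(t, x, beta1, beta2))
```

Because $\lambda_2$ enters only through $G_2$, any choice of $\Lambda_2$ gives a valid pair $(F_1, F_2)$, which is the "varies freely" property used in the likelihood decomposition.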

The Fine-Gray Estimator
Fine and Gray (1999) considered the estimating function

$$S^{FG}_n(\beta) = \sum_{i=1}^n \int_0^\tau \left\{X_i - \frac{S_1(t, \beta)}{S_0(t, \beta)}\right\} w_i(t, X_i)\, dN_{1,i}(t),$$

where $w_i(t, X_i) = r_i(t)\,G_c(t, X_i)/G_c(\tilde T_i \wedge t, X_i)$ with $r_i(t) = I(C_i \ge T_i \wedge t)$, and $S_k(t, \beta) = \sum_{j=1}^n w_j(t, X_j)\,Y_{1,j}(t)\,X_j^{\otimes k}\exp(X_j^T\beta)$ for $k = 0, 1, 2$, where we use the convention that for a $p \times 1$ vector $a$, $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$ and $a^{\otimes 2} = aa^T$. Sometimes we will write additional arguments, for example, $S^{FG}_n(\beta, \hat G_c)$, to state explicitly that we use estimated weights based on $\hat G_c$. We call the solution $\hat\beta$ of $S^{FG}_n(\hat\beta, \hat G_c) = 0$ the FG-estimator. In their original proposal, Fine and Gray assumed that the weights do not depend on the covariates, that is, $w_i(t, X_i) = w_i(t)$, $i = 1, \dots, n$, but the extension to covariate-dependent censoring is immediate. In the case of covariate-independent weights, one may estimate these by

$$\hat w_i(t) = r_i(t)\,\frac{\hat G_c(t)}{\hat G_c(\tilde T_i \wedge t)},$$

where $\hat G_c$ is the Kaplan-Meier estimator of the censoring distribution; see He et al. (2016) for a detailed discussion of the properties of the FG-estimator with censoring weights that depend on covariates through a Cox model. The influence function of the FG-estimator is given in Fine and Gray (1999) as

$$\phi^{FG}(D) = \phi^{FG,1}(D) + \phi^{FG,2}(D), \qquad (2)$$

where the first term is what would be achieved if the censoring distribution were known, and the second term is due to the variability from the Kaplan-Meier estimator used to estimate the censoring distribution. The second term involves a function $q(t)$, which reflects that the censoring only affects the terms related to cause-2 jumps. It can be rewritten in terms of the all-cause survival $S(t) = P(T > t)$ (see Appendix A, supplementary materials), so that the censoring adjustment term can be written as a stochastic integral with respect to $dM_c(t)$; we refer to this representation as (3). Importantly, we also note that $q(t) = 0$ since $e_X(t)$ gives the expected value of the covariates among those with $Y_1(t) = 1$ at the time of an event of type 1. We shall later see that this expression resembles the augmentation term of the FG-estimator that we compute to increase efficiency. In the following sections we study properties of the FG-estimator in terms of efficiency. Note: It can be shown that the second term $\phi^{FG,2}$ is equivalent to the projection of $\phi^{FG,1}$ onto the nuisance space of the censoring distribution (a precise characterization of this space can be found in Appendix D, supplementary materials). Therefore, if one uses only the first term $\phi^{FG,1}$ to estimate the standard error of $\hat\beta$, as suggested in Geskus (2011) and criticized in Li, Gray, and Fine (2012), one obtains a conservative estimator of the uncertainty.
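The estimating equation and the Kaplan-Meier weights above can be sketched in Python as follows. This is a minimal illustration under stated assumptions (covariate-independent censoring, no tied event times, Newton-Raphson started at $\beta = 0$), not the mets implementation, and all function names are hypothetical. For each observed cause-1 time $t$, the helper `fg_risk_weights` returns $w_j(t)Y_{1,j}(t)$: subjects still under observation get weight 1, subjects with an observed cause-2 event before $t$ stay in the risk set with weight $\hat G_c(t)/\hat G_c(\tilde T_j)$, and all others get weight 0.

```python
import numpy as np

def censoring_km(time, status):
    """Kaplan-Meier estimate of G_c(t) = P(C > t); censorings (status == 0) act as events."""
    ctimes = np.unique(time[status == 0])
    at_risk = np.array([np.sum(time >= u) for u in ctimes], dtype=float)
    n_cens = np.array([np.sum((time == u) & (status == 0)) for u in ctimes], dtype=float)
    return ctimes, np.cumprod(1.0 - n_cens / at_risk)

def step_eval(t, jump_times, values):
    """Right-continuous step function that equals 1 before its first jump."""
    t = np.asarray(t, dtype=float)
    if jump_times.size == 0:
        return np.ones_like(t)
    idx = np.searchsorted(jump_times, t, side="right")
    return np.where(idx == 0, 1.0, values[np.maximum(idx - 1, 0)])

def fg_risk_weights(t, time, status, ctimes, csurv):
    """w_j(t) * Y_1j(t) for every subject j at time t."""
    g_t = step_eval(t, ctimes, csurv)
    g_tj = step_eval(time, ctimes, csurv)
    ratio = np.divide(g_t, g_tj, out=np.zeros_like(g_tj), where=g_tj > 0)
    past_cause2 = (time < t) & (status == 2)
    return np.where(time >= t, 1.0, np.where(past_cause2, ratio, 0.0))

def fine_gray(time, status, X, max_iter=50, tol=1e-8):
    """Solve S_n^FG(beta, G_c_hat) = 0 by Newton-Raphson; status: 0 censored, 1 or 2 cause."""
    ctimes, csurv = censoring_km(time, status)
    events = np.flatnonzero(status == 1)
    # Row k holds w_j(t_k) Y_1j(t_k) at the k-th observed cause-1 time t_k
    W = np.array([fg_risk_weights(time[i], time, status, ctimes, csurv) for i in events])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        R = W * np.exp(X @ beta)                      # weighted relative risks
        S0 = R.sum(axis=1)
        E = (R @ X) / S0[:, None]                     # S1 / S0 at each event time
        score = (X[events] - E).sum(axis=0)           # w_i = 1 at an observed cause-1 event
        S2 = np.einsum("kj,jp,jq->kpq", R, X, X) / S0[:, None, None]
        info = (S2 - np.einsum("kp,kq->kpq", E, E)).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

The sketch only solves for $\hat\beta$; a standard error would additionally need the censoring term $\phi^{FG,2}$ of the influence function (2), which it does not compute.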

Efficiency in Absence of Censoring
We start by considering the case where there is no censoring, which we also refer to as the full data case. In terms of notation, we let $C = +\infty$ above, so that, for example, $Y(t) = I(t \le T)$ and, for $j \in \{1, 2\}$, $\delta_j = I(\epsilon = j)$ is the event type indicator for cause $j$. To derive the semiparametrically efficient estimator, we consider the geometry of the nuisance tangent space, which can be deduced from the log-likelihood function. For a single observation $(T, \epsilon, X)$, we show in Appendix B.1, supplementary materials, that the log-likelihood function simplifies to

$$\ell(\theta; T, \epsilon, X) = \big[\delta_1 \log f_1(T, X) + \delta_2 \log\{1 - F_1(\infty, X)\}\big] + \delta_2\{\log \lambda_2(T, X) - \Lambda_2(T, X)\} + \log f(X),$$

where $f_1(t, X) = \partial F_1(t, X)/\partial t$ and $\theta$ denotes the set of unknown parameters. The log-likelihood function thus consists of three terms: one depending on the parameters of the CIF for cause 1, one depending on the unspecified hazard parameters for the CIF of cause 2, and the last depending on the unspecified distribution of the covariates. We note that the parameterization of $F_2$ satisfying the cumulative incidence constraint is needed to get terms with parameters that vary freely. Because of this, we can write the full data nuisance tangent space $\Lambda^F$ as a direct sum of three orthogonal subspaces; see Appendix B.1, supplementary materials. We can then assess the efficiency of the FG-estimator.
Theorem 1.If the data are fully observed (C = ∞) then the FG-estimator is semiparametrically efficient.
Proof.Based on the decomposition of the full data nuisance tangent space as a direct sum of three orthogonal subspaces we can use similar arguments as in section 5.2 of Tsiatis (2006) to show that the efficient score is see Appendix B.1, supplementary materials, for details.Clearly, ( 4) is equivalent to the influence function of the FG-estimator (2), in the case of no censoring.The FG-estimator is thus efficient when there is no censoring.
Another useful observation is that when the incidence of the competing risk is low, $F_2(t, X) \approx 0$, the FG-estimator becomes close to being efficient, even if there is right-censoring, as it gets closer to the partial likelihood estimator for the Cox model. When the incidence of the competing risk is high, there is room for efficiency gain over the FG-estimator, as we illustrate in Section 7.

Efficiency in Presence of Right-Censoring
We now study the more interesting and realistic case where there is right-censoring, and first show that, as opposed to the situation without censoring, the FG-estimator is no longer semiparametrically efficient. We define $Z = (T, \epsilon, X)$ as the full data, that is, the data without any censoring, and let $D = (\tilde T, \tilde\epsilon, X)$ denote the observed data. Tsiatis (2006, Theorem 9.2) shows that, with right-censored data, any observed-data influence function can be written as an AIPWCC (augmented inverse probability weighted complete case) influence function,

$$\phi(D) = \frac{\delta\,\phi^F(Z)}{G_c(T, X)} + \int \alpha_4(t, X)\, dM_c(t, X), \qquad (5)$$

where the second term on the right-hand side is an element of the so-called augmentation space $\mathcal{A}_2 = \{\int \alpha_4(t, X)\, dM_c(t, X) : \text{all } \alpha_4(t, X)\}$, and $\phi^F(Z)$ is an influence function corresponding to the full data case (i.e., no right-censoring). By varying the element of the augmentation space, display (5) gives a class of influence functions defined by $\phi^F(Z)$. The optimal influence function in this class is

$$\frac{\delta\,\phi^F(Z)}{G_c(T, X)} - \Pi\left\{\frac{\delta\,\phi^F(Z)}{G_c(T, X)} \,\Big|\, \mathcal{A}_2\right\},$$

where $\Pi(\cdot \mid \mathcal{A}_2)$ denotes the projection operator onto the augmentation space; see Appendix C.1, supplementary materials, for more details on why this is the optimal influence function. This corresponds to (5) with $\alpha_4(t, X) = E\{\phi^F(Z) \mid T \ge t, X\}/G_c(t, X)$; see Section 10.3 of Tsiatis (2006) for more details. In this section we have allowed the censoring distribution to depend on the covariates $X$, but we may sometimes suppress this dependency, as the general arguments concerning efficiency for the FG-model are the same whether or not the censoring depends on the covariates, as long as a correctly specified censoring model is applied in the final estimation step.
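The projection argument can be illustrated outside the FG-model with a toy missing-data problem (a hypothetical example, not part of the paper): estimate $E(Y)$ when $Y$ is observed (indicator $R$) with known probability $\pi(X)$. The inverse probability weighted term $RY/\pi(X)$ plays the role of $\delta\phi^F(Z)/G_c(T, X)$, the augmentation space consists of the mean-zero terms $-\{R - \pi(X)\}a(X)/\pi(X)$, and the projection corresponds to $a(X) = E(Y \mid X)$. Every choice of $a$ keeps the term unbiased, but the projection gives the smallest variance.

```python
import numpy as np

def aipw_terms(y, r, pi, a):
    """Per-subject terms r*y/pi - (r - pi)/pi * a(X); a = 0 gives plain IPW."""
    y_obs = np.where(r == 1, y, 0.0)     # y enters only where it is observed
    return r * y_obs / pi - (r - pi) / pi * a

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(scale=0.5, size=n)      # E(Y) = 1 and E(Y | X) = 1 + X
pi = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * x)))      # known observation probability
r = rng.binomial(1, pi)

ipw = aipw_terms(y, r, pi, a=0.0)
projection = aipw_terms(y, r, pi, a=1.0 + x)     # a(X) = E(Y | X)
other = aipw_terms(y, r, pi, a=0.5 * (1.0 + x))  # valid but suboptimal augmentation

for name, phi in [("IPW", ipw), ("projection", projection), ("other", other)]:
    print(f"{name:>10}: mean {phi.mean():.3f}, variance {phi.var():.3f}")
```

Writing each term as $Y + \{R - \pi(X)\}\{Y - a(X)\}/\pi(X)$ shows that its variance is $\mathrm{Var}(Y) + E[\{1 - \pi(X)\}\{Y - a(X)\}^2/\pi(X)]$, which is minimized by $a(X) = E(Y \mid X)$, mirroring the choice of $\alpha_4$ above.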

Fine-Gray Estimator
We deal with the FG-estimator as it was originally proposed by Fine and Gray (1999), that is, with covariate-independent weights. The case with covariate-dependent weights can be handled similarly. We start by rewriting the influence function of the FG-estimator as an AIPWCC, display (6), whose leading inverse probability weighted term is defined by a full data influence function $\phi^F(Z)$; see Appendix C.2, supplementary materials, for details and for the definitions of the remaining terms. The sum of the last two terms of (6) is indeed an element of the augmentation space. Hence, the FG-estimator belongs to the class of AIPWCCs defined by the full data influence function $\phi^F(Z)$. This enables us to show the following result.
Theorem 2. The FG-estimator is not semiparametrically efficient when there is right-censoring and $F_2(t \mid X) > 0$.
The proof is given in Appendix C.2, supplementary materials. The FG-estimator is not semiparametrically efficient in the case of right-censored data, as it is not even optimal in the class of AIPWCCs defined by $\phi^F(Z)$ to which it belongs. In the next section, we show that the semiparametrically efficient influence function does not belong to this specific class of AIPWCCs, so, in theory, there is even more room for improvement in terms of efficiency.

The Semiparametrically Efficient Estimator
We now derive the semiparametrically efficient estimator in the FG-model, by which we mean the most efficient estimator when the only structure we put on the joint distribution of $Z = (T, \epsilon, X)$ is the FG-model (1). One might attempt to obtain the efficient score function by projecting the parametric score in $\beta$, based on the observed data likelihood function, onto $\Lambda^\perp$, the orthogonal complement of the nuisance tangent space $\Lambda$. This appears difficult, however; see Appendix D, supplementary materials, for further details. We propose an alternative way of obtaining the efficient influence function, which suggests how to obtain the semiparametrically efficient estimator for the FG-model. The efficient estimator is in the class of doubly robust AIPWCC estimators; see Section 11 of Tsiatis (2006). Specifically, there exists a function of the full data, $B^F_{\rm eff}(Z)$, depending on some function $v_c(u, X)$ (display (7)), such that the observed data efficient influence function is the optimal AIPWCC defined by it,

$$\phi^0_{\rm eff}(D) = \frac{\delta\,B^F_{\rm eff}(Z)}{G_c(T, X)} - \Pi\left\{\frac{\delta\,B^F_{\rm eff}(Z)}{G_c(T, X)} \,\Big|\, \mathcal{A}_2\right\}.$$

If we could find the function $v_c(t, X)$, and thus $B^F_{\rm eff}(Z)$, then we could use the empirical version of the above efficient influence function as an estimating equation. Unfortunately, the full data efficient influence function is not the wanted $B^F_{\rm eff}(Z)$; instead, these two functions solve an equation, (8), that involves the full data nuisance tangent space $\Lambda^F$. Equation (8) is a complicated integral equation. Surprisingly, it is possible to construct an iterative procedure that solves this equation in a special case; the construction and proof are detailed in Appendices E.1, E.2, and E.3, supplementary materials. We have implemented this procedure in the simulation setting described in Section 7, where it appeared to work well, with a performance similar to that of the NPMLE (see Section 7 for details about the NPMLE in the considered setting). However, we refrain from further study of the efficient estimator because its structure is more complicated than that of the augmented FG-estimator, which we advocate using instead. The augmented FG-estimator will be more efficient than the FG-estimator in the case where there is right-censoring and the incidence of the competing cause is not negligible. It is studied in more detail in the next section.

Estimation
In this section we explore what we call the augmented FG-estimator (see Figure 1 for a graphical representation). By this we mean the estimator obtained by augmenting the FG-estimating equation so that its influence function becomes

$$\phi^{FG,A}(D) = \phi^{FG}(D) - \Pi\{\phi^{FG}(D) \mid \mathcal{A}_2\},$$

which is the optimal influence function in the class of influence functions that $\phi^{FG}$ belongs to. The term $\phi^{FG,A_1}(D)$ can be estimated by an empirical version based on working models (see Appendix C.3, supplementary materials). While the augmented estimating equation requires a model for $S(t, X)$, that is, for the CIF of the competing risk $F_2(t, X)$, it also brings some robustness against possible misspecification of $F_2(t, X)$ or $G_c(t, X)$, as stated in Theorem 3 below.
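We also observe that the augmentation term can be interpreted as adding a term for the expected value of the part of the score that is not fully observed due to censoring (minus its mean).

Theorem 3. Let $F_1^*$, $F_2^*$, and $G_c^*$ denote the working models, with $S^* = 1 - F_1^* - F_2^*$ the working all-cause survival, $w^*$ the working weights, and $M_c^*$ the martingale process associated with $G_c^*$. The estimator $\hat\beta$, defined as the zero root of the augmented estimating function, is a consistent estimator of $\beta$ if (i) the working model $F_1^*$ is correctly specified and (ii) at least one of $G_c^*$ and $F_2^*$ is correctly specified.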
The proof of Theorem 3 is given in Appendix F, supplementary materials. In Theorem 3, the generative and working models are defined over the same set of covariates. However, some of the covariate effects may be zero in the generative and/or working models, so the theorem does not assume that one has identified the right set of covariates for all models. It is critical that $H$ is computed under $F_1^*$ to obtain consistency when the censoring model is not correctly specified.
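The double robustness in Theorem 3 has a simple analogue in a toy missing-data problem (a hypothetical example, not the augmented FG-estimator itself): the AIPW estimator of $E(Y)$ stays consistent if either the working observation probability, which plays the role of $G_c^*$, or the working outcome model, which plays the role of the working CIF models in the augmentation term, is correct, and it is biased when both are wrong.

```python
import numpy as np

def aipw_mean(y, r, pi_work, m_work):
    """AIPW estimate of E(Y) with working observation probabilities and outcome model."""
    y_obs = np.where(r == 1, y, 0.0)
    return np.mean(r * y_obs / pi_work - (r - pi_work) / pi_work * m_work)

rng = np.random.default_rng(1)
n = 400_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(scale=0.5, size=n)          # E(Y) = 1
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * x)))     # observation depends on x
r = rng.binomial(1, pi_true)

pi_wrong = np.full(n, r.mean())                      # ignores the dependence on x
m_true, m_wrong = 1.0 + x, np.zeros(n)

estimates = {
    "pi correct, m wrong": aipw_mean(y, r, pi_true, m_wrong),
    "pi wrong, m correct": aipw_mean(y, r, pi_wrong, m_true),
    "both wrong": aipw_mean(y, r, pi_wrong, m_wrong),
}
for name, value in estimates.items():
    print(f"{name:>20}: {value:.3f}")
```

In the last case the estimator reduces to a weighted complete-case mean that over-represents large values of $x$, which is the kind of bias one would expect for the FG-estimator when neither the censoring model nor the augmentation models are adequate.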
Note: a similar approach can be used to derive an augmented estimator for the cumulative baseline hazard function that is robust to a misspecification of the censoring model, see Appendix G, supplementary materials for details.

Asymptotic Distribution of the Estimator
We now turn to the asymptotic properties of the augmented FG-estimator. For simplicity, we do this in the setting where the censoring weights $w_i(t, X_i)$ do not depend on covariates. We have working models for the augmentation term and the censoring distribution represented by estimators $F_1^*$, $F_2^*$, $G_c^*$, based on the iid sample $O$. Given these estimators, we can estimate the augmentation term (denoted $S^{A,*}_n$) and the weights (denoted $w_i^*(t)$). The augmented FG-estimator $\hat\beta$ is thus given as the solution to

$$S^{FG}_n(\beta, w^*) + S^{A,*}_n(\beta) = 0.$$

In practice, a possible estimating procedure is to use a Kaplan-Meier estimator or a Cox model for the censoring model, and to use the NPMLE of Mao and Lin (2017) or other estimators to estimate the CIFs. We here used the Aalen-Johansen product-limit estimators as working models for $F_1^*$ and $F_2^*$, as these provide easily computable estimates that also satisfy the natural constraint for the cumulative incidence functions. Sufficient conditions for consistency of the augmented FG-estimator $\hat\beta$ follow from Theorem 3: $\hat\beta$ is consistent if $G_c^*$ is consistent, or if $F_1^*$ and $F_2^*$ are both consistent. The first condition follows from noticing that when $G_c^*$ is consistent, the augmentation term in the score converges to 0 (regardless of $F_1^*$ and $F_2^*$), while the first term of the score does not involve $F_1^*$ or $F_2^*$. The estimator is asymptotically normal under regularity conditions as in Fine and Gray (1999); we give additional details in Appendix H, supplementary materials, and here only show how to estimate its variance. We consider the situation where we want to demonstrate the efficiency gain, namely when the working censoring model is correct, $G_c^* = G_c$. In this case the asymptotic expansions are also simpler. We make the arguments even simpler by considering only binary covariates, so that we can use the stratified NPMLEs $\hat F_1^*$ and $\hat F_2^*$ to estimate $F_1^*$ and $F_2^*$, respectively. First, the variance of the FG-estimator is estimated by the sandwich estimator

$$\hat I^{-1}\Big\{\sum_{i=1}^n \hat\phi^{FG}_i (\hat\phi^{FG}_i)^T\Big\}\hat I^{-1}, \qquad \hat I = \sum_{i=1}^n \int_0^\tau \left\{\frac{S_2(t, \hat\beta)}{S_0(t, \hat\beta)} - \left(\frac{S_1(t, \hat\beta)}{S_0(t, \hat\beta)}\right)^{\otimes 2}\right\}\hat w_i(t)\, dN_{1,i}(t),$$

where $\hat I$ is minus the derivative of the FG-score and $\hat\phi^{FG}_i$ is the estimated influence function (on the score scale) of subject $i$. The variance of the augmented FG-estimator, in contrast, is consistently estimated by a sandwich estimator of the same form based on its own estimated influence function. In the case where the working models $F_1^*$ and $F_2^*$ are correctly specified, $n^{-1}q(t)$ and $n^{-1}q^*(t)$ have the same limit and thus cancel out.
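The Aalen-Johansen working models mentioned above can be sketched in a few lines of Python. The function below (a hypothetical helper, not the mets code) estimates $F_1$ and $F_2$ within one stratum; applying it separately within each level of a binary covariate gives stratified working models $F_1^*$ and $F_2^*$. Each increment is the all-cause Kaplan-Meier survival just before $u$ times the cause-specific jump $dN_j(u)/Y(u)$, so $F_1 + F_2 = 1 - \hat S$ at every time, which is why the natural constraint $F_1 + F_2 \le 1$ holds automatically.

```python
import numpy as np

def aalen_johansen(time, status):
    """Aalen-Johansen estimates of F1 and F2 (status: 0 censored, 1 or 2 cause).

    Returns the distinct event times and both CIFs evaluated at (just after) each time.
    """
    times = np.unique(time[status > 0])
    surv = 1.0                              # all-cause Kaplan-Meier S(u-)
    f1 = f2 = 0.0
    F1, F2 = [], []
    for u in times:
        n_risk = np.sum(time >= u)
        d1 = np.sum((time == u) & (status == 1))
        d2 = np.sum((time == u) & (status == 2))
        f1 += surv * d1 / n_risk            # S(u-) dN_1(u) / Y(u)
        f2 += surv * d2 / n_risk            # S(u-) dN_2(u) / Y(u)
        surv *= 1.0 - (d1 + d2) / n_risk
        F1.append(f1)
        F2.append(f2)
    return times, np.array(F1), np.array(F2)

# Stratified working models for a binary covariate x1 (hypothetical data arrays):
# fits = {level: aalen_johansen(time[x1 == level], status[x1 == level]) for level in (0, 1)}
```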
The above arguments can be extended to the situation where the censoring distribution depends on covariates via a Cox model, in which case $G_c$, $q$ and $q^*$ are conditional on the covariates. The expressions of the influence function ($\hat\phi^{FG}$) and the information matrix ($\hat I$) for the corresponding Fine and Gray estimator can be found in He et al. (2016). The influence function of the corresponding augmentation term can be split into a martingale term, a stochastic integral with respect to $dM_c(t)$ whose integrand involves $H(t, X)$, and another term involving the influence function of the nuisance parameters (e.g., the regression parameters of the Cox model). This last term can be identified using a functional delta method; see Appendix C of Ozenne et al. (2020) for similar derivations for another augmented estimator.
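The sandwich form used above is generic and can be sketched as follows, assuming that the per-subject influence contributions $\hat\phi_i$ (on the score scale) and the matrix $\hat I$ have already been computed, either for the FG-estimator or for its augmented version; the function name is hypothetical.

```python
import numpy as np

def sandwich_variance(info, contributions):
    """Return I^{-1} (sum_i phi_i phi_i^T) I^{-1}.

    info: (p, p) array, minus the derivative of the estimating function at beta_hat.
    contributions: (n, p) array of estimated per-subject influence contributions phi_i.
    """
    info_inv = np.linalg.inv(info)
    meat = contributions.T @ contributions      # sum of outer products phi_i phi_i^T
    return info_inv @ meat @ info_inv.T

# Standard errors are the square roots of the diagonal, e.g.
# se = np.sqrt(np.diag(sandwich_variance(info_hat, phi_hat)))
```

Comparing the diagonals obtained for the FG-estimator and for the augmented estimator gives the estimated efficiency gain for each coefficient.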

Simulations
We assumed that $F_1(t, X)$ followed the FG-model (1) with regression coefficients $\beta_1 = (0, -0.1)$ and cumulative baseline $\rho_1\Lambda_1(t)$, and that the other cause was given by $F_2(t, X) = \{1 - \exp(-\Lambda_2(t, X))\}\{1 - F_1(\tau, X)\}$, with covariate effects $\beta_2 = (-0.5, 0.3)$ and $\Lambda_2$ scaled by $\rho_2$, a parameterization that satisfies the constraint $F_1 + F_2 \le 1$. We considered $\rho_1 = 0.2, 0.4$ and $\rho_2 = 1, 10$ to get different levels of the two causes. The setting where $\rho_2 = 10$ makes it important to account for the constraint $F_1 + F_2 \le 1$ in the simulations and when estimating the cumulative incidences. The covariates were either two independent binary variables with $P(X_j = 1) = 0.5$ ($j = 1, 2$), or two independent normally distributed random variables with standard deviation 1. Across all other combinations of the settings, $F_1$ ranged between 11% and 20% when $\rho_1 = 0.2$, and between 23% and 33% when $\rho_1 = 0.4$; similarly, $F_2$ ranged between 40% and 70% for $\rho_2 = 1$, and between 65% and 90% for $\rho_2 = 10$. We consider only the time interval $[0, 6]$. To generate conditionally independent right-censoring, we generated censoring times from a Cox regression model with hazard $\lambda_c(t, X) = r_c\exp(X_1\beta_c)$, with $\beta_c$ either 0 or 0.5, and $r_c$ chosen so that 10%, 25%, or 40% of the observations were censored in all settings. Estimation is based on $[0, 6]$, and the censoring percentages refer to censoring occurring before time 6, as the data are otherwise fully observed up to time 6. We considered sample sizes $n = 200$, 400, and 800 and used 10,000 repetitions.
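A data-generating sketch along these lines is given below in Python. The baselines $\Lambda_1(t) = \rho_1(1 - e^{-t})$ and $\Lambda_2(t) = \rho_2 t$, the default value of $r_c$, and the use of $1 - F_1(\infty, X)$ rather than $1 - F_1(\tau, X)$ as the cause-2 probability are assumptions of this sketch, not the exact simulation design; in practice $r_c$ would be tuned to reach the target censoring percentage. The function name is hypothetical.

```python
import numpy as np

def simulate_fg(n, rho1=0.4, rho2=1.0, beta1=(0.0, -0.1), beta2=(-0.5, 0.3),
                r_c=0.05, beta_c=0.5, tau=6.0, binary=True, seed=None):
    """Return (time, status, X); status 0 = censored, 1 = cause 1, 2 = cause 2.

    Hypothetical baselines: Lambda1(t) = rho1 (1 - exp(-t)) and Lambda2(t) = rho2 t.
    """
    rng = np.random.default_rng(seed)
    if binary:
        X = rng.binomial(1, 0.5, size=(n, 2)).astype(float)
    else:
        X = rng.normal(size=(n, 2))
    eta1 = np.exp(X @ np.asarray(beta1))
    eta2 = np.exp(X @ np.asarray(beta2))
    p1 = 1.0 - np.exp(-rho1 * eta1)                  # F1(inf, X) = P(epsilon = 1 | X)
    cause = np.where(rng.uniform(size=n) < p1, 1, 2)
    # Cause 1: solve F1(t, X) = U * F1(inf, X), i.e. Lambda1(t) = -log(1 - U p1) / eta1
    lam1 = -np.log(1.0 - rng.uniform(size=n) * p1) / eta1
    t1 = -np.log(1.0 - lam1 / rho1)
    # Cause 2: G2(t, X) = exp(-rho2 t eta2) is an exponential survival function
    t2 = rng.exponential(1.0, size=n) / (rho2 * eta2)
    T = np.where(cause == 1, t1, t2)
    # Censoring from a Cox model with constant baseline: lambda_c = r_c exp(X1 beta_c)
    C = rng.exponential(1.0, size=n) / (r_c * np.exp(beta_c * X[:, 0]))
    C = np.minimum(C, tau)                           # follow-up ends at tau
    time = np.minimum(T, C)
    status = np.where(T <= C, cause, 0)
    return time, status, X

# Example: one data set in the setting with a frequent competing cause
time, status, X = simulate_fg(400, rho2=10.0, seed=1)
print(np.bincount(status, minlength=3) / 400)        # censored, cause 1, cause 2
```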
We estimated the parameters using the FG-model (FG), the FG-model with censoring weights from a Kaplan-Meier estimator stratified by $X_1$ (FG-CM), and the augmented FG-estimator (AUG). The augmentation term was computed using stratified Aalen-Johansen estimates, which give a correct model in the case of the binary covariates, but only an approximation for the two continuous covariates, where we considered four groups with 25% of the observations in each. This makes it possible to evaluate the performance of the augmentation term also when it only provides an approximation. We denote the estimator based on the approximative augmentation term by AUG*.
In the case with two binary covariates, we also directly computed the NPMLE for $F_1(t, X)$ on FG form, with $F_2(t, X) = \{1 - \exp(-\Lambda_2(t, X))\}\{1 - F_1(\tau, X)\}$, by maximizing the likelihood for the censored data with respect to $\beta$ and the increments at the $\Lambda_1$ and $\Lambda_2$ jumps, along the lines of Mao and Lin (2017). Our simple optimizer worked rather well but, as the simulations also showed, had some difficulties in some of the simulation settings. We did not compute standard errors for the NPMLE, as we only wanted it as a reference for the efficiency gain of our augmented estimator.

Efficiency Gain
The relative efficiency was computed by looking at the variance of the augmented estimator relative to that of the FG-estimator using correct models for F * 1 and F * 2 (AUG), and in the simulation setting with continuous covariates where we only used approximate models for F * 1 and F * 2 based on 4 strata (AUG*).We further computed the NPMLE and also report its efficiency gain (NPMLE).We display the results for both covariates, that are denoted −1 and −2 in Figure 2.
We note that the efficiency gain can be large when $F_2$ is large, while when $F_2$ is more moderate the augmentation does not lead to much gain. We also considered other levels of $F_2$, and, roughly speaking, the relative variance appeared to increase linearly, by around 1%, with $\rho_2$. This considerable efficiency gain shows that the augmentation can improve efficiency substantially and reduce the width of a confidence interval by up to 12% in the considered settings. In addition, we see that using the correct models for $F_1^*$ and $F_2^*$ leads to slightly better efficiency than using only approximative models (AUG vs. AUG*), but the augmentation still leads to some gain even when the models are only approximative. Interestingly, we note that the efficiency gain relative to the FG-model is not monotone in the amount of censoring.
The NPMLE led to even better efficiency when $F_2$ was large, but was only considered in the setting with binary covariates, where it could be computed in a simple way. This estimator, however, was more difficult to compute numerically and in some settings did not show any gain compared to the simple FG-estimator. The NPMLE also showed some instability, and it was therefore necessary to report a robust measure of variation for the cases where $\rho_1 = 0.2$ and $\rho_2 = 10$. We here used the MAD, the median absolute deviation, that is, the median of the absolute deviations from the median.
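The summary measures used in this section can be sketched as follows (hypothetical helper names). The relative variance compares the replicates of a competing estimator with those of the FG-estimator; the robust version replaces the variance by the squared MAD, which is an assumption of this sketch about how the MAD is turned into a variance-scale ratio. Since confidence interval widths scale with the standard error, a relative variance $r$ shortens intervals by the factor $\sqrt{r}$; for example, a 12% narrower interval corresponds to $r = 0.88^2 \approx 0.77$.

```python
import numpy as np

def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def relative_variance(est_new, est_fg, robust=False):
    """Variation of a competitor's estimates relative to the FG-estimator's across replications."""
    if robust:
        return (mad(est_new) / mad(est_fg)) ** 2
    return np.var(est_new, ddof=1) / np.var(est_fg, ddof=1)

def ci_width_ratio(rel_var):
    """Interval widths scale with the standard error, i.e., with sqrt(relative variance)."""
    return np.sqrt(rel_var)

print(ci_width_ratio(0.88 ** 2))   # width ratio for a relative variance of 0.7744
```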

Bias and Coverage
In supplementary materials I.1 we show the bias of all estimators; these simulations confirm that the augmented estimator is doubly robust and essentially unbiased when one of the needed models is correct, and that bias appears otherwise. In addition, we demonstrate that the NPMLE is, as expected, biased when, for example, $F_2$ is misspecified.
In supplementary materials I.2 we report the coverage, which is generally close to the nominal 95% level for all estimators, indicating that the estimated standard errors work well.

Illustration
We considered data from multiple myeloma patients treated with allogeneic stem cell transplantation from the Center for International Blood and Marrow Transplant Research (CIBMTR) (Kumar et al. 2011). These data were also considered in He et al. (2016) and Mao and Lin (2017).
We first looked at the censoring distribution and found that gp, dnr and preauto were highly significant predictors, in contrast to ttt24 that was not significant.Since the censoring distribution was strongly dependent on gp, dnr and preauto the simple FG estimator could be biased.
We here report the regression coefficients for the cumulative incidence of relapse using the FG-estimator (FG), our augmented estimator (AUG), and the FG-estimator with censoring weights based on Kaplan-Meier estimators stratified by all levels of gp, dnr, and preauto (FG-CM). We also computed an additional augmented estimator that used the stratified censoring weights of FG-CM, to see whether this led to a small gain in efficiency (AUG-CM).
We note that the AUG and FG-CM estimators both provide estimates that are rather different from those of the simple FG-estimator, and that these two sets of estimates are quite similar, the augmented estimator thus displaying its double robustness property. The variable of key interest was the period variable, gp, whose effect is rather strongly biased when using the simple FG-estimator, whereas the other risk factors lead to quite similar estimates. This led to a relative difference in the cumulative incidence of relapse of around 15% when comparing the early transplant period to the late transplant period.
In addition, with the correctly specified censoring weights, the augmented estimator AUG-CM provides a small efficiency gain compared to FG-CM; both should be unbiased and are therefore comparable (Table 1).
To give an idea of the practical cost of the efficiency improvement from the augmentation term, we simulated a dataset similar to the one considered, but with 100,000 observations. Our efficient implementation fitted the standard FG model in 3 sec and computed the augmented estimator in 4 sec. Fitting the FG model with censoring adjustment for each of the 16 strata defined by the four binary covariates took 10 sec. For the 864 observations of the actual data, the three estimators took 0.03, 0.04, and 0.08 sec, respectively.

Discussion
We have shown that the estimator of Fine and Gray (1999) is semiparametrically efficient in the case of fully observed data with no right-censoring and, importantly, that it is not semiparametrically efficient in the case of right-censoring when $F_2$ is not zero. We derived an augmentation term to increase its efficiency. This term is easy to compute and leads to an important increase in efficiency when $F_2$ is large. We also derived the semiparametrically efficient estimator, which is even more efficient but has a more complicated structure, as it is the solution to an integral equation. When $F_2$ is not large, the efficiency gained by the two proposed estimators is negligible, as expected. The augmented FG-estimator is furthermore doubly robust, that is, consistent if either the censoring model or the working model for the cumulative incidence functions is correctly specified. The augmented FG-estimation procedure is implemented in the R-package mets, Holst and Scheike (2021), and demonstrated in a vignette in the package.
In the situation where the censoring weights are based on stratified Kaplan-Meier estimates, for example, based on two binary covariates ($X = (X_1, X_2)$), the influence function of the FG-estimator with the stratified weights can be derived following the expansion of Fine and Gray (1999). We note that this is also the influence function of the augmented estimator in the case where the censoring distribution does not depend on $X$. Therefore, the estimator that uses stratified weights, even when the censoring distribution is known not to depend on $X$, achieves the same efficiency as the augmented FG-estimator. Interestingly, the augmentation term in this case is obtained not from a working model but from averaging $H(t, X)\,I(T \le t, \epsilon = 2)\,\delta/G_c(T, X)$.

Supplementary materials
The Supplementary Material contains several technical arguments as well as a couple of supplementary simulations.

Figure 1. Geometrical view of the efficiency gained by using the augmented FG-estimator instead of the traditional FG-estimator. The figure represents the orthogonal complement of the nuisance tangent space ($\Lambda^\perp$; see Appendix D, supplementary materials, for an explicit definition of $\Lambda$), as it contains all influence functions. Each vector represents the influence function of an estimator: the traditional FG-estimator ($\phi^{FG}(D)$), the augmented FG-estimator ($\phi^{FG,A}(D)$), and the semiparametric efficient estimator ($\phi^0_{\rm eff}(D)$). The Euclidean norm of a vector represents the variability of the corresponding estimator. Augmenting the FG estimating equation is equivalent to finding the element of the augmentation space $\mathcal{A}_2$ leading to the smallest norm. This is achieved by using $\Pi(\phi^{FG}(D) \mid \mathcal{A}_2)$, the projection of $\phi^{FG}(D)$ onto $\mathcal{A}_2$. A further efficiency gain can be obtained by subtracting the component orthogonal to the (full) tangent space, leading to $\phi^0_{\rm eff}(D)$. The (full) tangent space $\mathcal{T}$ is the direct sum of $\Lambda$ and the tangent space relative to the parameter of interest $\beta$.

Figure 2. Efficiency of the augmented FG-estimator (AUG), the approximative augmentation (AUG*), and the NPMLE relative to the FG-estimator, measured against the variance of the FG-estimator and based on 10,000 realizations. The number after the name of an estimator refers to the type of covariates included in the model: 1 for binary and 2 for continuous. $\rho_1$ and $\rho_2$ are multiplicative factors for $\Lambda_1$ and $\Lambda_2$, respectively; a larger value is thus associated with a higher number of events for the corresponding cause (all else being equal).
The multiple myeloma data come from the Center for International Blood and Marrow Transplant Research (CIBMTR) (Kumar et al. 2011). The data used in this article consist of patients transplanted from 1995 to 2005, and we compared the outcomes between transplant periods: 2001-2005 (N = 488) versus 1995-2000 (N = 375). The two competing events were relapse and treatment-related mortality (TRM), defined as death without relapse. Kumar et al. (2011) considered the following risk covariates: transplant time period (gp, the main interest of the study: 1 for transplanted in 2001-2005 versus 0 for transplanted in 1995-2000), donor type (dnr: 1 for unrelated or other related donor (N = 280) versus 0 for HLA-identical sibling (N = 584)), prior autologous transplant (preauto: 1 for Auto+Allo transplant (N = 399) versus 0 for allogeneic transplant alone (N = 465)), and time to transplant (ttt24: 1 for more than 24 months (N = 289) versus 0 for less than or equal to 24 months (N = 575)).

Table 1. Regression effects for cumulative incidence regression of relapse estimated using the FG-estimator or the proposed augmented (AUG) estimator.