Developing a Targeted Learning-Based Statistical Analysis Plan

Abstract The Targeted Learning estimation roadmap provides a rigorous framework for developing a statistical analysis plan (SAP) for synthesizing evidence from randomized controlled trials and real world data. Learning from these data necessitates acknowledging potential sources of bias and specifying appropriate mitigation strategies. This article demonstrates how Targeted Learning informs different aspects of SAP development, including explicit representation of intercurrent events. Guiding principles are to (a) define the target parameter of interest separately from the model or estimation procedure; (b) use targeted minimum loss-based estimation (TMLE) and super learning for causal inference, flexible methodologies that can be entirely pre-specified while remaining data adaptive; and (c) carry out a nonparametric sensitivity analysis to evaluate the plausibility of a causal interpretation of the estimated treatment effect, and its stability with respect to violations of underlying causal assumptions. The roadmap promotes the principles and practices set forth in the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use Guideline. An annotated SAP, checklists for pre-specifying the TMLE and super learning procedures, and sample R code are provided as supplementary materials.


Introduction
Existing guidelines for developing a statistical analysis plan (SAP) stress their importance in providing clarity, transparency, and reproducibility (United States Food and Drug Administration 2013; Public Policy Committee, International Society of Pharmacoepidemiology 2016; Gamble et al. 2017; ICH 2019). This article complements that literature by demonstrating the utility of following the Targeted Learning Roadmap during SAP development. Targeted Learning (TL) offers a framework for causal inference in randomized controlled trials (RCT), RCTs incorporating real world data (RWD), and observational studies (OS) (van der Laan and Rose 2011; Ho et al. 2021; Gruber et al. 2022). The TL approach defines a clinical question of interest in terms of a causal inference problem, and uses targeted minimum loss-based estimation (TMLE) with an advanced machine learning algorithm known as super learning (SL) to estimate treatment effects (van der Laan and Rose 2011). TMLE+SL addresses potential biases due to non-randomization, intercurrent events, and missing outcome data, and can be pre-specified in an entirely transparent and reproducible manner. The TL approach recommends sensitivity analyses to assess the impact of "causal bias" stemming from, for example, unmeasured confounding, on the validity and reliability of a causal interpretation of the study finding. The entire process is described in a step-by-step guide known as the TL estimation roadmap. Following the roadmap enhances the ability to adhere to principles and practices set forth by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH-E9(R1) Guideline) (ICH 2019).
The Guideline lists five elements of a statistical estimand: population, treatment, outcome variable, summary measure, and intercurrent events that occur after treatment initiation, altering the treatment strategy or outcome of interest. Targeted Learning addresses each of these elements. In particular, Targeted Learning emphasizes precisely formulating a statistical question that clearly addresses the scientific causal question of interest. Unlike a traditional parametric modeling approach, the causal quantity of interest is defined in terms of the distribution of the data, not as a coefficient in a regression model. Separating the parameter definition from the choice of estimator strengthens interpretability of the findings. Invoking a counterfactual framework for defining the target causal estimand, such as the Neyman-Rubin causal model or Pearl's nonparametric structural equation models, allows for explicitly incorporating intercurrent events into the representation of the causal quantity (Pearl 2000; Sekhon 2008). Estimation using TMLE and SL can be completely pre-specified, while remaining data adaptive and providing valid inference.

Methods
The Targeted Learning Roadmap is a practical guide to statistical learning from data (Figure 1) (van der Laan and Rose 2011; Ho et al. 2021). Starting with a well-defined question that can be answered from data, the steps in the roadmap provide a systematic process for creating and evaluating the evidence extracted from data. These steps facilitate adhering to four central precepts extracted from the ICH-E9(R1) guidelines: (a) construct the estimand corresponding to a clinical question of interest; (b) the description of the estimand should reflect the clinical question of interest in respect of intercurrent events; (c) the statistical analysis should be aligned to the estimand; and (d) sensitivity analysis should explore the robustness of study findings under violations of untestable assumptions (ICH 2019).
Next we step through the TL roadmap, using a fictitious RCT as a running example, the Randomized Trial of Drug for Migraine And Headache Pain (RDMAP) study. This study compares the effect of study Drug A with active comparator Drug B on migraine and chronic headache at the end of 52 weeks of follow-up. A completed annotated sample SAP is provided as supplementary materials.
Step 0. Formulate a well-defined substantive question, and a precise description of the experiment generating the data
A well-defined question addresses all five elements of the statistical estimand defined by the ICH-E9(R1) Guidelines: population, treatment, outcome, summary measure, and potential intercurrent events (ICH 2019). In our example these elements are defined as follows.
• The study population consists of adults, 18-65 years, who have consulted a medical professional for migraine or chronic headache within the past 24 months. Inclusion criteria include self-report of at least two headaches per month in the past 6 months. Exclusion criteria are pregnancy or known malignancy, cluster headache, suspicion of serious pathological etiology, and cranial neuralgia.
• Subjects will be randomized 1:1 to the treatment arm (Drug A, 500 mg tablet, once daily) or comparator arm (Drug B, 500 mg tablet, once daily). A 12 week supply of the assigned drug will be dispensed at baseline, and at three subsequent in-person visits every 12 weeks.
• The primary outcome is mean weekly headache score (MWHS) 12 months post-randomization. The MWHS will be an average of the weekly headache scores during the final four weeks of follow-up, and a weekly headache score is the sum of seven daily self-reported headache scores on a scale of 0-10.
• The summary measure of interest is the marginal additive treatment effect (ATE).
• Foreseeable intercurrent events include treatment noncompliance (discontinuation or switch) and incomplete capture of covariates.
For a per-protocol (PP) analysis the question of interest is, "What is the population-level effect of taking Drug A versus Drug B on migraine and chronic headache in adult patients who meet eligibility criteria if they are adherent to their assigned medication?" where the effect is measured as the difference in MWHS at 12 months post-randomization under each treatment. For the intention-to-treat (ITT) analysis, the question is, "What is the population-level effect of being assigned treatment with Drug A versus Drug B on migraine and chronic headache in adult patients who meet eligibility criteria, regardless of adherence?" The effect is again measured as the difference in MWHS at 12 months post-randomization under each assigned treatment.
The timeline for data accrual in the RDMAP study is shown in Figure 2. The covariate vector at each time point, t, is denoted by L_t. Covariates collected at baseline (t = 0) and subsequent clinic visits are summarized in the figure. By convention, the final set of covariates, L_4, includes the outcome Y. Indicators of treatment received at baseline and subsequent clinic visits are denoted by A_0, A_1, A_2, A_3 (set to missing after LTFU). Indicators of remaining uncensored are denoted by C_1, C_2, C_3, C_4. The longitudinal data structure is given by O = (L_0, A_0, C_1, A_1, L_1, C_2, A_2, L_2, C_3, A_3, L_3, C_4, L_4).
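To make this structure concrete, the sketch below shows how it could map onto the node arguments of the ltmle R package; the column names are assumed for illustration, not taken from the study database.

```r
# Illustrative mapping of O = (L_0, A_0, C_1, A_1, L_1, ..., C_4, L_4)
# onto ltmle's node arguments; column names are hypothetical.
# Columns of the analysis dataset must appear in this time ordering.
library(ltmle)

Anodes <- c("A0", "A1", "A2", "A3")  # treatment received at each visit
Cnodes <- c("C1", "C2", "C3", "C4")  # indicators of remaining uncensored
Lnodes <- c("L1", "L2", "L3")        # post-baseline time-varying covariates
Ynodes <- "Y"                        # outcome, measured as part of L_4
```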

Step 1. Define a realistic statistical model
A statistical model, M, is a collection of possible probability distributions of the data. A main terms linear model regressing outcome Y on binary treatment indicator A (A = 1 for study Drug A, A = 0 for comparator Drug B), optionally adjusting for suspected confounders, W, is often specified by default. However, this defines an underlying statistical model, M, that contains only distributions of the data that enforce a monotonic relationship between each continuous covariate and the outcome, and preclude treatment effect heterogeneity. Such a model may be far from ideal. Targeted Learning instead defines a realistic statistical model, M, respecting the time ordering of the data generating process and consistent with study inclusion/exclusion criteria. In our example, since baseline treatment is randomly assigned, we know that P(A_0 = 1 | L_0) = 0.5, although there may be chance imbalances. However, because the dropout mechanism is unknown, we cannot realistically impose parametric modeling assumptions on the conditional probability of being LTFU.
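As a concrete illustration of these restrictions, the default specification might look like the sketch below (dat, W1, and W2 are hypothetical names):

```r
# A default main terms linear model: it forces a linear (hence monotonic)
# relationship between each continuous covariate and the outcome, and its
# lack of interaction terms precludes treatment effect heterogeneity.
fit_default <- lm(Y ~ A + W1 + W2, data = dat)
```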

Step 2. Specify a causal model and causal quantity of interest
A causal model describes presumed causal relationships in the data and known conditional independencies, even beyond those implied by the time ordering. Subject matter experts and statisticians can work together to construct a directed acyclic graph (DAG) that explicitly depicts this causal knowledge (Pearl 2000). Analyzing the structure of the DAG can facilitate identifying potential confounders of the associations between treatment, drop out, and the outcome. Carrying out this exercise during SAP development motivates planning to collect these data throughout the course of the study.
Our causal estimand of interest, the ATE, is defined as a mapping from the causal model to a marginal additive treatment effect. In counterfactual notation the estimand can be expressed in the point treatment setting as ψ_causal = E[Y_1] − E[Y_0], where Y_1 is the counterfactual outcome a subject would experience when treated with Drug A, and Y_0 is the counterfactual outcome a subject would experience when treated with Drug B. The expectation is with respect to the entire population of randomized subjects.
This point treatment formulation may be appropriate for an ITT analysis, when any LTFU is only a function of pretreatment patient characteristics. However, it is problematic for a PP analysis, because it ignores intercurrent events that potentially disrupt the treatment-outcome association. As instructed by the ICH-E9(R1) Guidelines, the Targeted Learning approach formalizes the PP causal quantity in a way that explicitly represents the true process giving rise to the data over time. This representation explicitly includes treatment noncompliance and missing outcome data.
For a TL-based definition of the PP estimand, the causal contrast of interest is the difference between the mean counterfactual outcomes under Drug A at all time points and under Drug B at all time points, with no LTFU; that is, the average treatment effect among the entire randomized population. Contrast this with an SAP where the PP population is defined as the subgroup that was observed to adhere to treatment, to be determined after the trial concludes, though possibly before outcomes are unblinded. The marginal treatment effect among this compliant population is not necessarily the same as the marginal effect among the randomized population (Imbens and Angrist 1994). If the compliant population is not easily characterized, it is not clear for whom the study finding holds. In the TL-based analysis, the effect estimate is with respect to the original study population.
In the causal inference literature treatment nodes and censoring nodes are collectively referred to as the set of intervention nodes, because the causal contrast of interest involves intervening to deterministically set these nodes to correspond to the longitudinal regime of interest. The causal parameter is defined in terms of counterfactual values for all the intervention nodes. The causal PP parameter is denoted ψ_causal,PP, where the subscripts indicate the counterfactual values obtained by intervening to set A_0 through A_3 and C_1 through C_4 to their desired values.
For the ITT analysis, we are concerned with intervening on baseline treatment assignment at t = 0, and setting all censoring nodes to indicators of remaining uncensored through the end of follow-up. In the counterfactual model, the nodes that represent later treatments, (A_1, A_2, A_3), are no longer intervention nodes. The causal quantity of interest for the ITT analysis can be written as ψ_causal,ITT = E[Y(a_0 = 1, c̄ = 1)] − E[Y(a_0 = 0, c̄ = 1)], where c̄ = 1 denotes remaining uncensored at all time points. The specifications for both the ITT and PP estimands anticipate that intercurrent events will happen. This explicit representation also guides the design of the study with respect to what variables need to be measured to make the sequential randomization assumption plausible.
When the outcome is binary the ATE is equivalent to a risk difference. Other summary measures that are commonly of interest include the risk ratio, odds ratio, hazard ratio, and cumulative incidence difference or ratio. Though it is not a concern in our running example, in the presence of competing risks intervening on censoring nodes may be unrealistic, and alternate definitions of the causal parameter may better address the substantive question (Rudolph et al. 2020). One of several strategies suggested in the ICH-E9(R1) Guidelines is to consider incorporating the competing risk into the outcome definition (ICH 2019). A continuous time TMLE for competing risks has been described in the literature (Rytgaard and van der Laan 2021). Each competing risk is considered to be an outcome that arises from a multivariate outcome process that jumps when the particular cause of failure occurs. The TMLE approach carries out a complete analysis of all absolute risks simultaneously.

Step 3. Specify the statistical estimand
Identifying assumptions link the causal parameter, defined in terms of counterfactuals, to a statistical parameter that can be estimated from data, ψ_stat. This statistical estimand can be written as a difference in conditional mean outcomes under two different treatment regimes of interest, ψ_stat = ψ_stat(Q̄_{ā_1}) − ψ_stat(Q̄_{ā_0}). In our example, for the PP analysis ā_1 is the treatment regime in which the subject receives Drug A at each time point and remains uncensored, and ā_0 is the treatment regime in which the subject receives Drug B at each time point and remains uncensored. Q̄_ā is defined as a series of K + 1 iterated conditional means, where K is the number of time points, Q̄^ā = (Q̄^ā_Y, Q̄^ā_{L_{K−1}}, ..., Q̄^ā_{L_0}) (van der Laan and Gruber 2012; Schnitzer 2020). For example, under intervention ā = (1, 1, 1, 1, 1, 1, 1, 1) on the intervention nodes (A_0, C_1, A_1, C_2, A_2, C_3, A_3, C_4), this series is given by

Q̄^ā_Y = E(Y | Ā_3 = 1, C̄_4 = 1, L̄_3),
Q̄^ā_{L_3} = E(Q̄^ā_Y | Ā_2 = 1, C̄_3 = 1, L̄_2),
Q̄^ā_{L_2} = E(Q̄^ā_{L_3} | Ā_1 = 1, C̄_2 = 1, L̄_1),
Q̄^ā_{L_1} = E(Q̄^ā_{L_2} | A_0 = 1, C̄_1 = 1, L_0),
Q̄^ā_{L_0} = E(Q̄^ā_{L_1}),

with ψ_stat(Q̄_ā) = Q̄^ā_{L_0}, where L̄_t, Ā_t, and C̄_t denote covariate, treatment, and censoring histories, respectively, from t = 0 through t.

ψ_causal is identifiable from data under the following causal assumptions. The sequential consistency assumption states that for each regime of interest, ā, and for any individual i observed to follow that regime, Y_i = Y_i(ā). In an ITT analysis this assumption is met by design, as long as the outcome is measured without error. In a PP analysis, if observed treatment is not concordant with the intervention of interest at some time t, then this assumption is not met. Recall that the subscripts in the representation of ψ_causal,PP describe the counterfactual value at each intervention node. Under treatment noncompliance at time point t the observed sequence of values no longer matches the desired counterfactual sequence. Thus, there is no basis to claim that the recorded outcome value corresponds to the counterfactual outcome under the intervention of interest. One must acknowledge that the counterfactual outcome is unavailable, by setting its value in the dataset to missing, and setting all indicators of being uncensored at times t + 1 through K to zero, or "censored."

The sequential positivity assumption states that within strata defined by confounders, subjects have a nonzero probability of receiving each level of treatment. If this assumption is not met there are areas where the data provide no support for evaluating a causal contrast without imposing additional modeling assumptions. The positivity assumption must hold at each time point with respect to the cumulative product of the conditional probabilities associated with following the intervention of interest at each time, t. The probability of remaining uncensored can be 1, since no intervention of interest involves intervening to impose censoring. Denoting each cumulative product through time t as G_{0:t}, the positivity assumption states that 0 < G_{0:t} < 1. Randomization of baseline treatment assignment guarantees this assumption is met at t = 0. However, if there is LTFU, that guarantee does not extend to all time points t > 0. In a PP analysis, treatment switching also threatens the positivity assumption. Propensity score (PS) and missingness probability diagnostics allow us to assess this assumption with respect to covariates measured up to each time point.
The sequential randomization assumption (SRA) states that treatment and/or censoring at time t is independent of the counterfactual outcome given the past. This is an extension to longitudinal settings of the missing at random (MAR) assumption: A_t, C_t ⊥ Y_ā | Ā_{t−1}, C̄_{t−1}, L̄_{t−1}. Under causal models that provide additional information about known conditional independencies in the data, a weaker version of this assumption might be sufficient. For example, if treatment and censoring at time t were known to depend only on baseline values and the most recent measures of time-varying covariates, then replacing L̄_{t−1} with (L_{t−1}, L_0) would be sufficient: A_t, C_t ⊥ Y_ā | Ā_{t−1}, C̄_{t−1}, L_{t−1}, L_0.

Step 4. Estimation from data, respecting M, and statistical inference
Step 4 requires suitable methodology for estimating the statistical parameter defined in Step 3. The estimation tools of Targeted Learning are TMLE and SL. TMLE has been shown to produce reliable study findings while depending on weaker assumptions than traditional parametric modeling (Gruber and van der Laan 2013). TMLE can appropriately adjust for time-varying confounding due to, for example, components of L_t affected by prior treatment that also impact the outcome, while traditional parametric modeling approaches (e.g., linear, logistic, Cox proportional hazards) cannot. Although some other causal inference methods, including inverse probability weighting (Hernan et al. 2000) and G-computation (Bang and Robins 2005), can also correctly adjust for time-varying confounders, these are consistent and asymptotically linear only under a smaller, more restrictive statistical model.
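As a hedged sketch of how such a longitudinal analysis could be pre-specified with the ltmle package (node and variable names are illustrative, and the SL library is an example choice, not a recommendation):

```r
library(ltmle)

# Longitudinal TMLE for the PP estimand: abar fixes every treatment node,
# and LTFU is handled through the censoring nodes (coded with ltmle's
# censored/uncensored convention).
res <- ltmle(data = dat,  # columns in time order: L0..., A0, C1, A1, L1, ...
             Anodes = c("A0", "A1", "A2", "A3"),
             Cnodes = c("C1", "C2", "C3", "C4"),
             Lnodes = c("L1", "L2", "L3"),
             Ynodes = "Y",
             abar = list(treatment = c(1, 1, 1, 1),   # Drug A throughout
                         control   = c(0, 0, 0, 0)),  # Drug B throughout
             SL.library = c("SL.mean", "SL.glm", "SL.glmnet"))
summary(res)
```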
As a double robust estimator, TMLE produces unbiased estimates of the treatment effect as long as either the outcome regression or both the PS and censoring mechanisms are modeled correctly. For specific examples of carrying out longitudinal data analyses we refer the reader to publications that emphasize practical applications (Schomaker et al. 2019; Sofrygin et al. 2019; Ferreira et al. 2020; Gruber et al. 2022). For practical guidance on specifying an SL we refer the reader to a recent publication describing each essential step: choosing an appropriate loss function for the task at hand, defining the cross-validation scheme, and constructing the SL library based on characteristics of the data. The library should include a rich set of diverse algorithms, which can be coupled with screening algorithms to reduce dimensionality. In summary, analytic choices should be tailored to the characteristics of the problem, and what is known about the data and the plausibility of the identifying assumptions linking the statistical and causal estimands.
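For instance, a pre-specified super learner might be sketched as follows; the library members, screener pairing, and fold count are example choices under assumed variable names, not general recommendations:

```r
library(SuperLearner)

# Example library: a mix of parametric, regularized, and tree-based
# learners, plus a screener-learner pair for dimension reduction.
sl_library <- list("SL.mean",
                   "SL.glm",
                   "SL.glmnet",
                   "SL.ranger",
                   c("SL.glm", "screen.corP"))

sl_fit <- SuperLearner(Y = dat$Y,
                       X = dat[, c("A", "W1", "W2")],  # assumed names
                       family = gaussian(),            # squared-error loss
                       SL.library = sl_library,
                       cvControl = list(V = 10))       # 10-fold CV
```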
Each choice can and should be pre-specified in the SAP. A priori specifications enhance analytic reproducibility, interpretability, and reliability. Checklists of options and settings for analyses using the tmle, ltmle, and SuperLearner R packages are provided as supplemental materials (Polley 2021; Schwab 2021). These checklists are intended to illustrate what decisions must be made in order to pre-specify the entire analysis. The values provided are example specifications, not general recommendations.
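For a point treatment analysis, the pre-specified tmle package call might be sketched as follows (variable names and libraries illustrative):

```r
library(tmle)

# TMLE of the marginal ATE; Q.SL.library and g.SL.library pre-specify
# the super learners for the outcome regression and the PS/missingness
# mechanisms, respectively.
result <- tmle(Y = dat$Y,
               A = dat$A,
               W = dat[, c("W1", "W2", "W3")],  # assumed baseline covariates
               Q.SL.library = c("SL.mean", "SL.glm", "SL.glmnet"),
               g.SL.library = c("SL.mean", "SL.glm", "SL.glmnet"),
               family = "gaussian")
result$estimates$ATE$psi  # estimated marginal additive treatment effect
```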
Handling missing covariates: It is important to distinguish between missing values in covariates intended to be captured for all subjects and informed presence, the existence of covariate values due to extra surveillance in only some portion of subjects (Goldstein et al. 2016). The absence of a value sometimes indicates that there was no need to measure the covariate. Thus, its unknown underlying value is not related to a treatment or drop-out decision (United States Food and Drug Administration 2021). The absence of information on a baseline or time-dependent covariate, X, in this circumstance does not represent missing data. The value of X should only be viewed as missing when the underlying covariate is needed for the SRAs for treatment and censoring to hold. In essence, X functions as an interaction term, REQUIRED×X, that equals the recorded value when measuring X was required, and 0 otherwise.
On the other hand, if the SRAs only hold with respect to the underlying full measurement of X, then its value is truly missing. If the MAR assumption is reasonable, then in response we impute a value, such as 0 or the mean or mode of the observed values, and simultaneously create a binary indicator of missingness to add to the dataset. This indicator allows the pattern of missingness to itself be a predictor of subsequent treatment, drop out, or the outcome (see the sketch below). There is no need to impute missing covariate values for data collected after a censoring event at time t_cens, because only values from subjects who remain uncensored at times t > t_cens contribute to estimating the components of Q and G at those subsequent time points. If the MAR assumption is not reasonable, then, if possible, one might change the design of the study by randomly sampling a subset of subjects in whom these measurements are collected. Such designs are often referred to as two stage designs, where the second stage involves randomly sampling according to known sampling probabilities (Ho et al. 2021). Of course, changing the design of the study, or the definition of the causal estimand, necessitates going through the steps of the roadmap from the beginning.
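A minimal sketch of this convention, assuming a data frame dat with a truly missing covariate X:

```r
# Record the missingness pattern before imputing, so the pattern itself
# can predict subsequent treatment, drop out, or the outcome.
dat$X_missing <- as.numeric(is.na(dat$X))
# Simple imputation; 0, the mean, or the mode are alternatives.
dat$X[is.na(dat$X)] <- median(dat$X, na.rm = TRUE)
```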
Handling missing outcomes: These general guidelines should be discussed with clinicians to reflect clinical context. Outcome values that are not recorded are set to missing (e.g., NA in R, '.' in SAS). In a PP analysis, outcomes are also set to missing for subjects who are noncompliant with assigned treatment. Note that each observation with a missing outcome value remains in the dataset, and contributes to estimation of the PS and missingness probabilities up to the time it is right censored. Internally, TMLE evaluates the targeted counterfactual outcomes under both treatment assignments. These values contribute to the parameter estimate, ensuring the marginal mean outcome is with respect to the original intended study population. Retaining these observations also reduces variance.
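With the tmle package, for example, this is expressed through the Delta argument, an indicator of observed outcomes (sketch with assumed names):

```r
library(tmle)

# Delta = 1 for observed outcomes, 0 for missing ones (including PP
# outcomes set to missing under noncompliance). Observations with
# Delta = 0 remain in the data and inform the missingness model.
Delta <- as.numeric(!is.na(dat$Y))
result <- tmle(Y = dat$Y,
               A = dat$A,
               W = dat[, c("W1", "W2")],
               Delta = Delta,
               family = "gaussian")
```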
TMLE provides valid inference, unlike most other causal effect estimation methodologies that incorporate machine learning. TMLE provides asymptotically valid 95% confidence intervals and controls the Type I error rate as a consequence of being a regular, asymptotically linear estimator. If the strong positivity assumption holds (i.e., the efficient influence curve for the target parameter is a bounded function of the observations), then reliable finite sample analytic standard error estimates are obtained with the sample variance or cross-validated sample variance of the efficient influence curve, an immediate byproduct of the TMLE (van der Laan and Rubin 2006). These standard error estimates can be used to construct p-values and Wald-type confidence intervals that have good coverage. However, if positivity is an issue, then it has been shown that these influence curve based variance estimators can be anticonservative, under-estimating the variance, but that plug-in variance estimators still provide robust variance estimates (Tran et al. 2018). Such estimates of the standard errors, based on the influence curve of the TMLE or on a robust, targeted plug-in estimate of the variance, are available in most of the standard TMLE software packages (Benkeser et al. 2017; Schwab 2021; Ju 2021).
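From a fitted tmle object these influence-curve-based quantities can be read off directly; the field names below follow the tmle package's returned estimates list:

```r
est <- result$estimates$ATE  # from the tmle() call sketched earlier
est$psi      # point estimate of the ATE
est$var.psi  # influence-curve-based variance estimate
est$CI       # Wald-type 95% confidence interval
est$pvalue   # two-sided p-value
```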
There is also the option to use the nonparametric bootstrap to obtain confidence intervals that incorporate the higher order behavior of the TMLE. This has been shown to be a theoretically valid and finite sample robust method when one uses an SL based on highly adaptive lasso (HAL) estimators and parametric model based estimators (Cai and van der Laan 2020). If, on the other hand, one uses an SL with other types of machine learning algorithms, then ties must be removed from the bootstrap sample (analogous to subset sampling). Targeted model based bootstrap resampling is a final option for obtaining robust confidence intervals that incorporate the higher order behavior of the TMLE with any SL (Coyle and van der Laan 2018). This illustrates that specifying the method for inference is another important detail to include in the SAP.
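A hedged sketch of the nonparametric bootstrap option, assuming a user-written wrapper run_tmle() that re-runs the fully pre-specified TMLE on a dataset and returns the ATE estimate:

```r
# Nonparametric bootstrap of the TMLE ATE. Per the discussion above,
# theoretical support requires an SL built from HAL and/or parametric
# learners; run_tmle() is a hypothetical analysis wrapper.
set.seed(20240501)
B <- 500
boot_est <- replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)
  run_tmle(dat[idx, ])
})
quantile(boot_est, c(0.025, 0.975))  # percentile bootstrap 95% CI
```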

Step 5. Interpretation and sensitivity analyses to inform a substantive conclusion
Although the interpretation of ψ_causal is clear, how closely ψ_stat matches ψ_causal merits discussion. To assess interpretability of ψ_stat as a causal effect, one considers the plausibility of each of the identifying assumptions in turn. The consistency assumption holds trivially in an ITT analysis when outcomes are correctly measured. In a PP analysis, this assumption is likely to be met as long as treatment adherence over time was documented, and outcomes recorded under noncompliance are viewed as missing (exceptions are pre-specified allowances for grace periods, treatment and outcome assessment windows, etc.). The positivity assumption can be assessed with respect to measured confounders by examining the estimated PS and missingness distributions by treatment arm. Domain knowledge is needed to qualitatively assess the plausibility of the sequential randomization assumption. However, the true extent of violations of these assumptions is ultimately unknowable. Thus, TL prescribes a nonparametric assessment of how large and small departures from these assumptions would impact the substantive conclusion. This provides quantifiable insight into the level of support, including any lack of support, for regulatory decision making. This analysis complements other forms of sensitivity analyses, such as a tipping point analysis (Ratitch et al. 2013), multiple imputation, and analyses assessing the impact of outlying values, assumptions on clustering or correlation, competing risks, and others (see Thabane et al. 2013).
The TL literature defines causal bias as the gap, δ = ψ_stat − ψ_causal, between the statistical estimand and the causal estimand, ignoring random variation (Diaz and van der Laan 2013). The nonparametric sensitivity analysis investigates the impact of potential, unknown causal bias under a range of plausible values for δ, without imposing untestable modeling assumptions. The exercise illustrates how the effect estimate, p-value, and confidence interval bounds change, depending on the magnitude and direction of the hypothesized gap. The range of plausible values can be based on clinical context or prior evidence. In the following example, we examined a large enough range to show the gap size required for the 95% confidence interval to exclude the null, in both the positive and negative directions.
Consider a hypothetical analysis to estimate the ATE from RCT data where there was 25% LTFU. The unadjusted effect estimate was ψ̂_ATE,unadj = −6.10. Using TMLE to adjust for confounders moved the point estimate to ψ̂_ATE,adj = −5.42, closer to the null value of 0. The substantive conclusion from the primary analysis is that treatment reduces the mean outcome. The upper bound of the 95% confidence interval is well below 0. An open question is whether the estimated treatment effect is biased due to confounding by unmeasured covariates partially responsible for LTFU. This is not directly testable from data. However, we can examine how the substantive conclusion would be impacted under a range of presumed causal bias.
We operationalized our sensitivity analysis by considering the potential bias due to unmeasured confounders or outcome mismeasurement. Figure 3 shows the shift in point estimates and confidence interval bounds for a range of causal bias sufficient for confidence intervals to be entirely above and below the null. The magnitude of the causal bias is shown on the x-axis, labeled δ. Alternative axes show the bias relative to the SE of the adjusted estimate (SE units), and in terms of the difference between the unadjusted and adjusted estimates (Adj Units = 0.68). Only when causal bias is greater than approximately −3 does the confidence interval include the null. Causal bias would have to be more than 11 times larger than the difference between the adjusted and unadjusted estimates for the point estimate to be positive and the confidence interval to exclude the null. Subject matter experts can share insight into whether a causal gap of this size and direction is plausible. If it is highly unlikely, then there is strong support for a conclusion that on average Drug A either has no impact or reduces 12-month MWHS in this population, compared with Drug B. A conclusion that Drug A reduces 12-month MWHS in this population also has strong support in the data. However, a conclusion that treating with Drug A instead of Drug B increases 12-month MWHS has essentially no support.
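The computation behind this type of display can be sketched in a few lines; psi_hat is the adjusted estimate from the hypothetical analysis, while the standard error is an assumed value for illustration:

```r
# Shift the estimate by hypothesized causal bias delta (= psi_stat -
# psi_causal) and track when the Wald-type 95% CI excludes the null.
psi_hat <- -5.42
se      <- 1.0                      # assumed SE, illustration only
delta   <- seq(-10, 10, by = 0.25)  # hypothesized causal gap
corrected <- psi_hat - delta        # implied causal estimate per delta
lower <- corrected - qnorm(0.975) * se
upper <- corrected + qnorm(0.975) * se
excludes_null <- lower > 0 | upper < 0
```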

Results
An annotated SAP, completed checklists for pre-specifying the estimation procedure, and sample data analysis R code are provided as supplementary materials.

Conclusion
Decisions regarding development and authorization of new drugs, biologics, and medical devices need to be based on solid, reliable, interpretable science. This paper demonstrates how applying the principles of Targeted Learning while writing a SAP fosters the development of reliable real world evidence (RWE). TL distinguishes between a realistic statistical model, a causal model, a causal estimand, and a statistical estimand. These clear distinctions are key in obtaining transparent, interpretable, actionable evidence from data. The TL roadmap complements the ICH E9(R1), going beyond the ICH focus on the statistical estimand.
Defining a realistic target of estimation with respect to intercurrent events for the primary analysis aligns with the ICH Guidelines. Nevertheless, proactively planning for how to avoid or minimize intercurrent events remains an important study design component. This includes following current regulatory practice by capturing the reason for noncompliance or drop-out, and continuing follow-up even after nonadherence (United States Food and Drug Administration 2008). Our sample SAP omits details often required in practice, to better highlight the contributions of the Targeted Learning framework. It illustrates how a Targeted Learning perspective can influence all stages, from study design through data analysis and interpretation. The key recommendation is to follow the Targeted Learning Roadmap. Doing so produces a clear statement of the causal parameter, explicitly with respect to intercurrent events. A model free definition of the corresponding statistical estimand leaves discretion in the choice of estimator.
TMLE is recommended over other double robust estimators by virtue of its being a plug-in estimator that respects bounds on the statistical model. TMLE also incorporates machine learning while preserving valid inference. Supplementary materials provide clear guidance on what features in TMLE and SL need to be pre-specified to ensure transparency and reproducibility. TMLE is more efficient than PS-based methods (e.g., inverse probability weighting, matching), and, unlike them, can remain unbiased when the PS and missingness mechanisms are poorly estimated (Lendle et al. 2013;Schnitzer et al. 2013;Colson et al. 2016). Even if this recommendation is not followed, principles underlying steps 1-3 and 5 of the TL Roadmap remain important guides to developing the SAP.
As in a standard analysis, a causal interpretation of the study finding depends on how well the identifying assumptions are met. Targeted Learning encourages explicitly considering each assumption in turn. This exposes potential gaps in identifiability that threaten the validity of a causal interpretation. Nonparametric sensitivity analyses quantify how such gaps affect point estimates, confidence intervals, and p-values. Results from such a sensitivity analysis, other sensitivity analyses, and diagnostics aid in assessing the strength of support in the data for a substantive conclusion drawn from the study findings.

Supplementary Materials
Annotated Statistical Analysis Plan: An annotated statistical analysis plan (SAP) titled, "A Fictitious Targeted Learning Example: Randomized Trial of Drug for Migraine And Headache Pain (TL-RDMAP)." The SAP appendix includes checklists and sample R code for pre-specifying the data analysis using the tmle or ltmle packages, and checklists for specifying super learner options. These specifications are for illustration only, and should be tailored for any particular data analysis.