Multiple-Model-based Robust Estimation of Causal Treatment Effect on a Binary Outcome with Integrated Information from Secondary Outcomes

Abstract– An assessment of the causal treatment effect in the development and progression of certain diseases is important in clinical trials and biomedical studies. However, it is not possible to infer a causal relationship when the treatment assignment is imbalanced and confounded by other mechanisms. Specifically, when the treatment assignment is not randomized and the primary outcome is binary, a conventional logistic regression may not be valid to elucidate any causal inference. Moreover, exclusively capturing all confounders is extremely difficult and even impossible in large-scale observational studies. We propose a multiple-model-based robust (MultiMR) estimator for estimating the causal effect with a binary outcome, where multiple propensity score models and conditional mean imputation models are used to ensure estimation robustness. Furthermore, we propose an enhanced MultiMR (eMultiMR) estimator that reduces the estimation variability of MultiMR estimates by incorporating secondary outcomes that are highly correlated with the primary binary outcome. The resulting estimates are less sensitive to model mis-specification compared to those based on state-of-the-art doubly-robust methods. These estimates are verified through both theoretical and numerical assessments. The utility of (e)MultiMR estimation is illustrated using the Uniform Data Set (UDS) from the National Alzheimer’s Coordinating Center with the objective of detecting the causal effect of the short-term use of antihypertensive medications on the development of dementia or mild cognitive impairment. The proposed method has been implemented in an R package and is available at https://github.com/chencxxy28/eMultiMR.


Introduction
In observational studies conducted during clinical or biomedical research, detecting the true causal effect instead of statistical association is the key to successfully revealing the underlying disease mechanisms and, in turn, improving treatments.Although randomized clinical trials are regarded as the gold standard for inferring causal relationships, they are limited to certain diseases with potential bias and cannot be applied to a large population since these trials are often time-consuming, expensive, or burdensome on patients (Fogel 2018).Alternatively, large-scale data from observational studies, which are increasingly recruiting the public, have gained more attention for exploring the causal effect of drugs and treatments on conditions such as Alzheimer's disease.For instance, Charpignon et al. (2021) assessed whether metformin improves survival and reduces the risk of dementia compared to the sulfonylureas by emulating target trials using electronic health records of diabetic patients from the Research Patient Data Registry managed by Mass General Brigham healthcare system in the United States and from the UK Clinical Practice Research Datalink.Similarly, Marcum et al. (2022) conducted a secondary analysis with data from the Systolic Blood Pressure Intervention Trial to examine causal effects in the MSM can be still consistently estimated by adopting the inverse-probability-of-treatment weight (IPTW) in logistic regressions (Robins 1999;Robins, Hernan, and Brumback 2000).
Note that MSM is valid only if the propensity score (PS) model for the treatment assignment is correctly specified, which is challenging to achieve in practice.As an alternative, conditional mean imputation (CMI) is popular for imputing the missed conditional means (CMs) for both treatment and control groups.It requires a full parameterization of the CM of the primary outcome, which also needs to be correctly specified.
To enhance protection against model misspecification, the socalled doubly-robust (DR) estimation has been proposed and extensively investigated in the literature (Robins 2000;Bang and Robins 2005;Funk et al. 2011;Schuler and Rose 2017).This estimation is DR in the sense that the resulting causal estimate is consistent if either the PS model or the CM model is correctly specified, but not necessarily both.More recently, advanced estimators have been developed to provide more protection for handling the missing data problem (Han and Wang 2013;Han 2014b;Chen, Shen, Liu, Wu and Wang 2021).The proposed method allows for the use of multiple models for the PS and CM.Under this framework, with the assumption of missing data at random, the resulting estimates are consistent if any one of the multiple models is correctly specified.Another advanced approach, known as the multiply robust estimator, was proposed based on variation-independent components of the observed data likelihood (Tchetgen Tchetgen 2009; Tchetgen Tchetgen and Shpitser 2012; Molina et al. 2017;Shi et al. 2020).Motivated by this, Babino, Rotnitzky, and Robins (2019) proposed a multiply-robust estimation of MSM for longitudinal and continuous outcomes and addressed the issue of model incompatibility.In addition to guaranteeing estimation consistency, improving estimation efficiency to enhance stability and precision in estimating causal effects is another essential goal.However, these advanced estimators with more protection against model-misspecification might be less efficient than simpler estimators, such as IPTW and CMI estimators.
On the other hand, secondary outcomes are often ubiquitous in epidemiology studies and clinical trials, and although they are of less importance than the primary outcome, they are still part of the pre-specified analysis plan for evaluating treatment effects (Smith, Morrow, and Ross 2015).For example, cognitive function is evaluated to detect MCI and dementia.The Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment are routine cognitive screening tests, both of which are brief with different sets of questions.The MMSE is more commonly used, but the Montreal Cognitive Assessment is newer, more in-depth, and sensitive to MCI.When analyzing MCI or dementia, these cognitive evaluations can be regarded as secondary data that will not often be used in the primary analysis.To increase estimation efficiency in primary outcome analysis, recent works (Chen, Han, and He 2021;Chen et al. 2022;Chen, Wang, and Chen 2023) have developed a re-weighting estimation scheme that efficiently and robustly incorporates the information from secondary outcomes without the need for a joint model.However, that information-borrowing strategy cannot be directly applied to the context of causal inference, as it requires additional effort.
From a review of prior studies, we recognize that there is a pressing need for an advanced estimator based on multiple candidate models for the PS and CM for causal inference.However, correctly identifying the confounding mechanism in one model is difficult in practice, and it is unclear if traditional variableselection techniques that capture variables to improve predictive ability are suitable for causal inference (Hernán and Robins 2010).Second, incorporating information from secondary outcomes has the potential to improve estimation efficiency in primary causal inference.Motivated by these perspectives, we propose a novel estimation framework that can protect estimation consistency from misspecified models for the PS or CM and also enhance estimation efficiency in the context of causal inference for a binary outcome.In this framework, we allow multiple candidate models for the treatment assignment and the CM of the outcome.This multiple-model-based method differs from the multiply robust methods in the literature where variationindependent components of the likelihood are considered.The proposed estimator has been shown to be consistent and has the potential to be less biased, even if all candidate models are misspecified.Moreover, we improve estimation efficiency by incorporating information from highly correlated secondary outcomes to remedy potential efficiency loss after involving multiple candidate models.Estimation consistency is still guaranteed under a mild condition, regardless of the specification of the working model for the secondary outcome.The proposed method has been implemented in an R package and is available to the public (https://github.com/chencxxy28/eMultiMR).
The rest of this article is organized as follows.In Section 2, we review the existing methods that can be applied to a binary outcome and introduce our proposed estimation framework.Extensive numerical evaluations are presented in Section 3. In Section 4, we apply the proposed methods to real data from the National Alzheimer's Coordinating Center.Section 5 discusses the extension of the proposed framework to more complex situations that may be encountered in practice.A brief summary is presented in Section 6. Technical proofs, additional numerical results, and further discussion are included in the supplementary material.

General Notation and Method Workflow
We will first describe the general notation used in the article.Suppose we have two sets of data collected from the same study.Our primary focus is on the main dataset D 0i , which consists of the primary outcome Y 0i , the covariates X 0i , and the actual treatment assignment A i for each subject i from 1 to n.For the sake of convenience in notation, let A i be both an indicator variable of the treatment assignment in theory and the observed treatment assignment (both with two levels) for subject i in practice.Let a represent any potential treatment group level that could be assigned to a subject (a = 1 for treatment and a = 0 for control).In this section, we specifically focus on the outcome of interest measured at the prespecified time T after treatment initiation, which is measured on a binary scale.The associated covariates are denoted by X 0i .These covariates include timeindependent variables collected at the baseline, such as phenotypical characteristics obtained at the study entry or at the initial visit.Time-varying covariates, collected in certain contexts, can be accommodated in the model, which will be discussed further in Section 5.
The secondary data for subject i, D si , collected as an addendum to the primary data, consists of the secondary outcomes Y si , which are expected to be highly associated with the primary outcomes and the actual treatment assignment A i .The associated covariates are denoted as X si , which, in general, are allowed to be the same as or overlapping with the covariates in the main data D 0i .In many applications, we often consider the former case, that is, X si = X 0i .To illustrate our method and the motivation for this application, we further narrow our focus to secondary outcomes that are repeatedly measured.Such data are commonly available in clinical trials and cohort studies (Hanfelt et al. 2011;González et al. 2018).Extensions to other types of secondary outcomes can be found in Section 5. Throughout the article, we will suppress the subscript i in an expectation value unless it causes confusion.
Figure 1A presents a schematic summary of our proposed framework.The input consists of a main dataset and a secondary dataset.Our framework allows multiple candidate models for PS used for treatment assignment and CM for the primary outcome.Additionally, to integrate information from the secondary data, we employ a working model for the secondary outcomes, which are not necessarily correctly specified.The primary objective of the enhanced Multiple-Model-based Robust (eMultiMR) estimator is to enhance the estimation of the marginal causal treatment effect with robustness and improved efficiency.

The Existing Models
We start with an example of how one would estimate the effect of A i on Y 0i in a point-treatment study.The causal contrasts that correspond to association-based regression involve counterfactual variables.In particular, let Y be the primary outcome for subject i if treated and Y be the primary outcome for the same subject if untreated.However, at most only one of Y will be observed in the real world.To infer the average treatment effect (measured on the odds ratio scale), we consider the following model: where l(•) is some prespecified link function.The causal treatment effect exp (β * 1 ) is interpreted as a causal odds ratio when l(•) is a logit link.This model is a so-called MSM and was proposed by Robins, Hernan, and Brumback (2000).Note that the causal model (1) is in general not equivalent to the associationbased generalized linear model (GLM), which is based on the observed data only.When the treatment assignment is imbalanced, these parameter estimates in (2) are biased for the corresponding causal parameters in (1) (Robins, Hernan, and Brumback 2000).Under the Stable Unit Treatment Value Assumption and to identify the causal treatment effect with a conventional regression model, we make the following identifiability assumptions (no unmeasured confounding variables and adequate overlap between the treatment and control covariate distributions), which are common in the literature.A more detailed discussion of these assumptions can be found in the literature (Hernán and Robins 2010;Yang and Ding 2020).
Assumption 1 (No unmeasured confounding).Y In the remainder of this section, we will explicitly describe the existing methods that motivate the DR estimator and lead to our proposal.

The IPTW estimator
Given the identifiability assumptions above, we can derive a consistent estimate of a causal parameter by applying IPTW in a GLM.The weight is defined as 1/ Prob(A i = a|X 0i ) with a = 0, 1 (Robins, Rotnitzky, and Zhao 1994;Robins, Hernan, and Brumback 2000;Han and Wang 2013;Chen, Shen, Liu, Wu and Wang 2021).In practice, this PS is unknown.It has to be estimated by modeling Prob(A i = a|X 0i ; η) with the estimated indexed parameters η, which are derived from fitting a regression model (e.g., logistic regression).Thus, the estimation procedure for inferring a causal effect can be summarized with the following generic estimating equation: The length of parameter vector β in the estimating equation ( 3) is denoted by p = 2.Note that the second parameter in β refers to the causal treatment or exposure effect of interest.Based on Robins, Hernan, and Brumback (2000), the true causal effects β 0 can also be defined as the solution of the first moment condition E{g(D 0 ; β 0 )/ Prob(A|X 0 )} = 0. Note that some works in the literature use stabilized weights by considering Prob(A i = a) to avoid extreme values of PS for finite samples (Robins, Hernan, and Brumback 2000).However, for illustration, we adopt the form in (3), which is commonly considered in theory.
Given Assumptions 1 and 2, the weighted logistic regression can lead to consistent estimates of true causal effects if and only if the PS model Prob(A i = 1|X 0i ; η) is correctly specified; otherwise, the estimation bias could be considerable (Figure 1B).

The CMI Estimator
Besides the IPTW estimator, an alternative estimator for β can be obtained from imputed CMs.Let γ (a, X 0 ; ψ) be the CM model of the primary outcome quantifying with the indexed parameters ψ.By realizing that E(Y (a)  0 ) = E{E(Y 0 |A = a, X 0 )|A = a}, an estimate of β based on CMI can be derived with the following two steps: • Step 1. Compute ψ by solving: ( 4 ) • Step 2. Obtain β by solving: where Here, Z i and Z(a) i are any possible data-dependent functions with the same dimensions as ψ and β, respectively.For example, for the binary outcome, ( 4) could be a score function based on GLMs by specifying a logit link and Bernoulli distribution, whereas (5) can be from GLMs by specifying an identity link since imputed CMs γ (a, X 0i ; ψ) are continuous.Like the IPTW estimator, the CMI estimator requires the correct specification of the CM model γ (a, X 0i ; ψ) and could be severely biased if not (Figure 1B).

The DR Estimator
A DR estimator was first proposed by Bang and Robins (2005) and later advocated for routine use since it weakens reliance on modeling assumptions and avoids the need to commit to a specific modeling (IPTW or CMI) (Cao, Tsiatis and Davidian 2009).This estimate is consistent under the union model assuming that either the PS model or the CM model holds, but not necessarily both.However, DR estimation for the MSM given a binary outcome is still under active research.To fill this gap, we consider the following DR estimator by solving: where η and ψ can be obtained by the strategies described above.The d( •) is any possible data-dependent function with the same dimensions as β.The first component in ( 6) is related to the IPTW estimating function in (3), whereas the second component is the CMI estimation function (5) in the second step for imputing the unobserved CMs.Better numerical performance has been observed for the DR estimator compared to the IPTW and CMI estimators (Figure 1B).

The MultiMR Estimator
Despite its novelty, in practice, the DR estimator may not provide sufficient protection for the estimation consistency of the causal effect.It only allows for a single PS model for treatment assignment and a single CM model for the CMs.Note that in largescale observational studies and without prior knowledge of the underlying data mechanism, it is risky to claim this assumption holds.In practice, it might not precisely specify the true underlying model due to data complexity and heterogeneity.Some mis-specified models produce severely biased estimates, while others may not.More importantly, naively adding or deleting variables in regression or PS models may induce or amplify the association between the primary outcome and the treatment assignment (Hernán and Robins 2010).Thus, to enhance the level of protection against confounder selection and model misspecification, multiple model options with different subsets of covariates, different orders, or possibly different functions of covariates could be used (Robins et al. 2007;Han 2014b).Suppose we have K 1 candidate PS models and K 2 candidate CM models that are determined by various variable-selection techniques or relied on clinical knowledge.To fulfill the goal of being multiple-model-based robust, the following calibrated estimating equation is considered: where g(D 0i ; β) is defined in (3) and the nonnegative weights ω(a) i (a = 0, 1) are obtained by maximizing i∈{A i =a} ω (a)  i with respect to the following constraints: where Here, we use π (1) j 1 (X 0i ; ηj 1 ) for Prob(A i = 1|X 0i ; ηj 1 ) from the j 1 th estimated PS model with the plug-in parameters estimated by logistic regression.Similarly, π (0) j 2 (X 0i ; ψj 2 , βj 2 ) is the estimating function with plug-in parameter estimates ψj 2 and βj 2 solved by ( 4) and ( 5), respectively.Note that ω(a) i will appear only if A i = a.For mathematical convenience, we define ω(a) i = 1 when A i = 1 − a for a = 0, 1.The first constraint in (8) ensures the identifiability of weights.The rationale of the second constraint in ( 8) is from the following moment identities, that is, for any j 1 , j 2 , and a: where the above equalities hold when ω (a) = 1/ Prob(A = a|X 0 ) for a = 0, 1.Thus, the second constraint in ( 8) is equivalent to the sample version of the moment identities in (9).Intuitively, the estimated weights ω(a) i enable the calibration of the observed estimating function g(D 0i ; β) and the recovery of the population information from the biased samples.To ensure estimation consistency, we make the following assumptions.
Assumption 3.There exists a vector 11 , 0 T ) T with a K 1 -dimensional vector C 11 , and π (a) (X 0i ; η) is the correctly specified PS model for the treatment assignment with the estimated parameters η.
Either Assumptions 3 or 4 (but not necessarily both) guarantees consistent estimates, that is, there exists a linear combination of candidate models for PS or CM that approaches the true treatment assignment model or the true CM model.In general, C 1 can involve the parameters η0 , . . ., ηK 1 , and C 2 can involve the parameters ˆ 0 , . . ., ˆ K 2 and β0 , . . ., βK 2 .A special case for these two assumptions is that, among a series of candidate models for PS and CM and without loss of generality, the first candidate PS model or the first CM model is correctly specified, that is, C 11 = (1, 0 T ) T or C 21 = I with identity matrix I and C 2j = 0, otherwise.The property of being multiple-modelbased robust is summarized in the following lemma.
Lemma 1.Under Assumptions 1 and 2, we have that The technical details of Lemma 1 are described in Section 2 in the supplementary material.This lemma indicates that β 0 is the solution to (7) as n − → ∞, which implies the consistency of the MultiMR estimator.It is worth noting that the developed theory of MultiMR estimator solved by (7) could go beyond the existing literature (Han and Wang 2013;Han 2014b) in the sense that even if all candidate models are misspecified, the resulting estimate could still behave well empirically with a higher chance.This property is highly desirable in practice since it is challenging to exactly specify the true PS model or the true CM model, especially in terms of determining all confounders and capturing various (linear or nonlinear) effects from those confounders.Proposing multiple candidate models provides a higher-level protection, as these candidate models together could reasonably approximate the true model.Note that, in theory, the dependency of counterfactual outcomes and treatment assignment is key to obtaining a biased causal estimate.Thus, proposing multiple candidate models for PS and CM is also more likely to alleviate this dependency than one candidate model, even if the true model cannot be well approximated.Further evidence from numerical justifications is provided in Section 3. Additionally, it is recommended to propose a limited number of candidate models in practice, despite the theoretical advantage of having a larger number, due to numerical stability and potential collinearity among estimating functions.In real applications, using two or three candidate models is usually recommended, one based on clinical knowledge and the others based on model selection techniques (Shao 1997).

The eMultiMR Estimator
The MultiMR estimator can be less efficient than the IPTW or CMI estimators due to the complex mechanism for multilevel protection.To potentially enhance estimation efficiency and provide more stable statistical inference, we consider the possibility of borrowing information from secondary outcomes.Chen, Han, and He (2021) proposed a framework that effectively integrates information with a GLM.Its extension to MultiMR estimation is possible but nontrivial.To deliver the information, we consider the following enhanced estimate by solving: where the informative scores pi are derived from a working model of the secondary data, that is, max n i=1 p i , with constraints p i ≥ 0, n i=1 p i = 1, and n i=1 p i h(D si ; θ ) = 0. Here, h(D si ; θ ) is the working estimating function for the secondary outcome.Note that h(D si ; θ ) should be over-identified, that is, the dimension of this function should be larger than the dimension of the parameter vector θ .Motivated by our application, we consider secondary data as repeated measurements (with additional cases described in Section 5).However, our methodology is not limited to this specific scenario.The over-identified working function can be specified as where Z i = ∂μ s (X si ; θ )/∂θ T .Also, Ri is a diagonal matrix containing the variance of each element in Y si (can be either continuous or binary).Further, μ s (X si ; θ ) are the CMs of the secondary outcome Y si indexed by the parameters θ .Suppose there are τ matrices V j that lead to the over-identified estimating function with length r × τ , given the dimensions of θ, r, and τ ≥ 2. Note that V 1 , . . ., V τ can be a sequence of base matrices, with examples provided in the supplementary material and in the literature (Qu, Lindsay, and Li 2000; Tang and Leng 2011; Chen, Han, and He 2021).
Intuitively, based on empirical likelihood theory (Qin and Lawless 1994) with over-identified estimating functions, the resulting score provides a more efficient estimate of the distribution of the secondary data compared to the empirical distribution.By incorporating efficient estimates as weights, it summarizes and synthesizes information from the secondary data, making it feasible to borrow information and reduce estimation variability in the primary causal analysis.Another desirable feature of the eMultiMR estimator is its protection against mis-specification of the secondary model.The estimation consistency of causal parameters in MSM can be guaranteed under the following mild condition: Assumption 5. E h(D s ; θ * ) = 0, for some parameter values θ * , but not necessarily the true one generating the secondary data.
Further, let us rewrite the left-hand side of the estimating equation in (10) based on the regular procedure of the Lagrange's multiplier λs (Qin and Lawless 1994) and the empirical likelihood estimator θ (refer to Section 1 of the supplementary material): The first equality holds because pi = (1/n){1 + λT s h(D si ; θ )}.The second equality holds because λs = O p (n −1/2 ) and θ − θ * = O p (n −1/2 ) based on empirical likelihood theory (Qin and Lawless 1994).The last equality holds according to Robins, Hernan, and Brumback (2000).This finding signifies that estimation by eMultiMR is robust even if the working model for the secondary outcome is mis-specified.The robustness property of the MultiMR estimator in Section 2.3.2 is also inherited by the eMultiMR estimator.Further, by incorporating information, the asymptotic variance-covariance matrix of the eMultiMR estimator can be reduced, which is summarized in the following theorem: Theorem 2. Under Assumptions 1, 2, and 5, n 1/2 ( β − β 0 ) follows an asymptotic normal distribution with zero mean and variance at least no larger than the variance of the MultiMR estimate in Section 2.3.2,given either Assumptions 3 or 4, but not necessarily both.
The technical details of this theorem are described in Section 3 in the Supplementary Material.We have two remarks here: one is that the proposed integration does not rely on a causal relationship between outcomes but uses the high association to improve estimation efficiency.Thus, the temporal or causal relationship among outcomes may not affect the interpretation of the primary analysis; the other one is that the asymptotic variance derived from the above theorem holds given a known and correctly specified PS model.However, if the correct PS model is unknown, we suggest using the subject-level bootstrapping method as an alternative for calculating the standard error of the estimates.The numerical procedure for solving (7) and ( 10) is based on the theory of empirical likelihood (Qin and Lawless 1994;Han 2014a), which is described in Section 1 in the supplementary material.

Data Generation
We conducted a simulation study following a scenario that roughly mimics the data structure in Section 2.3.Briefly, the simulation setup includes one main dataset D 0i consisting of a binary-scale main outcome, and one secondary dataset D si consisting of repeated continuous measurements as secondary outcomes.These two datasets are from the same population cohort.Both outcomes exhibit a high level of association, and our objective is to obtain a robust and efficient estimate of the causal odds ratio for the treatment effect on the primary binary outcome.
To be specific, we generated the longitudinal secondary outcomes from in total four visits Y si using the model were generated from a Bernoulli distribution with success probability 0.5, and X si21 = X si22 = X si23 = X si24 were generated from a uniform distribution on the support [0, 1].The notation A i denotes a four-dimensional vector with elements equal to A i , where the observed treatment assignment A i for subject i is binary with success probability {1 + exp(− X 0i η)} −1 with η = (0.6, −1, −1, 1) and X0i = (1, X si11 , X si21 , Z si3 ) with Z si3 = X si11 + sin X si21 + 0.5 N(0, 1).The residual vectors si were generated by a four-dimensional multivariate normal distribution with zero mean and variance 1.We also imposed the first-order autoregressive (AR1) correlation structure with the correlation coefficient 0.4 for residuals within subjects.
On the other hand, the counterfactual primary outcome Y (A i ) 0i was considered to be binary with success probability defined by , and X 0i = (1, X si11 , X si21 ) .This setup mimics a practical case in which baseline covariates initiate the causal effect on the primary outcome.To establish the association between the primary and secondary outcomes, we employed the following generation procedure.For each i, the primary outcome is set to 1 if si4 ≥ x 0i and 0 otherwise.Here, x 0i is the (1 − p )th percentile of a standard normal distribution.By doing so, the generated primary outcome is highly associated with the secondary outcome at the last visit, similar to our application where the status of having MCI or dementia was determined at the end of the study.The generated variables and their causal relationships are summarized and visualized in Figure 2A.In the supplementary material, we also assessed a scenario where the primary and secondary outcomes were only mildly associated.In this situation, we observed only slight gain in efficiency.

Comparison of Methods
We compared five estimation procedures: IPTW, CMI, DR, MultiMR, and eMultiMR.For IPTW, we considered three setups based on (1) the correct PS model (IPTW_100), ( 2) the first incorrect model that considers only the covariates X si11 and Z si3 (IPTW_010), and (3) the second incorrect model that considers only the covariates X si21 and Z si3 (IPTW_001).For CMI, we considered three setups based on (1) the correct CM model (CMI_100) with the covariates X 0i and A i , and their interaction A i X 0i , (2) the first incorrect CM model that considers only the covariates X si11 , Z si3 , and A i , and their interactions A i X si11 and A i Z si3 (CMI_010), and (3) the second incorrect CM model that considers only the covariates X si21 , Z si3 , and A i , and their interactions A i X si21 and A i Z si3 (CMI_001).For DR, we considered four setups based on: (a) the correct PS model and the correct CM model (DR_100_100), (b) the correct PS model but the first incorrect CM model (DR_100_010), (c) the first incorrect PS model but the correct CM model (DR_010_100), and (d) the first incorrect PS model and the first incorrect CM model (DR_010_010).For MultiMR and eMultiMR, since they are able to incorporate multiple PS models and multiple CM models, we considered the same four setups as DR but with multiple candidate PS and CM models, each of which consists of two PS models and two CM models.The (e)MultiMR ((e)MultiMR_110_110, (e)MultiMR_110_011, (e)MultiMR_011_110, and (e)MultiMR_011_011) estimators have the same naming strategy to that used for DR but with multiple candidate models.This is summarized in Figure 2B.Note that here we focused only on the correctly specified secondary model for the eMultiMR estimators.The results from a mis-specified secondary model are discussed later.We considered four base matrices in (11), which are also commonly adopted in the literature (Qu, Lindsay, and Li 2000;Chen, Han, and He 2021).

Evaluating
Robustness.We first evaluated the robustness of (e)MultiMR estimates compared to those of IPTW, CMI, and DR in terms of estimation bias and mean square error (MSE).Table 1 shows that when the PS and CM models in the corresponding methods were accurate, all methods exhibited satisfactory results with little bias and low MSE.However, when the PS or CM models were misspecified, the IPTW and CMI estimators displayed considerable bias.The DR estimator protected against mis-specification for either the PS or the CMI model, but failed to provide an accurate estimate when neither the PS model nor the CMI model was correct.These behaviors persisted even with a larger sample size.In contrast, the (e)MultiMR estimators were unbiased for all setups, even if neither the PS model nor the CMI model was correct.This numerical result confirms the finding in Section 2.3.2 that the (e)MultiMR estimator is indeed beyond doubly robust in the sense that the resulting estimator can still be reasonably wellbehaved even when all models are misspecified.
Evaluating efficiency.We evaluated the efficiency of the eMultiMR estimators with the Monte Carlo standard deviation (MCSD) and MSE.As shown in Table 1, integrating information from secondary outcomes significantly reduced the estimation variability and MSE of the eMultiMR estimators compared to the MultiMR estimators in all setups.This confirms the effectiveness of using secondary outcomes in improving efficiency.We then evaluated the estimated standard errors of the (e)MultiMR estimators obtained through bootstrapping (BSE).Tables 3 demonstrates that the BSEs were closely aligned with the MCSDs for sample sizes of 300 or 600.Furthermore, the coverage proportions, based on using 1.96 BSE over 1000 Monte Carlo runs, approached the nominal level of 95% as the sample size increased from 300 to 600.
A misspecified secondary model.Lastly, we evaluated the performance of the eMultiMR estimators when confronted with a mis-specified secondary model.This scenario accounts for practical situations where the mean structure is incorrectly specified or complex missingness is not adequately addressed due to various reasons, such as design issues, participant dropouts, or refusal, among others (Chen, Han, and He 2021; Chen, Wang, and Chen 2023).Thus, we conducted a case study where the mean structure in the secondary model was misspecified.Specifically, instead of using X si to fit the secondary data, we replaced it with a mis-specified covariate vector ) .All other aspects remained the same as in the previous setups.Tables 2 and 3 illustrate that even with a mis-specified secondary model, the eMultiMR estimates remained unbiased, exhibiting satisfactory BSEs that are close to the MCSDs and well-controlled coverage proportions.As before, the eMultiMR estimator demonstrated more efficient across all setups.

Data Application
In our application, the primary outcome of interest was the development of either dementia or MCI (D/MCI; yes = 1 and no = 0).The primary drug of interest was the current use of any type of AHTN.We were particularly interested in the effect of short-term use of AHTN on the development of D/MCI at the 4-year visit.We selected subjects who were first enrolled between 2005 and 2010.The entry criteria for our study were as follows: being 60 or older at the first enrollment, having at least four annual visits, having no prior record of D/MCI diagnosis, and no AHTN use at the first visit.The treatment group consisted of subjects who regularly took AHTNs during the follow-up, while the control group consisted of subjects who did not take AHTNs at any visit.The covariates considered in the analysis were age, gender, race, body mass index (BMI), hypertension, diabetes, and average number of packs of cigarettes smoked per day (PACKSMK).After excluding subjects with missing covariates records, we ended up with a final sample size of 1373 subjects.Note that less than 3% of the subjects died within the first four visits, and the distributions of baseline demographic variables did not vary significantly between those subjects who dropped out and those who remained in the study.Therefore, it was reasonable to assume that the data were complete in this application.
In addition to the primary outcome, we considered the secondary outcome, which was the MMSE score highly associated with the diagnosis of D/MCI.The MMSE was calculated by the Alzheimer's Disease Research Centers and measured the global cognitive performance of subjects during each visit.Each subject had four repeated measures of the MMSE.For the primary causal inference, we adopted a working model with the covariates age, PACKSMK, BMI, gender, AHTN*age, AHTN*PACKSMK, AHTN*BMI, and AHTN*gender.
Due to the nonrandom assignment of AHTN treatment, which can be confounded by several Alzheimer's disease-related covariates, simple maximum likelihood estimation (MLE) by regressing the outcome D/MCI on AHTN could lead to biased estimates of the marginal causal odds ratio.Thus, calibration for the unbalanced treatment assignment is necessary to ensure a valid causal inference.However, both the PSs for treatment assignment and the regression model for the primary outcome are challenging to specify in such a large observational study with various variables that could confound the system in an unknown way.Thus, it is preferable to use multiple candidate models with different subsets of covariates, orders, and possibly different functions (Robins et al. 2007;Han 2014b).For the PSs, we considered two candidate models: (a) with age, race, gender, PACKSMK, BMI, hypertension, and diabetes having linear effects (PS1) and (b) with log(age), race, gender, I(PACKSMK > 3), I(BMI > 30), hypertension, and diabetes quantifying nonlinear effects (PS2).For the regression model, all past outcomes, time-varying variables, and baseline variables could be considered as candidate covariates.However, incorporating more variables in a regression model may introduce more bias in causal inference (known as the g-null paradox) (Hernán and Robins 2010).Again, allowing multiple candidate models can, to some extent, alleviate this issue.We used two candidate models, one with covariates selected based on LASSO variable selection by considering all past covariates (CM1) and the other with clinical meaningful covariates at the last visit as suggested by clinicians (CM2).Besides, we considered the DR estimate for PS1 and CM2 and compared the results with those from our proposed MultiMR (incorporating all four candidate models) and eMultiMR (incorporating all four candidate models and the secondary outcome).
Table 4 summarizes the results.None of the methods show any evidence of a causal effect of AHTN on the development of D/MCI after 4 years of follow-up.This may indicate that there is no or a very weak negative effect from the short-term use of AHTN on the development of D/MCI.Interestingly, all regression-based or calibrated estimates show a negative effect, while the standard MLE approach shows a positive effect, which may indicate that treatment assignment is imbalanced due to confounding variables and suggests a need for calibrated estimation.The eMultiMR estimate has around 25% lower estimation EVar: Bootstrap-based variance; 95%LL: lower limit of the 95% confidence interval for the odds ratio based on the bootstrap-based standard error; 95%UL: upper limit of the 95% confidence interval for the odds ratio based on the bootstrapbased standard error.
variability than the MultiMR estimate, validating the use of secondary outcomes.

Method Extensions
Time-varying treatments.One extension is to estimate the causal effect of a time-dependent exposure in the presence of timedependent confounders affected by any previous treatment.Our proposed (e)MultiMR estimation procedure may still accurately estimate the causal effect of a time-dependent exposure given a binary outcome in certain situations.For example, if we are interested in the causal effect between no drug takers and those who always take the drug on the outcome at the last visit.We can apply the procedure introduced in Section 2.3.2 using PS models and CM models calculated dynamically to calculate treatment assignment probabilities and imputed means, that is, a sequence of PS and CM models at different visits (Robins, Hernan, and Brumback 2000;Babino, Rotnitzky, and Robins 2019).It is worth noting that there may be an issue with incompatibility for CM models when dealing with binary primary outcomes (Hernán and Robins 2010;Babino, Rotnitzky, and Robins 2019).Thus, simply proposing sequential linear models may not be valid for a binary outcome.Thereby, our proposed eMultiMR with multiple candidate models based on different data-generating mechanisms may be more preferable and warrants further research.Subcohort of secondary data.In practice, it is possible that not all subjects in the main data population have records for the secondary data, but a subcohort does.To allow partially observed secondary data, we can modify the constraint in Section 2.3.3 by n i=1 p i R i h(D si ; θ ) = 0, where R i is the indicator for observing the secondary data of subject i.Like Theorem 2, the only requirement here is to assume E{R i h(D si ; θ * )} = 0 for some parameter values θ * , but not necessarily the true ones.Thus, the proposed method can be more flexibly applied to complicated scenarios with missing data (Chen, Wang, and Chen 2023).
Other types of secondary outcome.Our proposed estimator is not limited to longitudinal secondary outcomes.The strategy outlined in Section 2.3.3 can be extended to other types of secondary outcomes with different forms or structures, such as recurrent time-to-event data.The key step is to construct an over-identified estimating function.For cross-sectional secondary outcomes, we can use the following function: h(D si ; θ ) = d(X si , Zsi ; θ ) Y si − μ(X si ; θ ) , with the redundant variables Zsi .These variables are believed not to be predictive of the secondary outcomes, which could be determined numerically by a variable selection procedure (Chen, Han, and He 2021).Further numerical evaluations need to be conducted to assess performance.These aspects are not the focus of this article but will be pursued in future research.

Conclusions
We presented (e)MultiMR, a robust estimation framework for inferring a causal effect when the outcome is binary.The proposed method embraces the robust property, enhancing protection against misspecification of the PS or CM models.Moreover, this method can integrate information from secondary outcomes that are highly correlated with the primary outcome.The efficiency and robustness of the (e)MultiMR estimator were confirmed both theoretically and in extensive simulation studies.Future works, such as handling truncation by death and incorporating a high correlation among secondary outcomes, is open for research.We believe the proposed estimator has made a significant contribution to causal inference.

Figure 1 .
Figure1.A schematic workflow of eMultiMR estimation framework (A) and a summary of methods (B).eMultiMR calculates the calibrated PSs based on multiple candidate PS models and CM models and calculates informative scores based on secondary data and a corresponding working model; Both scores are then integrated to more robustly and precisely estimate the marginal causal effect.

Figure 2 .
Figure2.A causal relationship between simulated variables (A) and a summary of estimate notations (B).

Table 1 .
Evaluation of the (e)MultiMR estimators under sample size 300 and 600 given the correctly specified secondary model.The names of estimators have the form "method_000_000, with each digit of the three digit number, from left to right, indicating if right PS model, first misspecified PS model, and second misspecified PS model/right CMI model, first misspecified CMI model, and second misspecified CMI model were used.All results have been multiplied by 100.

Table 2 .
Evaluation of the (e)MultiMR estimators under sample size 300 and 600, given the misspecified secondary model.

Table 3 .
Evaluation of bootstrap-based standard errors and corresponding coverage proportions of the (e)MultiMR estimators under sample size 300 and 600, given the correctly specified and mis-specified secondary model, respectively.The names of estimators have the form "method_000_000, with each digit of the three digit number, from left to right, indicating if right PS model, first misspecified PS model, and second misspecified PS model/right CMI model, first misspecified CMI model, and second misspecified CMI model were used.All results have been multiplied by 100.MCSD: Monte-Carlo standard deviation; BSE: bootstrap-based standard error; 95%CP: 95% coverage proportion based on bootstrap-based standard error; (e)MultiMR: (enhanced) multiple-model-based robust.

Table 4 .
Evaluation of different methods in the real data application.