Finite-sample performance of the robust variance estimator in the presence of missing data

Abstract Theoretically, the maximum likelihood estimator has the sandwich-type asymptotic variance-covariance matrix under model misspecification. Its empirical estimator, that is called the robust variance estimator, is consistent. Thus, the estimator is asymptotically valid even under model misspecification. In practice, the robust variance estimator is used for computation of standard errors in longitudinal data analysis. Recently, Golden et al. (2019 Econometrics, 7, 1-27) showed that the maximum likelihood estimator retains a sandwich-type asymptotic variance-covariance matrix in the presence of missing data even when the missing-data mechanism is missing not at random. Although they revealed the asymptotic validity of the robust variance estimator in the simultaneous presence of both model misspecification and missing data, its finite-sample performance did not be investigated. In this article, we evaluated the finite-sample performance via simulation studies and clarify its small-sample problems. In addition, we illustrated the robust variance estimator using longitudinal CD4 count data from a randomized double-blind study.


Introduction
The maximum likelihood estimator is commonly used for statistical inference and has good asymptotic properties such as consistency, asymptotic normality, and asymptotic efficiency.These properties are ensured when the analysis model is correctly specified and there are no missing data.The analysis model comprises mean structure and (co)variance structure, as well as error distribution for outcomes.Analysts fit the model to the data; however, model misspecifications such as omission of important covariates in the mean structure and wrong fitting of the error distribution are sometimes inevitable.Model misspecification may cause a bias to parameter estimator and its naive variance estimator, which is commonly used variance estimator consisting of the inverse of the information matrix, thus distorting data interpretation (e.g.Ishii, Maruo, and Gosho 2022).On the other hand, such bias might often be generated from missing data (e.g.Little and Rubin 2002).Observing complete data are rare in practice and inappropriate dealing with missing data is another reason for a biased estimator and distorted interpretation.
Some researchers have studied the asymptotic properties of the maximum likelihood estimator addressing the missing data and model misspecification problems separately.As for the former problem, Takai and Kano (2013) showed that the maximum likelihood estimator based on the incomplete-data likelihood function exhibits consistency and asymptotic normality when the missing-data mechanism is missing at random (MAR) in the sense of Rubin (1976).However, the asymptotic properties are generally established only when the analysis model is correctly specified.As for the latter, the robust variance estimator (Cox 1961;White 1982) provides the exact asymptotic variance of the maximum likelihood estimator even under model misspecification; however, this setting does not account for the presence of missing data.The robust variance estimator is also used in the context of the generalized estimating equations, and the assumption of missing completely at random (MCAR) is required in the presence of missing data (Liang and Zeger 1986).
Recently, Golden et al. (2019) showed under some regularity conditions that the asymptotic distribution of the maximum likelihood estimator is a multivariate normal distribution with a sandwich-type variance-covariance matrix in the simultaneous presence of both model misspecification and missing data.The asymptotic variance-covariance matrix can be consistently estimated by the empirical estimator, which is called as the robust variance estimator.Furthermore, the proof of this asymptotic result does not assume any restriction on missing-data mechanisms; hence, the robust variance estimator enables asymptotically valid variance estimation under model misspecification and in the presence of missing data with arbitrary missing-data mechanisms.However, Golden et al. (2019) did not evaluate the finite-sample performance of the robust variance estimator.
In longitudinal clinical trials, mixed-effects models for repeated measures (MMRM) method (Mallinckrodt, Clark, and David 2001a) with an unstructured covariance structure is frequently used to analyze incomplete data.However, an unstructured covariance structure could lead to convergence problems in trials with a small sample size, large number of time points, or high dropout rate (Lu and Mehrotra 2010;Gosho, Noma, and Maruo 2021).In such cases, we sometimes use simpler structures such as first-order autoregressive and compound symmetry structures and use the robust variance estimator to handle misspecification of the covariance structure.Hence, it is important for practical use of the robust variance estimator to evaluate its finite-sample performances and to clarify its small-sample problems.In this article, we evaluate the finite-sample performance of the robust variance estimator through simulation studies assuming longitudinal clinical trials in which we cannot correctly specify the true analysis model and know the true missing-data mechanism.
The remainder of this paper is organized as follows.In Sec. 2, we review the asymptotic distribution of the maximum likelihood estimator and the robust variance estimator in the simultaneous presence of both model misspecification and missing data.In Sec. 3, we conduct simulations and case studies to evaluate the robust variance estimator.Finally, Sec. 4 discusses the results and concludes.All simulation results are provided in the supplementary material.

Asymptotic distribution
We describe the asymptotic result of Golden et al. (2019) without measure-theoretic notations.
Let y be a s-dimensional response vector and m be a corresponding missing indicator vector (i.e. for kth element, m k takes one if y k is missing and zero if y k is observed).We denote all missing patterns by m ð1Þ , :::, m ð2 s Þ : For ith missing pattern (i ¼ 1, :::, 2 s ), let y ðiÞ be the observed subvector of y is m ðiÞ and I i ðmÞ be the indicator function for the missing pattern m ðiÞ : Let f(y, m) be the true joint density function of (y, m) and f(y) be its marginal density function for y.Let the fitting marginal density function for y be gðy; hÞ: In this case, the fitting analysis model is misspecified if f ðyÞ 6 ¼ gðy; hÞ for all h.We write the marginal density function for y ðiÞ as g i ðy ðiÞ ; hÞ: From these notations, we define the log-likelihood function with respect to the observed data ðy ðiÞ , m ðiÞ Þ as sðhÞ ¼ X 2 s i¼1 I i ðmÞ log g i ðy ðiÞ ; hÞ: Similarly, for independent n observations ðy n , m n Þ, we define the loglikelihood function as I ij log g i ðy ðiÞ j ; hÞ, where I ij ¼ I i ðm j Þ is an indicator function, which takes one for m j ¼ m ðiÞ and zero otherwise (i ¼ 1, :::, 2 s , j ¼ 1, :::, n).When there are no missing data, s n is equal to the ordinary log-likelihood function.Let ĥ be the maximum likelihood estimator based on s n ðhÞ using available data.Takai and Kano (2013) shows the consistency of ĥ and gives an asymptotic variance-covariance matrix of ĥ when the missing-data mechanism is MAR and the analysis model is correctly specified.The expectation E f fsðhÞg under the true model is I i log g i ðy ðiÞ ; hÞf ðy, mÞdydm and is supposed to be maximized at h Ã 2 X h , where X h is a parameter space of h.Although h Ã is treated as a true value in the following consideration, h Ã generally differs from a true value under model misspecification or in presence of missing value with MNAR mechanism.However, in some practical situations, parameters of interest would be regarded as true; for example, the group effect parameter in a randomized clinical trial would be zero under the null hypothesis of no difference between groups (Gail, Wieand, and Piantadosi 1984;Mallinckrodt, Clark, and David 2001b;Barnes et al. 2008).
We can now derive the asymptotic properties of ĥ: The following result, which is equivalent to Theorem 2 in Golden et al. (2019), shows the asymptotic distribution of ĥ under some regularity conditions.Details of the regularity conditions and another proof using classical asymptotic theory are given in the supplementary material.

Robust variance estimator
From Theorem 1, the variance-covariance matrix of ffiffiffi n p ð ĥ À h Ã Þ converges to H À1 JH À1 : Hence, n À1 H À1 JH À1 provides an asymptotically valid variance-covariance estimator of ĥ: However, we cannot compute H and J analytically, because they are defined as an expectation under the unknown true distribution f.Thus, we use empirical estimators of H and J defined as follows: j ; hÞ @h @ log g i ðy ðiÞ ; hÞ @h T h¼ ĥ : As in the case of the complete data, H and J can be consistently estimated by Ĥ and Ĵ , respectively (e.g.Barndorff-Nielsen and Cox 1994, p.83).
The matrices n À1 ĤÀ1 and n À1 ĤÀ1 Ĵ ĤÀ1 are called the naive (model-based) and robust variance estimators, respectively.The above discussions do not require any assumption on the missingdata mechanism.Hence, the robust variance estimator n À1 ĤÀ1 Ĵ ĤÀ1 is a consistent estimator of the asymptotic variance-covariance matrix of ĥ even under model misspecification and MNAR mechanism.On the other hand, the naive variance estimator is asymptotically valid if the equality H ¼ J holds.Takai and Kano (2013) showed the equality H ¼ J when the analysis model is correctly specified and the missing-data mechanism is MAR.
In addition to build the analysis model, researchers often model the missing-data mechanism or make some assumptions on it; hence, model misspecification might also occur owing to an incorrect specification or assumption about the missing-data mechanism.It is generally wellknown that misspecifying the missing-data mechanism causes inconsistency of ĥ even when the analysis model is correctly specified.If ĥ does not have consistency for the true parameter, the equality H ¼ J does not hold.In such case, the asymptotic variance of Hence, it is advisable to use the robust variance estimator rather than the naive variance estimator.

Simulation design
We considered a longitudinal randomized clinical trial with two treatment groups, which evaluates the effect in a active group compared with a placebo group.The purpose of this simulation study is to check whether the robust variance estimator is more appropriate than the naive variance estimator in finite samples.This simulation study assumed both model misspecification and the presence of missing data.Let ðy 1 , y 2 , y 3 Þ T follow a multivariate normal distribution with heterogeneous first-order autoregressive covariance structure: The fitting distribution was the multivariate normal distribution with mean vector l þ l g x g and variancecovariance matrix R, where R were r 11 r 12 r 13 r 12 r 22 r 23 r 13 r 23 r 33 0 @ 1 A and when the fitting covariance structures were unstructured and compound symmetry, respectively.Hence, h is ðl T , l T g , r 11 , r 22 , r 33 , r 12 , r 13 , r 23 Þ T or ðl T , l T g , s 1 , s 2 Þ T : We considered four scenarios for the missing-data mechanism.Scenario 1 is a monotone and MAR mechanism, which is the same between groups.Scenario 2 is a monotone and MNAR mechanism, which is the same between groups.Scenario 3 is a non-monotone and MAR mechanism, which differs between groups.Scenario 4 is a non-monotone and MNAR mechanism, which differs between groups.Let U be the cumulative distribution function of the standard normal distribution.Missing probabilities in Scenario 1 were modeled by P(y 3 becomes the missing value) ¼ UðÀ2 À 4y 2 Þ and P((y 2 , y 3 ) becomes the missing value) ¼ UðÀ3:5 À 4y 1 Þ: Missing probabilities in Scenario 2 were modeled by P(y 3 becomes the missing value) ¼ UðÀ2:6 À 4y 3 Þ and P((y 2 , y 3 ) becomes the missing value) ¼ UðÀ4:8 À 4y 2 Þ: These two missing-data mechanisms can be interpreted as the dropout from the trial because of the lack of efficacy.Missing probabilities in Scenario 3 were modeled by P(y 2 becomes the missing value) ¼ UðÀ1:5 þ 4y 1 À 8ðy 1 Â x g ÞÞ and P(y 3 becomes the missing value) ¼ UðÀ1:5 þ 4y 1 À 8ðy 1 Â x g ÞÞ: Missing probabilities in Scenario 4 were modeled by P(y 2 becomes the missing value) ¼ UðÀ3 þ 4y 2 À 8ðy 2 Â x g ÞÞ and P(y 3 becomes the missing value) ¼ UðÀ4 þ 4y 3 À 8ðy 3 Â x g ÞÞ: All possible observations in Scenarios 3 and 4 were y 1 , ðy 1 , y 2 Þ, ðy 1 , y 3 Þ, and ðy 1 , y 2 , y 3 Þ: Under these two missing-data mechanisms, the missing probability in the placebo group (x g ¼ 0) increases as the outcome increases, while the missing probability in the active group (x g ¼ 1) decreases as the outcome increases (e.g.Gosho and Maruo 2021).
Let the sample size in each group be 20, 50, 200, 500 to investigate small and large sample properties.Let l ¼ ð0, 0, 0Þ T , l g ¼ ð0, d=2, dÞ T , r 2 1 ¼ 1, r 2 2 ¼ 2, r 2 3 ¼ 4, and q ¼ 0:8: Here, we set d ¼ 0, 0:5, 1, 2: We considered two types of Wald test: the Wald test based on the robust variance estimator (named the robust Wald test) and that based on the naive variance estimator (named the naive Wald test).We calculated the maximum likelihood estimator ĥ, the naive variance estimator n À1 Ĥ of ĥ, and the robust variance estimator n À1 ĤÀ1 Ĵ ĤÀ1 of ĥ, where n was the total number of subjects.We tested the null hypothesis l g3 ¼ 0, which is a group difference at the end of the trial, at a significance level of 0.05 using the two Wald tests.We conducted 10000 trial simulations and calculated the ratio of the mean of the naive or robust variance estimator to the simulated variance of ĥ, as well as test size (d ¼ 0) and power (d 6 ¼ 0) for the two Wald tests.Here, the simulated variance is the variance of maximum likelihood estimates ĥk , k ¼ 1, :::, 10000, the test size is the proportion of times that the null hypothesis is rejected when d ¼ 0, and the power is the proportion of times that the null hypothesis is rejected when d 6 ¼ 0: We also calculated the mean of the maximum likelihood estimator lg3 , the standard deviation (SD) of lg3 , and the mean of naive and robust SEs of lg3 :

Simulation results
Tables 1-6 show the results of the simulation study.Simulation results when the sample size per group is 20 or 500 are shown.In addition, simulation results under the alternative hypothesis l g3 6 ¼ 0 are shown only for d ¼ 0:5 and 2, when the sample sizes per group are 20 and 500, respectively.Results for all the settings are in the supplementary material.Tables 1-4 show the ratio of the mean of the two robust variance estimators to the simulated variance of ĥ: Tables 5  and 6 show the performance of lg3 , naive SE, and robust SE, as well as the test size and power for the naive and robust Wald tests for testing the null hypothesis l g3 ¼ 0: In Tables 1 and 2, the analysis model was correctly specified because the fitting covariance structure was assumed to be unstructured.The robust and naive variance estimators had bias in many component regardless of missing-data mechanisms when the sample size was small.The two variance estimators were close to the simulated variance of ĥ when the missing-data mechanism was MAR (Scenarios 1 and 3) and the sample size was large.On the other hand, when the missing-data mechanism was MNAR (Scenarios 2 and 4), the naive variance estimator exhibited a bias in many components even in large-samples.In particular, the naive variance estimator corresponding to the parameter of interest l g3 had a bias in Scenario 4. Hence, if the MAR assumption does not hold, the naive variance estimator is not valid even when the analysis model is correctly specified and the sample size is large.In contrast, the robust variance estimator was close to the simulated variance of ĥ in all the scenarios when the sample size was large.However, the robust variance estimator had a downward bias when the sample size was small.
In Tables 3 and 4, the analysis model was misspecified because we fit compound symmetry to the heterogeneous first-order autoregressive covariance structure.Most of the elements of the naive variance estimator presented a bias in all scenarios regardless of the sample size.On the other hand, all the elements of the robust variance estimator were close to the simulated variance of ĥ in all scenarios when the sample size was large.When the sample size was small, the robust variance estimator had a downward bias although the robust variance estimator reduced the bias of the naive variance estimator.
Table 5 illustrates the results when fitting the unstructured variance-covariance matrix (i.e.correctly specified).The maximum likelihood estimator lg3 had little bias under scenarios 1-3 with d ¼ 0 or scenarios 1 and 3 with d 6 ¼ 0: The two types of SEs were unbiased in scenarios 1-3 under large sample size; hence the test sizes for the two Wald tests were close to the nominal level 0.05 when the estimator lg3 was unbiased.On the other hand, when the sample size was small, the two types of SEs had a downward bias, and thus, the test sizes were inflated.In scenario 4, lg3 was biased, and thus the test sizes for the two Wald tests were notably inflated.The powers of the two Wald tests were similar in all settings.
Table 6 illustrates the results when fitting the compound symmetry variance-covariance matrix (i.e.misspecified).The maximum likelihood estimator lg3 had little bias only under scenarios 1 and 2 with d ¼ 0. The naive SE was biased in all scenarios, and thus the test sizes for the naive Wald test were not controlled at the nominal level even if lg3 was unbiased.On the other hand, when the sample size was large, the robust SE was unbiased in all scenarios and the test sizes for the robust Wald test were controlled at the nominal level if lg3 was unbiased.The powers of the robust Wald tests were smaller than that of the naive Wald test, since the robust variance estimator reduced the downward bias of the naive variance estimator.
We obtained similar consideration from the results presented in the supplementary material.
Table 1.Ratio (percent) of the mean of the naive or robust variance estimator to the simulated variance of ĥ when the fitting covariance structure is unstructured and sample size per group is 20.
Scenario 1 (MAR) Scenario 2 (MNAR) Table 3. Ratio (percent) of the mean of the naive or robust variance estimator to the simulated variance of ĥ when the fitting covariance structure is compound symmetry and sample size per group is 20.

Case study
In this section, we investigate the performance of the robust variance estimator for a real dataset with missing value.The dataset is from a randomized, double-blind study of acquired immune deficiency syndrome (AIDS) patients with advanced immune suppression (Henry et al., 1998;Fitzmaurice, Laird, and Ware 2011).Patients were randomized to dual or triple combinations of human immunodeficiency virus-1 reverse transcriptase inhibitors.Specifically, patients were randomized to one of four daily regimens containing 600 mg of zidovudine: zidovudine alternating monthly with 400 mg didanosine; zidovudine plus 2.25 mg of zalcitabine; zidovudine plus 400 mg of didanosine; or zidovudine plus 400 mg of didanosine plus 400 mg of nevirapine (triple therapy).Measurements of cluster of differentiation 4 (CD4) counts were scheduled to be collected at baseline and at eight-week intervals during follow-up.The more details about the data handling process are provided by Maruo et al. (2017).We focused on two groups (group 0 ¼ zidovudine alternating monthly with 400 mg didanosine, 1 ¼ zidovudine plus 400 mg of didanosine plus 400 mg of nevirapine) and weeks 0, 8, and 16.The sample sizes in the groups 0 and 1 were 320 and 330, respectively.In our case study, we used the log-transformed CD4 counts (i.e.log(CD4 counts þ1)) as the outcome variable.Exploratory variables were treatment group, week (categorical), and the interaction group Â week.We fitted two distributions: fitting model 1 and 2 were the multivariate normal distribution with unstructured matrix and compound symmetry matrix, respectively.Let h be the same as that in the above simulation study.We calculated the maximum likelihood estimator ĥ and the naive and robust SEs of ĥ: Table 7 shows the results for our case study.The naive SE differed from the robust SE in many components of h for both fitting models.For the mean parameters l and l g , the maximum likelihood estimates were quite similar between fitting models.However, the naive SEs corresponding to l and l g depended on the fitting model.In contrast, although we cannot ensure MAR mechanism in such real data analysis, the robust SEs corresponding to l and l g did not depend on the fitting model.
Table 4. Ratio (percent) of the mean of the naive or robust variance estimator to the simulated variance of ĥ when the fitting covariance structure is compound symmetry and sample size per group is 500.

Discussion
It was previously unclear whether the robust variance estimator is asymptotically valid in the simultaneous presence of both model misspecification and missing data with arbitrary missing-data mechanisms.For this problem, Golden et al. (2019) recently showed an important result that the maximum likelihood estimator has a sandwich-type asymptotic variance-covariance matrix H À1 JH À1 in the simultaneous presence of both model misspecification and missing data.
Furthermore, H and J can be consistently estimated by empirical estimators Ĥ and Ĵ, respectively.The robust variance estimator n À1 ĤÀ1 Ĵ ĤÀ1 is asymptotically valid even when the missingdata mechanism is MNAR, since the discussion of Golden et al. (2019) did not require any assumption on the missing-data mechanism.Hence, the robust variance estimator is asymptotically applicable under model misspecification and in the presence of missing data.
In this article, we evaluated small and large sample performances of the robust variance estimator via simulation studies assuming a longitudinal clinical trial in which the robust variance estimator is frequently used.The simulation study indicated that the naive variance estimator is biased under MNAR mechanism even if the analysis model is correctly specified.From a theoretical perspective, likelihood-based analysis of incomplete data under MNAR mechanism is one of misspecification problems of the missing model.As with the case of the analysis model, the misspecification of the missing model may yield inconsistency of the maximum likelihood estimator.Thus, the naive variance estimator is not consistent, since it is estimated by substituting the maximum likelihood estimator ĥ in h.In contrast, the robust variance estimator showed the consistency even under MNAR mechanism and misspecification of the analysis model.Hence, in the presence of missing data, it is asymptotically preferable to use the robust variance estimator, since it is generally impossible to ensure the correctness of both the MAR assumption and the specification of the analysis model.However, the robust variance estimator showed a bias when the sample size was small.Although the robust variance estimator has a small-sample bias, the degree of bias of the robust variance estimator is smaller than that of the naive variance estimator under model misspecification or MNAR mechanism.Thus, the robust variance estimator would be preferable to reduce a bias of the naive variance estimator even when the sample size is not sufficiently large.
As noted above, our simulation study showed a downward bias of the robust variance estimator when the sample size is small.To avoid the bias, some small-sample corrections for the robust variance estimator are proposed in existing studies (Mancl and DeRouen 2001;Kauermann and Carroll 2001).Gosho et al. (2017, Gosho, Noma, andMaruo 2021) also reviewed several bias-correction estimators and compared the performance between the estimators.However, existing small-sample corrections assume the orthogonality between the mean and variance parameters when applying to the MMRM method.Maruo et al. (2020) noted that variance estimators assuming the orthogonality have a bias when the missing-data mechanism is not MCAR.Hence, small-sample corrections without the orthogonality assumption should be investigated in future research.
Assume longitudinal randomized clinical trials comparing two groups, in which the primary interest is the group-effect parameter.If the group-effect estimator is unbiased, then the Waldtype test using the robust variance estimator is asymptotically valid even under the model misspecification and MNAR mechanism.In general, the group-effect estimator might be biased under the two problems; hence we have to ensure both the MAR assumption and correct model specification of the analysis model in order to obtain meaningful estimates.However, when the null hypothesis of no difference between two groups is true, the bias for the group-effect parameter would be zero in some situations even under model misspecification (Gail, Wieand, and Piantadosi 1984) and MNAR mechanism (Mallinckrodt, Clark, and David 2001b;Barnes et al. 2008).In such cases, the Wald-type test using the robust variance estimator is asymptotically valid in terms of the test size even under the two problems.Our simulation result presented in Table 6 (Scenario 2) illustrates this situation.Hence, the robust variance estimator would be useful to control the test size at the nominal level under the two problems.

Table 2 .
Ratio (percent)of the mean of the naive or robust variance estimator to the simulated variance of ĥ when the fitting covariance structure is unstructured and sample size per group is 500.
Missing-data mechanisms are the same in each treatment group.Scenarios 3 and 4: Missing-data mechanisms are not the same in each treatment group.

Table 5 .
Test size and power for the naive and robust Wald tests at a significance level of 0.05 when the fitting covariance structure is unstructured.

Table 7 .
Maximum likelihood estimates and the naive and robust SEs for AIDS clinical trial dataset.

Table 6 .
Test size and power for the naive and robust Wald tests at a significance level of 0.05 when the fitting covariance structure is compound symmetry.Scenarios 1 and 2: Missing-data mechanisms are the same in each treatment group.Scenarios 3 and 4: Missing-data mechanisms are not the same in each treatment group.