Pay Attention to the Ignorable Missing Data Mechanisms! An Exploration of Their Impact on the Efficiency of Regression Coefficients

Abstract The use of modern missing data techniques has become more prevalent with their increasing accessibility in statistical software. These techniques focus on handling data that are missing at random (MAR). Although all MAR mechanisms are routinely treated as the same, they are not equal. The impact of missing data on the efficiency of parameter estimates can differ for different MAR variations, even when the amount of missing data is held constant; yet, in current practice, only the rate of missing data is reported. The impact of MAR on the loss of efficiency can instead be more directly measured by the fraction of missing information (FMI). In this article, we explore this impact using FMIs in regression models with one and two predictors. With the help of a Shiny application, we demonstrate that efficiency loss due to missing data can be highly complex and is not always intuitive. We recommend substantive researchers who work with missing data report estimates of FMIs in addition to the rate of missingness. We also encourage methodologists to examine FMIs when designing simulation studies with missing data, and to explore the behavior of efficiency loss under MAR using FMIs in more complex models.


Introduction
Modern missing data techniques, such as full information maximum likelihood (FIML; Allison, 1987) and multiple imputation (MI; Rubin, 1987), have become increasingly accessible to psychologists in recent years. Statistical software such as AMOS (Arbuckle, 2014) and Mplus (Muthén & Muthén, 2017), along with the R (R Core Team, 2019) packages lavaan (Rosseel, 2012) and mice (van Buuren & Groothuis-Oudshoorn, 2011), have made these techniques viable tools in substantive research, where missing data can be a common occurrence. These modern missing data techniques generally outperform traditional ad hoc techniques such as listwise deletion and pairwise deletion, in that they produce consistent parameter estimates and standard error estimates under a wider variety of missing data mechanisms (see Graham, 2009 for an overview).
There are three types of missing data mechanisms according to the categorization scheme by Rubin (1976): missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Below we provide informal descriptions of these mechanisms. Under MCAR, the probability that data are missing is independent of the data. Under the broader case of MAR, the probability that data are missing is independent of the data once it is conditioned on observed data. Lastly, data are MNAR if missingness depends on some unobserved data. MAR is generally considered an ignorable missing data mechanism under most circumstances, in the sense that we can ignore the details of MAR when applying likelihood-based modern missing data techniques such as FIML or MI (Little & Rubin, 1987). In contrast, the correct handling of MNAR data requires additional modeling of the specific missing data mechanism, for instance, using pattern mixture models (Glynn et al., 1986).
Central to this article is the observation that there are many variations of MAR, owing to the wide variety of ways missingness can be determined by observed data. For example, consider a course component with a pass/fail grade, where students must pass two out of three evaluations to receive a pass. Suppose that in a high-performing class, many students passed the first two evaluations and, as a result, were not required to attend the third evaluation. In another class, many students failed the first two evaluations and did not attend the third evaluation because it would not improve their grade. In a third class, both types of absence occurred. In all three classes, data are MAR, since the missingness in the third evaluation is determined by the observed performance in the first two evaluations. When handling missing data with modern techniques, we do not need to distinguish between such variations in MAR; parameter estimates will remain consistent and statistical inferences will remain valid following FIML or MI (under many common imputation techniques such as predictive mean matching) for any MAR variation. However, the efficiency (i.e., precision) of parameter estimates, and therefore the power of statistical tests, will vary with the particular MAR variation, even holding the rate of missing data constant. Understanding how the precision of estimation is affected by the specific MAR variation can be useful both for substantive researchers working with different types of missing data and for methodologists studying the performance of different techniques with missing data.
In this article, we explore the impact of variations in MAR on the efficiency of parameter estimates in regression. The article is organized as follows. In the next section, we illustrate how specific MAR mechanisms can have a differential impact on the efficiency of parameter estimates. Additionally, we describe the fraction of missing information (FMI) as a means of quantifying this impact. We then provide a simulated example to empirically demonstrate how FMI captures the efficiency loss due to different variations of MAR in small samples. Next, we introduce a Shiny application that can be used to interactively explore the impact of MAR in regression, using the pseudo-population FMI as a measure of efficiency loss. Finally, we summarize our main findings and discuss their implications for substantive research with missing data, as well as for conducting simulation studies of missing data techniques.

MAR variations and efficiency
When data are missing, less information is available in the partially observed dataset about the true parameter values than there would have been had the data been complete. Therefore, missing data, even when handled correctly using modern techniques, will cause more uncertainty in the parameter estimates, leading to larger standard errors, wider confidence intervals, and lower statistical power. One could say that statistical tests are effectively performed under a smaller sample size. Traditionally, to provide a basic assessment of this efficiency loss, researchers report the amount of missing data in their dataset, such as the proportion of missing values per variable or per dataset, or the proportion of rows with missing values. There is some merit to these metrics: given the same MAR mechanism, 50% missing values would lead to considerably greater power loss than 10% missing values. However, even when the amount of missing data is the same, previous studies have shown that different variations of MAR can have different impacts on data analysis (most notably, Collins et al., 2001). MAR variations can lead to differential efficiency loss under a wide variety of scenarios (Anderson, 2021; Chen et al., 2020; Savalei & Rhemtulla, 2017; Sullivan et al., 2018; Yucel et al., 2011). In other words, the effective reduction in sample size is not straightforwardly related to any measure of the rate of missing data. There are many dimensions along which MAR mechanisms can vary, affecting efficiency; we describe several such variations below.

Linear and nonlinear MAR
Consider the following MAR mechanism applied to two variables X and Y that follow a bivariate normal distribution. Suppose X is a fully observed variable that governs how values are missing on Y; we will call X a conditioning variable. Further suppose that, for any observation, if the X value is at its population mean or above, Y is always missing; if the X value is instead below its population mean, Y is fully observed. Under this MAR variation, the relationship between the conditioning variable X and the missing probabilities on Y can be described with a conditioning function, depicted in the top left panel of Figure 1 (a similar visualization can be found in Gomer & Yuan, 2021). This function is monotonic, in that the probability of missingness can only increase with higher values of X, resulting in what has been called a linear MAR mechanism (see Collins et al., 2001; Savalei & Rhemtulla, 2017; Yoo, 2009; the name arises from the most common way to simulate data from such a mechanism, which specifies that data are missing on a set of variables if some linear combination of the conditioning variables exceeds a certain cutoff). In this paper, we will refer to linear MAR as MAR-L. The MAR-L mechanism deletes information in the bivariate distribution of X and Y corresponding to one tail of X, but preserves information on the other tail (see the top right panel of Figure 1). Additionally, when the correlation between
X and Y is high, one tail of Y is also deleted with higher probability, shifting the observed mean and reducing the observed variance; an example of the effect of MAR on the distribution of the variable containing missingness at different correlations is given in Table S1 in the Supplementary Material. The MAR mechanism can be nonlinear in various ways (Howard et al., 2015); for example, the conditioning function may be non-monotonic (Savalei & Rhemtulla, 2017; Chen et al., 2020). If data on Y are missing only when X takes on values in the top and bottom 25%, as depicted in the middle left panel of Figure 1, such a nonlinear MAR mechanism will lead to data missing at the tails of the distribution of X. We will refer to this type of nonlinear MAR mechanism, which generates missing data corresponding to the tails of the conditioning variable, as MAR-NLT. This mechanism is also named MAR-convex by Collins et al. (2001), one of the earliest studies in a psychology journal to investigate how its impact differs from that of MAR-L (see also Yoo, 2009). This mechanism will tend to delete data that are particularly informative about the strength of the bivariate relationship (middle right panel of Figure 1). When the correlation between X and Y is sufficiently high, the tail ends of Y are also deleted with higher probability, reducing the observed variance. In our earlier example, the case where the best and worst performing students did not attend the evaluation can be categorized as MAR-NLT.
Alternatively, when data are missing on Y only when X takes on values in the middle 50%, the nonlinear MAR mechanism generates missing data corresponding to the center of the conditioning variable. We will refer to it as MAR-NLC; it is depicted in the bottom left panel of Figure 1. MAR-NLC can arise, for instance, from an extreme-groups design (Chen & Fouladi, 2022; Feldt, 1961; Fisher et al., 2020; Preacher et al., 2005), where participants who score the highest and lowest on a screening test are recruited for the second stage of a study. Unlike MAR-NLT, instead of deleting data that are more informative, MAR-NLC will tend to preserve information on the bivariate relationship (see Figure 1). With a high correlation between X and Y, values toward the center of Y are also deleted more frequently, inflating the observed variance. Thus, even when the overall missing rate is held constant, the MAR-L, MAR-NLT, and MAR-NLC mechanisms may lead to different amounts of information loss.
To understand why efficiency loss differs among variations of MAR mechanisms even while holding the rate of missingness constant, we compare MAR-NLT and MAR-NLC in a simple regression predicting Y from X. Imagine a set of data points that fall near the mean of X: regardless of how the regression line "wobbles" (i.e., how the slope estimate changes), these data points will yield similar residual values. Data near the center thus provide less information to dictate the estimation of the slope. MAR-NLC leads to less efficiency loss on the regression slope because it systematically deletes these less informative data close to the center. In contrast, MAR-NLT systematically deletes data farther away from the mean, preserving only the less informative data in the center, leading to much higher efficiency loss.
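To make the three conditioning functions concrete, the following sketch (ours, not from the article's materials, and written in Python/numpy rather than the article's R workflow) imposes deterministic MAR-L, MAR-NLT, and MAR-NLC on one large bivariate normal sample. All three delete the same 50% of Y values, but they retain very different parts of the X distribution, which is what drives the differential information loss described above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.standard_normal(n)
y = 0.4 * x + rng.standard_normal(n) * np.sqrt(1 - 0.4**2)  # corr(X, Y) = .4

q25, q75 = np.quantile(x, [0.25, 0.75])

# Deterministic conditioning functions (p1 = 1, p2 = 0), each deleting 50% of Y.
# The median (rather than the population mean) is used for an exact 50% rate.
miss_l   = x >= np.median(x)          # MAR-L: Y missing for the upper half of X
miss_nlt = (x < q25) | (x > q75)      # MAR-NLT: Y missing in the tails of X
miss_nlc = (x >= q25) & (x <= q75)    # MAR-NLC: Y missing in the center of X

for name, miss in [("MAR-L", miss_l), ("MAR-NLT", miss_nlt), ("MAR-NLC", miss_nlc)]:
    x_obs = x[~miss]  # X values among rows where Y remains observed
    print(f"{name}: missing rate on Y = {miss.mean():.2f}, "
          f"Var(X) among observed rows = {x_obs.var():.2f}")
```

The retained variance of X foreshadows the efficiency results: MAR-NLT keeps only the uninformative center of X (smallest retained Var(X)), whereas MAR-NLC keeps the informative tails (largest retained Var(X)).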

Deterministic and probabilistic MAR
Another important dimension along which MAR mechanisms may vary is the strength of the selection mechanism (e.g., see Chen et al., 2020; Sullivan et al., 2018; Yucel et al., 2011). In the conditioning functions for MAR-L, MAR-NLT, and MAR-NLC, the values of X fully determine which values are missing in Y. In other words, the relationship between the conditioning variable and the missing probabilities is deterministic; this is the strongest MAR selection mechanism. There are several ways to define the strength of a MAR mechanism. In simple cutoff-based MAR mechanisms, like the examples given above, MAR strength can be defined by two conditional missing probabilities, p1 and p2 (Zhang, 2021). Under this definition, a deterministic MAR mechanism will have p1 = 1 and p2 = 0, which means that each data point has either a 100% chance or a 0% chance of being missing, as determined by the respective cutoffs in the conditioning variable. For example, Y is missing with probability p1 = 1 (i.e., always missing) when X > 0, and Y is missing with probability p2 = 0 (i.e., always observed) when X ≤ 0. A more realistic and weaker MAR mechanism will be probabilistic; e.g., Y is missing with probability p1 = .8 (80%) when X > 0 and with probability p2 = .2 (20%) when X ≤ 0 (note that it is not necessary for p1 and p2 to add up to 1). The MAR mechanism is the weakest when p1 = p2; in this case, the missing rate is the same regardless of the value of X. In other words, the missingness is not related to X, and the missing data mechanism is in fact MCAR in the absence of any other conditioning variable. In this way, MCAR can be considered the end point of a continuum of MAR variations based on strength. Consequently, weaker MAR mechanisms tend to perform more similarly to MCAR than stronger MAR mechanisms do, all else being equal (see Chen et al., 2020; Sullivan et al., 2018).
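The cutoff-based definition of MAR strength can be sketched in a few lines (a hypothetical illustration of ours, not code from the article). The same function generates a deterministic mechanism, a probabilistic one, and MCAR, depending only on p1 and p2; all three produce the same overall missing rate, but the association between X and the missingness indicator weakens as p1 and p2 move toward each other:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.standard_normal(n)

def mar_step(x, p1, p2, rng):
    """Cutoff-based MAR: Y is missing with prob p1 when X > 0, prob p2 when X <= 0."""
    p = np.where(x > 0, p1, p2)
    return rng.random(len(x)) < p  # True = Y missing for that row

det  = mar_step(x, 1.0, 0.0, rng)  # deterministic: the strongest selection
prob = mar_step(x, 0.8, 0.2, rng)  # probabilistic: a weaker selection
mcar = mar_step(x, 0.5, 0.5, rng)  # p1 = p2: reduces to MCAR

# All three delete ~50% of Y; only the X-missingness association differs
for name, m in [("deterministic", det), ("probabilistic", prob), ("MCAR", mcar)]:
    print(f"{name}: rate = {m.mean():.2f}, corr(X, missing) = {np.corrcoef(x, m)[0, 1]:.2f}")
```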

Other MAR variations
MAR mechanisms can differ in other important ways. For example, while all examples of MAR given above are generated from step conditioning functions (see Figure 1), the function relating the conditioning variable value and the missing probability can also be continuous, e.g., linear (Sullivan et al., 2018) or logistic (Anderson, 2021; Mazza et al., 2015; Yucel et al., 2011). An MAR mechanism can also involve multiple conditioning variables; i.e., the missing probability can depend on a linear or nonlinear combination of several variables in the dataset. Missingness can even be a function of the correlation between two fully observed variables within each subgroup, where the correlation varies across subgroups (Collins et al., 2001). Furthermore, when multiple variables contain missing values, MAR mechanisms can differ based on whether missingness occurs jointly or independently on each variable, even when each individual variable's conditioning function is the same. This variation will affect the number of missing data patterns in the resulting dataset, which is known to affect the performance of missing data techniques even under MCAR (Savalei & Bentler, 2005).

Differences among parameters
The impact of different MAR variations on the efficiency of estimates can also differ from one parameter to another. For example, when MAR-NLT and MAR-NLC have conditioning functions that are symmetrical about the mean of the conditioning variable (e.g., data are missing on Y when X is smaller than −1 or greater than 1, where the mean of X is 0), the resulting data loss will also be symmetrical about the mean of the variable with missingness. In this case, the two mechanisms may have a similar impact on the univariate mean statistics, even though they will generally result in different efficiency loss on bivariate statistics such as the regression slope. In contrast, because MAR-L leads to asymmetrical data loss, it is likely to produce greater efficiency loss on the mean estimate compared to MAR-NLT and MAR-NLC. In sum, because there are many variations of MAR, and efficiency loss can differ under each combination of MAR features, the proportion of missing data is an inadequate measure of efficiency loss due to missing data, even if the mechanism is assumed to be ignorable. We can improve upon the current practice of reporting only the rate of missing data (per dataset or per variable) for presumptive MAR mechanisms by also adopting a measure that quantifies, under any variation of the MAR mechanism, the impact of missing data on the efficiency of each parameter estimate. In the next section, we review such a measure: the fraction of missing information (FMI; see Rubin, 1976; Savalei & Rhemtulla, 2012).

Fraction of missing information
FMI directly captures the relative amount of information loss due to missing data in each estimated model parameter. It is a function of the amount of variance inflation in the sampling distribution of a parameter estimate due to missing data, comparing the observed sampling variance (under incomplete data) to what it would have been had the data been complete (Orchard & Woodbury, 1972; Rubin, 1987). Sample estimates of FMI were first introduced in the context of MI (Rubin, 1987), where the would-be variance of the sampling distribution of a parameter estimate with complete data was estimated from the "within-imputation" variance, and the increase in sampling variability due to missing data was estimated from the "between-imputation" variance. More recently, Savalei and Rhemtulla (2012) showed how to obtain FMI estimates from FIML. This computation of FMI has been automated in the R (R Core Team, 2019) package lavaan 0.6-9, where FMI estimates can be requested with the fmi = TRUE option in the parameterEstimates() function after a model has been estimated under FIML (for more details, see Chen & Savalei, 2021). This is the computation we will use in this article.
Under the FIML framework (or, equivalently, under an MI framework with a very large number of imputations), the FMI of the jth model parameter is defined as follows:

δj = (SE²O,j − SE²C,j) / SE²O,j = 1 − SE²C,j / SE²O,j,    (1)

where SE²O,j is the squared asymptotic standard error of the parameter estimate based on observed data (i.e., data that contain missing values), and SE²C,j is the squared asymptotic standard error of the parameter estimate based on complete data. In practice, of course, complete data are not available, but it is possible to obtain an estimate of δj from FIML output (Chen & Savalei, 2021; Savalei & Rhemtulla, 2012). Note that when there are no missing data, SE²C,j = SE²O,j and δj = 0, indicating that no information about the jth parameter is lost. As the impact of missing data on precision increases, SE²O,j increases, and δj becomes a fraction between 0 and 1. Theoretically, the FMI could be 1 if the parameter is not estimable at all (for example, if for a pair of variables X and Y there are no cases where both are observed, the regression slope cannot be estimated, so its associated FMI value is theoretically 1); estimation of the model will fail in this case. The asymptotic standard errors used in Equation 1 are obtained from the diagonals of the inverse of the information matrix (Orchard & Woodbury, 1972; Rubin, 1976; Savalei & Rosseel, 2021), and are not direct functions of rates of missing data. The FMI of a particular parameter will vary with the specific MAR variation, even while the rates of missing data (per variable and per dataset) are held constant.
The FMI can also be converted to an indicator of how much the confidence interval (CI) for the parameter is expected to widen due to missing data. The width inflation factor (WIF; Savalei & Rhemtulla, 2012) is defined as

WIFj = 1 / √(1 − δj).

The WIF is the ratio of the expected widths of two CIs for the same parameter of interest: one based on incomplete data and one based on complete data. The WIF lends itself directly to substantive interpretation. For example, when δj = .75, WIFj = 1/√(1 − .75) = 2. In other words, if 75% of the information about a parameter is lost due to missingness, the observed CI is expected to be twice as wide as the CI would have been under complete data. For another example, when δj = .5, WIFj = 1.41, and the width of the CI has increased by 41% due to the missing data. Thus, FMI is directly related to efficiency loss, but its value (i.e., 75% or 50%) is a function of the specific MAR variation and the parameter, not just of the rate of missing data (which can be higher or lower than the FMI).
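The FMI-to-WIF conversion above is a one-liner; the following small sketch (ours, for illustration) reproduces the worked values:

```python
import math

def wif(fmi):
    """Width inflation factor: expected CI widening for a given fraction of missing information."""
    return 1.0 / math.sqrt(1.0 - fmi)

print(wif(0.75))  # 2.0: the CI is expected to double in width
print(wif(0.50))  # ~1.414: the CI widens by about 41%
print(wif(0.0))   # 1.0: no missing information, no widening
```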
We can use the FMI to investigate efficiency loss due to missing data under different variations of MAR. The expected efficiency loss due to missing data (in terms of the proportional change in standard errors) is the same regardless of sample size, and can be studied at the population level. However, for a given MAR mechanism, population FMI values can be difficult to obtain analytically. To get around this problem, in this article we compute pseudo-population FMIs by generating a single large sample from each MAR variation and estimating the FMI using lavaan. Below, we first provide a concrete simulated example that illustrates how pseudo-population FMI can inform us of the impact of missing data on efficiency under several MAR variations. Next, we summarize the results of our main study, which uses the pseudo-population FMI to explore efficiency loss under a wide variety of MAR variations in regression models.

A simulated example
Suppose variable X predicts variable Y in a simple regression with β1 = .4, and half the data on Y are missing. With a sample size of N = 75 and complete data, the test of the null hypothesis that β1 = 0 will have 96% power. Power will be lower with missing data, but this reduction will differ based on the specific MAR variation. We demonstrate this by imposing missingness on Y as a function of the value of X in four different ways, consistent with the MCAR, MAR-L, MAR-NLT, and MAR-NLC mechanisms. These mechanisms are set to be deterministic for this example (p1 = 1, p2 = 0), with a missing rate of pmis = .5 on Y, which results in an overall 25% rate of missing data. The small sample size, high MAR strength, and relatively high missing rate were selected to showcase the differences among these mechanisms, and how we can predict these differences using pseudo-population FMI.
Table 1 shows the impact of missing data on power under the different MAR variations, simulated from 1,000 replications of size N = 75. Under MCAR, the power to reject the null hypothesis that β1 = 0 is reduced from 96% to 74.5%. Under MAR-L and MAR-NLT, the power is even more severely impacted, falling to 37.4% and 21%, respectively. However, under MAR-NLC, the power is essentially unaffected relative to complete data. Table 1 also shows the similar impact of the MAR variations on the standard error for β1 (averaged across 1,000 replications), which explains the differences in power. In real data, differences among mechanisms are likely to be less dramatic, because most MAR mechanisms should be probabilistic rather than deterministic. Nevertheless, this example shows that the potential impact of the MAR mechanism on power and efficiency can be large while the rate of missing data is held constant. To illustrate the usefulness of the FMI, we computed the pseudo-population FMI from the entire set of simulated data (N = 1,000,000) for each mechanism, and we also converted it to the related WIF measure. Table 1 shows that FMIs and WIFs vary considerably across the MAR variations, despite the constant rate of missing data. Their values are predictive of the power and average efficiency observed at N = 75, and they track closely the average relative efficiency values from the simulation (the ratio of the ESE in each condition to its value in the complete data condition).
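The logic of this example can be sketched with a rough Monte Carlo analogue (ours, in Python; it will not reproduce Table 1's FIML-based values). Instead of FIML, it uses complete-case OLS, which still yields consistent slope estimates here because missingness on Y depends only on the fully observed X, and it approximates each FMI by the empirical version of Equation 1, comparing the sampling variance of the slope with and without missing data under deterministic mechanisms with a 50% missing rate on Y:

```python
import numpy as np

rng = np.random.default_rng(3)
beta1, n, reps = 0.4, 300, 2000

def slope(x, y):
    """OLS slope of Y on X."""
    return np.cov(x, y)[0, 1] / x.var(ddof=1)

est = {"complete": [], "MCAR": [], "MAR-NLT": [], "MAR-NLC": []}
for _ in range(reps):
    x = rng.standard_normal(n)
    y = beta1 * x + rng.standard_normal(n) * np.sqrt(1 - beta1**2)
    q25, q75 = np.quantile(x, [0.25, 0.75])
    keep = {  # rows where Y remains observed
        "complete": np.ones(n, bool),
        "MCAR": rng.random(n) >= 0.5,
        "MAR-NLT": (x >= q25) & (x <= q75),  # tails of X deleted
        "MAR-NLC": (x < q25) | (x > q75),    # center of X deleted
    }
    for name, k in keep.items():
        est[name].append(slope(x[k], y[k]))  # complete-case OLS (consistent here)

var_c = np.var(est["complete"])
for name in ["MCAR", "MAR-NLT", "MAR-NLC"]:
    fmi = 1 - var_c / np.var(est[name])      # empirical analogue of Equation 1
    print(f"{name}: approximate slope FMI = {fmi:.2f}")
```

Under this sketch, the slope FMI is far larger under MAR-NLT than under MCAR, and far smaller under MAR-NLC, at the same missing rate, mirroring the ordering in the example above.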
We computed a single pseudo-population FMI estimate (from N = 1,000,000) in each condition, rather than estimating FMIs in each sample of N = 75, because FMI estimates require large samples to be accurate. Chen and Savalei (2021) recommended a minimum sample size of N = 200 for simple regression, and even at this sample size the variability of FMI estimates can be quite large. However, this simulated example illustrates that we can learn a lot about the general patterns of efficiency loss by using pseudo-population FMI, and the results will hold for all samples, including those that are too small to yield precise FMI estimates. For instance, pseudo-population FMI values revealed that MAR-NLC leads to much less information loss than MCAR, likely because MCAR deletes more observations from the tails of the X distribution, where the values of X are more informative, just by chance, whereas MAR-NLC avoids deleting data from the tails. This differential information loss is reflected in the power differences at N = 75. Thus, studying the behavior of pseudo-population FMIs under a wide variety of models and conditions can help methodologists understand the impact of different types of missing data on estimation, and discover properties of different variations of MAR missingness that may not be apparent otherwise. Below we summarize the results of the first such investigation, focusing on simple regression and two-predictor multiple regression.

Main study: Exploring the impact of MAR variations on efficiency
In this section, we report the results of our investigation of the behavior of pseudo-population FMIs under various scenarios of MAR missingness in the context of regression models. As a first study of its kind, it also serves as a demonstration of how methodologists can investigate MAR-induced efficiency loss using FMIs in other contexts as well. To aid our exploration of the impact of different MAR mechanisms, we created a Shiny application via the shiny package (Chang et al., 2021), which obtains the pseudo-population FMIs for a specified set of conditions for the studied models. This application has been made publicly available so that interested readers may examine additional results beyond what we report below. In the rest of this section, we will refer to the pseudo-population FMI simply as "FMI" for brevity.
Our study focuses on regression models with one or two predictors under the MCAR, MAR-L, MAR-NLT, and MAR-NLC variations while varying the following conditions: (1) which variable is the conditioning variable; (2) which variable contains missingness; (3) the strength of the MAR mechanism; (4) the values of the regression coefficients; and (5) in two-predictor regression, the correlation between the predictors.
The conditioning variable can be a predictor or the criterion. The same applies to the variable with missing data, with the added constraint that it cannot be the same variable as the conditioning variable. The strength of the MAR mechanism can be Deterministic (p1 = 1, p2 = 0), Probabilistic (p1 = .8, p2 = .2), or Probabilistic II (p1 = .9, p2 = .1). The regression coefficients can be set to any values between 0 and 1, with the restriction that the correlation matrix must be positive definite; the same rule applies to the correlation between the predictors in two-predictor regression. A summary of these options, all of which can be changed in the Shiny app, can be found in Table 2; further details on these options are provided in Appendix B. (The application is available at https://semlab.shinyapps.io/regfmi/; the source code can be found on OSF at https://osf.io/2srvz.) For each combination of conditions, we vary the missing rate on the variable with missingness from 0 to .8. Many of these options are not necessarily reflective of the typical data obtained from empirical studies, but they have been chosen to highlight interesting and informative contrasts that can help us understand the key factors underlying MAR-induced information loss under different conditions.
We now summarize key findings about the relationship between the missing data mechanism and the efficiency of parameter estimates (as measured by pseudo-population FMIs) in regression. Overall, the two primary factors that determine efficiency loss due to missingness are the missing rate and the MAR variation. All else being equal, and unsurprisingly, efficiency loss increases as the rate of missing data increases. However, the details of the MAR variation will affect the way efficiency loss increases as a function of increased missing rates. In most cases, efficiency loss on the regression slope follows this rank order, from highest to lowest: MAR-NLT, MAR-L, MCAR, MAR-NLC. This pattern holds for all but a few very specific conditions.

Table 2 (excerpt): β1, from 0 to 1 (default: .4), the slope of the first predictor; β2, from 0 to 1 (default: .4), the slope of the second predictor; cov(X1, X2), from 0 to 1 (default: .2), the covariance between the two predictors; β0, from −1 to 1 (default: 0), the intercept of the regression.
Beyond missing rates and MAR variations, a key factor is which variable contains missing data and which variable conditions (i.e., predicts) that missingness; we organize our main findings about the efficiency of the regression slopes under three scenarios based on this factor. In Scenario I, data are missing on the criterion Y in simple and multiple regression. For the MAR mechanisms in this scenario, the missingness is a function of the predictor X1 (without loss of generality). In Scenario II, data are missing on the predictor X1 as a function of the criterion Y, in simple and multiple regression. Scenario III applies specifically to multiple regression, where data are missing on X1 as a function of the other predictor, X2. Finally, we will discuss the FMIs for the regression intercept, which is typically of less theoretical interest than the regression slope and exhibits relatively simple behavior under the conditions of this study. Most findings are summarized in figures; Table 3 gives the specifications needed to reproduce all figures in this article in the Shiny app.

Scenario I: When data are missing on Y as a function of X1
The pattern of efficiency loss in estimates of the regression slope due to MAR is most straightforward when missingness occurs only on the criterion (Y) as a function of a predictor (X1). This scenario can occur naturally in substantive research: when X1 is hypothesized to be a cause of changes in Y in such a model, it is sometimes reasonable to assume that the missingness in Y is also predicted by X1, and that the missingness is otherwise random. This scenario also leads to the most straightforward set of results. In our investigation, we found that under this scenario the FMIs of the regression slopes are entirely unaffected by the strength of the coefficients; the slope coefficients were therefore arbitrarily set to β1 = β2 = .4. Of the three scenarios we explored, this property is unique to Scenario I.
Select results for slope FMIs in simple regression under Scenario I are presented in Figure 2. From Panel (a), we can see that while the FMI increases as a function of the missing data rate for each individual missing data mechanism (each line in the figure represents one mechanism), the FMI at each missing rate can vary drastically based on the mechanism. At every examined missing rate (pmis = 0.1, 0.2, 0.4, 0.6), the FMI of the slope estimate is smallest under MAR-NLC, followed by MCAR, then MAR-L, and finally MAR-NLT, which yields the greatest information loss. This replicates the findings from the simulated example reported in the previous section and generalizes them to a range of missing data rates. It should be noted that, although the FMI has been visualized for all missing rates, there are no missing data at pmis = 0, and the FMI there is always 0. Lastly, the slope FMI under MCAR takes on the same value as the missing rate on Y, namely, δβ1 = pmis; as we will later show, however, this result only holds for Scenario I, where missingness occurs only on the criterion Y.
The results above also generalize from deterministic MAR to probabilistic MAR, as shown in Figure 2 Panel (b). For this probabilistic condition, p1 = .8 and p2 = .2. Simply speaking, in such a probabilistic condition, missingness follows the conditioning variable only 80% of the time; details of implementation can be found in Appendix B under the Missingness subheading. This probabilistic mechanism can be seen as more realistic than the deterministic condition, as conditioning variables rarely fully determine naturally occurring missingness in substantive research. From Figure 2 Panel (b), we see that the ordering of the MAR mechanisms, in terms of their impact on efficiency, is the same as in the deterministic condition at pmis = .4 and pmis = .6, albeit with smaller differences among them; in fact, the FMI under all MAR conditions becomes closer to the MCAR condition with a weaker selection mechanism.
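Appendix B gives the authoritative implementation; what follows is our own sketch of one way to realize the cutoff-based MAR-L mechanism in both forms (Python/NumPy; variable names are assumptions). Deterministically, Y is deleted for the pmis proportion of cases with the largest X; probabilistically, cases above a cutoff are deleted with probability p1 = .8 and cases below it with probability p2 = .2, with the cutoff placed so that the expected overall missing rate stays at pmis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_mis = 100_000, 0.4
x = rng.standard_normal(n)
y = 0.4 * x + np.sqrt(1 - 0.4**2) * rng.standard_normal(n)

# Deterministic MAR-L (p1 = 1, p2 = 0): delete Y above the cutoff
cut = np.quantile(x, 1 - p_mis)
miss_det = x > cut

# Probabilistic MAR-L (p1 = .8, p2 = .2): the overall missing rate is
# q*p1 + (1 - q)*p2, where q is the share of cases above the cutoff,
# so we solve for q to keep the expected rate at p_mis.
p1, p2 = 0.8, 0.2
q = (p_mis - p2) / (p1 - p2)                 # = 1/3 here
cut_p = np.quantile(x, 1 - q)
miss_prob = rng.uniform(size=n) < np.where(x > cut_p, p1, p2)

print(round(miss_det.mean(), 2), round(miss_prob.mean(), 2))  # both ~0.4
```

MAR-NLT and MAR-NLC follow the same template, with two-sided cutoffs placed at the tails or around the center of the conditioning variable.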

Multiple regression under Scenario I
As we generalize from simple regression to multiple regression with two predictors, we make two notable new observations. The first is that the slope of X2 (the predictor that is not the conditioning variable) is not affected by the missing data mechanism. In fact, under Scenario I, the FMI of b2 is simply equal to the missing rate pmis under all conditions; for this reason we have not included a figure for these results, as each panel would show only four overlapping lines at FMI = pmis. We hypothesized that, because the slope b2 is a partial regression coefficient that partials out the conditioning variable X1, the effect of the specific MAR mechanism, which is a function of X1, is removed from the estimation of b2, leading to an efficiency loss equal to that under MCAR. The second new observation from multiple regression is that, while the FMI of b1 still does not vary with the strength of the regression coefficients, it now varies with the strength of the correlation between the predictors. Figure 3 shows the FMI of b1 under ρX1X2 = 0, 0.4, and 0.7 (shown in different rows). First, we can see from Panels (a) and (b) that, when ρX1X2 = 0, the FMI of b1 is exactly the same as in the simple regression. However, as ρX1X2 increases to 0.4 (the second row of the figure) and then to 0.7 (the third row), the FMIs under all MAR mechanisms move closer to MCAR.
As with b2, when the conditioning variable is partialled out of the estimation, the effect of the specific mechanism is removed. The estimation of b1 partials out X2, which is not the conditioning variable. However, as X1 and X2 become increasingly correlated, the effect of partialling out X2 begins to resemble the effect of partialling out the conditioning variable X1. Here we see an extension of the previous observation: when a variable correlated with the conditioning variable is partialled out, the stronger that correlation, the smaller the effect of the MAR variation on the efficiency loss.
Other patterns of results for multiple regression are generalizations of the same patterns from simple regression. First, the ordering of FMIs under the missing data mechanisms is preserved, with MAR-NLT leading to the most efficiency loss and MAR-NLC the least. Second, the FMI under MCAR is directly equal to the missing rate on Y (again, this is particular to Scenario I). Finally, in multiple regression, as in simple regression, a more probabilistic, and therefore weaker, MAR mechanism results in efficiency loss more similar to the MCAR condition. As pointed out earlier, MCAR is a special case of MAR with the weakest selection mechanism; as the strength of MAR weakens, its performance tends toward MCAR.
Scenario II: When data are missing on X1 as a function of Y

When missingness occurs on a predictor (X1) as a function of the criterion (Y), some behaviors of the FMI are similar to the previous scenario, but others become more complex. Importantly, FMI no longer remains invariant to the strengths of the regression coefficients in this scenario. To make room in the figures to illustrate this relationship, we now show only deterministic MAR mechanisms in the main result figures. As before, when the missing data mechanism is probabilistic, the differences in FMIs among missing data mechanisms become smaller; we present the deterministic MAR mechanisms to highlight the differences among them.
In the case of simple regression under Scenario II, the FMI of the slope coefficient varies with the strength of the coefficient itself. The panels of Figure 4 show how the FMI behavior changes as b1 takes on different values. To understand the difference between the two scenarios, consider that the regression coefficient is given by b1 = cov(X1, Y)/var(X1). Although the numerator of the estimate of b1 is affected by missing data and changes with the strength of the (linear) relationship between the two variables, when we can estimate the denominator var(X1) with complete data on X1 (i.e., under Scenario I), the effect of the relationship appears to be removed from the impact of missingness. Under Scenario II, however, the variance of X1 must also be estimated via FIML, which borrows information from the relationship between X1 and Y, i.e., b1. As a result, efficiency loss can vary drastically according to the strength of b1. The different patterns in the efficiency of the slope estimate under the two scenarios are thus a result of the interplay among the estimation of these quantities.
The interplay between the estimation of the numerator and denominator of b1 leads to the most striking result at b1 = .7, where the differences among the MAR mechanisms disappear and all mechanisms yield the same performance as MCAR; this can be seen in Panel (d) of Figure 4. However, as b1 continues to increase to 0.90 in Panel (e), the differences among the MAR mechanisms reemerge, and we once again see the same rank order of efficiency loss as before. Curiously, in simple regression, b1 = .7 corresponds to R² = b1² = .7² = .49, which is close to R² = .50, when X1 explains 50% of the variance in Y. To further investigate this relationship, we hold the missing rate of X1 constant at pmis = .4 and plot the slope FMI as a function of R² (see Figure 5). From this figure we can see that, for a given pmis, as R² approaches 0.5, the FMI converges to the same value for all missing data mechanisms. It would appear that, when X1 and Y share 50% of their variance and Y conditions the missingness on X1, the estimation of b1 = cov(X1, Y)/var(X1) becomes independent of the specific missing data mechanism and is affected only by the missing rate.
The simple regression example in this scenario serves as a salient reminder that FMI is not invariant to reparameterization (see Savalei & Rhemtulla, 2012). Conceptually, predicting X from Y can sometimes be considered identical to predicting Y from X, from the perspective of estimating the standardized slope (or when their variances are equal). However, with missing data, the efficiency loss for the regression coefficient can be drastically different between the two models. (Since our app currently only generates variables with variances of 1, and R² is a standardized metric, we conducted additional pseudo-population estimation and verified that the result holds for R² = .50 when variables have variances that are not 1.)

[Note to Figure 6: Missing rate on X1 is set to .4. ρX2Y: the correlation between X2 and Y. The line for each mechanism was produced by smoothing via the rlm function in R with the formula y ~ poly(x, 6); the smoothing, chosen visually, shows the overall trend in each mechanism and makes the connection to Figure 5 easier to see. To avoid non-positive definite matrices, ρX1Y, ρX2Y, and ρX1X2 values between 0 and .65 were used, which leads to fewer results and less accurate smoothing above ρ²X1Y·X2 = .5. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only deterministic MAR conditions are shown (p1 = 1 and p2 = 0). The mechanism of the simulated data points is denoted by their shapes, and the mechanism of the smoothed lines by the line type.]

Multiple regression under Scenario II
Multiple regression introduces additional complexity to Scenario II, as the FMIs of b1 and b2 now both vary with the strengths of b1 and b2, as well as ρX1X2. As before, we assign X1 to be the variable with missingness, without loss of generality; data on X2 and Y are complete. First, we focus on the FMI of b1, as it is most directly comparable to the FMI of b1 in the simple regression of this scenario. Recall that in the simple regression, as R² approaches 0.50, the FMIs of b1 under all mechanisms begin to take on the same value (Figure 5); i.e., the specific missing data mechanism does not affect the estimation of b1 when X1 explains 50% of the variance in Y. In multiple regression, the proportion of the variance in Y explained by X1 while controlling for X2 is the squared partial correlation, ρ²X1Y·X2; in other words, out of the variance in Y not explained by X2, this quantifies what percentage of that unexplained variance is explained by X1. This quantity is a function of b1, b2, and ρX1X2. When we compute the FMI of b1 under a range of values for b1, b2, and ρX1X2, we indeed find that the FMI of b1 under all mechanisms takes on the same value when the squared partial correlation ρ²X1Y·X2 = .50, as shown in Figure 6. This generalizes our earlier finding: in multiple regression with two predictors, when Y conditions the missingness in X1, the efficiency loss in b1 is unaffected by the missing data mechanism when X1 explains 50% of the variance in Y not explained by X2. However, there is one key difference as we generalize to multiple regression. In simple regression, once we know pmis and the exact mechanism, the FMI of b1 is fully determined by the proportion of variance in Y explained by X, i.e., R², for all MCAR and MAR mechanisms. In multiple regression, this is true only for MCAR. Under MAR conditions, ρX2Y, the correlation between Y and X2, plays an additional role in the FMI of b1, in that when ρX2Y is high, the FMIs under MAR mechanisms move slightly closer to the FMI under MCAR (as shown in Figure 6).
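The squared partial correlation that organizes these results can be computed directly from the population parameters. The helper below is our own (Python; the function name is an assumption), and it presumes unit-variance variables, as in the app; for example, it yields ρ²X1Y·X2 = .49 when b1 = .7, b2 = 0, and ρX1X2 = 0, matching the simple-regression special case R² = b1².

```python
import numpy as np

def partial_r2_x1(b1, b2, r12):
    """Squared partial correlation rho^2(X1, Y | X2) for the
    standardized two-predictor model Y = b1*X1 + b2*X2 + e,
    with corr(X1, X2) = r12 and unit variances throughout."""
    r1y = b1 + b2 * r12                      # corr(X1, Y)
    r2y = b2 + b1 * r12                      # corr(X2, Y)
    pr = (r1y - r12 * r2y) / np.sqrt((1 - r12**2) * (1 - r2y**2))
    return pr**2

print(round(partial_r2_x1(0.7, 0.0, 0.0), 2))  # 0.49: reduces to R^2 = b1^2
print(round(partial_r2_x1(0.4, 0.4, 0.4), 3))
```

A grid over b1, b2, and ρX1X2 with such a function is one cheap way to locate the ρ²X1Y·X2 = .50 surface on which, per Figure 6, the mechanisms coincide.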
Selected results for the FMI of b1 as a function of the missing data rate, for several values of the correlation between the predictors, the value of b1, and the value of b2, are shown in Figure 7. Although complex at first glance, these results can be explained through the findings shown in Figure 6: for any given missing data mechanism and missing rate, the FMI of b1 is determined primarily by the proportion of variance in Y unexplained by X2 that is explained by X1 (i.e., the squared partial correlation ρ²X1Y·X2), with some additional influence of the correlation between Y and X2 (ρX2Y). To demonstrate this, we first note that under the special case of b1 = 0 (the first row of the figure), FMI is unaffected by ρX1X2 (the solid lines and dashed lines always overlap in this row). This is because, when b1 = 0, the variance explained by X1 is always 0; therefore changes in ρX1X2 have no effect on the FMI. Staying on the first row and moving from Panel (a) to Panel (c), we see the additional influence of ρX2Y: although ρ²X1Y·X2 stays constant at 0 on this row, the increase in b2 is associated with an increase in ρX2Y (in fact, ρX2Y = b2 when b1 = 0), which pushes the FMIs under the different missing data mechanisms closer together.
For b1, another special case occurs at b1 = b2 = 0, where the different missing data mechanisms lead to the most varied FMIs (the lines in Figure 7 Panel (a) are the farthest apart). The main factor that makes the different mechanisms yield more similar FMIs is the strength of b1, because it is also the main factor that determines the strength of ρ²X1Y·X2. In Figure 7, the first, second, and third rows correspond to b1 = 0, 0.4, and 0.7; we can see that the lines are pushed closer together as we move down. As b1 increases from 0, ρ²X1Y·X2 rapidly approaches .5. This squeezing effect is especially notable when b2 and ρX1X2 are lower, because when X2 is less related to Y and X1, b1 more directly represents the proportion of variance in Y unexplained by X2 that is explained by X1. Moving from Panel (a) down the column to Panel (g), we see that when b2 = 0, b1 has a drastic effect on removing the differences among mechanisms.
An increase in the strength of b2 also brings the FMIs of b1 under the different mechanisms closer together, but its effect is weaker than that of a similar increase in b1; the first, second, and third columns correspond to b2 = 0, 0.4, and 0.7. We see that, in general, as the strength of b2 grows, the FMIs under the different conditions are pushed slightly closer together. However, there is one notable exception: as we move from Panel (g) to Panel (h), an increase in b2 pushes the lines apart instead of closer together. This is because the squared partial correlation between X1 and Y is already close to 0.5 in Panel (g) due to b1 being .7. When the strength of b2 increases under such a condition, the squared partial correlation grows beyond 0.5, leading to more distinct effects of missing data mechanisms, just as when R² increases above 0.5 in simple regression.
We now shift our focus to the FMI of b2, with selected results shown in Figure 8. At first glance, it may seem strange that the estimation of b2 is affected by missing data, because neither X2 nor Y contains missing data; however, b2 captures the conditional relationship between X2 and Y, where the variable that is being controlled for, X1, has missing data. Thus, missing data on X1 add uncertainty to the value of b2 because they add uncertainty to the estimate of the relationship between X1 and Y. When only X2 and Y are related, i.e., b1 = 0 and ρX1X2 = 0 (the solid lines with circles in the first row of the figure), the FMI of b2 is 0 regardless of the strength of b2 (because the FMIs are all 0, the lines overlap at the bottom of each panel). In this case, missingness occurs on X1, but X1 is unrelated to either Y or X2, and thus the relationship between X2 and Y is estimated with complete data. In comparison, when X1 and Y are related, i.e., b1 ≠ 0 (the solid lines with circles in the first column of the figure), some uncertainty is introduced into the estimation of b2, which grows with the strength of b1; but that uncertainty is not affected by the specific missing data mechanism (the solid lines with circles overlap). [Footnote: Since ρ²X1Y·X2 = 0 and ρX2Y = 0 when b1 = b2 = 0, this also corresponds to what we can observe in Figure 6, in that the darkest points at the leftmost of the figure show the greatest vertical distances.]
When X1 and X2 are correlated (the dashed lines with squares in the figure show ρX1X2 = .4), the pattern of the FMI of b2 becomes drastically different. The estimation of b2 is affected by the missing data mechanism through this correlation for all values of b1 and b2. In a sense, the estimation of the partial regression coefficient b2 is "contaminated" by the fact that X2 is related to a variable whose missingness is conditioned by Y. Interestingly, MAR-NLC fares particularly well under this condition, experiencing very little efficiency loss, especially when b2 is not high. The only cases where ρX1X2 matters less to the increase in FMI are when b1 is high (the solid and dashed lines are closer together in Panels (g) and (h)). When X1 is correlated with Y, the model is better able to borrow information from Y to deal with the missingness in X1, and the correlation between X1 and X2 matters less. Overall, under Scenario II, because the estimation of b2 is indirectly affected through the relationships among the variables, its efficiency loss is more complex than that of b1 and cannot simply be reduced to the proportion of variance explained.
In substantive research, Scenario II can arise when data are available only for a subset of participants, restricted by a certain variable, but we are interested in studying what predicts that variable. Suppose we have access to all SAT scores of applicants to an undergraduate program, and these scores were used to decide acceptance; further suppose we are only able to ask those who were accepted how much time and money they spent preparing for the SAT. In this case, preparation effort predicts the SAT score, but the SAT score determines the missingness on preparation effort. The same applies to studies of clinical samples that are selected strictly based on a diagnostic criterion for a certain illness, where potential causes of that illness are subsequently studied in that sample. Additionally, it is possible to perform regression in cases where the causal direction is not clear. As a result, there may be situations where, although X causes both Y and the missingness in Y, a researcher instead conducts a regression analysis predicting X from Y, inadvertently producing Scenario II. Our investigation of FMI sheds light on this scenario's unusual consequences for the efficiency of parameter estimates.
Scenario III: When data are missing on X1 as a function of X2

In multiple regression, data can be missing on one predictor (let it be X1, without loss of generality) as a function of another predictor, X2; this leads to yet another distinct set of FMI behaviors compared to the previous two scenarios. As before, we first focus on the FMI of b1, shown in Figure 9. We immediately see that under Scenario III, the FMI of b1 is the same across the different missing data mechanisms; this causes the lines for the different mechanisms to overlap, so that each line represents the FMI of b1 under all missing data mechanisms (solid line: ρX1X2 = 0; dashed line: ρX1X2 = .4).
Here, the partial regression coefficient b1 partials out the conditioning variable X2, resulting in an FMI of b1 that is unaffected by the MAR variation. This finding is similar to what we observed in the multiple regression under Scenario I, where the FMI of b2 was equal among all missing data mechanisms because the conditioning variable X1 was partialled out in the estimation of b2. The main difference from Scenario I is that, although the FMI is unaffected by the missing data mechanism here, it is affected by the strengths of the regression coefficients: the FMI equals pmis when b1 is 0 but decreases as b1 increases. The invariance of FMI to the strengths of the slopes applies only to Scenario I.
The impact of missingness on b2 in this scenario is not as strongly driven by the relationship between X1 and X2 as in Scenario II (see Figure 10), because X2 now contains the full information on how X1 is missing, regardless of the relationship between them. Instead, the efficiency loss in b2 is primarily due to the need to partial out the influence of X1, and the main driver of this impact is the strength of b1. In the first row of the figure, we see that when b1 = 0, the FMI of b2 tends to be low. When b1 = 0 and ρX1X2 = 0, the FMI is again always 0, as in Scenario II (solid line on the first row): b2 is estimated with complete data when the variable containing missingness is related to neither X2 nor the criterion Y. In contrast to Scenario II, because X2 is now the conditioning variable, the FMI of b2 no longer increases drastically along with the magnitude of ρX1X2. Instead, the FMI of b2 increases, and becomes more varied among missing data mechanisms, as b1 grows. From this scenario, we see that the patterns of FMI behavior become less complex outside of the less natural setup of Scenario II, though they are not always as simple as in the special case of Scenario I.

Efficiency loss on the intercept estimate
The intercept is generally of much less research interest than the slope in substantive studies. Here we provide some select results on the intercept in order to demonstrate how much the impact of missing data can differ from one parameter to another (slopes, intercepts, means, covariances, etc.), and that patterns observed in one parameter do not necessarily generalize to another. To do so, we focus on the intercept FMI in the simple regression of Scenario I, shown in Figure 11. In this figure, the lines for MAR-NLC, MAR-NLT, and MCAR overlap completely; that is, although these three mechanisms varied greatly in the information they contribute to the estimation of the regression slope, they lead to the same amount of information loss in the intercept at the same missing rate. Under MAR-L, however, the FMI is much higher than under the other mechanisms.
A main reason for the difference in the impact of missing data on the regression slope and intercept observed in this study is that, in our scenarios, X always has a population mean of 0 (the value of the intercept is manipulated by changing the population mean of Y). Since the center of the regression line is on the Y-axis, the intercept is essentially a mean estimate of Y, unrelated to the slope (b0 = Ȳ − b1X̄1, which means that when X̄1 = 0, b0 = Ȳ). Therefore, with centered predictors, the deletion of data that affects the slope estimate does not necessarily affect the efficiency of the intercept estimate; instead, that efficiency primarily depends on the estimation of the mean of Y. In a preliminary investigation not reported here, we found that when X was not centered, MAR-L could become more efficient than MAR-NLC for certain values of the mean of X. Since it is common practice to center predictors in regression, this example is relevant to substantive research as it stands. We do not further investigate the effect of changing the mean of X in this article; the ability to manipulate the means of the predictors will be available to interested readers in a future version of the Shiny app.
The MAR-NLC and MAR-NLT mechanisms in our study follow conditioning relationships that are symmetric around the mean of X. These mechanisms lead to missing data that are, on average, symmetric around the mean of Y. Consequently, the observed mean of Y is not biased by the missing data. In contrast, MAR-L heavily affects the observed mean because the data loss occurs only in one tail, which leads to higher efficiency loss on the intercept under MAR-L than under MAR-NLC and MAR-NLT. This finding also extends to multiple regression: regardless of whether the conditioning variable is Y, X1, or X2, when the predictors are centered and the MAR mechanism is symmetric around them, the MAR variation does not affect the efficiency loss of the intercept.
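The symmetry argument can be illustrated numerically. In the sketch below (Python/NumPy for illustration; the cutoff placements and names are our own, following the cutoff-based mechanisms used throughout), deleting 40% of the Y values in one tail of X (MAR-L) visibly shifts the observed mean of Y, while deleting the same proportion symmetrically at both tails (MAR-NLT) or around the center (MAR-NLC) leaves it essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(7)
n, b1, p_mis = 200_000, 0.4, 0.4
x = rng.standard_normal(n)                     # centered predictor
y = b1 * x + np.sqrt(1 - b1**2) * rng.standard_normal(n)

# Deterministic mechanisms, each deleting p_mis of Y as a function of X;
# obs_* flags the cases whose Y remains observed
obs_marl = x <= np.quantile(x, 1 - p_mis)      # MAR-L: delete upper tail
lo, hi = np.quantile(x, [p_mis / 2, 1 - p_mis / 2])
obs_nlt = (x > lo) & (x < hi)                  # MAR-NLT: delete both tails
mlo, mhi = np.quantile(x, [0.5 - p_mis / 2, 0.5 + p_mis / 2])
obs_nlc = (x <= mlo) | (x >= mhi)              # MAR-NLC: delete the center

print(round(y[obs_marl].mean(), 2))  # clearly below 0: asymmetric deletion
print(round(y[obs_nlt].mean(), 2))   # ~0: symmetric deletion
print(round(y[obs_nlc].mean(), 2))   # ~0: symmetric deletion
```

Since the intercept with centered predictors reduces to the mean of Y, the shifted observed mean under MAR-L is exactly the extra burden the model must recover, consistent with its higher intercept FMI.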
Lastly, by comparing Panels (a) and (b) of Figure 11, we can see that MAR-L leads to an FMI more similar to MCAR under the probabilistic condition. Although the ordering of the MAR variations does not hold for the intercept estimates in this setup, we still observe that the weaker MAR performs more similarly to MCAR than does the deterministic MAR.

General discussion
[Note to Table 4: b1's sensitivity to the missing data mechanism diminishes as the proportion of variance in Y explained by X1, controlling for the other variables, approaches 50%, i.e., R² = .5 in simple regression or ρ²X1Y·X2 = .5 in multiple regression.]

When data are MAR, the efficiency loss for parameter estimates can differ greatly depending on the specific MAR variation, even when the missing rate remains the same. This dependency can lead to unexpected consequences for the precision of estimation and the power of statistical inferences. In current practice, only the rate of missing data is typically reported, as a coarse measure of the impact of missing data on the loss of precision and power. In recent years, psychologists have become increasingly concerned about the dangers of conducting research with low statistical power (Ioannidis, 2005). In this article, we demonstrated
how methodologists can use pseudo-population FMIs to learn about patterns in efficiency loss that result from different ways of generating missing data. The study of more counterintuitive and less predictable effects of missing data on efficiency loss can help make future research on missing data techniques more informative. In this article, we reported the results of the first such study. Specifically, we studied the behavior of FMI in simple regression and in two-predictor multiple regression under a wide variety of scenarios; a summary of key results can be found in Table 4. We found that the amount of efficiency loss due to MAR is determined by a complex interplay among a multitude of factors, such as the missing rate, where the missingness occurs, which variables condition the missingness, the conditioning relationship, and the true parameter values.
Overall, the complexity of our findings highlights the importance of computing FMIs for the model parameters of interest when assessing the impact of MAR data. Existing simulation studies on missing data techniques (including those conducted by us) typically do not vary all of these factors, and the generalizability of their findings could therefore be limited. For example, a researcher may conclude on the basis of a simulation study that a new estimation method yields good performance with MCAR and MAR data, but if the generated MAR data lead to low FMI, as in the MAR-NLC condition studied here, the results probably do not generalize to more sinister kinds of MAR. We hope our research prompts methodologists to move beyond the simple categorization of ignorable missing data mechanisms as MCAR or MAR in simulation studies.
Using FMI, we were able to demonstrate various important properties of missing data. Some of these are relatively well known, for example, that weak MAR mechanisms are more similar to MCAR than are strong MAR mechanisms. In simulation studies involving missing data, it is helpful to include a strong MAR mechanism to contrast the largest potential differences, while weaker MAR can serve to establish what more realistic differences may be; some missing data researchers already follow this practice (e.g., Chen et al., 2020; Mazza et al., 2015; Sullivan et al., 2018). We also demonstrated that MAR-NLC can yield better performance than MCAR, a fact that has long been implicitly leveraged by the sampling approach called the extreme groups design (Feldt, 1961; Fisher et al., 2020; Preacher et al., 2005), with its connection to missing data only more recently discussed in psychology (Chen & Fouladi, 2022). We were also able to make some less commonly known observations by studying FMI. For example, while the FMI of the regression slope is heavily affected by whether the missing data are highly informative of the multivariate relationships, the FMI of the regression intercept with centered predictors was mainly affected by the symmetry of the missing data. Although intercepts are generally of less theoretical interest in regression, they can often be focal parameters in multilevel modeling. In those contexts, asymmetry in the missing data distribution may therefore pose a greater threat to the power of the analysis than the deletion of data at the tail ends of the distribution.
In general, efficiency loss due to the four missing data mechanisms almost always decreased in the same order for the regression coefficient: MAR-NLT, MAR-L, MCAR, MAR-NLC. The only exceptions were occasions where efficiency loss was insensitive to the exact missing data mechanism. However, very specific circumstances were required for the mechanism not to matter, and understanding these requirements can be enlightening. Specifically, in two-predictor multiple regression, we found that when the estimation of a regression coefficient partials out the variable conditioning the missingness, the specifics of the missing data mechanism do not affect the FMI beyond the missing rate. This can occur directly, such as when estimating b2, which partials out X1, while X1 is the conditioning variable; or indirectly, such as when estimating b1, which partials out X2, while X1 is the conditioning variable but X1 and X2 are highly correlated. In an earlier study of FMI (Chen & Savalei, 2021), we found that, in the context of a two-factor model, the FMI of the factor correlation was insensitive to the missing data mechanism when missingness on the indicators was exclusively conditioned by indicators within the same factor. That is, conditioning relationships involving indicators of the same factor did not affect the estimation of the between-factor correlation. Here, we found that even when the parameter is associated with a variable with missingness (e.g., b1 and X1 containing missingness), the effect of the conditioning relationship can be removed from the estimation if the estimated coefficient itself captures a relationship conditioning on the same variable. Future research should investigate whether this finding generalizes, for example, to structural equation models where latent variables predict each other.
It is worth emphasizing that the impact of missing data is heavily affected by parameterization, and that findings under one particular model may not hold under another model, even when the two models are conceptually similar. For example, one may consider adding a second predictor to the regression to be the same as adding an auxiliary variable in missing data analysis (e.g., see Collins et al., 2001), and interpret our findings as directly applicable to the latter. However, here the second predictor is an integral part of the model and is involved in the definition of the parameter of interest (e.g., the regression coefficient of X1 predicting Y, controlling for X2), whereas the traditional conceptualization of auxiliary variables involves adding them to a model in a way that does not distort the meaning of any parameters (e.g., the "saturated correlates" model of Graham, 2003). It is therefore difficult to determine which of our results would generalize to auxiliary variables, given this sensitivity to reparameterization. For example, we found that a high correlation between the two predictors could in fact lead to higher efficiency loss under MAR-NLC (see Figure 3); this finding may not generalize to the inclusion of an auxiliary variable with a MAR-NLC mechanism.

Implications for substantive researchers
For substantive researchers, we recommend reporting sample FMI estimates for key parameters as a diagnostic measure of estimated efficiency loss when performing missing data analysis. With the computation of FMI automated in the lavaan package (v0.6-9) in R, this measure is readily available to substantive researchers and methodologists. This additional diagnostic can inform researchers whether the original power planning of the study may have been jeopardized by missing data in the final sample. However, a relatively large sample size may be needed: an earlier study investigated the finite-sample properties of FMI estimates obtained under FIML and recommended a sample size of at least N = 200, and preferably N = 500, to obtain accurate FMI estimates in practice (Chen & Savalei, 2021); that paper also includes a tutorial on how to obtain sample FMIs using lavaan (Rosseel, 2012).
While we recommend using FMI and WIF to assess the impact of missing data, it is important to caution against any attempt to "undo" the effect of missingness in real data by applying these measures. For example, one may be tempted to use WIF to "correct" the CI of a parameter computed from real data to what the CI would have been under complete data. This would be a misuse of FMI. In the presence of missingness, the CI is widened to reflect the fact that the precision of the point estimate has been lowered due to missing data. Shrinking the CI using WIF would not change this loss of precision; instead, this "corrected" CI would simply put undue confidence in an imprecise estimate. Similarly, a high FMI does not suggest that a nonsignificant result would have been significant had there been no missing data. We recommend that researchers interpret a high FMI as follows: the precision of the point estimate has been heavily affected by the presence of missing data, and should the researchers wish to conduct a similar study, they should either plan for a larger sample size in order to overcome the efficiency loss in case similar missingness occurs, or take steps to mitigate data loss (e.g., by using planned missing data designs to lower the burden on participants and thereby reduce attrition).
Beyond sample FMI estimates, substantive researchers can also, with the help of methodologists, adopt pseudo-population FMI to assist in power planning and study design. Because pseudo-population FMI can be estimated at a low computational cost, it can be used to quickly compare efficiency loss under a plausible range of scenarios and missing data mechanisms for a particular context. In particular, pseudo-population FMI can be helpful in searching for the optimal design in planned missing data studies (Brandmaier et al., 2020; Graham et al., 2006; Wu et al., 2016).

Future directions
Since our study was the first exploratory investigation of the impact of MAR variations on FMIs and efficiency, we focused on two regression models and four cutoff-based missing data mechanisms. Future research can expand on our initial findings in several ways. First, we used the cutoff approach to produce our MAR mechanisms because of its versatility: it is straightforward to define such a mechanism under a wide range of missing rates and selection strengths for both linear and nonlinear conditioning relationships. In future investigations, other conditioning functions should be explored to study new MAR variations, such as MAR with logistic conditioning functions (Anderson, 2021; Mazza et al., 2015; Yucel et al., 2011). Second, the effect of the number of missing data patterns can also be studied by creating MAR variations with multiple variables containing missingness, and varying whether the data are missing jointly or independently for each variable. Third, larger models, where the conditioning relationships are defined by a combination of variables, and with missingness occurring on multiple variables, can be studied. These larger models include regression with interaction terms, factor analysis models, more complex structural equation models, and multilevel models. Lastly, we have examined FMI defined for the FIML method under multivariate normality; extensions of FMI that use standard errors robust to nonnormality (Satorra & Bentler, 1994) should be studied.

Open Scholarship
This article has earned the Center for Open Science badges for Open Materials through Open Practices Disclosure. The data and materials are openly accessible at https://osf.io/2srvz. To obtain the author's disclosure form, please contact the Editor.

Appendix A
In this appendix, we provide a detailed description of the simulation example. The model used in the simulation was a simple regression model in which X and Y each followed the standard normal distribution. One thousand complete datasets of size N = 75 were generated from this population. Incomplete data were created from each complete dataset in four different ways, corresponding to the missing data mechanism conditions: MCAR, MAR-L, MAR-NLT, and MAR-NLC. In each condition, X was fully observed, while 50% of the values were missing on Y; the missing rate on Y, represented by p_mis = .50, is therefore constant across the conditions. Similarly, the overall missing rate in the dataset containing Y and X is 25%, which is the same across all conditions. In the MCAR condition, 50% of the values of Y were randomly selected to be missing. In every MAR condition, X was the conditioning variable for the missingness on Y. The MAR mechanisms were produced using the deterministic cutoff approach, by applying the step functions depicted in Figure 1.
For the MAR-L condition, each Y value was missing with a conditional probability of p1 = 1 if X > 0, and with a conditional probability of p2 = 0 if X ≤ 0. This created a deterministic MAR mechanism, where data were missing on Y for those simulated participants whose X values were in the upper half of the distribution of X (see Figure 1). For the MAR-NLT condition, Y was missing with probability p1 = 1 when |X| > .674, and with probability p2 = 0 otherwise. This mechanism led to missing data on Y for those participants whose values on X were in both tails of the distribution of X. For the MAR-NLC condition, Y was missing with probability p1 = 1 when |X| < .674, and with probability p2 = 0 otherwise. In contrast to MAR-NLT, this mechanism led to missing data on Y for those participants whose X values were at the center of the distribution of X. Because X and Y were only correlated at β1 = .4, missing values on Y occurred in the entire distribution of Y under all four mechanisms, but the exact distribution depends on the MAR variation; e.g., missing values were more likely to occur in the tails of the distribution of Y under MAR-NLT than under MAR-NLC.
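As an illustration, the four deterministic mechanisms can be generated with the cutoff approach as follows. This is a minimal sketch in Python rather than the R code used for the simulations; the seed and sample size here are our own choices, made large so the empirical missing rates land close to the targeted 50%:

```python
import numpy as np

rng = np.random.default_rng(2021)    # arbitrary seed for reproducibility
N = 100_000                          # large N so empirical rates are close to .50
CUT = 0.674                          # ~75th percentile of N(0,1): P(|X| > .674) ≈ .50
B1 = 0.4                             # population slope

x = rng.standard_normal(N)
y = B1 * x + np.sqrt(1 - B1**2) * rng.standard_normal(N)  # Y is standard normal too

# Deterministic cutoff mechanisms: Y is missing with probability 1 where the
# condition on X holds, and with probability 0 otherwise.
miss = {
    "MCAR":    rng.random(N) < 0.5,   # independent of X
    "MAR-L":   x > 0,                 # upper half of X
    "MAR-NLT": np.abs(x) > CUT,       # both tails of X
    "MAR-NLC": np.abs(x) < CUT,       # center of X
}

for name, m in miss.items():
    print(f"{name}: missing rate on Y = {m.mean():.3f}")
```

Each mechanism deletes close to half of the Y values, differing only in *which* half of the distribution of X the deletions condition on.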
We focus on the estimates of the slope in the analysis; results pertaining to the intercept can be found along with the simulation code on OSF. Since comparisons of efficiency are difficult to interpret in the presence of bias, we first checked the bias in the estimates across conditions by computing the average estimated slope in each condition; we expected these estimates to be largely unbiased, since they were obtained via FIML under MAR. To compare efficiency, we computed the empirical standard error (ESE), which is the empirical standard deviation of the slope estimates in each condition; a larger ESE indicates a less efficient parameter estimate. To make the efficiency comparisons more interpretable, we also computed relative efficiency (RE) using complete data as a baseline. For each missing data condition, its RE is obtained by taking its ESE and dividing it by the ESE obtained under complete data. The complete data condition therefore has an RE of 1, and a higher RE indicates worse efficiency relative to the complete data condition. In order to obtain FMI estimates using FIML, regression analyses were performed as path model analyses in lavaan (Rosseel, 2012) version 0.6-9, in R version 4.1.0 (R Core Team, 2019). To ensure that the FMI estimates computed from samples of N = 75 are reasonably unbiased, we also simulated the population FMIs using a single-replication simulation with a sample size of N = 1,000,000.
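The ESE and RE computations can be sketched as follows. This is a hedged Python illustration, not the R code used in the paper; it exploits the fact that, with X fully observed and missingness confined to Y, the FIML estimate of the slope in this model coincides with the complete-case OLS estimate, so no dedicated FIML routine is needed:

```python
import numpy as np

rng = np.random.default_rng(75)      # arbitrary seed
N, REPS, B1, CUT = 75, 1000, 0.4, 0.674

def slope(x, y):
    # OLS slope of y on x over the supplied cases. With missingness only on y
    # and x fully observed, this equals the FIML estimate of the slope.
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

est = {k: [] for k in ("complete", "MCAR", "MAR-L", "MAR-NLT", "MAR-NLC")}
for _ in range(REPS):
    x = rng.standard_normal(N)
    y = B1 * x + np.sqrt(1 - B1**2) * rng.standard_normal(N)
    est["complete"].append(slope(x, y))
    masks = {
        "MCAR":    rng.random(N) < 0.5,
        "MAR-L":   x > 0,
        "MAR-NLT": np.abs(x) > CUT,
        "MAR-NLC": np.abs(x) < CUT,
    }
    for name, m in masks.items():
        est[name].append(slope(x[~m], y[~m]))   # analyze observed cases only

ese = {k: float(np.std(v, ddof=1)) for k, v in est.items()}  # empirical SE
re = {k: ese[k] / ese["complete"] for k in ese}              # RE (complete = 1)
```

With this setup the ordering of the REs reproduces the qualitative pattern reported below: MAR-NLT is the least efficient, MAR-NLC the most efficient, with MCAR and MAR-L in between.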
In all conditions, the estimates of the regression coefficient were relatively unbiased over the 1,000 simulated samples, as expected, with means of 0.402, 0.394, 0.388, 0.415, and 0.400 for the complete data, MCAR, MAR-L, MAR-NLT, and MAR-NLC conditions, respectively, compared to the population value of .4. However, the efficiency of these estimates differed across the conditions, as shown in Table 1 of the manuscript. Despite the same amount of missing data on Y (50%) in all conditions, the MAR-NLT condition produced the most inefficient estimates, whereas MAR-NLC produced the most efficient slope estimates, performing even better than MCAR. Note that only under MCAR is the FMI essentially the same as the rate of missing data. In the other conditions, FMI reflects the proportion of missing information, which is highest for MAR-NLT and lowest for MAR-NLC. The discrepancy in efficiency or information loss across the missing data mechanisms translates into large differences in statistical power when testing the null hypothesis H0: β1 = 0 using α = .05. Empirical power, estimated as the percentage of replications that yielded a significant result, was very high for MAR-NLC, at 94.8%, and abysmally low for MAR-NLT, at 21.0%.
In order to examine whether these differences in efficiency loss in the small samples are reflected by the pseudo-population FMI values, we obtained these values from a single sample with N = 1,000,000. The pseudo-population FMIs were 0.499, 0.819, 0.928, and 0.072 for MCAR, MAR-L, MAR-NLT, and MAR-NLC, respectively. Converting these FMI values yields pseudo-population WIFs of 1.413, 2.350, 3.727, and 1.038 under the four mechanisms, respectively. These WIFs closely tracked the REs across the conditions, which were 1.497, 2.551, 4.125, and 1.051, showing that we can predict missing data behavior at the finite-sample level using the pseudo-population FMI.
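The conversion from FMI to WIF can be reproduced with a one-line function. This is a Python sketch; the relation WIF = 1/sqrt(1 − FMI) is the one implied by the numbers above, and tiny discrepancies in the last digit can arise because the printed FMIs are themselves rounded:

```python
import math

def wif(fmi):
    """Width inflation factor implied by a fraction of missing information.

    If missing data inflate the standard error by 1/sqrt(1 - FMI) relative
    to complete data, a confidence interval widens by the same factor.
    """
    return 1.0 / math.sqrt(1.0 - fmi)

fmis = {"MCAR": 0.499, "MAR-L": 0.819, "MAR-NLT": 0.928, "MAR-NLC": 0.072}
for name, f in fmis.items():
    print(f"{name}: FMI = {f:.3f} -> WIF = {wif(f):.3f}")
```

Note how an FMI of 0.928 nearly quadruples the CI width, while an FMI of 0.072 leaves it almost unchanged.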
The application includes four missing data mechanisms: MCAR, MAR-L, MAR-NLT, and MAR-NLC. The MAR mechanisms use step functions as their conditioning functions (see Figure 1); logistic functions are not included because there are no straightforward methods to use them to generate nonlinear MAR mechanisms while controlling for the overall missing rate. In all conditions, the variable with missingness takes on missing rates of p_mis = 0, .10, .20, .40, and .60. We refer to the conditional probability that V_m is missing given that the relevant condition, involving values of V_c (see below), has been met as p1, and to the conditional probability that V_m is missing given that the condition has not been met as p2. For simplicity (and to make the mathematics of controlling the overall rate of missingness easier), we set p2 = 1 − p1 in all simulations, but this is not a required constraint on these MAR variations more generally.

MAR strength
The strength of the MAR mechanisms can be set within the application to be deterministic, or probabilistic at two levels (referred to as Probabilistic and Probabilistic II). For deterministic MAR, which is the default setting, p1 = 1 and p2 = 0; in other words, the value of V_c fully determines whether the value of V_m is missing or observed. For Probabilistic MAR, p1 = .8 and p2 = .2; for Probabilistic II MAR, p1 = .9 and p2 = .1. The cutoffs for the conditioning rules are determined by the app to ensure that the overall rate of missing data on V_m remains set to p_mis (see below). We note that under Probabilistic MAR, all MAR mechanisms become MCAR when the overall p_mis = .20. The cutoffs for the step functions are completely determined for any given set of p_mis, p1, and p2. For example, for a MAR-L mechanism where p_mis = .5, p1 = 1, and p2 = 0, the cutoff must be 0; that is, V_m is missing with probability p1 = 1 if V_c > 0 and with probability p2 = 0 otherwise. We refer to the percentile that corresponds to the cutoff value as p_c; for the above example, p_c = 50%. These values can be derived algebraically. For example, for MAR-L, the overall missing rate and the conditional missing rates are related as follows: p_mis = (1 − p_c) p1 + p_c p2. Therefore, to achieve the overall missing rate of p_mis, values in V_m that correspond to V_c above the p_c-th percentile are each assigned a probability of p1 to be missing, whereas values in V_m that correspond to V_c at or below the p_c-th percentile are each assigned a probability of p2 to be missing. For simplicity, we set MAR-NLT and MAR-NLC to be symmetrical. For MAR-NLT, each data point on V_m has a missing probability of p1 where V_c falls below the (.5 − .5p_c)-th percentile or above the (.5 + .5p_c)-th percentile, and a missing probability of p2 otherwise. For MAR-NLC, each data point on V_m has a missing probability of p1 where V_c falls between the (.5p_c)-th and the (1 − .5p_c)-th percentiles, and a missing probability of p2 otherwise.
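The algebra for MAR-L can be sketched in code. This is a Python illustration with a hypothetical helper name (`mar_l_cutoff` is ours, not the app's); the cutoff value assumes a standard normal V_c, as in the paper's simulations:

```python
import math
from statistics import NormalDist

def mar_l_cutoff(p_mis, p1, p2):
    """Percentile p_c and standard-normal cutoff for a MAR-L mechanism in
    which V_m is missing with probability p1 when V_c exceeds the cutoff
    and with probability p2 otherwise, so that
        p_mis = (1 - p_c) * p1 + p_c * p2.
    """
    p_c = (p1 - p_mis) / (p1 - p2)          # solve the identity above for p_c
    if p_c < -1e-9 or p_c > 1 + 1e-9:
        raise ValueError("p_mis is not attainable with these p1 and p2")
    p_c = min(max(p_c, 0.0), 1.0)           # clamp floating-point noise
    if 0.0 < p_c < 1.0:
        cutoff = NormalDist().inv_cdf(p_c)  # cutoff on a standard normal V_c
    else:
        cutoff = math.copysign(math.inf, p_c - 0.5)
    return p_c, cutoff

# Deterministic MAR-L with a 50% missing rate: the cutoff is the median (0).
print(mar_l_cutoff(0.5, 1.0, 0.0))
# Probabilistic MAR (p1 = .8, p2 = .2) with p_mis = .2: p_c = 1, i.e., the
# condition is never met and the mechanism degenerates to MCAR at rate .2.
print(mar_l_cutoff(0.2, 0.8, 0.2))
```

The second call illustrates the earlier remark that, under Probabilistic MAR, every MAR mechanism collapses to MCAR when p_mis = .20.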

Pseudo-population FMI estimation
The FMIs are computed using the lavaan package (v 0.6-9) in R under a single replication with a large sample size. In order to obtain FMI estimates using FIML, regression analyses were performed as path model analyses in lavaan (Rosseel, 2012) version 0.6-9 with the option missing = "ml", in R version 4.1.0 (R Core Team, 2019). In order to handle conditions with missing data on the predictors, we treat predictors as random in lavaan using the option fixed.x = FALSE. This deviates slightly from traditional regression, which treats predictors as fixed. The default in lavaan sets fixed.x to TRUE, which leads to listwise deletion being automatically performed. By default, our application conducts a single replication with a sample size of N = 5,000, which allows for a quick exploration of various conditions but does not produce very accurate outputs. The option N = 25,000 provides reasonably precise estimates. An additional in-between option of N = 10,000 is also available. Due to concerns about server running time, larger N options are not available in the online version of the application. However, the open-source version of the application can be downloaded from the OSF and run locally, which provides the option of N = 1,000,000; programming-savvy researchers can also set N to any arbitrary number via the open-source code. The FMIs reported in the paper are computed with N = 1,000,000.

Figure 1 .
Figure 1. A few variations of MAR. Note. The left panels show the conditioning functions of deterministic MAR with p_mis = .5. For simplicity, the nonlinear MARs are set to be symmetrical around 0. The right panels show how these missing data mechanisms affect the bivariate distribution of X and Y when ρ_XY = .7; the marginals of the right-panel figures show histograms of the observed values of Y.
(Table header row: Figure, Panel, Regression model, β1, β2, cov(X1, X2), V_mis, V_con, Scen., MAR strength.)

Figure 2 .
Figure 2. Scenario I. Pseudo-population FMI of β1 in a simple regression when X1 conditions missingness on Y. Note. The figure shows how probabilistic MAR performs more similarly to MCAR compared to deterministic MAR. β1 = .4 and β0 = 0; changing these values does not affect FMI. Y contains missingness conditioned by X1. MCAR: missing completely at random; MAR: missing at random; MAR-L: linear MAR; MAR-NLC: nonlinear MAR missing at the center; MAR-NLT: nonlinear MAR missing at the tails; Deterministic MAR: p1 = 1 and p2 = 0; Probabilistic MAR: p1 = .80 and p2 = .20. Note that probabilistic MAR is the same as MCAR when p_mis = .20.

Figure 3 .
Figure 3. Scenario I. Pseudo-population FMIs of β1 in a two-predictor multiple regression when X1 conditions missingness on Y. Note. The figure shows how MAR performs more similarly to MCAR as ρ_X1X2 increases and as MAR becomes more probabilistic under Scenario I. The strengths of the βs do not affect FMIs. ρ_X1X2: the covariance between the two predictors; MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center; Deterministic MAR: p1 = 1 and p2 = 0; Probabilistic MAR: p1 = .80 and p2 = .20.

Figure 4 .
Figure 4. Scenario II. Pseudo-population FMI of β1 in a simple regression when Y conditions missingness on X1. Note. The figure shows how FMI changes with β1 in Scenario II. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. The grey dotted line shows where FMI = p_mis; any point above this line shows FMI > p_mis, and any point below shows FMI < p_mis. Only the deterministic MAR conditions are shown; probabilistic MAR conditions display similar patterns. The missing rate on X1 is set to p_mis = .4. Panel (e) shows a reversal of the trend from β1 = 0 to β1 = 0.7, as β1 grows to .75, .8, .85, and so on; it is not a special property of β1 = .9. See Figure 5 for more on this.

Figure 5 .
Figure 5. Pseudo-population FMI of β1 as a function of R² in simple regression under Scenario II, where Y conditions missingness on X1. Note. The figure shows how, when R² = .5 in a simple regression under Scenario II, the slope FMI becomes equal under all missing data mechanisms for a given missing rate. The missing rate on X1 is set to p_mis = .4. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only the deterministic MAR conditions are shown; probabilistic MAR conditions display similar patterns.

Figure 6 .
Figure 6. The relationship between the FMI of β1 and ρ_X1Y·X2 in multiple regression with two predictors under Scenario II, where Y conditions missingness on X1. Note. The missing rate on X1 is set to .4. ρ_X2Y: the correlation between X2 and Y. The line for each mechanism was produced by smoothing via the rlm function in R with the formula y ~ poly(x, 6). The smoothing, chosen visually, is used to make the connection to Figure 5 easier to see by showing the overall trend in each mechanism. To avoid non-positive definite matrices, we used ρ_X1Y, ρ_X2Y, and ρ_X1X2 values between 0 and .65, which leads to fewer results and less accurate smoothing above ρ²_X1Y·X2 = .5. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only deterministic MAR conditions are shown (p1 = 1 and p2 = 0). The mechanism of the simulated data points is denoted by their shapes, and the mechanism of the smoothed lines is denoted by the line type.

Figure 7 .
Figure 7. Scenario II. Pseudo-population FMI of β1 in two-predictor multiple regression when the criterion (Y) conditions missingness on a predictor (X1). Note. The figure shows how the FMI of β1 under different missing data mechanisms changes as a function of β1 and β2 under Scenario II, where X1 contains missingness conditioned by Y. The condition where β1 = .70 and β2 = .70 is omitted, as it leads to a non-positive definite covariance matrix under ρ_X1X2 = .4. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only deterministic MAR conditions are shown (p1 = 1 and p2 = 0). The grey dotted line shows where the FMI is equal to the missing rate.

Figure 8 .
Figure 8. Scenario II. Pseudo-population FMI of β2 in two-predictor multiple regression when the criterion (Y) conditions missingness on a predictor (X1). Note. The figure shows how the FMI of β2 under different missing data mechanisms changes as a function of β1 and β2 under Scenario II, where X1 contains missingness conditioned by Y. The condition where β1 = .70 and β2 = .70 is omitted, as it leads to a non-positive definite covariance matrix under ρ_X1X2 = .4. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only deterministic MAR conditions are shown (p1 = 1 and p2 = 0). The grey dotted line shows where the FMI is equal to the missing rate.

Figure 9 .
Figure 9. Scenario III. Pseudo-population FMI of β1 in two-predictor multiple regression when one predictor (X2) conditions missingness on the other predictor (X1). Note. The figure shows how the FMI of β1 under different missing data mechanisms changes as a function of β1 and β2 under Scenario III, where X1 contains missingness conditioned by X2. The condition where β1 = .70 and β2 = .70 is omitted, as it leads to a non-positive definite covariance matrix under ρ_X1X2 = .4. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only deterministic MAR conditions are shown (p1 = 1 and p2 = 0). Under this scenario, the FMI of β1 is not affected by the missing data mechanism, hence most of the lines overlap each other. The grey dotted line shows where the FMI is equal to the missing rate.

Figure 10 .
Figure 10. Scenario III. Pseudo-population FMI of β2 in two-predictor multiple regression when one predictor (X2) conditions missingness on the other predictor (X1). Note. The figure shows how the FMI of β2 under different missing data mechanisms changes as a function of β1 and β2 under Scenario III, where X1 contains missingness conditioned by X2. The condition where β1 = .70 and β2 = .70 is omitted, as it leads to a non-positive definite covariance matrix under ρ_X1X2 = .4. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. Only deterministic MAR conditions are shown (p1 = 1 and p2 = 0). The grey dotted line shows where the FMI is equal to the missing rate.

Figure 11 .
Figure 11. Pseudo-population FMIs for the intercept of a simple regression when the predictor (X1) conditions missingness on the criterion (Y). Note. The regression parameters are β1 = .4 and β0 = 0, although changing these values does not affect FMIs. Y contains missingness conditioned by X1. MAR-L: linear MAR; MAR-NLT/NLC: nonlinear MAR missing at the tails/center. For deterministic MAR, p1 = 1 and p2 = 0; for probabilistic MAR, p1 = .80 and p2 = .20. Note that probabilistic MAR is the same as MCAR when p_mis = .20. The lines for MCAR, MAR-NLT, and MAR-NLC overlap each other because their FMIs are the same.

Table 1 .
Pseudo-population FMI predicts relative efficiency in finite samples. MAR-L: Linear MAR. MAR-NLT: Nonlinear MAR missing at the tails. MAR-NLC: Nonlinear MAR missing at the center. All MAR mechanisms were deterministic. p_mis: Overall missing rate of Y. Power β1: The power to detect that β1 is nonzero, estimated from the percentage of significant results obtained over 1,000 replications with a sample size of N = 75. ESE: The empirical standard error of the estimate of β1 over 1,000 replications with a sample size of N = 75; lower ESE indicates better efficiency. RE β1: Relative efficiency compared to complete data when estimating β1, computed from the ESE; lower RE indicates better efficiency. FMI β1: The pseudo-population FMI of the regression coefficient β1, computed under 1 replication with a sample size of 1 million. WIF β1: The width inflation factor that corresponds to the pseudo-population FMI of β1. Comparing the bolded columns shows how the pseudo-population WIF predicts RE in finite samples.

Table 2 .
Summary of options in the interactive Shiny app.
* This is a default selection in the web application. ** Available for multiple regression only.

Table 3 .
How to generate the figures using the Shiny app.
Any valid value for this parameter would result in the same FMIs.V mis : variable with missing values; V con : conditioning variable; Scen.: scenario.

Table 4 .
Summary of findings for regression slopes that differ among the three scenarios.