Simulation-Based Performance Evaluation of Missing Data Handling in Network Analysis

Abstract Network analysis has gained popularity as an approach to investigate psychological constructs. However, there are currently no guidelines for applied researchers encountering missing values. In this simulation study, we compared the performance of a two-step EM algorithm with separate steps for missing data handling and regularization, a combined direct EM algorithm, and pairwise deletion. We investigated conditions with varying network sizes, numbers of observations, missing data mechanisms, and percentages of missing values. These approaches were evaluated with regard to recovering population networks in terms of loss in the precision matrix, edge set identification, and network statistics. The simulation showed adequate performance only in conditions with large samples (n ≥ 500) or small networks (p = 10). Comparing the missing data approaches, direct EM appears to be more sensitive and superior in nearly all chosen conditions. The two-step EM yields better results when the ratio n/p is very large, being less sensitive but more specific. Pairwise deletion failed to converge across numerous conditions and yielded inferior results overall. Overall, direct EM is recommended in most cases, as it is able to mitigate the impact of missing data quite well, while modifications to two-step EM could improve its performance.


Introduction
Network analysis is rapidly gaining importance in psychological research. In its early applications as social network analysis (Wasserman & Faust, 1994), it described relationships within the workforce of an organization (e.g., Cross et al., 2002) or the collaboration between authors in science (Bibi et al., 2018). Newer literature also uses networks to describe associations between variables of one or more psychological constructs in what has been coined psychological networks. In these networks, nodes represent symptoms or questionnaire items which are connected by edges depicting their relationship. This can be used either to investigate intra-individual variability in longitudinal network models or in cross-sectional settings, where intra- and inter-individual variability are not distinguished.
Applications are plentiful throughout different areas such as clinical psychology (Fisher et al., 2017; Murphy et al., 2018), personality psychology (Costantini et al., 2015; Marcus et al., 2018) or health psychology (Hevey, 2018; van Zyl, 2021). Applying these relatively new analysis methods to psychological research is accompanied by a number of methodological challenges prevalent in psychological studies. The focus of this study lies on the problem of missing values in cross-sectional network analysis.
Investigations of the consequences of missing data in psychological networks as well as guidelines are rare, with this issue also being raised by researchers (Santos et al., 2018). Previous literature focuses mostly on social network analyses, in which entire edges or nodes are not observed (e.g., Kossinets, 2006; Robins et al., 2004). However, in psychological network analysis the nodes are the variables under investigation, and missingness pertains to specific information within a node, meaning the values of a variable were not observed for all individuals. As described below, sample size is known to affect the density of regularized networks because of the amount of available information. A similar impact can be expected from missing values, because missing responses on nodes in psychological networks reduce the number of observations on that specific node, affecting the precision of the estimation of edges related to that node in the process. Mansueto et al. (2022) evaluated the performance of missing data handling mechanisms in networks retrieved from longitudinal data with observations missing completely at random (MCAR). They found two different missing data handling approaches combined with model estimation methods (penalized maximum likelihood estimation with Kalman filter imputation and full information maximum likelihood estimation with stepwise model search) to work quite well in the longitudinal setting. To our knowledge, there is only one paper (Städler & Bühlmann, 2012) evaluating a missing data handling process related to cross-sectional networks. Their results indicate that their approach, described in more detail below, is able to handle MCAR and MAR data. In the present study, we evaluate three different approaches to mitigate the effect of missing values in psychological networks: a two-step EM algorithm, which separates missing data handling and regularization (Epskamp et al., 2017), a direct EM algorithm, which combines both steps (Augugliaro et al., 2021), and pairwise deletion. In the next section, we will describe these approaches and their implementation in current software after a brief introduction to network analysis for psychological data.

Network estimation
Data retrieved from cross-sectional psychological research is typically modeled via a pairwise Markov Random Field (PMRF). The PMRF as a class contains a wide range of different models, all defined by pairs of nodes with undirected edges representing their relationship. An undirected edge represents a connection between the corresponding nodes without assuming causality. There are different ways to estimate PMRF models for varying scales of measurement. Continuous data is modeled with Gaussian graphical models (GGM; Costantini et al., 2015; Lauritzen, 1996), while the Ising model is used for binary data (van Borkulo et al., 2014), and different estimation methods are recommended for ordinal data (see Isvoranu & Epskamp, 2021). In this study, the focus lies on continuous data.
In a GGM, the p different variables are assumed to follow a multivariate normal distribution with means of zero (variables are centered):

Y = (y_1, ..., y_p) ∼ N(0, Σ)    (1)

Σ denotes the variance-covariance matrix of the variables y_1 through y_p. The values of the edges are called weights. Rather than using the entries of Σ, network analysis uses the partial correlation coefficients to represent the relationship between two nodes (Epskamp et al., 2018), to ensure standardized parameters which depict relations when controlling for all other variables in the data set. The range of these parameters is between −1 and 1, with a value of 0 indicating conditional independence. Partial correlations are obtained from the precision matrix (also known as concentration matrix), which is defined as the inverse of Σ.
Θ = Σ⁻¹    (2)

Lauritzen (1996) showed that partial correlations can be calculated by standardizing the corresponding element of the precision matrix and reversing its sign:

ρ_ij = −θ_ij / √(θ_ii θ_jj)    (3)
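The standardization described by Equations (2) and (3) is mechanical enough to sketch in a few lines. The following is a hypothetical NumPy illustration (not code from any of the packages discussed below): invert a covariance matrix and standardize the result into partial correlations.

```python
import numpy as np

def partial_correlations(theta):
    """Standardize a precision matrix into partial correlations.

    Each off-diagonal element is divided by the square root of the
    product of the corresponding diagonal elements and sign-reversed;
    the diagonal is set to zero by convention.
    """
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    return pcor

# Toy example: invert a small covariance matrix and read off the
# partial correlation network among three variables.
sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
pcor = partial_correlations(np.linalg.inv(sigma))
```

Here `pcor[0, 1]` is the association between the first two variables after controlling for the third, which is smaller than their marginal correlation of 0.5.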
In practice, estimation of partial correlations is achieved by plugging in the sample variance-covariance matrix S as an estimator for Σ in Equation (2) and subsequently Equation (3). Although this is easily done and provides interpretable parameters, it is not optimal for two reasons. First, calculating the inverse of S is only possible if S is positive definite. Hartlap et al. (2007) showed analytically that this is not the case when n < p, that is, for cases with more variables in the network than subjects n, because the covariance matrix is singular. The authors also demonstrated a severe bias in the estimation of the precision matrix when n is close to p. Thus, network analysis would only be possible in low-dimensional settings (n ≫ p). Second, even though in psychological applications n is often much larger than p, mitigating the previously described issue, psychological assessments bring along sampling variation. Sampling variation makes exact zero coefficients unlikely even if the true parameters are 0. These spurious edges (Costantini et al., 2015) introduce random noise into the network structure. One of the options to address this problem is graphical lasso regularization (glasso; Friedman et al., 2008).
Using the sample variance-covariance matrix S, the precision matrix Θ is estimated by maximizing its penalized log-likelihood function:

log det(Θ) − tr(SΘ) − λ‖Θ‖₁    (4)

The last term includes the penalty parameter λ multiplied by the L1 norm of Θ, which represents the sum of all absolute values within the matrix. Thus, in the process of maximization the values in the implied precision matrix shrink toward 0 and, following Equation (3), partial correlations become 0 as well.
Using the glasso, estimation is typically performed for 100 values of λ, logarithmically spaced between a maximum and a minimum (Epskamp, 2016). This results in as many estimates Θ̂ of the precision matrix. While there are different possibilities to determine the optimal degree of penalization, the Extended Bayesian Information Criterion (EBIC; Chen & Chen, 2008; Foygel & Drton, 2010) is most often used in regularized psychological networks to choose a Θ̂:

EBIC = −2ℓ_n + |E| log(n) + 4γ|E| log(p)    (5)

E is the edge set of the graph, so |E| denotes the number of non-zero edges. ℓ_n is the log-likelihood of the model. p represents the number of variables in the model (i.e., the number of nodes in the graph), while n is the number of observations. Because of the inclusion of n, this procedure will tend to identify more non-zero edges with increases in sample size and thus more available information. The inclusion of p counterbalances this, resulting in the available information per estimated parameter becoming a central relation in determining the sparsity of the selected network. The EBIC also includes the hyperparameter γ, which additionally penalizes complex models. Even though setting the γ hyperparameter is a case-by-case decision, some guidelines can be derived theoretically or from empirical findings. A value of γ = 0 recovers the classical BIC (Schwarz, 1978) without an extra penalty on more complex models. It is worth mentioning that this will still select models with edges regularized to zero. Setting γ = 1 leads to very sparse or even empty networks. A hyperparameter of γ = 0.5 is suggested by Foygel and Drton (2010), and simulations with a psychological network structure also support this choice, with γ = 0.5 outperforming higher penalizing values (Epskamp, 2016). A value closer to 0 is seen as more explorative under uncertainty concerning the network structure. This again contributes to the tradeoff between available information and the sparsity of the structure that is identified. We refer to the combination of model estimation via glasso and selection via EBIC as glasso EBIC (Williams et al., 2019).
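The selection procedure can be sketched as follows. This is a simplified illustration using scikit-learn's `graphical_lasso` rather than the R glasso implementation used by the packages under study; the function names `ebic` and `glasso_ebic`, the grid of 20 penalties, and the 0.01 lower bound on the grid are our own assumptions.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def ebic(S, theta, n, gamma=0.5):
    """EBIC of a glasso solution, following Equation (5)."""
    p = S.shape[0]
    sign, logdet = np.linalg.slogdet(theta)
    loglik = (n / 2.0) * (logdet - np.trace(S @ theta))
    iu = np.triu_indices(p, k=1)
    n_edges = np.count_nonzero(theta[iu])     # |E|
    return -2.0 * loglik + n_edges * np.log(n) + 4.0 * gamma * n_edges * np.log(p)

def glasso_ebic(X, gamma=0.5, n_lambdas=100):
    """Fit the glasso on a log-spaced penalty grid and return the
    precision matrix minimizing the EBIC."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam_max = np.max(np.abs(S - np.diag(np.diag(S))))
    lambdas = np.logspace(np.log10(lam_max), np.log10(0.01 * lam_max), n_lambdas)
    best_score, best_theta = np.inf, None
    for lam in lambdas:
        _, theta = graphical_lasso(S, alpha=lam)
        score = ebic(S, theta, n, gamma)
        if score < best_score:
            best_score, best_theta = score, theta
    return best_theta

# Hypothetical usage on simulated data with one strong true edge
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
X[:, 1] += 0.8 * X[:, 0]
theta_hat = glasso_ebic(X, n_lambdas=20)
```

Because |E| counts only exact zeros, the L1 penalty's ability to produce exact zeros is what makes the criterion usable for model selection.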
A central goal of regularization is to make the structure of a statistical model easier to interpret (Lauritzen, 1996) and show the most important connections. Sparse structures help researchers inspect the network for impactful edges and nodes. The glasso EBIC excels in sparse estimation with its two avenues of punishing complex models. With its introduction, Foygel and Drton (2010) showed that GGM structures can be retrieved well. Epskamp (2016) extended investigations to plausible psychological network structures. The method showed high specificity, meaning edges present in the estimated network structure can be seen as genuine. On the other hand, sensitivity varies with sample size and density of the population network (see also Epskamp & Fried, 2018). Thus, edges estimated to be zero in empirical data cannot be seen as proof of nonexistence in the true model (Epskamp et al., 2017). This is due to the fact that sparsity in the true model is an assumption for regularization (e.g., Lauritzen, 1996; Meinshausen, 2006). The glasso EBIC selects sparse network structures with small n, leading to a low number of false positives (Epskamp & Fried, 2018), only returning more dense networks for high n (Epskamp et al., 2017). Costantini et al. (2015) note that optimally interpretable networks need to be sparse while closely resembling true structures, which could be seen as a justification for reducing the number of edges even beyond those that are truly 0 in the population. For high-dimensional settings, the glasso EBIC shows good predictive accuracy (Williams et al., 2019). Concerning edge weights, correlations between true and estimated values tend to be more stable with increasing sample size (Epskamp, 2016).
While these findings are encouraging, using this approach is not without criticism. Heinävaara et al. (2016) showed that the glasso is not consistent for model selection, meaning the true model is not found for n → ∞. Furthermore, the loss between true and estimated precision matrix does not diminish with bigger n using the glasso EBIC, as one would typically expect (Kuismin & Sillanpää, 2016). Williams et al. (2019) replicated these findings for smaller networks, which they see as more realistic in psychological research, finding less predictive value of the estimated networks for bigger n. In these conditions, glasso EBIC also showed less specificity than previously found, which can be attributed to the calculation of the EBIC value. Williams et al. (2019) advocate network estimation without regularization, and Williams and Rast (2020) propose finding zeros in the precision matrix via bootstrapped confidence intervals. While the authors' approach shows promise in terms of model selection consistency and normal distribution of parameters, they only investigated it in small networks. Their arguments additionally focus on high ratios of n to p as well as high densities, meaning these methods also rely on a high degree of available information.
Contrary to these cases, psychological research could try to focus on using all items of bigger questionnaires as nodes (providing a larger p) or deal with smaller n (e.g., a focus on rare psychological disorders). While it is unusual for n to be smaller than p, the ratio can be smaller than is assumed in the critique of Williams and Rast (2020). This incurs a lack of information, making regularization approaches appear worthwhile, especially if one takes missing values into account. The sustained relevance of evaluating the performance of the glasso EBIC in psychological settings is also shown by its continuing usage in psychological research (e.g., Cramer et al., 2020; See et al., 2020).

Missing values
All properties of networks and their analysis discussed in the previous section stem from studies assuming complete data. Thus, it is of great importance to investigate how the glasso EBIC performs when missing data occur. Missing values are a common problem in psychological research (especially surveys) for a multitude of different reasons. Participants may not be willing to answer a question or may simply overlook it. From the researcher's standpoint, there can be a planned missing data design to lower costs (Graham et al., 2006).
Missing data mechanisms are traditionally categorized into three different kinds: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR; Little & Rubin, 2019; Rubin, 1976). The data set can be split into Y_obs, denoting observed values, and Y_mis, denoting unobserved (or missing) values. In line with Enders (2010), missingness can be symbolized by an indicator V taking on the value 0 when data are observed and the value 1 when data are not observed. If data are MCAR, the probability that a data point is missing is unrelated to any properties of the observed or missing data. Thus, p(V | y_obs, y_mis, φ) = p(V | φ), with φ being unknown parameters controlling the probability. For MAR data, missingness depends on values of observed variables: p(V | y_obs, y_mis, φ) = p(V | y_obs, φ). In other words, the probability of a data point missing depends only on other observed variables Y_obs and the parameters φ. If data are MNAR, the probability of a missing value additionally depends on the value of the observation that is missing. Consequently, the probability of missingness is explained by all (hypothetical) parts of the data: p(V | y_obs, y_mis, φ). Though these three missing mechanisms are often presented as disparate states, Graham (2009) points out that they are not mutually exclusive and often do not exist in these pure forms in the real world. However, the three missing mechanisms differ in their consequences for statistical analysis and interpretation. Their impact on missing data handling is discussed in more detail in the next section.

Missing handling and its software implementation
In this section, we describe the missing data handling implemented in two R packages for network analysis, namely bootnet (Epskamp et al., 2017) and cglasso (Augugliaro et al., 2021). Evaluations of missing data handling mechanisms in combination with other analysis methods typically include standard errors. Basic cross-sectional network analysis, and thus our simulation, does not include inferential statistics. Consequently, standard errors will not be a topic in this short review.

Deletion techniques
The bootnet package provides enhanced possibilities in network analysis on top of the popular qgraph package (Epskamp et al., 2012). Both packages are frequently used in the psychological literature. One of the enhancements is providing the user with the ability to apply missing data handling mechanisms in case of incomplete data. Two integrated options are listwise and pairwise deletion.
Deletion techniques have long been used because of their intuitiveness and low computational demands. With listwise deletion, parameters are computed using only subjects with observed values on every variable, guaranteeing a constant number of observations. In contrast, pairwise deletion uses all subjects with observed values on the specific subset required for the particular step in the analysis. Thus, pairwise deletion can result in different numbers of observations being used for different parameters. In network analysis, both deletion techniques are used for covariance estimation before the regularization step. As shown in Equation (5), regularization using glasso EBIC needs a number of observations in the computation. This is trivial for listwise deletion, while for pairwise deletion the average of the pairwise sample sizes used in determining the (co-)variances is employed.
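Pairwise deletion and the averaging of pairwise sample sizes are easy to illustrate, since pandas computes covariances pairwise by default. The snippet below is our own illustration of the described behavior, not bootnet's actual code; the 20% MCAR-style amputation is a hypothetical example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
# punch MCAR-style holes into two of the four variables
for col in ("a", "b"):
    data.loc[rng.random(200) < 0.2, col] = np.nan

# pandas estimates each (co)variance over the complete cases of the
# corresponding variable pair, i.e. pairwise deletion
S = data.cov()

# average pairwise sample size, mirroring the kind of single n that
# the glasso EBIC step requires in place of a constant sample size
obs = data.notna().to_numpy().astype(int)
pair_n = obs.T @ obs                       # complete cases per pair
avg_n = pair_n[np.triu_indices_from(pair_n)].mean()
```

Note that different entries of `S` are based on different subsets of rows, which is exactly why the resulting matrix need not be positive definite.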
To our knowledge, deletion techniques have not yet been evaluated in the specific context of regularized network analysis. Thus, hints about their performance can only be retrieved from related contexts such as structural equation modeling (SEM). Even though deletion techniques are still offered, they have been shown to be suboptimal in a multitude of situations. Listwise deletion is mainly an option under MCAR, yielding consistent estimates of covariances (Jamshidian & Bentler, 1999). Even if the deletion theoretically does not induce bias, the amount of information is drastically reduced (Enders, 2010). The deletion can lead to very small data sets, possibly resulting in bias or nonconvergence. Estimations of covariances in MAR and MNAR settings yield biased results (Enders, 2010; Jamshidian & Bentler, 1999). In contrast, pairwise deletion does not result in the same degree of loss of information. However, potentially using different n for each covariance can lead to non-positive definite matrices (Arbuckle, 1996; Enders, 2010). Initial simulations showed bias in SEM when using listwise deletion under MCAR (Brown, 1994), though the estimates should conceptually be consistent, and later studies were unable to replicate these findings (Enders, 2010; Marsh, 1998; Newman, 2014). Pairwise deletion and listwise deletion lead to biased estimates with data MAR (Arbuckle, 1996; Muthén et al., 1987) in a SEM context and under MNAR (Enders, 2010). In comparison, pairwise deletion showed less bias than listwise deletion (Enders, 2001a; Muthén et al., 1987) and is also more efficient (Enders, 2001b), resulting in its usage for our simulation.

Two-step EM
Another missing data handling option in bootnet is a procedure which we will refer to as two-step EM in the following.¹ Missing data handling and regularization are performed in two separate steps. In the first step, the covariance structure is estimated by the lavaan package (Rosseel, 2012), which uses an expectation-maximization (EM) algorithm. The term EM was first used by Dempster et al. (1977), defining the expectation (E-) and maximization (M-) steps of an iterative algorithm. First, the E-step fills missing values with conditional expectations, followed by the M-step, which computes the parameters of interest using maximum likelihood (ML) estimation (Little & Rubin, 2019). As previously discussed, bias in covariance matrices increases when n gets closer to p, so the results of the EM algorithm can already be biased in these cases. After the covariance matrix is estimated, glasso EBIC is used to return a regularized network structure. The covariance matrix obtained from the EM algorithm lacks a straightforward sample size definition (Enders, 2010). By default, bootnet employs the average of pairwise observations in the context of the glasso EBIC method.
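The first step can be sketched as a textbook EM algorithm for the mean and covariance of a multivariate normal with missing entries. This is a minimal sketch under a MAR assumption; lavaan's actual implementation (convergence criteria, starting values, standard errors) is considerably more elaborate, and the function name `em_mvn` is ours.

```python
import numpy as np

def em_mvn(X, n_iter=50):
    """EM estimation of mean vector and covariance matrix from
    incomplete multivariate normal data (assumed MCAR/MAR)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # start from column means and a diagonal covariance
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        Xhat = X.copy()
        corr = np.zeros((p, p))            # covariance correction term
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            # E-step: conditional expectation of the missing block
            # given the observed block and current parameters
            B = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
            Xhat[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            # the conditional covariance enters the M-step cross-products
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        # M-step: ML update of mean and covariance
        mu = Xhat.mean(axis=0)
        d = Xhat - mu
        sigma = (d.T @ d + corr) / n
    return mu, sigma

# Hypothetical usage: two correlated variables, 25% MCAR holes in one
rng = np.random.default_rng(3)
z = rng.normal(size=(800, 1))
X = np.hstack([z + 0.5 * rng.normal(size=(800, 1)),
               z + 0.5 * rng.normal(size=(800, 1))])
X[rng.random(800) < 0.25, 0] = np.nan
mu_hat, sigma_hat = em_mvn(X)
```

In the two-step approach, `sigma_hat` would then be handed to the glasso EBIC machinery exactly as a complete-data covariance matrix would be.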
Because the two-step EM separates missing data handling and regularization, the focus for evaluating its performance in previous research should be on correct estimation of the covariance matrix. EM estimation of a covariance matrix is consistent for MCAR and MAR (Graham et al., 1996; Little, 1988; Newman, 2014). Estimation is biased in cases with MNAR, but still outperforms the deletion techniques (Enders, 2010).

Direct EM
The cglasso package offers network estimation in case of missing data via an integrative EM algorithm. The algorithm was developed by Städler and Bühlmann (2012) as a combination of missing data handling and lasso regularization. It starts by estimating the inverse covariance matrix (and mean vector) using the observed data. These values are used to compute the univariate and bivariate conditional expectations during the E-step (Städler & Bühlmann, 2012, p. 222). Afterwards, the M-step re-estimates the inverse covariance matrix (and mean vector) using these conditional expectations and the previously described glasso. The algorithm iterates between the E- and the M-step until convergence. It is important to note that, in contrast to the EM algorithm used in the bootnet package, the approach by Städler and Bühlmann (2012) does not estimate the covariance matrix and perform regularization in separate steps, but performs both tasks iteratively. Hence, we will use direct EM when referring to the process used by cglasso. Similar to the glasso method applied to complete-data cases, the direct EM algorithm is executed across a range of penalty parameters. The optimal estimate of the inverse covariance matrix is subsequently identified by employing the EBIC, with n representing the total number of observations in the data set (with and without missing values).
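The structural contrast with the two-step approach can be sketched as follows: the penalized M-step sits inside the EM loop rather than after it. This is a simplified illustration of the idea at a single fixed penalty, using scikit-learn's `graphical_lasso`; it is not the cglasso implementation, which additionally traverses the penalty path and applies the EBIC.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def direct_em_glasso(X, alpha, n_iter=30):
    """Sketch of a direct (MissGlasso-style) EM: each M-step applies
    the graphical lasso, so missing data handling and regularization
    are intertwined rather than sequential."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        Xhat = X.copy()
        corr = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            # E-step: conditional expectations given current parameters
            B = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
            Xhat[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        mu = Xhat.mean(axis=0)
        d = Xhat - mu
        S = (d.T @ d + corr) / n
        # M-step: penalized re-estimation instead of the plain ML update
        sigma, theta = graphical_lasso(S, alpha=alpha)
    return theta

# Hypothetical usage at a fixed penalty
rng = np.random.default_rng(5)
z = rng.normal(size=(400, 1))
X = np.hstack([z + 0.6 * rng.normal(size=(400, 1)) for _ in range(3)])
X[rng.random(400) < 0.2, 0] = np.nan
theta_hat = direct_em_glasso(X, alpha=0.05, n_iter=15)
```

Because the precision matrix is sparse at every iteration, the conditional expectations in the E-step are themselves computed under the regularized model, which is the defining difference from the two-step procedure.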
Städler and Bühlmann (2012) evaluated their approach, finding it to work well, especially in low-density networks. Similar to the glasso with full data, specificity was high, while sensitivity varied strongly with sample and network size. With regard to relative differences between estimated and population matrices, the best performance was also achieved in networks with low densities. Manipulating the missing data mechanisms showed that the MAR and MCAR settings yielded good results, while MNAR led to larger discrepancies, resulting in the approach not being recommended in such situations. While this study showed the promise of using this approach for missing data handling in network analysis, the data examples were from biological settings, which can differ from psychological networks in structure and data properties.

Goals of the present study
In this paper, we aim to address the need for research on the influence of missing data on the regularized estimation of network structures using cross-sectional data. Our study contributes to the existing literature by delving into a psychologically plausible network structure and data setting, examining the effect of partially observed variables. Through a systematic evaluation of pairwise deletion, direct EM, and two-step EM, our aim is to offer insights regarding the viability of conducting network analyses in common scenarios characterized by missing data. We anticipate that the comparative analysis will enable us to establish guidelines for selecting the most suitable data handling method.

Methods
The missing data handling mechanisms were compared in a Monte Carlo simulation study. To generate realistic population settings, we used an openly accessible data set of a personality assessment (Johnson, 2014, https://osf.io/wxvth/) as the foundation for our population network. The data set consists of n = 619,150 participants assessed using the IPIP-NEO-120 (Johnson, 2014), which measures the Big Five personality traits (Neuroticism, Conscientiousness, Extraversion, Agreeableness, and Openness to Experience; Fiske, 1949; McCrae & Costa, 1997). It consists of 120 items: four items for each of the six facets of each of the five traits. The data were collected from 2001 to 2011 via online assessment.
To create networks containing zeros, we used the regularization approach on the first 1000 rows of the data set, which did not have missing values. Afterwards, the networks were estimated again without a penalty parameter, forcing previously excluded edges to be zero but keeping the weights of the other edges unpenalized. This choice was informed by previous literature warning about simulating data from regularized values (Epskamp, 2016). The resulting network structures were used as population networks to simulate complete data sets for our study.
Afterwards, amputation of these complete data sets was performed for conditions with missing values. Analogously to Grund et al. (2018), amputation according to the three missing data mechanisms was performed by simulating latent response propensities represented by the matrix R.
In line with the notation used by Grund et al. (2018), R_ij represents the response propensity of person i on a variable j. β_0jj′ and β_1jj′ are regression parameters of the missing data regression for a variable j with one standardized predictor P_j′ influencing the missingness. We chose β_1jj′ = 0 for MCAR and β_1jj′ = 0.6 for MAR and MNAR. The value for the intercept β_0jj′ is a quantile of the standard normal distribution chosen according to the missingness rate of the specific condition, so that 10%, 20% or 30% of the values in the matrix R are bigger than 0. In line with Grund et al. (2018), we added a normally distributed residual r_ij with mean zero and variance 1 − β²_1jj′. Any data point with R_ij > 0 was deleted. In all missing data conditions, half of the variables contained missing values, while the other half were kept complete. Variables with and without missings were chosen to balance the averages of the network statistics betweenness and strength, since network statistics are part of the evaluation criteria. In conditions with MAR data, the predictor variable j′ for the amputation process of a variable j was chosen from the same subdimension of the IPIP-NEO-120 (with j ≠ j′). In MNAR conditions, missingness for variables was determined using themselves as predictors (j = j′).
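The amputation rule above can be sketched for a single variable as follows. Function and variable names are ours, not code from Grund et al. (2018); the choice of predictor decides the mechanism.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)

def ampute(y, predictor, beta1, miss_rate):
    """Delete values of y via a latent response propensity R.

    beta1 = 0 yields MCAR; a different observed variable as predictor
    yields MAR; predictor = y itself yields MNAR. The intercept is the
    standard normal quantile making P(R > 0) equal miss_rate, since
    beta1**2 + (1 - beta1**2) = 1 keeps Var(R) at 1.
    """
    z = (predictor - predictor.mean()) / predictor.std()
    beta0 = norm.ppf(miss_rate)                # R ~ N(beta0, 1)
    resid = rng.normal(0.0, np.sqrt(1.0 - beta1 ** 2), size=len(y))
    R = beta0 + beta1 * z + resid
    y_out = y.astype(float).copy()
    y_out[R > 0] = np.nan                      # delete flagged points
    return y_out

# MNAR example at a 20% missingness rate: the variable predicts its
# own missingness, so larger values are deleted more often
y = rng.normal(size=5000)
y_mnar = ampute(y, predictor=y, beta1=0.6, miss_rate=0.2)
```

With `beta1 = 0.6` and the variable as its own predictor, the remaining observed values are systematically shifted downward, which is precisely the distortion that MNAR introduces.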

Simulation design
The study manipulated four factors based on previous literature (Epskamp, 2016; Städler & Bühlmann, 2012; Williams & Rast, 2020). As the first factor, network size, i.e., the number of nodes p, was varied in four steps: 10, 30, 60 and 120. The smallest network with p = 10 represents clinical applications to broad symptoms. Thirty nodes were chosen as the upper end of such a clinical application as well as the lower bound for the analysis of complex situations with multiple psychological constructs. Typically used network sizes in psychological research vary between 10 and 30 nodes (Wysocki & Rhemtulla, 2021). We included conditions with 60 and 120 nodes to explore use cases involving the comprehensive analysis of entire questionnaires at the item level, thus assessing the scalability of network analysis in such potential scenarios.
It is important to note that in our simulation network size is related to network density, with bigger networks being less dense in the population. The true density of psychological networks and its interplay with network size are still the subject of debate. In each of the four network sizes, exactly half of the items contain missing values. Sample size n was varied between 150, 250, 500, and 1000. The missingness rate (m) was varied between completely observed, 0.1, 0.2, and 0.3. Finally, MCAR, MAR, and MNAR were investigated as missing data mechanisms. The simulation excluded the duplicate combinations of a completely observed data set with the different missing data mechanisms, resulting in 16 conditions without and 144 with missings. All conditions were replicated 300 times, simulating multivariate normally distributed data. The replications were simulated in such a way that conditions containing the same information in data generation (number of observations and network size) led to the same 300 data sets. The amputation followed as a second step on these equal data sets. This approach enables not only comparability within the missing data handling approaches, but also the direct comparison between a specific condition with a complete and an amputated data set.
Networks created in conditions without missingness were estimated by each of the two described packages. Data created in conditions with missingness were estimated with the direct EM, two-step EM and pairwise deletion. In light of the availability of more modern techniques, we only take one deletion technique into account in this study. We chose pairwise deletion because of its described superiority across many other statistical methods. The inclusion of a deletion technique is mainly due to its prevalence in applications (e.g., Santos et al., 2018) and to provide a baseline with which to contrast the two EM approaches.

Performance measures
Evaluation criteria were chosen to represent the ability of the handling methods to retrieve the population networks well and yield reliable results for applied researchers using the method. The criteria can be separated into three categories: edge set identification, information loss, and the recovery of network statistics.

Edge set identification
Edge set identification was used to evaluate aspects of network analysis in nearly all of the previous literature (Epskamp et al., 2017; Mansueto et al., 2022; Williams & Rast, 2020). There are mainly two rates of interest, which were already mentioned in the introduction: sensitivity and specificity. The former is also known as the true positive rate and represents the rate of correctly identified non-zero edges:

sensitivity = TP / (TP + FN)

where TP represents the number of correctly identified non-zero edges (true positives) and FN represents the number of population non-zero edges which were estimated to be zero in the application (false negatives).
To complement this information, specificity, also known as the true negative rate, depicts the rate of correctly identified zero edges:

specificity = TN / (TN + FP)

Here, TN represents the number of correctly identified zero edges (true negatives) and FP represents the number of population zero edges which were estimated to be non-zero in the application (false positives).
To summarize the ability of the estimation procedure to recover truly non-zero elements of the networks we used the Matthews correlation coefficient (MCC; Matthews, 1975).² This coefficient is determined as:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

While its inclusion in previous evaluation studies concerning aspects of network analysis is already a good reason to investigate edge set identification, the introduction of missing values can also be expected to have an impact on it. Due to the described relation of available information to density, we expect conditions with missing values to display higher specificity. The variation of missing data mechanisms and handling methods could additionally affect the ability of the analyses to correctly identify the population edges. We assume deletion to be inferior to the EM-based approaches, because it does not attempt to mitigate the loss of information introduced by missing values.
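All three rates can be computed from the upper triangles of the population and estimated precision matrices. The helper below is a hypothetical illustration (names and the exact-zero threshold are our own assumptions):

```python
import numpy as np

def edge_metrics(theta_true, theta_est, tol=0.0):
    """Sensitivity, specificity and MCC of a recovered edge set.

    Edges are the off-diagonal (upper-triangle) entries; an edge
    counts as present when its precision-matrix entry exceeds tol
    in absolute value.
    """
    p = theta_true.shape[0]
    iu = np.triu_indices(p, k=1)
    true_edge = np.abs(theta_true[iu]) > tol
    est_edge = np.abs(theta_est[iu]) > tol
    TP = np.sum(true_edge & est_edge)
    TN = np.sum(~true_edge & ~est_edge)
    FP = np.sum(~true_edge & est_edge)
    FN = np.sum(true_edge & ~est_edge)
    sens = TP / (TP + FN) if TP + FN else np.nan
    spec = TN / (TN + FP) if TN + FP else np.nan
    denom = np.sqrt(float((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)))
    mcc = (TP * TN - FP * FN) / denom if denom else np.nan
    return sens, spec, mcc

# Toy example: two true edges, of which the estimate recovers one
# and adds one spurious edge
theta_true = np.eye(4)
theta_true[0, 1] = theta_true[1, 0] = 0.3
theta_true[0, 2] = theta_true[2, 0] = 0.3
theta_est = np.eye(4)
theta_est[0, 1] = theta_est[1, 0] = 0.25
theta_est[0, 3] = theta_est[3, 0] = -0.1
sens, spec, mcc = edge_metrics(theta_true, theta_est)
```

In this toy case TP = 1, FN = 1, FP = 1 and TN = 3, giving a sensitivity of 0.5, a specificity of 0.75 and an MCC of 0.25.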

Information loss
Information loss of the estimated precision matrix was taken into account as another evaluation criterion. It describes the discrepancy between the estimated and the population precision matrix. The Kullback-Leibler loss (KL-loss; Kullback & Leibler, 1951), which was also used in previous literature on network analysis (Kuismin & Sillanpää, 2016; Mohammadi & Wit, 2015), was computed as KL(H, Ĥ) = tr(H⁻¹Ĥ) − log det(H⁻¹Ĥ) − p, where H denotes the inverse covariance matrix of the population and Ĥ the estimated inverse covariance matrix. The resulting value represents the information loss between the true and estimated network, with a value of 0 indicating identical matrices. It is an unstandardized measure, with higher values representing worse performance. Conditions with the same p can be compared directly. We chose the loss of a completely empty network as a baseline for discussing the resulting values, due to the regularization's objective of driving the entries toward zero. The KL-loss can be seen as a summary measure representing the difference in the precision matrix, and therefore in the edge weights, between estimate and population. Increasing the missingness in a data set should lead to higher loss values. Since pairwise deletion struggles with the recovery of covariances under MAR and MNAR, it should show higher KL-loss compared to more sophisticated approaches. The two EM-based approaches are expected to perform similarly to conditions without missing values under MAR, but to get worse under MNAR. (A note on the MCC: this coefficient is commonly used in studies on network analysis and is identical to the correlation coefficient φ, which is widely used in psychology; both are also identical to the product-moment correlation of two dichotomous variables.)
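A minimal sketch of this loss, assuming the standard Gaussian form tr(H⁻¹Ĥ) − log det(H⁻¹Ĥ) − p (the paper's exact scaling may differ slightly):

```python
import numpy as np

def kl_loss(true_prec, est_prec):
    """KL-loss between population precision matrix H and estimate H-hat:
    tr(H^-1 @ H_hat) - log det(H^-1 @ H_hat) - p.
    Returns 0 when the two matrices are identical."""
    p = true_prec.shape[0]
    M = np.linalg.solve(true_prec, est_prec)   # H^-1 @ H_hat without explicit inverse
    sign, logdet = np.linalg.slogdet(M)        # numerically stable log-determinant
    return np.trace(M) - logdet - p
```

Evaluating the loss of a diagonal (empty-network) precision matrix against the population matrix gives the baseline value marked by the dashed lines in the figures below.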

Network statistics
An important concern for applied researchers lies in having statistics describing the resulting network. Until recently, centrality indices (Epskamp et al., 2018; Opsahl et al., 2010) were typically used, aiming to find nodes with the highest influence within the network. In weighted networks these included closeness, defined as the strength of indirect and direct connections, betweenness, representing the importance of a node in connecting other nodes, and strength (Epskamp et al., 2018; Freeman, 1978).
Recently, applying these statistics in psychological research has come under criticism. The focal point of the discussion is the question whether psychological processes function by the same mechanics as is assumed in other applications (Bringmann et al., 2019). For example, betweenness relies on shortest paths between two nodes. It is at least questionable whether human characteristics are related to each other in this way. Closeness has a value of zero for all nodes if the network is disconnected. Furthermore, Epskamp et al. (2018) show that only strength is stable when dropping observations from data sets, which undermines the reliability and interpretability of the other statistics. Consequently, our evaluation of the performance relies on the strength values. Inspired by Barrat et al. (2004), the strength s of node i in the weighted networks is defined as the sum of the absolute weight values of its edges, s_i = Σ_j |w_ij| with i ≠ j. In line with Epskamp and Fried (2018), we used correlations of strength values between the population and estimated models as a performance measure. We calculated correlations separately for the two types of nodes in our simulations: variables with and without missing values. Evaluation studies of other analysis tools in psychological methodology often use bias as a performance criterion (e.g. Geiser & Simmons, 2021), which we also computed. The average absolute bias b̄ for a node i in strength s is defined as the mean difference between the estimated ŝ and the true value s across all replications.
Values closer to zero indicate less bias and are therefore preferable. In situations with many parameters, merging them together is a possibility to obtain interpretable aggregate performance measures (e.g. Geiser & Simmons, 2021; Koch et al., 2014). Accordingly, b̄_c is the average bias of strength for nodes of the same type c across all 300 replications. The two types in our simulations are variables with and without missingness. It is important to note that due to the regularization, edge weights are biased even with full data. Since strength represents the sum of the weights, it inherits this bias. Still, bias values can be helpful for the evaluation by comparing conditions with and without missing values, and by comparisons between missing data handling methods.
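Strength and its average bias across replications can be sketched as follows (a hypothetical implementation; `est_Ws` stands for a list of estimated weight matrices, one per replication, and is not a name from the study's code):

```python
import numpy as np

def strength(W):
    """Node strength: row sums of absolute edge weights, excluding the diagonal."""
    A = np.abs(np.asarray(W, dtype=float)).copy()
    np.fill_diagonal(A, 0.0)
    return A.sum(axis=1)

def mean_strength_bias(true_W, est_Ws):
    """Per-node average bias of strength across replications:
    mean over replications of (estimated strength - true strength).
    Negative values indicate underestimated strength (overly sparse networks)."""
    s_true = strength(true_W)
    s_est = np.array([strength(W) for W in est_Ws])
    return (s_est - s_true[None, :]).mean(axis=0)
```

Averaging the per-node values over all nodes of one type (with vs. without missingness) then yields the aggregate b̄_c reported below.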
As mentioned earlier, strength values are linked to partial correlations and therefore also to the entries in the inverse covariance matrix. Thus, we can anticipate a similar pattern across conditions as for the KL-loss. However, this investigation is still valuable because, due to our amputation design, both strength correlation and bias can help distinguish between variables with and without missingness. The inclusion of strength correlation in addition to bias is logical, because the correlation between values can remain consistent even if all estimates are affected by bias in the same manner. An increasing rate of missing values is expected to have a negative impact on correlation and bias, particularly for nodes with missingness. Notably, the correlation of values for nodes without missingness may remain largely unaffected, while the bias is likely to increase due to imprecise estimations of the precision matrix. Furthermore, the additional effects on strength correlation and bias resulting from MNAR should be notably more severe for variables with missing values.

Software
All simulations and analyses were performed in R Version 4.1.0 (R Core Team, 2021). Simulation and analysis code can be found on the Open Science Framework (https://osf.io/sqn4h/). The package MASS (Venables & Ripley, 2002, Version 7.3-54) was used to create the data from the population model. The number of penalty parameters λ was set to 100, with a ratio of 0.01 for calculating the minimum value. cglasso (Version 2.0.4) and bootnet (Version 1.4.3) default to different settings for the spacing and minimum of the penalty parameters. To ensure comparability, we manually adjusted the settings in cglasso to match those of bootnet, because the latter is the approach more prevalent in psychological research. The hyperparameter γ was set to 0.5. In addition to these differences in default arguments, cglasso also calculates the EBIC value in a slightly different fashion from what was described in the previous section, but this change does not impact model selection.
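The described λ sequence can be reproduced as a log-spaced grid of 100 values whose minimum is 0.01 times the maximum. This is a sketch of the grid construction only, not the packages' internal code; `lambda_max` is a placeholder for the data-dependent maximum penalty:

```python
import numpy as np

def lambda_grid(lambda_max, n_lambda=100, ratio=0.01):
    """Log-spaced penalty-parameter sequence from lambda_max down to
    ratio * lambda_max, as used for the glasso EBIC path."""
    return np.logspace(np.log10(lambda_max),
                       np.log10(ratio * lambda_max),
                       n_lambda)
```

A network is then estimated for every value in the grid, and the model with the lowest EBIC is selected.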
There is also a small difference regarding regularization in bootnet: regularization is done with correlation matrices instead of the traditional usage of covariance matrices. Thus, the results of network analysis using glasso EBIC differ between bootnet and cglasso even without missing data handling. With the correlation matrix being a standardized version of the covariance matrix, the difference between the two is rather marginal, but it does exist (and will be demonstrated in the results later on). We also had to modify the qgraph package (Version 1.6.9; underlying bootnet) slightly in its communication with lavaan (Version 0.6-8; used for covariance estimation). This modification has since been integrated into the qgraph package on CRAN.
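The relation between the two input matrices is a simple rescaling: the correlation matrix is the covariance matrix standardized by the variances, R = D^(−1/2) S D^(−1/2). A small sketch:

```python
import numpy as np

def cov_to_cor(S):
    """Rescale a covariance matrix S to a correlation matrix:
    R = D^(-1/2) S D^(-1/2), with D the diagonal matrix of variances."""
    d = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(d, d)
```

Because the glasso penalty is applied on an absolute scale, running it on R versus S generally selects slightly different edge sets, which is the source of the marginal difference noted above.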

Results
In light of the large number of conditions and performance measures, not all results of the study can be shown here. In the electronic supplemental material (ESM) we provide further figures and tables related to findings which are mentioned but not discussed in detail. Only the most crucial figures and tables are presented here. As an additional electronic supplement, we provide a Shiny app, where the interested reader is able to look at specific combinations, with violin plots for single replications and aggregated tables for conditions.
Before going into the evaluation of the missing data handling approaches, it is important to investigate whether the described difference in handling complete data between the two packages leads to differences in the packages' performance without missing values. Averaged across all complete data conditions (across all sample and network sizes), differences between the packages are very small for the KL-loss (M = −0.055, with similar variability for bootnet, SD = 13.973, and cglasso, SD = 13.587) and the MCC (M = 0.007, with similar variability for bootnet, SD = 0.265, and cglasso, SD = 0.247). This serves as an indication that more pronounced performance differences found in other conditions may be attributable to the missing data handling approaches, not to technical differences in the implementation of the network estimation process. Note that all following figures include full data results for both packages, which helps to reinforce this conclusion.

Convergence
Non-convergence occurred exclusively during the covariance estimation with lavaan (Version 0.6-8), because the resulting covariance matrix was not positive definite. Conditions prone to non-convergence are characterized by a large number of nodes and a small amount of information (i.e. high missingness rates and low n); non-convergence was almost entirely limited to conditions with pairwise deletion, for which it frequently led to convergence rates of 0%. For the two-step EM, the convergence rate did not drop below 98.7% in any condition, and non-convergence was limited to conditions with n = 150. Direct EM converged in all replications. A table with convergence rates can be found in the ESM (Table ESM1). Replications which failed to converge were excluded from all further analyses.

Matthews correlation coefficient
Figure 1 shows the MCCs for conditions simulated under MAR. Performance can be evaluated by comparing the missing data handling mechanisms to the complete data cases, but also against the usual standards for correlations. A first look at the results revealed that conditions with 150 observations resulted in values close to zero. This is due to the fact that these cases end in empty or at least relatively sparse networks, resulting in very low sensitivity (M = 0.02). Therefore, conditions with 150 observations are excluded from further investigation in this paper. Corresponding results can be investigated in the previously mentioned Shiny app accompanying this study.
With 250 observations, MCCs of around 0.4 are reached only in complete data conditions. None of the missing data handling mechanisms can reach this performance, even in the smallest network conditions. Generally, larger ratios of n to p or less missingness led to better recovery of the network structure, indicating better performance with an increase in available information overall. Concerning the handling mechanisms, the differences between direct and two-step EM are nuanced. Conditions with 10 and 20% missingness and a medium ratio of number of observations to number of nodes resulted in nearly equal performance of the two EM-based handling mechanisms. With less information available or bigger networks, the direct EM generally showed higher correlations than the two-step EM. In small networks, however, direct EM performed poorly. For the largest sample size, two-step EM generally returned a better estimation of the structure (with the exception of the largest network with the highest missingness). Pairwise deletion did not converge for conditions with a high proportion of missing values and large networks. Non-convergence in many conditions hampers the evaluation of pairwise deletion, yet the converged conditions do not fall far behind the EM approaches.
The same plots for conditions with data MNAR (Figure ESM1) and MCAR (Figure ESM2) can be found in the ESM. While the values under MCAR do not differ substantially from MAR, MNAR shows worse performance for all handling mechanisms. The main points in comparing the handling mechanisms under MAR stay consistent across the different missing data mechanisms.

Sensitivity and specificity
Table 1 shows sensitivity and specificity in detecting non-zero edges under MAR. Comparing the values for the two handling methods in general, direct EM estimation seems to be considerably more sensitive, while two-step EM is more specific. Furthermore, looking at the main structure of the table, there is an interesting abnormality concerning the network with 30 nodes that does not seem to fall in line with the other results. For example, sensitivity is better for 500 observations in the network with 60 nodes compared to 30 nodes, while usually an increase in network size with a constant number of observations leads to less sensitive results. This indicates some specific characteristics of the chosen population network.
A more detailed analysis reveals that estimations with n = 250 result in very sparse networks, as indicated by their low sensitivity and high specificity. This in turn leads to the low MCCs discussed above. While this is especially true for conditions with missing values, it can also be seen for complete data. For complete data conditions with n = 250, it is only possible to recover approximately half of the true edges for the smallest network size of 10.
MULTIVARIATE BEHAVIORAL RESEARCH

Looking into conditions with 500 observations, sensitivity with missing values is only comparable to complete data at 10% missingness for all sizes, and at 20% for the smallest network. On the other hand, specificity in conditions with missing values is close to the complete data cases. With the increase to 1000 observations, the direct EM is able to compete with complete data in sensitivity. However, the approach loses specificity in smaller networks. The two-step EM still performs well in specificity and gains sensitivity, especially for the smaller network sizes. However, sensitivity stays relatively low for bigger network sizes and larger amounts of missing values, with 16.4% recovered true edges in the most extreme case.
Comparing the two EM approaches, the two-step EM results in sparser networks. This is indicated by its lower sensitivity and higher specificity across all conditions, leading to a worse overall balance between sensitivity and specificity, except in cases with small p and large n; in these cases direct EM's performance worsens with regard to specificity. Corresponding tables with detailed results for MNAR (Table ESM2) and MCAR (Table ESM3) are provided in the ESM. The presented patterns in comparing the two EM algorithms stay the same in these conditions. While the resulting sensitivity and specificity values are nearly equal under MCAR and MAR, performance worsens under MNAR, especially for conditions with missingness at 30%.

Information loss
Figures 2 and 3 show the distributions of the KL-loss for MAR with n = 500 and n = 1000, respectively. To best interpret the performance of the missing data handling, the results should be compared to the complete data cases. Furthermore, the dashed black line in these figures represents the loss value if the estimated network were completely empty. In contrast, a perfectly replicated precision matrix would lead to a loss of zero. Results for conditions with n = 250 reiterated that, for these low sample sizes, empty networks are very often estimated (Figure ESM3).
For cases with n = 500 (Figure 2), the presented violin plots show a broad range of loss values, even for the networks without missing data. Therefore, the estimation can be seen as inconsistent. The condition with p = 30, which already defied regular patterns in edge set identification, shows this in the most egregious fashion. In all other network sizes, the networks show an average loss of about 25% of that of the true precision matrix. Of the three missing data approaches, direct EM performs best on average, with the two-step EM outperforming pairwise deletion. For all network sizes, direct EM performs comparably to complete data situations when missingness is 10%. For small networks (p = 10), the bulk of replications also showed similar KL-loss for higher missingness rates, but numerous replications resulted in high loss (overly sparse networks). Two-step EM performed noticeably worse than its direct counterpart in all situations, with the differences becoming more pronounced with less available information (i.e. in larger networks and higher missingness conditions). Pairwise deletion shows the highest loss in all conditions and, as mentioned above, did not provide any results in the three lowest information density conditions due to non-convergence. Simulating 500 observations under MNAR (Figure ESM4) resulted in the same patterns when comparing the handling approaches, with direct EM showing the best results. While performance was generally worse compared to data under MAR, direct EM could still reach the same results as complete data sets for 10% missingness in the smallest network. In other network sizes, there was a slight increase in KL-loss for 10% missing values, which grew with increasing rates of missingness. The KL-loss for data simulated under MCAR (Figure ESM5) is slightly lower than under MAR. The described patterns between the handling mechanisms stay the same, with pairwise deletion's performance being closer to the EM approaches.
Figure 3 shows the loss values under MAR with 1000 observations. Overall, the higher number of observations improves the performance of all approaches. This is true not only on average; the dispersion of results across replications is also drastically reduced. Regarding the missing data handling methods, the same patterns emerge as for 500 observations, with direct EM performing best and pairwise deletion performing worst. Direct EM is now able to estimate the precision matrix well in all presented conditions. The network with 30 nodes stays an exception, with direct EM showing a higher loss than the complete data conditions. Simulating 1000 observations under MNAR (Figure ESM6) leads to more KL-loss for all handling mechanisms, while keeping the ranking. However, direct EM still performs acceptably if the ratio of available to missing data is large. It attains results similar to complete data sets in all combinations at 10% missingness; 20% missingness leads to a slight increase in loss, while 30% shows a substantial one. With data simulated under MCAR (Figure ESM7), direct EM performed even better than under MAR. Replications produced consistent results, with loss values close to 0 across all conditions. Two-step EM also improved its performance, but still performs worse than direct EM. The performance of pairwise deletion proved to be better under MCAR than under MAR, being comparable to the results of two-step EM in these cases.

Strength bias
The final group of evaluation criteria are network statistics. Figure 4 shows the bias of strength values with data simulated under MAR for n = 500. The dashed black lines represent the maximum value of negative bias, which results if an estimated network structure does not contain any edges.
Generally, performance gets worse with more missing values and larger network sizes. The absolute bias increases, resulting in larger negative values. This can be attributed to the presented drop in sensitivity, with non-zero edge weights estimated as 0. An exception to this finding is, again, direct EM with high missingness in the smallest networks, where sensitivity is high but specificity is low. This results in a positive bias for strength values: edges between nodes are estimated as stronger than they are in the population network. Comparing nodes with and without missing values, there is no difference for complete data and conditions with low degrees of missingness. With 30% missingness, nodes with missing values show slightly higher bias across all handling mechanisms.
Shifting our focus to differences between handling mechanisms, pairwise deletion performs similarly to both EM approaches with a low amount of missingness. But especially with higher missingness, and for nodes with missing values, the EM algorithms are able to retrieve strength values more reliably. Overall, direct EM shows less bias than two-step EM. Both EM algorithms perform similarly to the complete data conditions for 10% and mostly for 20% missingness. For higher missingness and bigger networks, direct EM shows higher negative bias compared to complete data, but outperforms the two-step EM, which drops to a high negative bias.
The variability in bias between replications is higher for direct EM with 10% missingness. This is also true compared to full data conditions, which makes the results of direct EM less reliable. At 20% missing data, we still observe higher variability in comparison, but this is also due to a floor effect emerging in two-step EM and pairwise deletion, because they return increasing numbers of completely empty networks. Notably, this floor effect does not occur for direct EM for nodes without missingness, but for nodes with missingness there is increasing bias. In contrast, two-step EM shows relatively stable variability across different levels of missingness, and the difference in variability between nodes with and without missing values is less pronounced, even when missingness rates are higher.
Results simulated under MCAR (Figure ESM8) reveal similar patterns for all conditions, with slightly better average performance than under MAR. Even under MNAR (Figure ESM9), the EM algorithms show bias similar to that found in the conditions without missing values for a small amount of missingness and the smallest network. Differing from MAR, the performance already drops at 20%, especially for nodes with missing values, but direct EM is also the best performing approach in these conditions. 1000 observations under MAR (Figure ESM10) improve the results in comparison to n = 500, with more replications performing well and less variation in bias. Direct EM remains competitive with complete data sets in all conditions. The algorithm results in very low, sometimes even positive, bias for nodes without missing values when missingness is high.

Strength correlation
Though precautions were taken to have similar values in centrality indices for nodes with and without missing values, the results indicate differences between these two even in cases with complete data. This means that correlations between estimated and true strength values differ between these two types of nodes, even when no data are missing. This is particularly true for smaller networks; interpretation of differences between these two kinds of nodes should therefore focus on the bigger networks. Figure 5 shows the average correlation of strength values between estimated and population networks. Mean strength correlations are 0.75 for 500 observations with small network sizes or small degrees of missingness. For 1000 observations, nearly all conditions reach this value. On the other hand, correlations in conditions with 250 observations are still low on average, even without missing values in the data set. Concerning differences between missing data handling mechanisms, direct EM shows better performance for nodes without missing values compared to nodes with missing values, while the opposite is true for the two-step EM. Yet, direct EM's performance is at least on par with two-step EM for the nodes with missing values in every condition. In fact, direct EM shows higher mean correlations in conditions where the degree of available information is low (high missingness and a high number of nodes).
Generally, values of strength correlation are rather consistent over replications. Using 500 observations simulated under MAR (Figure ESM11), a large degree of variability was only found for the smallest network size. Still, there are some replications resulting in correlations of zero. The simulation shows more consistent results for networks with a higher number of nodes. The size of deviations is similar for all handling mechanisms. Using 1000 observations (Figure ESM12), consistency improves even more. There are still some outliers with small correlations in the smallest networks, and for bigger networks the deviation approaches 0 (e.g. SD = 0.02 for 20% missingness in the p = 120 network using direct EM). Furthermore, direct EM shows higher consistency for all simulated conditions using 1000 observations.
With data simulated under MCAR (Figure ESM13), strength correlations between estimated and population networks are slightly higher than under MAR, while under MNAR there is a slight drop in performance (Figure ESM14). Both effects are strongest for conditions with 30% missingness. Direct EM shows mean correlations of at least 0.5 with 1000 observations, even when inspecting conditions with MNAR. This indicates that direct EM is able to identify the relevance of nodes in terms of strength centrality. Patterns between handling mechanisms stay the same under all three missing data mechanisms.

Discussion
The main goal of this study was to evaluate the performance of different approaches to handling missing values in cross-sectional network analysis using glasso EBIC for model search. Beyond this aim, the simulation also yielded some general insights into the application of network analysis in psychological contexts.
The results indicate that large sample sizes are needed to achieve good performance even without missing values. This result is in line with Williams et al. (2019), who found a large performance improvement when using 250 compared to 100 observations. In the present simulation, it was possible to recover the most important aspects of the population network with 250 observations only for the smallest network size (p = 10). For networks with more than 10 nodes, 500 observations were needed to reach the benchmarks used by Epskamp and Fried (2018) in edge set identification and average correlations of strength between estimated and population networks. Although the needed number of observations might seem high, particularly in the context of clinical psychology, a review by Wysocki and Rhemtulla (2021) of 37 applied studies using network analysis within the domains of psychopathology and personality revealed that almost half of these studies had observation counts surpassing 500.
Concerning the KL-loss, there is no easy way to make a definite comparison to the results of previous literature, because of its non-standardized nature. From empirical experience, it seems to be adequate for most conditions with 500 observations. Our introduction of the loss of completely empty networks to set a scale should make comparisons more feasible for future studies. 1000 observations display even more consistency, especially for centrality indices, but also have the disadvantage of estimating more false-positive edges. This drop-off can also be found in the simulation by Williams and Rast (2020), but is not as drastic in our results. Epskamp and Fried (2018) hint at even more consistent results for sizes exceeding the ones used here, while maintaining stable values in specificity.
Since the present study is, to our knowledge, the first to investigate the performance of missing data handling methods for cross-sectional networks in a psychological setting, their performance has to be evaluated in comparison to complete data and to each other. The chosen evaluation criteria showed largely homogeneous results. Direct EM seems to be superior to the two-step approach across most conditions. Its biggest strength lies in recovering the true precision matrix (low KL-loss), where it is often very close to the performance of conditions without missing values. Similar results can be found for strength correlations. Using direct EM also results in the least bias for strength values, especially for nodes without missing values. This indicates that direct EM is able to use the information provided by observed nodes adequately, where the two-step EM fails to do so. In fact, the two-step EM estimation is more severely biased for the strength of nodes that do not contain missing values, both in terms of absolute bias and correlations between population and estimated values. However, with a large number of observations and high degrees of missingness, direct EM slightly overestimates the strength of nodes with missing values.
Overall, MCCs show the smallest differences between the two EM algorithms among all evaluation criteria. Here, two-step EM's good performance is driven by its cautious approach, resulting in a specificity of 100% in some conditions, while having a low, albeit non-zero, sensitivity. When considering values for both sensitivity and specificity, it becomes evident that direct EM tends to produce denser networks, whereas the two-step approach yields sparser ones. The high specificity of the two-step EM could be influenced by the manner in which the sample size is defined in the second step (glasso EBIC), where the average number of pairwise observations is employed as N. For a more reliable identification of relevant edges, direct EM should generally be preferred. It is important to note that the superior performance of direct EM from cglasso holds under the previously described modification that makes the selection of penalty parameters equal to the approach of bootnet.
The simulation showed that pairwise deletion always performed at least slightly worse than the EM approaches, indicating that it should not be used in network analysis. This is true even under MCAR, where it is theoretically unbiased. The focus of the present paper was on MAR, for which EM approaches are seen as unbiased when utilized in other statistical analyses. In most cases, findings regarding the performance of missing data handling mechanisms under MCAR do not differ substantially from those found for MAR, which was also the case in this study. Conditions with MNAR were also included in the simulation and returned some interesting findings. The biggest differences in performance between MAR and MNAR appear for higher proportions of missing values when investigating the KL-loss and strength bias. These criteria are more sensitive to smaller differences between population and estimated networks than the criteria relying on zero/non-zero decisions regarding edges (e.g. MCC). In this study, direct EM handled MNAR surprisingly well; we were unable to detect substantial differences between data simulated under MNAR and MAR. This surprising result may be attributed to our approach to simulating missingness, which aligns with the definition of the missing data mechanisms (see Little & Rubin, 2019; Rubin, 1976). In MNAR scenarios, the propensity for missing data is tied (at least in part) to an unobserved variable, which we translated to being related to the value of the variable itself. When dealing with network analysis involving psychological constructs, we encounter strong interdependencies among variables. It is reasonable to assume that missingness can still be accounted for by other related variables, even if they were not part of the missingness process. To create a negative impact in a scenario with data MNAR, future research could involve amputating groups of strongly related variables in a blockwise manner.
Concerning sample sizes, small data sets should not be used to perform network analysis with missing values, which is in line with the findings for complete data sets. While 250 observations seem sufficient for interpretable results with few missing values (10%) and small networks (p = 10), 500 observations should be the minimum when either aspect increases. 1000 observations contributed to better performance on all criteria, indicating a payoff in trustworthy results beyond the 500 observation minimum. For the biggest networks and/or higher missingness, only direct EM with 1000 observations performed similarly to complete data situations. 120 nodes and 30% missingness led to a small but acceptable decrease in performance with 1000 observations. It is important to remember that the population networks themselves were also estimated from a data set of 1000 observations, which could be a reason for finding the best performance in these settings.

Limitations and future directions
The present study focused on one approach to estimating a network structure: using lasso regularization and choosing the penalty parameter via the EBIC. However, regularization is not undisputed. Williams et al. (2019) and Williams and Rast (2020) stress that, especially for n larger than p (conditions in which glasso with EBIC and direct EM performed well with and without missing values in our simulation), there is no need for regularization. However, previous studies did not investigate conditions with missing values and considered only small differences in network size. Future simulations are needed to compare direct EM to a combination of unregularized network estimation with missing data handling. Such studies should investigate the tradeoff between using and not using regularization in the presence of missing data for varying ratios of n and p. Moreover, the default definition of N in the two-step approach of bootnet may warrant reconsideration. As previously described, the EM algorithm does not have a straightforward sample size associated with the covariance matrix. Other statistical analyses using a two-stage approach implement corrections for their estimates (e.g., standard errors by Savalei & Bentler, 2009). Identifying a suitable value for N could likewise enhance the performance of the two-step EM method.
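The role of the assumed sample size N in the two-step approach can be illustrated with a sketch. The snippet below selects a glasso penalty via a commonly used EBIC form, taking a covariance matrix (such as one produced by an EM algorithm) and an assumed N as inputs, and shows where N enters the criterion. This is a hypothetical helper built on scikit-learn's `graphical_lasso`, not the bootnet or cglasso implementation; the penalty grid and `gamma` default are our own choices.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def ebic_glasso(S, n, alphas, gamma=0.5):
    """Select a glasso penalty via EBIC for a covariance matrix S and an
    assumed sample size n (hypothetical helper, illustrative only).
    EBIC = -2*loglik + E*log(n) + 4*E*gamma*log(p), where E is the
    number of estimated edges; note that n enters both the likelihood
    and the penalty term."""
    p = S.shape[0]
    best_ebic, best_theta = np.inf, None
    for alpha in alphas:
        _, theta = graphical_lasso(S, alpha=alpha)
        _, logdet = np.linalg.slogdet(theta)
        loglik = (n / 2.0) * (logdet - np.trace(S @ theta))
        # number of non-zero off-diagonal edges (upper triangle)
        E = np.sum(np.abs(theta[np.triu_indices(p, k=1)]) > 1e-8)
        ebic = -2.0 * loglik + E * np.log(n) + 4.0 * E * gamma * np.log(p)
        if ebic < best_ebic:
            best_ebic, best_theta = ebic, theta
    return best_theta
```

Because N multiplies the log-likelihood while the edge penalty grows only with log(N), an understated N makes the criterion favor sparser networks; this is one way a poorly chosen default could degrade the two-step EM's sensitivity.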
The present simulation was conducted using continuous data in all conditions. While this decision was made to fit the requirements of both EM algorithms, ordinal data are at least equally prevalent in psychology, particularly in personality or clinical questionnaires. The literature advises choosing an estimation method based on the research question (see Isvoranu & Epskamp, 2021). Future work should focus on expanding and evaluating these estimation methods so that they can handle missing data. This also includes investigations into the robustness of network analysis (in the presence of missing values) with varying numbers of categories and distributions of the ordinal data.
Population network structures and values were retrieved from a psychological data set. While this could be seen as a strength in terms of realism (Mansueto et al., 2022), it resulted in different densities across network sizes, which could have influenced the simulation. Epskamp and Fried (2018) speculate that specificity values are higher for sparser networks. In the present simulation, specificity is higher in conditions with a higher number of nodes, which are the less dense networks. Additionally, all data sets for a given number of nodes were simulated from the same population structure. Thus, replications with other networks retrieved from psychological settings are still needed to investigate the plausibility and external validity of the presented results. This point is especially well illustrated by the network with 30 nodes, which complicates the patterns of the results, though the general findings regarding the comparisons between the approaches hold true even for these conditions. Handling methods were chosen based on the premise of already being implemented in software. Likelihood-based approaches, to which EM algorithms belong, represent one modern way of handling missing data, encompassing many different methods.
Comparing their performance to another commonly used technique, multiple imputation (Little & Rubin, 2019), would be of interest. There is no implementation of regularized network analysis with multiply imputed data sets yet, possibly because of the problem of combining regularized estimates. Ongoing work by Nehler and Schultze (2022) focuses on stacking imputations to determine edge weights.
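The stacking idea can be sketched as follows: generate several stochastic imputations of the incomplete data, stack them row-wise, and estimate a single covariance (and hence a single network) from the stacked data, thereby sidestepping the problem of pooling regularized estimates. This is an illustrative Python sketch using scikit-learn's `IterativeImputer`, not the implementation of Nehler and Schultze (2022); the number of imputations `m` and the imputation model are our own assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def stacked_imputation_cov(X, m=5, seed=0):
    """Sketch of the stacking idea: create m stochastic imputations of an
    incomplete data matrix X (NaN = missing), stack them row-wise, and
    estimate one covariance matrix from the stacked data. Illustrative
    only -- not the Nehler & Schultze (2022) implementation."""
    stacks = []
    for i in range(m):
        # sample_posterior=True makes each imputation a stochastic draw
        imp = IterativeImputer(sample_posterior=True, random_state=seed + i)
        stacks.append(imp.fit_transform(X))
    stacked = np.vstack(stacks)
    return np.cov(stacked, rowvar=False)
```

A regularized network could then be fit to this single pooled covariance, though the effective sample size of the stacked data raises the same definition-of-N question discussed above for the two-step EM.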
For the purpose of simulating the three missing data mechanisms, the data set was split into fully observed variables and variables with missing values. A more realistic scenario would be missing values on all variables, which should also result in a mixture of different missing data mechanisms (Graham, 2009). This can be seen as a limitation for the practical use of the results, especially considering the slight difference in strength bias and correlations between nodes with and without missing values. On the other hand, the relatively consistent results across the three mechanisms are encouraging. With some of the missingness based on MAR, there should be little bias in the results using direct EM.
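A minimal sketch of the amputation logic described above, assuming a single target variable: under MCAR the missingness propensity is random, under MAR it depends on a fully observed variable, and under MNAR it depends on the target variable's own values. The function and its details (quantile-based deletion, the choice of driver columns) are our own illustrative choices, not the study's actual amputation procedure.

```python
import numpy as np

def ampute(X, mechanism="MAR", prop=0.2, seed=0):
    """Delete a proportion `prop` of values in the last column of X
    under one of the three missing data mechanisms (illustrative sketch,
    assuming the first column is fully observed)."""
    rng = np.random.default_rng(seed)
    X = X.copy().astype(float)
    n = X.shape[0]
    if mechanism == "MCAR":
        score = rng.normal(size=n)   # propensity independent of the data
    elif mechanism == "MAR":
        score = X[:, 0]              # driven by an observed variable
    else:                            # MNAR
        score = X[:, -1]             # driven by the variable's own values
    # ampute the target variable for the rows with the highest propensity
    cutoff = np.quantile(score, 1 - prop)
    X[score > cutoff, -1] = np.nan
    return X
```

When the variables are strongly intercorrelated, the MNAR propensity (the target's own values) is well predicted by the remaining observed variables, which is one plausible reason the MNAR conditions behaved so similarly to MAR in this study.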

Conclusion
The aim of the present study was to evaluate the performance of missing data handling methods in cross-sectional network analysis. We focused on available tools in order to derive recommendations for applications with missing values. The evaluation criteria showed the package cglasso (Augugliaro et al., 2021), using a direct EM algorithm, to be the best choice across most of the tested conditions. An exception was found for a large ratio of n to p and high missingness, where the two-step EM integrated in the bootnet package (Epskamp et al., 2017) is preferable due to its higher specificity in identifying edges. Overall, network analysis with missing values should only be used with sample sizes of 500 or more, which are not always available in psychological research. Since this is the first inclusion of missing values with regularization in cross-sectional network analysis in a psychological context, replications and extensions are urgently needed.

Figure 1. Average Matthews correlation coefficients for all conditions with data simulated under MAR. Columns vary different degrees of missingness, while rows vary network size. Values for pairwise deletion are not available for larger networks with higher missing rates (bottom right of the figure) due to non-convergence.

Figure 2. Kullback-Leibler loss for complete data in both packages as well as all missing data handling methods. Missing values were simulated under MAR. All shown conditions used 500 observations. Columns vary different degrees of missingness, while rows vary network size. Dashed lines represent the loss resulting from a completely disconnected network.

Figure 3. Kullback-Leibler loss for complete data in both packages as well as all missing data handling methods. All shown conditions used 1000 observations. Missing values were simulated under MAR. Columns vary different degrees of missingness, while rows vary network size. Dashed lines represent the loss resulting from a completely disconnected network.

Figure 4. Bias of strength values. Results for (A) nodes without and (B) nodes with missing values. All shown conditions used 500 observations. Missing values were simulated under MAR. Columns vary different degrees of missingness, while rows vary network size. Dashed lines represent the maximum negative bias possible.

Figure 5. Average correlations of strength values between estimated and population networks as a function of sample size. All conditions with missingness above 0 were simulated under MAR. Lines are separated for nodes without and nodes with missing values. Columns vary different degrees of missingness, while rows vary network size. Values for pairwise deletion are not available for larger networks with higher missing rates (bottom right of the figure) due to non-convergence.

Table 1. Means (and standard deviations) of specificity and sensitivity for conditions simulated under MAR with n ≤ 250. Results for pairwise deletion are not shown due to a large proportion of non-convergence.