The Effect of Model Size on the Root Mean Square Error of Approximation (RMSEA): The Nonnormal Case

Abstract This study aimed to understand the effect of model size on the root mean square error of approximation (RMSEA) under nonnormal data. We considered three methods for computing the sample RMSEA and the associated confidence intervals (CIs; i.e., the normal theory method, the BSL method, and the Lai method). The performance of the three methods was compared across various model sizes, sample sizes, levels of misspecification, and levels of nonnormality. Results indicated that the normal theory RMSEA should not be used under nonnormal data unless the model size is very small. In the presence of nonnormal data, researchers should consider using either the BSL or the Lai method to estimate RMSEA and its CIs. The Lai method is recommended when very large models are fit under nonnormal data.


Introduction
Structural equation modeling (SEM) is a widely used statistical method for analyzing data in social science research and other disciplines. One crucial step in conducting an SEM analysis is to assess the goodness of fit of the hypothesized model to the data (Bearden et al., 1982; Hoelter, 1983; Lomax & Schumacker, 2004), as almost all models are "incorrect," or misspecified to some degree, in real data applications (MacCallum, 2003). To quantify the size of model misfit, a host of goodness-of-fit indices have been developed and are widely used in practice (Lai, 2021; Davey et al., 2005). The root mean square error of approximation (RMSEA; Steiger & Lind, 1980; Browne & Cudeck, 1992) is one of the most popular fit indices currently used in SEM applications, as well as one of the most studied in the SEM literature.
The population RMSEA is defined as the discrepancy due to approximation per degree of freedom. Accordingly, RMSEA measures the "badness" of fit, with lower RMSEA values indicating better model fit. When fitting SEM models to sample data, one common approach is to compare the sample RMSEA with a fixed cutoff recommended by methodological researchers. For example, Hu and Bentler (1999) suggested that a sample RMSEA at or below 0.06 can indicate acceptable model fit. Assuming that the observed data are multivariate normally distributed, the sample RMSEA follows a noncentral chi-square distribution under model misspecification (Browne & Cudeck, 1992). The known asymptotic distribution allows researchers to calculate the analytic confidence interval (CI) for the population RMSEA and to conduct a formal statistical test comparing RMSEA with a cutoff c at the population level (i.e., testing RMSEA ≤ c). Thus, the sample estimate of RMSEA can be used in both a descriptive and an inferential manner, which is considered a key strength of RMSEA over other fit indices (Browne & Cudeck, 1992; Li & Bentler, 2006).
In an ideal situation, the value of RMSEA would reflect only the "effect size" of model misspecification. However, in addition to the level of model misfit, RMSEA can be influenced by other characteristics of the model (Saris et al., 2009). The size of the fitted model is considered one important factor when estimating and interpreting the RMSEA (Savalei, 2012; Kenny et al., 2015; Shi et al., 2019).
Previous studies have shed light on the effect of model size on RMSEA. At the population level, because the RMSEA penalizes model complexity by incorporating the df in the denominator of the formula, its population value changes as df increases or decreases (Kenny & McCoach, 2003; Chen et al., 2008; Savalei, 2012). Therefore, the population values of RMSEA computed from models with very small or very large df could be artificially inflated or deflated. Shi et al. (2022) showed that when df = 2, the population RMSEA for a modestly misspecified model (i.e., fitting a one-factor model to two-factor data with an inter-factor correlation of ρ = 0.90 and standardized factor loadings λ = 0.80) is 0.120, which is far above the conventional cutoff for close fit. Moreover, Shi et al. (2019) showed that for the same level of model misfit, as df increased from 35 to 7,020, the population RMSEA dropped from 0.120 to 0.009 when using misspecified models with omitted residual correlations.
At the sample level, results from many studies have also suggested that researchers should be cautious when estimating and interpreting RMSEA estimates from either very small or very large models (Kenny & McCoach, 2003; Kenny et al., 2015; Maydeu-Olivares et al., 2018; Shi et al., 2019, 2022). First, the bias and variability of the sample RMSEA are affected by model size. Shi et al. (2019) found that for a fixed sample size, the sample RMSEA tended to be upwardly biased, and the differences between the sample and population RMSEA generally became larger as model size increased (e.g., df increased from 35 to 7,020). On the other hand, the sampling variability of RMSEA tended to decrease as model size (df) increased. Thus, as shown by Kenny et al. (2015; see also Shi et al., 2022), both the theoretical standard errors and empirical standard deviations of the sample RMSEA were artificially inflated when fitting models with very small df (e.g., df = 1), especially when the sample size was small.
Previous research has also shown that the CIs of RMSEA become less accurate as model size increases, especially in small samples. For example, Curran et al. (2003) showed that for N = 50, the coverage rates of the 90% CI decreased from 88% to 73% as df increased from 24 to 85. Maydeu-Olivares et al. (2018) found that when fitting large models (e.g., p = 60 and df = 1,710), the CI coverage rates for RMSEA were noticeably lower than the nominal level (e.g., 90%), even when the sample size reached 1,000. When the size of the fitted model is very small (e.g., df = 1), the CIs for the RMSEA are generally accurate in terms of coverage rates. However, the CI tends to become wider as df decreases, suggesting greater uncertainty in the parameter estimate. As Kenny et al. (2015) report, for a fixed sample size of 200, the mean width of the 90% CIs increased from 0.043 to 0.155 as df decreased from 50 to 1.
The above conclusions were all based on multivariate normal data. In real applications, however, psychological data are often nonnormally distributed (Cain et al., 2017; Micceri, 1989). Though theoretical population RMSEA values are not influenced by the distribution of the sample data, point and interval estimates of the sample RMSEA rely on distributional assumptions about the observed data.
The sample RMSEA and associated CIs are typically computed using the ML-based likelihood ratio (chi-square) test statistic (T_ML; this approach is referred to as the normal theory method, or RMSEA_ML, hereafter), which assumes multivariate normality and is evaluated against a chi-square reference distribution. When fitting models to nonnormal data, however, the ML-based likelihood ratio test statistic no longer follows a chi-square distribution. Researchers have proposed robust versions of the likelihood ratio test statistic under the ML estimation method with either mean, or mean and variance, adjustments (e.g., Satorra & Bentler, 1994). These robust test statistics have been implemented in most SEM software (e.g., T_MLM under "estimator = MLM" in Mplus; Muthén & Muthén, 2016), and studies have shown that the statistics generally perform well under nonnormal data conditions (Maydeu-Olivares, 2017).
A naïve method to compute the sample RMSEA under nonnormal data is simply to replace the normal theory-based chi-square test statistic (T_ML) with a robust version of the test statistic (e.g., T_MLM). However, it has been shown that the RMSEA under the naïve method is an inconsistent estimate of the population RMSEA and is not recommended for use (Brosseau-Liard et al., 2012; Gao et al., 2020; Shi et al., 2020). Methodologists have proposed two alternative methods to obtain consistent point estimates and more accurate CIs for RMSEA under nonnormal data. The first method (the BSL method hereafter) was introduced by Brosseau-Liard et al. (2012; see also Li & Bentler, 2006; Savalei, 2018). Similar to the naïve approach, the BSL method uses the robust test statistic to compute the sample RMSEA and its CI. However, the scaling factor used to obtain the robust test statistic is also included in the sample RMSEA formula to make the estimator consistent for the population RMSEA. More recently, Lai (2020) proposed a new point estimator and CI method for RMSEA under nonnormal data. Instead of using the robust chi-square test statistic, the Lai method estimates the sample RMSEA directly using the value of the sample fit function. The asymptotic distribution of the sample fit function was derived using the multivariate delta method, and Wald CIs for both the fit function and RMSEA can thus be obtained under a normal distribution. More details and formulas for computing RMSEA under the different methods are reviewed in the next section.
Several studies have explored the performance of RMSEA under different model sizes with nonnormal data. Maydeu-Olivares et al. (2018) examined the performance of the BSL point estimator and its associated CI for RMSEA across various levels of nonnormality and model size (i.e., df ranged from 35 to 1,710). Results showed that the BSL method yielded more accurate point estimates and CIs compared to those obtained using the normal theory-based chi-square test statistics. However, the BSL CIs only performed well in small models (i.e., df = 35). As model size increased, the coverage rates of the BSL CIs fell below the nominal level (i.e., 90%). It is noted that the BSL RMSEA and CI were computed using robust test statistics, and several robust test statistics based on ML have been proposed (see Maydeu-Olivares, 2017 for a review). Maydeu-Olivares et al. (2018) only focused on one "version" of the BSL RMSEA, however, which was computed using the mean-adjusted test statistic (i.e., T_MLM). Gao et al. (2020) and Shi et al. (2020) compared the performance of BSL RMSEAs using three different robust corrections applied to the likelihood ratio test statistic (i.e., T_MLM, T_MLR, and T_MLMV). Results showed that the BSL-based RMSEA performed better than the normal theory RMSEA in nonnormal data when using any of the three robust test statistics. In addition, among the robust corrections, the BSL RMSEA using the mean- and variance-adjusted test statistic (T_MLMV) yielded the most accurate point estimates and CIs. Regardless of the choice of robust test statistic, the performance of the BSL RMSEA became worse as the model size increased (e.g., df = 464). The performance of the Lai method has also been evaluated using simulations (Lai, 2020). Results showed that in fitting latent curve models with nonnormal data, the Lai method outperformed the normal theory method, the naïve method, and the BSL method by yielding more accurate point estimates and better CI coverage rates.
Although previous research has considered the effect of model size on RMSEA under nonnormal data, there are limitations in the existing literature. First, the model size conditions manipulated in previous studies are somewhat restricted. The minimum and maximum df considered by Gao et al. (2020) were 104 and 464, respectively (i.e., factor analysis models with 16 or 32 observed variables). For Lai (2020), the df ranged from 6 to 30. In practice, both very small and very large models are frequently encountered in psychological research. For example, when fitting a one-factor model for a short scale with 4 items (e.g., global job satisfaction; Price, 1977), the df is only 2. On the other hand, an EFA model with five factors fitted to the 120-item Revised NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1992) has a df of 7,140 (Vedel et al., 2019).
Second, as the newest method for estimating RMSEA under nonnormal data, the Lai (2020) method was proposed and evaluated in the context of latent growth models only. As discussed in Lai (2020), the new method can be applied to any SEM model; however, its performance has yet to be evaluated and compared with existing approaches in the context of other commonly used models (e.g., factor analysis models). Third, though multiple approaches have been proposed to generate nonnormal data when conducting simulation studies (Astivia, 2020), previous work has only considered a single data generation algorithm when evaluating the performance of RMSEA in nonnormal data. For example, Maydeu-Olivares et al. (2018) generated nonnormal data by discretizing continuous data into ordinal categories. Gao et al. (2020) generated nonnormal data by applying the procedure described in Vale and Maurelli (1983). Finally, Lai (2020) used a method developed by Foldnes and Olsson (2016) to generate nonnormal data. This focus is limiting, as researchers have shown that different data generation algorithms can yield different sample characteristics and even different simulation conclusions (Astivia & Zumbo, 2015; Foldnes & Olsson, 2016). It is crucial to use more than one method when simulating nonnormal data to evaluate the robustness of findings.
To fill these gaps in the literature, this article aims to understand the effect of model size on RMSEA under nonnormal data through a more comprehensive simulation study that not only compares the performance of different methods for estimating the point RMSEA and CIs under a variety of simulation conditions, but also considers the impact of the data generation method used. We manipulated a larger range of model sizes (i.e., df = 2 to df = 1,710) so that the findings generalize to more representative modeling situations. In addition, we used two different algorithms to generate nonnormal data to check the robustness of the findings.
The remainder of our article is organized as follows. First, we review the different methods and formulae to compute the sample RMSEA and CI. Second, we present a simulation study that compares the performances of different methods to estimate RMSEA under nonnormal data, with a focus on understanding the effect of model size. Finally, we discuss the implications and provide recommendations for empirical researchers.

Formulas for Computing RMSEA
The population RMSEA is defined as (Browne & Cudeck, 1993; Steiger & Lind, 1980; Steiger, 1990):

\varepsilon = \sqrt{F_0 / df}, \quad (1)

where F_0 denotes the discrepancy between the population covariance matrix and the model-implied covariance matrix. It is noted that the population RMSEA depends on the estimation method employed. When maximum likelihood (ML) is used, the population RMSEA is defined as

\varepsilon_{ML} = \sqrt{F_{ML} / df}. \quad (2)

Assuming no mean structure, F_{ML} is obtained by minimizing

F_{ML} = \ln|\Sigma_0| - \ln|\Sigma| + \mathrm{tr}(\Sigma \Sigma_0^{-1}) - p, \quad (3)

where \Sigma_0 and \Sigma are the model-implied and the population covariance matrices, respectively, and p denotes the number of observed variables in the fitted model. Given the sample data and assuming normality (Browne & Cudeck, 1993), the sample estimate of the ML RMSEA is calculated as

\hat{\varepsilon}_{ML} = \sqrt{\max\left(\frac{T_{ML} - df}{N \cdot df},\ 0\right)}, \quad (4)

where T_ML denotes the ML-based likelihood ratio test statistic and N is the sample size. The 90% CI for RMSEA is obtained via

\left[\sqrt{\frac{\lambda_L}{N \cdot df}},\ \sqrt{\frac{\lambda_U}{N \cdot df}}\right], \quad (5)

where \lambda_L and \lambda_U are the respective lower and upper limits of the noncentrality parameter under consideration. Specifically, \lambda_U and \lambda_L are the solutions to

F_T(T_{ML}; df, \lambda_U) = 0.05 \quad \text{and} \quad F_T(T_{ML}; df, \lambda_L) = 0.95, \quad (6)

where F_T(\cdot; df, \lambda) denotes the cumulative distribution function of the noncentral chi-square distribution with df degrees of freedom and noncentrality parameter \lambda (Browne & Cudeck, 1993).
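As an illustration, the normal-theory point estimate in Equation (4) and the CI in Equations (5) and (6) can be computed numerically; the sketch below uses SciPy's noncentral chi-square CDF and a root finder. It is a minimal sketch, not the authors' code, and the statistic, df, and N values in the usage note are arbitrary.

```python
import numpy as np
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def rmsea_ml(T, df, N):
    """Normal-theory point estimate: sqrt(max((T - df) / (N * df), 0))."""
    return float(np.sqrt(max((T - df) / (N * df), 0.0)))

def rmsea_ci(T, df, N, level=0.90):
    """CI limits from the noncentral chi-square reference distribution of T."""
    alpha = (1.0 - level) / 2.0          # 0.05 for a 90% interval

    def cdf(lam):                        # F_T(T; df, lam); lam = 0 is central
        return chi2.cdf(T, df) if lam == 0 else ncx2.cdf(T, df, lam)

    def solve(p):                        # lam such that F_T(T; df, lam) = p
        if cdf(0.0) < p:                 # no nonnegative solution: bound at 0
            return 0.0
        return brentq(lambda lam: cdf(lam) - p, 0.0, 10.0 * max(T, df))

    lam_U = solve(alpha)                 # F_T(T; df, lam_U) = 0.05
    lam_L = solve(1.0 - alpha)           # F_T(T; df, lam_L) = 0.95
    return float(np.sqrt(lam_L / (N * df))), float(np.sqrt(lam_U / (N * df)))
```

For example, `rmsea_ml(50, 35, 200)` gives roughly 0.046, and `rmsea_ci(50, 35, 200)` returns a 90% interval that brackets it; the `max(..., 0)` truncation reproduces the convention of reporting RMSEA = 0 when T falls below df.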
Under nonnormal data, the naïve method to compute the sample RMSEA and its associated CI is simply to replace the normal theory-based chi-square test statistic (T_ML) in Equations (4) and (6) with a robust test statistic that contains either a mean or a mean-and-variance adjustment (T_MLM, T_MLR, or T_MLMV). An alternative method to obtain consistent point estimates and more accurate CIs for RMSEA under nonnormal data was proposed by Brosseau-Liard et al. (2012; the BSL method; see also Savalei, 2018). The BSL method gives the sample RMSEA under nonnormal data as:

\hat{\varepsilon}_{BSL} = \sqrt{\max\left(\frac{\hat{c}\,(T - df)}{N \cdot df},\ 0\right)}, \quad (7)

where T is a robust chi-square test statistic (T = T_MLM, T_MLR, or T_MLMV) and \hat{c} denotes the associated scaling factor for the given test statistic (\hat{c} = \hat{c}_{MLM}, \hat{c}_{MLR}, or \hat{c}_{MLMV}). The 90% BSL CI can be obtained as:

\left[\sqrt{\frac{\hat{c}\,\lambda_L}{N \cdot df}},\ \sqrt{\frac{\hat{c}\,\lambda_U}{N \cdot df}}\right], \quad (8)

where \lambda_L and \lambda_U are the lower and upper noncentrality limits obtained as in Equation (6) using the robust chi-square test statistic.
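The BSL point estimator in Equation (7) can be sketched in a few lines; this is an illustration of the formula as stated above, with `c_hat` standing for the estimated scaling factor that robust ML software reports alongside the adjusted statistic.

```python
import math

def rmsea_bsl(T_robust, c_hat, df, N):
    """BSL point estimate: the estimated scaling factor c_hat re-scales the
    robust statistic's excess over df, which makes the estimator consistent
    for the population RMSEA under nonnormal data."""
    return math.sqrt(max(c_hat * (T_robust - df) / (N * df), 0.0))
```

With `c_hat = 1` (i.e., no distributional correction needed) the estimator reduces to the normal-theory formula in Equation (4), which is why it remains well behaved under normal data.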
More recently, Lai (2020) proposed another method to obtain point estimates and CIs for RMSEA under nonnormal data. The Lai method computes the sample RMSEA and CI directly from the discrepancy function (\hat{F}) instead of the chi-square test statistic (T). A consistent sample estimate of the RMSEA is obtained from a bias-corrected estimate of \hat{F}, which corrects the bias of \hat{F} in finite samples. Two Wald CIs, CI_Lai1 and CI_Lai2, were also proposed under the normal distribution, where SE_1 and SE_2 are the asymptotic standard errors based on the first-order and second-order Taylor expansions, respectively, and the critical values are taken from the normal curve at the corresponding significance level \alpha. See Lai (2020) for further details on computing the point estimate, SE_1, and SE_2.

Monte Carlo Simulation
In the interest of simplicity, we conducted a Monte Carlo simulation study to understand the effect of model size on the RMSEA in the context of the confirmatory factor analysis (CFA) model. Both correctly specified and misspecified model scenarios were considered. For correctly specified conditions, the population and fitted models were both a one-factor CFA model. Under model misspecification, the population data were generated under a CFA model with two factors, but a one-factor model was fitted to the data. The simulation conditions were created by manipulating the following four variables.

Model Size
The effect of model size was evaluated by manipulating the number of observed variables (p) in the population: p = 4, p = 10, p = 30, and p = 60. For conditions where the population model had two factors, the same number of observed variables loaded on each factor. By fitting a one-factor model to the simulated data, the corresponding degrees of freedom for the fitted models were df = 2, 35, 405, and 1,710, respectively. To better reflect what happens in real data analysis, we varied the population factor loadings (λ) across items. Specifically, when p = 4, the population factor loading values were [0.45, 0.55, 0.65, 0.75]. For the other model size conditions, the population factor loadings were repeated in sets of five (i.e., values of [0.4, 0.5, 0.6, 0.7, 0.8] repeated two times for p = 10, six times for p = 30, and 12 times for p = 60). The population factor loading values (λ) were selected so that the average factor loading across items was 0.60. The variances of the error terms were set to 1 − λ².
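To make the design concrete, the population covariance matrix implied by the two-factor model and the df of the fitted one-factor model can be reproduced as follows. This is a sketch under the loading pattern described above; the even split of items across the two factors and the function names are our assumptions for illustration.

```python
import numpy as np

def loadings(p):
    """Population loadings: the special set for p = 4, otherwise the
    [0.4, 0.5, 0.6, 0.7, 0.8] pattern repeated in sets of five."""
    if p == 4:
        return np.array([0.45, 0.55, 0.65, 0.75])
    return np.tile([0.4, 0.5, 0.6, 0.7, 0.8], p // 5)

def population_sigma(p, rho):
    """Sigma = Lambda Phi Lambda' + Theta for the two-factor population
    model, assuming p/2 items per factor and unit-variance factors."""
    lam = loadings(p)
    Lambda = np.zeros((p, 2))
    Lambda[: p // 2, 0] = lam[: p // 2]
    Lambda[p // 2 :, 1] = lam[p // 2 :]
    Phi = np.array([[1.0, rho], [rho, 1.0]])
    Theta = np.diag(1.0 - lam ** 2)      # error variances set to 1 - lambda^2
    return Lambda @ Phi @ Lambda.T + Theta

def df_one_factor(p):
    """df of the fitted one-factor model: p(p+1)/2 unique moments
    minus 2p free parameters (p loadings + p error variances)."""
    return p * (p + 1) // 2 - 2 * p
```

For p = 4, 10, 30, and 60 this gives df = 2, 35, 405, and 1,710, matching the conditions above, and each observed variable has unit population variance by construction.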

Level of Model Misspecification
Model misspecification was introduced by fitting a one-factor model to two-factor data. The size of model misspecification was manipulated by changing the inter-factor correlation (ρ) in the population model, with smaller inter-factor correlations reflecting more severe model misfit. We examined two levels of model misspecification: severe model misspecification (i.e., ρ = 0.6) and minor model misspecification (i.e., ρ = 0.9).
We recognize that the "size of model misspecification" can be defined using different univariate summary measures, such as the discrepancy function (F_ML), the value of RMSEA itself, or the power of the test statistic (e.g., T_ML) to reject the misspecified model (Savalei, 2012). In this study, the level of model misspecification is defined as the size of the omitted parameter (i.e., ρ). This perspective on the "size of misspecification" has been adopted by many previous studies (e.g., Saris et al., 2009; Shi et al., 2019; Steiger, 2000) and is arguably more intuitive for applied researchers to interpret (Savalei, 2012). For example, in the context of this study, many applied researchers would agree that a unidimensional model can reasonably be fitted to two-factor data with an inter-factor correlation of ρ = 0.95. However, fitting a one-factor model to two-factor data with ρ = 0.60 is generally considered unacceptable.

Data Distribution
We considered both normally and nonnormally distributed data across five conditions. For the first condition, we generated data from multivariate normal distributions. Under nonnormal data conditions, given that the findings may be confounded by the choice of data generation method, we generated nonnormal data using two different methods. The first nonnormal data generation algorithm was a method based on third-order polynomial transformations introduced by Vale and Maurelli (VM; 1983). Under the VM method, we specified the population values of univariate skewness and (excess) kurtosis for each observed variable. The second data generation algorithm was a method using copulas (Copula; Mair et al., 2012). Under the Copula method, the univariate marginal distributions for each variable remain normal, but multivariate nonnormality is introduced through a copula function.
The data generation methods and parameter values for nonnormality were selected based on previous studies (Falk, 2018; Nevitt & Hancock, 2001). Specifically, two levels of nonnormality (i.e., moderate and severe) were generated under each method, but they required different parameter specifications given differences across the algorithms. For VM, two sets of population skewness and (excess) kurtosis values were examined: skewness = 2, kurtosis = 7 (moderate) and skewness = 2, kurtosis = 15 (severe). For the Copula method, the marginal distributions for each variable were set to be normal, and nonnormal data were generated under multivariate Gumbel copulas with dependency parameters of 2 (moderate) and 10 (severe). Though we used the same labels (i.e., moderate vs. severe) to describe the levels of nonnormality across the two data generation methods for convenience, levels of nonnormality with the same label (e.g., moderate) are not directly comparable across the VM and Copula methods.
We calculated population eigenvalues of the UΓ matrix using the method introduced by Foldnes and Grønneberg (2018). As expected, both the means and the variances of the eigenvalues increased as the level of nonnormality increased. Boxplots of these values are provided in the Supplementary materials.

Sample Size
We examined sample sizes reflecting the range of small to large samples typically used in psychological studies: N = 100, 200, 500, and 1,000.
Fully crossing the manipulated simulation factors yielded 240 unique conditions for evaluation (i.e., 4 model size levels × 3 levels of model misspecification × 5 data distributions × 4 sample size levels). Each condition was replicated 2,000 times, yielding a total of 480,000 datasets for analysis. ML estimation was used to fit all models to each simulated dataset. The sample RMSEA and 90% CIs were computed using the normal theory method, the BSL method, and the Lai method. The naïve method was not included, as the naïve estimator has been shown to be inconsistent under nonnormal data (Savalei, 2018). Under BSL, we report only the results obtained using the mean- and variance-corrected test statistic (T_MLMV), which has been shown to perform better than the other robust variants (Gao et al., 2020).
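The full crossing of the design can be enumerated directly; the condition labels below are ours, chosen to mirror the description above.

```python
from itertools import product

model_sizes = [4, 10, 30, 60]            # p; corresponding df = 2, 35, 405, 1710
misspecification = ["correct", "rho=0.9", "rho=0.6"]
distributions = ["normal", "VM-moderate", "VM-severe",
                 "Copula-moderate", "Copula-severe"]
sample_sizes = [100, 200, 500, 1000]

# Cartesian product of the four manipulated factors
conditions = list(product(model_sizes, misspecification,
                          distributions, sample_sizes))
n_datasets = len(conditions) * 2000      # 2,000 replications per condition
```

This reproduces the 240 conditions and 480,000 datasets reported above.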
To evaluate and compare the performance of the different methods across the simulation conditions, we examined both point and interval estimates as outcome variables. For point estimates, we computed the average sample RMSEA values and the empirical standard deviations of the sample RMSEAs across replications. For interval estimates, we calculated coverage rates of the CIs (defined as the percentage of CIs across replications that contain the true RMSEA value). All data generation and analyses were conducted in R (R Core Team, 2019) using the following packages: mnonr (Qu et al., 2020), copula (Hofert et al., 2020), and simsem (Pornprasertmanit et al., 2016).
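The outcome measures just described can be computed from the per-replication estimates and CI limits; the sketch below (function and array names are ours, not from the authors' code) shows the calculations.

```python
import numpy as np

def summarize_point(estimates, true_rmsea):
    """Average estimate, raw bias, and empirical SD across replications."""
    est = np.asarray(estimates, dtype=float)
    return est.mean(), est.mean() - true_rmsea, est.std(ddof=1)

def coverage_rate(lower, upper, true_rmsea):
    """Share of CIs across replications that contain the true RMSEA."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return float(np.mean((lower <= true_rmsea) & (true_rmsea <= upper)))
```

A coverage rate is then compared against the nominal level (here 0.90) to judge CI accuracy under each condition.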

Results
The admissible solution rates were above 99% for most simulation conditions, with an average rate of 99.43%. Admissible solution rates across all simulated conditions are provided in the Supplementary materials. Low admissible solution rates (e.g., the lowest rate of 81.85%) typically occurred when the model size was small, the sample size was small, and the level of misspecification was severe (e.g., df = 2, N = 100, ρ = 0.60). Replications that showed convergence problems or improper solutions were flagged and eliminated when computing the outcomes.

Population RMSEA
The population RMSEA values across simulated conditions are reported in Table 1. Not surprisingly, the population RMSEA increased as the level of misspecification became more severe. Given the same level of model misfit, the population RMSEA decreased as model size increased. For example, when fitting a one-factor model to two-factor data with ρ = 0.6, the population RMSEA dropped from 0.118 to 0.059 as the number of observed variables increased from 4 to 60 (i.e., as df increased from 2 to 1,710).

Sample RMSEA
For each dataset, the sample RMSEA values were computed using the normal theory method, the BSL-MV method, and the Lai method. Table 1 summarizes the average sample RMSEAs and the empirical standard deviations of the sample RMSEAs for all three methods under normally distributed data. Under the nonnormal conditions, the results for data generated using the VM and Copula methods are reported in Tables 2 and 3, respectively. To compare the point estimates across methods, for each simulated condition we highlighted the average sample RMSEA value closest to the population RMSEA.
As shown in Table 1, the average sample RMSEAs estimated using all three methods were very similar across all simulated conditions. In general, the bias in the sample RMSEA became larger as the model size became very small or very large, especially when the sample size was small. For example, when fitting correctly specified models with N = 100, the average sample RMSEA based on normal theory dropped from 0.033 to 0.026 as df increased from 2 to 35, and then increased from 0.026 to 0.058 as df continued increasing from 35 to 1,710.
As shown in Tables 2 and 3, the behavior of the sample RMSEAs was very similar across the two data generation methods under nonnormal data conditions. Therefore, we focus on the results for nonnormal data generated using the VM method (i.e., Table 2). With nonnormal data, the sample RMSEAs obtained using the normal theory method were generally overestimated, yielding average sample values larger than the population value. The bias of the normal theory-based RMSEA became more pronounced as model size increased and as the level of nonnormality became more severe. For example, for N = 200, when fitting a very small, correctly specified model with moderate nonnormality, the average sample RMSEA using the normal theory method was 0.036. Keeping all other factors the same, the average sample RMSEA increased to 0.061 as df increased to 1,710 under severe nonnormality.
In the presence of nonnormal data, the BSL-MV method and the Lai method both generated sample RMSEA values closer to the population value than those estimated using the normal theory method. In addition, the BSL-MV method generally provided more accurate RMSEA point estimates than the Lai method. For example, when fitting small CFA models (df = 2) with minor misspecification, severely nonnormal data, and N = 200, the average sample RMSEAs using the BSL-MV method and the Lai method were 0.037 and 0.043, respectively. The difference in average sample estimates between the BSL-MV method and the Lai method became smaller as N increased. As the sample size reached 1,000, the difference became negligible.
The behavior of the empirical standard deviations of the sample RMSEA was very similar across all three methods and all simulated conditions, regardless of the data distribution. Not surprisingly, the empirical standard deviations increased as the sample size decreased. Holding sample size constant, the empirical standard deviations also increased as the model size decreased. For example, as shown in Table 2, under correctly specified models with N = 200 and a moderate level of nonnormality, the empirical standard deviations of the BSL-MV RMSEA increased from 0.004 to 0.043 as df decreased from 1,710 to 2.

Table 3. Population RMSEA and average RMSEA estimates across replications with nonnormal data generated by the Gumbel copula.

Interval Estimates
The 90% CIs for RMSEA were computed under the normal theory, BSL-MV, and Lai methods. It is noted that two analytic CIs were derived in Lai (2020): one based on standard errors obtained using the first-order delta method (referred to as "Lai1" hereafter), and the other using standard errors obtained via the second-order Taylor expansion (referred to as "Lai2" hereafter). For all four CIs, the coverage rates (i.e., the percentage of CIs that included the true parameter; CRs) of the 90% CIs around the population RMSEA were calculated. We first focus on the CIs under the normality conditions. Table 4 summarizes the CRs of the 90% CIs estimated using the four methods. With a nominal level of 90% (0.90), CRs between 0.85 and 0.95 were considered acceptable (Gao et al., 2020), and those cases are highlighted in boldface. To compare the CRs obtained from the four CIs, Figure 1 presents distribution boxplots of the CRs across CI methods and model sizes. As shown in Table 4 and Figure 1, under normal data, the normal theory CIs generally yielded acceptable CRs when the number of observed variables was p = 4 or 10 (dfs ranging from 2 to 35). Under the same small to medium model size conditions, the performance of the BSL-MV CIs was comparable to the normal theory CIs, whereas the CRs computed using the Lai method could be noticeably larger than the nominal level. For example, for N = 200 and df = 35, with a minor level of model misspecification, the CRs of the 90% CIs computed using the normal theory method, BSL-MV, Lai1, and Lai2 were 0.89, 0.92, 0.99, and 0.98, respectively.
As model size increased, the CRs of the normal theory CIs noticeably decreased, especially when the sample size was small. Though the BSL and Lai methods were proposed to handle nonnormal data, the results showed that the CIs estimated using BSL-MV and Lai outperformed those from the normal theory method even with normal data when a large model was fitted in small to medium samples. When N ≤ 500 and p ≥ 30, using either the BSL-MV or the Lai method with normal data yielded better CRs than the normal theory method. In general, the performances of BSL-MV, Lai1, and Lai2 were comparable in large models, and the Lai1 CIs yielded the best performance when the model size was very large. For example, under the condition described in the preceding paragraph (ρ = 0.9, N = 200), the CR of the normal theory CIs dropped to 0.76 when df increased to 405; however, the CRs for BSL-MV, Lai1, and Lai2 were all approximately 0.95. As df increased to 1,710, the CRs of the CIs estimated using BSL-MV, Lai1, and Lai2 were 0.80, 0.88, and 0.58, respectively.
The CRs of the 90% CIs under the nonnormal data conditions generated using the VM and Copula methods are summarized in Tables 5 and 6, respectively. Cases with CRs between 0.85 and 0.95 are highlighted in the tables. The patterns of CRs obtained from the two nonnormal data generation methods were similar. Therefore, we focus on interpreting the results for nonnormal data generated using the VM method (i.e., Table 5). In addition, we plotted distribution boxplots of the CRs across the CI methods against the various model sizes under nonnormal data. Results indicated that in these conditions, the normal theory CIs yielded acceptable CRs only when the model size was very small; the normal theory CIs performed poorly under all other conditions.
Compared with the normal theory CIs, the CIs computed using the BSL-MV and Lai methods yielded better CRs. For example, as shown in Table 5, under moderately nonnormal data, for q = 0.9, N = 100, and df = 35, the CRs for CIs using the normal theory method, BSL-MV, Lai-1, and Lai-2 were 0.48, 0.92, 0.97, and 0.96, respectively. For both the BSL-MV and Lai methods, the CRs deteriorated as the sample size decreased, the level of nonnormality increased, and the model size increased. For example, when fitting minorly misspecified models, for df = 35, N = 1,000, and a moderate level of nonnormality, the CR for the BSL-MV CI was 0.92; the CR dropped to 0.20 as df increased to 405, the sample size (N) decreased to 100, and the level of nonnormality became severe. Generally speaking, the Lai-1 CIs yielded the best performance when fitting large CFA models (e.g., p ≥ 30, or df ≥ 405) under nonnormal data. The BSL-MV CI yielded acceptable CRs for large models with p = 30 only when the sample size N reached 200 (under moderate nonnormality) or 500 (under severe nonnormality). Moreover, the CRs for BSL-MV tended to be noticeably lower than the nominal level as p increased to 60 (df = 1,710). Under very large models, the Lai-1 CIs performed better, yielding acceptable CRs when the sample size N reached 500 (under moderate nonnormality) or 1,000 (under severe nonnormality).
To better understand the behavior of the CIs, for each simulation condition we also obtained information on the direction in which the CIs missed the population RMSEA value. More specifically, we calculated the probabilities of the CIs missing the true population parameter from above (i.e., the true RMSEA fell above the upper bound) and from below (i.e., the true RMSEA fell below the lower bound). Results are provided in Tables C1–C6 in the Supplementary materials. Across all methods, when the 90% CIs missed the true population value, the true RMSEA was more likely to fall below the lower bound, suggesting that the intervals tended to overstate misfit; that is, the models actually fit better than the CIs indicated.
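The miss-direction bookkeeping described above is straightforward to reproduce from simulated interval bounds. A minimal sketch (the function and variable names are ours, not from the study):

```python
import numpy as np

def coverage_breakdown(lowers, uppers, true_rmsea):
    """Given arrays of simulated CI bounds and the true population RMSEA,
    return (coverage rate, miss-from-below rate, miss-from-above rate)."""
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    miss_below = np.mean(true_rmsea < lowers)  # truth fell below the lower bound
    miss_above = np.mean(true_rmsea > uppers)  # truth fell above the upper bound
    coverage = 1.0 - miss_below - miss_above
    return coverage, miss_below, miss_above
```

Comparing the two miss rates in each condition reveals whether the intervals tend to sit above or below the true value, which is exactly the asymmetry reported in Tables C1–C6.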

Discussion and Conclusions
This study explored the performance of the RMSEA under nonnormal data. The performance of the sample RMSEAs and the associated CIs (estimated using three different methods) was compared across various model sizes, sample sizes, levels of misspecification, and levels of nonnormality. The major findings are summarized below. At the population level, results indicated that under misspecified models, the population RMSEA values decreased as the model size (df) increased. This finding is consistent with previous studies (Kenny & McCoach, 2003; Savalei, 2012; Shi et al., 2019).

Footnote 6: Lai (2020) referred to the two analytic CIs as the W1 CI and the W2 CI.
Because df is in the denominator of the formula defining the population RMSEA, it is not surprising that larger dfs are associated with smaller population RMSEA values.
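This inverse relationship is easy to see numerically. Holding the population misfit F0 fixed at a hypothetical value (0.05 here, chosen only for illustration) and varying df over the range of model sizes used in the simulations:

```python
import math

F0 = 0.05  # hypothetical fixed population misfit (illustrative value)
for df in (2, 35, 405, 1710):  # dfs spanning the simulated model sizes
    # Population RMSEA = sqrt(F0 / df): same misfit, larger df, smaller RMSEA
    print(df, round(math.sqrt(F0 / df), 4))
```

With the misfit held constant, the population RMSEA shrinks from about 0.158 at df = 2 to about 0.005 at df = 1,710, which is the pattern observed at the population level in this study.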
At the sample level, under normal data, our findings are also in line with the previous literature (Kenny & McCoach, 2003; Kenny et al., 2015; Shi et al., 2019, 2022). The sample RMSEA computed using the normal theory method was generally close to the population values; however, the bias of the sample RMSEA increased as the model size (df) became very small (e.g., df = 2) or very large (e.g., df = 1,710), especially when the sample size was small. Finally, the empirical standard deviation of the normal theory RMSEA increased as model size (df) decreased.
In terms of interval estimates, the normal theory CIs yielded acceptable CRs under normal data conditions except when the model size was large. In addition, results indicated that the BSL-MV and Lai methods showed satisfactory performance with normal data, especially with large models: the BSL-MV and Lai methods yielded more accurate CIs than those obtained from the normal theory method. An explanation for why the BSL-MV method outperformed the normal theory method under large models with normal data lies in the test statistics underlying the two methods: both build the RMSEA from a chi-square test statistic, and previous studies have shown that when fitting large CFA models with normal data, robust chi-square test statistics perform better than the chi-square test statistic used under the normal theory method (Gao et al., 2020; Maydeu-Olivares, 2017).
Our results expand the understanding of the behavior of RMSEA under nonnormal data. First, under the nonnormal conditions, the normal theory method yielded unbiased sample RMSEAs and accurate CIs only when the model size was very small. Second, both the BSL-MV and Lai methods yielded less biased sample RMSEAs than the normal theory method in the presence of nonnormal data. The BSL-MV method generally produced the most accurate sample RMSEAs; the difference in sample RMSEAs between the BSL-MV and Lai methods became less pronounced as the sample size and the model size increased. Third, the standard deviations of the sample RMSEAs under both the BSL-MV and Lai methods increased as the model size decreased, indicating greater uncertainty in the estimates. Finally, both the BSL-MV and Lai methods yielded more accurate CIs than the normal theory method under nonnormal data. The CRs for both the BSL-MV and Lai methods dropped as the model size increased and the sample size decreased. The Lai-1 CIs yielded the best performance when fitting very large CFA models (e.g., p = 60 or df = 1,710) with small to medium sample sizes (N ≤ 1,000).
It is noteworthy that the Lai method was originally proposed for latent growth models. One key difference between the Lai and BSL-MV methods is that the Lai method does not rely on the assumption of equal eigenvalues (Lai, 2020). Under latent growth models, the equal eigenvalue assumption is more likely to be violated; accordingly, Lai (2020) found that the Lai CIs outperformed the BSL-MV CIs under latent growth models. In this study, we showed that the Lai method also performed well under the commonly used CFA model. Lai (2020) also noted that under covariance structure analysis, it is more reasonable to ignore the equal eigenvalue requirement; consistent with this, we also observed satisfactory performance for RMSEAs under the BSL-MV method. Future studies should consider further comparing the different RMSEA methods to verify these findings under a larger variety of SEM models (e.g., models with mean structures, path analysis, etc.).
There are other limitations to this study. First, although we used two different data generation methods, there are many other nonnormal data conditions to consider. For example, for the copula method, we only included conditions where the univariate distribution of each variable was normal; in practice, nonnormality may also be observed in the marginal distributions. Second, only one type of model misspecification (i.e., misspecified dimensionality) was investigated in this study. To address these two major limitations, we conducted additional simulations to verify our findings under (1) more severely nonnormal cases (i.e., a Gumbel copula with a chi-square [df = 3] marginal distribution), and (2) other types of model misspecification (i.e., two-factor CFA models with omitted cross-loadings). The additional simulations are presented in the Supplementary materials for interested readers. The results (Tables D1 and E1) showed that the major findings from this study still hold under the additional simulation conditions. Further studies are expected to explore the performance of RMSEA and verify the findings using more comprehensive simulation designs. Finally, other approaches have been proposed to obtain test statistics under nonnormal data [e.g., the asymptotically distribution-free (ADF) estimator (Browne, 1984), and the modified ADF estimator (Chun et al., 2018)]. It would be interesting for future methodological studies to develop new theories for computing RMSEA under nonnormal data with other robust test statistics.

The results of this study have implications for empirical studies using factor analysis models. When fitting CFA models with normal data, the (average) sample RMSEA estimated using the normal theory method is generally close to the population value. However, the normal theory CIs only yielded acceptable coverage rates for small models.
The BSL-MV and Lai methods yielded more accurate CIs and are therefore recommended when fitting large CFA models with normal data. The normal theory RMSEA should not be used under nonnormal data unless the model size is very small. When fitting CFA models with nonnormal data, applied researchers should consider using either the BSL-MV or the Lai method to estimate RMSEA. The Lai-1 method is recommended when very large models are fit under nonnormal data. We hope that the results of this study provide useful information for applied researchers evaluating CFA models with nonnormal data.