Where Did I Go Wrong with My Model? Ten Tips for Getting Results in SEM

Abstract

Those new to structural equation modeling (SEM) often encounter problems with convergence, software errors, or poor fitting models. Some basic suggestions are offered to help reduce these problems and avoid some common error messages encountered when first testing models. The tips concern initial data management steps, specification of latent variables, avoidance of factor identification problems, special precautions when estimating means, addressing poor measurement model fit, using multistep testing, fitting full structural models, attentiveness to interpretation pitfalls, and troubleshooting software warnings and errors. The advice provided is intended to help save time, effort, and trials and tribulations often encountered by beginner modelers.


Introduction
The process of testing structural equation models involves a variety of considerations that other types of analyses, such as regression analysis, do not, making these analyses appreciably more complicated to conceptualize, implement, and interpret. The purpose of this paper is to help beginners navigate some of the difficulties likely to be encountered in the modeling process and to provide a guide to useful additional resources for beginners and more experienced users alike. The aim, however, differs from other valuable articles that provide introductions to structural equation modeling (e.g., Lei & Wu, 2007; Ullman, 2006), address statistical issues (e.g., Chen et al., 2001; Kenny & Milan, 2012; Newman, 2014), or identify good or bad uses of the approach (Kline, 2016, Chapter 18; McCoach et al., 2007; Mueller & Hancock, 2008). Accordingly, we assume basic familiarity with the structural equation modeling framework and concepts, including the ideas of mediation and path analysis, latent variables, and model fit. The hope is that this paper will serve as a resource that helps fill in some of the knowledge gaps newer modelers face and perhaps helps them avoid some of the common and unnecessary frustrations that many experience when just starting out with SEM.
In several places, we refer to a research example drawn from the baseline data from the Later Life Study of Social Exchanges (LLSSE; Newsom et al., 2005; Sorkin & Rook, 2004) with three questions (observed indicators) of emotional support ("do things that were kind or considerate toward you," "cheer you up or help you feel better," "discuss personal matters or concerns"), three questions of instrumental support ("do favors for you," "provide you with aid and assistance," and "help you with an important task"), a depression measure (seven items from the Center for Epidemiological Studies-Depression scale, e.g., "I felt depressed"; CES-D; Radloff, 1977), and six items from a measure of positive affect (e.g., "that you were enjoying yourself"; Watson et al., 1988). Illustrations, complete with R code, as well as suggested further reading associated with each tip, can be found in the online supplemental materials.

Tip 1: Preventive Data Steps
To avoid complications or erroneous conclusions during the modeling process, some initial, preventive steps should be routine (Schumacker & Lomax, 2016). Basic data cleaning strategies, such as reclassifying "don't know" or "refused" values as missing, locating out of range values, reverse scoring appropriate items, identifying highly redundant variables, and removing duplicate cases, can be overlooked but should not be taken for granted. Detecting, correcting, or possibly removing outliers is another necessary step, as such cases can be a source of nonnormality and potentially misleading results, especially with smaller sample sizes (Dixon, 1953).
Reading data into some SEM software programs can be a fairly elaborate process requiring attention to several details. Data errors can occur, sometimes without easy detection, because values have been incorrectly read by the package. If the data are in text form, comma, or tab delimited files (as is required in Mplus or is often the case in R), one should ensure that the list of variables is complete and specified in the correct order in the SEM syntax. To simplify the data input process, it often may be advisable to eliminate unneeded variables when constructing a data file. Descriptive statistics obtained from the SEM program should be matched exactly and completely to those obtained from the general data analysis package using the original source data set (e.g., SPSS, SAS, Stata).
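To make these checks concrete, a minimal sketch in R might look like the following; the file name, missing value codes, and variable names are placeholders for the user's own data. The descriptive statistics produced here should match those reported by the SEM program (in lavaan, for example, the sample statistics can be retrieved with lavInspect(fit, "sampstat") after fitting a model).

```r
# A minimal sketch of data input checks; file name, missing value codes,
# and variable names are placeholders
dat <- read.csv("llsse_baseline.csv", na.strings = c("", "NA", "-9"))

str(dat)       # confirm variable names, order, and types were read correctly
summary(dat)   # check ranges, means, and missing counts against the source package
round(cov(dat, use = "pairwise.complete.obs"), 3)  # compare to the SEM program's matrix
```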
Prior to beginning the modeling process, variable distributions should be inspected. The maximum likelihood (ML) estimator for continuous variables used in structural equation modeling assumes multivariate normality (see Bollen, 1989, Appendix 4A). Nonnormality is detrimental to fit and, without appropriate estimation to account for it, misguided decisions may be made about model modifications or erroneous conclusions may be reached about the validity of a model (Curran et al., 1996). Univariately normal distributions do not guarantee multivariate normality, so examining multivariate skewness and kurtosis, as assessed by Mardia's coefficients or similar indices, can be informative (DeCarlo, 1997). Although it is important to be knowledgeable about the distributions of the variables involved, routine use of robust estimates, such as the Satorra-Bentler scaled chi-square and corrected standard errors (Satorra & Bentler, 1988), is sometimes recommended provided the sample size is sufficient (approximately 250 or more). Robust adjustments will produce fit and standard errors equivalent to ML when data are normally distributed (Curran et al., 1996; Finney & DiStefano, 2013), so there is little harm in implementing them and potentially serious harm in not implementing them in most cases.
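As an illustration, multivariate skewness and kurtosis can be examined and a robust estimator requested with a few lines of R. This is a sketch only: dat and the indicator names (emo1-emo3, inst1-inst3) are placeholders echoing the LLSSE example, and the psych package's mardia() function is one of several ways to obtain Mardia's coefficients.

```r
library(lavaan)
library(psych)

# Mardia's multivariate skewness and kurtosis for the support indicators
mardia(dat[, c("emo1", "emo2", "emo3", "inst1", "inst2", "inst3")], plot = FALSE)

# Robust (Satorra-Bentler scaled) estimation of a two-factor measurement model
model <- '
  emosup  =~ emo1 + emo2 + emo3
  instsup =~ inst1 + inst2 + inst3
'
fit <- cfa(model, data = dat, estimator = "MLM")
summary(fit, fit.measures = TRUE)
```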
Before beginning, modelers should also make sure that the sample size is sufficient for the particular model. The type of variables to be analyzed and, consequently, the type of estimator used have implications for the required sample size (see Tip 8 for more on estimators). Continuous, multivariate normal variables generally require fewer cases than nonnormal or noncontinuous variables. Minimum sample sizes recommended by authors vary, often from 100 to over 500, depending on the complexity of the model, the type of estimation method used, and the presence of missing values (Finney & DiStefano, 2013; Wolf et al., 2013). Such general recommendations are typically concerned with avoiding estimation problems, but a power analysis tailored to the planned model is invaluable for determining whether there will be sufficient power for assessing model fit or for testing individual parameters for significance.
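For the overall test of model fit, a simple RMSEA-based power calculation in the spirit of MacCallum, Browne, and Sugawara (1996) can be done in base R. The values below are purely illustrative, and packages such as semTools offer comparable functions.

```r
# Power for the test of close fit (H0: RMSEA = .05) against a mediocre-fit
# alternative (Ha: RMSEA = .08); all values are illustrative placeholders
rmsea0 <- 0.05
rmseaA <- 0.08
df     <- 24     # model degrees of freedom
n      <- 200    # planned sample size
alpha  <- 0.05

ncp0  <- (n - 1) * df * rmsea0^2            # noncentrality under H0
ncpA  <- (n - 1) * df * rmseaA^2            # noncentrality under Ha
crit  <- qchisq(1 - alpha, df = df, ncp = ncp0)
power <- 1 - pchisq(crit, df = df, ncp = ncpA)
power
```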
Another factor to consider in the analysis planning and data preparation stage is missing data. An initial step if missing values are present is to be certain that values or symbols in the data set are correctly identified in the software program as missing. If missing value codes are inadvertently counted as valid values, error messages may occur, or erroneous results based on these values might still be printed but should not be trusted. Many software programs will use full information maximum likelihood (FIML) by default (e.g., Amos: Arbuckle, 2014; Mplus: Muthén & Muthén, 1998-2017; lavaan: Rosseel, 2012), an approach that assumes data are at least missing at random (MAR).
MAR means that values of the variables in the data set can be related to the probability of missingness on a variable, but the values of the variable with missing values (if they were all known) cannot be related to the probability of missingness on that variable. Missing completely at random (MCAR), a stricter standard than MAR, means that neither the values of the variables with missing values (if they were all known) nor the values of other variables in the data set can be related to the probability of missingness. There is typically no way to know for certain whether the MAR assumption has been met or violated (Schafer & Graham, 2002), but exploring missing data patterns and understanding potential reasons for missingness can provide valuable context when interpreting model results. If there are many missing data patterns, with none substantially more common than the others, it may signal that there are a variety of reasons for the missing values, potentially signaling less reason for concern (Little & Rubin, 2020). Valuable information can be gained, for example, if it is discovered that initial values of health in a longitudinal study are related to dropout in a later wave of data collection, suggesting possible violation of the MAR assumption. Variables potentially associated with the probability of missingness are known as auxiliary variables, and their inclusion in the FIML analysis (see Enders, 2022) may increase the likelihood of meeting the MAR assumption and improve estimates even when it has not been met.
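In practice, missing data patterns can be examined and FIML requested with a few lines of R. The sketch below assumes a data frame dat with the placeholder indicator names used earlier and hypothetical auxiliary variables age and health; the semTools helper shown in the comments is one way to implement the saturated-correlates approach.

```r
library(lavaan)
library(mice)

md.pattern(dat)  # tabulate missing data patterns and their frequencies

model <- '
  emosup  =~ emo1 + emo2 + emo3
  instsup =~ inst1 + inst2 + inst3
'
fit <- cfa(model, data = dat, missing = "fiml")  # FIML under the MAR assumption

# Auxiliary variables via the saturated-correlates approach (see Enders, 2022);
# cfa.auxiliary() is from the semTools package
# library(semTools)
# fit_aux <- cfa.auxiliary(model, data = dat, aux = c("age", "health"))
```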

Tip 2: Making a Good Latent Variable
Good latent variables include observed indicators that theoretically reflect the same underlying construct and are sufficiently intercorrelated empirically. Generally, latent variables need at least three indicators to be identified, assuming uncorrelated measurement residuals.1 If indicators are not empirically related, they will not work well as latent variables (Bentler & Chou, 1988). It is difficult to state what constitutes a sufficient magnitude for inter-item correlations, but adequacy is generally assessed with the standardized loadings for items (see Comrey & Lee, 1992, for one set of recommended cutoffs) and confirmatory factor model fit. Indicators may consist of a set of attitudinal scale questions, test items, or physical or observational measures of the same underlying construct; they need not be restricted to continuous variable types or to items from the same scale. Indicators can instead involve items from different sources as well as different variable types, provided appropriate estimation methods are used and the indicators are sufficiently intercorrelated and assess the same underlying construct (Bentler & Chou, 1988; Wheaton, 1987).
Latent variables have several important purposes. They allow the researcher to estimate and correct for measurement error in predictive relationships, and they allow researchers to assess underlying constructs that are not or cannot be directly observed or measured (Bollen, 2002). The notion is to capture a unified psychological, social, or other concept defined by a set of directly, but imperfectly, measured indicators. By far the most common way researchers define a latent variable is by assuming the indicators are predicted by the latent variable, consistent with general principles of classical test theory (Miller, 1995). These indicators are sometimes known as reflective indicators (Bollen & Lennox, 1991). For example, for the latent variable emotional support, responses on the three emotional support questions are hypothesized to be caused by the latent variable (see Figure 1(a)).
Alternatively, although far less common, researchers occasionally define a latent variable by specifying that the indicators predict the latent variable. These indicators are known as causal (or sometimes formative) indicators (see Figure 1(b)). For example, socioeconomic status variables, such as income, education, and job prestige, are not always highly correlated, but the researcher may wish to examine their combined effect on other variables anyway. Similar to reflective indicators, causal indicators are usually considered conceptually related, though they may not require high intercorrelations. Causal indicator models, however, can be difficult to estimate without carefully placed constraints. A theoretical approach is usually recommended when deciding whether indicators are treated as reflective or causal (Edwards & Bagozzi, 2000; Markus & Borsboom, 2013). A distinction also can be made between causal and composite (or, often, "formative") indicators (e.g., Bollen & Bauldry, 2011), where causal indicators are predictors of the latent variable with some unaccounted-for variance, and composite indicators account for all of the variance in the latent variable.
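A reflective specification of the LLSSE support factors might look like the following in lavaan (a sketch; the variable names are placeholders). Standardized loadings can then be inspected to judge whether the indicators hang together empirically.

```r
library(lavaan)

model <- '
  emosup  =~ emo1 + emo2 + emo3      # reflective indicators (Figure 1a)
  instsup =~ inst1 + inst2 + inst3
'
fit <- cfa(model, data = dat)
standardizedSolution(fit)   # check that standardized loadings are adequate

# lavaan also offers a composite operator for formative-style specifications,
# e.g., ses <~ income + education + prestige, though causal indicator models
# typically need additional carefully placed constraints to be identified
```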

Tip 3: Factor Identification
One area where modelers can encounter difficulties is in setting scaling constraints for a latent variable, a central aspect of factor identification. Without proper constraints on the scaling of the variance of the latent variable, convergence problems (see Tip 4 for errors) will be encountered (Boomsma, 1985; Chen et al., 2001). Nonnegative degrees of freedom, which occur when the number of unknown elements to be estimated is equal to or fewer than the number of known elements in the covariance matrix, are required for a model to be identified (Raykov & Marcoulides, 2006). This is a necessary but not sufficient condition, however, and, in practice, determining that all portions of the model are identified can be more difficult in complex cases (Kenny & Milan, 2012). Insufficient information in a portion of the model may cause estimation difficulties or convergence problems even when the degrees of freedom are nonnegative (often referred to as "empirical underidentification"; Kenny, 1979).
A latent variable's mean or variance cannot be estimated without borrowing information from its observed indicators. Scaling constraints are needed to ensure that the variance (and the mean, if included) of the latent variable is defined (Kline, 2016; see also Tip 5 for more on identification of latent variable means). Many programs have default scaling, while others require scaling to be specified. For the factor variance, the most common approaches to scaling are setting the first loading equal to one (known as the referent or marker variable approach) or setting the variance of the factor equal to one. The latter approach is only possible for exogenous variables, those not predicted by any other variables in the model. Another approach to factor scaling is effects coding, which uses the software's model constraint features to fix the factor loadings to an average of 1 within each factor (see Little et al., 2006, for a description). Variance scaling should be set in only one way, however, as setting the scale in multiple ways (e.g., setting both the factor variance and a referent loading) can lead to problems with convergence and alter the meaning of the results. All of these approaches will result in equivalent standardized parameter estimates and statistical tests, but the unstandardized loadings will differ among them (for accessible introductions to model identification, see Kline, 2016, Chapters 6 and 7, and Kenny & Milan, 2012; for more detailed discussions, see Davis, 2009, and Rigdon, 1995).
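The three scaling approaches can be written in lavaan as follows (a sketch with placeholder variable names); all three yield identical fit and standardized estimates.

```r
library(lavaan)

# (a) Referent (marker) variable: first loading fixed to 1 (lavaan's default)
m <- ' emosup =~ emo1 + emo2 + emo3 '
fit_marker <- cfa(m, data = dat)

# (b) Factor variance fixed to 1, all loadings freely estimated
fit_variance <- cfa(m, data = dat, std.lv = TRUE)

# (c) Effects coding: loadings constrained to average 1 (Little et al., 2006)
m_effects <- '
  emosup =~ L1*emo1 + L2*emo2 + L3*emo3
  L1 == 3 - L2 - L3
'
fit_effects <- cfa(m_effects, data = dat)
```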

Tip 4: Warnings and Errors
Warning and error messages are common for anyone new to SEM. Some warnings can be trivial, such as a warning that a certain number of cases were excluded because they have missing values on predictor variables, and can be ignored if the modeler is certain that they do not impact results. In most cases, however, it is important not to ignore these errors or warnings or to trust the output, as it may be incorrect. Warnings or errors regarding convergence failures or negative variances most commonly reflect problems with estimation of the model, which we refer to as improper solutions. Improper solutions tend to occur with smaller sample sizes, fewer structural constraints, and misspecified models (Chen et al., 2001; Gagné & Hancock, 2006; Gerbing & Anderson, 1987). Nearly all improper solutions are a consequence of problems with model identification, the basic principles of which need to be mastered. If results are printed despite such messages, they may be partly or wholly erroneous.
Warning and error messages are inconsistent across software packages, so it is difficult to provide an exhaustive list. Some general statements can be made, however. Empirical underidentification is a common problem that can occur due to scaling problems, insufficient information (e.g., too few indicators), high multicollinearity, or low factor loadings, and it tends to arise from a portion of the model rather than the model overall (Bentler & Chou, 1988; Ciesla et al., 2007; Kenny & Milan, 2012; Rindskopf, 1984). Certain error messages are clearly indicative of identification problems, including "negative error variance" or "negative theta-epsilon" (negative measurement residual variance), "nonconvergence" or "failure to converge," or "negative psi matrix" (negative latent variable variance). Nonconvergence is one of the most common errors in SEM and occurs because the model is misspecified or has serious logical problems, such as bidirectional relationships, loops, or nonsensical commands. Empirical underidentification, failing to set a scaling constraint, or low factor loadings are the most common causes (Bentler & Chou, 1988; Boomsma, 1985; Chen et al., 2001; Cole et al., 2007). Nonconvergence is also more likely to occur with small sample sizes or when there are problems with the data, such as data reading errors, invalid values, or extreme outliers (Boomsma, 1985). Although a clear statement may be reported in the output, such as "No Convergence" or "Model did not converge. Check your model," not all convergence issues are accompanied by error messages that explicitly mention convergence. More cryptic messages may state that there is a "nonpositive definite matrix," that the "determinant is zero," or that the "rank is less than full." A nonpositive definite matrix error indicates that one of the model-implied matrices involving error variances contains invalid zero or negative values, which most typically occurs due to an insufficient number of indicators, empirical underidentification, or estimation difficulties caused by a high proportion of missing values (Newman, 2014; Newsom, 2015).2 In response to estimation problems, the software program may indicate that parameter values in the model were set automatically (e.g., a negative error variance has been set to 0). These values may often be unreasonable and signal another underlying problem with the model. Zero determinant or rank deficiency errors indicate a similar problem and can also occur because of empirical underidentification (Geweke & Singleton, 1980; Lopes & West, 2004) or high multicollinearity (Kenny & Milan, 2012; Lazaridis, 2007).
Finally, even without warning or error messages, it is important to closely examine the output for other potentially problematic issues. Heywood cases, a term referring to estimates that are out of bounds (e.g., negative error variance or standardized coefficients over 1.0; Boomsma, 1985; Chen et al., 2001),3 may occur with or without an error message or warning, but always warrant attention. Heywood cases tend to occur because of an insufficient number of indicators or poor loadings (Bentler & Chou, 1988; Rindskopf, 1984), underidentification stemming from a failure to properly set scaling constraints for a latent variable, or because too many correlated measurement residuals have been added to a specified latent variable (Chen et al., 2001). Attempts to resolve such problems are sometimes made (either by the user or automatically by the software) by setting error variances to zero or a small positive value. Doing so may result in severely biased estimates and is not a valid substitution for an empirically or theoretically sound model (Chen et al., 2001).
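In lavaan, a few post-estimation checks can help catch improper solutions even when no warning is printed. The following is a sketch assuming fit is a previously fitted lavaan model object.

```r
library(lavaan)

lavInspect(fit, "converged")    # did estimation converge?
lavInspect(fit, "post.check")   # lavaan's post-estimation admissibility checks

# Scan for Heywood cases: negative variance estimates or standardized
# loadings beyond 1.0
pe <- parameterEstimates(fit, standardized = TRUE)
subset(pe, op == "~~" & lhs == rhs & est < 0)   # negative variances
subset(pe, op == "=~" & abs(std.all) > 1)       # out-of-bounds standardized loadings
```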

Tip 5: Special Issues When Estimating Means
For many models, latent variable means need not be estimated because they are not of substantive interest to the research questions; when research questions concern only predictive relationships, means are usually unnecessary. For other models, such as latent growth curve models or multigroup models (Schroeders & Gnambs, 2018), factor means and intercepts are integral to the research question being investigated (e.g., Chou et al., 1998).
Means also must be estimated whenever FIML for missing data is used, and failure to ensure proper constraints for identification of factor means can lead to problems. Latent variable means are a product of the indicator variable means, the loadings, and the measurement intercepts, and, therefore, similar scaling constraints are needed to identify them. Identification of the factor mean requires that no more values be estimated than the number of observed means provided. Failure to identify the latent variable mean with proper constraints is a common source of model nonconvergence, because the mean structure then has more unknown parameters than available observed means. For a latent variable with four indicators, for example, there are four observed means; estimating four measurement intercepts plus the latent variable mean would require five unknown parameters when only four known values are available in the data. As with scaling constraints for the factor variance, there are several methods of constraint to identify the latent variable mean (see Newsom, 2015, Chapter 1, for a discussion), but most commonly one of the indicators is used as a referent by setting its measurement intercept to 0, as demonstrated in the illustration of Tip 5 in the online supplemental materials. For a clear interpretation, this should be the same indicator used to scale the latent variable variance by setting its loading equal to 1. With this method, the latent variable mean is based on the observed mean of the referent indicator. Different methods of constraint will have no effect on model fit or statistical tests, but there may be implications for interpretation of latent variable mean estimates or some types of statistical tests (Kang & Hancock, 2017).
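In lavaan, the referent-intercept approach can be written as follows (a sketch with placeholder variable names): the intercept of the referent indicator is fixed to 0 so that the latent variable mean can be freely estimated.

```r
library(lavaan)

model <- '
  emosup =~ emo1 + emo2 + emo3   # emo1 loading fixed to 1 by default
  emo1   ~ 0*1                   # referent measurement intercept fixed to 0
  emosup ~ 1                     # latent variable mean freely estimated
'
fit <- sem(model, data = dat, meanstructure = TRUE)
summary(fit)
```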

Tip 6: Why Doesn't My Measurement Model Fit?
Measurement model fit reflects the degree to which the covariances implied by the hypothesized model correspond to the observed data (i.e., the sample covariance matrix).
There are a number of potential sources of measurement model misfit, including, but not limited to, low correlations among indicators, cross-loadings, model misspecification, nonnormal distributions, and measurement artifacts (Bentler & Chou, 1988;West et al., 2012). Since misfit in the measurement model will carry forward to the structural model, it is important to investigate the fit of a measurement model before proceeding to the structural model.
The fit of the measurement model can be impacted when the factor structure is specified incorrectly (Bandalos, 2018). If items that do not belong together are included in the same factor, poor fit may occur, as demonstrated for this tip in the online supplemental materials. Fit also is negatively affected when the measurement residuals for the indicators are correlated with one another (often referred to simply as "correlated errors") but are not included in the model specification. Correlated measurement residuals represent an additional association not already accounted for by the factor and may arise when two or more items have a similar phrasing ("I am often anxious when testing structural models" and "I am often anxious when analyzing my data") or are keyed in the opposite direction of other items in the scale (i.e., negatively worded items). When model fit is close to acceptable, adding specifications that model these correlated errors will improve the fit of the model to a degree, but post hoc model modifications can capitalize on chance and increase risk of Type I error (MacCallum, 1986). In some instances, a good solution may be explicitly including a method factor in the model instead (Wothke, 1996). Correlated measurement residuals among several items also may indicate that additional underlying factors are appropriate and that the factor structure should be reexamined.
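Both remedies can be expressed in lavaan syntax. In this sketch the anxiety items a1-a4 are hypothetical, with a1 and a2 standing in for the two similarly phrased items above; in the method-factor version, the method loadings are fixed equal and the factor is kept orthogonal to the substantive factor for identification.

```r
library(lavaan)

# Option 1: allow the residuals of the similarly phrased items to correlate
m_corr <- '
  anxiety =~ a1 + a2 + a3 + a4
  a1 ~~ a2
'

# Option 2: model the shared wording explicitly with a method factor
m_method <- '
  anxiety =~ a1 + a2 + a3 + a4
  method  =~ 1*a1 + 1*a2     # equal loadings fixed for identification
  method  ~~ 0*anxiety       # method factor orthogonal to the trait
'
fit_corr   <- cfa(m_corr, data = dat)
fit_method <- cfa(m_method, data = dat)
```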
Another reason for lack of fit of a model is that the estimator is inappropriate for the types of measured dependent variables in the model. ML estimation should be used for continuous normal variables (Anderson & Gerbing, 1988; Jackson, 2001), MLM estimation (Hu & Bentler, 1999) should be used for continuous nonnormal variables with no missing data, and MLR estimation should be used for continuous nonnormal variables, including when there are missing data. When variables are binary or ordinal (typically fewer than five categories), categorical estimation methods can be used, such as robust marginal maximum likelihood or weighted least squares with mean and variance adjustment (WLSMV; Bandalos, 2014; Finney & DiStefano, 2013).
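In lavaan, the estimator choice maps onto a single argument, as in the sketch below; model and dat are the placeholder model and data from earlier, and the depression items treated as ordinal are hypothetical names.

```r
library(lavaan)

fit_ml  <- cfa(model, data = dat)                     # ML: continuous, normal
fit_mlm <- cfa(model, data = dat, estimator = "MLM")  # continuous, nonnormal, complete data
fit_mlr <- cfa(model, data = dat, estimator = "MLR",
               missing = "fiml")                      # continuous, nonnormal, missing data

# Binary/ordinal indicators: declaring them ordered invokes WLSMV by default
m_dep   <- ' depress =~ dep1 + dep2 + dep3 + dep4 '
fit_cat <- cfa(m_dep, data = dat,
               ordered = c("dep1", "dep2", "dep3", "dep4"))
```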

Tip 7: Following a Multistep Testing Process
Full structural models that include latent variables and predictive relationships may fit poorly for two reasons. One reason is that the measurement (latent variable) part of the model does not fit well (see Tip 6), in which case the inadequate fit of the overall model may be misinterpreted as an indication that the causal structure is incorrect. For this reason, most authors recommend a multistep process to correctly and more easily identify the sources of model misfit prior to testing the full structural model. Although there are different recommendations on how many and exactly which steps should be taken in building one's measurement and structural models (see Anderson & Gerbing, 1988; Hayduk & Glaser, 2000; Rosseel & Loh, 2021; Schumacker & Lomax, 2016, for reviews of specific multistep testing processes), the general recommendation is that the measurement portion of the model be tested prior to including predictive relationships in the structural model. Assuming the measurement model has been tested and fits well, a second reason for poor model fit may be that one or more paths in the structural model are misspecified. Models are fit by comparing the matrix of relationships implied by the causal model to the matrix of relationships among the observed variables (Bollen & Pearl, 2013). Thus, a path (either directional or a bidirectional correlation) between two variables that is not specified to be freely estimated, or is not estimated by default, is implied to be 0, indicating no relationship between the variables. If a relationship actually exists in the data, this implied zero can have a meaningful impact on the overall fit of the model.
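A minimal two-step workflow in lavaan might look like this (a sketch with placeholder variable names from the running example): the measurement model is fit and evaluated first, and structural paths are added only after it fits acceptably.

```r
library(lavaan)

# Step 1: measurement model, factors freely correlated by default
meas <- '
  emosup =~ emo1 + emo2 + emo3
  posaff =~ pa1 + pa2 + pa3 + pa4 + pa5 + pa6
'
fit_meas <- cfa(meas, data = dat)
fitMeasures(fit_meas, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))

# Step 2: add the predictive (structural) relationship
struct <- paste(meas, 'posaff ~ emosup', sep = '\n')
fit_struct <- sem(struct, data = dat)
```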
Even when a separate step of testing the measurement portions of the model has been used and the factors each fit the data well, there may still be a lack of fit when testing a full structural model that involves latent variables. For example, the latent variables for emotional support and positive affect in Figure 2 have indicators related to feeling cheerful (e.g., "cheer you up or help you feel better" and "that you were enjoying yourself"). Model fit may be poor in this case because there is a remaining association between the variables related to feeling cheerful that has not been accounted for by the association between the latent variables. This residual correlation would not have been detected when the latent variables were tested separately. Correlated measurement residuals may indicate things as trivial as similar wording between items or method effects (Saris & Aalberts, 2003), but they can nonetheless affect overall model fit and the interpretation of the latent constructs. Thus, a typical model modification is to allow for correlations between measurement residuals within or across factors, particularly when shared method variance is expected (Hermida, 2015). Modification indices are useful for locating areas for model fit improvement, such as the addition of parameters, the removal of poorly performing indicators, and, in some cases, even reconsidering the structure of the full model. Some multistep testing methods, however, may be better able to identify some of these issues than others (Hayduk & Glaser, 2000). In some cases, experience may help anticipate these sources of misfit, and, in others, modification indices may identify them. As more modifications are made to the proposed model, however, the researcher risks capitalizing on chance. It is generally recommended that modifications be made only if there are theoretical justifications for doing so and that they be reported transparently (MacCallum, 1986; MacCallum et al., 1993). Not all statistically significant modification indices are practically important, either. With larger sample sizes, many modifications may result in only incremental improvements in fit, so the researcher needs to gauge the magnitude of fit improvement. Informally, researchers often consider the percentage improvement in the overall model chi-square, dividing the chi-square for the modification index by the model chi-square. More precise methods have been proposed, such as computation of Cohen's w (Dziak et al., 2014; Newsom, 2015) or improvement in a relative fit index such as McDonald's noncentrality fit index (NCI; Fan & Sivo, 2009).
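In lavaan, modification indices and these informal magnitude gauges can be obtained as follows; this is a sketch in which fit is a previously fitted model, illustrating the percentage-improvement heuristic and Cohen's w for the largest index.

```r
library(lavaan)

mi <- modindices(fit, sort. = TRUE, maximum.number = 10)
mi   # candidate parameters with expected chi-square improvement

# Informal gauge: MI chi-square as a proportion of the overall model chi-square
mi$mi[1] / fitMeasures(fit, "chisq")

# Cohen's w for the candidate modification (Dziak et al., 2014; Newsom, 2015)
sqrt(mi$mi[1] / lavInspect(fit, "nobs"))
```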

Tip 8: Why Doesn't My Structural Model Fit?
For cases in which the measurement model has already been determined to have good fit or there are no latent variables in the model, poor model fit of the full structural model can occur for several reasons. First, one should confirm that the estimator is appropriate for the type of dependent variable or indicator (see Tip 6). Second, it is important to consider whether the hypothesized model is correctly specified and actually testing the intended relationships. Additionally, some relationships, such as bidirectional or circular relationships, are impossible or inherently difficult to model (see Figure 3(a,b)). Although bidirectional (or "non-recursive") predictive relationships are often hypothesized, they are difficult to estimate and require particular assumptions and the use of strong instrumental variables (Duncan, 1975; Heise, 1975). Failure to take into account these special requirements will nearly always lead to model identification problems and convergence issues. Even without explicit use of bidirectional paths, non-recursive elements may lead to convergence failures or negative error variances. For example, a model specification that includes a directional path between two endogenous variables while also including a correlation between their disturbances (see Figure 3(c)) will not be identified.
Another source of poor fit is the inadvertent omission of correlations between exogenous or endogenous variables.
Such relationships may be included by default in some software programs, but this is not universal. Omission of correlations between exogenous variables implies that they are independent, which may often be an unreasonable assumption (see the Tip 8 illustration in the online supplemental materials). If so, the fit of the model will suffer (and their predictive paths will not represent partial regression coefficients). Omission of covariances among the structural disturbances (i.e., residual variances) of outcomes (see Figure 4) may also be detrimental to fit. Excluding such correlations between disturbances implies that there is no remaining relationship between these variables and may have an important impact on the overall model fit (Luskin, 1978). These are instances of local fit problems (specific portions of the model that do not fit well) that can be detected if modification indices are examined.
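When in doubt, such covariances can be written out explicitly rather than relying on software defaults. The sketch below uses placeholder variable names; note that lavaan's sem() function includes some of these covariances by default, but stating them documents the intent.

```r
library(lavaan)

model <- '
  y1 ~ x1 + x2
  y2 ~ x1 + x2
  x1 ~~ x2     # exogenous predictors allowed to correlate
  y1 ~~ y2     # covariance between the structural disturbances of the outcomes
'
fit <- sem(model, data = dat)
```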
Tip 9: Fit and Inference: Interpreting and Misinterpreting Your Results

Assuming previously described issues have been ruled out, it may be necessary to consider that poor model fit is indicative of theoretical problems in the model. Alternative models may be investigated by using nested or non-nested tests (Merkle et al., 2016; Preacher & Merkle, 2012). Alternative models should be theoretically derived and carefully considered, as testing several arbitrary alternative models increases the risk of capitalization on chance. Model fit and coefficient estimates are always subject to sampling variability and the particulars of a sample's characteristics, so ultimate conclusions about the best model may need to await confirmation in an independent sample (Bollen, 1989; Preacher & Merkle, 2012).
Once an acceptably fitting model has been achieved, it is important to interpret the meaning of the results within the context of the research design and the limits of SEM as an analysis method (e.g., McCoach et al., 2007). When the causal structure is saturated (i.e., directional paths or correlations are estimated among all latent variables), for example, the resulting model fit will be indicative only of the fit of the measurement model, rather than the structural model, and is unable to indicate anything about causality (Anderson & Gerbing, 1988; Schumacker & Lomax, 2016). On the other hand, a model that does not specify relationships among all of the latent variables (an overidentified model) can provide information about whether the implied causal directions are consistent with the data. As a simple example, a mediation model (emotional support → positive affect → depression) without a direct effect specified between the predictor, emotional support, and the outcome, depression, provides information about the plausibility of the specified mediational path by implying that the direct path between emotional support and depression equals zero (see the Tip 9 illustration in the online supplemental materials). If the resulting model fits the data well, the results are consistent with complete (or full) mediation and support the hypothesized causal order of the variables in the model (Asher, 1983; Blalock, 1962; Pearl, 1998). Alternatively, if the fit is poor, the results are inconsistent with the hypothesized causal order of the variables. Such evidence is not definitive evidence of causality, of course, as there will usually be some alternative models that fit equivalently, or confounding variables may have been omitted (Lee & Hershberger, 1990; MacCallum et al., 1993; McCoach et al., 2007).
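The full versus partial mediation comparison can be carried out in lavaan as a nested model test; the following sketch uses the running example's placeholder names with abbreviated item sets.

```r
library(lavaan)

# Full mediation: no direct path from emotional support to depression
m_full <- '
  emosup  =~ emo1 + emo2 + emo3
  posaff  =~ pa1 + pa2 + pa3
  depress =~ dep1 + dep2 + dep3 + dep4
  posaff  ~ emosup
  depress ~ posaff
'
# Partial mediation: add the direct path
m_partial <- paste(m_full, 'depress ~ emosup', sep = '\n')

fit_full    <- sem(m_full, data = dat)
fit_partial <- sem(m_partial, data = dat)
lavTestLRT(fit_full, fit_partial)   # chi-square difference test for the direct path
```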
Although structural equation modeling is often referred to as "causal modeling," it is generally advised that authors avoid strong causal language with nonexperimental or cross-sectional designs. The strength of support for causal hypotheses (internal validity) varies, depending on whether particular causal criteria are met (Pearl, 2012). In general, internal validity varies with the extent to which the independent variable can be isolated from third variables, whether temporal precedence can be established, and other conditions determined by study design and model specifications (see Asher, 1983; Bullock et al., 1994; and Pearl, 1998, for more comprehensive discussions of causal inference). Fit for some models, in fact, is not particularly relevant to causality at all. For example, the fit of a latent growth curve model is mainly a function of the saturated mean structure and is affected by many factors unrelated to causality, including the functional form and heteroscedasticity (Bollen, 2007).

Tip 10: Model Troubleshooting Tips
The issues discussed above cover many common causes of model estimation problems. Being aware of these common causes of improper solutions or poor fit will no doubt save countless hours of frustration in the modeling process. Assuming that all of the aforementioned pitfalls have been considered, however, there may still be model misfit, errors, convergence problems, or improper solutions. Several general troubleshooting strategies may help when a modeler is still stuck. Of course, every model is different, so it is difficult to provide general words of advice that will apply in all situations. Nevertheless, there are several broad principles that can be outlined.
Frequently, there is a tendency to jump into a complex model too soon. A good approach when one is having trouble, therefore, is to break down the larger model into smaller parts and then build back up, adding one or two model components at a time until the model runs. This may include not only separate testing of the measurement portion of the model, as described previously (see Tips 6 and 7), but also such approaches as moving from testing several factors to testing each factor singly, eliminating covariates, focusing on only a portion of the model (such as examining only the predictor-mediator relationship rather than the full mediational model), or even examining only a subset of time points rather than all points in a growth curve model. It is important to distinguish this troubleshooting strategy from an exploratory approach. The purpose is merely to get some portion of the originally intended model to run when the larger model is not running. We are not recommending making model modifications haphazardly until the model fits. The general troubleshooting approach of simplifying and breaking the model into manageable portions follows what computer programmers already have baked into their DNA: using logic and reasoning to try to isolate the cause of the specification problems. A process of elimination may also work, but routinely employing a build-up strategy from the beginning, rather than a tear-down strategy after problems occur, will likely save more time.
It is not uncommon to get lost in the model specifications of a more complex model, particularly when syntax is used. The model specified may not be the model intended, and this is a root cause of many problems for those just gaining experience. Although we urge modelers to use a detailed graphic of the model from the beginning of the process, revisiting or newly sketching a figure of the model, labeled with the appropriate variable names, can be a great help in identifying the culprit in the syntax. Details such as omitted semicolons or other required punctuation will lead to errors, but the source of the problem may also be that the direction of the relationship between two variables is specified incorrectly (e.g., x is caused by y in the syntax when it was hypothesized that y is caused by x), that the wrong variable name is used in a path specification, that a critical path has been omitted (e.g., a path from one of the covariates to the outcome is missing), that a loading has not been set to scale the latent variable, or that an extra path is included (e.g., y is causing x and x is causing y). With a model figure as the guide, go line by line through the code to make sure that everything specified in the syntax matches what is intended in the figure. Before jumping to the interpretation of the results, it should be routine to carefully check the output to make sure that all of the parameters that were fixed and freely estimated were as intended. Many syntax-based SEM software programs can produce a graph of the model being tested, so another possible approach is to generate a graph with each model tested to see whether the figure matches the intended model. With complex models, it is still easy to miss unintended or omitted paths, so the figures should be carefully reviewed before going forward. Even for software that is graphically based, such as Amos, it is still common to have errors or convergence problems because an intended path was omitted or has not been set to a value when required for identification (e.g., Amos requires the user to name variables, add measurement residuals, and add structural disturbances).
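In lavaan, two quick checks serve this purpose: the parameter table lists every free and fixed parameter actually in the model, and the semPlot package can draw the model as estimated for comparison against the hand-drawn figure. This is a sketch assuming a previously fitted object fit.

```r
library(lavaan)

parTable(fit)   # verify free vs. fixed parameters match the intended figure

library(semPlot)
semPaths(fit, whatLabels = "est")   # draw the model that was actually estimated
```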
If the problem is a poor-fitting model, one always has to keep in mind that either the hypothesized model or a theoretical assumption behind the model may be wrong, in which case a good next step is to examine modification indices. Is something completely new being attempted, or have similar models been consulted to make sure nothing unreasonable is being considered? When all of the above-mentioned problems involving unintentionally incorrect models have been ruled out, it is time to consider that the hypothesized model is not the right or best one. The next step, then, is to consider alternative models and compare their fit. Which models predicted by theory or past work might be better? Models with the same variables and cases can be compared through nested tests (see Tip 9) if one model contains a subset of the parameters in the other model (e.g., removing a causal path), but many of the interesting alternative models will simply have the same specified latent variable structure with the same measured variables and differ only in the directions of the causal paths. Some of these models will be equivalent (MacCallum et al., 1993), but some may have different fit. Different fit in this circumstance reflects the consistency of the causal assumptions of the model with the data. If the fit differs in such a case, the traditional chi-square is adequate for identifying the superior model, and relative fit indices are not needed. Whether the models are nested or not, the magnitude of the difference in fit and sampling variability should be taken into account (Preacher & Merkle, 2012). Some authors have recommended other methods of comparing nonnested models (e.g., Merkle et al., 2016; Preacher et al., 2007).
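For nonnested alternatives, information criteria and Vuong-type tests provide one route; the following is a sketch assuming fit1 and fit2 are lavaan models fit to the same variables and cases.

```r
library(lavaan)

fitMeasures(fit1, c("chisq", "df", "aic", "bic"))
fitMeasures(fit2, c("chisq", "df", "aic", "bic"))

# Vuong-type tests for nonnested models (Merkle et al., 2016) are implemented
# in the nonnest2 package
# library(nonnest2)
# vuongtest(fit1, fit2)
```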

Conclusion
SEM brings enormous utility and flexibility to testing research hypotheses, but the approach necessarily involves considerable technical background knowledge to use it competently and correctly. Even with such knowledge, structural modeling can often be a challenging endeavor at the beginning, typically overcome only through raw experience coupled with sheer determination. The tips presented here are intended to help speed the transition between acquired technical knowledge and proficient implementation so that researchers can get on with the business of investigating research questions.