A Simple Simulation Technique for Nonnormal Data with Prespecified Skewness, Kurtosis, and Covariance Matrix

ABSTRACT We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.


Introduction
The use of Monte Carlo simulation to empirically assess estimators and goodness-of-fit statistics is widespread in structural equation modeling (SEM) research. The most common estimation method is normal-theory maximum likelihood (ML), which is the default choice in SEM software. Normal-theory ML also provides an associated goodness-of-fit test, which under ideal conditions of correct model specification, multivariate normality, and large sample size follows a chi-square distribution. However, in most real-world situations, data are drawn from nonnormal distributions (Micceri, 1989). It is therefore important to evaluate normality-based ML under realistic sample sizes and nonnormal distributions. In fact, for any estimation technique to be evaluated, normal-theory based or not, we need to generate data with methods that can provide many different kinds of multivariate nonnormal distributions. The tradition in SEM simulation is to set up a nonnormal data-generating process by specifying a given covariance matrix and given skewness and kurtosis in the univariate marginal distributions. This is natural since SEM is based on analysis of the covariance structure. However, in the nonnormal case, there are many distributions that have a given covariance matrix. Similarly, the skewness and kurtosis of a univariate distribution do not fully determine that distribution. Thus, by specifying the covariance matrix and, for each observed variable, the skewness and kurtosis values, there are potentially many data-generating processes that can attain the target covariance matrix, skewnesses, and kurtoses. The data-generating process of Vale-Maurelli (Vale & Maurelli, 1983) has been the default choice for researchers conducting SEM simulation studies.
However, the Vale-Maurelli (VM) transform represents only one of many data-generating processes that attain the prespecified covariance structure and univariate skewness and kurtosis of the observed variables.
The aim of the current article is to introduce a new simulation technique that generates data that differ in important ways from data obtained with the VM transform. We demonstrate that using the VM transform may result in a too-optimistic evaluation of normal-theory-based test statistics in SEM. A more moderate evaluation of the performance of these popular test statistics is obtained when data are generated by the simulation technique proposed in this article. For example, we show that at fixed levels of skewness and kurtosis in the observed variables, the ML chi-square goodness-of-fit test T_ML rejects true models more often when data are simulated with our proposed method than when data are generated with the VM transform. Hence, a researcher using only the VM transform may be led to conclude that T_ML performs quite well under nonnormality. However, such a conclusion is contingent on the special kind of nonnormality offered by the VM transform. A less optimistic evaluation of the appropriateness of T_ML under nonnormality might have resulted had the data been generated by the new method instead of the VM transform. Researchers can more broadly evaluate the finite-sample performance of statistical techniques under nonnormality by employing this new simulation technique in addition to the VM transform.
Let us remark that although convenient, the traditional SEM approach of specifying the covariance matrix and some finite-order moments of the univariate marginal distributions does not fully identify a multivariate distribution. There is a more general concept of dependency among random variables than the covariance matrix; namely, the copula. The importance of the copula stems from the following unique decomposition of any continuous multivariate distribution into two parts:
The first part is a specification of the univariate marginal distributions of the multivariate distribution; that is, the distribution of each observed variable. The second part pertains to the dependence structure among these observed variables. This structure may be modeled by the copula of the distribution, which describes the dependency among the observed variables. For example, the copula associated with multivariate normality is defined as follows. Given a random multivariate normal vector X = (X_1, ..., X_d)', we may define a new vector

$$U_X = \left( \Phi\!\left( \frac{X_1 - \mu_1}{\sigma_1} \right), \ldots, \Phi\!\left( \frac{X_d - \mu_d}{\sigma_d} \right) \right)$$

obtained from X by applying the standard normal cumulative distribution function Φ to each standardized univariate marginal in X. The copula of X, known as the normal copula, is the cumulative distribution function of the vector U_X. This procedure may be applied to any nonnormal random vector Y, whose copula is the cumulative distribution function of the vector U_Y obtained in the same manner. See Vuolo (2015) for an introduction to copulas in sociology.
Note that multivariate normality occurs only when all univariate marginal distributions are normally distributed and the associated copula is normal as defined above. Violation of the normality assumption can occur if at least one univariate marginal distribution is nonnormal, if the copula is nonnormal, or both. For example, it may be the case that all univariate marginal distributions are normally distributed but that the copula is not normal, which will result in a nonnormal multivariate distribution with normal univariate marginal distributions. A practitioner investigating the observed variables one by one will in such a case get the false impression that the data are normally distributed. Conversely, it is possible to have nonnormal univariate observed variables but with a normal copula. In this latter case, the distribution will appear highly nonnormal when investigating each observed variable, but its dependence structure (that is, its copula) is identical to the multivariate normal case.
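The first of these two scenarios can be made concrete with a small numerical sketch (ours, in Python with NumPy; not part of the article's supplementary R code): a bivariate vector whose marginal distributions are both standard normal, yet whose copula is nonnormal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# X is standard normal; S is an independent random sign.
x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)
y = s * x  # Y is also standard normal, but (X, Y) is NOT bivariate normal:
           # its mass lies on the two diagonal lines y = x and y = -x,
           # so its copula cannot be a normal copula.

def skew(v):
    """Sample skewness."""
    v = v - v.mean()
    return (v**3).mean() / v.std()**3

# Each marginal looks perfectly normal (near-zero skewness) ...
print(round(skew(x), 2), round(skew(y), 2))     # both near 0
# ... and X and Y are uncorrelated ...
print(round(np.corrcoef(x, y)[0, 1], 2))        # near 0
# ... yet they are strongly dependent: Y**2 equals X**2 exactly,
# which is impossible for uncorrelated jointly normal variables.
print(round(np.corrcoef(x**2, y**2)[0, 1], 2))  # exactly 1
```

A practitioner inspecting only the univariate marginals, or only the correlation matrix, would see nothing nonnormal here; the nonnormality lives entirely in the copula.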
In fact, this latter situation closely describes the VM transform, which is implemented in software packages like Mplus (Muthén & Muthén, 2012), EQS (Bentler, 2006), and lavaan (Rosseel, 2012). The VM transform delivers distributions with prespecified univariate skewness and kurtosis and a prespecified covariance matrix. See Curran, West, and Finch (1996) for an example of an extensive Monte Carlo study that uses this method. In the VM transform, one samples from an underlying multivariate normal distribution, whose covariance matrix is found with a numerical routine. Then for each marginal variable Z, a resulting observed variable Y is generated from Z by applying a Fleishman polynomial; that is, a third-order polynomial whose coefficients are determined to match the target skewness and kurtosis values (Fleishman, 1978). There has recently been some critique of the VM transform, both empirically (Astivia & Zumbo, 2014) and theoretically (Foldnes & Grønneberg, 2015). In the latter work, it is shown that the truly multivariate aspect of the VM transform (that is, its copula) is very closely related to the multivariate normal copula. Moreover, attempts to generalize the VM transform by including Fleishman polynomials of higher degrees (Headrick, 2002) do not transcend this limitation. The copula-based simulation approach proposed by Mair, Satorra, and Bentler (2012) offers an alternative to the VM transform, where the resulting copula is nonnormal. However, controlling the univariate marginal distributions is not possible with their method.
In the present article, we present and investigate a new method for simulating nonnormal data with exact control of the covariance matrix and univariate skewness and kurtosis and with truly nonnormal copulas. The new method hence provides an alternative to the VM transform and gives rise to a different set of nonnormal distributions from those offered by the VM transform. We illustrate that the proposed simulation method produces distributions under which normal-theory ML inference performs differently than it does under the data generated by the VM transform, in spite of both distributions sharing the same covariance matrix and the same univariate skewness and kurtosis.
We propose a new transform that generates random samples from a vector Y of p observed variables. The transform represents the observed variables as

$$Y_i = \sum_{j=1}^{s} a_{ij} X_j, \qquad i = 1, \ldots, p,$$

where the a_{ij} are constant scalars and the X_j, j = 1, ..., s, are mutually independent random variables referred to as independent generator (IG) variables. In practice, the vector Y will contain all observed variables in a SEM model that we wish to generate data for, while the X_j variables may be unrelated to the model, propagating randomness into Y. We say that any Y that can be represented stochastically in this way has an IG distribution. These distributions are convenient for simulating multivariate nonnormal data because it is straightforward to independently draw random samples from the univariate distributions of the X_j and then use the linear combinations to generate data for Y. The algebraic simplicity of the stochastic representation also makes it possible to develop analytical results. Foldnes, Olsson, and Foss (2012) used IG distributions to theoretically study the loss of power of robust test statistics with increasing kurtosis.
To investigate whether normal-theory ML inference is robust to various violations of the normality assumption, researchers need to generate data that depart from the normal case in significant ways. In SEM simulation, a researcher with a target model in mind usually specifies the population parameters of the model and calculates the implied covariance matrix of the observed data. Then simulations are executed from a nonnormal distribution whose covariance matrix equals the implied covariance matrix. The task of nonnormal data generation with given covariance matrix is important in related fields of statistics as well. Many methods rely on some form of covariance structure analysis; for example, repeated-measures designs (Berkovits, Hancock, & Nevitt, 2000; Molenaar & Rovine, 1998), multilevel random coefficients models (Rovine & Molenaar, 2000), and analysis of covariance (ANCOVA) (Headrick & Sawilowsky, 2000). Evaluating such methods under nonnormal data hence requires a simulation from an underlying distribution whose covariance matrix equals a given target matrix and whose univariate marginal distributions are partially or fully specified. In the present article, we consider the traditional case in which only the third- and fourth-order moments are prespecified for each univariate marginal distribution.
The goals of this article are threefold. The first goal is to introduce and illustrate a new simulation technique based on IG variables. The second goal is to demonstrate the need for nonnormal data simulation techniques that differ from the often-used VM transform. We do this by Monte Carlo simulation, showing that popular SEM fit statistics are sensitive to the underlying type of nonnormality, even for fixed levels of univariate skewness and kurtosis. In a real-world setting, we do not know the exact nature of our nonnormal data, so we should be careful to include different kinds of nonnormality in our simulation conditions. Our third goal is to present analytical results for IG distributions. We establish exact conditions for the validity of normal-theory-based test statistics under nonnormal IG distributions. We also demonstrate that the IG distribution has a nonnormal copula, in contrast to the VM distribution, again showing that the IG transform produces conditions that may complement the VM transform.
This article is organized as follows. In the next section we present the IG transform, summarized in a five-step algorithm. This is followed by a section in which the five-step algorithm is illustrated on an empirical example. We then present a technical section developing analytical results for the IG transform, beginning with a nontechnical summary of these results. This is followed by a section with a Monte Carlo study that evaluates the rejection rates of two popular test statistics under data produced by both the IG and the VM transform. In the last section, we discuss limitations of the IG transform and offer concluding remarks. Sample code written in the R environment is available as online supplementary material for methodologists who wish to implement the IG transform.

The IG transform
We investigate distributions obtained from linear combinations of independent nonnormal variables:

$$Y = AX, \qquad (1)$$

where X is a column s-vector of generator variables, while Y is a column p-vector of generated variables. A researcher may form Y from the observed variables in the model under scrutiny. The topic of the current section is how we may choose the matrix A and distributions for the X_j so that the observed variables have the desired covariance matrix, skewness, and kurtosis. The IG variables X_j, j = 1, ..., s, contained in the vector X, are assumed to be mutually independent, and A is a p × s (s ≥ p) matrix of constant coefficients. We assume without loss of generality that the X_j have zero mean and unit variance for all j. In the following, we refer to data generation by way of (1) as the IG transform. For a given target covariance matrix Σ for Y, we employ some matrix A such that AA' = Σ.
This will ensure, given the unit variance and mutual independence of the IG variables, that Y has the target covariance matrix Σ. In a straightforward application of Equation (1), one can set s = p and choose A as a square root matrix of Σ. However, a researcher investigating a particular model might want to investigate how skewness and kurtosis in various parts of the model influence estimation and goodness-of-fit statistics. Then a more elaborate A can be constructed from the model at hand as follows. A SEM model can be formulated (Bollen, 1989) as

$$\eta = B\eta + \Gamma\xi + \zeta, \qquad Y_1 = \Lambda_1 \eta + \epsilon, \qquad Y_2 = \Lambda_2 \xi + \delta, \qquad (2)$$

where ξ, ζ, ε, and δ are latent vector variates and the observed variate is Y = (Y_1', Y_2')'. The covariance matrices of ξ, ζ, ε, and δ are denoted by Φ, Ψ, Θ_ε, and Θ_δ, respectively, and are assumed to be positive definite. The matrices B and Γ contain regression coefficients, while Λ_1 and Λ_2 contain factor loadings. Note that (2) contains the multivariate regression model as a submodel, when the Y_i are measured without errors and B = 0.
We decompose these matrices by

$$\Phi = A_1 A_1', \qquad \Psi = A_2 A_2', \qquad \Theta_\epsilon = A_3 A_3', \qquad \Theta_\delta = A_4 A_4',$$

where the A_i are square root matrices. We set ξ = A_1X_1, ζ = A_2X_2, ε = A_3X_3, and δ = A_4X_4, where the variables contained in the vector X = (X_1', X_2', X_3', X_4')' are mutually independent, each with variance one and zero mean. Consequently, we can cast (2) into the form of (1). Thus, we can control skewness and kurtosis in each random vector ξ, ζ, ε, and δ of the proposed model by applying the IG transform to each of these. The distribution of the observed vector Y will then still be an IG distribution. After A in Equation (1) has been determined, we wish to find suitable distributions F_j for the X_j such that each Y_i has a given skewness and kurtosis. The skewness α_V of a random variable V with expectation μ_V and standard deviation σ_V is a measure of asymmetry and involves the third-order moment:

$$\alpha_V = \frac{E[(V - \mu_V)^3]}{\sigma_V^3}.$$

Kurtosis is a measure of the propensity of a distribution to produce outliers (for a recent discussion of the interpretation of kurtosis in terms of tail extremity, see Westfall, 2014). Kurtosis involves fourth-order moments. The relative, or excess, kurtosis is

$$\beta_V = \frac{E[(V - \mu_V)^4]}{\sigma_V^4} - 3.$$

The issues of skewness and kurtosis can be handled separately. For skewness, the mutual independence of the X_j makes it a rather uncomplicated exercise in algebra to obtain an expression for the skewness of Y_i:

$$\alpha_{Y_i} = \frac{\sum_{j=1}^{s} a_{ij}^3 \, \alpha_{X_j}}{\left( \sum_{j=1}^{s} a_{ij}^2 \right)^{3/2}}. \qquad (4)$$

Hence, the skewness of the Y_i depends linearly on the skewnesses of the generator variables. Given user-specified skewness values α_{Y_i}, i = 1, ..., p, there are p equations of type (4). Together they constitute a linear system of p equations with s ≥ p unknowns; namely, the α_{X_j}, j = 1, ..., s. In general, such a system is consistent and can easily be solved for the α_{X_j} by using statistical or mathematical software like R, Mathematica, or Matlab. An expression similar to (4) may be obtained for kurtosis, again relying on the mutual independence and unit variance of the IG variables.
By some straightforward algebra, we obtain

$$\beta_{Y_i} = \frac{\sum_{j=1}^{s} a_{ij}^4 \, \beta_{X_j}}{\left( \sum_{j=1}^{s} a_{ij}^2 \right)^{2}}. \qquad (5)$$

With given user-specified kurtosis values β_{Y_i}, we again end up with a system of p linear equations with more unknowns (β_{X_j}, j = 1, ..., s) than equations. And again, we can solve this system by using routines in a software package like R.
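Setting up and solving the two linear systems takes only a few lines of code. The sketch below is ours (the article's supplementary code is in R); the coefficient matrix A and the target skewness and kurtosis values are illustrative assumptions, and a least-squares solve is used so that the same code also covers the case s > p.

```python
import numpy as np

# Illustrative coefficient matrix A (p = s = 2) and target moments for Y.
A = np.array([[1.0, 0.0],
              [0.5, 1.0]])
alpha_Y = np.array([1.0, 1.5])   # target skewness of Y_1, Y_2
beta_Y  = np.array([3.0, 5.0])   # target excess kurtosis of Y_1, Y_2

# var(Y_i) = sum_j a_ij^2, because the X_j are independent with unit variance.
sigma2 = np.sum(A**2, axis=1)

# Equations (4) and (5) are linear in the IG moments once we multiply
# through by the appropriate power of sigma_{Y_i}.
alpha_X = np.linalg.lstsq(A**3, alpha_Y * sigma2**1.5, rcond=None)[0]
beta_X  = np.linalg.lstsq(A**4, beta_Y  * sigma2**2.0, rcond=None)[0]

# Plug back in to confirm that the targets are met exactly.
print(np.allclose((A**3 @ alpha_X) / sigma2**1.5, alpha_Y))  # True
print(np.allclose((A**4 @ beta_X)  / sigma2**2.0, beta_Y))   # True
```

With a square, invertible A the solution is unique; with s > p the least-squares routine returns one of the many consistent solutions mentioned in the text.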
We can now summarize the IG transform simulation method:

1. The user specifies a target covariance matrix Σ and target skewness α_{Y_i} and excess kurtosis β_{Y_i} for each observed variable, i = 1, ..., p.
2. A matrix A is determined such that AA' = Σ.
3. The systems of linear equations (4) and (5) are solved for α_{X_j} and β_{X_j}, respectively.
4. For each IG variable X_j, a univariate distribution F_j is specified such that X_j has zero expectation, unit variance, skewness α_{X_j}, and kurtosis β_{X_j}.
5. For a given sample size, draw independent random realizations of X_j, j = 1, ..., s. Apply the linear transformation (1) to obtain a random sample from Y.

The IG transform so described defines a large class of distributions, since steps 2, 3, and 4 can be executed in various ways. In step 2, many choices of A are possible. If s > p, there might be several possible solutions for α_{X_j} and β_{X_j} in step 3. In step 4, there are different candidate distributions for X_j with given skewness and kurtosis, obtained for example through Fleishman polynomials, the Johnson family of distributions (Johnson, 1949), or the Pearson family of distributions (Pearson, 1895). However, the focus of the present article is to introduce and illustrate a new simulation method that matches target skewness, kurtosis, and covariance and is fast and easy to simulate from. The flexibility contained in the IG class of distributions may be a topic of future research.

Illustration: Bollen's political democracy model
In this section, we demonstrate how to simulate data for the well-known political democracy model for developing countries, described in the textbook by Bollen (1989). The associated R code is available as supplementary material. The model contains four measures of political democracy measured twice (1960 and 1965) and three measures of industrialization measured once (1960) and is depicted in Figure 1.
Suppose we want to generate data from a population that fits the model perfectly. In our case, we have access to the original data set (Bollen, 1989, p. 428, Table 9.4) covering n = 75 developing countries, so we can use the model-implied covariance matrix as the target matrix Σ = Σ̂. In addition to specifying the target covariance matrix, the researcher also specifies skewness and kurtosis for each of the 11 observed variables. We specify the following skewness values for Y_1, Y_2, ..., Y_11: (0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2). We specify the following excess kurtosis values: (1, 1, 1, 1, 3, 3, 3, 3, 7, 7, 7). For example, for Y_5 we desire a distribution with skewness 1 and excess kurtosis 3.
In the second and third steps, a matrix A satisfying AA' = Σ is chosen and the linear systems (4) and (5) are solved for the required skewness and kurtosis of each IG variable. For example, the IG variable X_5 should have skewness 4.07 and excess kurtosis 17.36.
The fourth step involves finding univariate distributions for the IG variables X_j that have the required skewness and kurtosis. For example, we must define a distribution F_5 such that the skewness and kurtosis of X_5 equal 4.07 and 17.36, respectively. There are different ways of doing this; one could, for example, use Fleishman polynomials, as done in the VM transform. However, for this illustration we use the Pearson distribution system, which allows the user to specify mean, variance, skewness, and kurtosis in order to obtain a corresponding Pearson distribution. For example, F_5 is a member of the Type I Pearson family of distributions, which are location-scale transformations of Beta distributions. To attain zero mean, unit variance, skewness 4.07, and kurtosis 17.36, we use R (Becker & Klößner, 2013) to calibrate the parameters: a = 0.05, b = 1.07, λ = −0.32, and s = 6.99.
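Because a Pearson Type I distribution is a location-scale transform of a Beta distribution, the calibration above can be checked outside R as well. The sketch below is ours, in Python with SciPy rather than the article's R package; since the quoted parameters are rounded to two decimals, the targets are reproduced only approximately.

```python
from scipy import stats

# Pearson Type I = location-scale transform of Beta(a, b).
# Parameters quoted (rounded) in the text: shape a, shape b,
# location lambda, scale s.
a, b, lam, s = 0.05, 1.07, -0.32, 6.99
F5 = stats.beta(a, b, loc=lam, scale=s)

# First four moments; scipy reports excess (Fisher) kurtosis.
mean, var, skew, exkurt = F5.stats(moments='mvsk')
print(round(float(mean), 3), round(float(var), 3))     # near 0 and 1
print(round(float(skew), 2), round(float(exkurt), 2))  # near 4.07 and 17.36

# Drawing a sample of X_5 for step 5 of the algorithm:
x5 = F5.rvs(size=10_000, random_state=1)
```

The small discrepancies from the target values (roughly in the second decimal) disappear if the parameters are calibrated to full precision, as the R routine does.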
The fifth and last step is to draw, independently for each X_j, a random sample of a given size N. These 11 samples are combined as columns to form an N × 11 matrix D. The final simulated data set is then obtained as the N × 11 data matrix D · A', where each row represents a single draw from the random vector Y.
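This final step is a single matrix multiplication. The sketch below is ours (Python rather than the article's R code) and uses an illustrative 2 × 2 target covariance matrix instead of the 11 × 11 political democracy matrix; the generators are standardized gamma variables, chosen only to make the marginals skewed, since the covariance of D · A' is reproduced regardless of which standardized marginals are used.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Illustrative target covariance matrix (not the Bollen matrix).
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
A = np.linalg.cholesky(Sigma)   # one possible A with A A' = Sigma

# Columns of D: independent draws of standardized IG variables.
# Standardized gamma: zero mean, unit variance, skewness 2/sqrt(k).
k = 4.0
D = (rng.gamma(k, size=(N, 2)) - k) / np.sqrt(k)

Y = D @ A.T                     # each row is one draw of the vector Y

print(np.round(np.cov(Y, rowvar=False), 2))  # close to Sigma
```

In the article's illustration D has 11 columns, one per Pearson-distributed IG variable, and A is an 11 × 11 square root of the model-implied covariance matrix.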

Analytical investigations of IG distributions
The material contained in this section is quite mathematical in nature and may be of interest primarily to technical readers. Applied researchers may wish to skip the technicalities. For those readers, we provide the following summary of our analytical investigations.

Summary of analytical results
An important use of nonnormal data generation in SEM is to evaluate the finite-sample performance of estimators and fit statistics under a variety of distributional conditions. Normal-theory-based ML is the most popular estimation method, which also provides an associated fit statistic, T_ML. An important issue is whether T_ML can still be trusted when data are nonnormal. It is well known that under some specific combinations of nonnormal data and model conditions, T_ML retains its asymptotic chi-square distribution. These combinations involve certain exact relationships between the multivariate fourth-order moments of the data and the derivatives of the model-implied covariance matrix and are characterized in a body of literature referred to as asymptotic robustness (AR; e.g., Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Satorra & Bentler, 1990; Shapiro, 1987). If the IG transform is to be used to realistically assess T_ML, for example in terms of Type I error rates, it is important that AR does not hold. In this section, we therefore investigate under what conditions the IG transform will violate AR. With the results obtained in this section, we can determine whether a given IG transform in conjunction with a given model will violate AR. This is done by examining an explicit matrix that must be zero in all elements for AR to hold. A researcher employing the IG method for data generation can check whether this matrix is nonnull and hence be confident that the simulated data will distort the distribution of the ML chi-square fit statistic. We are able to deduce this result because the IG transform is analytically tractable due to the independence of the IG variables. We also establish, in the context of IG distributions, that whenever the well-known Satorra-Bentler correction is equal to one (that is, when there is no correction), AR must hold, and hence both the ML chi-square and the Satorra-Bentler chi-square are to be trusted under reasonably large sample sizes.
Whether this result extends to other classes of nonnormal distributions is presently unknown.
It is also important to establish that the IG transform is able to generate data with a nonnormal copula so that the resulting IG distribution departs from normality both in terms of each observed variable and in terms of their dependence structure. In the second subsection, we show that IG distributions, in contrast to those obtained with the VM transform, in general have a nonnormal copula. Hence, the IG transform may serve to modify previous performance evaluations of estimators and fit statistics in SEM that were based on the VM transform, or other nonnormal data simulation techniques based on the normal copula, like the NORTA method (Cario & Nelson, 1997) and its extensions (Kowalchuk & Headrick, 2010).
Applied researchers may skip the technicalities contained in the next two subsections without loss of continuity and proceed to the next section, where the IG transform is compared to the VM transform in a Monte Carlo study.

Asymptotic robustness
IG distributions belong to a class of distributions referred to as Data Model 1 by Yuan and Bentler (1999), where AR conditions were derived for four test statistics. These results were based on a crucial assumption (Yuan & Bentler, 1999, Condition 4.1) relating the data generation method and the model structure. This condition is sufficient for AR to hold. However, a researcher seeking to evaluate normal-theory-based statistics under nonoptimal conditions should avoid AR. Thus, what we need is a necessary condition for AR. By violating this condition in the simulation setup, the researcher will know that AR is violated.
To develop AR conditions under (1), we need to define some central vectors and matrices and operations upon them. For illustrative purposes, we exemplify these definitions in the following on a simple model, referred to as Model S. Model S has one factor ξ and three observed variables Y_1, Y_2, Y_3, with var(ξ) = 1 for identification. We constrain all three loading coefficients to be equal: λ_i = λ for i = 1, 2, 3. As is common in parallel measures (Lord & Novick, 1968), we also constrain some error variances to be equal: ψ_1^2 = ψ_2^2, where ψ_i^2 is the variance of the measurement residual corresponding to Y_i. Hence, the free parameters in Model S are λ, ψ_1^2, and ψ_3^2. These parameters constitute the parameter vector θ = (λ, ψ_1^2, ψ_3^2)'. The model-implied covariance matrix of Model S is

$$\Sigma(\theta) = \begin{pmatrix} \lambda^2 + \psi_1^2 & \lambda^2 & \lambda^2 \\ \lambda^2 & \lambda^2 + \psi_1^2 & \lambda^2 \\ \lambda^2 & \lambda^2 & \lambda^2 + \psi_3^2 \end{pmatrix}.$$

This matrix is symmetric and contains redundant elements. For a symmetric p × p matrix G, let vech(G) represent the column vector consisting of the p* = p(p + 1)/2 nonduplicated elements of G; that is, the elements along and below the diagonal. Thus, we can represent Σ(θ) by the column vector

$$\sigma(\theta) = \mathrm{vech}(\Sigma(\theta)) = \left( \lambda^2 + \psi_1^2, \; \lambda^2, \; \lambda^2 + \psi_1^2, \; \lambda^2, \; \lambda^2, \; \lambda^2 + \psi_3^2 \right)',$$

containing the p* = 3 · 4/2 = 6 unique elements of Σ(θ). The same procedure may be applied to the sample covariance matrix S, which is then more economically represented as s = vech(S). We let p denote the number of observed variables and q the number of free parameters. An important matrix is the Jacobian matrix of derivatives Δ(θ) = ∂σ(θ)/∂θ', where each entry of σ(θ) is partially differentiated with respect to each parameter in θ. For Model S we have

$$\Delta(\theta) = \begin{pmatrix} 2\lambda & 1 & 0 \\ 2\lambda & 0 & 0 \\ 2\lambda & 1 & 0 \\ 2\lambda & 0 & 0 \\ 2\lambda & 0 & 0 \\ 2\lambda & 0 & 1 \end{pmatrix}.$$

Next we consider a central element in SEM: the sampling distribution of the covariances; that is, the distribution of s. The asymptotic covariance matrix of √n s is denoted by Γ and should not be confused with the regression coefficient matrix Γ in Equation (2). To exemplify, let us consider an IG distribution tailored to simulate data for Model S. We fix all the parameter values λ, ψ_1^2, and ψ_3^2 to 1, such that the target covariance matrix is

$$\Sigma = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

Then we set A to a particular matrix satisfying AA' = Σ. There is a general formula for Γ when data come from an IG transform (Browne & Shapiro, 1988, Theorem 2.1):

$$\Gamma = \Gamma_N + K_p \tilde{A} C \tilde{A}' K_p'. \qquad (6)$$

Here Ã is the p^2 × s matrix whose jth column is a_j ⊗ a_j, with a_j being the jth column vector of A. The ⊗ symbol denotes the Kronecker product. Then a_j ⊗ a_j consists of stacking p copies of a_j, where the ith copy is multiplied by a_{ij}, for i = 1, ..., p. For Model S, with the A given above, Ã is a 9 × 3 matrix. C is the s × s diagonal matrix that has the excess kurtoses of the IG variables on the diagonal and zeros elsewhere. The constant p* × p^2 matrix K_p consists of elements with values in {0, 1/2, 1} and is defined in, for example, Browne (1974, Section 2). It has the property that vech(G) = K_p vec(G), where vec(G) is the vector formed by stacking the columns of G.
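The ingredients K_p, Γ_N, and Ã can be assembled in a few lines and cross-checked against the classical elementwise normal-theory formula acov(s_ij, s_kl) = σ_ik σ_jl + σ_il σ_jk. The sketch below is ours; because the article's particular A matrix for Model S is not reproduced here, an illustrative Cholesky square root of Σ is used for the kurtosis term.

```python
import numpy as np

p = 3
Sigma = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])

# vech ordering used in the text: s11, s12, s22, s13, s23, s33.
pairs = [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2)]
pstar = len(pairs)  # p(p+1)/2 = 6

# K_p: p* x p^2 matrix with entries in {0, 1/2, 1} such that
# vech(G) = K_p vec(G), where vec stacks the columns of G.
K = np.zeros((pstar, p * p))
for r, (i, j) in enumerate(pairs):
    if i == j:
        K[r, j * p + i] = 1.0
    else:
        K[r, j * p + i] = 0.5   # position of G_ij in vec(G)
        K[r, i * p + j] = 0.5   # position of G_ji in vec(G)

# Normal-theory asymptotic covariance matrix of sqrt(n)*s.
Gamma_N = 2.0 * K @ np.kron(Sigma, Sigma) @ K.T

# Cross-check every entry against the elementwise formula.
for r, (i, j) in enumerate(pairs):
    for c, (k, l) in enumerate(pairs):
        expected = Sigma[i, k] * Sigma[j, l] + Sigma[i, l] * Sigma[j, k]
        assert np.isclose(Gamma_N[r, c], expected)

# Kurtosis correction of Equation (6), with an illustrative A
# (a Cholesky square root; the article uses a different A).
A = np.linalg.cholesky(Sigma)
Atilde = np.column_stack([np.kron(A[:, j], A[:, j]) for j in range(p)])
C = np.diag([2.0, 4.0, 6.0])    # excess kurtoses of the IG variables
Gamma = Gamma_N + K @ Atilde @ C @ Atilde.T @ K.T

print(np.allclose(Gamma, Gamma.T))  # True: Gamma is symmetric
```

Because the kurtosis term depends on A through Ã, different square roots of the same Σ yield different Γ matrices, which is exactly the flexibility the IG transform exploits.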
For Model S, p = 3. Finally, Equation (6) contains Γ_N, defined as the asymptotic covariance matrix of √n s for multivariate normal data, which is given by Γ_N = 2K_p(Σ ⊗ Σ)K_p'. Returning to Model S, with fixed parameter values λ, ψ_1^2, and ψ_3^2 all equal to 1, and with the excess kurtosis of the IG variables denoted by β_i for i = 1, 2, 3, Equation (6) gives Γ explicitly; setting β_i = 0 for i = 1, 2, 3 in that expression recovers Γ_N. Recall that the elements of s are ordered as s_11, s_12, s_22, s_13, s_23, s_33. Thus, for example, the covariance between the sample covariance of Y_2 and Y_3 and the sample variance of Y_2 is approximately equal to 4 + (β_1 + 3β_2)/4 divided by the sample size n, provided the sample is large. The normal-theory ML estimate θ̂ minimizes

$$F_{ML}(\theta) = \log|\Sigma(\theta)| - \log|S| + \operatorname{tr}\!\left( S \Sigma(\theta)^{-1} \right) - p.$$

With multivariate normality, T_ML = (n − 1) · F_ML(θ̂) is asymptotically distributed as a chi-square with d = p* − q degrees of freedom. With nonnormal data, however, T_ML is asymptotically distributed as a mixture of chi-squares:

$$T_{ML} \xrightarrow{d} \sum_{j=1}^{d} \gamma_j \chi^2_{1,j},$$

where the χ^2_{1,j} are mutually independent chi-squares with one degree of freedom and the weights γ_j are the nonzero eigenvalues of the matrix

$$U = \left( \Gamma_N^{-1} - \Gamma_N^{-1} \Delta \left( \Delta' \Gamma_N^{-1} \Delta \right)^{-1} \Delta' \Gamma_N^{-1} \right) \Gamma.$$

AR is hence equivalent to γ_j = 1, j = 1, ..., d. Returning to Model S, with the three parameters fixed to one and with excess kurtosis in the IG variables β_1 = 2, β_2 = 4, and β_3 = 6, the nonzero eigenvalues of U are γ_1 = 2.25, γ_2 = 1, and γ_3 = 1. Hence, for large samples produced by the specified IG transform, T_ML obtained from testing Model S will have an asymptotic mean of 4.25, which is larger than the mean d = 3 of the reference chi-square distribution. Hence, although Model S has perfect fit to the target covariance matrix Σ, the T_ML-based test will reject Model S too often. In fact, at the nominal significance level 0.05, Model S will have a rejection rate of about 0.14 in large samples. The fact that T_ML tends to inflate with increasing kurtosis is well known.
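The asymptotic mean of 4.25 and the rejection rate of about 0.14 can be checked by simulating directly from the mixture distribution. The sketch below is ours and takes the eigenvalues γ = (2.25, 1, 1) reported above for Model S as given.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
R = 500_000

# Nonzero eigenvalues reported in the text for Model S
# with IG excess kurtoses (2, 4, 6).
gammas = np.array([2.25, 1.0, 1.0])
d = len(gammas)

# Draws from the asymptotic mixture distribution of T_ML.
T = (gammas * rng.chisquare(1, size=(R, d))).sum(axis=1)

# Nominal 5% critical value of the chi-square reference with d = 3 df.
crit = stats.chi2.ppf(0.95, df=d)

print(round(T.mean(), 2))           # asymptotic mean near 4.25
print(round((T > crit).mean(), 3))  # rejection rate near 0.14
```

Under AR all γ_j would equal 1, the mixture would collapse to an ordinary chi-square with d degrees of freedom, and the simulated rejection rate would be near the nominal 0.05.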
However, as demonstrated with Model S, working with IG distributions allows for an exact calculation of the asymptotic distribution of T_ML. One way to remedy the inflation of T_ML is to scale by a factor c = d/trace(U), as proposed by Satorra and Bentler (1988). Then the asymptotic mean of T_SB = c · T_ML equals the degrees of freedom d. Shapiro (1987) showed that AR holds if and only if there exists some p* × q matrix D such that

$$\Gamma = \Gamma_N + \Delta D' + D \Delta'. \qquad (8)$$

Moreover, under the slightly less general representation Γ = Γ_N + ΔEΔ' for some symmetric matrix E, normal-theory-based estimators like ML are asymptotically efficient within the class of minimum discrepancy function (MDF) estimators (Shapiro, 1987, Corollary 5.3). An MDF estimator θ̂ is defined as the minimizer of F(S, Σ(θ)), where the function F has the following three properties: F(E, G) ≥ 0 for all E and G; F(E, G) = 0 if and only if E = G; F is twice continuously differentiable jointly in E and G (Browne, 1982).
In Yuan and Bentler (1999), a crucial assumption, therein referred to as Condition 4.1, is that vech(a_i a_i') is contained in the column space of Δ for all i. We reformulate this condition as P = 0, where

$$P = \Delta_c' K_p \tilde{A}.$$

Here, Δ_c is an orthogonal complement of Δ(θ); that is, a p* × (p* − rank(Δ)) matrix of full column rank where each column in Δ_c is orthogonal to each column in Δ(θ). For Model S, an orthogonal complement Δ_c may be computed directly, yielding

$$P = \begin{pmatrix} 0 & 0 & 0 \\ 1/2 & -1/2 & 0 \\ 3/2 & -3/2 & 0 \end{pmatrix},$$

which contains nonzero elements. Yuan and Bentler (1999) showed that P = 0 is a sufficient condition for AR. Our first proposition gives a simple alternative proof that P = 0 entails AR. In addition, we show that the condition P = 0 implies that normal-theory estimators are asymptotically efficient.

Proposition 1. Consider the covariance model Σ(θ), and assume that it is correctly specified for an underlying IG distribution of Type (1). Then, if P = 0, asymptotic robustness holds. Also, if P = 0, then normal-theory estimators (e.g., ML, GLS) are asymptotically efficient within the class of minimum discrepancy function estimators.
Proof. Δ_c' K_p Ã = 0 implies that the column space of K_p Ã is a subset of the column space of Δ. Hence there exists a matrix L such that K_p Ã = ΔL. Substitution in the second term on the right-hand side of (6) gives K_p Ã C Ã' K_p' = ΔLCL'Δ', and hence Γ = Γ_N + E where E = ΔLCL'Δ'.
A necessary condition for AR was not given in Yuan and Bentler (1999). The following proposition identifies one. It also states that if all the X_j have more, or all have less, kurtosis than the normal distribution, then P = 0 is actually equivalent to AR.

Proposition 2. Consider the covariance model Σ(θ), and assume that it is correctly specified for an underlying IG distribution of Type (1). Then, if AR holds, PCP' = 0. Moreover, if either β_{X_j} > 0 for all j or β_{X_j} < 0 for all j, then AR implies P = 0.

Proof. By AR, it follows from Equations (6) and (8) that some D exists such that ΔD' + DΔ' = K_p Ã C Ã' K_p', and premultiplication with Δ_c' and postmultiplication by Δ_c gives the required result. Now assume that β_{X_j} > 0 for all j; that is, that the diagonal elements of C are all positive. The diagonal elements of PCP' are quadratic forms p_j C p_j', where p_j is the jth row of P. Since C is then positive definite, p_j C p_j' = 0 implies that p_j = 0 for all j. The same argument holds for the situation with β_{X_j} < 0 for all j.
As a corollary, we obtain an interesting observation. If the Satorra-Bentler correction c is inactive, that is, if c = 1, we know that the sum of the eigenvalues γ_j equals d. In general, this does not imply that we are in an AR situation, with γ_j = 1 for all j = 1, . . . , d.
The following corollary shows, however, that under an IG distribution with excess kurtosis in the IG variables, c = 1 is indeed equivalent to AR.

Corollary 1. Consider the covariance model Σ(θ), and assume that it is correctly specified for an underlying IG distribution of Type (1), where either β_{X_j} > 0 for all j or β_{X_j} < 0 for all j. Then the Satorra-Bentler correction c equals one if and only if AR holds.
We have seen that when all IG variables have more, or all have less, kurtosis than a normal variable, then PCP' = 0 and P = 0 are equivalent conditions, both again equivalent to AR. A natural follow-up question is whether P = 0 is also necessary for AR in the general case where some IG variables have more kurtosis, and some less, than the normal distribution. In the following, we use Model S to construct a counterexample. For Model S, we noted earlier that P = (0, 0, 0, 1/2, −1/2, 0, 3/2, −3/2, 0). Hence, in this case P is not zero, but if β_1 = 1, β_2 = −1, and β_3 = 1/16, then AR holds. Thus AR implies P = 0 only in the case where all IG variables have positive, or all have negative, excess kurtosis.

The copula of the IG transform
Consider a continuous multivariate distribution, with univariate marginal distributions F i (y) = P(Y i ≤ y), i = 1, . . . , p. Sklar (1959) showed that a full description of the random vector Y = (Y 1 , Y 2 , . . . , Y p ) may be uniquely obtained by separating the marginal distributions F i from the dependence structure; that is, the copula. More precisely, the probability integral transform applied to each component of Y gives the random vector (U 1 , . . . , U p ) = (F 1 (Y 1 ), . . . , F p (Y p )), with uniform marginal distributions. The copula C of Y is the joint cumulative distribution function of (U 1 , · · · , U p ). Thus a copula is a cumulative distribution function with uniform marginal distributions. We refer to Joe (2014) for a thorough review of copula theory.
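As a concrete illustration of the probability integral transform, the following sketch (our own Python/numpy illustration; the paper's supplementary code is in R, and the bivariate sample below is hypothetical) converts a sample into pseudo-observations, the empirical analogue of (U_1, . . . , U_p), whose marginals are uniform by construction.

```python
import numpy as np

def pseudo_observations(y):
    """Empirical probability integral transform: replace each column of the
    (n, p) sample y by its normalized ranks, giving values in (0, 1)."""
    n = y.shape[0]
    ranks = np.argsort(np.argsort(y, axis=0), axis=0) + 1  # ranks 1..n per column
    return ranks / (n + 1.0)

rng = np.random.default_rng(1)
z = rng.standard_normal(1000)
# A dependent, nonnormally distributed pair (the construction is illustrative)
y = np.column_stack([z, np.exp(z + 0.5 * rng.standard_normal(1000))])
u = pseudo_observations(y)  # the joint distribution of u estimates the copula
```

The joint distribution of the pseudo-observations estimates the copula of Y while discarding the marginal distributions.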
Similar to the IG transform, under the VM transform the user may specify kurtosis and skewness for each univariate marginal, together with a covariance matrix. The resulting distribution, given the nonnormal nature of the univariate marginal distributions, appears to be highly nonnormal. It has, however, been shown (Foldnes & Grønneberg, 2015) that the dependence structure of the VM transform is closely related to that of a multivariate normal distribution. This is not surprising, given that the Fleishman polynomials in the VM transform contain no interactions between the components of the normally distributed generator vector Z. That is, in the VM transform each output variable Y_i is obtained as a polynomial in a single corresponding normally distributed variable Z_i. The generator vector Z = (Z_1, . . . , Z_p) is multivariate normal, and the VM output Y = (Y_1, . . . , Y_p) inherits some basic properties of Z.
Compare this to the IG transform in Equation (1), where each Y_i is a function of potentially all the generator variables X_j. Although Equation (1) is a simple way to obtain dependence between the Y_i, that is, by assuming linearity and independence, the resulting distribution for Y has a nonnormal copula. To the best of our knowledge, there exist no general results on the copula obtained from the IG transform. However, as shown in the following, this copula is not in general normal. Consider the two-dimensional case p = s = 2 and assume that A is nonsingular. Suppose that the probability density function (pdf) of X_1 is nonzero everywhere, while the pdf of X_2 is nonzero only in an interval [b, ∞). For example, X_1 might be normally distributed and X_2 a scaled χ². Fix Y_1 to any value y_1. If we assume that both a_11 and a_21 are nonzero, it follows from y_1 = a_11 x_1 + a_12 x_2 and y_2 = a_21 x_1 + a_22 x_2 that y_2 = (a_21/a_11) y_1 + (a_22 − a_12 a_21/a_11) x_2.
Since A is invertible, we have a_22 − a_12 a_21/a_11 ≠ 0, and we suppose without loss of generality that this expression is positive. Then, given Y_1 = y_1, the conditional density of Y_2 is nonzero only on [(a_21/a_11) y_1 + (a_22 − a_12 a_21/a_11) b, ∞). Thus there exist y_1, y_2 such that f_{2|1}(y_2|y_1) = 0, where f_{2|1} denotes the conditional density of Y_2 given Y_1 = y_1. Furthermore, since a_21 is nonzero and since the pdf of X_1 is nonzero everywhere, the pdf of Y_2 is nonzero everywhere: f_2(y_2) > 0 for all y_2. Then, since f_{2|1}(y_2|y_1) = f_2(y_2) · c_12(F_1(y_1), F_2(y_2)), it follows that the copula density c_12 of Y satisfies c_12(F_1(y_1), F_2(y_2)) = 0 for some y_1, y_2. However, since the bivariate normal pdf is nonzero everywhere, a normal copula density is nonzero at all interior points of the unit square, so c_12 cannot be a normal copula.
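The support restriction in this argument is easy to see in simulation. The sketch below is our own illustration with hypothetical choices a_11 = a_22 = 1 and a_12 = a_21 = 0.5; X_1 is standard normal and X_2 a standardized χ²(1) variable with support [−1/√2, ∞). Every simulated point then lies in a half-plane, a constraint no normal copula can produce.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
a11, a12, a21, a22 = 1.0, 0.5, 0.5, 1.0           # nonsingular A; a11, a21 nonzero

x1 = rng.standard_normal(n)                       # pdf positive everywhere
x2 = (rng.chisquare(1, n) - 1.0) / np.sqrt(2.0)   # zero mean, unit variance,
b = -1.0 / np.sqrt(2.0)                           # ...support bounded below at b

y1 = a11 * x1 + a12 * x2
y2 = a21 * x1 + a22 * x2

# Given Y1 = y1, Y2 is supported only above a line in the (y1, y2) plane
slope = a21 / a11
offset = (a22 - a12 * a21 / a11) * b              # here 0.75 * b, about -0.53
assert np.min(y2 - slope * y1) >= offset - 1e-9   # half-plane constraint holds
```

Since Y_2 − (a_21/a_11)Y_1 = 0.75 X_2 in this parameterization, the bound follows directly from the lower bound on X_2.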

Monte Carlo illustration
In this section, we investigate empirically the performance of the two most popular fit statistics under the IG and VM simulation methods. We choose to focus on test statistics since model fit evaluation is a central task in SEM. Specifically, we consider the performance of T ML , which is the default statistic in most SEM software packages, and T SB , which is the most widely used fit statistic for nonnormal data. Our goal is to illustrate the IG transform in a specific example and to show that it produces distributions that differ from the VM transform to the extent that important SEM statistics are affected.
Since our purpose here is not to evaluate the IG method over a span of models, sample sizes, and nonnormality conditions, we employ a simple model, given in Figure 2, with two factors ξ 1 and ξ 2 , each with two indicators.
The population model has factor loadings equal to 1, 0.8, 1, and 0.8 for Y_1, Y_2, Y_3, and Y_4, respectively. Both factor variances are equal to one, and the covariance between ξ_1 and ξ_2 is φ = 0.2. The residual variances of Y_1, Y_2, Y_3, and Y_4 are all equal to 0.4. The model has only three free parameters; namely, the elements of the covariance matrix of (ξ_1, ξ_2). All factor loadings and residual variances are fixed in the model to their population counterparts, so there are seven degrees of freedom. The resulting covariance matrix of the observables is

Σ =
    1.400  0.800  0.200  0.160
    0.800  1.040  0.160  0.128
    0.200  0.160  1.400  0.800
    0.160  0.128  0.800  1.040

and a corresponding triangular square root matrix A is obtained from its Cholesky decomposition. We remark that other square root matrices of Σ could have been used, for example a symmetric A, but for the sake of simplicity we do not proceed to study these.
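The model-implied covariance matrix above can be reproduced from the stated parameter values. The sketch below is a numpy illustration (the paper's own code is in R): it assembles Σ = ΛΦΛ' + Θ from the loadings, factor covariances, and residual variances, and takes the lower-triangular Cholesky factor as one admissible A.

```python
import numpy as np

Lambda = np.array([[1.0, 0.0],
                   [0.8, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.8]])      # factor loadings
Phi = np.array([[1.0, 0.2],
                [0.2, 1.0]])         # factor covariance matrix (phi = 0.2)
Theta = 0.4 * np.eye(4)              # residual variances

Sigma = Lambda @ Phi @ Lambda.T + Theta   # model-implied covariance matrix
A = np.linalg.cholesky(Sigma)             # triangular square root: A @ A.T == Sigma
```

Any matrix A with A A' = Σ would serve for the IG transform; the Cholesky factor is simply the triangular choice.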
Two simulation methods were used: VM and IG. For the two levels of nonnormality in the Y_i variables, i = 1, . . . , 4, we calculated skewness and kurtosis values for the generator variables X_1, . . . , X_4. This was done with a numerical routine for the equation systems (4) and (5), as shown in the supplementary online material. The resulting skewness and kurtosis values for X_1, . . . , X_4 are given in Table 1. Each generator variable X_j was then simulated independently, with zero mean, unit variance, and skewness and excess kurtosis as specified in Table 1. To this end, we employed the Pearson family of distributions (Pearson, 1895), which allows the user to specify the first four moments. We note that other kinds of distributions that match these moments could have been used and that the sensitivity of the IG transform to variation in the univariate distributions of the IG variables is a direction for future research.
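Under our reading of the moment relations behind Equations (4) and (5), third and fourth cumulants of Y = AX are additive across the independent, unit-variance generators: κ3(Y_i) = Σ_j a_ij³ skew(X_j) and κ4(Y_i) = Σ_j a_ij⁴ exkurt(X_j), both linear in the generator moments. Assuming a square, nonsingular A, the routine reduces to two linear solves; the numpy sketch below is ours (the paper's routine is in R), and the two-variable targets are hypothetical.

```python
import numpy as np

def generator_moments(A, skew_y, exkurt_y):
    """Solve the linear systems relating target and generator cumulants.
    A sketch under the additivity-of-cumulants reading of Equations (4)
    and (5); assumes a square, nonsingular A."""
    sigma2 = np.sum(A**2, axis=1)                  # Var(Y_i) = sum_j a_ij^2
    kappa3_y = sigma2**1.5 * np.asarray(skew_y)    # third central moments of Y
    kappa4_y = sigma2**2 * np.asarray(exkurt_y)    # fourth cumulants of Y
    skew_x = np.linalg.solve(A**3, kappa3_y)       # A**3: elementwise cubes
    exkurt_x = np.linalg.solve(A**4, kappa4_y)     # A**4: elementwise 4th powers
    return skew_x, exkurt_x

# Hypothetical targets: skewness 2 and excess kurtosis 7 for both Y variables
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
A = np.linalg.cholesky(Sigma)
skew_x, exkurt_x = generator_moments(A, [2.0, 2.0], [7.0, 7.0])
```

Simulating each X_j independently with these moments and forming Y = AX then reproduces the target covariance, skewness, and kurtosis.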
Results are given in Table 2. In each condition, we report for T_ML and T_SB the mean, variance, and rejection rate at the α = 0.05 level. There are 2,000 replications in each cell, so that the maximal standard error for the rejection rates is √(0.5²/2000) ≈ 0.011. The main observation is that fixing the covariance matrix and both univariate skewness and kurtosis still leaves a lot of flexibility that influences the behavior of T_ML and T_SB. Both statistics differ in mean, variance, and rejection rates between the VM and IG transforms. Under normal data, the asymptotic mean and variance of these statistics equal those of a χ² distribution with seven degrees of freedom; thus, the asymptotic mean and variance are 7 and 14, respectively. As expected, due to excess kurtosis, T_ML is inflated, with rejection rates larger than the nominal levels. The nonnormality correction in T_SB only partly ameliorates the inflation. In all conditions, the behavior of both T_ML and T_SB is sensitive to the underlying transform. The mean and variance are higher under IG than VM, resulting in consistently higher rejection rates for data generated with the IG transform compared to the VM transform. Hence, it seems that data obtained from the IG transform are farther removed from normality than data obtained with the VM transform, even when controlling for the covariance matrix and for skewness and kurtosis in the marginal distributions. For example, in the n = 500 condition with severe nonnormality, one gets a much more positive impression of the performance of T_SB under the VM transform than under the IG transform (rejection rates are 9.1% and 15.0%, respectively). Since the difference between the IG and VM transforms lies solely in the copula, we have empirically demonstrated that the underlying copula of a distribution can substantially affect SEM goodness-of-fit testing.

Limitations of the IG transform
It is likely that the class of IG distributions contains a quite limited subset of the class of all nonnormal multivariate distributions. Whether a random vector is generated by some mechanism based on linear combinations of mutually independent random constituents is hard to verify. Hence, there may be types of multivariate nonnormality that occur in practice but are not obtainable through the IG transform. A more practical and critical limitation concerns applying the IG transform in high dimensions. If the dimension d is large, say d > 50, the observed variables will be linear combinations of a large set of independent generator variables, and the resulting sums will tend toward normality, due to the central limit theorem. To counteract this convergence to normality, the IG variables must possess extreme skewness and kurtosis. In theory this is not a problem, but in practice it is our experience that it may be difficult to simulate from univariate distributions with such extreme third- and fourth-order moments. For example, with a target kurtosis of 10, it may be necessary for some IG variables to have a kurtosis of 100. We may speculate that available statistical software routines will not be fully able to generate data with such high kurtosis. This concern is a topic for further research.
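The growth in required generator kurtosis is easy to quantify in the equal-weight special case a_ij = 1/√d, an illustrative assumption of ours rather than a claim about general A. Since fourth cumulants add across independent generators, matching an excess kurtosis of κ in an observed variable then forces each generator to carry excess kurtosis d·κ:

```python
def required_generator_exkurt(d, target_exkurt):
    """Equal-weight illustration: Y = sum_j X_j / sqrt(d) with iid generators.
    Fourth cumulants add, so kappa4(Y) = d * (1/sqrt(d))**4 * kappa4(X)
    = kappa4(X) / d; hitting the target therefore needs kappa4(X) = d * target."""
    return d * target_exkurt

print(required_generator_exkurt(10, 10.0))  # excess kurtosis 100 needed for d = 10
print(required_generator_exkurt(50, 10.0))  # grows to 500 for d = 50
```

This linear growth in d is what makes the IG transform demanding in high dimensions.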
One might wonder whether the two systems of Equations (4) and (5) can always be solved for skewness and kurtosis values, respectively. However, these are linear systems of equations, where a solution fails to exist only if the coefficient matrix is singular, in which case the matrix A may be replaced to obtain a solution. Given solutions for skewness and kurtosis in (4) and (5), another problem may arise: not all combinations of univariate skewness and kurtosis are feasible, and we may end up in such a situation for one of the generator variables. In such a case, one could again replace A. Whether this poses a serious limitation for the IG transform is a topic for future research.
Note that while the VM transform defines one specific distribution for a given target covariance matrix and given univariate skewness and kurtosis, the IG transform defines a much larger class of distributions. This generality reflects the fact that specifying the covariance and the third- and fourth-order moments of the marginal distributions only partially specifies the multivariate distribution. The IG class of distributions, for given covariance and univariate moments, is spanned by two independent choices. First, the choice of A among the many square and nonsquare matrices whose product AA' equals the target covariance matrix; each A implies a particular multivariate distribution, so two different choices of A will result in two different distributions. Second, each set of distributions for the IG variables implies a particular multivariate distribution. There are many univariate distributions with a given skewness and kurtosis, and a researcher may use Fleishman polynomials, Pearson distributions, or Johnson distributions; choosing two different sets of IG distributions will result in two different multivariate distributions. Finally, we remark that the IG transform can easily be extended to higher-order moments, for example fifth-order moments, which would involve solving systems of linear equations of the same kind as Equations (4) and (5).
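The first of these two choices, the square root A, is easy to illustrate. The numpy sketch below (our own, with an illustrative 2×2 covariance matrix) contrasts the triangular Cholesky factor with the symmetric square root obtained from an eigendecomposition; both satisfy AA' = Σ, yet they mix the generators differently and so define different IG distributions.

```python
import numpy as np

Sigma = np.array([[1.4, 0.8],
                  [0.8, 1.04]])      # illustrative target covariance matrix

# Triangular square root (Cholesky factor)
A_chol = np.linalg.cholesky(Sigma)

# Symmetric square root via the eigendecomposition Sigma = V diag(w) V'
w, V = np.linalg.eigh(Sigma)
A_sym = V @ np.diag(np.sqrt(w)) @ V.T

# Both reproduce Sigma exactly, but the matrices themselves differ
assert np.allclose(A_chol @ A_chol.T, Sigma)
assert np.allclose(A_sym @ A_sym.T, Sigma)
```

With nonnormal generators, feeding the same X through A_chol and A_sym yields two distributions sharing the covariance matrix but differing in their joint dependence structure.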

Conclusion
We have studied the use of independent generator variables for simulating nonnormal data with a prescribed covariance matrix and prescribed skewness and kurtosis in each univariate marginal. We have shown analytically and empirically that the IG transform generates data with different multivariate characteristics from the VM transform, even when the two transforms share the same covariance structure and univariate skewness/kurtosis. While the VM transform results in data whose copula is closely related to the normal case, the IG transform produces data with a nonnormal copula. A simple Monte Carlo study demonstrated that two popular SEM goodness-of-fit statistics perform worse with data from the IG transform than when data are generated with the VM transform. This demonstrates that the use of the IG transform may offer new insights on the robustness of SEM inference to violations of the normality assumption, relative to those gained through the use of the VM transform.
The proposed IG transform may be applied in simulation studies designed to test the sensitivity of normal-theory-based estimators and test statistics like T_ML to conditions of nonnormality. A researcher could use the IG transform for data generation and make certain that the necessary condition in Proposition 2 is violated. This will most likely prevent an overly optimistic evaluation of the distributional robustness of normal-theory SEM inference.

Article information
Conflict of Interest Disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.
Ethical Principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was not funded.
Role of the Funders/Sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.