Products of Variables in Structural Equation Models

Abstract A general method is introduced in which variables that are products of other variables in the context of a structural equation model (SEM) can be decomposed into the sources of variance due to the multiplicands. The result is a new category of SEM which we call a Products of Variables Model (PoV). Some useful and practical features of PoV models include the estimation of interactions between latent variables, latent variable moderators, manifest moderators with missing values, and manifest or latent squared terms. Expected means and covariances are analytically derived for a simple product of two variables and it is shown that the method reproduces previously published results for this special case. It is shown algebraically that using centered multiplicands results in an unidentified model, but if the multiplicands have non-zero means, the result is identified. The method has been implemented in OpenMx and Ωnyx and is applied in five extensive simulations.


Introduction
The expected covariance matrix of a Structural Equation Model (SEM) which is composed of a linear combination of variables has long been known (Wright, 1921) as well as its equivalence to a method of path coefficients (Wright, 1934). This equivalence was formalized into what is called the Reticular Action Model (RAM) by McArdle and McDonald (1984). Although linear combinations of variables are powerful and useful, nonlinear models have become increasingly important as latent interaction models, latent moderation models, and nonlinear dynamical systems models have become necessary in the current data-rich research environment.
When a variable in an SEM is the outcome of the product of two variables and at least one of the product terms is a manifest variable, there are a variety of methods that can be used to estimate coefficients in the SEM (e.g., Mehta & Neale, 2005; Neale, 1998; Schumacker & Marcoulides, 1998). But when both of the product terms are latent, the problem becomes more difficult. Methods have been proposed that require nonlinear constraints on measurement models (the product indicant technique, Kenny & Judd, 1984), approximating non-normal distributions using mixture distributions (the latent moderated structure technique, Klein & Moosbrugger, 2000; Moosbrugger et al., 1997), multiple group approaches and estimating latent scores so that they can be dealt with as manifest variables (Schumacker, 2002), variations on unconstrained approaches (unconstrained and quasi-maximum likelihood techniques, Marsh et al., 2004, 2007), the approach proposed and elaborated by Wall and Amemiya (2001, 2003), and semiparametric approaches introduced by Bauer (2005) and Bauer and Hussong (2009). This is only a small sample of the large literature on nonlinear SEM, but most of these methods apply only in limited cases or are difficult to implement in current SEM software.
In the current article, we take a different approach and present a novel method for the decomposition and estimation of the variance and covariance of products of variables in SEM models specified in RAM notation. The approach presented here makes the assumption that the non-normality of a variable constructed as the product of other variables can be completely accounted for if a model can account for all sources of variance that are included in the product and if those sources of variance are multivariate normally distributed. The method is currently implemented in OpenMx (Neale et al., 2016) and Ωnyx (von Oertzen et al., 2015) and requires only a one-line change in the R model script.
We begin our argument by providing an extension to RAM path diagrams (see Boker & McArdle, 2005, for an introduction to RAM matrices and path diagrams) to show how the specification of these models can be a minimal change from current user interfaces. By adding one new node to the palette of RAM path diagrams, we can add products of variables into existing SEM models. We hope that authors of other SEM packages will find that this method is straightforward to implement and has many advantages for end users.
To start, consider the construction of a linear combination: estimated coefficients are constants that are multiplied by variables, and all of these products are then summed together and assigned to an outcome variable. For instance, a simple bivariate regression using mean-centered multivariate normal variables can be written as

z_i = b1 x_i + b2 y_i + e_i,  (1)

where x_i and y_i are the predictor variables, e_i is the residual at each row index i, and b1 and b2 are constants. If the variances of x, y, and e are Vx, Vy, and Ve, respectively, and the covariance between x and y is Cxy, a RAM path diagram isomorphic to this bivariate regression can be drawn as in Figure 1a. Compare Equation (1) with the path diagram and note that two things happen when the three one-headed arrows attach to the box representing the manifest variable z. In the equation, b1 x_i, b2 y_i, and e_i are first summed. Second, the result of that sum is assigned to the variable z_i. In the equation, summation is represented by one symbol, "+", and assignment is represented by another symbol, "=". The path diagram fuses these two symbols together. This is not a problem as long as only linear combinations are used, since in that case summation and assignment are always used together. However, if one wishes to use an n-ary¹ operator other than summation, the assignment operator and the n-ary operator must be disambiguated in the path diagram.
In Figure 1b, a new n-ary operator path diagram node is proposed: the summation operator is represented by a plus sign within a circle. If this summation operator is treated just like a latent variable with zero unique variance, then the algorithmic application of the RAM path tracing rules (Boker et al., 2002) will correctly produce the components of covariance of the expected covariance matrix of any SEM that is composed solely of linear combinations of variables. Note that in Figure 1b, the summation operator could have been mixed together with standard addition-and-assignment while the standard RAM tracing rules would continue to produce the correct components of covariance. In many published SEM path diagrams, this node is drawn as a circle without the plus sign and is called a "dummy variable" whose only purpose is to sum its input arrows and send the sum along any output arrows. We suggest that "dummy variables" are actually summation operator nodes, since they do not have any residual variance. Now that the assignment operator has been distinguished from the summation operator, it is possible to propose a new n-ary operation node: multiplication. Consider the simplest case, a product of two predictor variables,

z = b1 xy + e,  (2)

where the variables x and y are multivariate normal with mean zero and variances Vx, Vy, and covariance Cxy, and the residual e is independent, normally distributed with mean zero and variance Ve. A path model for Equation (2) is displayed in Figure 2a, where the n-ary multiplication operator is represented by an asterisk within a circle. Although the summation operator can be treated as if it were a latent variable with zero residual variance, the multiplication operator cannot, as will be demonstrated below.
Thus, we will need to specify that an n-ary operator is a fourth type of node in a path diagram: (i) squares represent manifest variables; (ii) circles represent latent variables; (iii) triangles represent constants for means and intercepts; and (iv) n-ary operators are represented by a small circle surrounding the selected operator symbol. This distinction is necessary for two reasons that will be described in more detail later in the article: different n-ary operators have different identity operations, and they also trigger different path-tracing rules.

¹ Operators map one or more elements from a set back onto that set. A unary operator maps one element of a set onto the set, a binary operator maps two elements of a set onto that set, and an n-ary operator maps n elements onto that set.
Diagramming the problem in this way has numerous advantages. Since scalar multiplication is commutative, neither the vector x nor the vector y has a privileged place in the multiplication, just as one would expect algebraically. This is not explicit in some diagrammatic representations of moderation, where a moderating variable is singled out. This might be perceived as merely a philosophical difference. However, placing both multiplicands of a commutative binary operation on equal visual footing clarifies that moderation is just interaction without a direct effect of the moderating variable, and it emphasizes that the SEM network of variables can be perturbed at either multiplicand with an equivalent downstream effect. In addition, diagramming the model in this way sets up how user interfaces for specifying SEM models that include products of variables can be implemented with minimal change from current SEM conventions.

Variance of the Product of Two Variables
Given two vectors x and y, each with N elements, Goodman (1960) first derived the variance of the product of two variables and showed that this was an unbiased estimator. If Δx = x − μ(x) and Δy = y − μ(y), the exact variance is given by

var(xy) = cov(Δx², Δy²) + var(x)var(y) + μ(x)²var(y) + μ(y)²var(x) + 2μ(x)μ(y)cov(x, y) + 2μ(x)cov(Δx, Δy²) + 2μ(y)cov(Δx², Δy) − cov(x, y)².  (3)

Bohrnstedt and Goldberger (1969) applied this result to the bivariate normal case and showed that the variance of the product, var(xy), can be estimated as (Bohrnstedt & Marwell, 1978, Equation 15)

var(xy) = μ(x)²var(y) + μ(y)²var(x) + 2μ(x)μ(y)cov(x, y) + var(x)var(y) + cov(x, y)².  (4)

Bohrnstedt and Marwell (1978) also derived the reliability of this estimator when the observed product variable included a normally distributed error term. Now consider the regression equation

z = b1 xy + e,  (5)

where x and y are mean-centered, multivariate normal, and the vector of residuals, e, is mean-centered, normally distributed, and independent of the predictor variables x and y. This may be drawn as a path diagram as shown in Figure 2a. A new set of tracing rules will be required when this product symbol is used. However, note that the total variance at the product symbol can be estimated from Equation (4). The downstream contribution of this variance to z will be scaled by b1². We can thus linearly decompose the variance of z into terms due to the contribution of the residual e and the product of x and y, such that

var(z) = b1²[μ(x)²var(y) + μ(y)²var(x) + 2μ(x)μ(y)cov(x, y) + var(x)var(y) + cov(x, y)²] + var(e).  (6)

Figure 2. Path diagrams of a product of two variables. The asterisk surrounded by a circle represents the product of the variables connected to it by incoming arrows. (a) Mean-centered variables with a single product term. (b) Equivalent path diagram with variance sources isolated as independent normally distributed variables with mean zero and variance one.
When μ(x) = μ(y) = 0, Equation (6) simplifies to

var(z) = b1²[var(x)var(y) + cov(x, y)²] + var(e).  (7)

We next derive a general estimator for the variance decomposition of products of variables and apply it to the example problem in Equation (5) to demonstrate how the known estimator for the variance of the product of two variables is a special case result of the general estimator. This method relies on the model in the equivalent path diagram shown in Figure 2b.
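Before deriving the general estimator, Equation (4) can be checked numerically. The sketch below simulates a large bivariate normal sample and compares the empirical variance of the product with the analytic value; the parameter values are arbitrary illustrations, not values from the article.

```python
import numpy as np

# Monte Carlo check of the bivariate-normal product-variance formula
# (Bohrnstedt & Goldberger, 1969; Equation 4 in the text).
rng = np.random.default_rng(1)
mu_x, mu_y = 0.7, -0.3            # arbitrary non-zero means
V_x, V_y, C_xy = 2.0, 1.5, 0.6    # arbitrary variances and covariance

cov = np.array([[V_x, C_xy], [C_xy, V_y]])
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000).T

analytic = (mu_x**2 * V_y + mu_y**2 * V_x + 2 * mu_x * mu_y * C_xy
            + V_x * V_y + C_xy**2)
empirical = np.var(x * y)
print(analytic, empirical)  # the two should agree closely
```

With two million draws the empirical variance typically matches the analytic value to within a fraction of a percent.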

Expectations by Method of Moments for Products of Variables
Suppose we are given an SEM as a path diagram in which all nodes with incoming edges are additionally labeled as the sum or product of their incoming edges. For the time being, we will assume that the graph is acyclic (recursive). It may be that the results can be extended to cases with cycles as well, provided that for all node values, an infinite series calculated along each cycle converges.
We are interested in all moments of the vector of all observed variables. The following describes how those can be computed analytically assuming a fixed set of parameters. The main idea is to represent the variance sources of the SEM as independent normally distributed sources and then to represent all variables as polynomials over these sources. All moments then become expectations of polynomials, which in turn are sums of monomials. In this way, the computation of the moments is reduced to computing the expectation of a monomial of independently normally distributed variables, a problem that has a known computational solution.
At the highest level, the algorithm proceeds in three steps:
1. All variables in the SEM are represented by a linear combination of independently normally distributed variables w1, ..., wn with known variances, such that the covariance matrix of all variables is the symmetric matrix S from the RAM matrix formulation.
2. Progressing top-down in the asymmetric graph of the path diagram, polynomial representations of all variables in the w1, ..., wn are computed.
3. Polynomial representations of all requested moments are computed and evaluated into numbers.
Suppose that we have an SEM represented in standard RAM notation such that the model-expected covariance matrix of the observed variables, Cxx, is calculated as

Cxx = F (I − A)⁻¹ S (I − A)⁻ᵀ Fᵀ,

where, for all variables both latent and manifest, A is the matrix of regression coefficients, S is the matrix of variances and covariances, and I is the identity matrix. The matrix F filters out the latent variables so that Cxx contains only the model-expected covariance matrix of the observed variables. To transform the model into a model with only independent variables, we will operate on S, the matrix of model variances and covariances both latent and manifest.
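As a concrete sketch of this RAM expectation, the expected covariance for the centered bivariate regression of Figure 1a can be computed numerically; the parameter values below are arbitrary illustrations.

```python
import numpy as np

# RAM expected covariance, C_xx = F (I - A)^{-1} S (I - A)^{-T} F^T,
# for the centered regression z = b1*x + b2*y + e (illustrative values).
b1, b2 = 0.5, -0.4
V_x, V_y, C_xy, V_e = 1.0, 2.0, 0.3, 0.25

# Variable order: x, y, z.  A holds one-headed paths, S two-headed paths.
A = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [b1,  b2,  0.0]])
S = np.array([[V_x,  C_xy, 0.0],
              [C_xy, V_y,  0.0],
              [0.0,  0.0,  V_e]])
F = np.eye(3)  # all three variables are observed here, so F is the identity

inv = np.linalg.inv(np.eye(3) - A)
C_xx = F @ inv @ S @ inv.T @ F.T
print(C_xx)
```

The bottom-right cell reproduces var(z) = b1²Vx + b2²Vy + 2 b1 b2 Cxy + Ve, as path tracing predicts.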
We first compute the eigenvalue decomposition of S,

S = Q D Qᵀ.

Let W = (w1, ..., wn) be independently normally distributed variables with zero mean and variances given by the diagonal entries of D, the eigenvalues of S, where n is the total number of variables in the SEM. Then QW is an n-dimensional random variable with zero mean and covariance

cov(QW) = Q D Qᵀ = S.

So each variable can be expressed as a linear combination of the W variables with the corresponding row of Q as weights, plus a constant term that gives the mean of that variable.
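This eigendecomposition step can be sketched numerically. S below is an arbitrary positive-definite illustration; numpy.linalg.eigh returns the eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

# Re-expressing correlated sources as independent ones via S = Q D Q^T:
# with W ~ N(0, D) independent, QW has covariance Q D Q^T = S.
S = np.array([[2.0, 0.6, 0.1],
              [0.6, 1.5, 0.4],
              [0.1, 0.4, 1.0]])

D, Q = np.linalg.eigh(S)        # eigenvalues D, orthonormal eigenvectors Q
reconstructed = Q @ np.diag(D) @ Q.T
print(np.allclose(reconstructed, S))  # True

# Empirical check: simulate independent W and verify cov(QW) is close to S.
rng = np.random.default_rng(7)
W = rng.normal(size=(1_000_000, 3)) * np.sqrt(D)  # independent, variances D
cov_QW = np.cov((W @ Q.T).T)
print(np.abs(cov_QW - S).max())  # small sampling error
```

Each row of `W @ Q.T` is one draw of QW, so its sample covariance converges to S as the number of draws grows.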
Using only product and sum operators, every variable in the model (both observed and latent), xi, can now be represented as a polynomial of the original observed variables. However, the mathematics of expectations is greatly simplified by using multivariate polynomials of independent Gaussian variables. Thus, we express each variable as a multivariate polynomial, fi, in the independently normally distributed variables w1, ..., wn. If there are m variables in total, the k = (k1, ..., km)-th moment of the joint distribution of the vector X = (x1, ..., xm) is

Mk(X) = E(x1^k1 ··· xm^km) = E(f1^k1 ··· fm^km),

where the expectation is taken with respect to the root variables wi. In particular, Mk(X) is the expected value of a polynomial in the wi, whose coefficients are combinations of the regression weights in the SEM. Let this polynomial be g = f1^k1 ··· fm^km. This polynomial is a sum of monomials in the wi. Since expectation is linear, we can compute the expectation monomial by monomial, and thus reduce our problem to computing the expectation of a monomial of independently normally distributed variables, i.e., an expectation of the form

E(w1^e1 ··· wn^en) = E(w1^e1) ··· E(wn^en),

which is given by the product of the expectations of the single variables raised to their exponents. For completeness' sake, we give a quick proof: because the wi are independent, their joint density factorizes into the product of the marginal densities, and therefore the expectation of any product of functions of distinct wi factorizes into the product of their expectations.
Thus, we are left with computing the higher-order moments E(w^e) of independent standard-normally distributed variables with zero mean and unit variance, where e is a positive integer. These moments are known (e.g., Papoulis & Pillai, 2002) to be

E(w^e) = 0 if e is odd, and
E(w^e) = (e − 1)!! = (e − 1)(e − 3) ··· 3 · 1 if e is even.
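These standard-normal moments are easy to compute and check in code. The sketch below (plain Python; the helper names are our own) implements the double-factorial rule and the factorization of monomial expectations described above.

```python
# Higher-order moments of a standard normal: E(w^e) = 0 for odd e and
# (e-1)!! = (e-1)(e-3)...3*1 for even e, e.g. E(w^2)=1, E(w^4)=3, E(w^6)=15.
def normal_moment(e: int) -> int:
    """E(w^e) for w ~ N(0, 1), via the double factorial."""
    if e % 2 == 1:
        return 0
    result = 1
    for k in range(e - 1, 0, -2):
        result *= k
    return result

# The expectation of a monomial of independent standard normals factorizes:
# E(w1^e1 * ... * wn^en) = prod_i E(wi^ei).
def monomial_expectation(exponents) -> int:
    out = 1
    for e in exponents:
        out *= normal_moment(e)
    return out

print([normal_moment(e) for e in range(1, 7)])  # [0, 1, 0, 3, 0, 15]
print(monomial_expectation((2, 4)))             # E(w1^2 w2^4) = 1 * 3 = 3
```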

Example 1: Bivariate Product of Variables Regression
As an example, we will transform the SEM in Figure 2a into the equivalent SEM shown in Figure 2b, such that all sources of variance and covariance are independent, normally distributed variables with mean zero and unit variance. The first step is to remove all covariances between variables by replacing them with unit-variance sources. Thus, we add the latent variable w2 and replace the covariance path with value Cxy in Figure 2a with regression paths from w2 to x and y, each with value Cxy^(1/2). Thus, the covariance between x and y can be calculated as Cxy^(1/2) · 1 · Cxy^(1/2) = Cxy. However, the variances of x and y now become residual variances that must be reduced by the total effect of w2, which is Cxy^(1/2) · 1 · Cxy^(1/2) = Cxy. So, the residual variance of x is Vx − Cxy and the residual variance of y is Vy − Cxy. We can now replace the variance terms in Figure 2a with the independent, normally distributed, unit-variance variables w1, w3, and w4 and regression paths to x, y, and e, respectively. The regression weights for these variables are the square roots of the residual variances of x, y, and e, as shown in Figure 2b. The ith variable in the path diagram in Figure 2b can now be represented as a polynomial fi of what we will refer to as root nodes, i.e., the independent, normally distributed variables w1, ..., wn with zero mean and unit variance. This transformation of an SEM into polynomials of root nodes can be applied to any SEM that can be represented as a RAM model, including models that have n-ary operators such as those introduced here. We will call models that conform to RAM conventions along with addition and product n-ary operators Products of Variables (PoV) models.
Returning to the example in Figure 2b, we can compute the moments, including all joint moments of any variable or pair of variables. Thus, we can decompose the variance of the product variable z = b1 xy + e into components of variance and covariance of the other variables in the SEM. We first express the variables x, y, and e as polynomials in the root nodes,

x = (Vx − Cxy)^(1/2) w1 + Cxy^(1/2) w2,
y = Cxy^(1/2) w2 + (Vy − Cxy)^(1/2) w3,
e = Ve^(1/2) w4.

Substituting these expressions into var(z) = E(z²) − E(z)² and then expanding, collecting, and canceling terms, we find that

var(z) = b1²[var(x)var(y) + cov(x, y)²] + var(e),

which is the result in Equation (7) derived from the results of Bohrnstedt and Marwell (1978) and Goodman (1960).
The complete derivation of this result is lengthy but is included in the supplemental material for this article.
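The worked example can also be checked mechanically by carrying out the polynomial algebra over the root nodes in code. In the sketch below, each expression is a dictionary mapping exponent tuples over (w1, w2, w3, w4) to coefficients; the representation and helper names are our own illustrative choices, and the parameter values are arbitrary (with Cxy ≥ 0 so the square roots in Figure 2b exist).

```python
from math import prod, sqrt

# Arbitrary illustrative parameters for the centered case of Example 1.
V_x, V_y, C_xy, V_e, b1 = 2.0, 1.5, 0.4, 0.3, 0.8

def pmul(p, q):
    """Multiply two polynomials stored as {exponent-tuple: coefficient}."""
    out = {}
    for ep, cp in p.items():
        for eq, cq in q.items():
            key = tuple(a + b for a, b in zip(ep, eq))
            out[key] = out.get(key, 0.0) + cp * cq
    return out

def padd(p, q):
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0.0) + c
    return out

def scale(c, p):
    return {key: c * v for key, v in p.items()}

def moment(e):
    """E(w^e) for standard normal w: (e-1)!! for even e, 0 for odd e."""
    if e % 2:
        return 0
    m = 1
    for k in range(e - 1, 0, -2):
        m *= k
    return m

def expectation(p):
    """Expectation of a polynomial in independent standard normals."""
    return sum(c * prod(moment(e) for e in exps) for exps, c in p.items())

# Root-node representations of x, y, and e (Figure 2b, zero means).
w = [{tuple(int(j == i) for j in range(4)): 1.0} for i in range(4)]
x = padd(scale(sqrt(V_x - C_xy), w[0]), scale(sqrt(C_xy), w[1]))
y = padd(scale(sqrt(C_xy), w[1]), scale(sqrt(V_y - C_xy), w[2]))
e = scale(sqrt(V_e), w[3])

z = padd(scale(b1, pmul(x, y)), e)
var_z = expectation(pmul(z, z)) - expectation(z) ** 2
print(var_z, b1**2 * (V_x * V_y + C_xy**2) + V_e)  # the two values agree
```

Expanding z² monomial by monomial and applying the standard-normal moments reproduces Equation (7) exactly, which is the essence of the algorithm.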
Following the same logic, we can derive the expected covariance matrix of the model z = b1 xy + e as

      x      y      z
x  [  Vx     Cxy    0                      ]
y  [  Cxy    Vy     0                      ]
z  [  0      0      b1²(Vx Vy + Cxy²) + Ve ]

What becomes apparent here is that this model is unidentified; that is to say, b1 and Ve appear in only one cell of the matrix, and in that cell they form two parts of a sum. Thus, a smaller b1 can be compensated for by a larger Ve and vice versa. In addition, since b1 appears only as a squared term, its sign is unidentified. Thus the maximum likelihood solution to this problem is not a single point but lies on a line where all values on that line are equally likely.
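A small numeric sketch makes the trade-off concrete: with centered multiplicands, any (b1, Ve) pairs that give equal b1²(VxVy + Cxy²) + Ve imply exactly the same expected covariance matrix (values below are arbitrary illustrations).

```python
# With centered multiplicands, b1 and V_e enter the expected covariances only
# through var(z) = b1^2 (V_x V_y + C_xy^2) + V_e, so different (b1, V_e)
# pairs are observationally equivalent.
V_x, V_y, C_xy = 2.0, 1.5, 0.4
k = V_x * V_y + C_xy**2

def var_z(b1, V_e):
    return b1**2 * k + V_e

pair_a = var_z(0.8, 0.30)                          # b1 = 0.8
pair_b = var_z(-0.8, 0.30)                         # sign of b1 not identified
pair_c = var_z(0.5, 0.30 + (0.8**2 - 0.5**2) * k)  # trade b1 against V_e
print(pair_a, pair_b, pair_c)  # all three equal (up to rounding)
```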
When pre-multiplying two variables and then adding them into a regression equation as a way of estimating an interaction, one is taught to always subtract the mean of each variable before multiplying them together. This preprocessing is done to remove spurious covariances between the multiplicands and the outcome. But in the case of the current PoV method of estimating the effects of products, we can account for these covariances, and they provide a way to resolve the underidentification of the product coefficient and the error variance. If we do not remove the means before entering the multiplicands into the model and then estimate the means as part of a full information maximum likelihood solution, the model becomes fully identified.
When the expected means and covariances are derived for the same model z = b1 xy + e, but with non-zero means μx and μy, we find

E(x) = μx,  E(y) = μy,  E(z) = b1(μx μy + Cxy),

cov(x, z) = b1(μy Vx + μx Cxy),
cov(y, z) = b1(μx Vy + μy Cxy),
var(z) = b1²(μx² Vy + μy² Vx + 2 μx μy Cxy + Vx Vy + Cxy²) + Ve.

The covariances of x and y with z now depend on b1 but not on Ve, which is what resolves the underidentification. Again, the full derivation of this result can be found in the supplemental materials.
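Under the assumption that the expectations above are stated correctly, a Monte Carlo sketch can check them directly; the parameter values are arbitrary illustrations.

```python
import numpy as np

# Monte Carlo check of the non-centered expectations for z = b1*x*y + e:
#   E(z)      = b1 (mu_x mu_y + C_xy)
#   cov(x, z) = b1 (mu_y V_x + mu_x C_xy)
#   var(z)    = b1^2 (mu_x^2 V_y + mu_y^2 V_x + 2 mu_x mu_y C_xy
#                     + V_x V_y + C_xy^2) + V_e
rng = np.random.default_rng(11)
b1, mu_x, mu_y = 0.8, 0.5, -0.4
V_x, V_y, C_xy, V_e = 2.0, 1.5, 0.6, 0.3

n = 2_000_000
x, y = rng.multivariate_normal([mu_x, mu_y],
                               [[V_x, C_xy], [C_xy, V_y]], size=n).T
z = b1 * x * y + rng.normal(0.0, np.sqrt(V_e), size=n)

print(np.mean(z), b1 * (mu_x * mu_y + C_xy))
print(np.cov(x, z)[0, 1], b1 * (mu_y * V_x + mu_x * C_xy))
print(np.var(z))
```

The simulated mean, covariance, and variance match the analytic expressions to within Monte Carlo error.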

Simulations
We ran a series of simulations to test the performance of the implementation of the PoV method in OpenMx on five common use cases. The only new syntax required in scripting a PoV model is that one must add a line declaring productVars= with the names of any product nodes, shown as a circle around an asterisk in the following path diagrams. Then one may use the standard mxPath() statements to specify the paths between named variables. All R scripts for each of the example models simulated and estimated below are included in the supplemental materials.

Simulation One: Manifest Variable Moderation
The simulation resulted in 99.8% convergence. The estimated parameters from each of the 9,720 data sets were compared to the generating coefficients for that data set. Mean relative bias was calculated as the signed difference between the generating and estimated coefficient divided by the generating coefficient. The mean relative bias for b1 was 0.046 and for b2 was <0.001. Coverage of the nominal 95% confidence intervals, calculated as the estimate ±1.96 times the parameter standard error, was 0.974 for b1 and 0.958 for b2. Thus, standard errors were correct for the direct effect, b2, but were conservative for the product effect, b1. The model was then estimated on the same 9,720 data sets but with b1 fixed to zero, and the likelihood ratio test was calculated by subtracting the minus two log-likelihood of the full model from that of the model with the parameter set to zero. Assuming an alpha level of 0.05 and a χ² test with one degree of freedom, this resulted in a type I error rate of 0.0494. Thus, although the standard error for the product effect is conservative, the likelihood ratio test performs exactly as expected.

Simulation Two: Manifest Variable Moderation with Missing Values
One of the problems with current methods used to estimate moderation is that they are not able to account for missing data. To test whether the PoV method could account for missingness in the same way as full information maximum likelihood, we re-ran Simulation One, replacing a random selection of 20% of the values of y and a separate random selection of the values of z with NA. Thus, this fits the definition of the values of y and z being missing completely at random. Figure 4 presents the results of this simulation.
This simulation resulted in 97.3% convergence. The mean relative bias for b1 was 0.015 and for b2 was <0.001. Coverage of the nominal 95% confidence intervals (±1.96 standard errors) was 0.969 for b1 and 0.950 for b2. Clearly, full information maximum likelihood works as expected when data are missing completely at random.
Missingness that can be accounted for by the model, the case of missing at random, was also tested in the simulation for this model. Here, missingness in y was set to be conditional on the value of x. When x was more than one standard deviation above its mean, y had a 50% chance of being missing. In addition, z had a 20% chance of being missing, not conditional on any part of the model. Thus y was missing at random and z was missing completely at random. This simulation resulted in 97.4% convergence. The mean relative bias for b1 was 0.049 and for b2 was 0.003. Coverage of the nominal 95% confidence intervals (±1.96 standard errors) was 0.811 for b1 and 0.855 for b2. Thus, the missing at random case increased bias by only a small amount but decreased coverage considerably for both the direct effect and the product effect.

Simulation Three: Latent Variable Moderation
A moderation model with a latent variable moderating a second latent variable and a latent outcome, as shown in Figure 5, was simulated where the parameters took on one of the values b1 = {−0.5, 0, 0.5}, b6 = {−0.5, 0, 0.5}, μx = {−0.5, 0.5}, μy = {−0.5, 0.5}, Cxy = {−1, 0, 1}, and Ve = {0.1, 0.6, 1.1}. The variance of the predictor latent variables was set to Vx = Vy = 3.0 and the residual variance of the outcome latent variable was set to Vz = Ve. The loadings for the latent variables were set to b2 = bx2 = by2 = {−0.5, 0.5} and b3 = bx3 = by3 = {−0.5, 0.5}, and the unique variances were all set to the current value of Ve. This resulted in 3 × 3 × 2 × 2 × 3 × 3 × 2 × 2 = 1,296 conditions. Each condition was replicated 8 times, resulting in a total of 10,368 data sets, each with N = 1,000 simulated participants. Before simulating the data, a normally distributed random number (μ = 0, σ = 0.1) was added to each parameter to better cover the parameter space.
The simulation resulted in 99.9% convergence. The mean relative bias for b1 was −0.092 and for b6 was <0.001. Coverage of the nominal 95% confidence intervals (±1.96 standard errors) was 0.975 for b1 and 0.958 for b6. Thus, standard errors were correct for the direct effect, b6, but were conservative for the product effect, b1. The model was then estimated on the same 10,368 data sets but with b1 fixed to zero, and the likelihood ratio test was calculated by subtracting the minus two log-likelihood of the full model from that of the model with the parameter set to zero. Assuming an alpha level of 0.05 and a χ² test with one degree of freedom, this resulted in a type I error rate of 0.0221. Thus, both the standard error for the product effect and the likelihood ratio test were conservative. This was likely because the latent simulation allowed the true product effect b1 to take on values very close to zero.

Simulation Four: Latent Variable Interaction
The simulation resulted in 98.2% convergence. The mean relative bias for b1 was 0.083, for b6 was −0.001, and for b7 was −0.013. Coverage of the nominal 95% confidence intervals (±1.96 standard errors) was 0.992 for b1, 0.983 for b6, and 0.984 for b7. Thus, standard errors were conservative for the product effect, b1, and for the two direct effects, b6 and b7.

Simulation Five: Squared Predictor Variable
A model with a predictor variable, x, and its square, x², predicting a latent outcome, as shown in Figure 7, was simulated where the parameters took on one of the values b1 = {−0.5, 0.5}, b2 = {−0.5, 0.5}, b3 = {−0.5, 0.5}, b4 = {−0.5, 0.5}, μx = {−1.0, −0.5, 0, 0.5, 1.0}, and Ve = {0.1, 0.35, 0.6, 0.95, 1.1}. The variance of the predictor variable was set to Vx = 3.0 and the residual variance of the outcome latent variable was set to Vz = Ve. The unique variances were all set to the current value of Ve. This resulted in 2 × 2 × 2 × 2 × 5 × 5 = 400 conditions. Each condition was replicated 25 times, resulting in a total of 10,000 data sets, each with N = 1,000 simulated participants. Before simulating the data, a normally distributed random number (μ = 0, σ = 0.1) was added to each parameter to better cover the parameter space. The simulation resulted in 99.9% convergence. The mean relative bias for b1 was 0.008 and for b4 was 0.001. Coverage of the nominal 95% confidence intervals (±1.96 standard errors) was 0.999 for b1 and 0.972 for b4. Thus, standard errors were conservative for the product effect, b1, and the direct effect, b4.

Discussion
Estimation of SEM models with Products of Variables is a general method that can provide unbiased estimates of parameters when predictor variables are normally distributed. Coverage of parameters of products is higher than expected when calculated using standard error estimates, but the likelihood ratio test appears to perform as expected.
The primary advantage of the PoV method is its generality. Most alternative methods for estimating parameters of SEM models containing products focus on special cases and involve complicated constraints or estimation of parameters with non-normally distributed variability. The PoV method is easy to use, as it has been implemented in OpenMx and Ωnyx. The method can handle interaction and moderator variables that are either latent or have missingness. We expect that there are many models beyond the obvious moderation and interaction models for which PoV estimation can be used.

Interaction vs. Moderation
The use of the product calculation node in path diagrams has clarified for us the difference between the interaction between variables and the moderation of one variable by another. Examine the difference between Figures 5 and 6. These two diagrams are exactly the same other than the direct effect between y and z that appears in the interaction diagram, Figure 6, but does not appear in the moderation diagram, Figure 5. Thus, the only thing that distinguishes a moderating variable from the other multiplicand is its lack of a direct effect on the outcome. It becomes clear that when one has a product of two variables, say z ¼ x Á y in a model, either x or y or both may be considered to be a moderator depending on the presence or absence of a direct effect of x or y on z.
The reader may wonder why we did not use a classic interaction model with direct effects as the example model to illustrate the procedure. Figure 8 illustrates why this model is unidentified using the current method. The reason it is unidentified as a structural model predicting covariances is apparent from counting seven free parameters while there are only six degrees of freedom in a 3 × 3 covariance matrix. One must watch for local underidentification in models that include products of variables. The overall model may be globally identified as determined by counting parameters and degrees of freedom in the covariance matrix, but there may still be areas of local underidentification resulting in sign reversals, such as those seen in Simulations 1 and 4.
It may be more difficult to see why identification works when one premultiplies two manifest variables and creates a third variable for an interaction model. In this case, there is a 4 × 4 covariance matrix, and so there are no negative degrees of freedom. But why is the 4 × 4 covariance matrix of full rank? This is due to the contributions of higher-order moments introduced at the individual data rows when the premultiplication is performed. This model is identified because the product of two normal distributions is not itself normal.
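A short calculation illustrates the non-normality. Using the standard-normal moment rule from earlier in the article (the helper function name is ours), the product of two independent standard normals has fourth moment E(x⁴)E(y⁴) = 9 against a variance of 1, i.e., kurtosis 9 rather than the normal distribution's 3.

```python
# Kurtosis of the product of two independent standard normals, computed from
# the double-factorial moments E(w^2) = 1 and E(w^4) = 3.
def normal_moment(e):
    if e % 2:
        return 0
    m = 1
    for k in range(e - 1, 0, -2):
        m *= k
    return m

var_xy = normal_moment(2) * normal_moment(2)     # E((xy)^2) = 1
fourth_xy = normal_moment(4) * normal_moment(4)  # E((xy)^4) = 9
kurtosis = fourth_xy / var_xy**2
print(kurtosis)  # 9.0, versus 3 for a normal variable
```

It is this excess in the higher-order moments that supplies the extra information identifying the premultiplied interaction model.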

Product Calculation Nodes and the Identity Function
One feature of PoV is its reliance on a product node extension of the RAM model matrices. Although OpenMx has chosen to call this a ProductVar, it is not really a variable. As such, it does not have a mean or a variance; it is simply a node that indicates the multiplication operation. Thus, although a product node occupies a row and column in the RAM matrices, OpenMx prevents the user from connecting a variance, covariance, or mean path to this calculation node. However, if one examines the RAM matrices, it becomes apparent that there is an automatically specified mean path to every product node with a fixed value of 1.0. Why is that?
In RAM, removing a path and setting a path to zero are identical operations. However, this is not the case in PoV. Note that the identity function for addition is adding zero, whereas the identity function for multiplication is multiplying by one. If there is no desired effect between the predictor and outcome in a normal SEM, a regression path from the predictor to the outcome is set to zero. However, if a regression path pointing from one multiplicand to a multiplication operator symbol is set to zero in a PoV diagram, then the other multiplicand is multiplied by zero. This may not be the intention of the modeler. When one wishes to use a likelihood ratio test of the difference between a model with and without a product, we recommend using the outgoing path from the product node as the path to set to zero for the one degree of freedom minus two log-likelihood comparison.

Figure 8. A manifest variable interaction model with both direct effects has more parameters than statistics and thus is unidentified.

Assumptions and Limitations
The algorithm in this article makes the strong assumption that the variables in a PoV model allow the transformation of the model into an equivalent model in which all variance sources are represented as linear combinations of independent, normally distributed, standardized variables. This is a relaxation of the usual SEM assumption of multivariate normality. The outcome of a product of variables will not be normally distributed. However, the PoV assumption is that the non-normality of a product outcome variable can be completely accounted for by the non-normality generated by multiplying multivariate normal variance sources. Thus, all variables that are not outcome variables must be multivariate normal. To be explicit, this means that all residuals must be normally distributed, including the residuals of variables that are outcomes of products of variables.
This article presents a novel addition to the already complex infrastructure of structural equation modeling. We acknowledge that there may be special cases that need to be tested before products of variables models can be recommended in those cases. We recommend using simulation and either OpenMx or Ωnyx to verify models that include PoV structures that go beyond the common use cases presented here.
For instance, since PoV relies on the normality of predictor variables, ordinal predictors might not be usable as multiplicands. If one has ordinal manifest variables that one wishes to use in a product, we recommend an alternative method, such as OpenMx definition variables (Neale, 1998; Neale et al., 2016). In principle, the method for estimating a normally distributed latent variable from ordinal indicators as implemented in OpenMx should be able to create a latent multiplicand, but we have not yet fully tested this case.

Conclusions
Products of variables can be included in structural equation models and model expectations can be calculated using the methods introduced here. This means that products of combinations of manifest and latent variables are now possible to implement in standard SEM software in an automated and easy-to-specify way. Three important and commonly used models that will immediately benefit from PoV estimation are (i) models with interactions between latent variables, (ii) latent moderator models, and (iii) moderator models with missingness in the moderator variable.
Future work will certainly find novel and interesting uses for the PoV method. One case that we have begun to explore is the use of PoV to extract a "factor of paths," creating a latent variable that accounts for the commonality between coefficients that include person-specific variance. This could be construed to be a special case of multilevel models with random coefficients and thus we may find that PoV estimation may prove useful for multilevel models. A second case that we have begun to explore is nonlinear dynamical systems models. Differential equations with squared and cubic terms are at the heart of dynamical systems that exhibit bifurcation and/or chaotic dynamics. We expect that PoV estimation will prove useful in fitting these sorts of models to complex physiological and behavioral time series.
We believe that the algorithm presented here represents a paradigm shift for representing theories within the context of structural equation models. We look forward to seeing PoV estimation appearing as a feature in software other than OpenMx and Ωnyx, and with that in mind, we refer SEM software authors to our open-source code available on GitHub.