Comparing the Slack-Variable Mixture Model With Other Alternatives

Many linear regression models have been proposed for analyzing mixture experiments, including the Scheffé model, the slack-variable model, and the Kronecker model. The use of the slack-variable model is somewhat controversial within the mixture experiment research community. However, in situations where one ingredient is used to fill in the formulation and the remaining ingredients have constraints that allow them to be chosen independently of one another, the slack-variable model is extremely popular with practitioners, mainly because of its ease of interpretation. In this article, we argue that for some mixture experiments the slack-variable model has appealing properties, including numerical stability and better prediction accuracy when model-term selection is performed. We also explain how the effects of the slack-variable model components should be interpreted and how easily practitioners can understand the component effects. We further investigate how to choose the slack-variable component, what transformation should be used to reduce collinearity, and under what circumstances the slack-variable model should be preferred. Both simulation and practical examples are provided to support the conclusions.


INTRODUCTION
Mixture experiments are commonly used in many fields such as the chemical, pharmaceutical, and food industries. In such experiments, the product is developed by mixing different ingredients or components. Unlike in other types of experiments, the design variables in mixture experiments are the proportions of the different components. In this article, we focus on mixture experiments with proportions as the only input variables. Other types of mixture experiments involving the total amount of the mixture or certain process variables (Piepel and Cornell 1985; Kowalski, Cornell, and Vining 2002; Goldfarb et al. 2004) are outside the scope of this article.
For a mixture experiment of q components, the proportions x_1, x_2, ..., x_q must satisfy

x_1 + x_2 + ... + x_q = 1.  (1)

This constraint differentiates mixture experiments from other types of experiments. Not only is the experimental design region constrained, but the resulting model from a mixture experiment also has to satisfy the constraint. The most frequently used models include the Scheffé model (S-model) and the slack-variable model (SV-model). The Kronecker model (K-model) is less commonly used, but it is recommended in Prescott et al. (2002) because its Fisher information matrix has a smaller condition number than that of the S-model in some cases. The quadratic-order S-, SV-, and K-models have the form

S-model: E(Y) = β_1 x_1 + β_2 x_2 + ... + β_q x_q + β_{12} x_1 x_2 + ... + β_{q-1,q} x_{q-1} x_q,  (2)

SV-model: E(Y) = γ_0 + γ_1 x_1 + γ_2 x_2 + ... + γ_{q-1} x_{q-1} + γ_{12} x_1 x_2 + ... + γ_{q-2,q-1} x_{q-2} x_{q-1} + γ_{11} x_1^2 + ... + γ_{q-1,q-1} x_{q-1}^2,  (3)

K-model: E(Y) = α_{11} x_1^2 + α_{22} x_2^2 + ... + α_{qq} x_q^2 + α_{12} x_1 x_2 + ... + α_{q-1,q} x_{q-1} x_q.  (4)

There are several challenges in modeling mixture experiments. First, regular polynomial regression models, such as the second-order polynomial

E(Y) = β_0 + β_1 x_1 + β_2 x_2 + ... + β_q x_q + β_{12} x_1 x_2 + ... + β_{q-1,q} x_{q-1} x_q + β_{11} x_1^2 + ... + β_{qq} x_q^2,

contain the intercept term and thus cannot be used directly for mixture experiments. The linear constraint in Equation (1) must be imposed on the relevant terms in the model. Otherwise, the regular polynomial model matrix F generated from the constrained mixture design will not be of full rank and the information matrix F'F will be singular. Different representations of the constraint lead to different models. To convert the regular second-order polynomial model into the S-model (2), the intercept β_0 is multiplied by x_1 + x_2 + ... + x_q, and the quadratic terms are replaced using x_i^2 = x_i(1 − Σ_{j≠i} x_j), yielding (2) after simplification.
The SV-model can be obtained from the S-model by replacing one component proportion, for example x_q, with 1 − Σ_{i=1}^{q-1} x_i. The variable x_q is called a slack variable. The K-model (4) can be generated from the S-model by replacing each linear term using x_i = x_i(x_1 + x_2 + ... + x_q), so it contains only second-order terms.
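To make the three model forms concrete, the following sketch (Python with NumPy; the function names are ours, not from the article) builds the quadratic S-, SV-, and K-model matrices from a design matrix X whose rows are component proportions summing to 1:

```python
import numpy as np
from itertools import combinations

def s_model_matrix(X):
    """Quadratic Scheffe model (2): linear terms x_i plus all cross-products
    x_i * x_j, with no intercept column."""
    n, q = X.shape
    cross = [X[:, i] * X[:, j] for i, j in combinations(range(q), 2)]
    return np.column_stack([X] + cross)

def sv_model_matrix(X, slack):
    """Quadratic slack-variable model (3): drop the slack column, then include
    an intercept, linear terms, cross-products, and squared terms."""
    Z = np.delete(X, slack, axis=1)
    n, p = Z.shape
    cols = [np.ones(n)] + [Z[:, i] for i in range(p)]
    cols += [Z[:, i] * Z[:, j] for i, j in combinations(range(p), 2)]
    cols += [Z[:, i] ** 2 for i in range(p)]
    return np.column_stack(cols)

def k_model_matrix(X):
    """Quadratic Kronecker model (4): all squares x_i^2 and cross-products
    x_i * x_j, with no intercept and no linear terms."""
    n, q = X.shape
    cols = [X[:, i] ** 2 for i in range(q)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(q), 2)]
    return np.column_stack(cols)
```

On designs satisfying the mixture constraint, the three quadratic model matrices span the same column space (the models are reparameterizations of one another), which can be checked numerically.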
Second, interpreting mixture models is quite difficult. For the S- and K-models, since all the proportions x_i are involved in the models, increasing/decreasing x_i by one unit means that the sum of the other q − 1 components has to be decreased/increased by one unit, but it is uncertain how each of x_1, ..., x_{i−1}, x_{i+1}, ..., x_q should be changed. So the first- and second-order model terms should not be interpreted as main effects and interactions as they would be in nonmixture models. There are some useful techniques for interpreting the S-model, such as the Cox and Piepel effect directions (see Smith 2005, pp. 257-268). But compared with these interpretation methods, the SV-model offers practitioners a more straightforward way to understand the component effects, which is discussed in detail in Section 2.
Third, also due to the constraint in Equation (1), model-term selection has to be done carefully. For the S-model, none of the first-order terms should be removed from the model; otherwise, the constraint in Equation (1) is violated. For the same reason, none of the second-order terms in the K-model should be removed, as they are all involved in the constraint via x_i = x_i(x_1 + x_2 + ... + x_q) for i = 1, ..., q.
Finally, the information matrix of mixture models can easily become ill-conditioned. Denote by F the n × k model matrix of n runs and k terms (including the intercept if the model contains it). The information matrix is then F'F, and (F'F)^{-1} is the covariance matrix of the estimated coefficients (up to the noise variance). It is well known that for the linear regression model, if the information matrix is ill-conditioned, the least-squares estimation becomes numerically unstable; that is, a small perturbation in the response y will cause a large change in the least-squares estimates. But the unknown true relationship between the response and the component proportions should not change much under a small perturbation. In the context of this paper, numerical stability refers to the numerical stability of the least-squares estimation, which essentially depends on the condition of the information matrix. One obvious cause of the ill-conditioning problem is that for some mixture models, such as the S- and K-models, some columns of F are dependent on each other because of Equation (1). Other causes are related to the experimental design. In some experiments, certain component proportions have to be varied over a narrow range or even close to zero. One such example is shown in Section 2, where the smallest component range is [0, 0.5%], resulting in quite large condition numbers for the mixture models. All of these factors can lead to collinearity of the model matrix. In some very complicated mixture experiments, such as the one in Piepel, Cooley, and Jones (2005), more linear constraints are involved and the collinearity problem can become more severe. Sometimes the design-related issues can be overcome by using a linear transformation of the design variables, such as the L-, U-, and modified-pseudocomponent transformations.
But there are no universal rules for dealing with the ill-conditioning and collinearity problems, as discussed in many papers, including Montgomery and Voth (1994), Prescott et al. (2002), and Cornell and Gorman (2003). In our context, numerical stability, the condition of the information matrix, and collinearity refer essentially to the same concept, and we sometimes use the terms interchangeably. We use the common criteria of the condition number of the information matrix and the variance inflation factor (VIF) values to measure numerical stability.
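As a concrete illustration of the condition-number criterion (a Python/NumPy sketch, not code from the article), the 2-norm condition number of the information matrix F'F is the ratio of its largest to its smallest eigenvalue:

```python
import numpy as np

def condition_number(F):
    """2-norm condition number of the information matrix F'F:
    ratio of the largest to the smallest eigenvalue."""
    lam = np.linalg.eigvalsh(F.T @ F)   # eigenvalues in ascending order
    return lam[-1] / lam[0]
```

A badly scaled column, such as a component proportion varied only over a range like [0, 0.005], inflates this ratio relative to a well-scaled design, which is why the narrow-range example in Section 2 yields large condition numbers.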
Different mixture models have different pros and cons. The S-model enjoys great popularity because it incorporates the constraint in Equation (1) in the simplest way, but it can cause some computational problems. Since it is very rare for a regression model not to include an intercept, some statistical software, such as R, does not recognize the S-model as a mixture model. For models without an intercept, R simply assumes that the mean of the response is 0, that is, that the response is equal to 0 under the null hypothesis. Consequently, the statistics and inferences calculated under this assumption, such as R², adjusted R², the F-ratio, and the ANOVA table, are incorrect. If users neglect this detail, they can be misled by the software output. The correct calculation of these statistics is quite basic and can be found in Cornell (2002). In addition, it may not be possible to interpret the effects of the S-model, and model-term selection of the effects is also restricted. The K-model sometimes has a less ill-conditioned information matrix than the S-model, but it shares all the other disadvantages of the S-model. In this article, when comparing different models, the common criteria R² and R²_adj, the model size, and the estimated standard error σ̂ are used to evaluate the fit, simplicity, and precision of the models, respectively. Prediction accuracy is another important aspect of our comparison; it is measured by the prediction error, defined as E(Y(X) − Ŷ(X))² (Hastie et al. 2009), which can be estimated with new test data. Unfortunately, test data are not available in all the examples here, so we use cross-validation, the simplest and most widely used method for estimating prediction error (Hastie et al. 2009).
In this article, we discuss how the SV-model can overcome these problems and under what circumstances it performs better than the others. In Section 2, we use a practical example to illustrate why the SV-model is so attractive to practitioners. In Section 3, we study which component proportion should be used as the slack variable to minimize collinearity in the SV-model. We compare the SV-model with the S- and K-models in Section 4, based mainly on prediction accuracy and numerical stability. Examples and a simulation study are also shown in Section 4. Section 5 provides the conclusion. Some more extended simulation studies are contained in the supplement.

Filler and Its Effects
A typical example of what is often referred to as a filler is water in shampoo, detergent, and many other liquid products. For some solid products, flour acts as a filler in baked goods, while lactose, microcrystalline cellulose, cornstarch, and sugar are common fillers in drug tablets, caplets, and pills. Fillers are so common that they exist in most mixture products. There are two rules of thumb for telling whether a component should be considered a filler. First, fillers are usually not an active part of the formulation, meaning that they do not actively affect certain properties of the mixture product. Second, fillers often make up a large percentage of the entire mixture or have the largest range of possible proportion values. There are exceptions to these rules. However, the main way to determine a filler is to use expert knowledge of how the formulators think about their mixture components. If a component is used to "fill in" the mixture after the other components have been determined, then this "fill in" component should be treated as a filler.
The effects of fillers on the entire mixture differ from those of the other components. Take sweet tea as an example. We can increase the sweetness of the tea by adding more sugar or by reducing the amount of water. Either way, the percentage of sugar increases and the percentage of water decreases. But when people explain why the tea has become sweeter, most would say it is because the percentage of sugar has increased; one would say sugar is the ingredient that affects the sweetness of the tea. This does not mean that water has no effect. It only means that water, as a solvent, affects the sweetness negatively.
For an experiment with a filler component x_q, if x_q is used to "fill in" the formula so that Equation (1) is satisfied, then both the design and the analysis can be performed with classic nonmixture techniques. For the analysis, it is intuitive to let the filler proportion be the slack variable. As the slack variable, the filler proportion does not appear in the SV-model, and we simply interpret all the other component effects with respect to the filler, which greatly helps practitioners understand how each component affects the overall mixture property. For the experimental design, if the nonfiller components only have to meet lower and upper bounds without any other constraints, and the upper bounds satisfy Σ_{i=1}^{q-1} U_i ≤ 1, we can use classical experimental design methods, such as fractional factorial designs, orthogonal arrays, and optimal designs, for the nonfiller components and use the filler to fill in the mixture.
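A minimal sketch of this fill-in construction (Python; the function and the bounds below are hypothetical illustrations, not taken from the article): run a two-level full factorial on the nonfiller components at their bounds and let the filler take up the remainder of the mixture.

```python
import numpy as np
from itertools import product

def slack_variable_design(lowers, uppers):
    """Two-level full factorial on the nonfiller components (each at its lower
    or upper bound); the filler proportion fills in so every row sums to 1.
    Requires sum(uppers) <= 1 so the filler proportion is never negative."""
    assert sum(uppers) <= 1.0, "nonfiller upper bounds must sum to at most 1"
    rows = []
    for levels in product(*zip(lowers, uppers)):
        rows.append(list(levels) + [1.0 - sum(levels)])  # last column = filler
    return np.array(rows)

# hypothetical bounds for two nonfiller components
design = slack_variable_design([0.0, 0.01], [0.10, 0.03])
```

Because the nonfiller factors are varied freely within their bounds, any classical design (fractional factorial, orthogonal array, optimal design) can be substituted for the full factorial used here.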

A Real Example
The following is a real mixture experiment for a hair product. There are eight active components and water, which is considered the filler. The upper and lower bounds of each component proportion (in percentage) and the range of the response y are shown in Table 1. We used a 61-run I-optimal design (no replicated points, generated by JMP Pro 10®) on x_1 through x_8 for the complete quadratic SV-model in Equation (5), with water as the slack variable. Then, four replicates at the centroid of the constrained region were added, so the total run size is 65. First, we fit the model in Equation (5) and then use stepwise regression to obtain a reduced model; BIC is used to select the model terms. The parameter estimates and model fit summaries are in Tables 2 and 3. We estimate the noise variance σ̂² as the sample variance of the four replicates and calculate the lack-of-fit statistic as [(SSE − 3σ̂²)/(df_SSE − 3)]/σ̂², as in Table 3. The ANOVA F-test statistic of the reduced model with respect to the complete model in Equation (5) is 1.2179, with a p-value of 0.33. The analysis shows that the reduced model contains fewer than half of the parameters in the complete model. The design and analysis for this experiment are certainly routine, as they follow a classical design-and-analysis approach; any standard statistical software is sufficient for the task. Model-term selection is a necessary step since it produces a much more parsimonious model with smaller estimated variance and improved precision. If the S- or K-model were used, the design and analysis would not be so convenient.
More importantly, the interpretation of the SV-model is clear and simple: when the proportion of a nonfiller component is varied, the proportions of the other active components remain the same and only the filler amount changes to make up the difference. Therefore, it is very easy to understand the role of an individual component. For the hair product example, Figure 1 shows the prediction profiler and confidence intervals of E(Y) for all the factors based on the reduced model. From Figure 1, we can see that components x_1, x_4, x_7, and x_5 affect E(Y) linearly. More specifically, E(Y) has a strong positive trend with x_1 and x_4, a strong negative trend with x_7, and a less obvious negative trend with x_5. The components x_2, x_3, x_6, and x_8 strongly affect E(Y) quadratically. The reader should be reminded that all the interpretations of the prediction profilers in Figure 1 are with respect to the water: each profile is plotted by varying the corresponding component proportion while the other nonwater components are fixed at their midpoints and only the amount of water changes accordingly. If the other nonwater components' proportions are fixed at other values, there are slight changes to the plot, which can be observed from an interactive prediction profiler plot.
The individual linear, quadratic, and interaction model terms can be interpreted in the same way as in other classical regression models; the only difference is that the interpretations are with respect to the filler. A significant linear effect γ_i indicates a significant main effect of the corresponding component. Similarly, a significant quadratic effect indicates that, as the corresponding component is increased while the filler is decreased by the same amount, E(Y) changes in a nonlinear fashion. The interactions can also be interpreted in the usual way.
Although it is not the case in this example, sometimes a nonfiller component disappears from the SV-model after model-term selection. In this case, we should not conclude that this nonfiller component has no influence on the mixture property measured by the response. Instead, it inactively affects the mixture property as a solvent, similar to the filler, whose related effects are not contained in the model either. To further simplify the structure of the mixture, we can even combine this nonfiller component and the filler into a new filler. Such a structure is known as the mixture-of-mixtures experiment (Kang, Joseph, and Brenneman 2011).
There is another hidden advantage of the SV-model. In the case described at the end of Section 2.1, if an SV-model is used, then an orthogonal or optimal design can be applied to the nonfiller components, resulting in a well-conditioned information matrix. In the hair product experiment, an I-optimal design (without the four replicates) for the SV-model is used. The condition numbers (using the original scale) of the SV-model, K-model, and S-model are 77,373, 246,565, and 6,887,547, respectively, so the SV-model has the smallest condition number. If we use different linear transformation methods on the component proportions, Table 7 in Section 4 shows that the SV-model still has the smallest condition number compared with the other two models. Prescott et al. (2002) showed that the quadratic K-model has the smallest λ_max of the information matrix compared with its equivalent counterparts, and they also conjectured that the K-model has a better-conditioned information matrix. However, having the smallest λ_max cannot guarantee the smallest condition number, and the conjecture is disproved by the hair product example.

Literature Review and Comparisons With S-Model
The SV-model shows up frequently in the research literature; recent papers include Cornell (2000) and Khuri (2005). The latest paper, by Piepel and Landmesser (2009), provided a complete review of the SV-model and summarized four situations where SV-models are often used in practice. However, the common view in the literature is that the SV-model "can be misleading in making conclusions and in developing models for response variables," and thus other mixture models, such as the S-model, K-model, or component-slope linear mixture model (Piepel 2006, 2007), should be used for mixture experiments. However, as discussed earlier, the SV-model provides a simpler interpretation than the S- and K-models when the filler component is used as the slack variable, which is an advantage worth addressing. Certainly, we need to be careful not to misinterpret model effects as causal, so domain knowledge is very important in understanding the component properties from the SV-model. In the S-model, the effects are confounded due to the constraint in Equation (1); therefore, there is no way to tell whether a "significant" effect is actually statistically significant or whether it is due to its confounded effects. For example, assume the effect of x_i is "significant" for the linear S-model E(Y) = Σ_{l=1}^q β_l x_l. If we change x_i to x_i + δ, the sum of the other proportions Σ_{j≠i} x_j has to change to Σ_{j≠i} x_j − δ. How should δ be divided among the remaining q − 1 proportions? There are some techniques for interpreting the S-model, such as the Cox and Piepel effect directions (see Smith 2005, pp. 257-268), but in practice these interpretations are very difficult for practitioners. For an ordinary linear model, the mean response E(Y) would change to E(Y) + β_i δ. But for the S-model, it is not clear how E(Y) should change, since we need to account for how each x_j, j ≠ i, changes individually.
Therefore, the classic interpretation of significant effects cannot be directly used for the S-model or any other model whose effects are linked by equality constraints. Great care is needed in interpreting the quadratic mixture effects as well.
To conclude Section 2, we recommend using the SV-model for mixture experiments when a filler component is identified and there are no lower and upper bounds (other than [0, 1]) on its proportion. Meanwhile, the upper bounds U_i of the nonfiller components must sum to at most 1 (Σ_{i=1}^{q-1} U_i ≤ 1), and the lower bounds L_i of the nonfiller components must sum to more than 0 (Σ_{i=1}^{q-1} L_i > 0), to avoid the case in which the mixture is made purely from the filler. Under these two conditions, we can use an unconstrained design on the nonfiller components' proportions, which is a much easier task than constrained mixture experimental design. It is then a natural choice to use the SV-model with the filler proportion as the slack variable and interpret effects with respect to the filler proportion. This method is called the slack-variable approach in the literature.

CHOICE OF SLACK VARIABLE
If the filler ingredient is known, we can simply use its proportion as the slack variable in the SV-model. But there are cases where it is not clear which component should be the filler. The SV-model can still be used in order to alleviate the collinearity problem or to achieve more accurate predictions. The choice of the slack variable is crucial to the performance of the SV-model, as shown in Cornell (2000).
If a filler component is not involved in the experiment, then in terms of interpretation it does not matter which component is used as the slack variable. Therefore, we should choose the slack variable for the sake of numerical stability and prediction accuracy. Here, we focus only on the numerical stability of the model, since it is well known that model-term selection is more accurate for models with less collinearity. To quantify the collinearity, two criteria are commonly used in the literature. The first is the condition number of the information matrix. Let λ_max and λ_min be the maximum and minimum eigenvalues of the positive definite matrix F'F. The 2-norm condition number (CN) is cond(F'F) = λ_max/λ_min. The other criterion is the variance inflation factor (VIF). The VIF associated with the regression coefficient β_j for the jth column of F is VIF(β_j) = (1 − R_j²)^{-1}, where R_j² is the coefficient of determination for regressing F(, j) against all the other columns F(, −j). Here, the notation A(, k) denotes the kth column of A and A(, −k) denotes A without the kth column. The larger the VIF, the worse the collinearity for the effect β_j. To evaluate the overall collinearity level of a model, we use the maximum VIF, maxVIF = max_j VIF(β_j), and the mean variance inflation factor, MVIF = (1/k) Σ_{j=1}^k VIF(β_j), where k is the number of effects in the model excluding the intercept. For the same number of components and a fixed order of the SV-model, using any component proportion as the slack variable results in SV-models with the same number of effects. This allows us to compare the maxVIF and MVIF of each SV-model. Both maxVIF and MVIF are often used as indicators of the severity of collinearity. It can be shown that the expected sum of squared errors of the (standardized) estimated coefficients β̂_j is proportional to the MVIF, and the largest squared error of an estimated coefficient is proportional to the largest VIF (Kutner et al. 2004).
To choose the best SV-model among those using different components as the slack variable, one can directly construct each possible SV-model matrix and choose the best one according to the aforementioned criteria. This is not a difficult task if there are few components in the mixture, but it is not ideal when the number of components is large. To avoid the extra computation, we propose a more direct criterion based only on the design of the mixture experiment.

Correlation Criterion
Denote by X the design matrix for a mixture experiment with q components and n experimental settings, so that x_ij is the jth component's proportion in the ith run. Define r_ij as the correlation between the ith and jth columns of X,

r_ij = (X(, i) − X̄_i 1)'(X(, j) − X̄_j 1) / (‖X(, i) − X̄_i 1‖ ‖X(, j) − X̄_j 1‖),

where X(, i) is the ith column of X, X̄_i is the sample mean of X(, i), and 1 is a column of 1's. To evaluate whether the ith column is collinear with any other columns, we can use its average squared correlation with the other q − 1 columns. Denote it by r̄²_i:

r̄²_i = (1/(q − 1)) Σ_{j≠i} r²_ij,

which can be computed via the centering matrix E = I − (1/n)11', where I is the n × n identity matrix. We choose the component that has the largest r̄²_i as the slack variable.
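The correlation criterion can be sketched in a few lines of Python/NumPy (our own implementation of the definition above, not the authors' code):

```python
import numpy as np

def correlation_criterion(X):
    """Average squared correlation r2_bar_i of each design column with the
    other q - 1 columns; the component with the largest value is chosen
    as the slack variable."""
    R = np.corrcoef(X, rowvar=False)              # q x q column correlations
    q = X.shape[1]
    return ((R ** 2).sum(axis=1) - 1.0) / (q - 1)  # drop the r_ii^2 = 1 term
```

On a design where two nonfiller components are varied in a factorial pattern and a filler takes up the remainder, the filler column is the most correlated with the others and is selected.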
The intuition behind this criterion is straightforward. If the ith component has the largest r̄²_i, then X(, i) is the column most correlated with some or all of the other columns of X, and thus it is likely to be heavily correlated with the quadratic-order columns too. If we include this component in the model, we should expect the worst collinearity, especially between the ith component's effects (linear or second-order) and all the other effects. This intuition can be explained using a simple 3-component mixture and the linear SV-model. If we use the 3rd component as the slack variable, the SV-model is

E(Y) = γ_0 + γ_1 x_1 + γ_2 x_2.  (6)

The MVIF for this model is MVIF_3 = 1/(1 − R²_1), where R²_1 is the coefficient of determination for the simple linear regression X_1 = α_0 + α_1 X_2. We know that for simple linear regression R²_1 = r²_12, so the MVIF for Equation (6) is MVIF_3 = 1/(1 − r²_12). In the same way, we can derive the MVIF values for the SV-models using the 1st and 2nd components as the slack variable, respectively: MVIF_1 = 1/(1 − r²_23) and MVIF_2 = 1/(1 − r²_13). Without loss of generality, let r̄²_3 be the largest compared with r̄²_1 and r̄²_2, meaning that r²_12 is smaller than both r²_23 and r²_13. Thus MVIF_3 is the smallest of the three and Equation (6) has the least collinearity. The SV-model in Equation (6) also has the smallest maxVIF value, which equals VIF(γ_1) = VIF(γ_2) = 1/(1 − r²_12). This shows that using the component with the largest r̄²_i as the slack variable leads to the SV-model with the least collinearity, and thus the most numerically stable model. This proof cannot be extended to general mixture experiment settings, so next we use four numerical examples to show that the criterion can effectively pick the best slack variable and lead to the most numerically stable SV-model.

Four Examples
In the following, we describe four examples chosen from the literature. In Table 4, we compute the correlation criterion for each component in all examples. It can be seen from Table 5 that choosing the component with the largest correlation criterion as the slack variable is consistent with using the maxVIF, MVIF, and CN criteria.

Example 1. Artificial Drug Example in Piepel and Landmesser (2009): Piepel and Landmesser (2009) used an artificial example on mixtures of two drugs (x_1 and x_2), an enhancer (x_3), and a filler (x_4), where 0.01 ≤ x_1 ≤ 0.03, 0.01 ≤ x_2 ≤ 0.03, and 0 ≤ x_3 ≤ 0.02. An 18-point face-centered cube design was used, containing 8 factorial points, 6 face centroids, and 4 center-point replicates.
Example 3. Pharmaceutical Solution Example in Cornell (2000): Cornell (2000) described a study in which the solubility of butoconazole nitrate, an antifungal agent, was studied as a function of the proportions of the cosolvents polyethylene glycol 400 (x_1), glycerin (x_2), and polysorbate 60 (x_3), along with water (x_4). Constraints were imposed on the component proportions. A 10-point D-optimal design was selected for fitting a quadratic model; the design consisted of 6 of the 10 extreme vertices and the midpoints of 4 of the edges of the constrained region.

Example 4. Vapor Pressure Example in Cornell and Gorman (2003): Cornell and Gorman (2003) showed a study involving three components and seven design points in the reduced region constrained by the inequalities 0.15 ≤ x_1 ≤ 0.5, 0.2 ≤ x_2 ≤ 0.7, and 0.15 ≤ x_3 ≤ 0.65. In this example, the component x_2 is water and would naturally be considered the filler, but the design of the experiment does not use the slack-variable approach; instead, it is a transformed simplex-centroid design.
All four examples show that the largest criterion r̄²_i chooses the most suitable slack variable. Coincidentally, all four examples include a filler component, and we note that the criterion chooses the filler as the slack variable in every example. Example 1 clearly uses the slack-variable approach: a face-centered cube design is used for the nonfiller components and the filler takes up the rest of the proportion. Examples 2 and 3 use software-generated D-optimal designs for the quadratic S-model. Since the S-model and the SV-model are equivalent, the D-optimal designs for the S- and SV-models are the same, as shown in Proposition 1 and its corollary in Section 4. Example 4 uses a transformed simplex-centroid design, which is also a D-optimal design for a quadratic model.
To analyze mixture experiments, it is often suggested to perform a linear transformation on the components' proportions to reduce the ill-conditioning problem. The correlation criterion is scale-free, so such a transformation will not change the values of the r̄²_i. While the transformation changes the absolute values of maxVIF, MVIF, and CN, it does not change the rank order of the criteria across the SV-models if the same transformation is used (see Table 6).

COMPARISON WITH OTHER MODELS
In this section, we compare the S-, SV-, and K-models that are commonly used in mixture experiments. Previously, we argued that the SV-model is more appealing in terms of interpretation. In this section, we focus on the numerical stability and prediction accuracy of the mixture models. First, we use two examples to illustrate the criteria for comparing the three models.

Numerical Stability and Prediction Accuracy Examples
As suggested in the literature, we use linear transformations to alleviate the numerical instability, but different transformation methods work best for different mixture models. For the SV-model, we can scale the designs of the proportions into [-1, 1], the typical scale used in classical design and analysis of experiments. For the S- and K-models, the L-, U-, and modified-pseudocomponent transformations are recommended in the literature. Thus, we try all four transformation methods and choose the best one for each mixture model. The model-term selection is done by backward stepwise regression based on both AIC and BIC. For the two examples, the reduced S- and SV-models are the same under both criteria.
Example 5. Hair Product Example: For the hair product example presented in Section 2, we first apply the different transformation methods to all three models and choose the best one according to maxVIF, MVIF, and CN in Table 7. Then, we apply backward stepwise regression to the full SV- and S-models to obtain the reduced models. The full K-model is used because all of its terms are necessary to retain the constraint in Equation (1). Table 8 shows the comparison of the reduced SV- and S-models (marked by an asterisk) and the full K-model. R² and R²_adj are the coefficient of determination and its adjusted version, and σ̂ is √MSE from the ANOVA. LOOCV and 13-fold CV are the leave-one-out and 13-fold cross-validation prediction errors, computed as

LOOCV = (1/n) Σ_{i=1}^n (y_i − ŷ_{−i}(x_i))² and k-fold CV = (1/n) Σ_{i=1}^n (y_i − ŷ_{−S_l}(x_i))²,

where ŷ_{−i}(x_i) is the prediction at x_i from the model trained without the ith observation, and ŷ_{−S_l}(x_i) is the prediction at x_i from the model trained without the fold of data S_l containing x_i. The 13-fold cross-validation procedure is run five times and we report the mean of the five CV errors to reduce random sampling effects. The reduced SV-model has only 20 terms, including the intercept, which is small compared with the 34-term reduced S-model and the 45-term full K-model. Although the reduced SV-model is the smallest in size, Table 8 shows that it provides the best fit to the data and also the best prediction accuracy.
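For a least-squares fit, the leave-one-out error need not be computed by refitting n times; the standard leverage identity gives the same result in one pass. The following is a generic Python/NumPy sketch of that shortcut, not the authors' code:

```python
import numpy as np

def loocv_error(F, y):
    """Leave-one-out CV prediction error for a least-squares fit, using the
    identity y_i - yhat_{-i}(x_i) = e_i / (1 - h_ii), where e_i is the
    ordinary residual and h_ii the leverage from the hat matrix."""
    H = F @ np.linalg.solve(F.T @ F, F.T)   # hat matrix
    e = y - H @ y                           # ordinary residuals
    h = np.diag(H)
    return np.mean((e / (1.0 - h)) ** 2)
```

The shortcut agrees exactly with refitting the model n times, once with each observation held out, which is a convenient correctness check.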
Example 6. Additive in Lubricants Experiment in Snee (1975) and John (1984): Snee (1975) presented a mixture experiment that involves an additive x 1 and three lubricant blends x 2, x 3, and x 4, with additional constraints on the component proportions. Later, John (1984) used these data to show how ridge regression can reduce the ill-conditioning problem of the S-model. Based on Table 9, scaling the component proportions into the [-1, 1] range works best for all three models. Using this transformation, we compare the reduced SV-, reduced S-, and full K-models. The comparisons of model size, R², R²adj, σ̂, and the leave-one-out and 5-fold cross-validation prediction errors are shown in Table 10. It also confirms that the reduced SV-model is the most parsimonious, the best fitting, and the most accurate model.

Simulation Study
We conduct a simulation study to compare the prediction performance of the SV-, S-, and K-models under different mixture experiment settings. To be able to visualize the experimental designs, we consider only a three-component mixture. We use four different settings (Cases 1-4) for the constraints on the component proportions.
In Cases 1-3, the component x 3 can be seen as the filler since it does not have upper or lower bounds other than 0 and 1. But only in Cases 1 and 2 can the component proportions x 1 and x 2 be varied independently of each other, since the sum of their upper bounds is less than 1. Although Case 3 still contains the filler, U 1 + U 2 > 1 and thus the design on x 1 and x 2 is not constraint-free. But we can still use the SV-model with x 3 as the slack variable. Case 4 is the classic mixture experiment setting with no additional restrictions on the upper and lower bounds, so any component can be treated as the filler.

Figure 2. Simulation study for the mixture experiment with 0 ≤ x 1 ≤ 0.1 and 0 ≤ x 2 ≤ 0.2: R², R²adj, σ̂, and the leave-one-out and 6-fold cross-validation prediction errors.
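The case distinction above can be encoded in a small check (our own helper with a hypothetical name, assuming the last component is the filler): the nonfiller design is constraint-free exactly when the nonfiller upper bounds sum to at most 1.

```python
def sv_design_is_unconstrained(lowers, uppers):
    """Check whether the nonfiller components (all but the last,
    taken as the filler) can be varied independently: this holds
    when the sum of their upper bounds is at most 1 and the filler
    itself is only bounded by [0, 1]."""
    return sum(uppers[:-1]) <= 1 and lowers[-1] == 0 and uppers[-1] == 1

# Case 1: x1 <= 0.1, x2 <= 0.2, filler x3 in [0, 1] -> independent
print(sv_design_is_unconstrained([0, 0, 0], [0.1, 0.2, 1]))   # True
# Case 3: x1 <= 0.5, x2 <= 0.7 -> 0.5 + 0.7 = 1.2 > 1, not free
print(sv_design_is_unconstrained([0, 0, 0], [0.5, 0.7, 1]))   # False
```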
The 12-run D-optimal designs for the four cases are shown in Figures 2-5 and are generated by JMP Pro 10®. The constrained region for the first two settings is a parallelogram and thus the D-optimal design contains the centroid, the four vertices, and the four midpoints of the edges, which is also the central composite design. Three extra design points are replicated at three vertices marked in the plots. The D-optimal design for Case 3 contains five vertices, two midpoints on two edges, and one centroid. Three vertices are replicated twice. For Case 4, the D-optimal design is the simplex design with each design point replicated twice.
We use the quadratic S-model y = β 1 x 1 + β 2 x 2 + β 3 x 3 + β 4 x 1 x 2 + β 5 x 1 x 3 + β 6 x 2 x 3 + ε as the underlying true response function. To simulate the response, we randomly generate the true coefficients from β i ∼ Uniform(0, 5) for i = 1, 2, 3 and β i ∼ Uniform(0, 1) for i = 4, 5, 6, which is consistent with the hierarchical ordering principle (Wu and Hamada 2011). The noise follows ε ∼ N(0, 0.5²), a normal distribution with mean 0 and standard deviation 0.5. We simulate B = 100 datasets from these distributions. For each dataset, we first fit the full SV-, S-, and K-models and then use backward elimination with BIC to obtain the reduced SV- and S-models. The densities of the five criteria R², R²adj, σ̂, and the leave-one-out and 6-fold cross-validation prediction errors of the reduced SV-, reduced S-, and full K-models are shown in Figures 2-5.

Figure 3. Simulation study for the mixture experiment with 0 ≤ x 1 ≤ 0.4 and 0 ≤ x 2 ≤ 0.5: R², R²adj, σ̂, and the leave-one-out and 6-fold cross-validation prediction errors.
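One simulation replicate can be sketched as follows. This is our own illustration under the stated distributions: `backward_bic` is a simplified greedy backward elimination, not the article's stepwise implementation, and we sample Case 4's simplex region with a Dirichlet draw rather than using the D-optimal design.

```python
import numpy as np

rng = np.random.default_rng(2)

# 12-run design on the Case 4 simplex: x1 + x2 + x3 = 1
X = rng.dirichlet(np.ones(3), size=12)
x1, x2, x3 = X.T

# True response: quadratic Scheffe model with hierarchically ordered
# coefficients (larger main effects, smaller interactions)
beta = np.concatenate([rng.uniform(0, 5, 3), rng.uniform(0, 1, 3)])
y = (np.column_stack([x1, x2, x3, x1*x2, x1*x3, x2*x3]) @ beta
     + rng.normal(0, 0.5, 12))

def bic(F, y):
    """BIC of an OLS fit: n*log(SSE/n) + p*log(n), up to a constant."""
    n, p = F.shape
    resid = y - F @ np.linalg.lstsq(F, y, rcond=None)[0]
    return n * np.log(np.sum(resid**2) / n) + p * np.log(n)

def backward_bic(F, y, keep_intercept=True):
    """Greedy backward elimination: repeatedly drop the term whose
    removal lowers BIC the most, until no removal improves BIC."""
    cols = list(range(F.shape[1]))
    while len(cols) > 1:
        best = min((c for c in cols if not (keep_intercept and c == 0)),
                   key=lambda c: bic(F[:, [j for j in cols if j != c]], y))
        if bic(F[:, [j for j in cols if j != best]], y) < bic(F[:, cols], y):
            cols.remove(best)
        else:
            break
    return cols

# Full quadratic SV-model with x3 as the slack variable
F_sv = np.column_stack([np.ones(12), x1, x2, x1*x2, x1**2, x2**2])
reduced = backward_bic(F_sv, y)   # column indices kept in the model
```

Repeating this B = 100 times (with the S- and K-model matrices fitted on the same data) and collecting R², R²adj, σ̂, and the CV errors reproduces the structure of the density plots.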
The first two cases, whose regions form parallelograms, are more challenging for modeling, as R² and R²adj are relatively small and widely spread, while both criteria improve considerably for Cases 3 and 4. For Cases 1 and 2, the density plots of σ̂ and the two cross-validation errors show that the reduced SV-model has slightly better modeling precision and more accurate predictions than the reduced S-model. This advantage is less obvious in Case 3 and disappears in Case 4. The full K-model has the worst prediction accuracy in all cases.
It should be noted that the simulation results in Figures 2-5 are based only on backward elimination with BIC. In the supplement, we use other model-term selection methods, including backward elimination with AIC, the Lasso, and the LAR procedure (Hastie et al. 2009); the latter two are very popular in the data mining field. All of the simulation results show that the different model-term selection methods perform similarly. Thus, the conclusions we summarize here can be applied to other similar model-building methods. In the supplement, we also conduct another simulation study for a 10-component mixture experiment, where the same four model-term selection methods are used. Besides these four cases, we have studied other scenarios; for example, a 4-component mixture. We have also assumed the Scheffé cubic model as the true response, different distributions for the true coefficients, and different constraint settings. We omit these results from the supplement due to space limitations. From all these simulation studies, we draw the following conclusions for mixture experiments with a small or relatively large number of components.
1. Model-term selection should be performed, as the reduced models produce more accurate predictions. The K-model has the worst prediction accuracy since model-term selection cannot be used for it.

Figure 4. Simulation study for the mixture experiment with 0 ≤ x 1 ≤ 0.5 and 0 ≤ x 2 ≤ 0.7: R², R²adj, σ̂, and the leave-one-out and 6-fold cross-validation prediction errors.
2. If all the components are symmetric with respect to each other in both constraints and design, the SV- and S-models perform similarly, and it makes no difference which component proportion is chosen as the slack variable.

3. If there is a filler component x q ∈ [0, 1] but Σ_{i=1}^{q−1} U i > 1, then we cannot use an unconstrained design for the nonfiller components. The SV-model is only slightly more accurate than the S-model in these situations.

4. If there is a filler component x q ∈ [0, 1] and Σ_{i=1}^{q−1} U i ≤ 1, then the SV-model is more precise and more predictive than the corresponding S- and K-models.

Numerical Stability
As shown in Section 4.1, apart from the choice of model form, other factors can affect the numerical stability of the information matrix. The design of the mixture experiment can be an important factor, because intuitively an optimal design based on a given model form would seem to favor that model over all other model forms. But the following proposition and corollary show that this intuition does not hold for all optimal designs.

Proposition 1. Based on the same design X for a mixture experiment, the model matrices F 1 of the S-model and F 2 of the equivalent SV-model satisfy F 1 = F 2 P, where P is an elementary matrix with |det(P)| = 1, and thus det(F 1 ′F 1 ) = det(F 2 ′F 2 ).
Corollary 1. Based on the same design X for a mixture experiment, Proposition 1 is true for the model matrices of any equivalent mixture models.
If the D-optimal design criterion is used (as in Section 4.2), it does not matter which model form we assume: as long as the models have the same order and the same number of terms (counting the intercept), the D-optimal criterion value remains the same for different models. Also, from Corollary 1, for any two equivalent models the model matrices can be transformed into one another by F 1 = F 2 P. Since F 1 ′F 1 = P ′F 2 ′F 2 P, any difference between the condition numbers of the information matrices is caused only by P. The elementary matrix P is fixed for any two equivalent model forms. But P cannot be shown to always enlarge or reduce the condition number of F 2 ′F 2 , because the effect also depends on F 2 which, in turn, depends on the experimental design. For the examples in Section 4.2, the matrix P transforming the quadratic SV-model into the quadratic S-model is the same, but it does not consistently increase or decrease the condition number of the S-model. For the simplex design in Case 4, P improves the condition number, as shown in Table 11, but not for Cases 1-3. To compute Table 11, the model matrix is based on the original scale of the proportions in [0, 1].

Figure 5. Simulation study for the mixture experiment with 0 ≤ x i ≤ 1 for i = 1, 2, 3: R², R²adj, σ̂, and the leave-one-out and 6-fold cross-validation prediction errors.
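Proposition 1 can be checked numerically. The sketch below (our own, using an arbitrary random design rather than an optimal one) builds the quadratic S- and SV-model matrices for q = 3 from the same design and compares log-determinants and condition numbers of the two information matrices.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.dirichlet(np.ones(3), size=12)   # 12-run design with x1+x2+x3 = 1
x1, x2, x3 = X.T

# Quadratic S-model matrix: x1, x2, x3, x1x2, x1x3, x2x3
F1 = np.column_stack([x1, x2, x3, x1*x2, x1*x3, x2*x3])
# Equivalent quadratic SV-model matrix (x3 as the slack variable)
F2 = np.column_stack([np.ones(12), x1, x2, x1*x2, x1**2, x2**2])

# det(F1'F1) = det(F2'F2), compared on the log scale for stability
s1, logdet1 = np.linalg.slogdet(F1.T @ F1)
s2, logdet2 = np.linalg.slogdet(F2.T @ F2)
assert s1 == s2 == 1.0 and np.isclose(logdet1, logdet2)

# Same D-criterion value, but generally different condition numbers
cn1 = np.linalg.cond(F1.T @ F1)
cn2 = np.linalg.cond(F2.T @ F2)
```

The equal determinants reflect F 1 = F 2 P with |det(P)| = 1, while the differing condition numbers show why P alone does not decide which model form is better conditioned.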
Besides the elementary matrix P, different transformations of the component proportions can also change the conditioning of the information matrix. In Table 12, the condition numbers for Cases 1-4 are computed after the best transformation method is applied; we consider only the four transformation methods mentioned in Section 4.1. Comparing Tables 11 and 12, it is obvious that transformation helps improve the condition number. Interestingly, the K-model has the smallest condition number under the original scale for Cases 1-3, but after transformation the SV-model has the smallest condition number for Cases 1-3. Also, for the SV-model, the [−1, 1] scaling consistently works best for all four simulation cases and for Examples 5 and 6. However, Case 4 is special in that a simplex design is used and the design is symmetric in all three components; in Case 4, both the S- and K-models have smaller CN values than the SV-model. For these reasons, it is not possible to conclude that one model form always has the smallest condition number, which is why we have used examples and simulations to illustrate cases in which the SV-model has better numerical stability than the other two models. We summarize the following results from the simulation (which cannot be generalized further).
1. The [-1, 1] scaling can effectively improve the conditioning of the SV-model.

2. If all the components are symmetric with respect to each other in both constraints and design, the SV-model has the worst conditioning compared with the S- and K-models.

3. If there is a filler component x q ∈ [0, 1], the SV-model after [−1, 1] scaling is numerically more stable than the S- and K-models with their best transformation methods applied, respectively, for any optimal design criterion.

CONCLUSION
In this article, we carefully study the properties of the SV-model and compare it with the S- and K-models on three aspects: interpretation, numerical stability, and prediction accuracy. We recommend that practitioners use the slack-variable approach when there is a filler component whose proportion lies in [0, 1] and the sum of the upper bounds of the nonfiller proportions is no larger than 1, for the following reasons. (1) The experimental design on the nonfiller components is not constrained and thus much simpler than a constrained mixture experimental design. (2) The interpretation of the SV-model is much easier and clearer for the experimenter to understand. (3) Model-term selection on the SV-model is not restricted, and the reduced SV-model can deliver better prediction accuracy than the S- and K-models.
When using the SV-model, one can choose the slack variable through expert knowledge (often formulators will have a filler ingredient already identified) or by using the correlation criterion proposed in Section 3. A reasonable transformation should be used on the design to improve the numerical stability. Simply scaling the proportions into [−1, 1] for the SV-model gives a more effective transformation than the other transformation methods commonly used for the S-model and the K-model.

APPENDIX: PROOF OF PROPOSITION 1

For the quadratic S- and SV-models, substituting x q = 1 − Σ_{i=1}^{q−1} x i , the vector of terms in the S-model becomes

f 1 (x) = (1, x 1 , . . ., x q−1 , x 1 x 2 , . . ., x i x j , . . ., x q−2 x q−1 , x 1 ², . . ., x²_{q−1}) P = f 2 (x) P,

where f 2 (x) is the vector of terms in the equivalent SV-model and P is the elementary matrix that performs the following elementary column operations: (1) exchange the positions of certain elements of f 2 (x); (2) multiply certain elements of f 2 (x) by −1; (3) add to certain elements of f 2 (x) terms k i e i , where each k i is 1 or −1 and the e i are elements of f 2 (x). Then f 1 (x) = f 2 (x) P. The model matrices are F l = [ f l (x 1 ), f l (x 2 ), . . ., f l (x n )]′ for l = 1, 2, so F 1 = F 2 P. Moreover, P is always an upper triangular matrix with diagonal entries equal to 1 or −1, so det(P) = 1 or −1 and det(F 1 ′F 1 ) = det(P ′F 2 ′F 2 P) = det(P ′) det(F 2 ′F 2 ) det(P) = det(F 2 ′F 2 ). For higher-order or other forms of mixture models, the proofs are similar.

SUPPLEMENTARY MATERIALS
In the supplement we provide more simulation study results to support the conclusions drawn in the article. We compare different variable selection methods for the simulation Cases 1-4. We also show another simulation case where there are ten components in the mixture. These results are consistent with the conclusions stated in Sections 4 and 5.

ACKNOWLEDGMENTS
This research was supported by the Procter & Gamble Statistics Innovation Grant from the Procter & Gamble Company. We thank the Editor, Associate Editor, and two referees for the valuable comments and suggestions. We also thank Robert Wells, retired Procter & Gamble Hair Care Principal Scientist, for motivating this work by being the first practitioner (many others followed) to question the need for the S-model based on many years of successfully using the SV-model when a large filler component is available.