QQ Models: Joint Modeling for Quantitative and Qualitative Quality Responses in Manufacturing Systems

A manufacturing system with both quantitative and qualitative (QQ) quality responses (as a QQ system) is widely encountered in many cases. For example, in a lapping process of the semiconductor manufacturing, the quality of wafer’s geometrical characteristics is often measured by the total thickness variation as a quantitative response and the conformity of site total indicator reading as a binary qualitative response. The QQ responses are closely associated with each other in a QQ system, but current methodologies often model the two types of quality responses separately. This article presents a novel modeling approach, called “QQ models,” to jointly model the QQ responses through a constrained likelihood estimation. The QQ models can jointly select significant predictors by incorporating inherent features of QQ systems, leading to accurate variable selection and prediction. Both simulation studies and a case study in a lapping process are used to evaluate the performance of the proposed method. Supplementary materials to this article are available online.


INTRODUCTION
In manufacturing systems, both quantitative and qualitative (QQ) quality responses are often used for quality control. We call such a system "a QQ system." QQ systems are widely encountered in various manufacturing processes. For example, in a lapping process of a semiconductor manufacturing, the wafers are lapped to improve the thickness uniformities and surface finish as shown in Figure 1 (Ning, Bian, and Liu 2012). The total thickness variation (TTV) is a quantitative quality response to characterize the range of the wafer thickness. The conformity of site total indicator readings (STIR) is a qualitative response with binary output, which is used to indicate whether the STIR is larger than the tolerance or not. The STIR is the span of deviation readings of the front surface in a predefined site, which represents the flatness of wafers. The detailed definitions of TTV and STIR can be found from the semiconductor manufacturing industry (O'Mara, Herring, and Hunt 1990). The TTV and the conformity of STIR are two key measurements for the quality of wafer geometrical characteristics. Both of them are affected by the same set of process variables in the lapping manufacturing, such as pressure, rotation speed, and lapping time (Marinescu, Uhlmann, and Doi 2007). Common root causes of QQ responses need to be identified for variation reduction and quality improvement. However, the association between the two variables is not clear from engineering domain knowledge.
Therefore, joint models of QQ responses are needed for quality-process modeling in the lapping process. For another example, in a solar cell lamination process, solar cells are grouped and laminated to panels. The panels may have both power loss and solar cell cracks (Pilla, Galmiche, and Maldague 2002;Paggi, Corrado, and Rodriguez 2013). The quality of the process often considers the power loss as a quantitative response and uses a binary response to indicate if the panel has cracked solar cells after the lamination. For quality improvement of solar cells, it is natural to consider a joint modeling of the power loss and the crack of the solar cells.
In manufacturing systems, the quantitative quality variables are widely used for quality control. Meanwhile, the qualitative quality variables also exist because of heuristic judgment or the limitation of sensor measurements. For example, the quantitative quality variables cannot be accurately measured in real time due to sensor limitations. Instead, some qualitative quality response variables are relatively easy to collect for facilitating the real time data collection. For the lapping process in Figure 1, the sensor is only capable to indicate whether the STIR is within the tolerance or not, which results in a qualitative quality measurement. With the qualitative quality measurement available, modeling such responses with respect to the corresponding predictor variables will be very useful because it can clearly provide useful information on whether the manufacturing is conforming or not (e.g., a binary indicator for a failed manufacturing is clearly nonconforming). Such information will be reflected in Figure 1. An illustration of a lapping process with QQ responses (redrawn from Ning, Bian, and Liu (2012), and Zhao et al. (2011) with authors' permission). the variable selection in our proposed methods, which will be discussed later.
Although both QQ quality responses exist in a manufacturing system, current methods often focus on developing qualityprocess models separately for the two types of quality responses. For quantitative quality variables, statistical models and engineering models are widely used to model the quality-process relationship based on observational data or design of experimental data (Fong and Lawless 1998;Shi 2006;Wu and Hamada 2009;Jin and Shi 2012). Quality control are performed based on these models, such as process monitoring (Hawkins 1991(Hawkins , 1993Woodall 2000;Woodall and Montgomery 2014;Montgomery 2001;Qiu 2014), diagnosis (Apley and Shi 1998;Zhou et al. 2003;Shi 2006), and control (Joseph 2003;Jin and Ding 2004;Shi 2006;Jin and Shi 2012). Modeling and quality control methods for qualitative quality variables are also proposed, such as in the area of design of experiments, statistical process control, and run-to-run control for categorical variables (May, Huang, and Spanos 1991;Spanos and Chen 1997;Wang and Tsung 2007;Lin and Wang 2011). Qiu (2014) provided a comprehensive summary on the recent development in statistical process control. In general, the above modeling and quality control methods do not account for the association between the two types of quality responses. In the literature of biometrical study, Fitzmaurice and Laird (1997) investigated a joint modeling for the association of QQ responses. In their work, a conditional model of quantitative response is considered conditioned on the qualitative response, leading to marginal regression models. However, the marginal regression models may not be useful for quality control in the manufacturing systems, since they may not provide an accurate prediction of the quality responses. In the literature of process monitoring, Qiu (2008) suggested monitoring the mixed responses based on categorizing the quantitative responses. Thus, the monitor-ing can be carried out based on the joint distribution of the categorized responses. It is possible to apply similar strategies to model the QQ responses without imposing the normality assumption. In manufacturing systems, it often requires to model the quantitative responses directly for quality control purpose, as the lapping process discussed earlier. Recently, mixed graphical models (Chen, Witten, and Shojaie 2014;Yang et al. 2014) have been proposed to study the general association of QQ responses. They mainly focus on the correlation or partial correlation of the responses, rather than the exact dependency of the QQ responses.
In this article, we propose a new joint modeling framework, called "QQ models," for both QQ quality responses in a QQ system. We focus on the qualitative response with binary output, though it is likely to accommodate the multi-level qualitative response as discussed in Section 6. We adopt a logistic regression model for the qualitative response. For the quantitative response, we consider the linear regression models conditioned on the qualitative response. To address the association of the QQ responses, we consider that the conditional linear regression models are different based on different values of the binary output. The proposed method enables a more direct and accurate prediction of the quality responses, rather than a prediction of the expectation of the quantitative response. Note that the prediction for the qualitative response is usually very informative in the manufacturing system. We will first construct the prediction of qualitative response, then predict the quantitative response conditioned on the estimated qualitative response. Recall the lapping process example: the QQ responses of TTV and STIR indicator are jointly determined by the lapping process variables and the quality covariates before lapping. When the STIR indicator is zero (i.e., STIR satisfies the tolerance), the lapping process is likely to be conforming. Thus, the lapping process is effective to reduce TTV, and the predictors related to process conditions may be more important to affect the TTV than the quality covariates before lapping. If the STIR indicator is one (i.e., STIR is too large to satisfy the tolerance), then the lapping operations may not be conforming. Thus, the quality covariates before lapping become important to affect the TTV. Therefore, the coefficients in the conditional regression models for TTV may vary depending on the values of the STIR indicator. It is beneficial to consider different conditional regression models for quantitative response under different values of the qualitative response, reflecting the intrinsic heterogeneity of the underlying models in the QQ system. This engineering perception is successfully validated by the results of the case study in Section 5.
Moreover, the proposed QQ models adopt a joint likelihood approach for parameter estimation and use the nonnegative garrote approach (Breiman 1995) for efficient variable selection. The nonnegative garrote approach accommodates the association of significant predictors among QQ models through flexible constraints. The proposed QQ models intend to encourage that the significant variables in the logistic regression model are kept as significant in at least one of the conditional linear regression models. It is because that the significant variables from modeling the qualitative response are expected to contribute an important role for modeling other quantitative quality responses. For example, in manufacturing scale-up, modeling the binary response whether the manufacturing is conforming or not will provide useful information. In this model, a significant variable should be important for quality control in general, and thus should also be kept as a significant variable for other quantitative quality responses. To address the computational consideration of parameter estimation, we also develop a fast computation algorithm by iteratively approximating the objective function using quadratic approximations.
The rest of the article is organized as follows. In Section 2, we detail the proposed QQ models for jointly modeling both types of responses. In Section 3, we develop an efficient algorithm for fast computation. In Section 4, simulation studies are conducted to examine the effectiveness in prediction and variable selection of the proposed QQ models. In Section 5, the proposed QQ models are applied to the lapping process as a case study. Finally, we draw our conclusions and discuss the future work in Section 6.

THE PROPOSED QQ MODELS
We start with one quantitative response y and one qualitative response z with binary output, though it is possible to extend the models for multiple responses, as discussed in Section 6. Let us denote the observed data are (x i , y i , z i ), i = 1, . . . , n, where y i ∈ R and z i ∈ {0, 1}. Here the predictor vector x = (x 1 , . . . , x p ) contains p predictors, which can be process variables or initial quality covariates. To jointly model the QQ responses y and z given x, we follow a joint probability density function f (y, z|x) = f (y|z, x)f (z|x), where f (·) denotes a general density function. The conditional model on y|z is considered to be linear regressions, while the model of z follows a logistic regression. Specifically, we propose a joint modeling of y and z as where η = (η 1 , . . . , η p ) is a vector of parameter coefficients, and where β (m) = (β (m) 1 , . . . , β (m) p ) , m = 1, 2. The above proposed model indicates that y|z = 1, x ∼ N (x β (1) , σ 2 ) and y|z = 0, x ∼ N (x β (2) , σ 2 ). We assume the same variances for the two conditional distributions of y|z = 1, x and y|z = 0, x. It means that the difference between two conditional distributions are mainly driven by their mean functions. If the two conditional distributions have different variances, one can easily have two different variance parameters, which will not change the nature of the model formulation. Recall that the relationship between the quantitative response y and its predictors depends on the output of qualitative response z. To accommodate this consideration, the proposed method incorporates two conditional linear regression models for y|z = 1, x and y|z = 0, x. If these two linear models are the same, that is, β (1) = β (2) , then the quantitative response y and the qualitative response z are independent. In this situation, one can model the quantitative response y regardless of the qualitative response z. Alternatively, it is important to take accounts of the effects of the qualitative response z when modeling the quantitative response y. Note that the proposed models in (2.1) and (2.2) provide a joint modeling of y|z, x and z|x. Now we can also derive the conditional model on z|y, x. The probability of z conditioned on y can be expressed as P (z = 1|y, x) = f (y|z = 1, x)P (z = 1, x)/f (y, x). Thus, we have It implies that the distribution z|y becomes where h(y, x) = exp((y − x β (2) ) 2 )/ exp((y − x β (1) ) 2 ). Therefore, the proposed method can provide both conditional models of y|z, x and z|y, x, which is more flexible for the manufacturing system. To estimate the parameters in the proposed models, we consider the joint likelihood estimation approach, which enables the parameter estimation to borrow strength from two models of QQ responses. The log-likelihood function is up to some constant independent of parameters. Now we can estimate the parameters by minimizing the negative log-likelihood function upon some constraints for pursuing sparse model structures. A sparse model structure only contains a few significant variables in the model, which is often called as variable selection (Miller 2002;Hastie, Tibshirani, and Friedman 2009). It aims to achieve a parsimonious model with meaningful interpretation. Specifically, we consider the nonnegative garrote approach to pursue a sparse model for both models of QQ responses. The nonnegative garrote approach is originally introduced by Breiman (1995) for linear models. Several researchers (Yuan and Lin 2007;Xiong 2012) have further developed the theoretical properties of nonnegative garrote approach. This approach is also used in other statistical models, such as logistic regressions (Makalic and Schmidt 2011) and additive models (Cantoni, Flemming, and Ronchetti 2011). In the nonnegative garrote approach, the key idea is to reparameterize the parameter coefficients such that flexible constraints can be imposed to pursue a parsimonious model. Let β (1) (2) k , and η k = τ kηk , whereβ (1) k ,β (2) k , andη k are some initial estimations of the model parameters, such as least squares estimation or marginal maximum likelihood estimation (MLE). The constraints of this optimization problem should encourage the sparsity of the QQ models, and reflect the engineering perception. Thus, we consider estimating the parameters through a constrained optimization problem as follows.
Here the parameters for optimization includes , and σ 2 . The parameters t k 's are nuisance parameters to encourage the strong association of the significant variables in the regression models for y|z and the logistic regression model for z. Specifically, if a t k = 0, the values of θ (1) k , θ (2) k , and τ k will be forced to be zeros simultaneously. It implies that the kth predictor variable is not significant in the QQ system. Thus, this variable is not selected by the QQ models, which encourages the sparsity of the QQ models. The constraints τ k ≤ θ (1) k + θ (2) k is to encourage a variable to be selected in at least one of linear regression models, if this variable appears to be significant in the logistic regression model. This constraint reflects the engineering perceptions that the predictors for modeling qualitative response are usually informative for quality control, and they are also expected to be significant for modeling the quantitative response in the manufacturing system.
Note that there is a tuning parameter M in (2.6), which needs to be specified based on the data. An appropriate selection of M can balance the trade-off between the model fitting and model parsimoniousness. If the value of M is set to be zero in (2.6), all values of the estimated parameters will be equal to zeros, that is, none of predictor variables will be selected in the model. If the value of M is set to be sufficiently large, the proposed method will select all predictors in the model. The estimation of parameters will be the same as the MLE approach. The common methods to select tuning parameters include cross-validation and information criterion approaches, such as Akaike information criterion (AIC), Bayesian information criterion (BIC), and C p criteria (Burnham and Anderson 2002). In this work, we use the BIC to find an optimal value of the tuning parameter M. The BIC score for the proposed models can be written as where q is the number of nonzero estimates of parameters in the models. Specifically, we can generate a grid for M, such as For each grid point m j in C, we evaluate the corresponding BIC scores, and find the optimal choice of M with the minimal BIC score among all grid points in C.

COMPUTATIONAL ALGORITHM
Solving the proposed optimization problem in (2.6) is nontrivial because of the nonconvexity of the objective function. It is also likely that there are a large number of parameters to optimize. To address this challenge, we propose a quadratic approximation for the original objective function, thus to convert the optimization in (2.6) to a constrained quadratic optimization problem. The key idea is based on a sequence of local quadratic approximation. It is well known that the constrained sequential quadratic optimization can guarantee a global optimum with fast computation (Boyd and Vandenberghe 2004).
Denote X = (x 1 , . . . , x n ) T to be the design matrix, z = (z 1 , . . . , z n ) T to be the binary response vector. Without loss of the generality, we assume that the first n 1 observations with binary response z = 1, and the remaining n 2 observations with response z = 0, where n = n 1 + n 2 . Then, we can define y (1) = (y 1 , . . . , y n 1 ) T and y (2) = (y n 1 +1 , . . . , y n ) T . Similarly, we can correspondingly define X (1) and X (2) , respectively. Given an estimate of η, we first apply the quadratic approximation for the log-likelihood function l(η, β (1) , β (2) ) in (2.6). The detail of the quadratic approximation for l(η, β (1) , β (2) ) is described in Part A of online supplemental materials. Specifically, we approximate the original objective function in (2.6) by . . , p(x n , η c )(1 − p(x n , η c ))), and p = (p(x 1 , η c ), . . . , p(x n , η c )) provided by a current estimate η c . In this case, the above objective function of the approximate quadratic problem is clearly a convex function with respect to unknown parameters θ (1) . . , τ p ) , and t = (t 1 , . . . , t p ) . Since the parameter σ 2 is not involved in any constraint, by taking the derivative of the objective function with respect to σ 2 , we can obtain the estimate of σ 2 explicitly as follows, Note that the objective function in (3.1) is provided by an initial estimate η c = η 0 . We can choose a proper value of η 0 and update the estimations in an iterative procedure. The iterative algorithm is summarized as follows: Step 1: Set an initial estimate σ 2 = σ 2 0 > 0, also an initial estimate of η c = η 0 .
For the initial estimation of σ 2 and η c , we simply choose σ 2 0 as the residual variance by fitting a linear regression model for the quantitative response, and η 0 as the marginal logistic regression model parameters. For the initial estimation in the nonnegative approach for reparameterization, we use the leastsquare estimates forβ (1) k ,β (2) k , andη k . When the least-square estimates is not available, we would choose a ridge regression estimator (Hoerl and Kennard 1970) for the linear regression models and the logistic regression model.

SIMULATION
To evaluate the performance of the proposed method, we consider several simulation settings for generating the data with the underlying models in (2.1) and (2.2). Let I 1 be the index set of significant predictor variables in β (1) and I 2 be the index set of significant variables in β (2) . Denoteβ 1 = {β (1) k : k ∈ I 1 } and β 2 = {β (2) k : k ∈ I 2 }. Similarly, we denote by I l the index set of significant variables in η. Denoteη = {η k : k ∈ I l }. Specifically, we consider four examples as follows.
Example 1. I 1 and I 2 are the same, and the values ofβ 1 and β 2 are similar.
Example 2. I 1 and I 2 are different (i.e., the significant variables are different), but the values ofβ 1 andβ 2 are similar.
Example 3. I 1 and I 2 are the same, but the values ofβ 1 and β 2 are different.
Example 4. I 1 and I 2 are different, and the values ofβ 1 and β 2 are also different.
In each example, the index sets I l , I 1 , and I 2 are generated randomly but following the proportion of sparsity s. Here the value of s represents the proportion of nonzero entries in a parameter vector. The entry values of parameter vectorη are generated from uniform distribution U (−2, 2). In Examples 1 and 2, we first generate the parameter vectorβ 1 with each entry value from normal distribution N (2, 1). Then the entry values ofβ 2 are obtained by adding a small perturbation from N (0, 0.01) onto the entry values of generatedβ 1 . In Examples 3 and 4, the entry values of parameter vectorsβ 1 andβ 2 are generated independently from N (2, 1), respectively.
For each example, we generate a training set and a test set based on the models in (2.1) and (2.2). The n data points x 1 , . . . , x n in the training set are independent and identically distributed (iid) sample generated from N (0, ), where = (σ ij ) p×p with (i, j )th entry σ ij = ρ |i−j | and ρ being a correlation parameter. The n data points of the test dataset are iid sample generated from U (−2, 2). The sample sizes for the training dataset and the test dataset are n = 100. The σ 2 in the model (2.2) is chosen to be 1.
To systematically investigate the performance of the proposed method, we consider different scenarios of generating the simulation data by varying the predictor dimensionality p, correlation parameter ρ, and proportion of sparsity s. We choose two levels of p with the values p = 20 and p = 50, two levels of ρ with the values ρ = 0 and ρ = 0.5, and two levels of s with the values s = 20% and s = 50%. For every setting of each example, we conduct 50 simulation replicates.
To evaluate the accuracy of the estimated QQ models, we compare the proposed method with two benchmark models: separating modeling using BIC (SMBIC) and modeling with additional predictors using BIC (MABIC). The SMBIC approach ignores the association between the two types of responses. It fits a linear regression model for the quantitative response y and a logistic regression model for the qualitative response z separately, of which both use BIC for variable selection. The MABIC approach considers adding one of QQ responses as predictor in modeling the other response. Specifically, the MABIC is to fit a linear regression model for the quantitative response y by adding the qualitative response z as an additional predictor, and fit a logistic regression model for the qualitative response z by adding the quantitative response y as an additional predictor. The BIC is also used for the variable selection. For all three methods in comparison, the models estimated from the training set are used to compute the prediction errors based on the test set. Specifically, the prediction errors for the quantitative response y is measured by the root mean squared prediction error as RMSPE iβ (k) ) 2 , and the prediction error for the qualitative response z with binary output is measured by the misclassification error as ME = 1 n n i=1 I (z i =ẑ i ), wherê z i ∈ {0, 1} is the prediction of binary response z i based on the logistic regression in (2.1), and I (·) is an indicator function. Furthermore, we also calculate the accuracy of variable selection for the estimated models. Here we use the total number of falsely selected variables, denoted by γ , as the performance measure. The γ is defined by γ = FP + FN, where FP represents the number of false positives and FN represents the number of false negatives in variable selection. Table 1 reports the averages of RMSPEs (or MEs) and standard errors in parenthesis based on 50 simulation replicates. By using the efficient computational algorithm in Section 3, the average of computing time for the proposed method is 3.95 sec for each simulation replicate (based on a workstation with CPU Xeon Processor E5-2687W, 3.10 GHz, 64 GB RAM). We denote the estimated linear regression models by QQ lm in (2.2) and the estimated logistic regression model by QQ logit in (2.1) from the proposed QQ models. Similarly, we denote BIC lm and BIC logit as the corresponding estimated models from the SMBIC approach. We denote Add lm and Add logit as the corresponding estimated models from the MABIC approach. For the results of Example 1, the performance of the proposed QQ models is comparable to the SMBIC approach and the MABIC approach. Note that Example 1 considers β (1) and β (2) being similar. It implies that the quantitative response y does not depend much on the qualitative response z. Since there is no hidden association between the QQ responses, separate modeling QQ responses such as the SMBIC and the MABIC approaches would give accurate estimation of parameters. In contrast, Examples 2-4 consider the situations of β (1) and β (2) being different with respect to the significant variables and their values. It means that the two conditional models y|z = 1, x and y|z = 0, x are different, reflecting the dependency between the QQ responses. In these situations, the proposed QQ models generally outperform the SMBIC approach and the MABIC approach. In particular, the RMSPEs from the QQ lm are much smaller than those from the BIC lm and the Add lm . These findings can be explained by the fact that the proposed QQ models consider the dependency between the quantitative response y and qualitative response z through the joint probability p(y|z, x)p(z|x), while the SMBIC approach considers the probability independently as p(y|x)p(z|x) and the MABIC approach only considers a particular dependency of y and z by treating of one of them as predictor in the modeling. Therefore, the model structure for QQ lm based on p(y|z, x) is more favorable to obtain an accurate model in prediction. We also note that the MEs from the QQ logit are comparable to that from the BIC logit and the Add logit . Because the QQ logit is obtained based on marginal distribution p(z) for modeling the binary response, it is expected that QQ logit have comparable prediction performance to the BIC logit and the Add logit , both of which are also based on marginal distribution p(z|x) for modeling the binary response.
Under different design matrices and dimensionality of parameters, we also observe that the proposed QQ models give better prediction performance than the SMBIC approach and the MABIC approach for Examples 2-4. It is because that the proposed QQ models take the advantage of the association between the QQ responses regardless of the structure of the design matrix X. Under two levels of sparsity s with the same correlation parameter ρ and dimensionality p, we can see that the QQ lm has smaller RMSPEs than the BIC lm and the Add lm in both sparsity levels for p = 20. When the dimensionality becomes larger in p = 50, the QQ lm has much better performance than the BIC lm and the Add lm at sparse level s = 20%, while the QQ lm has slightly better performance than the BIC lm and the Add lm at sparse level s = 50%. One possible explanation is that when p = 50, the number of parameters reaches 150, which is more than the sample size n = 100. Such a situation would be in favor of QQ models with more sparse levels to gain better prediction accuracy.
Furthermore, Table 2 examines the performance of variable selection in terms of the number of false selection for the four examples. Here the number of false selection for the linear models, that is, QQ lm , BIC lm , and Add lm , is calculated by the average of the number of false selections with respect to the two conditional linear regression models on y|z = 1 and y|z = 0. The results in Table 2 show that the proposed QQ models generally have better variable selection accuracy than the SMBIC approach and the MABIC approach. In Examples 1 and 3, note that the linear models of y|z = 1, x and y|z = 0, x have the same significant variables. In these situations, it appears that the QQ logit has better selection accuracy than the BIC logit and the Add logit , while the variable selection accuracy of QQ lm is comparable to the BIC lm and the Add lm . An intuitive explanation is that when the linear models of y|z = 1, x and y|z = 0, x have the same significant variables, the conditional model y|z, x in (2.2) only reflects the role of z through the values of estimated coefficients, not on the role of what the significant variables are. It makes the variable selection accuracy of QQ lm comparable to those of BIC lm and Add lm . In Examples 2 and 4, when the linear models of y|z = 1, x and y|z = 0, x become different in significant variables, one can clearly see that the proposed method gains the superiority of variable selection accuracy for both QQ lm and QQ logit . Moreover, as the dimensionality p increases in these examples, the advantages of the proposed method on variable selection become more significant, especially when s = 20%. With the sample size n = 100 fixed in this study, the increase of p would make the QQ models having more advantage when the underlying model is sparse.
To check whether the fitted QQ models are over-sparse or under-sparse, we also report the number of false positives and the number of false negatives in Part B of the online supplemental materials. The results show that the number of false positives for the QQ models is small in general, indicating that the fitted QQ models would not be under-sparse. In addition, the number of false positives from the QQ models is much smaller than those from the SMBIC and the MABIC approaches in most cases. For the number of false negatives, we can see that under the dimensionality s = 20% and p = 20, the number of false positives for QQ models is generally smaller than those from the SMBIC and the MABIC approaches. But when the sparsity s = 50% and p = 50, the number of false positives for QQ models becomes relatively large, indicating that the fitted models can be over-sparse to some extent. One possible explanation is that  p = 50 means that the number of parameters in QQ model is 150, larger than the sample size n = 100. In this situation, the QQ models tend to pursue sparse models to get accuracy on the predication. In this work, the proposed QQ models assume a linear model for the quantitative response y with normal distributed errors. To evaluate the robustness of the proposed QQ models, we first conduct a set of simulations to check the normality assumption of the linear model residuals = y − zx β (1) + (1 − z)x β (2) . Specifically, we generate the simulation data following the aforementioned procedure, except changing the distribution of the linear model residuals from a normal distribution to a Chi-squares distribution (skewed) and a t-distribution (heavy tail) with degrees of freedom 5, respectively. We also scale the Chi-square and t-distributions to have the same variance σ 2 = 1 as the normal distribution used in linear model residuals. The prediction and variable selection performance of the QQ models and the SMBIC approach are reported in Part B of the online Supplemental Materials. We found that the QQ models generally have more accurate prediction and variable selection than the SMBIC approach in Examples 2-4, when the underlying distributions deviate from a normal distribution. The QQ models are robust to the normality assumption of residuals to some extent, since the prediction and variable selection performance of the QQ models for chi-squares distributed residuals and t-distributed residuals are comparable to those of the QQ models for normally distributed residuals. Second, we evaluate the robustness of the linear model assumption for the quantitative response y. Specifically, we generate the simulation data following the aforementioned procedure, but taking cubic of the regression mean in (2.2) as z(x β (1) ) 3 + (1 − z)(x β (2) ) 3 . Thus, the underlying model for the quantitative response becomes nonlinear. We fit the data using the QQ models and SMBIC approach with a linear model for the quantitative response. From the results in Part B of the online supplemental materials, both the QQ lm and the BIC lm yield large prediction errors due to the improper assumptions of the model structure. But the QQ lm gives relatively smaller RMSPEs than the SMBIC approach, which may be explained by the flexibility of the QQ models: the quantitative response is conditioned on the different values of z. For the variable selection performance, the QQ lm does not clearly outperform the BIC lm in various cases. It appears that when the linear model structure assumption is not valid, the proposed QQ models may not perform well. Some discussions are provided in Section 6 on how to address this issue.

CASE STUDY IN THE LAPPING PROCESS
To further illustrate the merits of the QQ models, we analyze the data from a real case study in the lapping process. Recall the lapping process introduced in Section 1: there are 10 predictors and two quality responses, which are summarized in Part C of the online supplemental materials. In this process, there are four process variables: pressure, rotation speed, lapping time for low pressure, and lapping time for high pressure. In addition, there are six initial quality covariates before the lapping process. This collected dataset contains 254 wafer observations, where 203 wafers have the STIR indicator as 0 (good), and the remaining 51 wafers have the STIR indicator as 1 (bad). To evaluate the performance of the proposed QQ models, we randomly partition the dataset into a training dataset (50%) and a test dataset (50%) with a stratified sampling strategy. Such random partitions repeat for 50 times. For each partition, the proposed QQ models and the SMBIC approach are applied to model the quantitative response TTV and the qualitative response STIR indicator based on the training dataset. The prediction errors of the test dataset and the number of selections for each predictor variable of the training dataset are computed for both methods. Figure 2 shows the boxplots of RMSPEs (MEs) for the QQ models and the SMBIC approach under the 50 replicates of data partitions. We did not compare the performance of the MABIC approach here because neither of QQ responses will be available and they need to be predicted in real practice. From Figure 2(a), it is clear that the proposed QQ models have much smaller RM-SPEs than the SMBIC approach. Figure 2(b) also shows that the misclassification errors of the proposed method are also smaller than that from the SMBIC approach. As shown in Figure 2(c) and 2(d), the smaller misclassification error of the QQ models is mainly due to the smaller false positive classification error; while the false negative classification errors are comparable for the two methods. The results indicate that the proposed joint modeling of QQ responses enhanced the prediction performance in this case. Table 3 reports the average numbers of selected variables and their standard errors (in the parenthesis) for both methods. Here the number of selected variables for the linear models, that is, QQ lm and BIC lm , is calculated by the average of the number of selections with respect to the two conditional linear regression models on y|z = 1, x and y|z = 0, x. From the results in Table 3, we can see that the quality covariates x 5 − x 10 are often selected by QQ lm and QQ logit . While the BIC lm and BIC logit consistently ignore the quality covariates in the models. In fact, the quality covariates are expected to be important in the model if the STIR indicator is 1. From the engineering perception, the lapping process is likely to be nonconforming if the STIR indicator is 1. In this case, the quality covariates representing the initial quality of wafers become important factors for after-lapping wafer quality. The proposed QQ models successfully capture such engineering perception. In contrast, the SM-BIC approach considers modeling the two responses TTV and STIR indictor separately, which cannot unveil the significance of quality covariates in this case study. In summary, the proposed QQ models successfully discover this hidden information, which can be further used for quality control and process improvement.
We also evaluate the model assumption for this real-data example. The Q-Q normal plots for the residuals in the model y|z = 1, x and the residuals in the model y|z = 0, x are provided in Part C of the online supplemental materials. From the Q-Q plots, we can see that the distribution of the residuals would be close to normal distribution after a linear transformation. It appears that the normality assumption of the residuals may not strictly hold. Note that the prediction performance of the regression models in QQ models is still better than the benchmark models. It appears being reasonable to use the proposed models for prediction. This is also consistent with the conclusion drawn from the simulation, that the proposed QQ models are robust to the normally distributed residual assumption. We also examine the residuals over fitted responses in the model y|z = 1, x and the model y|z = 0, x; see Part C of the online Supplemental Materials. It is clear that for both models the residuals have constant variance, and form several clusters along fitted response. This is mainly because the lapping data are collected from the design of experiments settings, where the fitted responses are clustered. In addition, the random patterns appearing in the residual plots indicate that the linear model assumption seems to be reasonable for modeling y.
For the two conditional models y|z = 1, x and y|z = 0, x, we assume that the two models have the same error variance, that is, var(y|z = 1, x) = var(y|z = 0, x). To check this assumption, we check if the standard errors in the model y|z = 1, x and the model y|z = 0, x are the same by using the F-test. Based on the fitted QQ models, the standard error for the model y|z = 1, x is 0.3135, and the standard error of the error terms for the model y|z = 0, x is 0.2219. The p-value of the F-test is 0.0002, indicating that the standard errors are different in this case study. Recall the model in (2.2): the proposed method can be easily extended to allow the two conditional models having two different variances without changing the nature of the problem.
To check the logistic model assumption in modeling z|x, we performed model diagnostics by using χ 2 test for deviance. For the logistic regression obtained from the QQ models, the null deviance is 433.91, and the model deviance is 398.13. The pvalue of the χ 2 test is 1.7 × 10 −8 . Thus, it shows that the logistic regression model obtained from the QQ models has a reasonable goodness of fit for the real data.

DISCUSSION
A QQ system is widely encountered in manufacturing processes, such as a lapping process, a solar cell lamination process, and nanostructure synthesis. With both QQ responses of quality closely associated, the two types of quality responses often share the same set of potential root causes. Therefore, a joint modeling framework is needed for integrating both types of quality responses. In this article, we propose a joint likelihood modeling framework called QQ models. The proposed models consider the joint probability of the QQ responses, while the constrained likelihood estimation approach is used for parameter estimation and variable selection. A fast algorithm of parameter estimation is also developed by quadratic approximation to facilitate fast computation. We use both simulation studies and a real case study from the lapping process to demonstrate the effectiveness of the QQ models. Especially in the case study, the proposed QQ models yield better prediction and more meaningful variable selection, which reflect the inherent features of the real manufacturing process. Note that for complex QQ systems, the quality-process relationship can be highly nonlinear. The proposed method can be extended to the nonlinear models. One possibility is to incorporate projection pursuit (Friedman and Stuetzle 1981) into the proposed QQ models to allow more flexibility on the predictive functions. Specifically, we can extend the models in (2.1) and (2.2) as z|x = 1, w.p. p(x) 0, w.p. 1 − p(x) with p(x) = exp(v(x η)) 1 + exp(v(x η)) , y|z, x ∼ N (zh 1 (x β (1) ) + (1 − z)h 2 (x β (2) ), σ 2 ), where h 1 , h 2 , and v are nonparametric functions. The nonparametric forms of h j and v allow flexible functional structures to be estimated. For the parameter estimation, an efficient algorithm can be developed by iterating the estimation between functions h 1 , h 2 , v and parameters η, β (1) , β (2) . Another possibility of extending the proposed QQ models is to adopt the nonparametric methods (Qiu 2014) to relax the normality assumption, which can be an interesting topic for our future research.
Although we focus on one quantitative response and one qualitative response in this work, the proposed method can be generalized to the case of multiple responses. Suppose the multiple quantitative responses are y 1 , . . . , y m and multiple binary responses are z 1 , . . . , z t . For example, a multi-level qualitative response can be transformed into a set of dummy binary responses. For m > 1 and t = 1, we can generalize the QQ models by multi-response regression (Breiman and Friedman 1997) as (y 1 , . . . , y m )|z, x ∼ N(zx B (1) ) + (1 − z)x B (2) , ), where B (1) , B (2) are coefficient matrices, and is a covariance matrix. For m > 1 and t > 1 with multiple binary responses, considering all 2 t conditional models (y 1 , . . . , y m |z 1 = 1, . . . , z t = 1), . . . , (y 1 , . . . , y m |z 1 = 0, . . . , z t = 0) may only work for a small t. Alternatively, we can develop a two-stage approach by first using independence screening (Fan and Lv 2008) to select important qualitative responses; that is, to conduct t independent conditional models (y 1 , . . . , y m |z 1 ), . . . , (y 1 , . . . , y m |z t ) and select u < t qualitative responses z j 's with best fitting. Then the full conditional models will be applied for the selected u qualitative responses. For the joint distribution of u qualitative responses, we can consider multi-logit models (McCullagh 1980) and Ising mod-els (Ravikumar, Wainwright, and Lafferty 2010) as possible solutions.
The proposed method can be extended to accommodate interaction effects of the predictors. The hierarchical and heredity principles (Wu and Hamada 2009) can also be incorporated under the nonnegative garrote approach for variable selection. Other types of variable selection and group variable selection techniques, such as lasso (Tibshirani 1996) and group nonnegative garrotes (Yuan and Lin 2006), will be investigated for the proposed method. A Bayesian modeling framework will also be developed to integrate information for both types of responses.
We would like to point out that there can be other possibilities to model the joint distribution of QQ responses. In the proposed QQ models, we have considered the joint distribution f (y, z|x) = f (y|z, x)f (z|x) for model construction. Alternatively, one may consider the model construction based on f (y, z|x) = f (z|y, x)f (y|x). However, it is not clear how to appropriately quantify f (z|y, x) in a general formulation, which can be an interesting topic for future research. When the association of QQ responses becomes more complicated than the conditional distribution, it may not be appropriate to use the QQ models. Other techniques such as nonparametric methods can be potentially useful to deal with the related research problems.

SUPPLEMENTARY MATERIALS
The online supplementary materials include the description of quadratic approximation in (2.6) of the main paper, simulation results in Section 4 of the main paper, and the list of variables and analysis results in Section 5 of the main paper.