Robust Parameter Design With Computer Experiments Using Orthonormal Polynomials

Robust parameter design with computer experiments is becoming increasingly important for product design. Existing methodologies for this problem are mostly for finding optimal control factor settings. However, in some cases, the objective of the experimenter may be to understand how the noise and control factors contribute to variation in the response. The functional analysis of variance (ANOVA) and variance decompositions of the response, in addition to the mean and variance models, help achieve this objective. Estimation of these quantities is not easy and few methods are able to quantity the estimation uncertainty. In this article, we show that the use of an orthonormal polynomial model of the simulator leads to simple formulas for functional ANOVA and variance decompositions, and the mean and variance models. We show that estimation uncertainty can be taken into account in a simple way by first fitting a Gaussian process model to experiment data and then approximating it with the orthonormal polynomial model. This leads to a joint normal distribution for the polynomial coefficients that quantifies estimation uncertainty. Supplementary materials for this article are available online.


INTRODUCTION
Robust parameter design (RPD) is a quality improvement tool for designing products to be insensitive to noise factor variations. There is increasing interest in designing robust products with computer simulations. Because these simulations can be time-consuming, surrogates are constructed via computer experiments to facilitate exploration and optimization of the simulators.
Gaussian process (GP) models (Sacks et al. 1989;Currin et al. 1991;Santner, Williams, and Notz 2003) are widely used as surrogates for expensive simulators. A GP model is a prior for the functional relationship between the output and inputs of a simulator. The prior process is updated with data from a computer experiment, giving a posterior process that is employed for statistical inference. Many existing methods for robust design with deterministic simulations use the GP model as surrogate for finding optimal control factor settings. Apley, Liu, and Chen (2006) quantified the effects of uncertainty about the true output on an objective function for smaller-the-better quality characteristics. Williams, Santner, and Notz (2000) and Lehman, Santner, and Notz (2004) introduced sequential experiment design procedures based on expected improvement (EI) criteria for finding robust settings. Tan and Wu (2012) proposed methods for constructing accurate point and interval estimates of expected quadratic loss. Other types of models have also been employed. For example, Welch, Yu, and Kang (1990) used quadratic response surface models.
The functional ANOVA and variance decompositions of the simulator provide important insights into the contribution of each noise factor to the variance, and influence of each control factor on the mean and variance. Chen, Jin, and Sudjianto (2006) showed that variance decompositions useful for sensitivity analysis and RPD can be conveniently computed if the surrogate is expressed in terms of tensor product basis functions. Oakley and O'Hagan (2004) showed how a GP can be analytically decomposed into functional ANOVA and variance components. Many other methods for estimating the functional ANOVA and variance decompositions have been proposed in the sensitivity analysis literature. Sudret (2008) and Crestaux, Maître, and Martinez (2009) demonstrated that these decompositions can be easily computed with the use of polynomial chaos expansions.
In this article, we propose an orthonormal polynomial modeling method for RPD with computer experiments. The model is a linear combination of polynomials that are orthonormal with respect to the assumed independent input distributions. Its main advantage is to allow an insightful analysis for RPD to be conveniently performed. In particular, the location and dispersion effects of control factors and the main effects and interactions of noise factors can be easily quantified and understood. Questions commonly faced by practitioners such as ranking of noise factors based on their contribution to the variance, and ranking of control factors based on their influence on the mean or variance can be answered with the proposed method. Hence, it is a useful addition to the existing toolbox of methods for robust design with computer simulations, which consists mainly of optimization techniques.
For expensive simulators, an orthonormal polynomial model of the simulator would need to be developed based on a limited number of function evaluations. This can create substantial uncertainty about the true function. We propose two related methods to derive an orthonormal polynomial model that allows uncertainty about the true function to be quantified via a normal distribution of the model coefficients. The methods involve fitting a GP model to the experiment data first and then approximating the GP model with an orthonormal polynomial model.
There are three key computational advantages of working with the proposed orthonormal polynomial model. First, the set of variance decomposition-based indices proposed by Chen, Jin, and Sudjianto (2006) and some additional sets that we introduce, which we call RPD indices because they are useful for understanding variation in RPD, are simple ratios of sums of squares of the model coefficients. Point estimates and credible intervals for the RPD indices are easily obtained by simulation. Second, the variance of the response at any given control factor setting can be easily decomposed into a sum of contributions from main effects and interactions of noise factors. Third, the response mean and variance functions can be estimated via simple formulas. Posterior realizations of the functions are obtained by simulating the model coefficients from their posterior distribution. This gives smooth estimates that are easy to optimize.
The three computational advantages are important. Direct computation of point and interval estimates of variance decompositions and RPD indices of the GP model is challenging. One direct route is to first obtain the functional ANOVA components of the GP (given in Oakley and O'Hagan 2004), which are GPs, and then simulating random realizations of these components on a set of points to approximate the variance of each realization with respect to the inputs. This is computationally cumbersome. Finding optimal control factor settings with an expected loss function estimated with the GP model is often an important problem. The straightforward method of estimating the objective function via Monte Carlo simulation and using a stochastic approximation algorithm (Nemirovski et al. 2009) is inefficient. The methodology proposed in this article overcomes the computational problems just described. We propose two computationally efficient methods for closely approximating the GP model with an orthonormal polynomial model. Both of the proposed methods do not involve computation of integrals. The model coefficients are expressed as linear functions of the GP realizations on a set of points and this yields multivariate posterior normal distributions for the model coefficients. The tractable posterior distributions for the model coefficients are used to obtain point and interval estimates of RPD indices and the response mean and variance functions. To the best of our knowledge, only the work of Chen, Jin, and Sudjianto (2006) has addressed variation decomposition for RPD. Although Chen, Jin, and Sudjianto (2006) advocate the use of multivariate tensor product basis functions, they do not use orthonormal polynomials. Moreover, their approach can only provide point estimates of variance components and RPD indices, assuming a tensor product metamodel is obtained by some existing method. We address the variation decomposition for RPD problem more comprehensively by introducing new RPD indices that allows detailed comparison of control factors and noise factors, and a new approach that allows construction of point and interval estimates for all indices. A more detailed comparison of the proposed method with the work of Chen, Jin, and Sudjianto (2006) and Oakley and O'Hagan (2004) is given in Appendix C.
The rest of the article is organized as follows. Section 2 reviews the functional analysis of variance (ANOVA) decomposition. Section 3 defines the RPD indices. Section 4 introduces the orthonormal polynomial model. The functional ANOVA and variance decompositions, RPD indices, and response mean and variance obtained with the model are given. Section 5 introduces two methods for approximating a GP model with an orthonormal polynomial model. Section 6 gives an example to illustrate the proposed methodology. Concluding remarks are given in Section 7.

FUNCTIONAL ANOVA AND VARIANCE DECOMPOSITIONS
Let the computer model inputs be x = (x 1 , . . . , x c , x c+1 , . . . , x d ) T , where x 1 , . . . , x c are control factors, x c+1 , . . . , x d are noise factors, and all factors are continuous. Assume that all inputs are assigned independent distributions and let w i be the density function of x i , which has support χ i ⊂ R. Thus, the joint density of the x i 's is Note that the distributions of the control factors do not represent variation of the control factors in the actual system. They can be thought of as either representing uncertainty in the actual control factor values that will be chosen by the engineer or a mechanism for averaging certain quantities of interest in RPD (defined in Section 3) over the values of the control factors.
Let the functional relationship between inputs and output of the computer model be denoted by f : χ → R. Assume that where χ i j f (i 1 , ... ,i k ) (x i 1 , . . . , x i k )w i j (x i j )dx i j = 0 for all j = 1, . . . , k (Efron and Stein 1981). In this article, we shall call f (i) a main effect, and f (i 1 , ... ,i k ) an interaction between k factors. The functional ANOVA decomposition allows us to decompose the total variance of f , that is, where Estimation of the functional ANOVA and variance decompositions is a key objective in the global sensitivity analysis literature because sensitivity indices are defined in terms of the variance components (Saltelli, Chan, and Scott 2000). There are many articles on Monte Carlo simulation and metamodeling methods for estimating sensitivity indices (e.g., Sobol 2001;Oakley and O'Hagan 2004;Sudret 2008;Crestaux, Maître, and Martinez 2009).

QUANTIFICATION OF VARIATION FOR ROBUST PARAMETER DESIGN
Aside from global sensitivity analysis, the functional ANOVA and variance decompositions are also useful for understanding variation in RPD. However, this application has received little attention in the RPD literature.
In RPD, the functional ANOVA and variance decompositions of f for fixed control factor settings are of interest because they give the contributions of noise main effects and interactions to the response variance at given control factor settings. These conditional decompositions can be expressed in terms of the full decompositions given in Section 2. For fixed x 1 , . . . , x c , the functional ANOVA decomposition of f is given by Note that the main effect of noise factor x t is g (t) and the interaction between noise factors x t 1 , . . . , x t l , l ≥ 2 is g (t 1 , ... ,t l ) .
The mean of f , that is, the response mean, is The variance of f , that is, the response variance, is The response variance at a given control factor setting can be broken down into a sum of variances due to main effects and interactions between noise factors. The contribution of the interaction between noise factors x t 1 , . . . , x t l to the response variance is given by Thus, if it is of interest to assess the strength of the contribution of x t 1 , . . . , x t l to the variance at a particular choice of (x 1 , . . . , x c ), we can examine the quantity V (t 1 , ... ,t l ) (x 1 , . . . , x c )/V(x 1 , . . . , x c ). We can quantify the contribution of the interaction between noise factors x t 1 , . . . , x t l to the total variance by taking the average contribution of the interaction to the response variance over the control factor settings and expressing it as a fraction of the total variance V , that is, This is because the total variance can be decomposed into a sum of contributions from the response mean and response variance, that is, This behavior of N (t 1 , ... ,t l ) is sensible because if the average response variance is small compared to the variation in the response mean, we can expect less improvement from reducing the variance than adjusting the mean.
The usefulness of control factor s for neutralizing noise variation can be measured by In addition, the contribution of control factor s to variation in the mean can be measured by Note that A (s) + B (s) is the total sensitivity index (Saltelli, Chan, and Scott 2000) of x s , and the indices N (t) (average contribution of the main effect of noise factor t to the response variance) and A (s) were first introduced by Chen, Jin, and Sudjianto (2006). However, Chen, Jin, and Sudjianto (2006) motivated N (t) in a different way and they do not consider the interactions of noise factors, that is, It is clear from the above discussion that a few other sets of indices can be defined and used for RPD. For example, instead of (12), we can use a more detailed set of indices that measure the contribution of main effects and interactions of control factors to the variation in the mean separately. To the best of our knowledge, we are the first to introduce and use the indices V (t 1 , ... ,t l ) (x 1 , . . . , x c )/V(x 1 , . . . , x c ), N (t 1 , ... ,t l ) for l ≥ 2, and B (s) for RPD. In this article, we refer to V (t 1 , ... ,t l ) (x 1 , . . . , x c )/V(x 1 , . . . , x c ), N (t 1 , ... ,t l ) , A (s) , and B (s) collectively as RPD indices. Note that RPD indices defined with respect to fixed values of (x i 1 , . . . , x i k ) are also useful. We denote the indices N (t 1 , ... ,t l ) corresponding to fixed values of (x i 1 , . . . , x i k ) by N (t 1 , ... ,t l ) |(x i 1 , . . . , x i k ), {t 1 , . . . , t l } ∩{i1, . . . , i k } = φ (similarly for A (s) , B (s) , and V (t 1 , ... ,t l ) (x 1 , . . . , x c )/V(x 1 , . . . , x c )). These indices are useful in cases where the experimenter decides to fix some control factors or takes action to eliminate some of the noise factors.
The quantities M, V, V (t 1 , ... ,t l ) /V, N (t 1 , ... ,t l ) , A (s) , and B (s) provide information not available by an optimization of an expected loss function. Optimization of an expected loss function only provides an optimal control factor setting. In contrast, M, V, and V (t 1 , ... ,t l ) /V give the mean, variance, and fraction variance due to interactions between noise factors, which can be more useful. For instance, if further variance reduction is necessary at a chosen control factor setting, the experimenter would be able to select the right set of noise factors to eliminate based on the V (t 1 , ... ,t l ) /V's. Note that if noise factor t 1 and t 2 are eliminated, then the expected fraction of variance that is eliminated is [V (t 1 ) + V (t 2 ) + V (t 1 ,t 2 ) ]/V. The indices N (t 1 , ... ,t l ) allow us to assess the importance of interactions between noise factors x t 1 , . . . , x t l . If a variety of products corresponding to different choices of the control factors is to be produced, eliminating noise factors associated with large N (t 1 , ... ,t l ) 's will help improve the quality of all product varieties. The indices A (s) and B (s) quantify the influence of control factor s on the variance and mean, respectively. The information provided by these quantities is useful for advancing scientific understanding of the system and making decisions for improving system performance. For example, if A (s) is small but B (s) is large, control factor s can be used to adjust the mean.

ORTHONORMAL POLYNOMIAL MODEL
This section introduces the orthonormal polynomial model for RPD. The idea of using orthonormal polynomial models for uncertainty propagation and sensitivity analysis has been explored by many authors and such models are commonly called polynomial chaos expansions. Since the seminal work of Ghanem and Spanos (1991), there is increasing interest in the use of polynomial chaos expansions for approximating solutions of stochastic differential equations. There are two types of approaches for constructing these approximations: the stochastic Galerkin and collocation methods (Le Maître and Knio 2010;Xiu 2010). Sudret (2008) demonstrated that global sensitivity analysis is made very convenient with polynomial chaos expansions. Assume Note that if β α = χ f (x)ψ α (x)w(x)d x, thenf P is the projection f P Pr of f onto the space of polynomials of degree at most P . Convergence of the projection f P Pr to f requires certain conditions on w i .i = 1, . . . , d. Assume that the measure corresponding to w i is uniquely determined by its moments m ik , k ∈ N 0 . Then, the set of orthonormal polynomials (Riesz 1923;Riera and Malumbres 1995). If all w i are uniquely determined by their moments, then the tensor product basis Reed and Simon 1980, p. 50). This implies that In this article, we shall approximate f byf P and estimate the functional ANOVA decomposition of f with the corresponding decomposition off P . It is easy to see that iff P =f 0 + d k=1 1≤i 1 <···<i k ≤df (i 1 , ... ,i k ) denotes the functional ANOVA decomposition off P , then Thus, the variance components arê Given thatf P converges to f ,f (i 1 , ... ,i k ) converges to f (i 1 , ... ,i k ) also. This result is stated in Proposition 1 and proven in Appendix A. Lemma 1 gives a result used to prove Proposition 1.
Remark 1. The projection f P Pr minimizes ||f P − f || 2 w among allf P . If each w i is uniquely determined by its moments and if f P = f P P r , then Proposition 1 holds.
Note that if it is decided to fix some of the x i 's,f P collapses into another P degree orthonormal polynomial model. For example, if we fix x 1 , . . . , x j , we get the collapsed model where γ (α j +1 , ... ,α d ) = (α 1 , ... ,α j )∈N j 0 : The functional ANOVA and variance decompositions for this collapsed approximation can be computed in the same way as forf P . Quantities such aŝ V(x 1 , . . . , x c )|(x 1 , . . . , x j ) andN (t 1 , ... ,t l ) |(x 1 , . . . , x j ) can be computed from (21) in the same way as the corresponding unconditional versions are computed fromf P .
Techniques for computing ψ i j for any given w i have been developed (Gautschi 2004). Thus, the methods developed in this article will work for almost any assumed densities w 1 , . . . , w d . However, the Matlab codes provided in the online zip file only work for the case where w 1 , . . . , w d are beta distributions on [−1, 1], in which case χ i = [−1, 1], i = 1, . . . , d and χ = [−1, 1] d . Note that it is appropriate to set w 1 , . . . , w c to uniform distributions when we are interested in all control factor settings in [−1, 1] c . In addition, the beta distribution should be sufficiently flexible to approximate many noise distributions on a bounded interval. If w 1 , . . . , w d are beta distributions, each of the ψ i j 's belongs to the class of Jacobi orthonormal polynomials. Given , that is, a beta density with parameters B i 1 and B i 2 , the Jacobi orthonormal polynomials ψ i j can be generated from the recurrence relation b j ψ i Three methods, that is, least-square regression, spectral projection, and interpolation (Le Maître and Knio 2010; Xiu 2010), have been employed to estimate the coefficients β α of the approximationf P from experiment data. However, all of these methods can only give point estimates and not interval estimates for β α , that is, they do not allow quantification of uncertainty in estimating β α from a finite number of function evaluations. Confidence intervals for the β α 's constructed via linear regression techniques are not meaningful because the output is deterministic. In this article, we take an approach based on GP models to estimate P and β α . Note that GP models are widely used in computer experiments and they allow quantification of uncertainty in estimating the true function with finite data. Our approach involves fitting a GP model to the experiment data and then approximating the GP model with an orthonormal polynomial model. This automatically generates a joint distribution of the β α 's that quantifies uncertainty in these coefficients. Steinberg and Bursztyn (2004) demonstrated that GP models can often be approximated quite well by polynomials, which lends support to the approach described in this paragraph. GP modeling is reviewed in Appendix B. In this article, we shall use a stationary GP model with Gaussian correlation function. Given data Y from a design D = {x 1 , . . . , x n }, uncertainty about the true function f is captured by the posterior process Y (·)| Y ,λ,σ 2 ,θ ∼ GP μ ·|λ,θ , C ·, ·|σ 2 ,θ , where μ(·λ, θ ) is the posterior mean function, C ·, ·| σ 2 , θ is the posterior covariance function, andλ,σ 2 ,θ are estimates of the mean λ, variance σ 2 , and correlation parameter θ, respectively.

APPROXIMATION OF GP MODEL WITH ORTHONORMAL POLYNOMIAL MODEL
In this section, we propose two methods to approximate β given P . Each method is implemented in a procedure that increments P and checks a goodness-of-fit statistic to determine if P large enough forf P to be a good approximation of the posterior process (23), which we shall denote as Y (·) ∼ GP (μ(·), C(·, ·)) in this section for simplicity of notation. The methods employ a total of 2N points x 1 , . . . , x 2N taken from a Sobol sequence. However, other quasi-Monte Carlo point sets can be used. The first N points are used to construct the approximation while the next N points are used to validate the approximation. For simplicity of notation, we suppose that the ψ α 's are ordered in some way so that . The first method, which is the weighted least-square (WLS) method, approximates β by a WLS regression of Y N,1 = (Y (x 1 ), . . . , Y (x N )) T on φ 1 = (ψ j (x i )) 1≤i≤N,1≤j ≤m with weights w(x 1 ), . . . , w(x N ), assuming that w ∈ (0, ∞) almost everywhere in χ . The WLS approximation is where W 1 = diag{w(x 1 ), . . . , w(x N )} and φ 1 is assumed to have full column rank. Note that points x i in the Sobol sequence giving w( w , which as stated in Lemma 1, is the sum of the errors of approximating the functional ANOVA components of f with those off P . We denote the P th degree orthonormal polynomial modelf P with coefficient vector β LS byf P LS , that is,f P LS (x) = ψ(x)β LS . For the WLS method, N ≥ m is necessary for φ 1 to have full column rank. Taking x 1 , . . . , x N from a Sobol sequence is reasonable because Sobol sequences have the property that lim N→∞ (Niederreiter 1992). This would ensure thatf P LS converges to the projection f P Pr as N → ∞ if the WLS method is applied to f directly. In Appendix D, we prove that can be made arbitrarily small by making P and N sufficiently large, where the expectation is with respect to Y (·).
The second method, which is the interpolatory weighted leastsquare (IWLS) method, approximates β by a WLS regression of Y N,1 on φ 1 with weights w(x 1 ), . . . , w(x N ) subject to the constraint that φ D β = Y , where φ D = (ψ j (x i )) 1≤i≤n,1≤j ≤m . This constraint forces the orthonormal polynomial model to interpolate the experiment data. Because the posterior GP (23) interpolates the data with probability 1, we can expect the added constraints to the WLS regression to improve the accuracy with which the orthonormal polynomial model approximates the posterior GP. Assuming that φ 1 has full column rank and φ D has full row rank, it can be shown that the IWLS approximation of β is given by where η LS = Lμ N,1 + ξ and LS = LC N,1 L T . We denotê f P ILS (x) = ψ(x)β ILS . A necessary condition for φ 1 to have full column rank and φ D to have full row rank is N ≥ m ≥ n.
We assess the accuracy of the approximationf P LS of Y (·) using the validation points x N+1 , . . . , w not biased by the use of the points where Y 2N = (Y (x 1 ), . . . , Y (x 2N )) T , and Since the function Y (·) is random, Q (LS, P , N) is a random quantity. Thus, we take the expectation of Q (LS, P , N) as a measure of discrepancy betweenf P LS and Y (·), that is, where Similarly, we can measure the accuracy off P ILS with This gives the following measure of discrepancy between f P ILS and Y (·): We can use the expected total sum of squares , as a yardstick for the magnitudes of E[Q (LS, P , N)] and E[Q (ILS, P , N)]. The ETSS is given explicitly by We consider P and N to be adequately large for the WLS method if E[Q (LS, P , N)]/ETSS is small. The WLS method is not strongly affected by the choice of N ≥ m as long as P is sufficiently large and φ 1 is of full column rank. This is due to the absence of random error in observing Y (·). The following iterative procedure is employed to select P for the WLS method.
1. Set P = min{p ∈ N : ( d + p d ) ≥ 2n} and specify the target accuracy T . 2. Set N = 2( d + P d ) and x 1 , . . . , x 2N equal to 2N points from a Sobol sequence. If φ 1 is not of full column rank, increase N by one until φ 1 has full column rank.

Check whether the condition 100E[Q(LS, P , N)]/
ETSS < T is satisfied. If the condition is not satisfied, increase P by 1 and go to Step 2. Otherwise, go to Step 4. 4. Compute η LS and LS using all 2N points.
Remark 2. We initiate P large enough so that there are sufficiently many ψ i 's to closely approximate the n experiment data points and to quantify uncertainty about the true function. This makes sense because the GP model is a population of random functions that interpolates the experiment data points.
Remark 3. In our simulations, we observed that the choice N = 2( d + P d ) always gives a φ 1 with full column rank when x 1 , . . . , x N are points from a Sobol sequence.
We consider P and N to be adequately large for the IWLS method if E [Q (ILS, P , N)] /ETSS is small. As with the WLS method, the IWLS method is not strongly affected by the choice of N . However, we require that m and N be large enough so that φ D has full row rank and φ 1 has full column rank to ensure that β ILS is well defined. In this article, the following iterative procedure is employed to select P for the IWLS method.
1. Set P = min{p ∈ N : ( d + p d ) ≥ 2n} and specify the target accuracy T . If φ D is not of full row rank, increase P by one. Repeat until φ D has full row rank. 2. Set N = 2( d + P d ) and x 1 , . . . , x 2N equal to 2N points from a Sobol sequence. If φ 1 is not of full column rank, increase N by one until φ 1 has full column rank.

Check whether the condition 100E[Q(ILS, P , N)]/
ETSS < T is satisfied. If the condition is not satisfied, increase P by one and go to Step 2. Otherwise, go to Step 4. 4. Compute η ILS and ILS using all 2N points.
Remark 4. In our simulations, we observed that the choice P = min{p ∈ N : ( d + p d ) ≥ 2n} always give a φ D with full row rank when D is a randomly generated maximin Latin hypercube design (these designs are obtained by randomly generating a number of Latin hypercube designs and choosing the design that performs best with respect to the maximin criterion).
The uncoded inputs z 1 , z 2 , z 3 , z 4 , z 1 , z 2 , z 3 are described in Table 1. Each input is coded to take values in [−1, 1] and the standardized inputs are denoted by x =  (x 1 , . . . , x 7 , and x 7 = z 3 . The noise factors x 4 , . . . , x 7 are assumed to Oil viscosity Noise 3 × 10 −6 10 × 10 −6 z 1 Deviation from nominal recess radius Noise −2 2 z 2 Deviation from nominal bearing step radius minus recess radius Noise  Table 2 presents the posterior mean, and 98% credible intervals for each of the in-dicesÂ (s) andB (s) constructed based on 10,000 randomly drawn values of β LS and β ILS . Results for the WLS and IWLS methods are similar. We also compute estimates of A (s) and B (s) given by the posterior mean of the GP using Equation (35) of Chen, Jin, and Sudjianto (2006), which we call the CJS formula. A code written in Matlab is used and all univariate integrals are evaluated with the Matlab quad function. Estimates of B (s) based on the CJS formula agree well with the WLS and IWLS point estimates but estimates of A (s) based on CJS formula seem to be significantly smaller than the WLS and IWLS point estimates. However, because point estimates of A (s) are all small, they are not of practical interest. Thus, the discrepancies between the point estimates, which may be attributed to differences in estimators (see Appendix C), should not be of concern.
The CJS formula does not give interval estimates. We see that the credible intervals forÂ (s) are wide relative to their posterior means but the credible intervals suggest that allÂ (s) are small with high probability. In contrast, the posterior means of all B (s) are large and their credible intervals are wide. Thus, it is of interest to obtain more accurate estimates of allB (s) to allow correct ranking of the control factors based on their influence on the response mean.
When the design is D 70 , both WLS and IWLS methods (T = 5) give a model of degree 4, with N =  Table 3 presents the posterior mean and 98% credible intervals forÂ (s) andB (s) constructed based on 10,000 randomly drawn values of β LS and β ILS . Estimates of each A (s) and B (s) obtained with the WLS and IWLS methods are in close agreement and the credible intervals are fairly narrow. Estimates of B (s) based on the CJS formula agree well with the WLS and IWLS point estimates.
Since the E(Â (s) |Y )'s and upper credible limits for theÂ (s) 's are all small, none of the control factors have a strong interaction with a noise factor. Based on the E(B (s) |Y ) values, control factor x 2 (nominal bearing step radius minus recess radius) has the strongest effect on the mean. Control factor x 1 (nominal recess radius) has slightly less influence on the mean. The effect of x 3 (nominal oil flow rate) on the mean is smallest among the control factors but significant. By comparing Table 2 and Table 3, we see that the point estimates of B (s) in Table 2 are inaccurate. Furthermore, they are misleading since they indicate that control factor x 3 has a stronger effect on the mean than control factor x 1 . The credible intervals given by the WLS/IWLS method in Table 2 do warn us that the estimates are inaccurate. Table 4 presents the posterior mean, and 98% credible intervals for each of the indicesN (t 1 , ... ,t l ) . We see that estimates of N (t 1 , ... ,t l ) obtained with the WLS and IWLS methods are in close agreement and the width of the credible intervals indicate that there is not much uncertainty about the indices. The main effects of noise factors x 4 , . . . , x 7 are the only nonnegligible effects and the main effect of noise factor x 4 (oil viscosity) stands out as the most significant noise effect; it is responsible for about 44% of the total variation in f .
To validate the results in this example, we perform a Monte Carlo simulation using the Sobol program in the sensitivity package in R using two random samples of inputs of size 30,000 to estimate Sobol indices up to order three, which requires 1.92 × 10 6 evaluations of f . Estimates of all second-and   N (6) , and N (7) , respectively. This confirms that the point estimates in Tables 3 and 4 are accurate.
Since the response is the load carrying capacity of the bearing, it is a larger-the-better quality characteristic. For both WLS and IWLS methods, maximization of E(M|Y ) − 3[E(V|Y )] 1/2 over [−1, 1] 3 gives the optimal control factor setting (−1, −1, 1). The reason that the optimal nominal recess radius x 1 and optimal nominal bearing step radius minus recess radius x 2 equal the minimum values of the factors is because the minimum values give a small film thickness R 1 , which in turn causes the inlet pressure to be high. The 98% credible intervals for the mean and variance at the optimal setting given by the WLS method are [9.76, 12.59] and [3.07, 11.68], respectively. Similar intervals are given by the IWLS method.
It is of interest to understand which noise factor main effects and interactions contribute most to the response variance at (−1, −1, 1). Table 5 gives the posterior mean and 98% credible intervals for theV (t 1 , ... ,t l ) (−1, −1, 1)/V(−1, −1, 1)'s. The values obtained using both WLS and IWLS methods are in good agreement. About 80% of the variance at (−1, −1, 1) is due to the main effect of noise factor x 4 (oil viscosity). The credible intervals indicate that this point estimate may not be accurate but since the lower credible limits obtained with the WLS and IWLS methods (about 55%-56%) are considerably large than upper credible limits of the otherV (t 1 , ... ,t l ) (−1, −1, 1)/V(−1, −1, 1)'s, we can be quite sure that the main effect of noise factor x 4 contributes the most to the variance at (−1, −1, 1). Thus, to further reduce variation at (−1, −1, 1), the engineer should institute measures to reduce variation in oil viscosity.
Note that if the computer simulation is very time consuming, computational savings can be realized by adopting a sequential design approach that stops when the credible intervals for the RPD indices are narrow. This example illustrates the utility of credible intervals of RPD indices for judging the accuracy of the estimates of the indices. A batch sequential design approach can be implemented based on the nested Latin hypercube design (Qian 2009).

CONCLUSIONS
This article proposes an orthonormal polynomial model for robust design with independent noise factors. Simple expressions for the functional ANOVA and variance decompositions of the model are available. This enables easy computation of quantities called RPD indices that help assess the contributions of main effects and interactions of noise factors to the variance, the potential for variance reduction by adjusting a control factor, and the contribution of the control factors to variation in the mean. The response mean and variance are also easily computed and optimized with the model. We propose two novel methods (WLS and IWLS) to take into account uncertainty about the true function. Both methods yield a normal distribution for the model coefficients. Consequently, point and interval estimates of RPD indices and the mean and variance models can be easily constructed. Variance components of the posterior mean of the GP can be obtained with existing methods. However, this is not the case with the variance components of the sample paths of the posterior GP, which are needed for constructing interval estimates. The methodology in this article obtains the probability distribution of the variance components of the sample paths. Examples are given to demonstrate the usefulness of the proposed methodology.
The IWLS is preferable to WLS for deterministic simulations since it interpolates the data and is more accurate near the design points. While the article focuses on experiments on deterministic simulators, the proposed method can be applied in a straightforward fashion to stochastic simulations or physical experiments. The WLS method should be preferable for stochastic simulation, assuming the stochastic kriging model (Ankenman et al. 2010) is employed, because the simulation output is subject to random errors. For physical experiments, a Bayesian linear model approach (George and McCulloch 1993) can be employed.
The WLS, IWLS, and CJS formula-based methods take negligible time to produce the results in this article. The proposed approach performs very well for smooth functions in low-tomoderate dimensions. When d is large, it can be time consuming or memory intensive if a high-degree polynomial model is needed to approximate the GP model. However, in highdimension problems, the GP model is able to predict reliably with a small design only if the true function is very smooth or a handful of inputs are significant. Thus, a variable selection step can be added to the WLS/IWLS method to identify a sparse polynomial that best approximates the GP model. A referee raised the question of the accuracy and robustness of the proposed method for nondifferentiable high-dimension problems. Appendix F demonstrates that the proposed approach performs acceptably well in some nondifferentiable and fairly high-dimension problems, and works excellently for smooth and fairly high-dimension problems.

SUPPLEMENTARY MATERIALS
Appendices.pdf: This file contains Appendices A-F. Appendix A contains proofs of Lemma 1 and Proposition 1. Appendix B gives a review of GP modeling. Appendix C compares in detail the proposed and alternative methods for estimating RPD indices. Appendix D contains a proof of the convergence of the WLS method. Appendix E illustrates estimation of the response mean and variance. Appendix F provides numerical examples of estimation of RPD indices for moderately highdimension functions.
Online ZIP file: This file contains Matlab codes for implementing the methodology proposed in this article. Data for all examples are provided.

ACKNOWLEDGMENTS
This research was supported by City University of Hong Kong Start-Up Grant 7200364 and Early Career Scheme (ECS) project no. 21201414 sponsored by the Research Grants Council of Hong Kong. We thank the associate editor and three referees for comments that helped improve the article significantly.