Stochastic Polynomial Interpolation for Uncertainty Quantification With Computer Experiments

Multivariate polynomials are increasingly being used to construct emulators of computer models for uncertainty quantification. For deterministic computer codes, interpolating polynomial metamodels should be used instead of noninterpolating ones for logical consistency and prediction accuracy. However, available methods for constructing interpolating polynomials only provide point predictions. There is no known method that can provide probabilistic statements about the interpolation error. Furthermore, there are few alternatives to grid designs and sparse grids for constructing multivariate interpolating polynomials. A significant disadvantage of these designs is the large gaps between allowable design sizes. This article proposes a stochastic interpolating polynomial (SIP) that seeks to overcome the problems discussed above. A Bayesian approach in which interpolation uncertainty is quantified probabilistically through the posterior distribution of the output is employed. This allows assessment of the effect of interpolation uncertainty on estimation of quantities of interest based on the metamodel. A class of transformed space-filling designs and a sequential design approach are proposed to efficiently construct the SIP with any desired number of runs. Simulations demonstrate that the SIP can outperform Gaussian process (GP) emulators. This article has supplementary material online.


INTRODUCTION
Due to advances in computing, there is widespread interest in the development and use of engineering and physical models. Many of the models are formulated as differential equations, and solved numerically using methods such as the finite element method. One of the main purposes of these models is to predict one or more outputs as a function of a set of inputs. However, because solving the models is time consuming, it is difficult to use the models to predict the outputs at a large number of input settings, which is essential for purposes such as uncertainty propagation, sensitivity analysis, robust design, and optimization. Thus, metamodels are constructed using data from a computer experiment to approximate the computer output.
Polynomial metamodels are used in almost every field of engineering (mechanical design, manufacturing, aeronautics, electronics, bioengineering, nuclear engineering, etc.) as surrogates of complex computer models. They are advantageous for some applications due to their simplicity. Response surface methodology (Box and Draper 2007) has been used in computer experimentation to build low-order polynomial metamodels for prediction and optimization (Simpson et al. 1998; Wang and Shan 2007). Nonintrusive uncertainty quantification methods, which are methods that treat the computer model as a black box, are currently receiving a lot of attention from researchers. Many of the nonintrusive methods use polynomial metamodels. Since the seminal work of Ghanem and Spanos (1991), there is increasing interest in the use of polynomial chaos methods for approximating solutions of stochastic differential equations. Polynomial chaos methods approximate the computer output using polynomials that are orthogonal with respect to the input distribution (Xiu and Karniadakis 2002). Consider the case with input u: if {ψ_0, ψ_1, ...} is a sequence of polynomials that are orthogonal with respect to the density w of u, then Y(u) = Σ_{α=0}^∞ β_α ψ_α(u) is a polynomial chaos expansion of Y. In practice, truncated polynomial chaos expansions with estimated coefficients are employed. This gives simple approximations of the output with very useful properties that can significantly simplify the tasks of global sensitivity analysis (Sudret 2008; Crestaux, Le Maître, and Martinez 2009) and robust parameter design (Welch et al. 1990; Chen, Jin, and Sudjianto 2006). In these applications, the use of truncated polynomial chaos expansions allows straightforward computation of the functional analysis of variance (ANOVA) decomposition, sensitivity indices, and mean and variance models.
There are two types of approaches for constructing polynomial chaos expansions: the stochastic Galerkin and collocation approaches (Le Maître and Knio 2010; Xiu 2010). The construction of polynomial chaos expansions or interpolating polynomial metamodels via computer experiments for approximating the solution of stochastic differential equations is called stochastic collocation. This approach has three variants: least squares, interpolation, and projection. Some authors (e.g., Le Maître and Knio 2010) use the term nonintrusive polynomial chaos methods to refer to the construction of polynomial chaos expansions using least squares and projection, and reserve "stochastic collocation" for the construction of interpolating polynomial metamodels. A popular method for stochastic collocation by interpolation is the sparse grid method first employed by Xiu and Hesthaven (2005). Stochastic collocation methods have been widely used to quantify uncertainty in engineering model predictions (e.g., Agarwal and Aluru 2009; Lazarov, Schevenels, and Sigmund 2012).
Polynomial interpolation has traditionally been a research area of applied mathematicians. Univariate polynomial interpolation (Phillips 2003; Trefethen 2013) is a well-understood subject. However, many issues in multivariate polynomial interpolation are still not well understood. Research on this subject has focused on three areas: characterization and construction of point sets so that a unique polynomial interpolates the data (Chung and Yao 1977; Narayan and Xiu 2013), selection of an interpolating polynomial space given a set of points (De Boor and Ron 1990; Narayan and Xiu 2012), and the use of sparse grids (Barthelmann, Novak, and Ritter 2000). Interpolation error bounds have also been derived (Sauer and Xu 1995; Barthelmann, Novak, and Ritter 2000). However, these bounds are not useful in practice because they depend on the function being interpolated. A review of the recent literature on multivariate polynomial interpolation is given in Gasca and Sauer (2000).
In the statistical literature, the stationary Gaussian process (GP) model with continuous or differentiable sample paths (Sacks et al. 1989; Currin et al. 1991) is widely used as a metamodel. It captures interpolation uncertainty as an infinite-dimensional posterior process and is one of a handful of models that capture the uncertainty probabilistically. Articles that address the problem of constructing interpolating or nearly interpolating polynomial metamodels include Pistone and Wynn (1996), Holliday et al. (1999), Bates, Giglio, and Wynn (2003), and Bates, Maruri-Aguilar, and Wynn (2014). However, these polynomial interpolators do not allow quantification of interpolation uncertainty.
Despite the popularity and usefulness of interpolating polynomials for approximating deterministic computer models, statistical methods for quantifying polynomial interpolation uncertainty and designing experiments to reduce this uncertainty are lacking. This research develops a stochastic interpolating polynomial (SIP) model, which is an orthonormal polynomial-based model with the capability to capture interpolation uncertainty through the Bayesian posterior of the output. It enables probabilistic statements to be made about interpolation errors based on the data. The SIP can be used to provide point estimates and credible intervals for the output and other quantities such as sensitivity indices that depend on the output. This allows assessment of the effect of interpolation uncertainty, which is extremely important in practice. It is seldom possible to reduce interpolation uncertainty to a negligible level because the computer runs are expensive (which is the reason for constructing the metamodel in the first place). Practitioners must take into account interpolation uncertainty when basing decisions on metamodel predictions or risk being misled by the metamodel. For example, it is demonstrated in Apley, Liu, and Chen (2006) and Tan and Wu (2012) that ignoring interpolation uncertainty can give misleading results in robust parameter design.
A class of transformed space-filling designs is proposed for constructing the SIP. These designs can be of any size, which gives the experimenter flexibility in the choice of design size. A sequential design approach that reduces interpolation uncertainty adaptively as more data become available is also developed. The approach chooses follow-up runs one at a time to eliminate output uncertainty at points with maximum prediction variance. In contrast, the sparse grid method, which is widely used to construct interpolating polynomials, requires large design sizes for high-degree polynomial interpolation and has large gaps between allowable sizes. Consequently, the SIP can require less computation time and cost than sparse grids to achieve the same level of accuracy.
This article is organized as follows. Section 2 introduces the SIP model. Section 3 discusses specification of prior distributions for the model parameters. Section 4 proposes initial designs for uni-dimensional and multi-dimensional input spaces. In Section 4.1, the performance of the SIP is also compared with Lagrange interpolation for univariate input spaces. A sequential design approach is introduced in Section S2 of the online supplement. Section 5 compares the performance of the SIP with the GP model in multi-dimensional input spaces using test functions that include real physical models. Concluding remarks are given in Section 6.

STOCHASTIC INTERPOLATING POLYNOMIAL
The SIP is a polynomial that interpolates the data. Despite the lack of experimental error, interpolation uncertainty can be quantified through a Bayesian approach that introduces a prior for the functional relationship between outputs and inputs, as in GP modeling. This approach is taken in the proposed research. While it is possible to work with almost any polynomial basis functions, we shall work with polynomials that are orthonormal with respect to probability distributions as this is convenient and useful. Let w_1, ..., w_d be probability density functions and {ψ_{ij} : j ∈ N_0 = {0, 1, ...}}, where ψ_{ij} is a polynomial of degree j, be the sequence of polynomial basis functions in R that are orthonormal with respect to the probability density w_i. These functions can be easily generated with a three-term recurrence relation (Gautschi 2004).
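For instance, the orthonormal Legendre polynomials (orthonormal with respect to the uniform density w(u) = 1/2 on [−1, 1]) can be generated from the classical three-term recurrence. The following is a minimal sketch, not code from the article; the function name is ours:

```python
import numpy as np

def orthonormal_legendre(x, degree):
    """Evaluate orthonormal Legendre polynomials psi_0, ..., psi_degree at x.

    Uses the classical three-term recurrence
        (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x),
    then rescales P_n by sqrt(2n + 1) so the polynomials are orthonormal
    with respect to the uniform density w(u) = 1/2 on [-1, 1].
    Returns an array of shape (degree + 1, len(x)).
    """
    x = np.asarray(x, dtype=float)
    P = np.empty((degree + 1, x.size))
    P[0] = 1.0
    if degree >= 1:
        P[1] = x
    for n in range(1, degree):
        P[n + 1] = ((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1)
    scale = np.sqrt(2 * np.arange(degree + 1) + 1)
    return scale[:, None] * P

# Check orthonormality numerically with Gauss-Legendre quadrature.
nodes, weights = np.polynomial.legendre.leggauss(20)
Psi = orthonormal_legendre(nodes, 5)
G = (Psi * weights) @ Psi.T / 2.0   # Gram matrix E[psi_i(U) psi_j(U)]
```

The Gram matrix G is the identity up to quadrature error, confirming orthonormality under the uniform density.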
Let χ ⊂ R^d be the design region, u = (u_1, ..., u_d) ∈ χ be the possibly centered and scaled computer input, and ψ_α(u) = ∏_{i=1}^d ψ_{iα_i}(u_i) for α = (α_1, ..., α_d). The symbol N_+(d, P) refers to the set of d-tuples of nonnegative integers whose total sum is smaller than or equal to P. This corresponds to exponents of monomials whose degree does not exceed P. We model the computer output as

Y(u) = μ + Σ_{α ∈ N_+(d,P)} β_α ψ_α(u),   (1)

which is a polynomial of degree smaller than or equal to P. The parameter μ is introduced for convenience in writing the prior for the β_α's later (so that the prior mean of all β_α's can be set to zero). This is equivalent to omitting μ in (1) and setting the prior mean of β_0 to μ. There are a total of q = (d + P)!/(d!P!) polynomial coefficients (i.e., the β_α's including β_0). It is often convenient and advantageous to use ψ_α's such that w = ∏_{i=1}^d w_i is the probability density of the input u (assumed to have independent components) and to take χ = ∏_{i=1}^d χ_i, where χ_i is the support of w_i, for the purposes of uncertainty propagation and sensitivity analysis. In this case, a useful property of the model (1) is that its functional ANOVA decomposition is obtained simply by grouping terms: if we write Y(u) = Y_0 + Σ_i Y_i(u_i) + Σ_{i<j} Y_(i,j)(u_i, u_j) + ··· (where Y_0 is the constant term, Y_i is the main effect of input i, Y_(i,j) is the interaction between inputs i and j, etc.), then

Y_v(u_v) = Σ_{α ∈ A_v} β_α ψ_α(u)   (2)

for each nonempty subset v of {1, ..., d}, where A_v = {α ∈ N_+(d, P) : α_i > 0 if and only if i ∈ v} (Sudret 2008). It follows that the variance components of Y are given by

V_v = Σ_{α ∈ A_v} β_α².   (3)

The simple formulas (2) and (3) for the functional ANOVA and variance decompositions are key advantages of the model (1) over the GP model because for the GP model, computation of these decompositions requires computation of iterated multi-dimensional integrals in general (Oakley and O'Hagan 2004). Note that if the purpose of the computer experiment is simply to build a metamodel of the form (1) to predict the computer model on χ = [−1, 1]^d (which is a common objective), we suggest either choosing the ψ_α's to be tensor products of orthonormal Legendre polynomials or choosing the ψ_α's to be tensor products of orthonormal Chebyshev polynomials. These choices give good results (see Section 5).
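Because the decompositions in (2) and (3) reduce to grouping squared coefficients by their sets of active inputs, they are straightforward to compute from the fitted coefficients. A minimal sketch (the function name and the illustrative numbers are ours, not from the article):

```python
import numpy as np

def variance_components(alphas, betas):
    """Variance decomposition in the spirit of (3): group squared
    coefficients beta_alpha^2 by the set of inputs with a nonzero
    exponent in the multi-index alpha."""
    comps = {}
    for a, b in zip(alphas, betas):
        active = frozenset(i for i, ai in enumerate(a) if ai > 0)
        if active:  # the constant term contributes to the mean, not the variance
            comps[active] = comps.get(active, 0.0) + float(b) ** 2
    return comps

# d = 2, P = 2 example: multi-indices of all six basis functions
alphas = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
betas = [0.3, 2.0, 1.0, 0.5, 0.4, 0.0]    # hypothetical coefficients
V = variance_components(alphas, betas)
total = sum(V.values())                    # total output variance
S1 = V[frozenset({0})] / total             # main-effect (Sobol) index of input 1
```

Here the main-effect component of input 1 is 2.0² + 0.5² = 4.25 and the interaction component is 0.4² = 0.16, so sensitivity indices follow by division by the total variance.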
For example, consider the case where d = 1 and w_1 is the uniform density over [−1, 1]; then ψ_0 = 1, ψ_1 = √3 u_1, ψ_2 = √5 [½(3u_1² − 1)], ψ_3 = √7 [½(5u_1³ − 3u_1)], ... are orthonormal Legendre polynomials. If d = 2, both w_1 and w_2 are uniform densities over [−1, 1], and P = 2, then the model (1) is

Y(u) = μ + β_(0,0) + √3 β_(1,0) u_1 + √3 β_(0,1) u_2 + √5 β_(2,0) [½(3u_1² − 1)] + 3β_(1,1) u_1 u_2 + √5 β_(0,2) [½(3u_2² − 1)].   (4)

By (2), the main effect of input 1 is Y_1(u_1) = √3 β_(1,0) u_1 + √5 β_(2,0) [½(3u_1² − 1)]. Let D = {u^1, ..., u^n} ⊂ χ be a design and Y = (Y_1, ..., Y_n)^T be the vector of computer outputs at the design points. By (1), Y = μ1_n + Xβ, where 1_n is an n × 1 vector of 1's, X is the n × q model matrix with elements ψ_α(u^i) in row i, and β is the q × 1 vector of the β_α's. We assume that P is chosen large enough so that q > n and D is chosen in such a way that X has full row rank. Let the prior for β be given by β|θ ∼ N(0, σ²S) (a normal distribution with mean 0 and nonsingular covariance matrix σ²S), where S depends on the parameter r, and θ = (μ, σ², r^T)^T is a vector of hyperparameters to be estimated. We introduce the parameter r to allow realistic modeling of β, as will be discussed in Section 3 (see also Joseph 2006). This is to ensure that the induced prior for Y generates well-behaved surfaces with high probability and can be tuned (with an estimate of r) to produce accurate posterior predictions. Let u'_1, ..., u'_m be arbitrary points in χ, Y_0 = (Y(u'_1), ..., Y(u'_m))^T, and X_0 be the corresponding m × q model matrix with elements ψ_α(u'_i) in row i. Then,

Y_0|θ ∼ N(μ1_m, σ²X_0SX_0^T).   (5)

Note that (5) is a degenerate normal distribution (which has a singular covariance matrix) for some choices of u'_1, ..., u'_m and m large enough. However, due to the assumption that X has full row rank and S is nonsingular, Y|θ ∼ N(μ1_n, σ²XSX^T) is a nondegenerate normal distribution, which is necessary to avoid existence of linear combinations of the Y_i's with zero variance that may be inconsistent with the data. By Theorem 1.2.11 (p. 12) of Muirhead (2005), which holds for the degenerate normal case,

E(Y_0|Y, θ) = μ1_m + X_0SX^T(XSX^T)^{-1}(Y − μ1_n),   (6)
cov(Y_0|Y, θ) = σ²[X_0SX_0^T − X_0SX^T(XSX^T)^{-1}XSX_0^T].   (7)

It can be seen that if m = n and u'_i = u^i, i = 1, ..., n, then Y_0|(Y, θ) = Y with probability 1 because setting X_0 = X in (6) and (7) gives E(Y_0|Y, θ) = Y and cov(Y_0|Y, θ) = 0. Thus, the SIP possesses the interpolation property. Choose m and u'_1, ..., u'_m so that X_0 has full column rank. Then, since Y_0 = μ1_m + X_0β and X_0 has full column rank, the posterior distribution of β is

β|(Y, θ) ∼ N(SX^T(XSX^T)^{-1}(Y − μ1_n), σ²[S − SX^T(XSX^T)^{-1}XS]).   (8)

The degenerate normal posterior distribution of β given by (8) is needed for quantifying uncertainty in estimating the functional ANOVA and variance decompositions via (2) and (3). Consider the case of interpolating a function with two inputs at the four points D = {(0.24, 0.82), (−0.94, −0.63), (−0.11, −0.87), (0.99, 0.13)} with the SIP model (4). We find that X is a matrix with six columns and full row rank (rank four). Let S be the 6 × 6 identity matrix, m = 1, and u'_1 = (u'_{11}, u'_{12}). Then, ... (−0.051, −0.092, −0.008). Thus, the SIP accurately captures the linear trend.
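The conditional mean and covariance formulas, and the interpolation property obtained by setting X_0 = X, can be checked numerically. The sketch below uses a monomial basis purely for illustration (the article uses orthonormal polynomials); the function name is ours:

```python
import numpy as np

def sip_posterior(X, X0, Y, mu, sigma2, S):
    """Posterior mean and covariance of the outputs Y0 at new points given
    the observed outputs Y, under Y = mu*1 + X beta, beta ~ N(0, sigma2*S)."""
    K = X @ S @ X.T                       # n x n, nonsingular when X has full row rank
    W = X0 @ S @ X.T @ np.linalg.inv(K)   # weights applied to the residuals
    mean = mu + W @ (Y - mu)
    cov = sigma2 * (X0 @ S @ X0.T - W @ (X @ S @ X0.T))
    return mean, cov

# Interpolation property: predicting at the design points recovers Y exactly
# (n = 3 design points, q = 5 monomial basis functions, so q > n).
X = np.vander(np.array([-1.0, 0.0, 1.0]), 5, increasing=True)
Y = np.array([1.0, 0.0, 1.0])
S = np.eye(5)
mean, cov = sip_posterior(X, X, Y, mu=0.0, sigma2=1.0, S=S)
```

With X0 = X, the weight matrix W reduces to the identity, so the posterior mean equals Y and the posterior covariance vanishes, exactly as stated above.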
One possible choice of the parameter θ is its maximum likelihood estimate (MLE). Alternatively, we can use the maximum a posteriori estimate (MAPE) (see Robert 2007, p. 166). Let p(θ) be the prior density of θ. The MAPE θ̂ of θ is obtained by maximizing

p(θ|Y) ∝ p_N(Y; μ1_n, σ²XSX^T)p(θ),   (9)

where p_N(Y; μ, Σ) denotes the pdf of the normal distribution with mean μ and covariance matrix Σ. If p(θ) ∝ p(r), that is, p(μ, σ²|r) ∝ 1, it can be shown that for fixed r, p(θ|Y) is maximized by setting μ and σ² to

μ̂ = [1_n^T(XSX^T)^{-1}Y]/[1_n^T(XSX^T)^{-1}1_n],   (10)
σ̂² = (Y − μ̂1_n)^T(XSX^T)^{-1}(Y − μ̂1_n)/n.   (11)

Thus, the MAPE r̂ of r can be found by minimizing the profile function of r. Apart from constants that do not depend on r, the profile function is

n ln σ̂² + ln det(XSX^T) − 2 ln p(r).

In this article, point predictions of the output and prediction credible intervals are obtained from (6) and (7) by setting θ to its MAPE. Uncertainty in estimating (2) and (3) is quantified by a Monte Carlo simulation that generates random samples of β from β|(Y, θ̂) given by (8).
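The profile criterion in r can be evaluated directly. The sketch below assumes a diagonal S with entries ∏_i r_i^{α_i} (as in Section 3) and p(r) ∝ ∏_i r_i; the function name and the toy one-dimensional monomial example are ours:

```python
import numpy as np

def profile_objective(r, X, alphas, Y):
    """Profile of -2 log posterior in r, up to additive constants:
    n*ln(sigma2_hat) + ln det(X S X^T) - 2*sum(ln r_i), where S is
    diagonal with entries prod_i r_i**alpha_i (smaller is better)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    n = len(Y)
    s = np.array([np.prod(r ** np.asarray(a)) for a in alphas])
    K = (X * s) @ X.T                     # X S X^T for diagonal S
    Kinv = np.linalg.inv(K)
    one = np.ones(n)
    mu_hat = (one @ Kinv @ Y) / (one @ Kinv @ one)       # profile estimate of mu
    resid = Y - mu_hat * one
    sigma2_hat = resid @ Kinv @ resid / n                # profile estimate of sigma^2
    _, logdet = np.linalg.slogdet(K)
    return n * np.log(sigma2_hat) + logdet - 2 * np.sum(np.log(r))

# Toy example: d = 1, n = 3 design points, q = 5 monomial basis functions.
X = np.vander(np.array([-1.0, 0.0, 1.0]), 5, increasing=True)
alphas = [(j,) for j in range(5)]
Y = np.array([0.8, 1.0, 0.9])
val = profile_objective([0.5], X, alphas, Y)
```

In practice this objective would be minimized over r in (0, 1)^d with a numerical optimizer.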
Until this point, we have treated P as a known quantity. We have found that as long as P is sufficiently large, point predictions and credible intervals are not sensitive to the choice of P. However, since the number of arithmetic operations needed to compute XSX^T increases with q, the computational effort of implementing the SIP increases with P. Thus, it makes sense to employ a not-too-large value of P that gives a sufficiently accurate model of the data. In this article, we select P using the procedure described in the next section.
In the uncertainty quantification literature, the stochastic collocation/nonintrusive approach estimates the coefficients in (1) with a computer experiment (Le Maître and Knio 2010; Xiu 2010). As mentioned in Section 1, the approach includes least squares, projection, and interpolation methods. In the first class of methods, the β_α's are estimated by ordinary or weighted least squares. Projection methods estimate β_α via a discrete approximation of the projection of Y onto the space spanned by ψ_α. The third class of methods estimates the coefficients by forcing (1) to interpolate the data. None of the existing methods allow quantification of uncertainty in estimating the β_α's.
It is clear from (5) that the SIP is a finite-dimensional GP model, where the term finite dimensional refers to the fact that (5) becomes a degenerate normal distribution when m is large enough. Under some conditions, a GP can be written as an infinite series with certain orthonormal functions and independent normally distributed coefficients, a result commonly called the Karhunen-Loève theorem (Ghanem and Spanos 1991; Steinberg and Bursztyn 2004). However, the GP models commonly used in computer experiments have stationary positive definite covariance functions (see Currin et al. 1991; Rasmussen and Williams 2006, chap. 4). In contrast, the SIP is equivalent to a GP model with a nonstationary covariance function that is not positive definite. More importantly, the SIP is a polynomial interpolator and it is perhaps the first such interpolator that is able to capture interpolation uncertainty.

CHOICE OF PRIORS FOR β AND r , AND SELECTION OF P
This section discusses the specification of the prior p(r) of r and the covariance matrix S of the prior β|θ ∼ N (0, σ 2 S). A procedure for selecting P is also described.
A key to making the SIP work in practice is to use a sparse parameterization of S that can be tuned to produce well-behaved low-curvature functions (which are frequently observed in practice) with high probability. A sparse parameterization of S is crucial to allow tractable and reliable estimation of the parameters of S. For simplicity, we assume that the β_α's are a priori independent. Since high-degree polynomial basis functions tend to contribute more to the curvature of the metamodel and higher order terms in Taylor expansions of functions tend to have decreasing importance, it makes sense to use a prior for β such that β_α has larger variance than β_{α'} when α_i ≤ α'_i (α_i is the ith component of α and α'_i is the ith component of α') for all i and α_k < α'_k for some k. Moreover, since different inputs tend to have different degrees of influence on the output, it is sensible to use a prior for β_α such that the decrease of the variance of β_α with an increase in α_i depends on i. Thus, a suitable prior for β_α is

β_α|θ ∼ N(0, σ² ∏_{i=1}^d r_i^{α_i}),   (12)

where r_i ∈ (0, 1). Note that if α_i ≤ α'_i for all i and α_k < α'_k for some k, then clearly ∏_{i=1}^d r_i^{α'_i} < ∏_{i=1}^d r_i^{α_i}, and that if r_i is close to zero, then the ith input is inert. A similar prior for the regression coefficients for two-level factorial designs is derived by Joseph (2006). Other priors such as mixture priors may be used but they will not be considered in this article. Because orthonormal polynomials are used as basis functions in (1), the contribution of each term β_α ψ_α(u), α ∈ N_+(d, P), to the variance of the output is the square of its coefficient when u has density w. This makes the magnitudes of all β_α's comparable so that the prior (12) can be used.
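Enumerating the multi-index set N_+(d, P) and the resulting diagonal of S under (12) is a small bookkeeping exercise; a sketch with function names of our own choosing:

```python
import numpy as np
from itertools import product

def multi_indices(d, P):
    """All d-tuples of nonnegative integers with sum <= P, i.e., N_+(d, P);
    its size is q = (d + P)!/(d! P!)."""
    return [a for a in product(range(P + 1), repeat=d) if sum(a) <= P]

def prior_var_diag(alphas, r):
    """Diagonal of S under prior (12): var(beta_alpha)/sigma^2 = prod_i r_i**alpha_i."""
    r = np.asarray(r, dtype=float)
    return np.array([np.prod(r ** np.asarray(a)) for a in alphas])

alphas = multi_indices(2, 2)              # q = (2 + 2)!/(2! 2!) = 6 multi-indices
diag = prior_var_diag(multi_indices(1, 3), [0.5])
```

With r_1 = 0.5 and d = 1, the prior variances decay as 1, 0.5, 0.25, 0.125 across degrees 0-3, illustrating the shrinkage of high-degree coefficients.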
In this article, we use independent priors for μ, σ², and r_1, ..., r_d. We set p(μ) ∝ 1, p(σ²) ∝ 1, and employ Beta priors for the r_i's. Thus, p(θ) ∝ ∏_{i=1}^d p(r_i), where each p(r_i) is a Beta density. We have found that when data are scarce, it is better to use larger values of r so that the credible intervals have good coverage. Therefore, we choose p(θ) ∝ ∏_{i=1}^d r_i to pull the MAPEs of the r_i's toward 1 when there is not much data. Note that the prior for r carries important influence when the data are scarce.
We now present a procedure for selecting P. Assume that θ and P are a priori independent, and that P is uniformly distributed on a finite set of values {P_min, P_min + 1, ..., P_max}. Then, the posterior probability of (P, θ) is given by

p(P, θ|Y) ∝ p(Y|θ, P)p(θ)I(P),

where I(P) = 1 if P ∈ {P_min, ..., P_max} and I(P) = 0 otherwise. As discussed in Section 2, the MAPE θ̂(P) of θ given P can be computed from (10) and (11). Thus, the MAPE of (P, θ) can be obtained by the minimization of ℓ(P, θ̂(P)) over all P ∈ {P_min, ..., P_max}, where ℓ(P, θ) = −2 ln[p(Y|θ, P)p(θ)]. It is often difficult to specify P_min and P_max that reflect real prior knowledge, and it is time consuming to compute ℓ(P, θ̂(P)) for large values of P. We propose setting P_min to be the smallest P such that q ≥ 2n and X has full row rank, and P_max to be the largest value of P that can be used without causing computational problems. In this article, we take P_max to be the largest value of P that gives q ≤ 10^4. To reduce the need to compute many ℓ(P, θ̂(P)), we propose selecting P to be P̂, the smallest local minimum of ℓ(P, θ̂(P)), instead of the MAPE. To find P̂, we compute ℓ(P_min, θ̂(P_min)), ℓ(P_min + 1, θ̂(P_min + 1)), and so on until an increase in the value of P by 1 increases ℓ(P, θ̂(P)). This procedure frequently terminates quickly because P̂ is often near P_min. Note that if this procedure is used, setting P_min to be as small as possible, that is, setting P_min to the smallest P such that q > n, often does not give good results. If P_min is set to the smallest P such that q > n, and if there is no basis function of degree P_min + 1 with a strong effect, the estimation method would often choose P = P_min. However, if P = P_min, the degrees of freedom q − n will often be small, which can easily lead to underestimation of the uncertainty about the true function. Finally, we point out that comparing different estimation methods or priors for P is an area needing further research.
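The upward scan for the smallest local minimum can be sketched as follows, where `objective` stands for the criterion −2 ln[p(Y|θ̂(P), P)p(θ̂(P))] evaluated at the fitted hyperparameters for each P (the helper name is ours):

```python
def select_P(objective, P_min, P_max):
    """Scan P upward from P_min and return the first local minimum of
    objective(P); stop as soon as increasing P by 1 increases the objective."""
    best_P, best_val = P_min, objective(P_min)
    for P in range(P_min + 1, P_max + 1):
        val = objective(P)
        if val > best_val:   # objective increased: previous P was a local minimum
            return best_P
        best_P, best_val = P, val
    return best_P            # objective never increased: return P_max

# With a toy convex objective, the scan stops at its minimizer.
chosen = select_P(lambda P: (P - 5) ** 2, 3, 10)
```

Each call to `objective` would in practice refit the SIP (recomputing the MAPE of θ for that P), so stopping early saves most of the work when the minimum sits near P_min.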

Designs for Uni-Dimensional Input Spaces
The Chebyshev points of the second kind in the interval [−1, 1] are given by

u^i = cos[(i − 1)π/(n − 1)], i = 1, ..., n.   (15)

These points are recommended for Lagrange interpolation (Phillips 2003; Trefethen 2013) and can be used as a design for the SIP. This design tends to give SIPs with small maximum prediction variance compared with SIPs obtained from a design with n equally spaced points. To illustrate, Figure 1 plots the SIP point predictions and 95% credible intervals for the famous Runge function y(u) = 1/(1 + 25u²) obtained with 11 Chebyshev points D_1 (computed from (15)) (1(a) and 1(b)), the design D_2 = {−1, −0.8, ..., 1} (1(c) and 1(d)), and the design D_3 = {−10/11, −8/11, ..., 10/11} (1(e) and 1(f)). Figures 1(a), 1(c), and 1(e) show the predictions of SIPs constructed with orthonormal Legendre polynomials (Legendre-SIP), that is, the ψ_α's in (1) are normalized Legendre polynomials. Figures 1(b), 1(d), and 1(f) show the predictions of SIPs constructed with orthonormal Chebyshev polynomials (Chebyshev-SIP), that is, the ψ_α's in (1) are normalized Chebyshev polynomials. The estimate P̂ of P is 22 in all cases and the estimate r̂_1 of r_1 ranges from 0.768 to 0.812. By comparing Figures 1(a), 1(c), and 1(e) and also Figures 1(b), 1(d), and 1(f), we see that the SIP point predictions (dashed line) are insensitive to the design. However, the predictions are less accurate and the credible interval width increases drastically near the endpoints ±1 when design D_2 or D_3 is employed instead of D_1. The Lagrange interpolating polynomials, that is, polynomials of degree P = n − 1 that interpolate the data, for designs D_1, D_2, and D_3 are also plotted in Figure 1. We see that the Lagrange polynomial is very sensitive to the choice of design. It performs well when D_1 is employed but it gives wild predictions near the endpoints when D_2 or D_3 is used. This is the well-known Runge phenomenon.
Thus, SIP is a good alternative to Lagrange interpolation when the choice of interpolation points is not fully under the control of the model builder. In Section S1 of the online supplement, we provide a heuristic explanation about why the Chebyshev points are better than equally spaced points for constructing SIPs.
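The Runge phenomenon described above is easy to reproduce: interpolating the Runge function with a degree n − 1 polynomial at 11 equally spaced points gives large errors near ±1, while the Chebyshev points of (15) do not. A sketch (our illustration; `np.polyfit` with degree n − 1 through n points recovers the Lagrange interpolant):

```python
import numpy as np

n = 11
cheb = np.cos((np.arange(1, n + 1) - 1) * np.pi / (n - 1))  # the n points in (15)
equi = np.linspace(-1.0, 1.0, n)                            # equally spaced points

def runge(u):
    return 1.0 / (1.0 + 25.0 * u ** 2)

grid = np.linspace(-1.0, 1.0, 2001)

def lagrange_max_err(pts):
    # the unique degree n-1 polynomial through the n points
    coef = np.polyfit(pts, runge(pts), len(pts) - 1)
    return np.max(np.abs(np.polyval(coef, grid) - runge(grid)))

err_cheb, err_equi = lagrange_max_err(cheb), lagrange_max_err(equi)
```

The maximum error at the Chebyshev points is far below that at the equally spaced points, whose interpolant oscillates wildly near the endpoints.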

Designs for Multi-Dimensional Input Spaces
In this section, we discuss the choice of an initial/exploratory design for multi-dimensional input spaces of the form [−1, 1]^d. We have demonstrated that Chebyshev points are good for one-dimensional interpolation. In a multi-dimensional input space, designs that are Cartesian products of Chebyshev points may be used. However, the number of design points can quickly become too large as d increases. Space-filling designs are commonly used to build GP emulators in moderate- to high-dimensional input spaces with a small number of runs. These designs are model independent (not constructed to optimize criteria specific to a statistical model) and can achieve good spatial representativeness of the design region with relatively few runs compared to grid designs. Examples of popular space-filling designs include maximin Latin hypercube designs (maximin LHDs) (Morris and Mitchell 1995) and Sobol sequences (Lemieux 2009). However, when d = 1, it has been demonstrated in Section 4.1 that equally spaced designs, which are space-filling designs, are poor choices for constructing SIPs. Nonetheless, it is not clear whether space-filling designs are poor choices for d ≥ 2. In this article, we shall propose a class of transformed space-filling designs that combines some desirable features of Chebyshev points and space-filling designs. These designs can have any specified number n ≥ 2 of runs in d dimensions, and they give one-dimensional projections whose distributions converge to the asymptotic distribution of the Chebyshev points.
We propose the following general cosine transformed space-filling design. Let D_U be an n-point space-filling design such that each one-dimensional projection of the design converges to a uniform distribution on [0, 1] as n → ∞, that is, the proportion of points in any subinterval of [0, 1] is proportional to its length. Then, the cosine transformed space-filling design is given by

D = cos[(1 − D_U)π],   (16)

where the multiplication by π and cosine function are applied elementwise to the D_U matrix. This design can be justified as follows. Let U ∼ unif([−1, 1]) and T = cos(Uπ). Then, for t ∈ (−1, 1), P(T ≤ t) = P(|U| ≥ arccos(t)/π) = 1 − arccos(t)/π. Thus, for t ∈ (−1, 1), the pdf of T is

f(t) = 1/[π√(1 − t²)].   (17)

The density given in (17) is the asymptotic density of the Chebyshev points, that is, when n → ∞, the proportion of Chebyshev points in each subinterval of [−1, 1] converges to the value given by the integral of f(t) over the subinterval. It is known that if F : [−1, 1] → R is a function that can be analytically continued to a function that is analytic in a neighborhood of [−1, 1], then any system of points with the asymptotic density given in (17) will give a Lagrange interpolator ζ that converges at a geometric rate or faster to F when n → ∞, that is, ‖ζ − F‖_∞ = O(K^{-n}) for some K > 1 (Trefethen 2013). In view of this and the discussion in Section S1 of the online supplement, we expect the proposed cosine transformed space-filling design to work well when the computer output can be analytically continued and the response variation is mainly due to a single input. In Section 5, we show that cosine transformed space-filling designs obtained by taking D_U as a maximin LHD, which we call cosine maximin LHDs, outperform maximin LHDs for the purpose of constructing SIPs. Note that cosine maximin LHDs have been proposed by Dette and Pepelyshev (2010) for fitting GP models.
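The transform (16) and the limiting arcsine density (17) can be checked empirically with a simple (non-maximin) Latin hypercube in place of a carefully optimized D_U; the function name and construction details below are our own sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_lhd(n, d, rng):
    """Latin hypercube on [0, 1]^d followed by the cosine transform (16):
    each entry u is mapped to cos((1 - u) * pi)."""
    # simple LHD: one stratified uniform draw per cell in each column
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
    U = (strata + rng.random((n, d))) / n
    return np.cos((1.0 - U) * np.pi)

D = cosine_lhd(2000, 1, rng)
# One-dimensional projections should approach the arcsine density
# f(t) = 1/(pi*sqrt(1 - t^2)), whose cdf is F(t) = 1 - arccos(t)/pi.
emp = np.mean(D[:, 0] <= 0.5)
theo = 1 - np.arccos(0.5) / np.pi
```

The empirical proportion of transformed points below 0.5 closely matches the arcsine cdf value 2/3, and the points visibly pile up near the endpoints ±1, as Chebyshev points do.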
Sequential designs can be very useful in applications where a given degree of interpolation accuracy needs to be achieved and the computing cost is to be kept as small as possible. In Section S2 of the online supplement, we propose a sequential design approach to achieve good prediction over the entire design region χ .

SIMULATION STUDIES
This section reports the results of a simulation study. The objectives of the simulation are to compare the prediction accuracy of SIPs and GP emulators, to compare the performance of the cosine maximin LHD and the maximin LHD for constructing SIPs, and to assess the accuracy of the SIP in estimating sensitivity indices.

Simulation Study 1
We perform a simulation with two main objectives. The first is to compare the accuracy of the SIP and the stationary GP emulator with Gaussian and Matérn correlation functions. The second is to compare the effectiveness of the maximin LHD and cosine maximin LHD.
Six test functions, that is, Rosenbrock, Tilden, Weld Shear Stress, Two-Bar Truss Safety Constraint, Product Peak, and Borehole Flow Rate, with input dimensions ranging from three to eight are employed. The Rosenbrock test functions are a popular class of test functions in the optimization literature. These functions are polynomial functions of degree 4. In this study, we work with a Rosenbrock function in three dimensions. The Tilden function, which has four input variables, describes the behavior of a chemical reaction system over time (Saltelli 2000). Both Weld Shear Stress and Two-Bar Truss Safety Constraint functions are given in Mullur and Messac (2006). The former gives the logarithm of the shear stress in the welding joint of a structure while the latter is the amount by which the compressive stress exceeds the critical buckling stress in a Two-Bar Truss expressed as a fraction of the critical buckling stress. The Product Peak function is used as a test function in Barthelmann, Novak, and Ritter (2000) to numerically demonstrate the convergence of sparse grid interpolation. Finally, the Borehole Flow Rate function is introduced in Morris, Mitchell, and Ylvisaker (1993). Note that most of the test functions are real albeit simple physical models (Tilden, Weld Shear Stress, Two-Bar Truss Safety Constraint, and Borehole Flow Rate) while some of them are very popular test functions (Rosenbrock and Product Peak). All test functions are defined in Appendix A and their inputs are linearly transformed so that the design region is χ = [−1, 1] d .
We run simulations with a design size of n = 7d for all the test functions. For the Rosenbrock and Tilden functions, we also run simulations with design size n = 10d. Thus, there are a total of eight test cases. Two designs are tested: the maximin LHD and the cosine maximin LHD. In each simulation run, we first generate a good maximin LHD D_U on [0, 1]^d. This is done by using the Matlab function lhsdesign to generate 500 maximin LHDs and choosing the best (with respect to the maximin criterion) of the generated maximin LHDs. Note that the lhsdesign function generates maximin LHDs by randomly simulating a number of Latin hypercube designs and selecting the design that performs best. The maximin LHD D_U is then transformed into a cosine maximin LHD cos[(1 − D_U)π] and a maximin LHD 2D_U − 1 on χ = [−1, 1]^d. The reason for constructing the two designs in this fashion is to induce positive correlation (the former design looks like the latter design with points pushed toward the boundary of χ) in the simulation results, which will allow more precise comparison. These two designs, that is, the cosine maximin LHD and the maximin LHD on χ, are used to construct SIPs and GP emulators. We have found that they always give a model matrix X with full row rank if P is chosen such that q ≥ 2n. In each simulation run, two different SIP models, that is, Legendre-SIP and Chebyshev-SIP, are built using each of the two designs. The Legendre-SIPs are obtained by taking the ψ_α's to be tensor products of orthonormal Legendre polynomials (similarly for Chebyshev-SIP). Stationary GP emulators with product Gaussian correlation function and product Matérn correlation function with smoothness parameter ν = 2.5 (Santner, Williams, and Notz 2003) are also fitted using both designs. Thus, a total of eight metamodels are constructed in each simulation run, as indicated by the model-design combinations presented in Table 1.
For the GP emulators, the mean, variance, and correlation parameters are estimated by maximum likelihood. Details of the optimization procedures used to obtain the MAPE of r for the SIPs and the MLE of the correlation parameters for the GP emulators are given in Appendix B. For each test case, we run 100 simulations. As mentioned above, results for the eight model-design combinations given in Table 1 are expected to be positively correlated within a simulation because any two combinations use either the same or positively correlated designs. In other words, each simulation run is a random block, which allows more precise comparison of the performance of the eight model-design combinations. The simulation mean of the estimate of P is between 3 and 6.2 for the four SIP model-design combinations in all test cases. One of the performance measures computed in the simulations is the percent root mean squared prediction error (PRMSE). The PRMSE is the square root of the average squared prediction error in a test set, expressed as a percentage of the standard deviation of the true output values in the test set. In the simulations, the test set consists of points from a 1000-run maximin LHD (generated with Matlab lhsdesign and different for each simulation run) and all 2^d corner/factorial points. Table 2 gives the simulation mean and standard deviation of the PRMSE for each model-design combination (labeled A–H according to Table 1) and test case. Four different mean comparisons are of interest. The first is the PRMSE of SIPs constructed using maximin LHDs versus the PRMSE of SIPs constructed using cosine maximin LHDs, which is tested with the (paired) t-statistic T_1 for the contrast (A + C)/2 − (B + D)/2. The second is the PRMSE of the Legendre-SIP versus the PRMSE of the Chebyshev-SIP, which is tested with the t-statistic T_2 for the contrast (A + B)/2 − (C + D)/2.
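The PRMSE as defined above is straightforward to compute from a test set; the short Python sketch below illustrates the definition (the function name prmse is ours, not the paper's).

```python
import numpy as np

def prmse(y_true, y_pred):
    """Percent root mean squared prediction error: the RMSE over the test
    set expressed as a percentage of the standard deviation of the true
    output values in the test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.std(y_true)
```

For example, a predictor whose errors all equal one standard deviation of the true outputs has a PRMSE of 100.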
The third is the PRMSE of GPs with the Gaussian correlation function versus the PRMSE of SIPs constructed using cosine maximin LHDs, which is tested with the t-statistic T_3 for the contrast (E + F)/2 − (B + D)/2. The fourth is the PRMSE of GPs with the Matérn correlation function versus the PRMSE of SIPs constructed using cosine maximin LHDs, which is tested with the t-statistic T_4 for the contrast (G + H)/2 − (B + D)/2. The t-statistics are given in Table 2. It is seen that except for the Tilden function with n = 28 and the Weld Shear Stress function, the T_1 statistic is significant and positive (>2). Thus, we may conclude that cosine maximin LHDs are generally better for constructing SIPs than maximin LHDs. Table 2 indicates that the reduction in PRMSE achieved with cosine maximin LHDs can be substantial. The T_2 statistic is significant and positive in one case and significant and negative (<−2) in three cases. Thus, the Legendre-SIP appears to be better than the Chebyshev-SIP, but only slightly, as indicated by the mean PRMSE values. Except for the Rosenbrock function with n = 21 and the Product Peak function, the T_3 and T_4 statistics are significant and positive. This conclusion does not change if we compare only with GPs fitted with the cosine maximin LHD (i.e., if we redefine T_3 to test F − (B + D)/2 and T_4 to test H − (B + D)/2), which Dette and Pepelyshev (2010) argued is superior to the maximin LHD. Thus, GP emulators with Gaussian and Matérn correlation functions generally have poorer prediction accuracy than SIPs built with cosine maximin LHDs. In fact, Table 2 indicates that the reduction in PRMSE achieved with SIPs fitted with cosine maximin LHDs over GP emulators can be quite large.
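Because each simulation run acts as a block, the contrast t-statistics above are ordinary paired t-statistics computed from per-run contrast values. A minimal Python sketch, with an illustrative function name and data layout of our own choosing:

```python
import numpy as np

def paired_contrast_t(prmse_by_combo, contrast):
    """Paired t-statistic for a contrast across model-design combinations.
    prmse_by_combo: dict mapping a combination label (e.g. 'A'..'H') to an
    array of PRMSE values, one per simulation run (runs are blocks).
    contrast: dict mapping labels to weights, e.g. for T_1 use
    {'A': 0.5, 'C': 0.5, 'B': -0.5, 'D': -0.5}."""
    # One contrast value per simulation run (per block).
    runs = sum(w * np.asarray(prmse_by_combo[k], dtype=float)
               for k, w in contrast.items())
    n = len(runs)
    # Paired t-statistic: mean of the per-run contrasts over its standard error.
    return runs.mean() / (runs.std(ddof=1) / np.sqrt(n))
```

A value exceeding 2 in absolute terms is treated as significant in the comparisons above.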
The simulation means and standard deviations of the percent coverage of 95% credible intervals given by the SIPs and GP emulators are shown in Table 3. The percent coverage is the percentage of credible intervals in the test set that contain the true output value. The credible intervals for the SIPs are obtained from a normal distribution with mean and variance given by (6) and (7), while the credible intervals for the GP models are obtained from a t-distribution with n − 1 degrees of freedom, as described in Santner, Williams, and Notz (2003). Table 3 shows that the SIP and the GP model with Matérn correlation function have much closer-to-nominal coverage than the GP model with Gaussian correlation function (compare A, C, E, G for the maximin LHD and B, D, F, H for the cosine maximin LHD). The table also shows that in many cases, the SIP has closer-to-nominal coverage on average than the GP model with Matérn correlation function.
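The percent coverage can be computed directly from the posterior means and standard deviations at the test points. The Python sketch below illustrates the SIP case, where the intervals are normal; the function name is ours, and z = 1.96 is the usual normal 97.5% quantile (for the GP emulators one would substitute the corresponding t-quantile).

```python
import numpy as np

def percent_coverage(y_true, post_mean, post_sd, z=1.96):
    """Percent of 95% credible intervals, formed as mean +/- z * sd,
    that contain the true output values in the test set."""
    y_true, m, s = (np.asarray(a, dtype=float)
                    for a in (y_true, post_mean, post_sd))
    inside = (y_true >= m - z * s) & (y_true <= m + z * s)
    return 100.0 * inside.mean()
```

Values close to 95 indicate close-to-nominal coverage; values well below 95 indicate overconfident intervals.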
It is of interest to determine whether cosine maximin LHDs can give smaller maximum prediction variance than maximin LHDs, because it is demonstrated in Section 4.1 that Chebyshev points give smaller maximum prediction variance than evenly spaced points when d = 1. It is also of interest to compare the average prediction variance over the design region achieved with the two designs. Table 4 presents the simulation means of the maximum and average standard deviation in the test set, expressed as a fraction of the range of the true function values in the test set, for cases A–D. Expressing the maximum and average standard deviation as a fraction of the range allows us to assess whether the standard deviation given by (7) is of reasonable magnitude. We see that the maximum and average standard deviations are small (around 0.1 or less) compared to the range of the function in most cases. We have observed that the maximum standard deviation over χ often occurs at the corner points. Because the test set always includes these points, the maximum standard deviations over the test set and over χ are likely to be nearly the same in most simulation runs. Table 4 also gives the t-statistics for testing the difference between the cosine maximin LHD and the maximin LHD under the columns headed A−B and C−D. For five of the test cases (Rosenbrock with n = 30, Tilden with n = 28 and n = 40, Weld Shear Stress, and Product Peak), the maximum standard deviation is significantly smaller on average when the Legendre-SIP or Chebyshev-SIP is constructed with a cosine maximin LHD instead of a maximin LHD. For two test cases, the opposite occurs. However, cosine maximin LHDs give a larger average standard deviation than maximin LHDs in many cases. In summary, SIPs constructed with cosine maximin LHDs have better accuracy than stationary GP models with Gaussian and Matérn correlation functions. In addition, SIPs give prediction credible intervals with better coverage than the GP models.
Cosine maximin LHDs are better than maximin LHDs for constructing SIPs because they generally give more accurate SIPs and, in some cases, yield SIPs with smaller maximum prediction variance.

Estimation of Total Sensitivity Indices
Because estimation of variance decompositions and sensitivity indices is a key application of polynomial chaos models, we consider estimation of total sensitivity indices with the SIP. If y = y_0 + Σ_i y_(i) + Σ_{i<j} y_(i,j) + · · · + y_(1,...,d) denotes the functional ANOVA decomposition of y (and similarly for the model Y in (1)), then it can be shown that the squared error ‖Y − y‖_w^2 = ∫_χ (Y(u) − y(u))^2 w(u) du decomposes into the sum of the squared errors of the corresponding ANOVA components. Thus, if Y is a good approximation of y, then all the functional ANOVA components of Y given by (2) are good approximations of the corresponding components of y, and Monte Carlo samples from the posterior distribution of β in (8) can be used to compute the posterior mean and quantiles of T_i. We use the data from the simulation reported in Section 5.1 to assess the accuracy of the total sensitivity index estimates for the Tilden and Weld Shear Stress functions. We assume that w is the uniform distribution so that the Legendre-SIP is the appropriate SIP to use. In each of the 100 simulation runs, we simulate 1000 random samples from the posterior distribution of β for the Legendre-SIP constructed with a cosine maximin LHD to obtain 1000 random samples of T_1, . . . , T_d. These samples are then used to determine the posterior mean and 95% credible intervals for each T_i. The simulation means of these point and interval estimates are given in Table 5. We also obtain Monte Carlo estimates of the total sensitivity indices using the soboljansen function in the sensitivity package of R with two samples of 50,000 uniform random numbers. The Monte Carlo estimates are presented in the last row of Table 5. We see that the SIP and Monte Carlo point estimates are in close agreement. Moreover, the mean credible intervals obtained with the SIP are quite narrow.
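For a polynomial model with an orthonormal basis, the variance decomposition is available in closed form from the coefficients, so each posterior draw of β yields a draw of T_1, . . . , T_d. The Python sketch below illustrates this computation under that orthonormality assumption; the function name and data layout are illustrative, not the paper's.

```python
import numpy as np

def total_sensitivity_samples(beta_samples, multi_indices):
    """Posterior samples of total sensitivity indices T_i for a polynomial
    model with an orthonormal basis. beta_samples: (n_samples, q) draws of
    the coefficient vector; multi_indices: (q, d) integer array whose row
    alpha gives the degree of each input in the basis function psi_alpha
    (a row of zeros marks the constant term). For an orthonormal basis,
    the output variance is the sum of squared non-constant coefficients,
    and T_i sums the squared coefficients of all terms involving input i."""
    multi_indices = np.asarray(multi_indices)
    nonconst = multi_indices.sum(axis=1) > 0
    sq = np.asarray(beta_samples, dtype=float) ** 2
    total_var = sq[:, nonconst].sum(axis=1)      # one total variance per draw
    d = multi_indices.shape[1]
    T = np.empty((sq.shape[0], d))
    for i in range(d):
        involves_i = nonconst & (multi_indices[:, i] > 0)
        T[:, i] = sq[:, involves_i].sum(axis=1) / total_var
    return T
```

Posterior means and 95% credible intervals for each T_i are then just the column means and the 2.5% and 97.5% quantiles of the returned samples.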

CONCLUSION
This article proposes a new interpolator based on orthonormal polynomials for computer experiments, called SIP. It models the true output with a polynomial of degree P large enough so that there are more polynomial coefficients than observations. An intuitively reasonable default prior for the model coefficients is proposed. The posterior normal distribution of the output conditioned on hyperparameters allows statistical inference on the output to be easily made. The hyperparameters are set to their MAPEs and a simple method for selecting P is proposed.
It is shown that for univariate interpolation with the SIP, Chebyshev points give small maximum prediction variance. It is also shown that the posterior mean of the output is not sensitive to the design, unlike Lagrange interpolation. For multivariate interpolation, we employ cosine maximin LHDs to construct SIPs. It is demonstrated that SIPs constructed with these designs generally have better prediction accuracy than SIPs constructed with maximin LHDs. Simulation results also demonstrate that SIPs constructed with cosine maximin LHDs tend to provide more accurate predictions and better credible interval coverage than GP models. A key computational advantage of the SIP over GP models is the ease with which the functional ANOVA and variance decompositions needed for global sensitivity analysis can be computed. For this application, we have demonstrated that the SIP gives reliable estimates of sensitivity indices.
A disadvantage of the SIP is that it cannot model some high-dimensional functions well. Computer memory and speed limit the maximum number of basis functions q that can be employed. Thus, the maximum P that can be used decreases with d, so large values of P cannot be used when d is large. However, few, if any, metamodels can successfully approximate complex high-dimensional functions with small designs. Many methods can be successful when a high-dimensional function has a handful of significant inputs; in these cases, the modeling problem essentially reduces to a low-dimensional one. Further research is needed to determine how best to model such functions with SIPs.

SUPPLEMENTARY MATERIALS
Additional Material and Appendices.pdf: This file contains Supplementary Section 1: Heuristic Explanation of Good Performance of SIP Fitted with Chebyshev Points; Supplementary Section 2: Design of Follow-Up Runs; and Appendices A and B.
Programs and Data.zip: The zip file contains Matlab codes for generating cosine maximin LHDs and for constructing Legendre-SIPs and Chebyshev-SIPs. Matlab codes for generating data from the test functions employed in this article, and for fitting the GP model with Gaussian correlation function, are also given.