Monotonic Metamodels for Deterministic Computer Experiments

In deterministic computer experiments, it is often known that the output is a monotonic function of some of the inputs. In these cases, a monotonic metamodel will tend to give more accurate and interpretable predictions, with less prediction uncertainty, than a nonmonotonic metamodel. The widely used Gaussian process (GP) models are not monotonic. A recent article in Biometrika offers a modification that projects GP sample paths onto the cone of monotonic functions. However, that approach does not account for the fact that the GP model is more informative about the true function at locations near design points than at locations far away. Moreover, it uses a grid-based method, which is memory intensive and gives predictions only at grid points. This article proposes a weighted projection approach that uses the information in the GP model more effectively, together with two computational implementations. The first is isotonic regression on a grid, while the second is projection onto a cone of monotone splines, which alleviates the problems faced by a grid-based approach. Simulations show that the monotone B-spline metamodel gives particularly good results. Supplementary materials for this article are available online.


INTRODUCTION
When a computer model is time-consuming to run, a metamodel can be constructed to serve as a quick-to-evaluate approximation. In many engineering problems, it is known from physical considerations that the output is a monotonic function of some of the inputs. For example, the strength of a welding joint is increasing in the size of the weld. The critical buckling stress of a two-bar truss increases with the diameter and Young's modulus of the truss members. When a solid body is immersed in a fluid with a higher temperature, the temperature at any point in the body increases with time and the convective heat transfer coefficient. In these cases, a metamodel that uses the monotonicity information will tend to be more accurate and interpretable than a nonmonotonic metamodel. Moreover, prediction intervals constructed with a monotonic metamodel tend to be tighter because nonmonotonic predictions are excluded. Unfortunately, monotonicity information is rarely exploited in computer experiments. This article proposes statistical approaches for building monotonic metamodels in deterministic computer experiments. For deterministic computer experiments with more than one input, there are three existing methods that can be employed to construct monotonic metamodels that give both point and interval predictions. The first is the L2 projection approach proposed by Lin and Dunson (2014), which projects the GP sample paths onto the cone of monotone functions. Although Lin and Dunson (2014) discussed the possibility of a weighted projection approach, they focused on the unweighted projection. The second is the rearrangement approach proposed by Chernozhukov, Fernandez-Val, and Galichon (2009). This approach transforms the point predictor and credible limits of a GP into quantile functions induced by a uniform distribution for the inputs. Computational implementation of both approaches requires working with a grid in the input space.
Thus, predictions at points not on a grid require a monotonic interpolator that interpolates the predictions on the grid. Selection of grid and interpolator is important to achieve small discretization error without unnecessarily high computational requirements. Lin and Dunson (2014) and Chernozhukov, Fernandez-Val, and Galichon (2009) did not address this problem. Predictors obtained from the projection and rearrangement procedures can have undesirable characteristics. These include interval predictions that are wide at the design points, point predictions with sharp corners (nondifferentiability), and prediction interval limits with sharp corners at nondesign points. Interval predictions that are wide at the design points are undesirable for deterministic data. Nondifferentiability is undesirable when the true function is known to be differentiable. The third method is the minimax predictor proposed by Beliakov (2005). This predictor often yields poor prediction accuracy, and prediction bounds that are too narrow at some regions and too wide elsewhere.
Other approaches for exploiting monotonicity information in computer experiments were proposed by Riihimäki and Vehtari (2010), Wang (2012), and Golchi et al. (2015). However, these methods only encourage the derivative of the GP to be positive at a finite set of points. The GP can be highly nonmonotonic if the set of points is small, but computation can be intractable otherwise. There is a vast literature on monotonic regression, but most of the methods are for observational data with large random errors and one or two independent variables. Important methods include the pooled adjacent violators (PAV) algorithm (Barlow et al. 1972), I-splines (Ramsay 1988), B-splines (Brezger and Steiner 2008; He and Ng 1999), Bernstein bases (Tan 2016), and monotonic kernel regression (Du, Parmeter, and Racine 2013). In deterministic computer experiments, the output is observed without random error. Because of this, the output is known at the design points, and there is less uncertainty about the response near the design points due to the known continuity or differentiability of the function. Motivated by this observation, this article proposes a weighted projection approach to construct monotonic metamodels from GP models. The weights are the inverse of the GP variances, truncated to avoid infinite values. Thus, higher weights are given to regions near design points, and the weighted projection tends to be closer to the GP near the design points than the unweighted projection. Two computational implementations are considered. The first method is weighted isotonic regression on a grid. We propose a monotone interpolator and a grid selection method that aim to achieve low discretization error with a small number of grid points. In the second method, the sample paths of the GP model are projected onto a cone of monotonic spline functions. The method constructs a monotone B-spline model that approximates the weighted projection of the GP with a small number of model terms.
The model is differentiable if the degrees of the B-spline bases are greater than one and is well defined everywhere. The proposed methods are computationally tractable and provide point and interval predictions for problems with several inputs. For both methods, the point and interval predictions can be forced to nearly interpolate the data points for problems with one and two inputs using small convergence tolerances. In higher dimensions, it is difficult to achieve small prediction interval lengths at the data points without incurring high computational burden. Thus, both methods are tuned to yield noninterpolating interval predictions. However, the point predictions are often nearly interpolating and the methods produce better accuracy and shorter prediction intervals than the unweighted projection.
The remainder of the article is organized as follows: Section 2 reviews GP modeling. Section 3 describes the projection approach introduced by Lin and Dunson (2014). The section also presents the weighted projection approach and a grid-based computational implementation using isotonic regression. In Section 4, we propose the monotone B-spline metamodel. A real example is given in Section 5 and conclusions are given in Section 6.

REVIEW OF GAUSSIAN PROCESS MODELS
In GP modeling, the prior for the functional relationship F : [0, 1]^d → ℝ between output and inputs is given by a continuous GP

Y(x) = λ + G(x), (1)

where G(x) is a zero-mean stationary GP. Given two points x_i and x_j, the covariance of Y(x_i) and Y(x_j) is

cov[Y(x_i), Y(x_j)] = σ² R(x_i − x_j | θ), (2)

where R is the correlation function. In a computer experiment, the output is evaluated at n values of the inputs in the design D = {x_1, …, x_n}. This yields a vector Y = (Y_1, …, Y_n)^T of observed outputs. The prior process is updated with the data, giving a posterior GP (Sacks et al. 1989; Currin et al. 1991) with mean function μ(· | λ, θ) and covariance function C(·, · | σ², θ). The mean function is

μ(x | λ, θ) = λ + r(x)^T R^{−1}(Y − λ1), (3)

where r(x) = (R(x − x_j | θ))_{1≤j≤n}, R = (R(x_i − x_j | θ))_{1≤i,j≤n}, and 1 is an n × 1 vector of 1's. The covariance function is given by

C(x, x′ | σ², θ) = σ²[R(x − x′ | θ) − r(x)^T R^{−1} r(x′)]. (4)

The mean and covariance functions given by (3)-(4) depend on the parameters λ, σ², and θ, which can be estimated from the data. One approach to estimating λ, σ², and θ is the maximum likelihood method. It turns out that given θ, the maximizing values are

λ̂ = (1^T R^{−1} 1)^{−1} 1^T R^{−1} Y,  σ̂² = (Y − λ̂1)^T R^{−1}(Y − λ̂1)/n. (5)

It is common to perform statistical inference on F using θ̂ in place of θ, λ̂ in place of λ, and σ̃² = nσ̂²/(n − 1) (an unbiased estimator of σ²) in place of σ², that is, using the GP

Y(·) | Y, λ̂, σ̃², θ̂ ∼ GP(μ(· | λ̂, θ̂), C(·, · | σ̃², θ̂)). (6)

This can be viewed as an empirical Bayes approach. In GP modeling of computer experiments, it is common to employ the product Matérn correlation function. In this article, we employ the product Matérn correlation function with smoothness parameter 1.5:

R(x_1 − x_2 | θ) = ∏_{i=1}^d (1 + √3 |x_{1i} − x_{2i}| / θ_i) exp(−√3 |x_{1i} − x_{2i}| / θ_i), (7)

where x_1 = (x_{11}, …, x_{1d}). This correlation function gives once-differentiable sample paths.
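As an illustration of the posterior mean update above, the following Python sketch computes μ(x | λ, θ) for a one-dimensional GP with the Matérn-1.5 correlation. This is a minimal sketch, not the authors' code: the scaling of θ inside the correlation is one common parametrization and is our assumption, λ and θ are taken as given rather than estimated, and the function names are ours.

```python
import math

def matern15(h, theta):
    # Matern correlation with smoothness 1.5; the sqrt(3)/theta scaling is
    # one common parametrization (an assumption here, not the paper's exact form)
    a = math.sqrt(3.0) * abs(h) / theta
    return (1.0 + a) * math.exp(-a)

def posterior_mean_1d(x, X, Y, lam, theta):
    # Posterior (kriging) mean mu(x | lam, theta) for a 1-d GP with constant
    # prior mean lam: mu(x) = lam + r(x)^T R^{-1} (Y - lam * 1).
    n = len(X)
    R = [[matern15(X[i] - X[j], theta) for j in range(n)] for i in range(n)]
    b = [y - lam for y in Y]
    # Solve R z = (Y - lam) by naive Gaussian elimination with partial
    # pivoting (fine for tiny n; use a linear algebra library in practice).
    A = [R[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r_: abs(A[r_][col]))
        A[col], A[piv] = A[piv], A[col]
        for r_ in range(col + 1, n):
            f = A[r_][col] / A[col][col]
            for c_ in range(col, n + 1):
                A[r_][c_] -= f * A[col][c_]
    z = [0.0] * n
    for r_ in range(n - 1, -1, -1):
        s = A[r_][n] - sum(A[r_][c_] * z[c_] for c_ in range(r_ + 1, n))
        z[r_] = s / A[r_][r_]
    r = [matern15(x - X[i], theta) for i in range(n)]
    return lam + sum(r[i] * z[i] for i in range(n))
```

Because r(x_i) equals the ith row of R, the posterior mean reproduces the observed outputs exactly at the design points, which is the interpolation property the weighted projection later exploits.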

PROJECTION AND WEIGHTED PROJECTION APPROACHES
In this section, we review the projection approach proposed by Lin and Dunson (2014) and introduce a weighted modification of their approach. The weighted isotonic regression on a grid method for implementing the proposed approach is discussed. We address two important implementation issues not addressed by Lin and Dunson (2014): the choice of grid and the choice of interpolator. For simplicity, we consider building metamodels that are increasing in all inputs. It is straightforward to modify the approaches in this article to construct metamodels that are increasing in some of the inputs only. Let M denote the closed convex cone of all increasing functions on [0, 1]^d and Y denote the continuous posterior process given by (6). Lin and Dunson (2014) proposed a monotonic metamodel that is the unweighted projection of Y onto M:

P_M Y = argmin_{f ∈ M} ∫_{[0,1]^d} [Y(x) − f(x)]² dx. (8)

There are a few problems with this approach for deterministic computer experiments. First, it does not account for the fact that the GP model is more informative about the true function at locations near design points than at locations far from design points. The projection minimizes the integrated squared error, which gives uniform weight to deviations from Y everywhere in [0, 1]^d. However, we know that Y predicts more accurately near design points. Thus, P_M Y may yield credible intervals that look wide at the data points (x_1, Y_1), …, (x_n, Y_n). This phenomenon is seen in Figure 1(b). Second, computational implementation of the approach involves performing isotonic regression on posterior GP sample paths on a grid, which needs to be selected. For prediction at locations not on the grid, a monotonic interpolator needs to be used. However, these issues are not addressed by Lin and Dunson (2014). While fine grids can be used in problems with one and two inputs, they are infeasible for higher dimensional problems.
When coarse grids are employed, the choice of grid points and the choice of the interpolator can have substantial impact on prediction accuracy and credible interval coverage.
The first problem stated above can be overcome, in theory, by constructing a monotonic metamodel with a weighted projection whose weight function is inversely proportional to the posterior variance. The weighted projection of Y is

P_M Y = argmin_{f ∈ M} ∫_{[0,1]^d} w(x)[Y(x) − f(x)]² dx, (9)

where w(x) = 1/max{var[Y(x)], B} and B is a small positive number. We do not set w(x) equal to the inverse of the posterior variance var[Y(x)] in order to avoid infinite weights at the design points x_1, …, x_n. Since w(x) is very large (and maximized) at x_1, …, x_n, the weighted projection P_M Y should tend to nearly interpolate the data points (x_1, Y_1), …, (x_n, Y_n). This can be seen in Figure 1(e).
In practice, only a discrete approximation of P_M Y can be computed. Suppose that the values of Y on an L_1 × ⋯ × L_d grid are denoted by {Y(i_1, …, i_d) : i_j = 1, …, L_j; j = 1, …, d}. Then, the values of P_M Y on the grid are approximated by minimizing the weighted sum of squared discrepancies

Σ_{i_1=1}^{L_1} ⋯ Σ_{i_d=1}^{L_d} w(i_1, …, i_d)[Y(i_1, …, i_d) − f(i_1, …, i_d)]², where w(i_1, …, i_d) = 1/max{var[Y(i_1, …, i_d)], B},

subject to the constraint that f(i_1, …, i_d) is increasing on the grid. This is the well-known isotonic regression problem. For the case of a single input, that is, d = 1, the solution can be efficiently obtained by applying the PAV algorithm (Barlow et al. 1972). An efficient way to compute the solution when d > 1 is the multivariate isotonic regression algorithm proposed by Dykstra and Robertson (1982). The algorithm, which is given in Appendix A, iteratively applies the PAV algorithm to obtain increasing predictions with respect to each input. To construct point and interval predictions, the algorithm is applied to a number of simulated GP sample paths. The mean of the projected sample paths is taken as the point prediction, and the tail quantiles are used to construct credible intervals.
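For d = 1, the weighted isotonic regression step can be carried out with the PAV algorithm. The sketch below is our own minimal Python implementation of weighted PAV (not the exact pseudocode of Barlow et al. 1972): it pools adjacent blocks whose weighted means violate monotonicity.

```python
def pav_increasing(y, w=None):
    # Pool Adjacent Violators for weighted isotonic (increasing) regression
    # in one dimension: minimizes sum_i w_i * (y_i - f_i)^2 subject to
    # f_1 <= f_2 <= ... <= f_n.
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out
```

Applying this routine cyclically along each grid axis gives the flavor of the iterative multivariate scheme of Dykstra and Robertson (1982) referenced above.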
For prediction at points not on the grid, Beliakov's (2005) interpolator can be used, but it gives poor results. We use the following interpolator instead. If prediction is desired at a point x = (x_1, …, x_d), we find the smallest hypercube with grid points as corners (which we call a cell) that contains x. We then transform the inputs so that the cell is [0, 1]^d. Denote the transformed input by (z_1, …, z_d) and the isotonized values on the corners of the cell by Ŷ(j_1, …, j_d), (j_1, …, j_d) ∈ {0, 1}^d. Then, we predict at x using the model constructed from Bernstein bases

f(z_1, …, z_d) = Σ_{(j_1,…,j_d) ∈ {0,1}^d} Ŷ(j_1, …, j_d) ∏_{i=1}^d z_i^{j_i}(1 − z_i)^{1−j_i}. (10)

Note that (10) interpolates the isotonized values at the corners of the cell. Due to the monotonicity preserving property of the Bernstein bases (Floater and Peña 1998), (10) is also increasing. It can be shown that the metamodel obtained with the above procedure is a continuous and increasing function of x. However, it is nondifferentiable.
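One concrete form of such a Bernstein-basis cell interpolator is the degree-1 (multilinear) blend of the isotonized corner values. A minimal Python sketch with hypothetical names, assuming the prediction point has already been rescaled to the unit cell:

```python
from itertools import product

def bernstein1_interp(z, corner_vals):
    # Degree-1 Bernstein (multilinear) blend of values at the 2^d corners of
    # a cell rescaled to [0,1]^d. corner_vals maps corner tuples in {0,1}^d
    # to isotonized values; the blend reproduces those values at the corners
    # and is increasing whenever the corner values are coordinatewise increasing.
    d = len(z)
    total = 0.0
    for corner in product((0, 1), repeat=d):
        weight = 1.0
        for zi, ji in zip(z, corner):
            weight *= zi if ji == 1 else (1.0 - zi)
        total += weight * corner_vals[corner]
    return total
```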
We select the grid as follows: Start with an initial grid. Increase L_1 by one and observe the relative change in prediction on a test set D_t, where the relative change is defined as the mean absolute difference between predictions obtained with L_1 + 1 and with L_1 equally spaced values for the first input, divided by the range of predictions in the former case. This process is repeated for L_2, …, L_d. Suppose that changing L_j to L_j + 1 gives the maximum mean relative change for the sample paths. Then, we reset the value of L_j to L_j + 1. If this maximum change is small, the procedure is stopped. Because it is time consuming to select the grid based on many simulated sample paths, we select a grid based on isotonizing the posterior mean, the 0.01 quantile, and the 0.99 quantile of the posterior GP. The grid refinement is stopped when the mean relative change in the projection of these three functions is less than 0.005. Note that the tail quantiles tend to be more nonmonotonic than most sample paths. This tends to lead to the selection of a finer grid than necessary. The selected grid is employed for prediction by isotonizing a number of simulated sample paths and predicting with the interpolator (10).
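The stopping criterion of the grid-refinement procedure can be sketched as follows (our own helper; `pred_old` and `pred_new` stand for predictions on the test set D_t before and after incrementing one L_j):

```python
def relative_change(pred_old, pred_new):
    # Mean absolute difference between coarse- and fine-grid predictions on
    # the test set, divided by the range of the fine-grid predictions.
    # Refinement stops once the largest such change across inputs is small
    # (below 0.005 in the text).
    mad = sum(abs(a - b) for a, b in zip(pred_old, pred_new)) / len(pred_new)
    rng = max(pred_new) - min(pred_new)
    return mad / rng
```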

MONOTONIC METAMODELS BASED ON B-SPLINES
This section proposes a monotonic metamodel obtained by projecting the GP onto a closed convex cone of monotone splines. The monotone spline metamodel approximates the projection P_M Y with B-spline bases that are usually small in number compared to the grid points employed in Section 3. Construction of the metamodel tends to be faster than the grid-based approach. Moreover, the metamodel is differentiable and well-defined everywhere.
In Section 4.1, we introduce the basic idea of the proposed approach. It consists of projecting Y onto a closed convex cone of monotone splines defined by tensor spline bases and a set of linear constraints. Because the number of basis functions for tensor spline spaces grows very fast with the dimension of the input and the number of knots, we propose a method to reduce the number of basis functions in Section 4.2, which is based on modifying the B-spline bases. The tensor product of the new basis functions decomposes into functional analysis of variance (ANOVA) components, which allows us to build a model with only significant components. In Section 4.3, we give details on the proposed algorithm for finding a closed convex cone of monotone splines such that the projection of Y onto the cone approximates P M Y well.

Basic Idea
This section presents the spline model and linear constraints on the model coefficients used to achieve monotonicity. We also give a key result, which states that the projection of a continuous GP onto the cone of monotonic spline functions converges in mean square to the projection of the GP onto the cone of monotonic functions as the knots get dense.
B-splines are widely employed for function approximation because they have many useful properties (De Boor 2001). A useful property of these bases is that it is easy to enforce monotonicity with respect to the inputs.
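For intuition, consider the one-dimensional degree-1 case: with hat-function (degree-1 B-spline) bases, the spline passes through its coefficients at the knots, so it is increasing exactly when the coefficient vector is increasing. A minimal sketch with our own helper names:

```python
def pl_spline(x, knots, coefs):
    # Degree-1 (piecewise-linear) B-spline in one dimension: hat functions
    # centered at the knots, so f(knots[j]) = coefs[j]. The spline is
    # increasing on [knots[0], knots[-1]] precisely when coefs is increasing,
    # which is the easy-to-enforce monotonicity constraint on coefficients.
    for j in range(len(knots) - 1):
        if knots[j] <= x <= knots[j + 1]:
            t = (x - knots[j]) / (knots[j + 1] - knots[j])
            return (1 - t) * coefs[j] + t * coefs[j + 1]
    raise ValueError("x outside knot range")
```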
To approximate a function defined on [0, 1]^d, we can employ the tensor-product B-spline bases

f(x | c) = Σ_{j_1=0}^{n_1} ⋯ Σ_{j_d=0}^{n_d} c(j_1, …, j_d) ∏_{i=1}^d N^K_{j_i}(x_i), (11)

where N^K_0, …, N^K_{n_i} denote the degree-K B-spline bases for the ith input. It follows from the results in Floater and Peña (1998) that the tensor-product bases preserve axial monotonicity, that is, if c(j_1, …, j_d) is increasing in j_i for all (j_1, …, j_{i−1}, j_{i+1}, …, j_d), then f(x | c) is increasing in x_i. When K = 1, or when K = 2 and d = 1, the converse also holds: if f(x | c) is increasing in x_i, then c(j_1, …, j_d) is increasing in j_i for all (j_1, …, j_{i−1}, j_{i+1}, …, j_d). Thus, the spline function f(x | c) is increasing in x_1, …, x_d if c satisfies the linear constraints

c(j_1, …, j_i + 1, …, j_d) ≥ c(j_1, …, j_i, …, j_d) for all i and all (j_1, …, j_d). (12)

This article employs bases of degrees K = 1 and K = 2. For K = 2, (11) is continuously differentiable on (0, 1)^d (because the multiplicity of each interior knot is one). Let S = {f(x | c)} be the space of functions of the form (11), E be the set of all c that satisfy (12), and R = {f(x | c) : c ∈ E} ⊂ S, which is a closed convex cone of increasing spline functions. Note that S is a proper subset of the space of all continuous functions. Moreover, for K = 2 and d > 1, or when K > 2, R is a proper subset of S ∩ M. However, Theorem 1 implies that any continuous function can be approximated arbitrarily closely by a function in S, and any continuous increasing function can be approximated arbitrarily closely by a function in R, as the Cartesian product of the knot sequences becomes dense in [0, 1]^d. Let f̃ denote the spline approximation to F constructed in Theorem 1 (its coefficients are values of F at points determined by the knots), and let f̃_M, given by (14), denote the analogous approximation constructed from P_M F. Then, it is easy to see that f̃_M(x) ∈ R, because its coefficients inherit the monotonicity of P_M F. It seems that P_M F should be continuous when F is continuous. This is shown for d = 1 by Groeneboom and Jongbloed (2010). Lin and Dunson (2014) assumed that P_M F is continuous even when d > 1, without giving a proof. The proof by Groeneboom and Jongbloed (2010) seems difficult to generalize to the case d > 1. However, since the set of discontinuities of P_M F must have Lebesgue measure zero (Lindner 2004) and P_M F is bounded for continuous F, Theorem 2 gives lim_{h→0} ‖P_M F − f̃_M‖²_w = 0.

Theorem 2. Let F : [0, 1]^d → ℝ be bounded and discontinuous only on a set of Lebesgue measure zero. Define f̃ as in Theorem 1. Then, lim_{h→0} ‖F − f̃‖²_w = 0.

It is known that f̃ is a poorer approximation of F than P_S F, where P_S is the projection operator that projects a function onto S. Thus, f̃_M should be a poorer approximation of P_M F than P_R F, where P_R projects a function onto the closed convex cone R. Thus, we propose to employ the monotonic metamodel P_R Y, which is obtained by minimizing

‖Y − f‖²_w = ∫_{[0,1]^d} w(x)[Y(x) − f(x)]² dx over f ∈ S,

subject to the constraints (12). We now show that lim_{h→0} ‖P_M Y − P_R Y‖²_w = 0. Note that

‖Y − P_M Y‖_w ≤ ‖Y − P_R Y‖_w ≤ ‖Y − P_R P_M Y‖_w.

The first inequality holds because R ⊂ M, while the second follows from the fact that P_R P_M Y ∈ R. Letting f̃_M be given by (14) and noting that f̃_M ∈ R with ‖P_M Y − f̃_M‖_w → 0 as h → 0, we obtain lim_{h→0} ‖P_M Y − P_R Y‖²_w = 0. It is too time consuming to find a different R for each sample path Y to make ‖P_M Y − P_R Y‖²_w small and so obtain monotonic predictions. Thus, we would like to find one closed convex cone R (defined by S and (12)) such that ‖P_M Y − P_R Y‖²_w is small on average. This is indeed possible, as implied by Theorem 3, whose proof is given in Appendix B.
Theorem 3. Let Y be a continuous GP with continuous covariance function and R be the cone of spline functions that satisfy (12). Then, lim_{h→0} E‖P_M Y − P_R Y‖²_w = 0.

In practice, we need to choose the knots so that E‖P_M Y − P_R Y‖²_w is sufficiently small. The knots define S, which, together with (12), defines R. However, this quantity seems difficult to compute directly. Instead, we propose to increase the number of knots until the change in P_R Y is small. In particular, if R_1, R_2, … is a sequence of monotone spline cones obtained by increasing the number of knots such that h → 0, then we can select the cone R_k given by the smallest k such that E[‖P_{R_{k+1}} Y − P_{R_k} Y‖²_w] is sufficiently small.

B-Spline Model Selection
Model (11) allows us to easily construct a metamodel that is monotonic in any input. We only need to estimate the coefficients c(j_1, …, j_d) subject to linear constraints for each simulated sample path Y. However, the number of coefficients, ∏_{i=1}^d (n_i + 1), is often very large. A variable selection approach cannot be used to reduce the model size because the c(j_1, …, j_d)'s must be increasing in j_i to ensure that f is monotonic with respect to input i.
We propose the following method to reduce the number of bases. The idea is to employ modified B-spline bases {u^i_0, u^i_1, …, u^i_{n_i}} for input i, obtained from the original B-spline bases by a linear transformation with u^i_0 ≡ 1. This idea is similar to the idea of the modified Bernstein bases in Tan (2016). Note that {u^i_0, …, u^i_{n_i}} and the original degree-K B-spline bases for input i have the same span. Thus, the tensor-product bases {∏_{k=1}^d u^k_{j_k} : j_k = 0, …, n_k; k = 1, …, d} have the same span as the bases in (11). This implies that f can be rewritten as

f(x | a) = Σ_{j_1=0}^{n_1} ⋯ Σ_{j_d=0}^{n_d} a(j_1, …, j_d) ∏_{i=1}^d u^i_{j_i}(x_i). (19)

By equating (11) and (19), we find the relationship (20) between the coefficients a and c (see Appendix C). Thus, monotonicity in the ith input is achieved by imposing the linear constraints (21) on the a(j_1, …, j_d)'s. Note that (19) immediately gives a functional ANOVA decomposition, that is, it decomposes into a sum of a constant a(0, …, 0), terms (22) that depend only on a single input, terms (23) that depend only on two inputs, and so on. We call (22) a main effect and (23) a two-factor interaction. These are called functional ANOVA components.
In contrast, each basis ∏_{i=1}^d u^i_{j_i} with all j_i ≥ 1 is a d-factor interaction. In many cases, only a handful of main effects and interactions have significant influence on the response. This is called the effect sparsity principle (Wu and Hamada 2009). Other spline models with main effects and low-order interactions have been proposed (Wahba 1986; Reich, Storlie, and Bondell 2009). Model (19) with only nonzero main effects and two-factor interactions can contain far fewer terms than model (11). If n_1 = ⋯ = n_d = η, then the former model contains 1 + ηd + η²d(d − 1)/2 terms, but the latter contains (η + 1)^d terms. The number of constraints in (21) is (η + 1)^{d−1}η, but it is reduced to (η + 1)^{d−1−k}η when input i does not interact with k other inputs. In this article, we fit models of the form

f = a(0, …, 0) + Σ_{(i_1,…,i_e) ∈ A} f_{(i_1,…,i_e)}, A ⊂ {(1), …, (d), (1, 2), …, (1, …, d)}, (24)

where f_{(i_1,…,i_e)} is the interaction between inputs i_1, …, i_e. This amounts to setting some of the coefficients, that is, the a(j_1, …, j_d)'s, in (19) to zero. Monotonicity with respect to input i is achieved by imposing the constraints (21).
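The model-size comparison quoted above is easy to check numerically (a trivial sketch of the two counting formulas):

```python
def n_terms_main_2fi(eta, d):
    # Terms in the main-effects plus two-factor-interaction model:
    # 1 + eta*d + eta^2 * d*(d - 1)/2.
    return 1 + eta * d + eta * eta * d * (d - 1) // 2

def n_terms_full_tensor(eta, d):
    # Coefficients in the full tensor-product model: (eta + 1)^d.
    return (eta + 1) ** d
```

For η = 3 and d = 4, the reduced model has 67 terms versus 256 for the full tensor-product model, and the gap widens rapidly with d.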
A referee suggested using triogram splines (Hansen, Kooperberg, and Sardy 1998). However, the triogram splines are bivariate splines. Thus, a triogram spline approximation cannot converge to a function with nonzero three-factor or higher order interactions. Since the GP model with a product correlation function has nonzero high-order interactions (although they may be small), Theorem 3 cannot hold for triogram splines. However, triogram splines can be generalized to splines on simplices (Neamtu 2001). This is a problem for further research.

Algorithm
This section describes the algorithm we employ to construct a monotonic B-spline model that is the weighted projection of the GP Y onto a selected cone of monotone splines. The algorithm increases the number of knots and adds functional ANOVA components into the model (24) in such a way that the change in the monotone spline predictions on a set of points is maximized. When the maximum change is small, the algorithm terminates.
Let R_1, R_2, … be a sequence of monotone spline cones, each defined by the spline knots ζ^i_K, …, ζ^i_{n_i+1}, i = 1, …, d, a choice of functional ANOVA components in (24), and the constraints (21). Practical computation of P_{R_k} Y involves using a set of points {x_1, …, x_N} ⊂ [0, 1]^d to approximate the squared norm ‖Y − f‖²_w, that is, minimizing

Σ_{l=1}^N w(x_l)[Y(x_l) − f(x_l | a)]² (25)

subject to (21) for all i = 1, …, d. In this article, we set the first n points of {x_1, …, x_N} to be points from the experiment design and the remaining N − n points to be from a Sobol sequence (Niederreiter 1992). We denote by P̂_{R_k} Y the discrete projection of Y onto R_k, that is, the minimizer of (25) over {f(· | a) : (21) holds for all i = 1, …, d}.
As the number of knots and the number of selected functional ANOVA components increase, the number of model terms increases. Thus, N must increase as well to ensure that â_k is uniquely defined.
The quantity

α_k(Y) = (1/m) Σ_{l=1}^m [P̂_{R_k} Y(x^t_l) − P̂_{R_{k−1}} Y(x^t_l)]², (26)

where {x^t_1, …, x^t_m} = D_t ⊂ [0, 1]^d is chosen to be a maximin Latin hypercube design (MLHD) and P̂_{R_k} Y denotes the discrete projection of Y onto R_k, measures the change in predictions when the cone is refined. We can use R_k to construct the monotone spline metamodel when the average of α_k(Y) over a number of simulated sample paths Y is small. However, it is time consuming to simulate many sample paths and project them onto R_k. In this article, we set Y equal to the mean, pointwise 0.99 quantile, and pointwise 0.01 quantile of the GP, which we denote as Y_{0.5}, Y_{0.99}, and Y_{0.01}, respectively. We use the criterion

α_k = 100 [α_k(Y_{0.5}) + α_k(Y_{0.99}) + α_k(Y_{0.01})] / [V(Y_{0.5}) + V(Y_{0.99}) + V(Y_{0.01})], (27)

where V(·) denotes the sample variance over D_t, to select the knots and the functional ANOVA components. In (27), α_k(Y_{0.5}) + α_k(Y_{0.99}) + α_k(Y_{0.01}) is expressed as a percentage of the sum of variances of Y_{0.5}, Y_{0.99}, and Y_{0.01}.
In this article, we employ equally spaced knots, with the number of knots for input i determined by m_i. The sequence R_1, R_2, … is constructed as follows: Given R_k, we increase m_i by one, which gives a tentative choice for R_{k+1}, and we compute the value of α_{k+1} for this tentative choice. This process is repeated for all i = 1, …, d. We also add each functional ANOVA component not in the model into the model (each addition gives a tentative choice for R_{k+1}) and compute the value of α_{k+1}. If the increase in m_i gives the maximum α_{k+1}, then this increase is adopted to define R_{k+1}. If the addition of a functional ANOVA component into the model gives the maximum α_{k+1}, then this addition is adopted to define R_{k+1}. The choice of an R_{k+1} that maximizes α_{k+1} (which serves as a surrogate for the expected change in the projection) is sensible because this choice has the most significant influence on the predictions given by the monotone spline metamodel. Note that the tail quantiles tend to be more nonmonotonic than any simulated sample path. Thus, using α_k as the criterion may lead to the selection of more knots and functional ANOVA components than necessary. The algorithm we use to construct the monotone spline metamodel is given below. In its final step (Step 7), we simulate the GP Y on {x_1, …, x_N} and minimize the weighted sum of squared discrepancies subject to (21), which gives â; this is repeated r times. The sample mean of f(x | â) is used as the point predictor, whereas the sample tail quantiles are used as prediction interval limits.
Remark 1. We require the model size M to be at least 2n because good prediction at the data points would require a model of around n terms and additional terms are needed to quantify uncertainty about the true function so that credible intervals can be constructed.
Remark 2. The algorithm increases N as the model size increases because N must be larger than the model size so that the model can be estimated reliably via weighted least squares. Thus, we need N ≥ M max . Because the minimum model size that we allow is 2n, we set N = max {3M max , 2n}. The constant 3 seems to work well in practice.
The parameter T controls the accuracy with which the monotone spline model approximates P_M Y, while the parameter B controls the closeness of P_M Y to the experiment data. To construct an interpolating metamodel, we need to use a small B, a small T, and also include the design points in the training set {x_1, …, x_N} used to fit the monotone spline metamodel. Very small values of B and T can be used when d = 1. However, such values can cause computational difficulties for d > 1 because the model size can become too large. For fixed B, the model size will increase as T gets smaller. For fixed T, the monotonic metamodel's predictions at the design points will get closer to the experiment data as B gets smaller. Although accuracy improves and the metamodel gets closer to interpolating the data as B gets smaller, the prediction interval coverage gets poorer. The value of B we recommend for d > 1, based on the results in Example 2, is B = max{var[Y(x^t_1)], …, var[Y(x^t_m)]}/20. With this choice of B, the ratio of the maximum to the minimum weight w(x) on [0, 1]^d is approximately 20. This seems to give good coverage and accuracy for T ≈ 1.
Given the selected spline knots and functional ANOVA components, the monotonic metamodel f(x | â) is constructed in Step 7 by simulating values of the GP Y on the set of training points and minimizing Σ_{l=1}^N w(x_l)[Y(x_l) − f(x_l | a)]² subject to (21). The main computational burden in the algorithm is the process of solving the quadratic program (25) when the vector of model coefficients a is large, or when the constraints (21) are required to hold for many inputs, because the program then contains many decision variables or constraints. In Step 4 and Step 5, we solve the quadratic program three times, but these steps may need to be repeated many times. In Step 7, we need to solve the program several hundred times to obtain accurate estimates of the prediction intervals.

EXAMPLES
This section provides examples to demonstrate the usefulness of our approach. We compare the two proposed methods to a few alternative methods. The first is the unweighted projection method proposed by Lin and Dunson (2014). The second is the average rearrangement estimator proposed by Chernozhukov, Fernandez-Val, and Galichon (2009). The rearrangement method monotonizes the posterior mean and credible limits of the GP on a grid. Neither Lin and Dunson (2014) nor Chernozhukov, Fernandez-Val, and Galichon (2009) discussed how the grid may be selected or which interpolator should be used. Thus, we adopt the grid selection method described at the end of Section 3 (which is used to implement the grid-based weighted projection method) to implement the unweighted projection and rearrangement methods, and we use the interpolator (10) for both. We also compare the proposed methods to Beliakov's (2005) interpolator, constructed using eq. (3.3) of his article with the Lipschitz constant estimated with his eq. (5.1).
For the unweighted projection, grid-based weighted projection, and monotone B-spline methods, the interval predictions are obtained with the 0.01 and 0.99 quantiles of 300 simulated sample paths, while the point predictions are the sample mean. The choice of the initial grid in the grid selection procedure is as follows: For d = 1, the initial grid is {0, 0.01, …, 1}; for d = 3, it is {0, 1/6, …, 1}^3; for d = 4, it is {0, 1/5, …, 1}^4. We set the target T in the monotone spline algorithm to 0.01 for d = 1, 1 for d = 3, and 3 for d = 4.
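The simulation-based point and interval predictions used throughout the examples reduce, at each prediction point, to the sample mean and tail quantiles of the projected sample-path values. A minimal sketch (the nearest-rank quantile rule is our simplification; any standard empirical quantile estimator would do):

```python
def pointwise_summary(paths, alpha=0.02):
    # paths: simulated (projected) sample-path values at one prediction point.
    # Point prediction = sample mean; the 100*(1 - alpha)% interval uses the
    # alpha/2 and 1 - alpha/2 empirical quantiles (0.01 and 0.99 in the text).
    s = sorted(paths)
    n = len(s)

    def quantile(q):
        # Simple nearest-rank rule; the indexing convention is a choice.
        idx = min(n - 1, max(0, int(q * n)))
        return s[idx]

    mean = sum(s) / n
    return mean, quantile(alpha / 2), quantile(1 - alpha / 2)
```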

Example 1
In this example, we construct monotonic metamodels for an increasing function plotted in Figure 1. The design is D = {0, 0.25, 0.5, 0.75, 1}. In Figure 1(a), we plot the posterior mean and 98% credible intervals for the stationary GP model with the Matérn correlation function described in Section 2. The posterior mean is only slightly nonmonotonic. However, the credible intervals are highly nonmonotonic.
In Figure 1(b), predictions of the unweighted projection approach are plotted. We see that the interval predictions at the design points can be of nonnegligible width. This problem seems to be worse for the rearranged point predictions and credible intervals plotted in Figure 1(c). Consequently, the rearranged predictions look visually unappealing. Note that the point predictions for the unweighted projection and rearrangement approaches nearly interpolate the data because the posterior mean of the GP is nearly monotonic. Beliakov's (2005) interpolator (Figure 1(d)) gives poor interval predictions, which have zero width in the interval [0.25, 0.5]. Note that there will always be an interval between two design points with zero prediction interval width if the Lipschitz constant is estimated with eq. (5.1) in Beliakov (2005). Figure 1(e) plots the predictions obtained with the grid-based weighted projection method, while Figure 1(f) plots the predictions obtained with the monotone B-spline metamodel (K = 2). We set B = max{var[Y(x^t_1)], …, var[Y(x^t_m)]}/100 for both methods. We see that for both methods, the point and interval predictions nearly interpolate the data points.

Example 2: Displacement in a Mindlin Plate
Consider an isotropic flat plate of uniform thickness H > 0 under transverse loading Q(x, y). The plate is polygonal and simply supported. Assume a right-hand coordinate system (with z pointing downward). The Mindlin plate theory predicts the transverse displacement W(x, y) of the plate with two Poisson equations (chap. 7 of Wang, Reddy, and Lee 2000):

∇²m = −Q on Ω, m = 0 on ∂Ω, (29)

∇²W = −Q(x, y)(6 − V)/(5GH) − m(x, y)/{GH³/[6(1 − V)]} on Ω, W = 0 on ∂Ω, (30)

where ∇² is the Laplacian operator, Ω is the plane region of the plate, ∂Ω is the plate boundary, m is the Marcus moment, G > 0 is the shear modulus, and V ∈ [0, 0.5] is the Poisson ratio. Note that we set the shear correction factor to 5/(6 − V) following Liew et al. (1998, p. 24), and m is uniquely defined by (29). In particular, m does not depend on G, H, and V. It follows from physical considerations, or from the comparison theorem on p. 177 of Mattheij et al. (2005), that if m1(x, y) and m2(x, y) denote the Marcus moments under loadings Q1(x, y) and Q2(x, y), respectively, then m1(x, y) ≤ m2(x, y) for all (x, y) ∈ Ω whenever Q1(x, y) ≤ Q2(x, y) for all (x, y) ∈ Ω. Similarly, the comparison theorem implies that W should be a decreasing function of the right-hand side of (30), that is, of −Q(x, y)(6 − V)/(5GH) − m(x, y)/{GH³/[6(1 − V)]}. Thus, W and the maximum displacement Y = max{W(x, y) : (x, y) ∈ Ω} are increasing functions of Q, and decreasing functions of G, H, and V. Note that it is intuitively clear that W should decrease when H or G increases, that is, as the plate becomes thicker or more resistant to shear forces. However, the effect of V on W is not obvious to this author without the above reasoning. Finally, it is obvious from physical considerations that the maximum displacement Y is an increasing function of the length of the plate L if the plate is rectangular. This can be shown mathematically by rescaling x and y.
We solve (29)-(30) for a square plate with side length L and uniform loading Q = 10^6 using the finite element method implemented in the Matlab code given in chap. 12 of Ferreira (2008). A total of 30 × 30 = 900 Q4 elements are used. Note that the code takes the Young's modulus E = 2G(1 + V) instead of the shear modulus G as input. We transform the variables G, H, V, L to standardized inputs x1, x2, x3, x4; for example, x1 = −(G − 30 × 10^9)/(10 × 10^9). We perform simulations to compare the alternative metamodeling methods for approximating the finite element code. In the first set of 100 simulations, we fix x4 = 0.5 and model Y as a function of x1, x2, and x3 (d = 3). In the second set of 100 simulations, we model Y as a function of x1, x2, x3, and x4 (d = 4). We set B = max{var(x_t1), . . . , var(x_tm)}/20. In all simulations, a generalized maximin Latin hypercube design (GMLHD) proposed by Dette and Pepelyshev (2010) of size 7d is used to fit the GP model, which is then employed to construct the unweighted projection, grid-based weighted projection, rearrangement, and monotone B-spline (K = 1 and K = 2) metamodels. Beliakov's interpolator is constructed directly with data from the GMLHD. Each GMLHD is obtained by applying the quantile function of the arcsine density to transform each element of an MLHD.
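As a rough illustration of the two successive Poisson solves in (29)-(30) — not the Q4 finite element code of Ferreira (2008) used in the article — one can discretize the Laplacian with the standard 5-point stencil on a square grid. All parameter values below are assumptions for demonstration only.

```python
import numpy as np

def poisson_solve(rhs, L, n):
    """Solve u_xx + u_yy = rhs on the n x n interior grid of an L x L square
    with u = 0 on the boundary, using the 5-point finite-difference stencil."""
    h = L / (n + 1)
    T = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / h**2
    I = np.eye(n)
    A = np.kron(I, T) + np.kron(T, I)  # discrete 2D Laplacian
    return np.linalg.solve(A, rhs.ravel()).reshape(n, n)

# Illustrative (assumed) parameter values, roughly in the ranges implied
# by the standardization x1 = -(G - 30e9)/10e9 in the article.
Q, G, H, V, L = 1e6, 30e9, 0.02, 0.3, 1.0
n = 30

m = poisson_solve(-Q * np.ones((n, n)), L, n)       # eq. (29): Marcus moment
D = G * H**3 / (6.0 * (1.0 - V))                    # plate rigidity GH^3/[6(1-V)]
rhs_W = -(6.0 - V) * Q / (5.0 * G * H) - m / D      # right-hand side of (30)
W = poisson_solve(rhs_W, L, n)                      # transverse displacement
Y = W.max()                                         # maximum displacement
assert Y > 0.0 and np.all(m > 0.0)
```

Note that m is computed before, and independently of, G, H, and V, which mirrors the observation above that the Marcus moment depends only on the loading and the plate region.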
For each of the two sets of simulation runs, we compute the mean absolute error (MAE), root mean squared error (RMSE), average prediction interval length (average PI length), and coverage of nominal 98% intervals for each method on an MLHD test set D_t of size 100d. The simulation means and standard deviations of these quantities for d = 4 are presented in Table 1. We only give results for d = 4 in the article because results for d = 3 (see Appendix D) lead to essentially the same conclusions regarding the relative performance of the alternative methods.

Table 1. MAE, RMSE, average PI length, coverage, model size, and MAE at design points for the GP model (GP), monotone spline model (spline), grid-based weighted projection method (weighted), unweighted projection method (unweighted), rearrangement method (rearranged), and Beliakov's interpolator (Beliakov)

Table 1 shows that Beliakov's interpolator gives the poorest prediction accuracy (MAE and RMSE) in all cases and also poor prediction interval length. The monotone spline metamodel with K = 2 provides better prediction accuracy than the other methods. It also gives the shortest prediction intervals; their average length is only around 30% of the average length of the credible intervals of the GP model. The average coverage of the monotone spline metamodel with K = 2 is 0.89, only slightly lower than the average coverage of 0.95 achieved by the GP model. The monotone spline metamodel with K = 1 appears to be dominated by the monotone spline metamodel with K = 2 in terms of average accuracy and prediction interval length. Compared to the monotone spline metamodel with K = 2, the grid-based weighted projection method gives poorer accuracy and longer intervals with better (closer to nominal) coverage. The grid-based weighted projection method gives better accuracy, shorter prediction intervals, and similar coverage compared to the unweighted projection method.
Thus, the use of the weights in (9) indeed provides better results for deterministic computer experiments. The rearrangement procedure gives similar accuracy and better coverage than the grid-based weighted projection method but its prediction intervals are very wide.
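The performance measures used in this comparison can be computed as follows; this is a minimal sketch in which y, yhat, lo, and hi are hypothetical test-set quantities, not data from the simulations.

```python
import numpy as np

# Hypothetical test-set values: true outputs y, point predictions yhat,
# and 98% interval limits (lo, hi) from one of the metamodels.
y = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([1.1, 1.9, 3.2, 3.9])
lo = np.array([0.8, 1.6, 2.9, 3.5])
hi = np.array([1.4, 2.3, 3.6, 4.3])

mae = np.mean(np.abs(y - yhat))           # mean absolute error
rmse = np.sqrt(np.mean((y - yhat) ** 2))  # root mean squared error
avg_len = np.mean(hi - lo)                # average PI length
coverage = np.mean((lo <= y) & (y <= hi)) # empirical coverage

assert rmse >= mae  # always holds, by the power-mean inequality
```

Averaging these four quantities over the 100 simulations gives the simulation means reported in Table 1.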
The GP model, the monotone spline model, the grid-based weighted and unweighted projection methods, and the rearrangement method give average coverage that is less than the nominal 0.98, although the average coverages achieved by these methods are only slightly below the nominal level. Note that the monotone spline metamodels and the grid-based weighted projection method are both supposed to converge to the projection of the GP onto the cone of monotone functions, that is, P_M Y. However, the two methods yield quite different results. This is because the monotone spline metamodel uses a finite number of spline knots and is fitted with a discrete set of points, while the grid-based method uses a finite grid. Table 1 also gives the simulation means and standard deviations of the size of the monotone spline metamodel and of the grid sizes for the grid-based methods. The monotone spline metamodels (K = 1 and K = 2) have model sizes that are far smaller than the grid sizes of the grid-based methods. In fact, such large grid sizes are necessary: the initial grid is of size 1296 but the final chosen grids are far larger. Thus, the monotone spline metamodel is less memory intensive than the grid-based methods. In seven simulations, a single run for the monotone spline model with K = 2 took 44 to 169 sec, with an average of 74 sec, on a Lenovo laptop with a 2.9GHz processor and 4GB of RAM. The grid-based weighted projection method took 1018 to 2814 sec, with an average of 1701 sec. The unweighted projection method took 209 to 646 sec, with an average of 409 sec. Finally, the rearrangement method took 53 to 606 sec, with an average of 271 sec.
Next, we examine the interpolation property of the methods. The MAEs of the lower and upper credible limits at the design points are given in Table 1. The GP model and Beliakov's interpolator have zero MAE at the design points because they interpolate the data. The monotone spline model and the grid-based weighted projection method give tighter intervals at the design points than the unweighted projection and rearrangement methods. The rearrangement method always gives the widest average prediction intervals at the design points. Note that the sums of the MAEs of the prediction limits at the design points for the monotone spline models are somewhat less than their average prediction interval lengths.
Finally, we consider the effect of the choice of B. We fix x4 = 0.5 and model Y as a function of x1, x2, and x3 (d = 3). A total of 100 simulations is performed with B = V_m/100, B = V_m/20, and B = V_m/10, where V_m = max{var(x_t1), . . . , var(x_tm)}. Results for the monotone spline model and the grid-based weighted projection method are given in Tables D1 and D2 in Appendix D. We see that as B is increased, the average prediction interval length decreases and the interval predictions get closer to interpolating the experimental data; however, the average coverage of the intervals deteriorates. We recommend using B = V_m/20 since it appears to give a good tradeoff.

CONCLUSIONS
This article proposes the weighted projection approach for constructing monotonic metamodels for deterministic computer experiments. The approach projects sample paths of the posterior GP onto the cone of monotone functions with respect to a weighted norm whose weight function is inversely proportional to the posterior variance. Two computational implementations are proposed: the grid-based method and the monotone spline metamodel. In the first, isotonic regression is performed on a grid; in the second, the GP sample paths are projected onto a selected cone of monotone splines. The proposed monotonic metamodels tend to give more accurate predictions and substantially shorter credible intervals than the GP model, with only slightly poorer credible interval coverage, as demonstrated in Example 2 and Example 3 (online supplement).
The proposed methods outperform the unweighted projection method in terms of prediction accuracy and interval length. The monotone spline metamodel is computationally less demanding and more accurate than the grid-based methods. It can be used for moderately high-dimensional problems when the number of significant functional ANOVA components is small and monotonicity is required with respect to only a few inputs, because the number of decision variables and constraints in the quadratic programs that need to be solved would then be small. If one is more concerned with achieving good prediction interval coverage, then the grid-based weighted projection method should be used. A fully Bayesian approach that integrates out λ, σ², and θ can be implemented in a straightforward way: the grid-based weighted projection method only requires posterior sample path values at the grid points, while the monotone spline metamodel only requires posterior sample path values at the training points, and these values can be obtained with Markov chain Monte Carlo methods. This approach may give coverage closer to nominal since all parameter uncertainties are fully taken into account; however, it is more time consuming because Markov chain Monte Carlo simulation needs to be performed.

SUPPLEMENTARY MATERIALS
Appendices.pdf: This file contains Appendices A-E. Appendix A gives the multivariate isotonic regression algorithm. Appendix B contains proofs of the theorems given in the article. Appendix C gives a proof of (20). Appendix D presents additional numerical results for Example 2. An additional example is given in Appendix E.
Online zip file: This file contains Matlab codes for all methods considered in this article.

ACKNOWLEDGMENTS
The author thanks an associate editor and two referees for comments that helped improve the article significantly. This research was supported by Early Career Scheme (ECS) project No. 21201414 sponsored by the Research Grants Council of Hong Kong.