Monotonic Quantile Regression With Bernstein Polynomials for Stochastic Simulation

Quantile regression is an important tool to determine the quality level of service, product, and operation systems via stochastic simulation. It is frequently known that the quantiles of the output distribution are monotonic functions of certain inputs to the simulation model. Because there is typically high variability in estimation of tail quantiles, it can be valuable to incorporate this information in quantile modeling. However, the existing literature on monotone quantile regression with multiple inputs is sparse. In this article, we propose a class of monotonic regression models, which consists of functional analysis of variance (FANOVA) decomposition components modeled with Bernstein polynomial bases for estimating quantiles as a function of multiple inputs. The polynomial degrees of the bases for the model and the FANOVA components included in the model are selected by a greedy algorithm. Real examples demonstrate the advantages of incorporating the monotonicity assumption in quantile regression and the good performance of the proposed methodology for estimating quantiles. Supplementary materials for this article are available online.


INTRODUCTION
Prediction of the output of a computer model as a function of its inputs is an important problem in stochastic simulations of service, product, and operation systems. In these simulations, the outputs often have highly nonnormal distributions. When the output is nonnormal, estimation of the quantiles of the output rather than its mean and variance is more meaningful. Emulators are useful for reducing the cost of quantile estimation in stochastic simulation. However, compared to estimation of the mean, estimation of tail quantiles typically involves more uncertainty (a larger variance for the same sample size). Thus, it is of interest to use prior knowledge about the conditional quantile function to reduce estimation variability. Frequently, it is known that the quantiles are monotonic functions of the inputs. For instance, in open Jackson network queueing systems (Gross et al. 2013), the quantiles of the waiting time distribution are increasing functions of arrival rates into the system. As another example, quantiles of the solutions of stochastic differential equations (Xiu 2010) are often monotonic in the equation parameters. A specific example is the transient heat transfer problem considered in Cengel (2002), in which the temperature of a solid body immersed in a fluid is a solution of a differential equation. Various physical properties of the system (e.g., thermal conductivity) can be viewed as uncertain and assigned a probability distribution. Physical knowledge suggests that for a solid with lower temperature than its medium, the temperature in the solid should increase with time under all conditions. Thus, the quantiles of the temperature distribution should increase with time. In these examples, a model that is consistent with the monotonicity information will be advantageous for three reasons. First, quantile estimation accuracy is expected to improve.
As pointed out by a referee, for strictly monotone functions, any consistent estimator of the function that, when differentiated, yields a consistent derivative estimator will be monotonic for large sample sizes because the derivative will be strictly positive/negative. Consequently, the imposition of monotonicity will not improve performance asymptotically. However, significant improvements can often be seen when the sample size is small. Second, confidence intervals will not be unnecessarily wide because nonmonotonic predictions are ruled out. Third, the model is more interpretable and acceptable to the practitioner than one that is nonmonotonic. Due to these advantages, statistical methods for monotonic quantile estimation are of practical value. This article addresses the problem of monotonic quantile regression in stochastic simulation.
Linear regression models have long been used as mean models in stochastic simulation experiments (Law 2014). Recently, Gaussian process (GP) emulators (Sacks et al. 1989; Currin et al. 1991) have been popularized for stochastic simulation by Jack P.C. Kleijnen and coauthors (Kleijnen 2009). Ankenman, Nelson, and Staum (2010) proposed a modification of the GP emulator, called stochastic kriging. The GP and stochastic kriging emulators are for estimating the mean. However, in many cases, interest centers on quantile estimation. This is the case for decision-making under uncertainty and nonparametric modeling of the output distribution. For example, in the design of a service system, a decision-maker may be interested in ensuring that the 0.9 quantile of the daily average waiting time of customers does not exceed a certain value. Note that maximization of quantiles of the utility is given a sound decision-theoretic foundation by Rostek (2010). Quantile linear regression originated from the seminal work of Koenker and Bassett (1978), and there has been much further development on the subject (see Koenker 2005). Reich, Fuentes, and Dunson (2011) considered the problem of spatial modeling of quantiles. Recently, Plumlee and Tuo (2014) employed the GP model with a nugget to estimate quantiles in stochastic simulation.
Research on shape-restricted regression provides methods for building monotonic models. Chang et al. (2007), McKay Curtis and Ghosh (2011), and Wang and Ghosh (2012) used Bernstein polynomial models for monotonic mean regression with a single input. There are many other methods (e.g., Ramsay 1988; Mammen et al. 2001). Most monotone models can be used for both mean and quantile estimation. He and Ng (1999) discussed monotone quantile smoothing with a single input. Kim (2006) studied linear quantile models where the coefficients are monotonic functions of an input. There is less work on monotonic regression with multiple inputs. Wang and Ghosh (2011) used multivariate Bernstein polynomials. Gluhovsky (2006) modeled main effects and two-factor interactions with I-splines. Few articles explicitly consider monotone quantile regression with multiple inputs. Chernozhukov, Fernandez-Val, and Galichon (2009) proposed a general method, called rearrangement, to monotonize a nonmonotonic estimate of a function. Main-effects-only models (Koenker 2005) can be used, but they are restrictive.
This article employs monotonic regression models obtained by modeling functional analysis of variance (FANOVA) components with Bernstein polynomial bases, called FANOVA Bernstein models, for quantile estimation in stochastic simulation. Stochastic simulation problems typically involve multiple inputs, and the output is often known to be a monotonic function of some of the inputs. Moreover, we can expect only a handful of inputs to have large effects on the output and only a small subset of the large number of possible interactions between inputs to be significant. Thus, it is often of interest to build a monotonic quantile regression model with multiple inputs that includes only important main effects and interactions between inputs. We propose an approach based on identifying important FANOVA components by modeling each component using Bernstein polynomials. Monotonicity is guaranteed by imposing certain linear restrictions on the model coefficients. The degrees of the Bernstein bases used and the FANOVA components included in the model are chosen by minimizing the Bayesian information criterion (BIC; Lian 2012).
The rest of the article is organized as follows: Section 2 reviews the univariate and multivariate Bernstein polynomials. Section 3 introduces the FANOVA Bernstein model and shows how monotonicity can be imposed. Section 4 discusses the monotone quantile regression problem. Section 5 proposes an algorithm for model selection, and discusses design and statistical inference issues. Two examples are given in Section 6. Section 7 concludes the article.

UNIVARIATE AND MULTIVARIATE BERNSTEIN POLYNOMIALS
The univariate Bernstein polynomials/bases of degree N on [0, 1] are given by

B(z, l, N) = C(N, l) z^l (1 − z)^(N−l), l = 0, 1, ..., N, (1)

where C(N, l) is the binomial coefficient. The univariate Bernstein model employed by Wang and Ghosh (2012) and others is

f(z) = Σ_{l=0}^{N} β_l B(z, l, N). (2)

For monotonic regression with d > 1 inputs on [0, 1]^d, Wang and Ghosh (2011) employed the multivariate Bernstein bases

B(z, l, N) = Π_{j=1}^{d} B(z_j, l_j, N_j), l = (l_1, ..., l_d), N = (N_1, ..., N_d), (3)

and the multivariate Bernstein model

f(z) = Σ_{l_1=0}^{N_1} · · · Σ_{l_d=0}^{N_d} β_{(l_1,...,l_d)} Π_{j=1}^{d} B(z_j, l_j, N_j). (4)

Because the partial derivative of (4) with respect to z_k is a nonnegative combination of Bernstein bases whenever the coefficients are nondecreasing in the kth index, f is increasing in z_k if

β_{(l_1,...,l_k+1,...,l_d)} ≥ β_{(l_1,...,l_k,...,l_d)} for all l with l_k < N_k. (5)

One important limitation of model (4) is that the number of bases increases very fast with the dimension d. Thus, a large experiment is needed to fit the model when d is large.
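As a small illustrative sketch (ours, not the article's code), the univariate bases (1) can be evaluated directly; the snippet also checks the partition-of-unity property and the fact that a Bernstein model with nondecreasing coefficients, the sufficient condition behind (5), is itself nondecreasing:

```python
from math import comb

def bernstein(z, l, N):
    """Univariate Bernstein basis B(z, l, N) = C(N, l) z^l (1 - z)^(N - l)."""
    return comb(N, l) * z**l * (1 - z)**(N - l)

# Partition of unity: the N + 1 bases sum to 1 at every z.
assert abs(sum(bernstein(0.3, l, 5) for l in range(6)) - 1.0) < 1e-12

# A Bernstein model f(z) = sum_l beta_l B(z, l, 5) with nondecreasing
# coefficients is nondecreasing in z.
beta = [0.0, 0.1, 0.4, 0.4, 0.9, 1.0]
f = lambda z: sum(b * bernstein(z, l, 5) for l, b in enumerate(beta))
vals = [f(i / 100) for i in range(101)]
assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
```

Note that f(0) = β_0 and f(1) = β_N, so the endpoint values of the fitted function are the first and last coefficients.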
The linear restrictions on the coefficients (5) are a sufficient but not necessary condition for monotonicity. Thus, coefficients that give a monotonic f may not satisfy (5), and it is not clear that (5) allows sufficient flexibility to model a large class of functions. A justification for imposing (5) is the following theorem, stated on page 115 of Gal and Anastassiou (2010).
Let β = (β_l : l ∈ ℘(N)) and let S_k(N) be the set of β's that satisfy (5). If F is increasing in z_{k_1}, ..., z_{k_L}, setting β_l = F(l/N) yields a β that lies in ∩_{j=1}^{L} S_{k_j}(N), that is, an f that is increasing in z_{k_1}, ..., z_{k_L}. This observation and Theorem 1 imply that there exists a sequence of models (4) with coefficients satisfying (5) that converges uniformly to F as min{N_1, ..., N_d} → ∞. This justifies the use of constraints (5) to achieve monotonicity.

FANOVA DECOMPOSITION AND MODELING
The FANOVA decomposition of a function F on [0, 1]^d is

F(z) = F_0 + Σ_{i=1}^{d} F^(i)(z_i) + Σ_{i<j} F^(i,j)(z_i, z_j) + · · · + F^(1,...,d)(z_1, ..., z_d), (7)

where F_0 is a constant, F^(i) is a main effect, F^(i,j) a two-factor interaction, and similarly for the other components of the decomposition. Such decompositions can be employed to improve modeling of F because we can assume that a small subset of main effects and two-factor interactions are important. This assumption is supported by the effect sparsity and hierarchy principles (Wu and Hamada 2009). Efron and Stein (1981) pointed out that the FANOVA decomposition is unique if z has independent components and the conditions E_{z_{i_j}}[F^(i_1,...,i_e)(z_{i_1}, ..., z_{i_e})] = 0, j = 1, ..., e, are imposed, where the expectation is taken with respect to z_{i_j} while holding the other inputs fixed. This decomposition is an important tool in global sensitivity analysis (Saltelli, Chan, and Scott 2000) and functional data analysis (Ramsay and Silverman 2005). This article adopts a different FANOVA decomposition obtained by imposing a different set of conditions that allow easier modeling with the multivariate Bernstein bases.
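As a toy illustration of the Efron–Stein construction (ours, not the article's code), the decomposition of a simple two-input function can be computed on a uniform grid by averaging, with the zero-mean conditions holding by construction:

```python
# Decompose F(z1, z2) = z1 + 2*z2 + z1*z2 into a grand mean, two main
# effects, and an interaction on a 21 x 21 uniform grid.
n = 21
grid = [i / (n - 1) for i in range(n)]
F = [[z1 + 2 * z2 + z1 * z2 for z2 in grid] for z1 in grid]

F0 = sum(sum(row) for row in F) / n**2                            # grand mean
F1 = [sum(F[i]) / n - F0 for i in range(n)]                       # main effect of z1
F2 = [sum(F[i][j] for i in range(n)) / n - F0 for j in range(n)]  # main effect of z2
F12 = [[F[i][j] - F0 - F1[i] - F2[j] for j in range(n)]           # interaction
       for i in range(n)]

# The components satisfy the zero-mean conditions ...
assert abs(sum(F1)) < 1e-9 and abs(sum(F2)) < 1e-9
# ... and they sum back to F exactly at every grid point.
assert all(abs(F0 + F1[i] + F2[j] + F12[i][j] - F[i][j]) < 1e-9
           for i in range(n) for j in range(n))
```

Here the expectations are replaced by grid averages; with independent uniform inputs this is the discrete analogue of the conditions stated above.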
Note that Σ_{l=0}^{N} B(z, l, N) = 1, so an alternative set of bases with the same span, which we call the modified Bernstein bases, is

b(z, l, N) = Σ_{m=l}^{N} B(z, m, N), l = 0, 1, ..., N. (8)

If we use the tensor product of the modified Bernstein bases (8) instead of the tensor product of the Bernstein bases (as in (4)) for function approximation, we obtain the model

f(z) = Σ_{l_1=0}^{N_1} · · · Σ_{l_d=0}^{N_d} α_{(l_1,...,l_d)} Π_{j=1}^{d} b(z_j, l_j, N_j), (9)

which decomposes into the FANOVA components given in (7). This is because b(z_j, 0, N_j) = 1. Sufficient conditions that ensure monotonicity of f(z) with respect to z_i can be obtained as follows. Since {B(z, 0, N), ..., B(z, N, N)} and {b(z, 0, N), ..., b(z, N, N)} have the same span, model (9) can be rewritten in terms of the Bernstein bases, which gives model (4). By equating (9) and (4), we obtain (see Appendix A)

β_{(l_1,...,l_d)} = Σ_{m_1=0}^{l_1} · · · Σ_{m_d=0}^{l_d} α_{(m_1,...,m_d)}. (10)

From (10), we obtain the following theorem, which gives linear restrictions on the α_{(l_1,...,l_d)}'s that are sufficient to achieve monotonicity in an axial direction.
Since it is often the case that only main effects and two-factor interactions are significant, we shall assume that F(z) = F_0 + Σ_{i=1}^{d} F^(i)(z_i) + Σ_{i<j} F^(i,j)(z_i, z_j) in the remainder of the article. Proposition 1 demonstrates that the main effect F^(k) plays an important role in ensuring that F is increasing in z_k. It must be increasing. It must also offset all decreases in two-factor interactions involving input k, that is, the positive increment in F^(k) must be at least equal in magnitude to the sum of all negative increments in two-factor interactions involving input k. Remark. By abuse of notation, we let F^(k,j)(z_k, z_j) = F^(j,k)(z_j, z_k), j < k, in this article.
The full FANOVA Bernstein model (with all main effects and interactions) is given by

f_N(z) = α_0 + Σ_{i=1}^{d} f^(i)(z_i) + Σ_{i<j} f^(i,j)(z_i, z_j), (13)

where

f^(i)(z_i) = Σ_{l=1}^{N_i} α^(i)_l B(z_i, l, N_i), f^(i,j)(z_i, z_j) = Σ_{l=1}^{N_i} Σ_{m=1}^{N_j} α^(i,j)_(l,m) B(z_i, l, N_i) B(z_j, m, N_j). (14)

Note that the Bernstein bases B(z_i, l, N_i) are used to model f^(i) and their products with B(z_j, m, N_j) are used as bases to model f^(i,j). The model (13) will be employed as the full model in the remainder of the article. Corollary 1 simplifies the result in Theorem 3 for model (13).
Corollary 1. Let f_N(z) be given by (13)-(14). Then f_N is increasing in z_k if

α^(k)_l − α^(k)_{l−1} + Σ_{j≠k} [α^(k,j)_(l,m_j) − α^(k,j)_(l−1,m_j)] ≥ 0 for l = 1, ..., N_k and all m_j ∈ {0, 1, ..., N_j}, j ≠ k, (15)

where we set α^(k)_0 = 0 and α^(k,j)_(l,0) = 0. It can be seen that the constraints (15) arise from imposing the monotonicity of F on the grid Π_{i=1}^{d} {0, 1/N_i, 2/N_i, ..., 1}. The following corollary provides a justification for imposing the restriction (15) to achieve monotonicity in (13): setting the coefficients of (13) from the values of F at the grid points yields an f̂_N that satisfies (15) for all k ∈ {k_1, ..., k_L}, which implies that f̂_N is increasing in the inputs k_1, ..., k_L. In addition, f̂_N converges uniformly to F.

MONOTONE QUANTILE REGRESSION MODEL
Let Y_z denote the stochastic simulation output given the vector of model inputs z and let f_τ : [0, 1]^d → R denote the function that maps z to the τth quantile of Y_z. Thus, Y_z = f_τ(z) + ε, where P(ε ≤ 0|z) = P(Y_z ≤ f_τ(z)|z) = τ. In this article, we shall consider the problem of estimating the conditional quantile function f_τ in cases where f_τ is known to be a monotonic function of one or more of the inputs. This problem is widely encountered, but knowledge of monotonicity is often neglected in statistical modeling of f_τ. Numerous physical relationships are monotonic and of the form π_0 = Π_{j=1}^{d} φ_j(π_j), where the π_j's are either physical quantities or dimensionless parameters (Barenblatt 1996). Taking logarithms on both sides of the equation yields an additive relationship. Additive models have also been used in many other areas of statistical modeling (Hastie and Tibshirani 1990). These examples provide support for modeling monotonic relationships with only main effects and two-factor interactions, in addition to the effect hierarchy principle (Wu and Hamada 2009). Moreover, the effect sparsity principle suggests that only a subset of main effects and two-factor interactions are important. Thus, we shall adopt a model selection approach that selects important main effects and two-factor interactions while enforcing monotonicity constraints.
It is possible to model f_τ using (4). However, this restricts the applicability of the method to low-dimensional problems because the number of basis functions is M = Π_{j=1}^{d} (N_j + 1), which can be very large. Note that we need a design of size at least M to fit the model, which can be costly because the computer model simulations are time consuming. In contrast, even for the full FANOVA Bernstein model (13), the number of basis functions is only N = 1 + Σ_{i=1}^{d} N_i + Σ_{i<j} N_i N_j. For example, if N_1 = · · · = N_d = 5 and d = 4, M = 1296 and N = 171. Due to the typically higher signal-to-noise ratio in stochastic simulation compared to physical experiments, the values of N_1, ..., N_d frequently need to be larger than one to model the true functions well with (4) or (13). Thus, the number of basis functions needed to model monotonic functions accurately using (4) with constraints (5) will often be too large. The number of constraints in (15) needed to enforce monotonicity in input k is N_k Π_{j=1, j≠k}^{d} (N_j + 1). However, this number can be significantly reduced if only a subset of main effects and two-factor interactions in (13) are nonzero. If input k does not interact with inputs z_1, ..., z_e, then we would only need N_k Π_{j=e+1, j≠k}^{d} (N_j + 1) constraints to enforce monotonicity in input k. The approach in this article incorporates selection of main effects and two-factor interactions, which often significantly reduces the number of constraints needed to impose monotonicity.
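The two counts above are easy to verify numerically; this small sketch (ours) computes the number of basis functions in the tensor product model (4) and in the full FANOVA Bernstein model (13):

```python
from math import prod
from itertools import combinations

def n_bases_tensor(N):
    """Number of bases in the tensor product model (4): prod_j (N_j + 1)."""
    return prod(Nj + 1 for Nj in N)

def n_bases_fanova(N):
    """Number of bases in the full FANOVA Bernstein model (13):
    1 + sum_i N_i + sum_{i<j} N_i * N_j."""
    return 1 + sum(N) + sum(Ni * Nj for Ni, Nj in combinations(N, 2))

N = (5, 5, 5, 5)          # degree 5 in each of d = 4 inputs
print(n_bases_tensor(N))  # 1296
print(n_bases_fanova(N))  # 171
```

The gap between the two counts widens rapidly with d, which is the motivation for working with the FANOVA form.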
We employ quantile regression models of the form (13) that allow for selection of main effects and two-factor interactions. The proposed FANOVA Bernstein model is given by

f_τ(z)|(δ, N) = α_0 + Σ_{i: N_i>0} Σ_{l=1}^{N_i} α^(i)_l B(z_i, l, N_i) + Σ_{i<j: N_i>0, N_j>0} δ^(i,j) Σ_{l=1}^{N_i} Σ_{m=1}^{N_j} α^(i,j)_(l,m) B(z_i, l, N_i) B(z_j, m, N_j), (18)

where δ^(i,j) takes on the value 0 or 1. Note that if N_i = 0, then the main effect of input i is excluded from the model. In addition, because the same N_i is used to model f^(i) and f^(i,j), the interaction f^(i,j) can be included in the model only when both f^(i) and f^(j) are included in the model, that is, N_i > 0 and N_j > 0. The value of δ^(i,j) determines whether f^(i,j) is actually included. This incorporates the strong heredity principle, which states that a two-factor interaction can be in the model only if both of its parent main effects are active (Chipman 1996; Wu and Hamada 2009). Given N and δ = (δ^(1,2), ..., δ^(d−1,d)), f_τ is a linear model with coefficients α^(i)_l corresponding to active main effects (N_i > 0) and α^(i,j)_(l,m) corresponding to active two-factor interactions (N_i > 0, N_j > 0, δ^(i,j) = 1). To force f_τ(z)|(δ, N) to be increasing in z_k, we use (15). Note that if N_k > 0 and δ^(k,j) = 0 for all j, then (15) can be replaced with the simpler constraints α^(k)_1 ≥ 0 and α^(k)_l ≥ α^(k)_{l−1}, l = 2, ..., N_k. For simplicity of notation, we order and rewrite the nonzero coefficients in (18) as a vector α and the corresponding basis functions as a vector x(z), so that f_τ(z)|(δ, N) = x(z)^T α. The coefficient vector α is estimated by minimizing the check loss Σ_{i=1}^{n} ρ_τ(Y_i − x(z_i)^T α), where ρ_τ(u) = u[τ − I(u < 0)]. This is the key idea in quantile regression (Koenker and Bassett 1978). When the distribution of ε depends on z, it is more efficient to estimate α by minimizing a weighted check loss described on page 160 of Koenker (2005). However, the weights are difficult to estimate.
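The role of the check loss is easy to see in the simplest case: minimizing Σ_i ρ_τ(Y_i − c) over a constant c yields a τ sample quantile of the data. A minimal sketch (ours, not the article's code), using the fact that a minimizer is always attained at one of the data points:

```python
def check_loss(u, tau):
    """Koenker-Bassett check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1 if u < 0 else 0))

def best_constant(y, tau):
    """Constant c minimizing sum_i rho_tau(y_i - c); a minimizer is always
    attained at one of the data points, so it suffices to search over y."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))

y = [float(v) for v in range(1, 12)]   # 1.0, 2.0, ..., 11.0
print(best_constant(y, 0.5))           # 6.0: the sample median
print(best_constant(y, 0.9))           # 10.0: the 0.9 sample quantile
```

Replacing the constant c with x(z)^T α and adding linear monotonicity constraints gives exactly the estimation problem described in the text.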
To incorporate the monotonicity constraints ∂f_τ(z)|(δ, N)/∂z_k ≥ 0 for all z ∈ [0, 1]^d, k ∈ K ⊂ {1, ..., d}, we simply need to enforce the linear constraints (15), which can be written as Aα ≥ 0, where the matrix A consists of elements that equal −1, 0, or 1. Thus, the monotonic quantile regression problem can be formulated as the optimization problem

min_α Σ_{i=1}^{n} ρ_τ(Y_i − x(z_i)^T α) subject to Aα ≥ 0. (19)

The problem (19) can be rewritten as a linear program

min_{α, u, v} τ 1_n^T u + (1 − τ) 1_n^T v subject to Xα + u − v = Y, Aα ≥ 0, u ≥ 0, v ≥ 0. (20)

In (20), X is a matrix with rows x(z_1)^T, ..., x(z_n)^T, Y = (Y_1, ..., Y_n)^T, and 1_n is an n × 1 vector of 1's.
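To make the linear program concrete, here is a hedged sketch (ours, not the authors' implementation) that solves (20) with scipy.optimize.linprog. For simplicity it uses the univariate Bernstein model (2) with the monotonicity constraints (5), i.e., nondecreasing coefficients, rather than the full FANOVA model; the function names are ours:

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

def bernstein_design(z, N):
    """n x (N + 1) design matrix of univariate Bernstein bases."""
    z = np.asarray(z)
    return np.column_stack([comb(N, l) * z**l * (1 - z)**(N - l)
                            for l in range(N + 1)])

def monotone_quantile_fit(z, y, tau, N):
    """LP form of (20): minimize the check loss subject to the
    monotonicity constraints beta_{l+1} >= beta_l (constraints (5))."""
    n, p = len(y), N + 1
    X = bernstein_design(z, N)
    # Decision vector: [beta (p), u (n), v (n)]; u, v >= 0, beta free.
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # X beta + u - v = y
    A_ub = np.zeros((N, p + 2 * n))                # beta_l - beta_{l+1} <= 0
    for l in range(N):
        A_ub[l, l], A_ub[l, l + 1] = 1.0, -1.0
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(0)
z = rng.uniform(size=200)
y = z**2 + rng.normal(scale=0.1, size=200)        # increasing median function
beta = monotone_quantile_fit(z, y, tau=0.5, N=3)
assert all(beta[l + 1] >= beta[l] - 1e-6 for l in range(3))  # monotone fit
```

The FANOVA version differs only in the design matrix X (products of Bernstein bases) and in the constraint matrix A encoding (15).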
We shall provide a justification for the approach given by (19) and (20). Suppose that there are t distinct design points z_1, ..., z_t, each replicated q times. As q → ∞, the objective function in (19) normalized by n = tq converges to t^{-1} Σ_{i=1}^{t} E[ρ_τ(Y_{z_i} − x(z_i)^T α)], where Y_{z_i} denotes the output at z_i. Thus, the integrated check loss ∫_{[0,1]^d} E[ρ_τ(Y_z − f_τ(z)|(δ, N))] w(z) dz, where w is a density on [0, 1]^d, can be viewed as the population quantity being estimated by the objective function in (19). Let f̄_N(z) = f_τ(z)|(δ = 1, N), which is a model that includes all main effects and two-factor interactions when N > 0. Theorem 4 states that minimizing the integrated check loss subject to (15), where F is the true conditional quantile function, yields a quantile regression model f̂_N that achieves the minimum possible integrated check loss (which is achieved by F) as min{N_1, ..., N_d} → ∞.
Theorem 4. Let F : [0, 1]^d → R be a continuous function that is increasing in z_k, k ∈ K ⊂ {1, ..., d}, and let w be a density on [0, 1]^d. Let S be a compact set that contains the coefficients of f̂_N. Then, the integrated check loss of f̂_N converges to the integrated check loss of F as min{N_1, ..., N_d} → ∞.

MODEL SELECTION, STATISTICAL INFERENCE, AND DESIGN
For practical purposes, we need a data-driven procedure to determine the values of N_1, ..., N_d and the δ^(i,j)'s. By increasing N, model (18) can approximate functions of increasingly complex shapes. On the other hand, by setting N_j = 0, model (18) will be independent of z_j, and by setting δ^(k,j) = 0 for all j ≠ k, it will be additive in input k. Thus, by selecting appropriate N and δ, (18) can be used to model a wide variety of functions with a moderately large number of inputs.
We use the BIC (Lian 2012) for comparing models with different N and δ. The BIC is

BIC = log(Σ_{i=1}^{n} ρ_τ(Y_i − x(z_i)^T α̂)) + |α̂| log(n)/(2n),

where α̂ is the optimal solution to (20) and |α̂| is the number of elements in α̂. The BIC has been proven to be asymptotically consistent for model selection under certain conditions. We set BIC = ∞ for all models with a model matrix X that is not of full column rank. Without any monotonicity constraints, the model coefficients are then not identifiable and we can reduce the number of columns of X (which is equal to |α̂|) without changing the minimum check loss. The situation is more complicated when there are monotonicity constraints. However, if there exists an optimal solution α̂ to (20) such that Aα̂ > 0, then there are multiple optimal solutions to (20). Thus, it makes sense to rule out models that give an X that is not of full column rank even if there are monotonicity constraints. For quasi-random space-filling designs, model matrices are almost always of full column rank unless the number of columns exceeds the number of rows.
Because the number of models is huge, we use a heuristic optimization approach to search for the best model. We suppose that N i ≤ N max , i = 1, . . . , d, where N max is an upper bound specified by the modeler. The model search algorithm is given in Figure 1. The algorithm in Figure 1 is a greedy algorithm. With δ fixed at 0, it optimizes N by finding the change in a single component that improves the BIC value by the largest amount, where each component is restricted to take on values 0, . . . , N max . This is repeated until the BIC cannot be improved, which yields N * . With N fixed at N * , each two-factor interaction involving only inputs with active main effects is removed from and added to the model. When the (i, j ) interaction is added, the algorithm also finds the best N i conditioned on N k = N * k , k = i and the best N j conditioned on N k = N * k , k = j . The reason for this step is that if N * i and N * j are both large, then adding the (i, j ) interaction into the model will increase the number of terms by N * i N * j , which is huge. This may give a model with far too many terms (perhaps even more than the number of experiment runs) and a high BIC value. As such, the (i, j ) interaction may not be able to enter the model even though it is significant. By reducing N i or N j , the (i, j ) interaction can be allowed to enter the model. The best option involving two-factor interactions is adopted each time Step 3 is performed. A best option can involve removing interaction (i * , j * ), adding interaction (i * , j * ) and changing N i * , or adding interaction (i * , j * ) and changing N j * . This process is repeated until no improvements in the BIC can be made, which leads to termination of the algorithm. Note that existing model selection methods cannot be used for the problem in this article. The LASSO and other penalties (Wu and Liu 2009) can be used to select model terms but not the polynomial degrees N. 
Moreover, they do not impose the effect heredity principle, which has been found to improve model selection when the number of model terms far exceeds the number of distinct experiment runs (Chipman 1996), as in the problems considered in this article.
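The degree-selection step of the greedy search can be sketched in a few lines. This is a simplified illustration of the main-effects stage of the Figure 1 algorithm (ours, with a generic score function standing in for the BIC; the real algorithm also handles the δ^(i,j)'s and refits the model at each candidate):

```python
def greedy_degree_search(score, d, n_max):
    """Greedy coordinate search over degrees N = (N_1, ..., N_d): repeatedly
    make the single-coordinate change (to any value in 0..n_max) that most
    improves the criterion, until no change improves it. `score` maps a
    degree vector to a value to be minimized (the BIC in the article)."""
    N = [0] * d
    best = score(tuple(N))
    while True:
        # Evaluate every single-coordinate change and keep the best one.
        candidates = [(score(tuple(N[:i] + [v] + N[i + 1:])), i, v)
                      for i in range(d) for v in range(n_max + 1) if v != N[i]]
        s, i, v = min(candidates)
        if s >= best:           # no single change improves the criterion
            return tuple(N), best
        best, N[i] = s, v

# Toy criterion with a unique minimum at N = (3, 0, 1).
target = (3, 0, 1)
score = lambda N: sum((a - b)**2 for a, b in zip(N, target))
print(greedy_degree_search(score, d=3, n_max=5))  # ((3, 0, 1), 0)
```

As with any greedy search, this can stop at a local minimum of the criterion; the separable toy score above is chosen so the global minimum is found.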
To quantify model and parameter estimation uncertainty, we employ a bootstrap procedure. In stochastic simulations, it is common and sensible to employ replicated designs, where each distinct input setting is replicated q times. Let the ordered observations at a distinct design point z_i be denoted by Y_(1), ..., Y_(q). Then, we estimate the quantile function ζ_i(τ) = f_τ(z_i) with the piecewise linear function that connects the points (0, Y_(1) − 0.5(Y_(2) − Y_(1))), (0.5/q, Y_(1)), ..., ((q − 0.5)/q, Y_(q)), (1, Y_(q) + 0.5(Y_(q) − Y_(q−1))). The first point is obtained by continuing the linear trend between (0.5/q, Y_(1)) and (1.5/q, Y_(2)), and similarly for the last point. For each bootstrap replication, we sample q uniform random numbers on [0, 1] and compute the value of the piecewise linear quantile function at the q numbers to obtain q random samples of the response. This is called a smoothed bootstrap (Silverman and Young 1987). The design is fixed and not resampled. Using each bootstrap sample of the output as data, we find the best model using the model search algorithm in Figure 1. The collection of best models found with the procedure gives an indication of model uncertainty. In addition, a 100(1 − θ)% confidence interval for a quantile is constructed from the θ/2 and 1 − θ/2 sample quantiles of the estimates given by the best model in each bootstrap sample.
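The piecewise linear quantile function and the smoothed bootstrap resampling step can be sketched as follows (our illustration of the construction described above, not the authors' code):

```python
import random
from bisect import bisect_right

def quantile_function(ys):
    """Piecewise linear quantile estimate through the points
    (0, Y(1) - 0.5(Y(2) - Y(1))), (0.5/q, Y(1)), ..., ((q - 0.5)/q, Y(q)),
    (1, Y(q) + 0.5(Y(q) - Y(q-1))), as described in the text."""
    y = sorted(ys)
    q = len(y)
    ps = [0.0] + [(i + 0.5) / q for i in range(q)] + [1.0]
    vs = [y[0] - 0.5 * (y[1] - y[0])] + y + [y[-1] + 0.5 * (y[-1] - y[-2])]
    def zeta(p):
        j = min(max(bisect_right(ps, p) - 1, 0), len(ps) - 2)
        w = (p - ps[j]) / (ps[j + 1] - ps[j])
        return vs[j] + w * (vs[j + 1] - vs[j])
    return zeta

def smoothed_bootstrap_sample(ys, rng=random):
    """One smoothed bootstrap resample at a design point: evaluate the
    quantile function at q uniform random numbers."""
    zeta = quantile_function(ys)
    return [zeta(rng.random()) for _ in ys]
```

Each call to smoothed_bootstrap_sample draws a fresh pseudo-sample at one design point; repeating this at every distinct design point and refitting the model gives one bootstrap replication.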
Optimal design for the FANOVA Bernstein model is a complex problem that we shall leave for future research. In the examples, we employ the cosine maximin Latin hypercube design (cosine MLHD) proposed by Dette and Pepelyshev (2010) and Tan (2015). Both Dette and Pepelyshev (2010) and Tan (2015) demonstrated that the cosine MLHD improves the performance of GP models. Furthermore, Tan (2015) showed that cosine MLHDs are good for fitting polynomial models. Since (18) is a polynomial model, cosine MLHDs should be a good choice for fitting the model. We suggest that the number of replicates q for the design be chosen so that the expected number of observations above or below the quantile to be estimated is at least one, that is, q min{τ, 1 − τ} ≥ 1. This will allow reasonably reliable estimation of the quantile at each distinct design setting using only the data for that setting.
When estimating multiple conditional quantile functions, the problem of quantile crossing can occur, that is, an estimated τ 1 quantile can be larger than an estimated τ 2 quantile at a given z even though τ 1 < τ 2 . This problem can be overcome by rearranging the quantile estimates, as proposed by Chernozhukov, Fernandez-Val, and Galichon (2009).
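A minimal one-point version of the rearrangement idea (ours, not the procedure of Chernozhukov, Fernandez-Val, and Galichon in full generality) simply sorts the estimated quantiles across τ at a fixed z:

```python
def rearrange_quantiles(estimates):
    """Given a dict tau -> estimated quantile at a fixed z, return a
    non-crossing version by sorting the estimated values across tau."""
    taus = sorted(estimates)
    values = sorted(estimates[t] for t in taus)
    return dict(zip(taus, values))

crossed = {0.1: 2.0, 0.5: 1.5, 0.9: 3.0}   # the 0.5 quantile crosses the 0.1
print(rearrange_quantiles(crossed))        # {0.1: 1.5, 0.5: 2.0, 0.9: 3.0}
```

After rearrangement the estimates are nondecreasing in τ at that point, so the crossing is removed without changing the set of estimated values.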

EXAMPLES
This section presents two examples in which quantiles of an M/M/1 queueing system and a real physical model are estimated. Appendix B gives a numerical example based on a more complex queueing system. In each example, we compare the performance of the FANOVA Bernstein model to the stationary GP model with Gaussian correlation function and a nugget term employed by Plumlee and Tuo (2014), which we call the quantile GP model. The quantile GP parameters are estimated by the method of maximum likelihood (Currin et al. 1991), using the sample quantiles at distinct design points as data. Since the quantile GP is nonmonotonic, we also consider rearranging the quantile GP predictions and credible limits (Chernozhukov, Fernandez-Val, and Galichon 2009). This procedure is attractive because Chernozhukov, Fernandez-Val, and Galichon (2009) proved that it does not worsen and can improve prediction accuracy and coverage. Computational implementation of rearrangement requires monotonizing the function estimate on a grid. Fine grids are needed to approximate the theoretical rearrangement estimator well. This is too computationally intensive when there are more than two inputs. Thus, we apply the rearrangement procedure on a coarse grid {0, 0.1, . . . , 1}^d for Examples 2 and 3, which have d = 5 and d = 4, respectively. The average rearrangement estimator suggested by Chernozhukov, Fernandez-Val, and Galichon (2009) is employed, which requires rearranging with respect to all d! orderings of the inputs. The grid partitions the input space [0, 1]^d into cells, which are the smallest hypercubes with grid points as corners. To predict at a point not on the grid, we use model (4) with N = (1, . . . , 1) fitted by interpolating the rearranged predictions at the corners of the cell containing the point. To fit the model, the inputs are rescaled so that the cell is [0, 1]^d. This gives a continuous monotonic predictor.
Note that Chernozhukov, Fernandez-Val, and Galichon (2009) did not provide any suggestion on the choice of grid size and monotonic interpolator.
It is possible to fit a monotonic model with main effects and two-factor interactions that are modeled with linear splines. However, linear splines are not differentiable and can be poor approximations of differentiable computer models widely seen in practice. Linear spline methods proposed by Koenker, Ng, and Portnoy (1994) and Koenker and Mizera (2004) are implemented in the quantreg package in R. The quantreg package only has the capability to fit a model with prespecified main effects and two-factor interactions. Automatic selection of significant main effects and two-factor interactions is needed when there are many FANOVA components. Moreover, the package can only fit linear spline models that have monotonic main effects. It does not provide the capability to fit monotonic linear spline models with two-factor interactions.

Example 1: M/M/1 Queue
This example illustrates some advantages of the proposed method over the quantile GP model, rearranged quantile GP model, and a cubic Bernstein model (2) without the monotonicity constraint (fitted by check loss minimization) for a transient M/M/1 queueing system. The first arrival occurs at time 0, and the response Y is the average time the first 10 customers spend in the system. The mean time between arrivals is fixed at one and the mean service time z_1 can take on any value in [0, 1]. We shall model quantiles of Y as a function of z_1 using a design with four distinct points {0.02, 0.34, 0.66, 0.98}. Ten independent replicates are run at each point. The degree N_1 of the FANOVA Bernstein model is restricted to {0, 1, 2, 3} and 95% confidence intervals for the model are constructed with the proposed bootstrap method. Credible limits for the quantile GP model are obtained from the posterior GP given in Currin et al. (1991).
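As a rough sketch of the simulator (ours, not the authors' code), the response can be generated with Lindley's recursion for the waiting times; the function name and defaults are our own:

```python
import random

def mm1_avg_time(z1, n_customers=10, rng=random):
    """Average time the first n customers spend in a transient M/M/1 queue:
    first arrival at time 0, exponential interarrival times with mean 1,
    and exponential service times with mean z1 (Lindley's recursion)."""
    wait, total = 0.0, 0.0
    for i in range(n_customers):
        service = rng.expovariate(1.0 / z1) if z1 > 0 else 0.0
        total += wait + service              # time in system = delay + service
        if i < n_customers - 1:
            interarrival = rng.expovariate(1.0)
            wait = max(0.0, wait + service - interarrival)
    return total / n_customers

random.seed(1)
print(mm1_avg_time(0.5))   # one realization of the response Y at z1 = 0.5
```

Replicating this call ten times at each of the four design points reproduces the structure of the experiment described above.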
In Figure 2(a)-2(d), we plot estimation results for the 0.9 quantile obtained with two sets of data. Figures 2(a) and 2(c) are constructed with the same data (similarly for Figures 2(b) and 2(d)). An accurate estimate (based on 50,000 samples) of the 0.9 quantile is plotted as a solid line with crosses in each figure.
The experiment data are plotted as circles. Point estimates and 95% confidence limits (500 bootstrap samples) for the FANOVA Bernstein model are plotted as dashed lines. The cubic Bernstein model is plotted as solid lines. Point estimates and 95% credible limits for the quantile GP are plotted as dotted lines in Figure 2(a) and 2(b), while those for the rearranged quantile GP are plotted as dash-dot lines in Figure 2(c) and 2(d). Figures 2(a) and 2(c) show that credible intervals obtained from the quantile GP or rearranged quantile GP models may not be reasonable because they do not take into account the increase in variance of Y as z_1 increases. The quantile GP model uses a constant variance nugget term. Consequently, the credible intervals for the model have nearly the same width for all z_1. In contrast, the FANOVA Bernstein model provides reasonable confidence intervals that increase in width as z_1 increases. Figures 2(b) and 2(d) demonstrate that the monotonicity constraints make estimation of quantiles robust against outliers. The figure illustrates the case where there is one outlier at z_1 = 0.66. This outlier seriously affects the cubic Bernstein and quantile GP models, which rise close to the outlier and give nonmonotonic predictions. The quantile GP also gives interval predictions that are too narrow because the estimated nugget variance is near zero. Rearranging the quantile GP yields no visible improvements in predictions but creates sharp bends in the prediction curves, which is physically implausible in this example. In contrast, the FANOVA Bernstein model provides accurate monotonic predictions that are only slightly affected by the outlier. The confidence intervals are wide due to the outlier, which is a reasonable behavior.
Simulation results presented in Appendix C show that the FANOVA Bernstein model provides more accurate predictions and better prediction interval coverage than the quantile GP and rearranged quantile GP models.

Example 2: Temperature of a Solid Sphere Immersed in Fluid
In this example, we model quantiles in a heat transfer problem described by Cengel (2002). The sphere is immersed in a fluid with a higher temperature at time 0. Heat transfer takes place between the sphere and the fluid via convection and inside the sphere via conduction. The temperature Y in the sphere depends on the nine inputs u_1, ..., u_9 shown in Table 1. The table also gives the units for each of the 10 variables (output and inputs). The relationship between Y and u_1, ..., u_9 is the solution of a partial differential equation. It is given explicitly by

Y = u_3 + u_4 Σ_{i=1}^{∞} [4(sin(η_i) − η_i cos(η_i))/(2η_i − sin(2η_i))] exp(−η_i^2 π_2) sin(η_i u_1/u_7)/(η_i u_1/u_7), (22)

where π_1 = u_5 u_7/u_6 is the Biot number and π_2 = u_6 u_2/(u_8 u_9 u_7^2) is the Fourier number. The η_i's are the solutions of the equation 1 − η cot(η) = π_1, where η_i ∈ ((i − 1)π, iπ). In our computations, we approximate the infinite series in (22) with the sum of the first four terms only, which is quite accurate. The above model is useful for manufacturing process design (see chap. 8 of Jaluria and Torrance 2003). From physical considerations, we know that Y is increasing in u_1, ..., u_5 whenever u_4 < 0, that is, whenever the temperature of the medium is higher than the initial temperature of the sphere. The temperature at a point nearer the surface of the sphere is higher than the temperature at a point farther from the surface. Thus, Y is increasing in u_1. The temperature Y should clearly increase with time u_2 since heat flows into the sphere. If the temperature of the medium u_3 gets higher, the temperature at any point in the sphere should get higher. Moreover, if the initial temperature of the sphere is closer to the temperature of the medium (u_4 increases), the temperature at any point in the sphere should be higher. Finally, as the convective heat transfer coefficient u_5 increases, heat transfer into the sphere increases. Thus, Y is increasing in u_5. Note that the arguments above hold for each u_i conditional on u_j, j ≠ i.
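As a hedged numerical sketch (ours, not the article's code), the eigenvalues and the four-term approximation of the dimensionless series can be computed as follows; here Bi denotes the Biot number, Fo the Fourier number, r_hat the normalized radial position, and theta the normalized temperature difference, which we assume decays from 1 toward 0 as the sphere equilibrates:

```python
from math import pi, sin, cos, exp

def eigenvalues(biot, n_terms=4):
    """First n roots of 1 - eta*cot(eta) = Bi, with eta_i in ((i-1)*pi, i*pi),
    found by bisection on each interval."""
    g = lambda eta: 1.0 - eta * cos(eta) / sin(eta) - biot
    roots = []
    for i in range(1, n_terms + 1):
        lo, hi = (i - 1) * pi + 1e-9, i * pi - 1e-9   # avoid the cot singularities
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots

def theta(r_hat, fourier, biot, n_terms=4):
    """Four-term approximation of the dimensionless series for a sphere."""
    total = 0.0
    for eta in eigenvalues(biot, n_terms):
        a = 4.0 * (sin(eta) - eta * cos(eta)) / (2.0 * eta - sin(2.0 * eta))
        total += a * exp(-eta**2 * fourier) * sin(eta * r_hat) / (eta * r_hat)
    return total
```

Under the variable mapping stated in the text this corresponds to evaluating Y = u_3 + u_4 theta(u_1/u_7, π_2, π_1); the decay of theta with the Fourier number is what produces the asymptote in time discussed below.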
We standardize u_1, ..., u_5 and work with the standardized input z = (z_1, ..., z_5) ∈ [0, 1]^5, where z_3 = (u_3 − 250)/(270 − 250) and similarly for the other z_i's. The values of u_6, ..., u_9 are randomly sampled from their ranges in each computer model run. Thus, the value of Y returned by the computer model is stochastic. We shall model the 0.1, 0.5, and 0.9 quantiles of Y. The difference between the 0.9 and 0.1 quantiles is useful for quantifying the spread of the distribution of Y, and the quantity (f_{0.9} − f_{0.5}) − (f_{0.5} − f_{0.1}) gives a measure of the skewness.
The relationship between f_τ and z_2 (time) has an asymptote, as shown in Figure D1 in Appendix D. This is due to the temperature reaching its steady-state value, a feature manifested in many physical systems. The spread of the distribution of Y also changes with z_2 (see Figure D1). We perform 50 simulations to compare the performance of the proposed approach to the multivariate Bernstein, quantile GP, and rearranged quantile GP models. The design employed is a 40-run cosine MLHD replicated 10 times. Each cosine MLHD is generated by first generating an MLHD U and transforming it to obtain the cosine MLHD D = [cos(Uπ) + 1]/2. Note that U is obtained by generating 2500 Latin hypercube designs and selecting the one that performs best with respect to the maximin criterion. Thus, D is different for each simulation.
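The design construction can be sketched as follows. This is a minimal Python version under stated assumptions: the Latin hypercube construction (permuted strata with uniform jitter) and the maximin score (minimum pairwise Euclidean distance) stand in for whatever specific implementation was actually used.

```python
import numpy as np

def maximin_lhd(n, d, n_candidates=2500, rng=None):
    """Best of n_candidates random n-run, d-input Latin hypercubes by the maximin criterion."""
    rng = np.random.default_rng(rng)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        # random Latin hypercube: one point per stratum in each dimension
        U = (np.argsort(rng.random((n, d)), axis=0) + rng.random((n, d))) / n
        # maximin score: smallest pairwise distance (larger is better)
        dists = np.sqrt(((U[:, None, :] - U[None, :, :]) ** 2).sum(-1))
        score = dists[np.triu_indices(n, 1)].min()
        if score > best_score:
            best, best_score = U, score
    return best

def cosine_transform(U):
    """Map an MLHD U in [0, 1]^d to the cosine design D = [cos(U*pi) + 1] / 2."""
    return (np.cos(np.pi * U) + 1.0) / 2.0
```

The cosine transform keeps D inside [0, 1]^d while concentrating points near the boundaries, where the quantile surface is hardest to pin down.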
In each simulation, the design D is generated and the output from the simulator is obtained. Then, quantile regression is performed using the multivariate Bernstein model (4) with the monotonicity constraints (5), the FANOVA Bernstein model, the quantile GP model, and the rearranged quantile GP model. The value of N_max for the FANOVA Bernstein model is fixed at seven. For the multivariate Bernstein model, we set N = (1, 1, 1, 1, 1). Note that this model requires 2^5 = 32 distinct design points so that its coefficients are identifiable. If each component of N were increased to two, we would need 3^5 = 243 distinct design points. This demonstrates a disadvantage of the multivariate Bernstein model. Two quantile estimation performance measures are computed in each simulation run: the square error and the absolute error averaged over a set of validation data points, that is, the mean squared error (MSE) and mean absolute error (MAE), respectively. The validation data consist of 1000 replicates of a 500-run MLHD. The square error at each of the 500 data points is (\tilde{f}_τ − \hat{f}_τ)^2, where \tilde{f}_τ is the τ sample quantile of the 1000 replicates computed using Matlab's prctile function, and \hat{f}_τ is the quantile estimate obtained using one of the four alternative models above (similarly for the absolute error). In Appendix D, we plot the point and interval predictions given by the FANOVA Bernstein and rearranged quantile GP models for some simulated data and discuss model selection results. Table 2 presents the simulation mean and standard deviation of the MSE and MAE for the FANOVA Bernstein model and the mean difference in MSE/MAE between each competing model and the FANOVA Bernstein model. We see that the FANOVA Bernstein model is substantially better than the multivariate Bernstein model in all cases.
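The error computation itself is compact; here is a numpy version for reference. Note that np.quantile is used as a stand-in for Matlab's prctile, and the two use slightly different interpolation rules, so values at a given point can differ marginally.

```python
import numpy as np

def quantile_errors(reps, f_hat, tau):
    """MSE and MAE of quantile estimates against sample quantiles of replicated output.

    reps:  (n_reps, n_points) array of replicated simulator outputs at validation points
    f_hat: (n_points,) quantile estimates from a fitted model
    tau:   quantile level, e.g., 0.1, 0.5, or 0.9
    """
    f_sample = np.quantile(reps, tau, axis=0)  # tau sample quantile at each point
    err = f_hat - f_sample
    return np.mean(err ** 2), np.mean(np.abs(err))
```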
Table 2 shows that in the simulations, the FANOVA Bernstein model performs better than the quantile GP and rearranged quantile GP models for τ = 0.5 and τ = 0.9 but worse for τ = 0.1. Since the same data are used to fit the four models in each simulation, the fraction of simulations in which the difference in MSE/MAE is positive can be used to test whether the FANOVA Bernstein model is significantly better. These fractions are given in Table 2. Since P(B/50 ≥ 0.68) = 0.0077, where B is a binomial random variable with 50 trials and success probability 0.5, we say that the FANOVA Bernstein model is significantly better if the observed fraction is at least 0.68. We see that the FANOVA Bernstein model is significantly better (with respect to both MSE and MAE) than the quantile and rearranged quantile GP models for estimating the 0.5 and 0.9 quantiles but significantly worse for the 0.1 quantile. We also assess the coverage of 95% confidence/credible intervals of the FANOVA Bernstein, quantile GP, and rearranged quantile GP models. The coverage for a model is the fraction of intervals that contain the sample quantiles of the validation data. The 95% confidence intervals for the FANOVA Bernstein model are constructed with 100 bootstrap samples. Table 3 gives the mean and standard deviation of the coverages. We see that the FANOVA Bernstein model gives coverage closer to the nominal level than the quantile GP or rearranged quantile GP models.
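The 0.68 cutoff corresponds to a one-sided binomial sign test and can be checked directly, for example with scipy:

```python
from scipy.stats import binom

# Under the null of no difference between two models, the number of
# simulations in which one beats the other is B ~ Binomial(50, 0.5).
# The tail probability of a winning fraction of at least 0.68 (B >= 34):
p_value = binom.sf(33, 50, 0.5)  # P(B >= 34) = 0.0077 (to four decimals)
```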

CONCLUSIONS
This article proposes a class of FANOVA Bernstein models for quantile estimation in stochastic simulation. The models are obtained by modeling main effects and two-factor interactions in a FANOVA decomposition with Bernstein bases. The model coefficients are estimated with a linear program as in quantile regression. Monotonicity with respect to the inputs is imposed by linear constraints on the coefficients. Two-factor interactions and the degree of the polynomial bases for each input are selected by an algorithm that uses the BIC as the model selection criterion. A smoothed bootstrap method is employed to perform statistical inference. Three examples that involve estimating the quantiles of real stochastic simulation models illustrate that quantile point estimates obtained with the proposed model are robust to outliers in the response and can be more accurate than the quantile GP or rearranged quantile GP models.
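As a concrete illustration of the estimation step, the sketch below fits a monotone Bernstein quantile model in a single input by linear programming. This is a simplified one-dimensional version, not the full method: the FANOVA decomposition, degree selection, and bootstrap are omitted. The key fact used is that the derivative of a Bernstein polynomial is a Bernstein expansion of the scaled coefficient differences, so imposing β_j ≤ β_{j+1} for all j guarantees a nondecreasing fit.

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

def bernstein_basis(x, N):
    """(len(x), N+1) matrix of Bernstein basis values B_{j,N}(x) on [0, 1]."""
    j = np.arange(N + 1)
    coef = np.array([comb(N, k) for k in j], dtype=float)
    return coef * x[:, None] ** j * (1.0 - x[:, None]) ** (N - j)

def monotone_quantile_fit(x, y, tau, N):
    """Fit a nondecreasing degree-N Bernstein polynomial to the tau-quantile.

    Pinball-loss minimization as an LP with variables [beta, u, v]: the
    residuals split as y - B @ beta = u - v with u, v >= 0, the objective is
    tau * sum(u) + (1 - tau) * sum(v), and monotonicity is imposed through
    the linear constraints beta_j - beta_{j+1} <= 0.
    """
    n, p = len(x), N + 1
    B = bernstein_basis(x, N)
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([B, np.eye(n), -np.eye(n)])
    D = np.eye(N, p) - np.eye(N, p, k=1)       # rows: beta_j - beta_{j+1}
    A_ub = np.hstack([D, np.zeros((N, 2 * n))])
    bounds = [(None, None)] * p + [(0.0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:p]
```

In the full model, the same LP structure carries over with one block of coefficients per selected FANOVA component and one difference constraint per monotone input.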
A few areas require further research. First, a faster procedure for constructing confidence intervals is desirable. The bootstrap procedure for obtaining confidence intervals is time-consuming due to the need to repeat the model selection step. If the model selection step is not repeated, the confidence intervals that are obtained can suffer from undercoverage. Second, a rigorous method is needed to choose the design points and the number of replicates.

SUPPLEMENTARY MATERIALS
Appendices.pdf: This file contains Appendices A-D. Appendix A contains proofs of mathematical results. Appendix B gives a numerical example in which quantiles of a queueing simulation model with four inputs are modeled. Appendix C gives simulation results for Example 1. Appendix D gives illustrations and results for Example 2.