An Exact Confidence Set for a Maximum Point of a Univariate Polynomial Function in a Given Interval

Construction of a confidence set for a maximum point of a function is an important statistical problem which has many applications. In this article, an exact 1 − α confidence set is provided for a maximum point of a univariate polynomial function in a given interval. It is shown how the construction method can readily be applied to many parametric and semiparametric regression models involving a univariate polynomial function. Examples are given to illustrate this confidence set and to demonstrate that it can be substantially narrower and so better than the only other confidence set available in the statistical literature that guarantees 1 − α confidence level.


INTRODUCTION
Determination of the maximum points of a function is an important problem due to its wide applications. Consider a function of x = (x 1 , . . . , x q ) of the form where z(x) is a given p × 1 vector-valued function of x and θ = (θ 1 , . . . , θ p ) . The interest is in the maximum points of l(x, θ 0 , θ ) in a given region of x. Note that the maximum points of l(x, θ 0 , θ ) have nothing to do with the constant θ 0 . If the value of θ is known, then this is a simple calculus problem. The difficulty lies in that the value of θ is unknown and only an estimatorθ of θ with a certain distribution (specified in expressions (3) and (4) below) is available. Hence, the maximum points of l(x, θ 0 , θ ) can only be estimated based onθ .
In this article, we focus on the simple situation of q = 1 (and so x has only one component, denoted as x hereafter) and z(x) = (x, . . . , x p ) . Hence, is a pth order polynomial function of x, where f (x, θ ) = θ 1 x + · · · + θ p x p contains all the information about the maximum points of l(x, θ 0 , θ ). Assume that an estimatorθ of θ is available with normal distribution where is a known positive definite covariance matrix, and σ 2 is an estimator of the error variance σ 2 with distributionσ 2 ∼ σ 2 χ 2 ν /ν independent ofθ , where ν is the degrees of freedom (df) of the chi-square distribution. In the special case that σ 2 is a known constant, thenσ 2 = σ 2 and ν = ∞; hence, without loss of generality, we assumeθ ∼ N(θ, ) . (4) The focus of this article is on the construction of a 1 − α confidence set for a maximum point of the function f (x, θ ) in a given interval x ∈ [a, b] based on the distributional assumption (3), which includes the special case (4). This is much more challenging than point estimation, which naturally uses the maximum points of the estimated function f (x,θ ). It is noteworthy that the construction of a confidence set for a minimum point of f (x, θ ) in x ∈ [a, b] can be transformed into the construction of a confidence set for a maximum point of −f (x, θ ) = f (x, −θ ) in x ∈ [a, b] with the distributional assumption −θ ∼ N(−θ , σ 2 ). Hence , the methods developed in this article can readily be applied to the construction of a confidence set for a minimum point of f (x, θ ) in x ∈ [a, b].
The distributional assumption (3) follows naturally from the standard pth order univariate polynomial regression model: Y = θ 0 + f (x, θ ) + e. Based on n observations (Y i , x i ), i = 1, . . . , n, the least squares method gives ν = n − p − 1 and is resultant from deleting the first row and the first column of (X X) −1 where X is the usual n × (p + 1) design matrix. A maximum point of f (x, θ ) over covariate interval x ∈ [a, b] may represent the dose in the range [a, b] that maximizes the response, or the amount of fertilizer that produces the highest yield, etc. The first discussion of confidence regions and their practical application was by Box and Hunter (1954). Carter, Wampler, and Stablein (1983) considered the relevant applications in cancer chemotherapy, while Farebrother (1998) dealt with response surface models.
The distributional assumption (4) holds asymptotically for many parametric and semiparametric models. In the generalized linear, random effects linear, and random effects generalized linear models (see Pinheiro and Bates 2000, McCulloch and Searle 2001, and Faraway 2006, the mean response E(Y ) may be related to l(x, θ 0 , θ ) by a given monotone link function. Based on the observed data, one has approximatelŷ θ ∼ N (θ ,ˆ ) whereˆ is provided by many statistical software packages that deal with these models. Hence, the method developed in this article (for the special case (4)) can readily be used to construct an asymptotic 1 − α confidence set for a maximum point of f (x, θ ) over a given interval x ∈ [a, b]. For Cox's proportional hazard model (see Cox 1972, 1975, and Kleinbaum and Klein 2005, the hazard function may be specified as h(t, x) = h 0 (t) exp{f (x, θ )}. The partial likelihood method gives approximatelyθ ∼ N(θ ,ˆ ). In quantile regression (see Koenker 2005), the q-quantile may be modeled by l(x, θ 0 , θ ). Again one has approximatelyθ ∼ N (θ,ˆ ). These examples serve to demonstrate wide applicability of the confidence sets developed in this article.
The method given by Box and Hunter (1954) can be used to construct a confidence set for a stationary point of f (x, θ ) over the whole covariate range x ∈ (−∞, ∞). A program for computing this confidence set is given by Del Castillo and Cahya (2001); see also Del Castillo (2007, Chapter 7) for details. Box and Hunter's confidence set is, however, often misused as a confidence set for a maximum point of f (x, θ ) over the given x ∈ [a, b]. See, for example, Bliss (1970, p. 44), Carter, Wampler, and Stablein (1983, p. 19), and Weisberg (2005 [a, b]. Even if f (x, θ ) has a stationary point in x ∈ [a, b], this stationary point may be an inflection point or a minimum point rather than a maximum point. Hence, Box and Hunter's confidence set for a stationary point is not the confidence set we want, as pointed out by Del Castillo and Cahya (2001). If one uses Box and Hunter's confidence set for a stationary point as the confidence set for a maximum point of f (x, θ ) in x ∈ [a, b] only when the fitted function f (x,θ ) indicates that the stationary point is a maximum point, an idea used in Stablein et al. (1983) for a maximum point over the whole covariate range, then this way of constructing a confidence set conditional on the curvature at the stationary point will make the true confidence level different from the 1 − α used in Box and Hunter's method and very difficult to evaluate. One can construct bootstrap confidence sets by resampling either the residuals or the observations (y 1 , x 1 ), . . . , (y n , x n ); see, for example, Efron and Tibshirani (1993), Davison andHinkley (1997), andFarebrother (1998). The true confidence level of a bootstrap confidence set is difficult to assess for both large and finite sample sizes, however. Rao (1973, p. 473) provided a general method of constructing a conservative confidence set for g(θ) where g(·) is any given function of θ. Applying this method, Carter et al. (1984) constructed conservative confidence sets for both the location of, and the response at, the stationary point of the function f (x, θ ). This method can also be used directly for the construction of a confidence set for a maximum point; see Section 2.5 for details.
Let k be a global maximum point of f (x, θ ) in x ∈ [a, b]. In this article, we provide an exact 1 − α confidence set for k by inverting a family of exact 1 − α acceptance sets for testing H 0 : a, b]. This method of constructing a 1 − α confidence set was given by Neyman (1937) and has been used and generalized to construct numerous intriguing confidence sets; see, for example, Lehmann (1986), Stefansson, Kim, and Hsu (1988), Hayter and Hsu (1994), Finner and Strassberge (2002), Huang and Hsu (2007), and Uusipaikka (2008). Note that Box and Hunter's (1954) confidence set for a stationary point is constructed by using this method too. Indeed this method was also used by Peterson, Cahya, and Del Castillo (2002) and Cahya, Del Castillo, and Peterson (2004) (referred to as PCD and CDP, respectively, henceforth) to construct a confidence set for a minimum point of l(x, θ 0 , θ ) in (1) within a given region. But PCD and CDP concentrate on the situations with p = 2 and q > 1, while the focus of this article is on the situation of a general p ≥ 1 and q = 1.
The layout of this article is as follows. The construction of the new exact confidence set for k and several related issues are studied in Section 2. Two examples of illustration are given in Section 3. Finally, some concluding remarks are contained in Section 4.

Construction Method
Neyman (1937) gave the following theorem on the construction of a confidence set; see also Lehmann (1986, p. 214).
where γ is the unknown parameter. Let B and be the parameter space and sample space, respectively. For each γ 0 ∈ B, let A(γ 0 ) ⊂ be the acceptance set of a size α test of H 0 : γ = γ 0 , that is, Note that if the size of the test of H 0 : γ = γ 0 is equal to α for each γ 0 ∈ B, then the confidence level of C(Y) is equal to 1 − α. This theorem is applied directly to construct a confidence set for k, a maximum point of f (x, θ ) in x ∈ [a, b]. To apply the theorem, we require an acceptance set for testing Hence, a natural 1 − α level acceptance set can be constructed as , and c(k 0 ) is the critical value chosen so that the acceptance level of A(k 0 ) is equal to 1 − α. Then, according to the theorem, an exact 1 − α level confidence set for k is given by The key in the construction of C E (Y) is therefore the computation of the critical value c(k 0 ) for each k 0 ∈ [a, b], which is considered next. Let P be the square root matrix of the p × p positive definite covariance matrix . Hence P 2 = and T = P −1 (θ − θ)/σ ∼ T p,ν , a standard p-dimensional tdistribution with ν degrees of freedom (see Genz and Bretz 2009). For z 1 = z 2 , define where the first inequality in (7) follows directly from the inequality in (5). Note further that with the infimum being attained at θ 1 = θ 2 = · · · = θ p = 0. Hence, the exact critical value c(k 0 ) is the unique solution of the equation Note that the critical value c(k 0 ) gives an exact α size test for each k 0 ∈ [a, b]. Hence, the corresponding confidence set (6) is of level 1 − α exactly. Next, we consider the cases of p = 1, p = 2 and p ≥ 3 separately for the determination of c(k 0 ). They provide the insight that c(k 0 ) depends on p, [a, b] and k 0 among other parameters.
from which it is clear that where t γ ν is the upper γ -point of a t-distribution with ν degrees of freedom. Hence, the confidence set (6) becomes whereĥ =θ 1 /(σ /σ var(θ 1 )). This confidence set is largely intuitive sinceĥ > t α/2 ν means that the usual two-sided t confidence interval for the slope θ 1 of the regression line f (x, θ ) lies entirely to the right of zero, which implies θ 1 is positive and so the maximum point is given by b only. Similarly, h < −t α/2 ν means that the two-sided t confidence interval for θ 1 lies entirely to the left of zero, which implies θ 1 is negative and so the maximum point is given by a only. When −t α/2 ν ≤ĥ ≤ t α/2 ν , the two-sided t confidence interval for θ 1 contains zero and so the maximum point can be anywhere in the interval [a, b].
When p = 2 and so the regression function f (x, θ ) has a quadratic form (with zero intercept), the probability in (8) So, the probability is for the standard bivariate T 2,ν random vector T in the region R k 0 . We need to consider the computation of P{T ∈ R k 0 } for k 0 = a, k 0 = b and k 0 ∈ (a, b) separately. It is shown in the appendix that and c(s) for s ∈ (a, b) can be solved from For a general p, especially for p ≥ 3, to find the probability in (8) involves a (p − 1)-dimensional integration and so becomes difficult by using numerical quadrature. Hence, we use the following simulation-based method to find c(k 0 ) for each k 0 for a general p; this method also works for p = 2 of course. Define then expression (8) becomes P{G k 0 (T ) ≥ −c(k 0 )} = 1 − α. We employ the following steps in finding c(k 0 ) by simulation.
Step 2. For each T i (1 ≤ i ≤ n T ), we compute G k 0 (T i ) in the following way. From the definition of G k 0 (T i ) in (10), the infimum can be attained only possibly at the end points a and b and the stationary points in the interval [a, b] of Note that U i (x), V i (x), dU i (x)/dx, and dV i (x)/dx are all polynomial functions and that the polynomial to the left of the equality in (11) is of order (3p − 5). Therefore, there are at most (3p − 5) real roots for the polynomial equation in (11), denoted as r 1 , . . . , r q with 0 ≤ q ≤ 3p − 5; these can easily be found numerically. It follows therefore that Step 3. Sort G k 0 (T i ), i = 1, . . . , n T in ascending order, and use the [αn T ]th value, −ĉ(k 0 ), as −c(k 0 ).
This simulation approximation to c(k 0 ) approaches the exact value c(k 0 ) as n T → ∞ and provides an accurate approximation when n T is sufficiently large. The accuracy of c(k 0 ) can be measured in two ways. The method of Edwards and Berry (1987, Lemma 2) assesses the random variable W (T , k 0 ) = P{G k 0 (T ) ≥ −ĉ(k 0 )} which has a beta distribution with mean 1 − α and variance α(1 − α)/(n T + 2). Hence, when α = 0.05 and n T = 100,000, the 3-σ rule indicates that W (T , k 0 ) has its value in 1 − α ± 3 √ α(1 − α)/(n T + 2) = (0.9479, 0.9521) with a probability almost equal to one. The method of Liu et al. (2005) assesses the random variablê c(k 0 ) which has an asymptotic normal distribution. Based on this, for example, when n T = 10,000, most critical constants should be accurate to two decimal places. In Example 1 of Section 3, n T = 100,000 and the critical constants should be accurate to two decimal places at least and so can be regarded as "exact" for practical purpose; see more information in Example 1. In this way, we have "exact" c(k 0 ) for each k 0 and then construct the exact confidence set according to (6).

The Critical Value in PCD and CDP
PCD and CDP consider the construction of a confidence set for a minimum point for the more general function l(x, θ 0 , θ ) in (1). If the method of PCD (in particular, the rejection formula (3.7) on p. 425) is used for a maximum point of the polynomial function f (x, θ ) in (2) over interval [a, b], then the resulting confidence set is of the form (6) but with a different critical constant c α in the place of c(k 0 ). It must be emphasized that both CDP and PCD focus exclusively on the situation of q > 1 ( and mostly p = 2). For this situation, PCD (2002, p. 424, lines 7-4 above expression (3.3)) and CDP (2004, p. 502, lines 6-7) recommended using the critical constant c α = √ qF (1 − α; q, ν) for l(x, θ 0 , θ ), where q is the number of components in x and F (1 − α; q, ν) denotes the 1 − α quantile of the F distribution with q and ν df, and simulation results are provided to support the use of this critical constant. If one is curious to extend the use of c α to the situation considered in this article, that is q = 1, then the c α becomes √ F (1 − α; 1, ν) = t α/2 ν . It is clear from Section 2.2 above that the exact c(k 0 ) solving the equation in (8) depends on p, k 0 and [a, b] among others. Next we show that c(k 0 ) > t α/2 ν when p ≥ 2, b − a > 0 and k 0 ∈ (a, b), and so t α/2 ν is too small to use in the place of c(k 0 ) in (8) to guarantee the probability in (8) is at least 1 − α.
Let's consider c(s) for p = 2 and s ∈ (a, b) when b − a → 0 + , that is, when the length of the interval [a, b] approaches zero. When b − a → 0 + , the angle φ 3 (s) in Figure 4 approaches zero too and so the corresponding limit of region R s is made up with all the points between two parallel straight lines that are c(s) distance from the origin. The probability of the T in this limit region of R s is therefore equal to F 1,ν c(s) 2 and so the corresponding limit of c(s) is given by t α/2 ν . It is clear that the limit region of R s when b − a → 0 + identified above is larger than the R s when b − a is equal to a fixed positive number, and so c(s) > t α/2 ν for any s ∈ (a, b) when b − a is equal to a fixed positive number. These observations are also true for a general p ≥ 2.

Conservative Critical Value c 0
It may be time consuming to compute the exact critical constant c(k 0 ) for each k 0 ∈ [a, b], especially when p ≥ 3 and so simulation is involved. Next we provide a conservative critical constant c 0 so that c(k 0 ) ≤ c 0 for each k 0 ∈ [a, b]. Note that for any given where the first inequality in (13) follows directly from the fact that Pg(k 0 , x, p) is an element of R p for any x ∈ [a, b] \ k 0 , and the equality in (14) follows directly from the Cauchy-Schwarz inequality. By noting that T 2 /p ∼ F p,ν , the constant c 0 = √ pF (1 − α; p, ν) makes the probability in (14) equal to 1 − α. This and Equation (8) show that c 0 ≥ c(k 0 ) for any k 0 ∈ [a, b] and so is conservative.
This conservative critical constant c 0 can be used to quickly construct a confidence set C 0 (Y) of level at least 1 − α: More importantly, the computation burden of the exact confidence set C E (Y) can be reduced substantially by only checking the points in C 0 (Y) to see whether they belong to C E (Y). Rao (1973) Sinceθ ∼ N(θ, σ 2 P 2 ), it is well known that the 1 − α confidence ellipsoid for θ is given by

The Confidence Set of
Then, the 1 − α confidence set of Rao (1973) for a maximum point is given by It is clear that this confidence set is conservative.

Computation of the Confidence Sets
The conservative 1 − α confidence set C 0 (Y) in (15) is always constructed first whether or not the exact 1 − α confidence set C E (Y) in (6) is required. To construct C 0 (Y), we first compute c 0 and need to check each k 0 ∈ [a, b] to see whether G k 0 (T ) ≥ −c 0 is true, where G k 0 (·) is defined in (10) but with T = P −1θ /σ whereθ andσ are the estimates based on the observed data. Note that G k 0 (T ) can easily be computed by using a similar method for finding G k 0 (T i ) in (12) but with T i = T .
As the interval [a, b] contains infinitely many points, it is necessary to use a finite grid of points to replace the interval [a, b]. We have chosen a grid of points {s 1 , . . . , s J } on the interval [a, b] with resolution d, that is, {a = s 1 < s 2 < · · · < s J = b} with s i − s i−1 = d for all i = 2, . . . , J . We just check each point k 0 in the grid and the confidence set is approximated by the set of grid points k 0 that satisfy G k 0 (T ) ≥ −c 0 . If d is small, then this set gives an accurate approximation to C 0 (Y). Note that the computation of Rao's conservative confidence set C c (Y) also uses a finite grid of points to approximate the continuous confidence ellipsoid for θ 0 . Carter et al. (1984) suggested a finite grid of points by using the polar coordinates; see also Farebrother (1998, pp. 85-86).
To construct the exact confidence set C E (Y), we further check each grid point in C 0 (Y) determined above to see whether G k 0 (T ) ≥ −c(k 0 ) is true. The exact confidence set is approximated by the set of grid points in C 0 (Y) that satisfy G k 0 (T ) ≥ −c(k 0 ) and provides an accurate approximation to C E (Y) when d is small.
Note that C E (Y) may be the union of several disjoint intervals when f (x, θ ) has, for example, a "W" shape over x ∈ [a, b]. This indicates that, s i 0 ∈ C E (Y) and s i 0 +2 ∈ C E (Y) may not imply s i 0 +1 ∈ C E (Y) when s i 0 +1 is near a valley of the "W". Similarly, s i 0 / ∈ C E (Y) and s i 0 +2 / ∈ C E (Y) may not imply s i 0 +1 / ∈ C E (Y) when s i 0 +1 is near the middle peak of the "W".
We have written a computer program to compute and plot C 0 (Y), C E (Y), and C c (Y) with inputs α, p,θ, ,σ , ν, interval [a, b], resolution d, and number of simulations n T when p ≥ 3. Illustrative examples are given in Section 3.

EXAMPLES
Example 1. The data given in Table 1 are the transformed perinatal mortality rate (PMR) Y = log(−log(PMR)) and birth weight (BW) x = BW of white infants at 35 different levels of x studied by Selvin (1998), who fitted a 4th-order polynomial regression model between Y and x. The usual model diagnostic and the exploratory index R 2 = 0.994 suggest that a 4th order polynomial regression model provides a very good fit to the observed data. Based on the data, we havê θ = It is interesting to identify the BW level in the observed range [a, b] = [0.85, 4.25] that maximizes the response Y (i.e., minimizes the PMR). The method given in this article allows us to construct a confidence set for this optimal BW level. Using α = 0.05, resolution d = 0.01 and n T = 100000, the confidence sets are computed to be C E (Y) = [3.75, 4.21], C 0 (Y) = [3.72, 4.25] and C c (Y) = [3.72, 4.25], and indicated by the three line segments in Figure 1. It is clear that the width 0.46 of C E (Y) is much smaller than the width 0.53 of both C 0 (Y) and C c (Y). It is also interesting to observe that, In this example, n T = 100000 was used. To assess the accuracy of the critical constants c(k) computed, c(k) for several given k values were computed using different random seeds. For example, using six different random seeds, we computed c (4)  These indicate that the critical constants are accurate to the second decimal places. By using the standard deviation formula of a proportion, the standard deviation in estimating P{G k 0 (T ) ≥ −c(k 0 )} by using the simulation-based critical constant is √ α(1 − α)/(n T + 2) = 0.05 * 0.95/100002 = 0.00069. Based on these observations, the simulation-based critical constants can be regarded as exact for practical purpose. The computation time of C E (Y) on an ordinary Window's PC (Core(TM2)Due CPU P8400@2.26GHz) were 2314 sec.
Example 2. This example considers the survival data for the cyclophosphamide (CTX) experiment given in Carter, Wampler, and Stablein (1983). Female mice were randomly divided into a control group (16 mice) and six treatment groups (8 mice per group) at different levels of CTX. Carter, Wampler, and Stablein Table 1. Perinatal mortality data for white infants from Selvin (1998) x  (1983) were particularly interested in the probability of survival of at least 21 days. Table 2 summarizes the observations on 21-day survival from the experiment. The usual model fitting procedure suggests that the following logistic regression model of 21-day survival probability p on dose level x = CTX fits the observed data very well:  From these two examples and all the other datasets we have tried, the confidence set C E (Y) is always smaller than C 0 (Y) and C c (Y) for q = 1. Hence it is recommended to always use the confidence set C E (Y).

CONCLUSION AND DISCUSSION
Construction of a confidence set for a maximum point of a function is an important statistical problem with immediate applications in many real problems. In this article, we have focused on the univariate polynomial function. The only confidence set available in the statistical literature that guarantees 1 − α confidence level is Rao's C c (Y), which is conservative. In this article, one exact and one conservative confidence sets are provided. The new exact confidence set C E (Y) for q = 1 is always, and can be substantially, narrower and so better than the new conservative confidence set C 0 (Y) and Rao's conservative confidence set C c (Y). This is expected since the new exact confidence set C E (Y) is purpose-built while the new conservative confidence set C 0 (Y) uses a conservative critical constant and Rao's confidence set C c (Y) is a byproduct of the standard confidence ellipsoid for the coefficient vector θ.
It is shown how the confidence sets can readily be applied to the normal-error univariate polynomial regression models and fixed or random effects linear or generalized linear models which involve a univariate polynomial function. Indeed, they can also be applied to the semiparametric proportional hazard and quantile regression models. Furthermore, they can be used in situations where a univariate polynomial function is only part of the whole regression function; see examples in Fahrmeir et al. (2013, pages 35-38 and page 538). Hence, the proposed confidence sets are widely applicable.
The approach in Section 2 can be applied to a univariate function other than a polynomial function, though the method of computing the infimum in Equation (8) need to be adapted to the specific form of the function.
Note that the approach in Section 2 up to and including expression (8) can be generalized to deal with the more general function l(x, θ 0 , θ ) in (1) for x in a given q-dimensional region. However, this general situation involves the minimization of a q-variate function rather than a univariate function in (8). This complicates the required computation by orders of magnitude. In the future, we hope to investigate some simple function l(x, θ 0 , θ ), such as the popular quadratic response surface model.

SUPPLEMENTARY MATERIALS
Appendix: "The Computation of Exact Critical Value c(k 0 ) for a Quadratic Polynomial Function in Section 2" is available online. (pdf file) [Received March 2014. Revised July 2014