Robust nonparametric frontier estimation in two steps

Abstract We propose a robust methodology for estimating production frontiers with multi-dimensional input via a two-step nonparametric regression, in which we estimate the level and shape of the frontier before shifting it to an appropriate position. Our main contribution is to derive a novel frontier estimation method, under a variety of flexible models, that is robust to the presence of outliers and possesses some inherent advantages over traditional frontier estimators. Our approach may be viewed as a simplification, yet a generalization, of those proposed by Martins-Filho and coauthors, who estimate frontier surfaces in three steps. In particular, outliers, as well as commonly seen shape constraints on the frontier surface, such as concavity and monotonicity, can be straightforwardly handled by our estimation procedure. We establish consistency and asymptotic distributional theory for the resulting estimators under standard assumptions in the multi-dimensional input setting. The competitive finite-sample performance of our estimators is highlighted in both simulation studies and an empirical data analysis.


Introduction
Estimation of production frontiers, and therefore of efficiency, has motivated a wide and growing literature over the last decades. Mathematically, the problem can be stated as follows. Let x ∈ R_+^p be a vector of inputs (represented in row-vector form) used to produce an output y ∈ R_+. A production set is defined as Ψ = {(x, y) ∈ R_+^{p+1} | x can produce y}, whereas the production frontier associated with Ψ is defined as ρ(x) = sup{y ∈ R_+ | (x, y) ∈ Ψ} for all x ∈ R_+^p. For any given (x_0, y_0) ∈ Ψ, efficiency is measured by the ratio between y_0 and ρ(x_0). Our aim is to obtain, from a given random sample {(X_i, Y_i), i = 1, …, n}, nonparametric frontier estimators which can readily deal with multiple inputs and are reasonably robust to the presence of outliers. By the presence of outliers, we mean the potential existence of a few observations which lie outside Ψ.
In this article, we restrict ourselves to the deterministic approach to the frontier estimation problem. This approach relies on the assumption that all observations, with perhaps the exception of a few outliers, lie in the production set. Two popular methods in the literature on deterministic frontier estimation are the Free Disposal Hull (FDH) estimator introduced by Deprins et al. (1984) and Data Envelopment Analysis (DEA), represented by Charnes et al. (1978). These methodologies are applied in many subsequent pieces of work, such as Seiford (1996), Daraio and Simar (2007), Simar and Wilson (2013), and Kneip et al. (2015). See also Badunenko et al. (2012). Outlier detection and treatment techniques under these approaches can be found in Simar (2003), Johnson and McGinnis (2008), and Khezrimotlagh et al. (2008), among others. An alternative modeling approach is the so-called Stochastic Frontier Analysis (SFA), introduced by Aigner et al. (1977) and Meeusen and van den Broeck (1977). For a comprehensive overview, see Parmeter and Kumbhakar (2014), for example. Papadopoulos and Parmeter (2022) extensively describe quantile methods for robust SFA analyses. See also Parmeter and Racine (2013), where additional constraints are imposed.
Moving away from FDH and DEA in the context of deterministic frontier estimation, Martins-Filho and Yao (2007) proposed a deterministic production frontier model and a nonparametric production frontier estimator called NP3S. They assumed the efficiency scores across different observations to have constant mean and variance, and proposed an estimation procedure consisting of three steps. The first step estimates the conditional mean of the output given the inputs using the local linear kernel method. The second step follows Fan and Yao (1998) and again uses the local linear kernel approach to estimate the conditional variance of the output given the inputs. The third and final step yields the estimator of the frontier in their proposed production frontier model.
A possible drawback in the second step of NP3S estimator is that it allows for a negative estimate of the variance. To overcome this problem, Martins-Filho et al. (2013) propose to use the local exponential kernel estimator to estimate conditional volatility functions, ensuring its nonnegativity. This estimator uses an exponential functional at the minimization problem that characterizes kernel regressions for estimating nonnegative conditional variance (see Ziegelmann, 2002). We call this frontier estimator NPE, standing for NonParametric Exponential.
Considering NP3S and NPE, we believe some improvements are desirable, as summarized below: (i) These estimators are characterized by an estimation procedure in three steps. The first two steps capture the shape of the frontier and the third step is responsible for locating the estimated frontier.
It is important to emphasize that the second step of NP3S and NPE is based on a regression that has as regressand squared residuals from the first step. This feature is sometimes undesirable, since multiple choices of tuning parameters are needed. (ii) Two constant conditional moment conditions on the random variable that represents efficiency are required, which might potentially be viewed as restrictive. (iii) Their methods are solely based on the local linear kernel estimators, making it not straightforward to incorporate some commonly seen or well-accepted shape constraints of the frontiers. (iv) The estimation accuracy of the frontier using these procedures can be severely affected by the existence of a few outliers.
In this work, to address the first two issues, we eliminate the second step of the NP3S and NPE estimators, thus estimating the frontier in just two steps. In doing so, we also manage to relax the condition of requiring constant mean and variance for the efficiency scores in Martins-Filho and Yao (2007) to requiring just a constant mean. Not only is this of theoretical interest, but we also believe that this relaxation is of practical relevance. For instance, it is plausible that the efficiency scores of larger companies tend to be more concentrated than those of smaller ones, as in many industries the spectrum of larger firms often tends to be more homogeneous than that of smaller ones.
In addition, to address the third issue, we advocate that our estimation framework go beyond the use of the local linear kernel method of Fan (1992). Constrained regression methods, such as those enforcing concavity, monotonicity, and/or additivity (cf. Groeneboom et al., 2001b; Mammen and Yu, 2007; Chen and Samworth, 2016, among others), can be used. As most frontiers do follow certain shapes, imposing shape constraints is natural and usually helps improve the interpretability of the estimator. Besides, in comparison to kernel-based estimators, the shape-constrained approach does not rely on, and is not sensitive to, a careful choice of tuning parameters such as the bandwidth, and seems to enjoy better finite-sample performance in many different settings.
Finally, to handle the fourth issue, we extend this framework to allow for the presence of outliers using a quantile-based approach in the second step. These outliers, for one reason or another, should perhaps not have been included in the production set for the analysis. This could arise when a few firms behave significantly differently from the others in their cohort (so estimating the production set based on these observations could be problematic), or, more rarely, when there are typos in the data recording process. In the asymptotic regime, this also implies that there is a vanishing proportion of outliers, because otherwise identifiability of the frontier could become an issue. Here we propose to invoke the (100α_n)%-quantile to estimate the frontier, where α_n ∈ (0, 1) is a number chosen close to one, so as to guarantee that no extra shifting of the estimated frontier curve upwards or downwards is needed. A more involved iterative procedure, which could further improve finite-sample performance, is also outlined.
To summarize this manuscript's main contribution to the literature, our proposed robust two-step approach (which will, for simplicity, be called NP2S, standing for robust NonParametric estimation in 2 Steps) can easily incorporate nonparametric estimators other than the local linear one so as to enforce certain shape constraints, and it allows for the presence of outliers. In addition, we comprehensively investigate the theoretical properties of our approach beyond the one-dimensional setting (i.e., p > 1), which is the main focus of Martins-Filho and Yao (2007), and compare the numerical performance with existing benchmarks such as those of Fang et al. (2022).
We note that similar ideas based on two-step estimation have also been explored independently by Wang et al. (2020) in the context of additive models using splines, and by Fang et al. (2022) using quantile regression (their estimator is implemented and used for comparison purposes in our numerical experiments). In comparison to their work, we believe that our work contributes to and extends the literature in the following ways. First, on the theoretical front, we provide both the asymptotic distributional theory and results regarding the minimaxity of the problem, and make more thorough comparisons with the original NP3S. Second, we investigate models that are not necessarily additive, being more flexible in terms of the shape constraints we incorporate. Third, in terms of methodology, we also establish that robust frontier estimation can be achieved by locating the (100α_n)%-quantile frontier with α_n → 1 as n → ∞. Finally, we also provide theory for our proposed robust procedure in the presence of outliers.
The remainder of this article is organized as follows. In Section 2, we briefly review the model originally proposed by Martins-Filho and Yao (2007) and present our new estimation procedure. Section 3 discusses the asymptotic properties of our estimators, as well as the minimax convergence rates. A Monte Carlo study comparing different estimators and an empirical example are presented in Section 4, which is followed by the conclusion and final comments in Section 5. All proofs are deferred to the supplementary materials.

The model
We start by presenting the model setup developed by Martins-Filho and Yao (2007). Our interest lies in estimating the frontier from a set of observed firms, i.e., given a random sample of production units that share a technology Ψ, obtaining estimates of the frontier ρ. By extension, we are also interested in constructing efficiency ranks and relative performance measures of production units. To see this, let (X_i, Y_i) ∈ Ψ characterize the performance of a production unit and define 0 ≤ R_i ≡ Y_i/ρ(X_i) ≤ 1 to be this unit's (inverse) Farrell output efficiency measure. R_i can then be estimated based on estimates of ρ. Our frontier regression model consists of a multiplicative regression: primitive assumptions are placed on (X_i, R_i), and the properties of Y_i arise from a suitable regression function. It is typically assumed that {(X, R), (X_1, R_1), (X_2, R_2), …} is a sequence of (p + 1)-dimensional independent and identically distributed random vectors with a common density. Y_i then follows

Y_i = ρ(X_i) R_i,   (2.1)

where R_i is an unobserved random variable and X_i is an observed random vector in R_+^p. In this context, Y_i is the output, ρ is the production frontier, X_i are the inputs, and R_i is the efficiency, with values in [0, 1]. The closer R_i is to 1, the closer the observed output is to the frontier. On the contrary, Y_i being far from ρ(X_i) implies low efficiency and a small value of R_i. There is no specification of the density of R_i; however, the following moment restriction is assumed for all x:

E(R_i | X_i = x) = μ_R ∈ (0, 1).   (2.2)

In addition, we write σ_R^2(x) = Var(R_i | X_i = x). Note that the facts that R_i ∈ [0, 1] and 0 < μ_R < 1 imply by construction that 0 < σ_R^2(x) < μ_R < 1, as the variance is majorized by that of a Bernoulli random variable with probability of success μ_R.
The unknown quantity μ_R and the function σ_R locate the production frontier. Furthermore, we note that the NP3S estimator requires the additional constant second moment assumption, namely Var(R_i | X_i = x) = σ_R^2 for all x.

Frontier estimation without outliers
To characterize our estimating procedure, we first rewrite Eq. (2.1) as

Y_i = μ_R ρ(X_i) + ρ(X_i)(R_i − μ_R).

Hence, writing m(x) = μ_R ρ(x), σ(x) = ρ(x) σ_R(x) and ε_i = (R_i − μ_R)/σ_R(X_i), we obtain

Y_i = m(X_i) + σ(X_i) ε_i.   (2.3)

Given the conditional moment restriction (2.2) on R_i, we have that E(ε_i | X_i) = 0. In addition, by definition, Var(ε_i | X_i) = 1. The main idea is based on the observation that m(x) ≡ μ_R ρ(x). Therefore, estimating m yields m̂(x) ≈ μ_R ρ(x), since μ_R does not depend on x. We thus obtain from m̂ an estimator of ρ, but possibly with a wrong multiplicative constant. Then, if we have an estimator μ̂_R of μ_R, we can propose to estimate the frontier as ρ̂(x) = m̂(x)/μ̂_R. With this in mind, we propose to estimate ρ in two simple steps.
In the first step, we propose to estimate μ_R by

μ̂_R = {max_{1≤i≤n} Y_i/m̃(X_i)}^{-1},

where m̃ is a pilot estimator of m. We will specify and discuss the choice of m̃ in Section 2.4. Intuitively, this estimator works because it is generally assumed that there exists at least one observed production unit in the domain that is efficient, or at least close enough to being efficient, so that max_i Y_i/m̃(X_i) ≈ max_i R_i/μ_R ≈ 1/μ_R.
The second step involves a multivariate nonparametric regression with regressand Y_i and regressor vector X_i for i = 1, …, n. Any reasonable nonparametric method could be used here. We mention some examples in the next subsections and study their asymptotic properties in Section 3. In particular, apart from the local linear kernel method of Fan (1992), we are also interested in incorporating commonly seen shape constraints on the frontier surface, such as concavity and monotonicity, into our estimation procedure. Therefore, recently developed nonparametric shape-constrained methods are also considered in this context. Importantly, these methods are typically free of tuning parameters. Here we denote the resulting estimator by m̂.
After these two steps, the proposed estimator of the frontier at x ∈ R_+^p is given by ρ̂(x) = m̂(x)/μ̂_R. Furthermore, if it becomes evident that the function σ is not constant, one could consider including an additional step in which observations are weighted based on the estimated heteroskedasticity from the previous iteration; this might further improve efficiency by utilizing the additional information in the additive error term.
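As an illustration of the two steps, the following is a minimal sketch for a univariate input (p = 1) using a local linear pilot fit; the simulated frontier ρ(x) = √x, the Beta-distributed efficiency scores, the Gaussian kernel, and the fixed bandwidth are all illustrative choices of ours, not prescriptions of the methodology.

```python
import numpy as np

def local_linear(x_eval, X, Y, h):
    """Local linear kernel estimate of E[Y | X = x] at each point of x_eval (p = 1)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    out = np.empty_like(x_eval)
    for j, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((X - x0) / h) ** 2)   # Gaussian kernel weights
        # Weighted least squares of Y on (1, X - x0); the intercept is the estimate.
        A = np.stack([np.ones_like(X), X - x0], axis=1)
        WA = w[:, None] * A
        out[j] = np.linalg.solve(A.T @ WA, WA.T @ Y)[0]
    return out

def np2s_frontier(X, Y, h):
    """Two-step frontier estimator: shape from m-hat, level from mu-hat."""
    m_pilot = local_linear(X, X, Y, h)        # pilot fit at the data points
    mu_hat = 1.0 / np.max(Y / m_pilot)        # step 1: level estimate
    def rho_hat(x):
        return local_linear(x, X, Y, h) / mu_hat   # step 2 + shift
    return rho_hat, mu_hat
```

By construction, the largest ratio Y_i/ρ̂(X_i) equals one when the pilot and second-step smoothers coincide, so at least one observation touches the estimated frontier.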

Robust frontier estimation (in the presence of outliers)
In the presence of outliers in the data, we propose to modify the previous estimation procedure as follows.
In the first step, we estimate μ_R based on the sample α_n-quantile of the ratios Y_i/m̃(X_i), i = 1, …, n, and denote the resulting estimator by μ̌_R. Typically we would choose α_n to be close to 1 (theory dictates that α_n → 1 as n → ∞), and we shall discuss its choice, together with the choices of the other tuning parameters, in Section 2.4. The second step remains unchanged, and the frontier can be estimated by our robust frontier estimator ρ̌_R(x) = m̂(x)/μ̌_R, as before. Finally, we remark that an optional re-estimation procedure could be performed, in which we first eliminate all observations that lie above the estimated frontier (i.e., those with Y_i > ρ̌_R(X_i)) from the data and then repeat the previous estimation steps with α_n = 1.
Note that this new procedure works even in the scenario where there are no outliers. Some theoretical results in this regard will be presented in Section 3.
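A minimal sketch of the robust variant, including the optional re-estimation step, is given below. The Nadaraya-Watson smoother stands in for any reasonable nonparametric m̂, the reciprocal of the sample quantile is used to put the estimate on the μ_R scale (mirroring the max-based estimator of the outlier-free case), and the simulated outliers and bandwidth are illustrative assumptions.

```python
import numpy as np

def nw_smooth(x_eval, X, Y, h):
    """Nadaraya-Watson smoother (a stand-in for any nonparametric m-hat)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    w = np.exp(-0.5 * ((x_eval[:, None] - X[None, :]) / h) ** 2)
    return (w @ Y) / w.sum(axis=1)

def robust_frontier(X, Y, h, alpha_n, reestimate=True):
    """Quantile-based robust two-step frontier; optional outlier-trimmed refit."""
    m_hat = nw_smooth(X, X, Y, h)
    mu_check = 1.0 / np.quantile(Y / m_hat, alpha_n)   # robust level estimate
    if reestimate:
        keep = Y <= m_hat / mu_check            # drop points above the frontier
        Xk, Yk = X[keep], Y[keep]
        m_hat_k = nw_smooth(Xk, Xk, Yk, h)
        mu_check = 1.0 / np.max(Yk / m_hat_k)   # refit with alpha_n = 1
        return lambda x: nw_smooth(x, Xk, Yk, h) / mu_check
    return lambda x: nw_smooth(x, X, Y, h) / mu_check
```

With a handful of observations planted above the true frontier, the quantile-based fit stays near the frontier while the non-robust (max-based) fit is pulled upwards by the outliers.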

Local polynomial kernel (NP2S-LP)
We use the multivariate local polynomial kernel method to estimate m nonparametrically. Suppose that the polynomial degree is q ∈ N. Borrowing notation from Masry (1996), for any x = (x_1, …, x_p) and k = (k_1, …, k_p) ∈ N^p, let |k| = Σ_{j=1}^p k_j and x^k = x_1^{k_1} × ⋯ × x_p^{k_p}. Write h_n ∈ R_+^p for the bandwidth vector. For notational convenience, we will assume h_n = (h_n, …, h_n) in the remainder of the article. Now, for any x, the local polynomial estimator solves

(α̂, {β̂_k}_{1≤|k|≤q}) ∈ argmin Σ_{i=1}^n {Y_i − α − Σ_{1≤|k|≤q} β_k (X_i − x)^k}^2 K((X_i − x)/h_n),

and we set m̂(x) ≡ m̂(x; h_n) ≡ α̂. When q = 1, the above definition becomes the local linear kernel estimator. We call the corresponding frontier estimator NP2S-LL.
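For concreteness, a univariate (p = 1) degree-q local polynomial smoother can be written in a few lines; the Gaussian kernel here is an illustrative choice (the theory works with a compactly supported kernel), and the intercept of the weighted polynomial fit is the estimate m̂(x).

```python
import numpy as np

def local_poly(x_eval, X, Y, h, q=2):
    """Degree-q local polynomial estimate of m at x_eval (p = 1, Gaussian kernel)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    out = np.empty_like(x_eval)
    for j, x0 in enumerate(x_eval):
        d = X - x0
        w = np.exp(-0.5 * (d / h) ** 2)
        A = np.vander(d, q + 1, increasing=True)   # columns 1, d, d^2, ..., d^q
        WA = w[:, None] * A
        out[j] = np.linalg.solve(A.T @ WA, WA.T @ Y)[0]  # intercept = m-hat(x0)
    return out
```

A quick sanity check: with noiseless data from a quadratic function, the local quadratic fit (q = 2) reproduces the function exactly, since the target lies in the span of the local design.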

Concave regression (NP2S-CR)
If it is known a priori that the frontier is concave (which is common due to the law of diminishing returns), then we could estimate m using least squares concave regression (Lim and Glynn, 2012; Seijo and Sen, 2011). More specifically, we let

m̂ ∈ argmin_{f ∈ F_Conc} Σ_{i=1}^n {Y_i − f(X_i)}^2,

where F_Conc = {f : R^p → R | f is concave}. Here we use "∈" because the minimizer is typically not unique (but its values are uniquely determined over {X_i}_{i=1}^n). Alternatively, we could restate the problem as the following Quadratic Program (QP):

min_{θ_1,…,θ_n ∈ R; ξ_1,…,ξ_n ∈ R^p} Σ_{i=1}^n (Y_i − θ_i)^2 subject to θ_j ≤ θ_i + ⟨ξ_i, X_j − X_i⟩ for all i, j,

where ⟨·, ·⟩ denotes the standard inner product. We then take

m̂(x) = min_{i=1,…,n} {θ̂_i + ⟨ξ̂_i, x − X_i⟩}.
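The QP can be handed to any quadratic programming routine. The sketch below instead solves the p = 1 case with a general-purpose solver, encoding concavity as non-increasing successive slopes over the (sorted) design points; this is a small-sample illustration, not a production-grade QP implementation.

```python
import numpy as np
from scipy.optimize import minimize

def concave_ls(x, y):
    """Least squares concave regression for p = 1 (x assumed sorted and distinct)."""
    def slopes_decrease(theta):
        s = np.diff(theta) / np.diff(x)   # successive slopes
        return -np.diff(s)                # must be >= 0 (slopes non-increasing)
    res = minimize(
        lambda th: 0.5 * np.sum((y - th) ** 2),
        x0=np.asarray(y, dtype=float).copy(),
        jac=lambda th: th - y,
        constraints=[{"type": "ineq", "fun": slopes_decrease}],
        method="SLSQP",
        options={"maxiter": 500},
    )
    return res.x                          # fitted values at the design points
```

The returned vector plays the role of (θ̂_1, …, θ̂_n); interpolating with the associated subgradients would give m̂ off the design points.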

(Univariate) S-shaped regression (NP2S-SS)
For p = 1, we now define the class of S-shaped functions as

F_S = {f : R → R | f is increasing, and there exists b ∈ [−∞, ∞] such that f is convex on (−∞, b] and concave on [b, ∞)}.

We remark that an S-shape could be useful and preferred for modeling production when firms experience increasing returns to scale followed by decreasing returns to scale along their expansion paths. In addition, note that in the above definition we allow b, the inflection point of f, to take the values ±∞, so that this class contains and generalizes both the class of increasing convex functions and the class of increasing concave functions (i.e., when b = ∞ and b = −∞, respectively).
Suppose that m ∈ F_S (with unknown location of the inflection point); then we could estimate it using the least squares shape-constrained regression estimator

m̂ ∈ argmin_{f ∈ F_S} Σ_{i=1}^n {Y_i − f(X_i)}^2.

See Feng et al. (2022a) for a detailed description of this estimator and an in-depth study of its theoretical properties, as well as Feng et al. (2022b) for its computational software.

Additive isotone regression (NP2S-AI)
One way to alleviate the curse of dimensionality in multivariate nonparametric estimation (i.e., when p > 1) is to impose additivity on m. See, for instance, Stone (1986). Here a function f : R^p → R is called additive if it can be written as f(x) = f_1(x_1) + ⋯ + f_p(x_p), where f_1, …, f_p are univariate functions. For the purpose of identifiability solely, it is also assumed that ∫ f_j(x_j) ν(dx_j) = 0 for j = 1, …, p, with some absolutely continuous measure ν. In the context of frontier estimation, we could typically assume that f_1, …, f_p are monotonically increasing. Therefore, we could estimate m via the least squares additive isotone regression estimator of Mammen and Yu (2007), defined as

m̂ ∈ argmin_{f ∈ F_AddIn} Σ_{i=1}^n {Y_i − f(X_i)}^2,

where F_AddIn denotes the class of additive functions with increasing components. However, as pointed out by Fang et al. (2022), the additivity constraint could be inappropriate if there are interactions between inputs, or if one wants ∂f/∂x_j to depend on all the components of x rather than a single input. If that is the case, one could also consider applying a variable transformation in an attempt to restore additivity, e.g., a log-transform of the output in the Cobb-Douglas model. See also Sun et al. (2011), who address the resulting biases caused by approximating log production.
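A sketch of the additive isotone fit via backfitting is given below, with each component updated by the pool-adjacent-violators algorithm (PAVA); the cyclic backfitting scheme with a fixed iteration count and mean-centering of the components are our simplifications of the estimator referenced above.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: isotonic (increasing) least squares fit to y."""
    level, weight, count = [], [], []
    for v in np.asarray(y, dtype=float):
        level.append(v); weight.append(1.0); count.append(1)
        while len(level) > 1 and level[-2] > level[-1]:   # pool violating blocks
            w = weight[-2] + weight[-1]
            level[-2] = (weight[-2] * level[-2] + weight[-1] * level[-1]) / w
            weight[-2] = w
            count[-2] += count[-1]
            del level[-1], weight[-1], count[-1]
    return np.repeat(level, count)

def additive_isotone(X, Y, n_iter=20):
    """Backfitting for an additive model with increasing components (rows of f)."""
    n, p = X.shape
    f = np.zeros((p, n))
    order = [np.argsort(X[:, j]) for j in range(p)]
    c = Y.mean()
    for _ in range(n_iter):
        for j in range(p):
            r = Y - c - f[[k for k in range(p) if k != j]].sum(axis=0)
            fit = np.empty(n)
            fit[order[j]] = pava(r[order[j]])
            f[j] = fit - fit.mean()      # center for identifiability
    return c, f                          # fitted m-hat = c + sum_j f[j]
```

Each backfitting sweep replaces one component by the isotonic fit to the current partial residuals, so every component is increasing along its own input by construction.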
Finally, we remark that if one has reason to believe that each of f_1, …, f_p follows a different shape constraint (e.g., some of the f_j's are increasing while others are concave or even S-shaped), we could replace F_AddIn above with the appropriate alternatives and use the approach of Chen and Samworth (2016) to estimate m.

The pilot estimator
We now specify our choice ofm for different estimators.
• For NP2S-LL, NP2S-CR, and NP2S-SS, we could take m̃ to be the local polynomial kernel estimator with q = 2 or q = 3 and the corresponding bandwidth vector g_n = (g_n, …, g_n) ∈ R_+^p. The theoretical requirements for the kernel and the bandwidths are discussed in Section 3.
• For NP2S-AI, to take the additive structure into account, we could take m̃ to be the estimator proposed by Horowitz and Mammen (2004).
We remark that our choices above are largely motivated by the convenience of the theoretical development (and especially the asymptotic distributional theory) presented in Section 3. In practice, we would recommend taking m̃ = m̂ instead as a much more universal choice, which offered similar or sometimes better numerical performance in our simulation experiments. We then let

μ̂_R = {max_{i: X_i ∈ S} Y_i/m̃(X_i)}^{-1}

for the standard version of our estimator and

μ̌_R = {Q_{α_n}(Y_i/m̃(X_i) : X_i ∈ S)}^{-1}

for the robust version, where Q_{α_n}(z_1, …, z_n) returns the α_n-quantile of the given samples {z_1, …, z_n}, and where S can be taken as any reasonably sized compact set contained in the interior of the support of X for NP2S-CR and NP2S-AI, or as R^p for NP2S-LP and R for NP2S-SS. Indeed, in practice, one might simply prefer to use a universally trimmed set S on which to compute the aforementioned quantities, in order to account for the potentially variable or sub-optimal performance of certain nonparametric estimators, especially close to the boundaries.

α n for robust frontier estimation
Here we outline several different strategies for the choice of α n .
• From a theoretical perspective, we could take any α_n satisfying 1 − α_n = o(n^{−4/(p+4)}). See Section 3 for more details.
• If we have prior knowledge of the number of outliers (i.e., those with R_i > 1) in the dataset, denoted by n_o, then we could simply use α_n = 1 − n_o/n.
• A more practical approach is to inspect the estimated density function ĝ_n of the (scaled) efficiency scores based on {Y_i/m̃(X_i)} using the kernel density estimator, then find the lowest r_n^* beyond which ĝ_n(r) remains small (say, of O(n^{−1/2})) for all r > r_n^*, either visually or numerically, and finally take α_n = |{i : Y_i/m̃(X_i) ≤ r_n^*}|/n. Here |·| gives the cardinality of a set.
• The approach of Fang et al. (2022) could also be used, where they identify as outliers those observations whose estimated efficiency scores fall outside the adjusted inter-quartile range AIQR = [Q_0.25 − 1.5 IQR, Q_0.75 + 1.5 IQR], where IQR = Q_0.75 − Q_0.25, with Q_0.25 and Q_0.75 being, respectively, the first and third quartiles.
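The adjusted inter-quartile range rule in the last bullet is straightforward to implement; the Beta-distributed scores and planted outliers below are purely illustrative.

```python
import numpy as np

def aiqr_outliers(scores):
    """Flag estimated efficiency scores outside the adjusted inter-quartile range."""
    q1, q3 = np.quantile(scores, [0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (scores < lo) | (scores > hi)
```

The implied choice of the quantile level is then alpha_n = 1 - mask.sum() / n, where mask is the returned boolean array.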

General assumptions
To discuss the asymptotic properties of the proposed estimators, here we list our assumptions in scenarios both with and without outliers.

Without the presence of outliers
Assumption A1. {(X_i, R_i), i ≥ 1} is an independent and identically distributed (i.i.d.) sequence, with each pair distributed as (X, R) with density f. f_X(x) and f_R(r) denote the common marginal densities of X and R, respectively, and f_{R|X}(r; x) denotes the common conditional density of R given X. Finally, denote by F_R the cumulative distribution function corresponding to f_R.

Assumption A4
1. For every x in the support of X, E(R|X = x) = μ_R > 0.
2. σ_R^2(x) = Var(R|X = x) is continuous with respect to every x in the support of X. In addition, the infimum of σ_R^2(x) over the support of X is strictly positive.
Assumptions A1, A3, and A4 imply that {(X_i, Y_i)}_{i=1}^n forms an independent and identically distributed (i.i.d.) sequence of random variables with some joint density. Assumption A2.2 essentially requires that the density of (R|X) does not decay to 0 too quickly at its right boundary. We remark that it is sufficient but not necessary, and is stated in such a way for easy interpretation. This particular assumption implies that 1 − (max_{i=1,…,n} R_i)^{-1} = O_p(n^{-1/2}) and 1 − max_{i=1,…,n} R_i = O_p(n^{-1/2}), which are the actual statements used in the proofs, and in general it cannot be weakened further without affecting the speed at which information about the frontier can be gathered. This assumption is crucial because it also ensures that the estimation error in μ̂_R does not contribute to the asymptotic distribution of the frontier estimator, thus simplifying our analysis.
Comparing with Martins-Filho and Yao (2007) and Martins-Filho et al. (2013), we do not have to assume a constant σ 2 R (x). Furthermore, we avoid dealing with regressands that are themselves residuals from a first stage nonparametric regression, due to the elimination of the second step in their estimation procedure. Consequently, asymptotic properties are easier to obtain. In addition, our assumptions here are weaker than requiring X and R to be independent.
Next, we impose a smoothness condition on the frontier function ρ. We say that a function ρ : D → R (with D ⊂ R^p) has smoothness index s and constant L if ρ is ⌊s⌋ times differentiable and all its partial derivatives ρ^{(π)} of order π with |π|_1 = ⌊s⌋ satisfy |ρ^{(π)}(x) − ρ^{(π)}(y)| ≤ L‖x − y‖^{s−⌊s⌋} for all x, y ∈ D. Assumption A5 requires that ρ have smoothness index s > 2 with constant L, and that inf_x ρ(x) > 0.
In essence, this assumption requires ρ to be at least twice-differentiable. Although here we focus on s > 2 in the development of our theory, many of our results also have analogous versions for other restrictions on s.
For the pilot estimator for NP2S-LP, NP2S-SS, and NP2S-CR using local polynomial, we make additional requirements on the kernel and its bandwidth vector.
Assumption A6a. K : R^p → R is a symmetric density function with bounded support S_p ⊂ R^p satisfying ∫ x^⊤x K(x) dx = I_p, where I_p is the p × p identity matrix (N.B. recall that here x is represented in row-vector form).
Note that the symmetry of the kernel also implies that S_p is symmetric and that ∫ x K(x) dx = 0.
For the pilot estimator for NP2S-AI using the method of Horowitz and Mammen (2004), we require the assumptions imposed therein.

In the presence of outliers
In the presence of outliers, Assumption A3 shall be replaced by the following, where | · | gives the cardinal number (or size) of a set.
Here we assume that the number of outliers may grow with the number of observations n at a certain rate, to be specified later. This includes the situation where the number of outliers is zero or fixed (which also covers Assumption A3). In addition, we assume that the implied efficiency scores of all the outliers, which could be either random or deterministic, are strictly above and bounded away from 1, with their output values bounded from above. These assumptions could be relaxed, but doing so would result in more complex theoretical statements, which we shall not pursue in this article.

Asymptotic characterization
Now we are in a position to establish the uniform consistency and asymptotic distributions of the frontier estimator and the robust frontier estimator under the different estimating procedures.

Local polynomial kernel

The case of q = 1. First, we study the behavior of the local linear kernel estimator (i.e., with polynomial degree q = 1). For the frontier estimation, we have the following result.

Theorem 3.1. Suppose that Assumptions A1-A5 and A6a hold. Let q = 1 and take h_n = c_h n^{-1/(p+4)} for some c_h > 0. Then ρ̂ is uniformly consistent as n → ∞. In addition, for every x in the interior of the support of X, ρ̂(x) admits a pointwise asymptotic distribution, with a bias term involving ρ^{(2)}(x).

For the robust frontier estimator, a similar result is given as follows.

Corollary 3.1. Under the assumptions and conditions stated in Theorem 3.1, but with Assumption A3 replaced by A3*, let κ ∈ (0, 2/(p + 4)) and take α_n such that 1 − α_n = o(n^{-4/(p+4)}); then the analogous conclusions hold for ρ̌_R.

Minimax lower bounds.
In this part, we investigate the difficulty of the frontier estimation problem (without outliers) from a minimax perspective. Let P be the class of models satisfying Assumptions A1-A5 with a common support of X and universal constants (e.g., ρ's smoothness index s and the corresponding constant L). For any ψ ∈ P, let ρ_ψ be the corresponding frontier. Then we have the following result.

Theorem 3.2. Under Assumptions A1-A5, for any estimator ρ̂_n of the frontier,

sup_{ψ∈P} E_ψ ‖ρ̂_n − ρ_ψ‖^2 ≥ C (n/log n)^{-2s/(2s+p)}

for some C > 0 and for any sufficiently large n.
Since we only impose s > 2 in our assumptions, the above lower-bound rate for the mean-squared loss is essentially n^{−4/(4+p)} in the worst case (i.e., taking s close to 2 and dropping the logarithmic factor). As such, our proposed frontier estimator using NP2S-LL can be viewed as rate-optimal from a minimax perspective.

The case of q = 2, 3 and beyond.
Note that there is a bias term in Theorem 3.1 and Corollary 3.1 unless ρ^{(2)}(x) = 0. This bias term can be eliminated by using a local polynomial of degree q ≥ 2 and choosing a bandwidth of smaller order. In the following, we first focus on the cases q = 2, 3, and then explain why using q ≥ 4 provides no further improvement for s ∈ (2, 3).
Borrowing additional notation from Masry (1996), we let N_i = C(i+p−1, p−1) (a binomial coefficient) be the number of distinct p-tuples k with |k| = i. We arrange these p-tuples (N_i of them in total) as a sequence in lexicographical order (e.g., with (i, 0, …, 0) being the first item and (0, …, 0, i) being the last) and let g_i^{-1} denote this one-to-one (i.e., tuple-to-index) map. In addition, let M and Γ be the block matrices with blocks M_{i,j} and Γ_{i,j}, respectively, where M_{i,j} and Γ_{i,j} are N_i × N_j dimensional matrices whose (l, m) elements are ∫_{S_p} u^{g_i(l)+g_j(m)} K(u) du and ∫_{S_p} u^{g_i(l)+g_j(m)} K^2(u) du, respectively.
As a special case, when p = 1, let φ_{i,j} = ∫ u^i K^j(u) du; then the entries of M and Γ all take this form. Picking higher-order polynomials (i.e., q > 3) will increase the value of (M^{-1} Γ M^{-1})_{(1,1)}, and thus the mean squared estimation error, if we keep everything else unchanged. See also Fan and Gijbels (1996). Consequently, we chose not to pursue the use of higher-order polynomials further. Finally, we note that with s ∈ (2, 3), by picking the bandwidth at an appropriate order, the mean squared error of NP2S-LP (with q ≥ 1) would converge at a rate close to n^{-2s/(2s+p)}, which is faster than that of NP2S-LL. However, this would require additional precise knowledge of the smoothness index s, which could be challenging to obtain in practice, even with a univariate input. As such, we suggest using NP2S-LL if the number of observations is moderate.

A comparison between NP2S-LL and NP3S. Now we compare the asymptotic biases and variances of the estimators NP2S-LL and NP3S for frontier estimation. Here we focus on the case p = 1 without any outliers, because the theoretical results of Martins-Filho and Yao (2007) are stated only for a univariate input with no outliers. Additionally, we assume that Var(R_i | X_i = x) ≡ σ_R^2 (as required by NP3S).
First, we compare the ratio of the two asymptotic variances (AVARs). Using Theorem 2 of Martins-Filho and Yao (2007) and the results presented in Eq. (3.4), one can derive the ratio AVAR_{NP2S-LL}/AVAR_{NP3S} for the same bandwidth h_n. It is clear that this ratio becomes smaller as μ_R increases, σ_R decreases, or the kurtosis of R_i increases. As an example, suppose that (R_i | X_i) follows a symmetric Beta(α, α) distribution for some α > 0. Then, after some calculation, we can derive that

AVAR_{NP2S-LL}/AVAR_{NP3S} = (2α + 3)/{α(2α + 1)},

i.e., this ratio is smaller than 1 if and only if α > 3/2. Second, we take the bias term into account and compare the ratio of the asymptotic mean squared errors (AMSEs) of the two estimators at x. Here different oracle bandwidths apply to NP2S-LL and NP3S. If these bandwidths are chosen optimally (i.e., minimizing the corresponding AMSE), then the ratio AMSE_{NP2S-LL}/AMSE_{NP3S}, displayed in (3.5), can be derived in closed form. Hence, a combination of a larger μ_R, a smaller σ_R, and positive curvature of the frontier ρ at x tends to lead to better performance of NP2S-LL compared with NP3S. Once again, we note that picking the oracle bandwidths could be quite challenging in practice, especially for NP3S, which has a more complicated estimation procedure than NP2S-LL. Interestingly, our experience from numerical experiments suggests that NP2S-LL can offer better finite-sample performance than NP3S even in settings where the ratio displayed in (3.5) would have indicated otherwise.
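The threshold α > 3/2 in the Beta example can be verified directly from the displayed ratio:

```latex
\frac{\mathrm{AVAR}_{\mathrm{NP2S\text{-}LL}}}{\mathrm{AVAR}_{\mathrm{NP3S}}}
  = \frac{2\alpha + 3}{\alpha(2\alpha + 1)} < 1
\;\Longleftrightarrow\; \alpha(2\alpha + 1) > 2\alpha + 3
\;\Longleftrightarrow\; 2\alpha^2 - \alpha - 3 > 0
\;\Longleftrightarrow\; (2\alpha - 3)(\alpha + 1) > 0
\;\Longleftrightarrow\; \alpha > \tfrac{3}{2},
```

since α + 1 > 0 for all α > 0.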

Concave regression
We now investigate the asymptotic properties of frontier estimation using NP2S-CR. Here we focus on the case p = 1, in which we derive the asymptotic distributional theory largely based on Groeneboom et al. (2001b) and Ghosal and Sen (2017). We then study the more general case p > 1 and briefly discuss estimation consistency. First, we introduce the "invelope" process studied in Groeneboom et al. (2001a, 2001b), which is closely related to integrated Brownian motion; it will later appear in the pointwise limiting distribution of ρ̂. Let X(t) = W(t) + 4t^3, where W(t) is a standard two-sided Brownian motion starting from 0 (i.e., W(0) = 0), and let Y be the integral of X satisfying Y(0) = 0. Then, according to Groeneboom et al. (2001a), there exists an almost surely uniquely defined random continuous function H : R → R satisfying the following: (I) H(t) ≥ Y(t) for every t ∈ R; (II) H has a convex second derivative and, with probability 1, is three times differentiable at t = 0. Here H^{(j)} represents the j-th derivative of H. For this part only, to simplify our analysis (mainly in the characterization of the asymptotic distribution), we shall use the following random design (in addition to Assumption A1), where we assume that the input random variable X is drawn from a uniform distribution over [a, b].
Assumption A1 + . p = 1 and X is drawn from a uniform distribution over [a, b].
Theorem 3.4. Suppose Assumptions A1⁺, A1-A5 and A6a hold. In addition, assume that ρ is concave with sup_{x∈[a,b]} ρ^(2)(x) < 0. Then, for any x ∈ (a, b), we have that as n → ∞, where H is the invelope process defined above.
If p > 1, we can still show that ρ̂ and ρ̂_R are consistent under their respective settings.
Theorem 3.5. Suppose that Assumptions A1-A5 and A6a hold and ρ is concave. Then, for any x ∈ int( ), In fact, for the purpose of establishing consistency, we are able to relax Assumption A5 to only requiring inf_{x∈ } ρ(x) > 0 (i.e., dropping the smoothness condition on ρ). On the other hand, deriving the convergence rate and the pointwise asymptotic distribution for p > 1 is beyond the scope of this manuscript. To the best of our knowledge, this is still an actively researched open problem in the regression setting (even with i.i.d. Gaussian noise). However, we would also like to point readers to Han and Wellner (2016), who show that a variant of the least squares convex regression estimator is suboptimal in terms of the global convergence rate when p ≥ 4.

(Univariate) S-shaped regression (NP2S-SS)
The asymptotic results based on (univariate) S-shaped regression can also be derived; they turn out to be very similar to those presented in Theorem 3.4 and Corollary 3.3 (except for the frontier at precisely the inflection point). For completeness, we state the results below.
As a final remark for this part, we note that to our knowledge the asymptotic distribution at the inflection point remains an open problem in the literature, even in the fixed design regression setting.

Additive isotone regression
We now give the asymptotic properties of frontier and robust frontier estimators using NP2S-AI.
Some additional notation is required to handle the case of p > 1. Let X = (X_1, . . . , X_p). We denote by f_{X_j} the marginal density function of X_j for j = 1, . . . , p. In addition, the conditional variance of the unstandardized error (i.e., in Eq. (2.3)) given X_j = u, Var{ρ(X)R | X_j = u}, is denoted by σ_j²(u).
Theorem 3.7. Suppose that Assumptions A1-A5, A7 and A6b hold. Then, for any x ∈ int( ), ∼ G, where G is the distribution of the slope of the greatest convex minorant of W(t) + t² at t = 0, with W being a two-sided standard Brownian motion.
As a special case, when p = 1, the result of Theorem 3.7 simplifies as n → ∞, where σ and ρ are defined as previously and G is as in Theorem 3.7. Note that in Theorem 3.7, Assumption A5 can be weakened to only requiring the true frontier to have smoothness index s = 1. Consequently, when p = 1, the convergence rate of n^{-1/3} stated here is slower than the rate n^{-2/5} given in Theorem 3.1, as a less restrictive smoothness assumption is required by the tools based on isotone regression. In fact, the canonical (additive) isotone regression estimator is piecewise constant, and thus discontinuous; in practice, if an additional smoothness assumption is warranted, one can apply post-smoothing methods, such as that of Mammen (1991), to further improve its finite-sample performance.
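To illustrate the piecewise-constant nature of the isotone fit, here is a minimal sketch of the pool-adjacent-violators algorithm (PAVA) underlying univariate isotone regression; this is a generic textbook implementation under a toy increasing signal, not the exact estimator used in our procedure:

```python
import numpy as np

def pava(y):
    """Increasing least-squares fit via pool-adjacent-violators (unweighted)."""
    vals, cnts = [], []                 # block means and block sizes
    for v in np.asarray(y, dtype=float):
        vals.append(v)
        cnts.append(1)
        # merge adjacent blocks while monotonicity is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tot = vals[-1] * cnts[-1] + vals[-2] * cnts[-2]
            cnt = cnts[-1] + cnts[-2]
            vals[-2:] = [tot / cnt]
            cnts[-2:] = [cnt]
    return np.repeat(vals, cnts)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sqrt(x) + rng.normal(0, 0.1, 100)   # increasing signal plus noise
fit = pava(y)
# the fit is nondecreasing and piecewise constant (few distinct levels)
```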
It is also worth noting that even with multiple inputs, both |ρ̂(x) − ρ(x)| and |ρ̂_R(x) − ρ(x)| are of O_p(n^{-1/3}), a rate free of p, the dimension of X. In other words, the assumption of additivity allows us to effectively circumvent the curse of dimensionality. See also Stone (1986).

Settings
We conduct a simulation study to evaluate the finite-sample properties of the estimators discussed in Sections 2 and 3. We consider the following data generating processes (DGPs) with either univariate or bivariate inputs: (i) p = 1. Random samples are generated from Y_i = ρ(X_i)R_i for i = 1, . . . , n, with R_i ∼ Beta(a, b). Note that here ρ is concave and increasing.
(ii) p = 2. Random samples are generated from Y_i = ρ(X_{i1}, X_{i2})R_i for i = 1, . . . , n, with inputs drawn from [0, 1] and independent of R_i ∼ Beta(a, b). Here ρ is additive with concave and increasing individual components.
(iii) p = 2. As in the second DGP, random samples are generated from Y_i = ρ(X_{i1}, X_{i2})R_i for i = 1, . . . , n. Nevertheless, here the frontier is nonadditive, assuming a Cobb-Douglas-type frontier function. Two sample sizes, n = 200 and 400, are used, as well as two different choices of the pair (a, b) for the Beta distribution: (2, 2) and (4, 2).
To examine the performance of our robust estimators, we also consider the same settings with outliers for DGPs (i) and (ii), where we randomly select 1% of the previously generated observations and increase their corresponding efficiency scores to 1.5.
Each experiment involves 500 Monte Carlo replicates. Here the variance of R is taken as constant (by assuming R and X to be independent) in order to facilitate fair comparison between different estimators.
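As an illustration of DGP (i) and the contamination scheme above, the following sketch generates data under the multiplicative model; the frontier ρ(x) = √x is a hypothetical stand-in, since the paper's exact frontier specification is not reproduced here:

```python
import numpy as np

def simulate_dgp(n, a=2.0, b=2.0, outlier_frac=0.0, seed=0):
    """Y_i = rho(X_i) * R_i with R_i ~ Beta(a, b); rho(x) = sqrt(x) is a
    stand-in concave increasing frontier used for illustration only."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    r = rng.beta(a, b, n)                       # efficiency scores in (0, 1)
    if outlier_frac > 0:                        # contaminate: raise efficiency to 1.5
        idx = rng.choice(n, size=int(outlier_frac * n), replace=False)
        r[idx] = 1.5
    y = np.sqrt(x) * r
    return x, y, r

x, y, r = simulate_dgp(400, a=4.0, b=2.0, outlier_frac=0.01)
```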

Estimators
We compare our proposal to the NP3S (Nonparametric in 3 Steps, Martins-Filho and Yao 2007) and NPE (Nonparametric Exponential, Martins-Filho et al. 2013) estimators, as well as to the three estimators of Fang et al. (2022): UQS (Unconstrained Quantile Spline), MCQS (Monotone Constrained Quantile Spline), and MCCQS (Monotone and Concave Constrained Quantile Spline). Hereafter, we use the following abbreviations for the proposed variations of our 2-step estimators:
• LL for NP2S with the local linear estimator;
• AI for NP2S with additive increasing regression;
• ACR for NP2S with additive concave regression;
• CNA for NP2S with concave nonadditive regression with multi-dimensional input.
When p = 1, in NP2S-LL, we use the bandwidth h_n = c_h n^{-1/5} with c_h = 1.5 Var({X_i}_{i=1}^n). We have also experimented with other selection rules, such as those in Ruppert et al. (1995) and Fan and Gijbels (1995), but did not see much improvement. In addition, as advocated in Section 2.4, we bypass the pilot estimator in our simulation and take μ̂_R = {max_{i: X_i ∈ S} Y_i/m̂(X_i)}^{-1} with S = R_+ for LL.
[Figure 5 caption: Milk producers dataset. Estimated additive components of the frontier using the ACR model for three different α_n-levels, 100%, 99% and 98%. The plots do not reflect the actual levels of the estimated frontiers because the constants of the additive functions are not included.]
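The second-step shift just described, together with its robust α_n-quantile variant discussed below, can be sketched as follows (a minimal illustration; m_hat stands for any first-step regression fit, and the function name is ours):

```python
import numpy as np

def frontier_shift(y, m_hat_obs, m_hat_grid, alpha=1.0):
    """Step-2 sketch of the two-step procedure: rescale the first-step
    regression fit so that it envelopes (a fraction alpha of) the data.
    alpha = 1 uses the maximum ratio Y_i / m_hat(X_i); alpha < 1 gives the
    robust variant based on the sample alpha-quantile of these ratios."""
    ratios = y / m_hat_obs
    shift = np.quantile(ratios, alpha)   # alpha = 1 reduces to max(ratios)
    return m_hat_grid * shift
```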
Here, for a sample {z_1, . . . , z_n}, Q̂_α denotes its sample α-quantile. When p = 2, a similar implementation for the bandwidth of LL is used, and we take S = R²_+ for LL and for the other approaches. Finally, in terms of the robust version of our procedure, for the sake of comparison, in our numerical experiments we pick α_n using the outlier detection scheme proposed by Fang et al. (2022), which identifies as outliers those observations of Y_i/m̂(X_i) that fall outside the adjusted inter-quartile range, AIQR = [Q_0.25 − 1.5 IQR, Q_0.75 + 1.5 IQR]. Here IQR = Q_0.75 − Q_0.25, and Q_0.25 and Q_0.75 are, respectively, the first and third quartiles of the estimated efficiency scores.
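The AIQR rule above translates directly into code (a minimal sketch operating on the estimated efficiency scores; the function name is ours):

```python
import numpy as np

def aiqr_outliers(scores):
    """Flag scores outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers."""
    q1, q3 = np.quantile(scores, [0.25, 0.75])
    iqr = q3 - q1
    return (scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)

scores = np.array([0.4, 0.5, 0.6, 0.7, 0.5, 0.6, 1.5])
flags = aiqr_outliers(scores)   # only the artificially inflated score 1.5 is flagged
```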
In Fig. 1 we illustrate that, regardless of the sample size (200 or 400) and the dimension (univariate or bivariate), the errors retain the same relative structure across the different estimators. Hence, for brevity, in the next figure we only report results based on the sample size 200. We can also see that, for these concave and additive concave frontier shapes, our proposal, in its several variants, produces better results than the five benchmark approaches, with ACR showing the best overall performance. Figure 2 reports results when outliers are intentionally included in the data, in either the univariate or the bivariate case (DGPs (i) and (ii)). We use this plot to show the effect that outliers can have on the frontier estimators. Going from the left panels, where nonrobust estimators are used, to the right panels, where the robust versions are implemented, we see a very sharp reduction in the errors, making clear how relevant the robust versions are. Furthermore, all our estimators perform uniformly better than, or at least comparably to, the benchmarks.
In Fig. 3, we explore the effect of outliers, and of increasing the number of outliers (along with the sample size), on the frontier estimators. For the univariate case, in the top two rows of Fig. 3, our proposals are superior, especially LL and ACR. The panels in the bottom two rows of Fig. 3 show a more balanced situation, although our proposals still perform slightly better than, or at least comparably to, the benchmarks. Finally, Fig. 4 shows the results when a Cobb-Douglas function is used as the DGP, that is, our DGP (iii). We only illustrate the case without outliers. Here we see that, in terms of the median of the errors, CNA performs best, followed by NP3S, NPE, and LL, which present similar overall results. These findings are in line with our expectations, as in this case the true frontier is concave but not additive, so the model is misspecified for ACR and for the estimators of Fang et al. (2022). The better performance of CNA confirms the usefulness of the shape-constrained approach over kernel-based methods when the model is correctly or nearly correctly specified.

An empirical application
In our empirical application, we use the Milk producers dataset of Bogetoft and Otto (2020), available in the R package Benchmarking with DEA and SFA. The dataset consists of n = 108 observations, and here we estimate the production frontier using output of milk per cow as the response variable, and energy expenses per cow and veterinary expenses per cow as our two covariates. We use the additive concave (i.e., ACR) model for our estimation. The additive structure facilitates interpretation, while the (increasing and) concave constraint is used to reflect the plausible diseconomies of scale in milk production. Figure 5 shows the two estimated additive components using the ACR model for three different α_n-levels, 1, 0.99, and 0.98. The concave shapes of the individual estimated components are reasonably robust to the choice of α_n-level. That is, assuming 0%, 1%, or 2% outliers does not substantially change the estimated contribution of either energy expenses per cow or veterinary expenses per cow to the production frontier measured in terms of output of milk per cow.
In Fig. 6, we present the estimated frontier surfaces, also via the ACR model, for the three different α_n-levels, 1, 0.99, and 0.98. Conclusions similar to those from Fig. 5 can be drawn here. There is little change in the overall shape of the surfaces across the different choices of α_n; the shape is concave but highly nonlinear. The yellow points, which are the only ones above the estimated frontier surface, are those associated with the outliers. One of them can be clearly seen on the top right side of the figure, whereas two can be identified at the bottom. As we decrease the value of α_n, the estimated frontier surface becomes lower.
Finally, we remark that the outlier detection criterion of Fang et al. (2022) was also implemented but detected no outliers, so the estimated frontier remains the same as in the case α_n = 1.

Conclusion
In this article, we propose a robust methodology for estimating production frontiers with multi-dimensional input via a two-step nonparametric regression. We present a frontier estimation framework under a variety of flexible models which can both accommodate commonly seen shape constraints of the frontier surfaces and be robust to the presence of outliers. Monte Carlo simulations and empirical data analysis illustrate the good finite-sample performance of our proposed methods. Possible future extensions include, but are not limited to, testing strategies for different shape constraints, automatic selection of those constraints, testing the constant mean (or variance) assumption on the efficiency scores, handling a large number of predictors in the high-dimensional setting, and handling outliers in the inputs.