Nonparametric, Stochastic Frontier Models with Multiple Inputs and Outputs

Abstract: Stochastic frontier models along the lines of Aigner et al. are widely used to benchmark firms’ performances in terms of efficiency. The models are typically fully parametric, with functional form specifications for the frontier as well as both the noise and the inefficiency processes. Studies such as Kumbhakar et al. have attempted to relax some of the restrictions in parametric models, but so far all such approaches are limited to a univariate response variable. Some (e.g., Simar and Zelenyuk; Kuosmanen and Johnson) have proposed nonparametric estimation of directional distance functions to handle multiple inputs and outputs, raising issues of endogeneity that are either ignored or addressed by imposing restrictive and implausible assumptions. This article extends nonparametric methods developed by Simar et al. and Hafner et al. to allow multiple inputs and outputs in an almost fully nonparametric framework while avoiding endogeneity problems. We discuss properties of the resulting estimators, and examine their finite-sample performance through Monte Carlo experiments. Practical implementation of the method is illustrated using data on U.S. commercial banks.


Introduction and Background
Stochastic frontier models are widely used to analyze the productivity and efficiency of firms, and are attractive because they allow for noise in the data-generating process (DGP). The models are typically fully parametric, with functional form specifications for the frontier as well as both the noise and the inefficiency processes, along the lines of the composite-error models of Aigner et al. (1977), Battese and Corra (1977), and Meeusen and van den Broeck (1977). The generic model in this framework is

Y = ϕ(X) − η + v, (1.1)

where Y is an output quantity, X ∈ R^p_+ is a vector of input quantities, ϕ(X) is the production frontier, η ≥ 0 is stochastic inefficiency, and v ∈ R is stochastic noise with E(v) = 0. Conditionally on X, η and v are independent, and in some studies both η and v are allowed to be heteroscedastic. Note also that X could be replaced by (X, E), with E denoting environmental factors that may influence the production process. In parametric applications, ϕ(X) is almost always specified as either Cobb-Douglas or translog. Coelli (1995) and Kumbhakar et al. (2020) observe that most applied articles specify a normal distribution for the two-sided noise term and a half-normal distribution for the one-sided inefficiency term, that is, v ∼ N(0, σ_v^2) and η ∼ N^+(0, σ_η^2).

This article presents a highly flexible approach for estimating production efficiency in the presence of multiple inputs and multiple outputs. We propose a flexible, almost fully nonparametric model based on the directional distance measures of firms' efficiencies developed by Chambers et al. (1996, 1998), but we allow for noise. We then transform the model to facilitate estimation using the ideas of Simar et al. (2017) (hereafter, SVKZ) and Hafner et al. (2018) (hereafter, HMS) and to avoid endogeneity issues. As shown in the separate Appendix E.1, supplementary materials, our method can also accommodate radial, as opposed to directional, distances via a simple modification of the transformation we use below. The frontier is allowed to be fully nonparametric, and we impose only symmetry on the density of the noise term. We specify a specific functional form for the density of the inefficiency term, but we estimate its parameters locally. We show that estimates of marginal rates of substitution and transformation, as well as other features of the original, untransformed model, can be easily recovered from estimates of the transformed model. An empirical illustration is provided by estimating the efficiency of U.S. commercial banks before, during, and after the late-2007 financial crisis.
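To fix ideas, the normal/half-normal special case of (1.1) can be simulated in a few lines. This is a minimal sketch; the Cobb-Douglas frontier ϕ(x) = √x and the scale values below are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Simulate the normal/half-normal composite-error model (1.1).
# The frontier phi(x) = sqrt(x) and the scale values are illustrative only.
rng = np.random.default_rng(42)
n = 5000
X = rng.uniform(1.0, 10.0, n)               # input quantities
phi = np.sqrt(X)                            # production frontier phi(X)
v = rng.normal(0.0, 0.10, n)                # two-sided noise, v ~ N(0, sigma_v^2)
eta = np.abs(rng.normal(0.0, 0.20, n))      # one-sided inefficiency, half-normal
Y = phi - eta + v                           # observed output

# For the half-normal, E(eta) = sigma_eta * sqrt(2/pi)
mean_eta_theory = 0.20 * np.sqrt(2.0 / np.pi)
```

Because E(η) > 0, the conditional mean of Y given X lies strictly below the frontier ϕ(X), which is what any estimator based on conditional means must correct for.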
Ours is not the first attempt to move away from the fully parametric version of (1.1) and the associated restrictive and unrealistic assumptions that are often rejected when tested; see Parmeter and Zelenyuk (2019), Sickles and Zelenyuk (2019), or Kumbhakar et al. (2020) for discussion. A fully nonparametric approach raises identification problems as discussed by Hall and Simar (2002), in addition to possible problems for inference. However, we go farther toward a nonparametric model than other proposed methods, while avoiding a number of problems in other proposals. Fan et al. (1996) and Kuosmanen and Kortelainen (2012) allow the frontier function ϕ(X) to be nonparametric, but the stochastic parts of their models are still parametric and homoscedastic. The model of Hall and Simar (2002) is fully nonparametric with symmetric noise, but asymptotic identification of the model requires the unrealistic, restrictive assumption that the variance of the noise goes to zero as the sample size n tends to infinity. Kumbhakar et al. (2007), Simar and Zelenyuk (2011), and Park et al. (2015) use nonparametric local maximum likelihood approaches. In these approaches, both the model for ϕ(X) and the structure of the parameters of the local densities of η and v are functional (nonparametric) and approximated by local polynomials, allowing unspecified heteroscedasticity for both stochastic components. Although this is attractive and flexible, estimation is difficult and computationally expensive.
Recently, SVKZ suggested a flexible model also based on local polynomials, but requiring only local least-squares methods for estimation. This model also allows for heteroscedasticity in both η and v in a functional way, but is much faster to compute and is more robust, since estimation is based on moment estimators and the full likelihood is not needed. In particular, only symmetry is required for the density of v. We will adapt and use this approach below.
Other approaches are based on deconvolution techniques. Among these, Daouia et al. (2020) use a Tikhonov regularization technique to solve the ill-posed deconvolution problem, allowing ϕ(X) and η to be fully nonparametric, but requiring the distribution of the noise to be fully known (e.g., normal with zero mean and known variance). Kneip et al. (2015) need only to specify the family (e.g., normal) of the symmetric density for the noise term v, and prove that the model is identified and provide estimates of all the components of the model. A pseudolikelihood approach is used for estimation, with the density of η approximated by a histogram. Another extension is provided by Florens et al. (2020), where the noise has an unspecified symmetric distribution and the frontier is nonparametric. Identification is obtained by specifying a flexible density for η based on Laguerre polynomials. Both the stochastic noise and inefficiency are allowed to be heteroscedastic and to depend on some extra environmental variables E, but computation is again difficult due to the required highly nonlinear optimization, and results are in many cases numerically unstable. Sun and Salim (2020) propose a semiparametric model where they estimate an input distance function while avoiding endogeneity problems, but their model involves somewhat more structure than our approach. Prokhorov et al. (2021) estimate a nonparametric stochastic production frontier model with endogenous inputs using a local limited information maximum likelihood method.
With the exception of Sun and Salim (2020), all of the above approaches are developed for a univariate response Y as in (1.1). Extensions to multivariate Y have also been proposed. Some are based on using polar coordinates in the space of the outputs Y, which is natural for radial efficiency measures. This is the approach of Simar (2007), who extends the univariate approach of Hall and Simar (2002), but the noise has to be of "small size," and of Simar and Zelenyuk (2011), who extend the local likelihood approach of Kumbhakar et al. (2007). The latter also provides a stochastic version of traditional envelopment estimators such as data envelopment analysis (DEA) and free-disposal hull (FDH) estimators. In these approaches, the stochastic deviations from the frontier are radial. However, the statistical properties of these approaches have not been established, primarily because they suffer from endogeneity problems that are ignored in each of the articles mentioned above. In each case, after transformation to polar coordinates, the regressors are functions of the endogenous inputs (for the input orientation) or of the endogenous outputs (in the output orientation). As will be seen below, the transformation in our model avoids such issues, and we are able to establish desirable properties (e.g., consistency) of our estimators.
Other approaches are based on modeling directional distance functions. In parametric settings, this involves regressing one of the outputs on the inputs and on the other outputs divided by the dependent variable. This necessarily involves endogeneity issues that are either ignored or solved by adopting restrictive assumptions. For example, Cuesta and Orea (2002) ignore any possible endogeneity while citing Coelli and Perelman (1996), who argue that since ratios of outputs or inputs appear on the right-hand side of the estimated model, exogeneity can plausibly be assumed. Serra et al. (2011) similarly ignore any possible endogeneity. Malikov et al. (2016) estimate a parametric, quadratic, directional distance function which is homogeneous with respect to directions but not with respect to inputs nor outputs, and use a system of equations to deal with potential endogeneity.
Recently, Kuosmanen and Johnson (2017) suggested a way to avoid endogeneity when estimating directional distance functions in a nonparametric approach, but their method requires restrictive assumptions on the DGP (as honestly admitted by the authors), including constant returns to scale, nonstochastic frontier points, and homoscedasticity of the stochastic components. In addition, the variance of the noise must tend to zero asymptotically as n → ∞, as in Hall and Simar (2002). Moreover, the properties of their proposed estimator, based on convex nonparametric least squares in a first stage, remain unknown.
In contrast to the various proposals for less-than-fully-parametric versions of the model in (1.1), our approach allows for both multiple inputs as well as multiple outputs, is computationally tractable, avoids endogeneity problems, and requires only symmetry for the density of the noise term v. The article is organized as follows. The next section establishes notation and the basic statistical model as well as its transformation. In Section 3 we discuss estimation methods based on SVKZ. Section 4 presents empirical results from estimation of the efficiency of U.S. commercial banks, illustrating the practical usefulness of our new methods. Summary and conclusions are given in Section 5.
Some additional results are given in separate appendices. Appendix A, supplementary materials, briefly presents evidence on the flexibility of our stochastic specification. Appendix B, supplementary materials, shows how the model can be estimated using the HMS extension of SVKZ to deal with observations where residuals are skewed in the unexpected direction, and Appendix C, supplementary materials, discusses some alternative approaches that might be used for this problem. Appendix D, supplementary materials, discusses some alternative approaches for estimation, and Appendix E, supplementary materials, discusses various extensions of our model to estimate hyperbolic or radial distances, introduce environmental variables, etc. Appendix F, supplementary materials, describes our Monte Carlo experiments and results, providing evidence on the performance of our estimators and methods. Additional details on our empirical application in Section 4 are given in Appendix G, supplementary materials.

Notation
Let X ∈ R^p_+ and Y ∈ R^q_+ denote (stochastic) vectors of input and output quantities, respectively, and let f(x, y) denote the joint density of (X, Y), with support on a strict subset of R^{p+q}_+. Similarly, let x ∈ R^p_+ and y ∈ R^q_+ denote fixed, nonstochastic vectors of input and output quantities. In the absence of any noise or uncertainty, the production set Ψ gives the set of feasible combinations of inputs and outputs, and the support of f(x, y) is a subset of Ψ. In this case, Pr((X, Y) ∈ Ψ) = 1. But in the presence of noise, the support of f(x, y) may extend outside of Ψ, and hence Pr((X, Y) ∈ Ψ) < 1. Standard assumptions on Ψ in the microeconomic theory of the firm include the following.
Assumption 2.1 ensures the existence of the efficient frontier, or technology, given by

∂Ψ := {(x, y) ∈ Ψ | (α^{-1}x, αy) ∉ Ψ for all α > 1}. (2.2)

In cases where q = 1, the loci of points in ∂Ψ comprise a function ϕ(x), that is, the frontier production function. Let p ≥ 1, q ≥ 1 and define r := p + q. Define a fixed, nonstochastic, (r × 1) direction vector d = (−d_x, d_y), where d_x ∈ R^p_+ and d_y ∈ R^q_+ (see the separate Appendix E.4, supplementary materials, for discussion of the choice of directions). Then for a fixed point (x, y) ∈ R^r_+, distance to the frontier in the direction d is measured by the directional distance function proposed by Chambers et al. (1996, 1998),

δ(x, y | d) := sup{δ | (x − δd_x, y + δd_y) ∈ Ψ}, (2.3)

whenever δ(x, y | d) exists.
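To make the directional distance concrete, it can be computed by bisection for a known technology. The toy set Ψ = {(x, y) : 0 ≤ y ≤ √x}, the evaluation points, and the direction below are assumptions for illustration only, not part of the article.

```python
import numpy as np

def delta(x, y, dx, dy, g=np.sqrt, hi=1e6, tol=1e-10):
    """Directional distance sup{d : (x - d*dx, y + d*dy) in Psi} for the
    toy technology Psi = {(x, y) : 0 <= y <= g(x)} (illustrative choice)."""
    feasible = lambda d: x - d * dx >= 0 and y + d * dy <= g(x - d * dx)
    lo = 0.0
    if not feasible(lo):        # point lies outside Psi (possible under noise)
        return None
    for _ in range(200):        # bisection: lo stays feasible, hi infeasible
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return lo

# Inefficient point (4, 1) under y = sqrt(x), direction (d_x, d_y) = (1, 1):
# delta solves 1 + delta = sqrt(4 - delta), i.e., delta = (sqrt(21) - 3)/2.
d0 = delta(4.0, 1.0, 1.0, 1.0)
```

A frontier point such as (4, 2) returns distance zero, and a point above the frontier (which can occur in the presence of noise) has no feasible contraction, which is exactly why the noiseless definition (2.3) must be embedded in a statistical model.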
In the absence of noise, properties of nonparametric DEA and FDH estimators of the distance function defined in (2.3) are well developed; see Kneip et al. (2008) and the sources cited therein, or see Simar and Wilson (2013, 2015) for recent surveys. Except for the required assumption of no noise, DEA estimators (which assume strong disposability of inputs and outputs and convexity of Ψ) and FDH estimators (which only assume strong disposability of inputs and outputs) are very flexible, require no functional form assumptions, and easily handle multiple outputs and multiple inputs. DEA and FDH estimators rely on the assumption that the support of the stochastic variables (X, Y) coincides with a subset of Ψ. For additional technical details, see Kneip et al. (2008).
Regardless of whether noise is present, the production set Ψ, its frontier ∂Ψ, and the distances defined in (2.3) are unobserved and must be estimated from data. Some additional assumptions are needed to define a statistical model. We first introduce a basic model below in Section 2.2, and then transform this model in Section 2.3 to facilitate estimation. As will be seen, the basic model amounts to a system of equations describing the underlying process that generates the observed data, while the transformed model is used for estimation.

The Basic Model
We first assume that the underlying theoretical economic process generates exogenous, efficient, but latent, production plans.
Assumption 2.2. The Data Generating Process (DGP) generates random, optimal production plans that belong to the efficient boundary ∂Ψ via a well-defined probability mechanism, providing iid values W_i^∂ := (X_i^∂, Y_i^∂) ∈ ∂Ψ, i = 1, …, n.

The optimal plans W_i^∂ (i.e., frontier points) are unobserved due to inefficiency and noise. Now consider a given, fixed direction vector d. As noted in the separate Appendix E.4, supplementary materials, empirical researchers often choose a fixed direction vector d (e.g., see the discussion by Färe et al. 2008, p. 534), often with elements equal to means or medians of the corresponding variables. For our purposes, the point is to make d fixed, nonstochastic, and the same for all firms (the latter is sometimes called the "egalitarian" evaluation in (2.3)). Let observed input-output pairs be denoted by W_i := (X_i, Y_i).
Assumption 2.3. Given a direction vector d, the set of sample observations S_n = {W_i}_{i=1}^n is generated according to the model

W_i = W_i^∂ + ξ_i d, (2.4)

where the ξ_i are, conditionally on W_i^∂, independent, scalar random variables whose characteristics may depend on W_i^∂.
Assumption 2.3 makes clear that the stochastic element ξ_i operates on frontier points in the direction d, and that changing the direction vector d results in a different model. See the separate Appendix E.4, supplementary materials, for additional discussion. Next, we decompose ξ_i into additive inefficiency and noise terms to complete the assumptions of our basic model.
Assumption 2.4. The stochastic term ξ_i in Assumption 2.3 involves both an inefficiency component and a noise component,

ξ_i = v_i − η_i, (2.5)

where, conditionally on W_i^∂, v_i and η_i are independent, with

v_i | W_i^∂ ∼ Sym(0, σ_v(W_i^∂)) (2.6)

and

η_i | W_i^∂ ∼ D^+(σ_η(W_i^∂)), (2.7)

where Sym(0, a) denotes a symmetric (around zero), two-sided distribution on R with standard deviation a ≥ 0, and D^+(b) denotes a one-sided distribution on R_+ belonging to some one-parameter scale family with scale parameter b ≥ 0.
A density f(·) belongs to a one-parameter scale family if it can be written as f(·) = (1/σ) f̄(·/σ) for some σ > 0, where f̄(·) is any density on R_+. We will call f̄ the reference density of the family. Examples include the half-normal and exponential distributions, and the gamma and Weibull distributions with fixed shape parameters.
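A quick numerical check of the defining property, using the half-normal family adopted later in Assumption 2.5 (the scale σ and evaluation point are arbitrary illustrative choices):

```python
import math

def halfnormal_pdf(t, sigma):
    """Half-normal density on R+ with scale parameter sigma."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-t * t / (2.0 * sigma * sigma))

f_ref = lambda t: halfnormal_pdf(t, 1.0)    # reference density (sigma = 1)

# Scale-family property: f_sigma(t) = (1/sigma) * f_ref(t / sigma)
sigma, t = 0.7, 1.3
lhs = halfnormal_pdf(t, sigma)
rhs = (1.0 / sigma) * f_ref(t / sigma)

# First moment of the reference density, k_1 = sqrt(2/pi), by Riemann sum on [0, 20];
# E(eta) = k_1 * sigma then holds for every member of the family.
h = 1e-4
k1_num = sum(u * f_ref(u) for u in (i * h for i in range(200001))) * h
k1 = math.sqrt(2.0 / math.pi)
```

The same scaling argument gives the jth moment as k_j σ^j, where k_j is the jth moment of the reference density, which is what makes the local moment recursions of Section 3 tractable.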
Both the inefficiency and the noise processes in (2.4) operate on both inputs and outputs, as is standard in the literature on directional distance functions. Assumptions 2.1-2.4 ensure that we have a random sample S_n = {(X_i, Y_i)} of size n, with the latent efficient variables W_i^∂ playing the role of unobserved, exogenous variables. The components v_i and η_i of the independent ξ_i may be heteroscedastic. These assumptions induce the joint density f(x, y) of the observables (X, Y).
As an aside, consider the following special case of our basic model. Suppose that q = 1 (univariate output) and that the chosen direction is output oriented with d = (0_p, 1), where 0_p is a p-vector of zeros. Then (2.4) simplifies to

X_i = X_i^∂ and Y_i = Y_i^∂ + ξ_i. (2.8)

Since X_i = X_i^∂ is observed (because the corresponding elements of d are zero), and Y_i is univariate, we can in this case describe the frontier points by the production function Y_i^∂ = ϕ(X_i^∂) = ϕ(X_i). By conditioning on X_i, the model in (2.8) can be written as

Y_i = ϕ(X_i) − η_i + v_i, with v_i | X_i ∼ Sym(0, σ_v(X_i)) and η_i | X_i ∼ D^+(σ_η(X_i)), (2.9)

since the parameters of the scale functions in (2.6)-(2.7) can now be expressed as functions of X_i. The model in (2.9) is identical to the output-oriented model suggested by Simar et al. (2017) for the case of a univariate output Y. Kumbhakar et al. (2007) use the same model with a specific localized distribution (i.e., Gaussian) for v_i in order to derive the local likelihood function. Moreover, it is obvious that (2.9) nests the fully parametric models of Aigner et al. (1977), Battese and Corra (1977), and Meeusen and van den Broeck (1977) as special cases, but allows far more flexibility than the fully parametric models. In addition, (2.9) is a special case of our basic model in (2.4), which allows q > 1.

The Transformed Model
In order to estimate the frontier as well as features of the inefficiency process, we translate the problem to a new coordinate system to obtain a transformed version of the basic model defined by Assumptions 2.1-2.4. For the r-dimensional direction vector d, let V(d) denote the (r − 1)-dimensional linear subspace of R^r orthogonal to d, and let V_d be an (r × (r − 1)) matrix whose columns form an orthonormal basis for V(d), so that V_d′V_d = I_{r−1}, where I_{r−1} denotes the (r − 1) × (r − 1) identity matrix. Moreover, V_d depends only on the given direction vector d. Using the matrix V_d, define the (r × r) rotation matrix

R_d := [V_d′ ; d′/||d||], (2.10)

whose first r − 1 rows are given by V_d′ and whose last row is d′/||d||. It is important to remember that the rotation matrix R_d is specific to the given direction vector d, as signified by the subscript "d." If the direction vector is changed, this will affect the rotation matrix as well as everything that follows. But as with V_d, R_d depends only upon the given direction vector d.
In the special case of a univariate output (q = 1) with output-oriented direction d = (0_p, 1) considered above in (2.8), it is easy to show that the matrix R_d is given by the identity matrix of order r. In this case, no transformation of the basic model is needed, since the setup is the same as that of SVKZ. Otherwise, however, we can transform the observations to a new coordinate system by applying the rotation matrix R_d in (2.10) to obtain

(Z_i′, U_i)′ = R_d W_i, (2.11)

where Z_i ∈ R^{r−1} and U_i ∈ R for each i = 1, …, n. Moreover, the independent, identically distributed (iid) sample S_n is transformed to an iid sample {(Z_i, U_i)}_{i=1}^n. Obviously, the (Z_i, U_i) depend on the direction vector d through the rotation matrix R_d, but we avoid further complicating the notation. Note that U_i = d′W_i/||d||, and hence is invariant to the ordering of the inputs and outputs, since the ordering of the elements of the direction vector d necessarily corresponds to whatever ordering of the elements of X_i and Y_i is chosen.
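The construction of V_d and R_d can be sketched numerically. Here the basis V_d is obtained from the SVD of d′ (any orthonormal basis of the orthogonal complement of d works equally well), and the direction vector is an arbitrary illustrative choice:

```python
import numpy as np

p, q = 2, 2
r = p + q
d = np.concatenate([-np.ones(p), np.ones(q)])    # direction d = (-d_x, d_y)

# Columns of V_d: an orthonormal basis of the (r-1)-dim subspace orthogonal to d.
_, _, Vt = np.linalg.svd(d.reshape(1, r))
V_d = Vt[1:].T                                   # r x (r-1), V_d' V_d = I_{r-1}

# Rotation matrix R_d: first r-1 rows are V_d', last row is d'/||d||.
norm_d = np.linalg.norm(d)
R_d = np.vstack([V_d.T, d / norm_d])

rng = np.random.default_rng(0)
w = rng.uniform(1.0, 5.0, r)    # a point (x, y) in the original space
zu = R_d @ w                    # transformed coordinates (z, u); u is the last entry
w_back = R_d.T @ zu             # inverse of the one-to-one transformation
```

Since R_d is orthogonal (R_d R_d′ = I_r), the inverse transformation is simply multiplication by R_d′, and the last coordinate of R_d w equals d′w/||d||, as stated in the text.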
From a geometrical point of view, we have a linear transformation from w = (x′, y′)′ ∈ R^r to (z′, u)′ ∈ R^r, where (z′, u)′ = R_d w. This one-to-one transformation can be inverted by writing w = R_d′(z′, u)′. This is illustrated for the case p = q = 1 in Figure 1, where the origin in the new (z, u)-space corresponds to the origin in the original (x, y)-space. The rotation transforms the production set Ψ in (x, y)-space to a set Ψ* in (z, u)-space. Moreover, the density f(x, y) is transformed to a new density g(z, u) = g(u|z)g(z). Since u is in the direction d, and the r − 1 coordinates z are orthogonal to d, the frontier ∂Ψ in (x, y)-space is transformed to the function φ(z) : R^{r−1} → R,

φ(z) := sup{u | (z, u) ∈ Ψ*}. (2.13)

In (x, y)-space, the frontier is an (r − 1)-manifold, while in (z, u)-space it becomes a scalar-valued, (r − 1)-variate function φ(z). So we have

Ψ* = {(z, u) | u ≤ φ(z)}. (2.14)

The frontier points W_i^∂ are transformed to (Z_i^∂′, U_i^∂)′ = R_d W_i^∂. Pre-multiplying both sides of the basic model in (2.4) by R_d yields the transformed model

(Z_i′, U_i)′ = (Z_i^∂′, U_i^∂)′ + ξ_i R_d d. (2.15)

Then the transformed model in (2.15) can be written equivalently as

Z_i = Z_i^∂ and U_i = U_i^∂ + ||d|| ξ_i, (2.16)

where the equation for U_i is univariate. Note that (2.16) has structure analogous to that of the special case of our basic model described by (2.8). Since Z_i = Z_i^∂ is observed, we can rewrite the frontier points in the (z, u)-space as U_i^∂ = φ(Z_i). Therefore, by conditioning on Z_i, the second equation of (2.16) can be written as

U_i = φ(Z_i) + ||d||(v_i − η_i), (2.17)

where the scale functions σ_v(W_i^∂) and σ_η(W_i^∂) in ξ_i can be expressed as functions of Z_i, since W_i^∂ can be expressed as a function of Z_i only. Hence, the heteroscedastic features defined in (2.6)-(2.7) of Assumption 2.4 can be written as

v_i | Z_i ∼ Sym(0, σ_v(Z_i)) (2.18)

and

η_i | Z_i ∼ D^+(σ_η(Z_i)). (2.19)

It is important to note that

R_d d = (0′_{r−1}, ||d||)′. (2.20)

The first line of (2.16) results from the facts that (i) R_d depends only upon the fixed d, and (ii) R_d d = (0′_{r−1}, ||d||)′ by construction, as in (2.20). Consequently, by Assumption 2.2 and the first line of (2.16), Z_i = Z_i^∂, and therefore Z_i is exogenous since W_i^∂, and hence Z_i^∂, are exogenous. In the transformed model (2.17), only the scalar left-hand-side variable U_i depends on the stochastic terms v_i and η_i. In addition, U_i depends on the exogenous Z_i, and hence U_i is endogenous. This greatly simplifies estimation, and stands in sharp contrast to most of the literature, where researchers work in the original (x, y)-space in which all of the elements of X_i and Y_i are potentially endogenous. Of course, this depends on a fixed, exogenously chosen direction vector, but this is in fact what most empirical researchers adopt. See the separate Appendix E.4, supplementary materials, for additional discussion regarding the choice of direction vector d.
In order to identify the frontier function φ(Z_i) in (2.13), a functional form for the distribution D^+ in (2.19) must be specified. Various possibilities are mentioned above following Assumption 2.4; for purposes of illustration, we adopt the following.
Assumption 2.5. The distribution of η, conditional on Z, is half-normal with scale parameter σ_η(Z), that is,

η | Z ∼ N^+(0, σ_η^2(Z)). (2.21)

In addition, for purposes of estimation using local-linear methods along the lines of SVKZ, two additional assumptions are needed.
Assumption 2.6. For any orthonormal basis V_d of V(d), the function φ(z) defined in (2.13) is differentiable at all points z in its domain.
Assumption 2.7. For all i ∈ {1, 2, …}, the conditional moments of v_i and η_i, up to the orders required below, exist and are finite.

For a given Z_i, there is a unique U_i^∂ and W_i^∂ in the direction d. By Assumption 2.4 and (2.19), E(η_i^j | Z_i = z) = k_j σ_η^j(z) for all j ≥ 1, provided k_j, the jth moment of the reference density f̄ of the family, exists (see the remarks following Assumption 2.4). Moreover, Assumption 2.4 ensures that the conditional mean efficiency function is given by

μ_η(z) := E(η | Z = z) = k_1 σ_η(z). (2.22)

Under Assumption 2.5, k_1 = √(2/π). Replacing Assumption 2.5 with a different distributional assumption will change the value of k_1. Assumption 2.6 imposes some smoothness on the frontier, in all directions, while Assumption 2.7 imposes some minor restrictions on the tail behavior of the distributions of v_i and η_i in (2.18)-(2.19). Both assumptions are needed to ensure consistency of the local-linear estimators used below in Section 3.
In Section 3, we show how the unknown functions φ(Z_i), σ_v^2(Z_i), and σ_η(Z_i) can be easily estimated from the transformed model in (2.17) using standard nonparametric regression estimators (e.g., local polynomial or orthogonal series estimators). In particular, the "σ functions" in (2.18)-(2.19) will be functional parameters, enhancing the flexibility of the model. Estimates of the conditional mean efficiency function in (2.22) are obtained directly from the transformed model. Estimates of features of the basic model, such as marginal rates of substitution or transformation, marginal products, or derivatives of conditional mean efficiency with respect to inputs or outputs, can be recovered easily from estimates of the transformed model, due to the fact that the transformation from (x, y)-space to (z, u)-space is one-to-one, as shown below in Section 2.4.

Links between the Two Spaces
Recall that the direction d and the chosen matrix V_d are fixed, as discussed earlier in Section 2.3. In applied economic studies, researchers are often interested in marginal effects such as marginal productivity, marginal rates of substitution, etc. As demonstrated below, it is possible to recover partial derivatives in the original (x, y)-space from derivatives obtained in the new, transformed (z, u)-space introduced in Section 2.3.
First, the locus of the frontier function in the transformed model is given by the set of points (z, φ(z)), where z ∈ R^{r−1}.
From this, the frontier surface in (x, y)-space (an (r − 1)-manifold) can be obtained from

w^∂ = F(z) := R_d′ (z′, φ(z))′, (2.23)

which can be viewed as a mapping F : R^{r−1} → R^r, where w^∂ = F(z) is a manifold parameterized by F. The Jacobian of the transformation in (2.23) can be written as the (r × (r − 1)) matrix

J_φ(z) := [∂F_i(z)/∂z_j], (2.24)

where i = 1, …, r and j = 1, …, (r − 1). Existence of the Jacobian in (2.24) is ensured by Assumption 2.6. Now consider a fixed point (z_0, φ(z_0)) on the frontier. Then from (2.23), the image on the frontier manifold in w-space is w_0^∂ = F(z_0). Standard results from real analysis (e.g., Lang 1964, pp. 164-174) ensure that under Assumption 2.6,

F(z_0 + ϑ) = F(z_0) + J_φ(z_0)ϑ + ||ϑ|| G(ϑ), (2.25)

where ϑ is a small vector in R^{r−1} and lim_{||ϑ||→0} G(ϑ) = 0. Therefore, J_φ(z_0) defines a linear map from R^{r−1} to R^r which can be viewed as an approximation of F near the point z_0, up to an error term whose magnitude is small. This linear map is said to be a tangent hyperplane to F at the point z_0. Since F is differentiable at z_0 by Assumption 2.6, this hyperplane is unique due to Theorem 1 of Lang (1964, p. 166).
Clearly, w = F(z_0 + ϑ) lies in (x, y)-space. Therefore, the equation of the unique hyperplane tangent to the surface F(z) at the point z_0 is given by

w = F(z_0) + J_φ(z_0)ϑ, ϑ ∈ R^{r−1}. (2.26)

This hyperplane can be written in explicit form by the linear equation

c′(w − w_0^∂) = 0, (2.27)

where c ∈ R^r. The vector c is identified up to a multiplicative constant by the equation c′J_φ(z_0) = 0′_{r−1}, or equivalently,

J_φ(z_0)′ c = 0_{r−1}. (2.28)

In other words, c is determined (up to a scalar constant) by a basis of the null space of the matrix J_φ(z_0)′. Using the equation of the tangent hyperplane in (2.27), it is obvious that any component w_ℓ of w can be expressed as a function of the other elements w_k, k = 1, …, r, k ≠ ℓ. Hence, partial derivatives in (x, y)-space can be written as

∂w_ℓ/∂w_k = −c_k/c_ℓ, (2.29)

provided c_ℓ ≠ 0; c_k = 0 indicates that the variable w_k has no effect on the component w_ℓ at the particular frontier point of interest.

Another interesting feature of the transformed model is the function μ_η(z) defined in (2.22). It is clear that the vector ∂μ_η(w^∂)/∂w^∂ can be recovered from knowledge of the vector ∂μ_η(z)/∂z. Since z = V_d′w^∂, the Jacobian of the mapping from w^∂ to z is

J_z = V_d′, (2.31)

with i = 1, …, r − 1 and j = 1, …, r indexing its rows and columns. Applying the chain rule yields

∂μ_η(w^∂)/∂w^∂ = V_d (∂μ_η(z)/∂z),

which is an r-vector of derivatives of mean inefficiency with respect to the components of w at a point w^∂. Details on estimation are given in the next section.
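As a numerical sketch of this recovery, take the toy frontier y = √x in (x, y)-space (p = q = 1), so the manifold is F(t) = (t, √t); the normal vector c is then computed from the null space of the transposed Jacobian via the SVD. The frontier and evaluation point are illustrative assumptions, not from the article.

```python
import numpy as np

def normal_from_jacobian(J):
    """Given the r x (r-1) Jacobian of a frontier parameterization, return a
    vector c spanning the null space of J', i.e., satisfying c'J = 0."""
    _, _, Vt = np.linalg.svd(J.T)
    return Vt[-1]            # right-singular vector for the zero singular value

# Illustrative frontier y = sqrt(x), parameterized as F(t) = (t, sqrt(t)),
# with Jacobian J(t) = (1, 1/(2 sqrt(t)))'. Evaluate at t0 = 4.
t0 = 4.0
J = np.array([[1.0], [1.0 / (2.0 * np.sqrt(t0))]])
c = normal_from_jacobian(J)

# Partial derivative on the frontier: dy/dx = -c_x / c_y, here the marginal
# product, which should equal 1/(2 sqrt(t0)) = 0.25.
dydx = -c[0] / c[1]
```

The ratio −c_k/c_ℓ is invariant to the arbitrary sign and scaling of the singular vector, which is why c need only be identified up to a multiplicative constant.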

Estimation Methodology and Basic Properties
As discussed above in Section 2.3, the transformed model given by (2.17)-(2.19) involves a scalar response variable U_i, a link function φ(Z_i), and a composite error term ||d||(v_i − η_i). These are necessarily related to the direction vector d through the dependence on Z_i. It is easy to prove that the model is identified once the reference family is chosen (as in Assumption 2.5), using the sufficient condition for identification given by Florens et al. (2020).
We use the moments-based method of SVKZ and its extension described by HMS to estimate our transformed model. To conserve space, we discuss here how the SVKZ method can be used; a discussion of the HMS method appears in the separate Appendix B, supplementary materials. In addition, alternative methods that might be used are discussed briefly in the separate Appendix D, supplementary materials. However, our stochastic specification is quite flexible since it is localized (i.e., the specification may change with the value of z), and the convolution of a one-parameter scale-family density with a symmetric density is also quite flexible (see the separate Appendix A, supplementary materials, for further discussion and illustration). Another advantage of our estimation approach is its numerical simplicity and relatively low computational requirements, since only local least-squares methods are used, and most of the relevant information about the model can be recovered by assuming only symmetry of v and selecting a specific one-parameter scale family for the distribution of inefficiency.
We first show how the transformed model given by (2.17)-(2.19) can be estimated. The estimation method can be viewed as a nonparametric version of the modified OLS procedure suggested by Richmond (1974) that is sometimes used in fully parametric settings (e.g., see Lovell 1993 for discussion). Our approach allows the link function φ(z) to remain nonparametric and uses only information from the first three local moments of the random terms. Consequently, the method used here is less sensitive to errors in specification of the local densities of the stochastic terms than some of the approaches discussed in the separate Appendix D, supplementary materials.
To explain the procedure, note that μ_η(Z_i) defined in (2.22) must be nonnegative for all i = 1, …, n. Define

r_1(Z_i) := E(U_i | Z_i) = φ(Z_i) − ||d|| μ_η(Z_i). (3.1)

The conditional mean function r_1(Z_i) can be viewed as an average production function that can be estimated using standard nonparametric regression estimators (e.g., the local-linear estimator).
The assumption of symmetry of v_i | Z_i ensures that

E(v_i | Z_i) = 0 (3.2)

and

E(v_i^3 | Z_i) = 0. (3.3)

Now define the centered composite error ε_i := ||d||^{-1}(U_i − r_1(Z_i)) = v_i − η_i + μ_η(Z_i), and let r_j(z) := E(ε_i^j | Z_i = z) for j ∈ {2, 3}. Under Assumptions 2.4 and 2.5,

μ_η(z) = √(2/π) σ_η(z), (3.5)

and it is straightforward to show that

r_2(z) = σ_v^2(z) + (1 − 2/π) σ_η^2(z) (3.6)

and

r_3(z) = √(2/π) (1 − 4/π) σ_η^3(z). (3.7)

From (3.7) it is clear that the scale function σ_η(z) is identified by the third (local) moment r_3(z), and this in turn provides identification of μ_η(z).
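The moment identities for the composite error, and the recovery of σ_η from the third moment, can be verified by Monte Carlo at a fixed z (so the scale functions reduce to constants); the scale values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000_000
sigma_v, sigma_eta = 0.3, 0.5                   # illustrative local scale values

v = rng.normal(0.0, sigma_v, n)                 # symmetric noise
eta = np.abs(rng.normal(0.0, sigma_eta, n))     # half-normal inefficiency
mu_eta = sigma_eta * np.sqrt(2.0 / np.pi)       # E(eta) = sqrt(2/pi) * sigma_eta
eps = v - eta + mu_eta                          # centered composite error, E(eps) = 0

# Theoretical second and third moments of eps
r2 = sigma_v**2 + (1.0 - 2.0 / np.pi) * sigma_eta**2
r3 = np.sqrt(2.0 / np.pi) * (1.0 - 4.0 / np.pi) * sigma_eta**3

# Recover sigma_eta from the (simulated) third moment; the truncation at 0
# guards against a simulated moment with the wrong sign.
c3 = np.sqrt(2.0 / np.pi) * (1.0 - 4.0 / np.pi)
sigma_eta_hat = max(0.0, np.mean(eps**3) / c3) ** (1.0 / 3.0)
```

Note that the third moment is negative (since 1 − 4/π < 0), reflecting the left skewness induced by the one-sided inefficiency term; dividing by the negative constant c3 yields a nonnegative quantity whose cube root recovers σ_η.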
Estimation is straightforward.First, (3.1) is estimated by local linear least squares (LLLS), yielding estimates r 1 (z) and ∇r 1 (z) of r 1 (z) and ∇r 1 (z), respectively.The properties of these estimators are well-known; for example, see Fan and Gijbels (1996) or Li and Racine (2007).Second, compute the residuals ε i = ||d|| −1 (U i − r 1 (Z i )) for i = 1, . . ., n, and again using LLLS, regress 2 i and 3 i on the Z i , i = 1, . . ., n to obtain corresponding estimates r j (Z i ) and ∇r j (Z i ) of r j (Z i ) and ∇r j (Z i ), j ∈ {2, 3}.As noted in the Appendix of SVKZ, reasoning along the lines of Masry (1996) With least-squares crossvalidation for the bandwidths this is clearly o p n −1/(r+3) , and hence r 1 (z) converges uniformly.SVKZ then prove, under the additional assumption that the conditional moments of ε of order up to 7 are finite, and selecting the bandwidths for the regression of ε 2 i and of ε 3 i on z such that nh r−1 (log n) −4 → ∞ and nh r+3 → 0 as n → ∞, we have for j ∈ {2, 3} and where the expression the variances are given in SVKZ for the local constant regression case.In practice the variances have to be evaluated by bootstrap techniques.SVKZ note that, as usual, for avoiding the asymptotic bias in (3.8), the bandwidths h must be of slightly smaller order, for example, h = O n −(1+ν)/(r+3) for some 0 < ν < 4/(r − 1).Since the bandwidths for the r components of the Z i have the same order, then the r j (z), j ∈ {2, 3} converge at rate (nh p+q−1 ) 1/2 where the bandwidth h is written without a subscript since all are of the same order.It is well-known that the optimal order of h for local-linear regression is h = O n −1/(p+q+3) , which leads to an optimal rate n 2/(p+q+3) for the r j (z).Here, as elsewhere with nonparametric regression, there is no escape from the curse of dimensionality, but there is minimal risk of specification error.Substituting r 3 (z) into (3.7) and rearranging terms yields the estimator of the scale function σ η 
(z). The max operator is used to ensure that the estimate is nonnegative, as in Olson et al. (1980); note that the HMS estimator discussed in the separate Appendix B, supplementary materials, avoids this truncation. Substituting σ̂_η(z) into (3.5) yields an estimator μ̂_η(z) of mean conditional inefficiency μ_η(z). Similarly, an estimator of the noise variance is obtained from (3.6), although this is not needed if one is only interested in estimates of the frontier and of mean conditional inefficiency. Finally, an estimator of the frontier function in the transformed space is given by

φ̂(z) = r̂_1(z) + ||d|| μ̂_η(z). (3.12)

In addition, the partial derivative estimates ∇r̂_j(z), j = 1, ..., 3, can be used to recover estimates ∇μ̂_η(z) and ∇φ̂(z) of the partial derivatives of μ_η(z) and φ(z) with respect to the r − 1 elements of z.
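To fix ideas, the three estimation steps can be sketched numerically. This is a minimal sketch under added assumptions: half-normal inefficiency (supplying the constant linking r_3 to σ_η³), a local-constant (Nadaraya-Watson) smoother standing in for the LLLS fits, and an orientation in which inefficiency reduces the transformed output (so r_3 < 0). The names `smooth` and `frontier_at` are illustrative, not from any replication code.

```python
import numpy as np

SQ2PI = np.sqrt(2.0 / np.pi)
C3 = SQ2PI * (4.0 / np.pi - 1.0)  # links |r_3(z)| to sigma_eta(z)**3 under half-normality

def smooth(y, Z, z0, h):
    # Local-constant (Nadaraya-Watson) fit at z0; a stand-in for LLLS.
    w = np.exp(-0.5 * np.sum(((Z - z0) / h) ** 2, axis=1))
    return np.dot(w, y) / np.sum(w)

def frontier_at(U, Z, z0, h, d_norm=1.0):
    # Step 1: conditional-mean fits and residuals eps_i = (U_i - r1(Z_i)) / ||d||.
    fits = np.array([smooth(U, Z, z, h) for z in Z])
    eps = (U - fits) / d_norm
    # Step 2: local third moment; here inefficiency reduces output, so r3 < 0 and
    # the max operator enforces nonnegativity of the scale estimate.
    r3 = smooth(eps ** 3, Z, z0, h)
    sigma_eta = max(0.0, -r3 / C3) ** (1.0 / 3.0)
    # Step 3: mean conditional inefficiency and the frontier, as in (3.12).
    mu_eta = SQ2PI * sigma_eta
    r1 = smooth(U, Z, z0, h)
    return r1, sigma_eta, mu_eta, r1 + d_norm * mu_eta
```

On simulated data with a known inefficiency scale, `frontier_at` recovers r_1, σ_η, μ_η and the frontier point at any z0 of interest.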
One can also derive predictions of individual efficiencies for each observation, using a local version of the Jondrow et al. (1982) approach. However, this requires specification of a particular density in the family Sym(0, σ(z)) (e.g., normality). Then the frontier estimates φ̂(Z_i) can be used to compute residuals, from which predictions η̂_i are obtained as in Jondrow et al. (1982). However, this should be used with caution. As observed by SVKZ, the resulting η̂_i are predicted values based on a single, particular realization ξ_i. In fact, the η̂_i are not predictions, but rather plug-in estimates of the conditional mean E(η_i | ξ_i, Z_i), as discussed by Simar and Wilson (2010) and Wheat et al. (2014). Even so, confidence intervals for η_i are likely to be quite wide since the estimate is based on a single observation. Moreover, if one is interested in predictions, one should estimate prediction intervals as discussed by Simar and Wilson (2010) instead of confidence intervals for the conditional mean.
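Under the normality choice just mentioned, with half-normal inefficiency added for concreteness, the Jondrow et al. (1982) conditional mean is that of a normal distribution truncated at zero. A sketch (`jlms` and its arguments are our own naming):

```python
import math

def jlms(e, sigma_eta, sigma_xi):
    """E(eta | xi - eta = e) for eta ~ N+(0, sigma_eta^2), xi ~ N(0, sigma_xi^2):
    the mean of a normal truncated below at zero (Jondrow et al. 1982)."""
    s2 = sigma_eta ** 2 + sigma_xi ** 2
    mu_star = -e * sigma_eta ** 2 / s2            # mean of eta | e before truncation
    sig_star = sigma_eta * sigma_xi / math.sqrt(s2)
    a = mu_star / sig_star
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return mu_star + sig_star * pdf / cdf         # truncated-normal mean, always > 0
```

Note that a more negative composite residual yields a larger predicted inefficiency, as expected.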
As noted above, the derivatives ∂μ_η(z)/∂z and ∂φ(z)/∂z can be estimated easily using local-linear (or local-quadratic) estimation of the transformed model. By inverting the transformation in (2.11), the points (Z_i, φ̂(Z_i)) can be translated to estimates of the frontier surface in the original (x, y) coordinate system, where the matrix R_d defined earlier is known and nonstochastic.
Similarly, estimates of points on the average production function r_1(z) can be obtained in the original space via the same inverse transformation. Since these transformations are nonstochastic, it is clear that the asymptotic properties of the estimators in the transformed (z, u)-space remain valid in the original (x, y)-space, with the same rate of convergence (i.e., n^{2/(p+q+3)}).
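Since R_d is orthogonal, the inverse mapping is simply multiplication by its transpose. One way to construct such a rotation can be sketched as follows (the choice of orthonormal completion is not unique, as noted for V_d; `rotation_from_direction` is our own construction via QR, not from the paper):

```python
import numpy as np

def rotation_from_direction(d):
    """Return an orthogonal (r x r) matrix whose last column is d/||d||.
    Any orthonormal completion works; here one is taken from a QR factorization."""
    d = np.asarray(d, dtype=float)
    r = d.size
    A = np.column_stack([d / np.linalg.norm(d), np.eye(r)[:, : r - 1]])
    Q, _ = np.linalg.qr(A)
    if np.dot(Q[:, 0], d) < 0:          # fix the sign so the first column is +d/||d||
        Q[:, 0] = -Q[:, 0]
    return np.column_stack([Q[:, 1:], Q[:, 0]])   # move the d-direction last
```

A point w in the original space maps to transformed coordinates via R_d' w, and back via R_d, so round-trips are exact up to floating-point error.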
The same is true for the other quantities of interest derived in Section 2.4. In particular, the derivatives of expected efficiency with respect to inputs and outputs given in (2.32) can be estimated by noting first that differentiation in (3.5) yields an expression involving ∂σ_η(z)/∂z. This derivative can be easily estimated, and repeated substitution leads to an estimate of ∂μ_η(w^∂)/∂w^∂ in (2.32).
Estimates of the marginal effects given by (2.29) can also be computed, giving estimates of marginal products, marginal rates of substitution and marginal rates of transformation. LLLS estimation of the average production function r_1(z) provides estimates of ∂r̂_1(Z_i)/∂Z_i, and from the discussion above we have estimates of ∂μ̂_η(Z_i)/∂Z_i. Adding these gives estimates of ∂φ̂(Z_i)/∂Z_i due to (3.12), and these can be used to compute estimates Ĵ_φ(Z_i) of the Jacobian matrix J_φ(Z_i) given in (2.24). Substituting Ĵ_φ(Z_i) for J_φ(Z_i) in (2.28) and solving for the vector c allows estimates of the marginal effects given in (2.29) to be computed.
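As an illustration of this last step, given estimated frontier gradients, the marginal quantities follow from the implicit function theorem applied to the frontier relation. This is a hedged sketch using standard textbook definitions, standing in for the paper's (2.28)-(2.29); the function name is ours:

```python
import numpy as np

def marginal_quantities(g_x, g_y):
    """g_x, g_y: estimated frontier gradients w.r.t. inputs and outputs on
    phi(x, y) = 0. The implicit function theorem gives the usual marginal rates."""
    g_x, g_y = np.asarray(g_x, float), np.asarray(g_y, float)
    mp = -g_x / g_y[0]            # marginal products of each input for the first output
    mrs_12 = -g_x[0] / g_x[1]     # marginal rate of substitution between inputs 1 and 2
    mrt_12 = -g_y[0] / g_y[1]     # marginal rate of transformation between outputs 1 and 2
    return mp, mrs_12, mrt_12
```

On a linear frontier such as y_1 + 0.5 y_2 = 2 x_1 + 3 x_2, the gradient (2, 3; -1, -0.5) yields marginal products (2, 3) and negative rates of substitution and transformation, matching the expected signs discussed in the application below.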

Further Details
Several extensions of the ideas developed above are possible. To conserve space, we give brief discussions of these in the separate Appendix E, supplementary materials, with systematic analysis of the ideas left for future work. In Section E.1, we show how the methods described above can be adapted to estimate radial input, output and hyperbolic distances. In Section E.2, we discuss how exogenous environmental variables can be introduced into the analysis. This allows investigation of the effect of such variables on the production frontier as well as on the distribution of inefficiency. Section E.3 discusses how the SVKZ and HMS estimators can be extended to impose free disposability or convexity on the production set, thereby providing stochastic analogs of FDH or DEA estimators. Finally, Section E.4 discusses the consequences of allowing the choice of direction d to be data-driven.

Efficiency of U.S. Banks
We use the methods developed above to estimate technical efficiency and scale elasticities of U.S. commercial banks operating during first-quarter 2001 (2001Q1) through fourth-quarter 2020 (2020Q4). We use quarterly data from the U.S. Federal Deposit Insurance Corporation Statements of Income and Condition (Call Reports). Our data cover the periods before, during and after the financial crisis of 2007-2009, and thus allow us to examine how banks responded to the crisis. Following prior work (e.g., Wheelock and Wilson 2018), and using subscript i to index observations 1, ..., n, we define five inputs, that is, purchased funds (X_i1), labor (X_i2), physical capital (X_i3), equity (X_i4) and risk (X_i5). Labor is measured in terms of full-time equivalents, and physical capital is measured by the book value of premises and fixed assets, as is typical in banking studies. All dollar amounts are measured in thousands of constant 2012 U.S. dollars.
Consistent with Wheelock and Wilson (2018) and other studies, we view both equity and risk as quasi-fixed inputs. Equity is another source of financial capital, but, as noted by Akhavein et al. (1997) and Berger and Mester (1997), is very difficult and costly to adjust substantially in the short run. Risk is measured in terms of nonperforming loans, as in Wheelock and Wilson (2018). Others, including Malikov et al. (2016) and Simper et al. (2017), view nonperforming loans as an undesirable output. Since we hold risk constant, it can also be viewed as a "netput," and the results will suggest whether it behaves as an input or an output, as discussed below.
We define three outputs, including total loans (Y_i1), securities (Y_i2) and off-balance sheet items (Y_i3). Off-balance sheet items are measured by total noninterest income and reflect fiduciary activities, foreign exchange and securities trading, data processing services and other activities.
After deleting observations with missing or implausible values as well as credit-card, bankers', cash management, nonbank, pure internet, cooperative and wholesale banks, depository trust companies and industrial loan companies, we have 486,461 observations, with the number of observations per quarter ranging from 8001 in 2001Q1 to 4261 in 2020Q4, reflecting the on-going consolidation in the U.S. banking industry through mergers and acquisitions. Summary statistics for these variables are given in Table G.1 appearing in the separate Appendix G, supplementary materials. Table G.1 indicates that marginal distributions of the input and output variables are skewed to the right, reflecting the skewness of the distribution of bank sizes measured by total assets.
As others (e.g., Wheelock and Wilson 2001, 2012, 2018) find, there is substantial collinearity among the specified inputs and outputs. We exploit this by reducing the dimensionality of our problem along the lines of Daraio and Simar (2007) and Wilson (2018). First, we standardize each of our eight variables by dividing by their standard deviations (this amounts merely to changing units of measurement). Let n denote the sample size, and consider the (n × 3) matrix X whose columns consist of the observations X_i1, X_i2, and X_i3, and the (n × 2) matrix Y whose columns consist of the observations Y_i2 and Y_i3, i = 1, ..., n. For the moment matrix X′X, the ratio of its largest eigenvalue to the sum of its eigenvalues is 0.9609. Similarly, for the moment matrix Y′Y the ratio of its largest eigenvalue to the sum of its eigenvalues is 0.9661.
These values suggest that most of the information in X and Y is contained in their first principal components, and the simulation results of Wilson (2018) suggest that in the absence of noise, DEA and FDH estimates of efficiency are likely to have smaller mean-square error using reduced-dimensional data as opposed to the full, eight-dimensional data in our problem. Consequently, we replace the first three inputs (i.e., purchased funds, labor and physical capital) with the first principal component X*_1 = Xλ_1 of X, where λ_1 denotes the (3 × 1) eigenvector of X′X corresponding to its largest eigenvalue. We similarly replace the second and third outputs (i.e., securities and off-balance sheet items) with their first principal component, denoted by Y*_2. We retain the standardized versions of equity and nonperforming loans, denoted by X*_2 and X*_3, and the standardized version of total loans, denoted by Y*_1. Thus, after dimension reduction, we have p = 3 inputs and q = 2 outputs. Summary statistics for these variables are also given in Table G.1.
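The eigenvalue ratios and first principal components described above can be computed directly; a minimal sketch (variable and function names are ours):

```python
import numpy as np

def first_pc(M):
    """Standardize columns by their sample standard deviations, then return the
    ratio of the largest eigenvalue of M'M to the eigenvalue sum, together with
    the first principal component M @ lambda_1."""
    Ms = M / M.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(Ms.T @ Ms)   # eigenvalues in ascending order
    return vals[-1] / vals.sum(), Ms @ vecs[:, -1]
```

When the columns are strongly collinear, as in our data, the ratio is close to one and the first component carries nearly all of the variation.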
We set the second and third elements of the direction vector to zero in order to hold equity and risk constant, and we set the first, fourth and fifth elements equal to the medians of X*_i1, Y*_i1 and Y*_i2, ensuring that our results are independent of units of measurement. We construct the (r × r) rotation matrix R_d (where r = 5 after reducing dimensionality) as described in Section 2.3, and use this to translate each observation into the transformed coordinates (Z_i, U_i), where Z_i has r − 1 = 4 elements and U_i is scalar. Summary statistics for the transformed data in (z, u)-space are given in Table G.1 appearing in the separate Appendix G, supplementary materials.
In addition to Z_i, we include in our regressions both time and firm effects. For estimation of r_1(•) in (3.1), we regress U_i on (Z_i, T_i, L_i), i = 1, ..., n, where T_i ∈ {1, 2, ..., 80} corresponds to 2001Q1, 2001Q2, …, 2020Q4 (respectively), and L_i is a unique numeric label for the firm represented by observation i. Local linear regression requires an (n × n) diagonal matrix of kernel weights for each observation i = 1, ..., n. We use a Gaussian product kernel for the four continuous elements of Z_i; for estimating r_1(•) at observation i, the jth diagonal element of our weighting matrix is a product of these kernels, where Z_ik denotes the kth element of Z_i. Bandwidths are optimized via leave-one-out cross-validation. To avoid possible numerical difficulties, we constrain the first four bandwidths (corresponding to elements of Z_i) by imposing a lower bound 0.1 σ_k n^{−1/(4+r*)} on h_k, k ∈ {1, 2, 3, 4}, where σ_k is the sample standard deviation of Z_ik, i = 1, ..., n, and r* = 4 is the number of continuous variables used in each regression. In addition, we impose an upper bound on each h_k, k ∈ {1, 2, 3, 4}, given by 3[max_i(Z_ik) − min_i(Z_ik)]. Table G.2 in the separate Appendix G, supplementary materials, shows the optimized values of the bandwidths along with the corresponding lower and upper bounds. For each of the three estimated regressions, the optimized bandwidths are well within the intervals between corresponding lower and upper bounds.
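The bandwidth bounds and the leave-one-out criterion can be sketched as follows (a local-constant stand-in for the local-linear fits; function names are ours):

```python
import numpy as np

def bandwidth_bounds(Z, r_star=4):
    # Lower bound 0.1 * sigma_k * n^(-1/(4 + r*)) and upper bound 3 * range, per column.
    n = Z.shape[0]
    lower = 0.1 * Z.std(axis=0, ddof=1) * n ** (-1.0 / (4 + r_star))
    upper = 3.0 * (Z.max(axis=0) - Z.min(axis=0))
    return lower, upper

def loocv(h, Z, U):
    # Leave-one-out cross-validation criterion for a Nadaraya-Watson fit.
    sse = 0.0
    for i in range(len(U)):
        w = np.exp(-0.5 * np.sum(((Z - Z[i]) / h) ** 2, axis=1))
        w[i] = 0.0                      # drop observation i from its own fit
        sse += (U[i] - np.dot(w, U) / w.sum()) ** 2
    return sse / len(U)
```

Bandwidths would then be chosen by minimizing `loocv` over h, with each component clipped to the interval [lower, upper].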
After regressing U_i on Z_i, and the squared as well as the cubed residuals from this regression on Z_i, we estimate σ_η(Z_i) and μ_η(Z_i) for each observation in our sample. Figure 2 shows the (sample) average and median values of the estimates μ̂_η(Z_i) across the 80 quarters represented in our sample. The estimates show an increase in inefficiency just before and during the recession of 2007-2009. Both average and median inefficiency are seen to decline after reaching a peak in 2011Q2, reaching a trough in 2014Q4. The estimates indicate that both mean and median inefficiency began to increase again starting in 2015, coincident with increases in the U.S. Federal Funds rate by the Federal Reserve System. Specific values of μ̂_η(Z_i) are difficult to interpret due to the fact that in the model, η_i is multiplied by the norm ||d|| of the direction vector. The nonzero elements of the direction vector correspond to medians of X*_1, Y*_1, and Y*_2, and hence the direction vector is d = (−0.01313, 0, 0, 0.004934, 0.002622)′ and its norm is ||d|| = 0.01427. Increasing (decreasing) the length of the direction vector necessarily decreases (increases) η_i and its conditional expectation. In any case, the values indicated by Figure 2 should be viewed while keeping in mind the length of the direction vector.
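As a quick arithmetic check (ours, not from the paper's replication code), the reported norm follows directly from the stated components of d:

```python
import numpy as np

# Nonzero elements are (minus) the median of X*_1 and the medians of Y*_1 and Y*_2.
d = np.array([-0.01313, 0.0, 0.0, 0.004934, 0.002622])
print(round(float(np.linalg.norm(d)), 5))  # 0.01427
```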
As a check on our results, as well as on the ability of our methods to provide meaningful estimates, we also estimate marginal rates of substitution, marginal rates of transformation and marginal products in the space containing the untransformed, reduced-dimensional inputs X*_j, j ∈ {1, 2, 3}, and outputs Y*_k, k ∈ {1, 2}. In addition, we estimate scale elasticities with respect to the variable input X*_1 for each observation i = 1, ..., n. All derivatives are estimated using (2.29) as described in Section 2.4. To conserve space, we report summary statistics for these estimates in Table G.3 in the separate Appendix G, supplementary materials. Interestingly, all of the estimates of scale elasticity in Table G.3 are greater than 1, suggesting that all banks in the sample face increasing returns to scale. This is consistent with results obtained by Wheelock and Wilson (2018) and may explain why even the largest U.S. banks continue to grow larger.
It is important to note that the derivative estimates are for a particular point on the frontier, and that our use of local estimation methods is likely to result in noisy estimates at each point. Nonetheless, the results in Table G.3 reveal that among the 486,461 observations in our sample, the median and mean values of estimates of (i) the marginal rate of substitution between X*_1 and X*_2, (ii) the marginal products with respect to X*_1 and X*_2, (iii) the marginal rate of transformation between Y*_1 and Y*_2, and (iv) scale elasticity have the expected signs. One should expect marginal rates of substitution and transformation to be negative, and marginal products and scale elasticity to be positive. The right-most column in Table G.3 gives the percentage of observations yielding negative estimates, and with the exception of derivatives involving X*_3, these range from 86.1% to 99.8%.
The results for marginal rates of substitution between X*_3 and the other inputs, as well as for the marginal products of Y*_1 and Y*_2 with respect to X*_3, are surprising, with the percentages of negative estimates ranging from 0.8% to 14.6%, so that most observations yield unexpected signs. While we initially viewed risk, as measured by nonperforming loans, as an input, the results for derivatives involving X*_3 in Table G.3 suggest that this variable behaves as an output. However, recalling the discussion above, we treat this variable as fixed, and the corresponding element of the direction vector d is set to zero. Consequently, as noted above, X*_3 is allowed to act as a "netput," that is, either as an input or an output, and the evidence we obtain suggests that it is better viewed as an undesirable output. Our use of a zero direction for this variable means that our initial view of the variable as a quasi-fixed input has no bearing on the estimation when we instead view the variable as an undesirable output. Treating the variable as a fixed, undesirable output merely amounts to changing the label on the variable from X*_3 to Y*_3.

We examine potential heteroscedasticity in the inefficiency process by estimating the derivatives ∂μ_η(W_i^∂)/∂W_i^∂ given by (2.32). Table G.4 in the separate Appendix G, supplementary materials, gives the percentages of observations where derivatives of the conditional expectation μ_η(•) with respect to X*_j, j ∈ {1, 2, 3}, and Y*_k, k ∈ {1, 2}, are less than, equal to or greater than zero. The estimated derivatives equal zero in about 39.7% of all cases, but otherwise show varying proportions that are less than or greater than zero. The results in Table G.4 provide evidence of heteroscedasticity in the distribution of the one-sided inefficiency process. Our model explicitly allows for this in Assumption 2.4. There are many published examples where parametric stochastic frontier models are specified with a fixed, constant shape parameter for the one-sided inefficiency distribution.
Few studies that estimate parametric stochastic frontier models allow for or test for heteroscedasticity in the inefficiency distribution.
As a final note on our application, σ̂(Z_i) = 0 for 307,456 observations, or about 63.2% of the sample. Thus, for almost two-thirds of the sample, we find no evidence of noise. As discussed in the separate Appendix C, Section C.1, supplementary materials, one could impose observation-specific bounds when estimating r_2(z) in order to ensure σ̂²(z) is greater than zero. But, as also noted in Section C.1, this is computationally prohibitive given our large sample size. As noted in separate Appendix G.2, supplementary materials, using the HMS estimators results in 485,664 cases (99.8% of the sample) where σ̂(Z_i) = 0. Given that we use accounting data on banks, and that the data are audited both by the banks as well as regulators, it may not be unreasonable to find little evidence of noise.

Summary and Conclusions
The methods proposed in this article provide a very flexible, almost fully nonparametric approach for evaluating the performance of firms in terms of their efficiency. For purposes of estimating the production frontier, we need only some mild assumptions on continuity and smoothness of the frontier, symmetry of the density of the noise process, and tail behavior of the noise and inefficiency distributions. No functional form assumptions are needed, either for the frontier or for the distributions of noise and inefficiency. In addition, our method accommodates multiple inputs as well as multiple outputs while avoiding endogeneity problems. As discussed in Section 1, other flexible methods that have been proposed to date either do not allow for noise, do not permit multiple inputs and outputs, involve significant computational difficulties, or involve endogeneity problems. Of course, as with most nonparametric methods, ours incurs the curse of dimensionality, but as illustrated in Section 4, dimension reduction methods can mitigate this curse in many cases. Our empirical application to U.S. commercial banks demonstrates that our method is computationally tractable with real data.
There have been for many years two disparate strands in the literature on frontier efficiency estimation. One strand consists of articles describing deterministic estimators, such as DEA and FDH estimators, that require no functional form assumptions and allow for multiple inputs as well as multiple outputs, but which do not allow for noise. The other strand consists of articles, often fully parametric or, in recent years, partially parametric, that permit noise but require functional form assumptions and do not allow multiple inputs and outputs. Our article provides a bridge between these two worlds.

Figure 2. Sample Average and Median Values of μ̂_η(Z_i), SVKZ Estimates, 2001Q1-2020Q4. NOTE: Averages are indicated by the solid curve, while medians are represented by the dashed curve.
||d||, and where ||•|| denotes the L_2-norm. The matrix V_d is not unique, but this does not create any problem for what follows provided V_d is fixed after it is selected. Clearly, R_d is an orthogonal matrix. Hence,