Semiparametric spatial autoregressive models with nonlinear endogeneity

Abstract. This article constructs nonparametric two-step least squares (2SLS) and generalized method of moments (GMM) sieve estimators to estimate a functional-coefficient spatial autoregressive model with an endogenous environment variable. We derive the consistency and asymptotic normality results for our proposed sieve estimators. A small Monte Carlo study shows that our proposed estimators exhibit good finite-sample performance. An empirical application is used to illustrate the usefulness of our methods.


Introduction
Parametric spatial autoregressive (SAR, henceforth) models with exogenous regressors have been well studied in the spatial econometric literature; e.g., Ord (1975), Lee (2004Lee ( , 2007)).Recently, Liu and Lee (2013), Drukker et al. (2013), and Liu and Saraiva (2015) have studied parametric SAR models with endogenous regressors.As parametric SAR models can suffer model misspecification problem, non/semiparametric SAR models have been studied by Su and Jin (2012), Malikov and Sun (2017), among others, when all regressors are exogenous.Assuming a constant spatial lag parameter, Hoshino (2018) considers a semiparametric SAR model nested to the functional-coefficient model studied by Malikov and Sun (2017), where Hoshino (2018) allows linear endogeneity.Hoshino (2022) extends the conventional SAR model to accommodate nonlinear spatial endogeneity, allowing for a spatial lag term in the nonparametric form of the dependent variable.In contrast, this article considers a different type of nonlinear endogeneity in the semiparametric spatial autoregressive framework.Specifically, we extend the functional-coefficient SAR model of Malikov and Sun (2017) by allowing endogenous environmental variables (i.e., the variables inside the unknown coefficient functions).Drukker et al. (2013) construct a GMM estimator using both linear and quadratic moments, while Liu and Lee (2013) and Hoshino (2018) construct a two-stage least squares (2SLS) estimator from linear moments only.It is well known in the spatial literature that the 2SLS estimator can be less efficient than the GMM estimator using both linear and quadratic moments, although the GMM estimator does not have an analytic solution.In addition, the number of moments is fixed in Drukker et al. (2013) and Hoshino (2018), but it increases with sample size in Liu and Lee (2013).Applying a linear control function approach, Liu and Saraiva (2015) construct a GMM estimator using different quadratic moments than Drukker et al. (2013), where the linear control function can suffer misspecification in practice.Hence, to avoid introducing additional model misspecification, this article applies a nonparametric control function approach to correct for the endogeneity of the functional coefficients in a semiparametric regression framework and constructs a nonparametric sieve estimator, allowing the number of linear orthogonal moments as well as the number of quadratic orthogonal moments to grow with the sample size.
To illustrate our estimation methods with empirical data, we consider a semiparametric stochastic frontier model.Specifically, we consider that a firm's technical inefficiency error is affected by its local market power, which is measured by the Herfindahl-Hirschman Index.Using the Baltagi et al. (2016) dataset for Chinese firms in the chemical industry in 2006, we document an interesting finding: the kernel density function of the estimated spatial lag coefficient curves for private-owned firms is bimodal and more than 60% of the spatial lag coefficient estimates are negative, while the kernel density for stateowned firms is unimodal.The result indicates that state-owned firms behave differently from privateowned firms in the presence of competition.Details are given in Section 6.
The rest of the article is organized as follows.Section 2 defines our model under consideration.Section 3 describes our proposed 2SLS and GMM sieve estimators via sieve approximation approach.Section 4 lists regularity assumptions and derives the limiting results of our proposed estimators.Section 5 uses a small Monte Carlo simulation to assess the finite sample performance of our proposed estimators and test statistics.In Section 6, we propose a semiparametric spatial autoregressive stochastic frontier model, which is estimated using our proposed estimators given in Section 3. Section 7 concludes.All the mathematical proofs are relegated to an online supplemental Appendix.
For ease of reading, we introduce our notation here.(i) A and A 1 are the Euclidean norm and the row-sum norm of matrix A, respectively; (ii) λ i (A) is the ith (real or complex) eigenvalue of A, the spectral radius, (A) = max 1≤i≤n |λ i (A)|, and For a symmetric matrix A, λ min (A) and λ max (A) are its minimum and maximum eigenvalues, respectively; (iii (iv) S x denotes the support of a random variable x; (v) I n is an n × n identity matrix, 0 n is an n × 1 vector of zeros, and 0 n×m is an n × m matrix of zeros with m ≥ 2; (vi) capital letters are reserved for column vectors and matrices, while lowercase letters are reserved for scalars; and (vii) we use C to denote a finite positive constant which can take different values at different positions.

The model
We consider the following model (2.1) with d x,2 > 1, and w ij,n is the nonstochastic spatial weight capturing the impact of the jth individual unit's outcome on the ith individual unit's outcome.Also, {u i } is a sequence of i.i.d.errors with zero mean and finite variance.The model extends the functional-coefficient SAR model of Malikov and Sun (2017) by allowing the environmental variable, z i , to be endogenous.The case with ρ (•) ≡ 0 has been studied in Delgado et al. (2020).We aim to estimate ρ (•) and consistently in the presence of nonlinear endogeneity, applying the control function approach introduced in Newey et al. (1999), where for convenience, we denote where E (ε i |X i , z i , v i ) = 0 for all i almost surely.According to Theorem 2.2 in Newey et al. (1999), β (•) and ω (•) are identified under the condition that E [ω (v i )] = 0 and that there are no functions δ (•) and , where δ (•) and ϕ (•) are nonzero over at least one nonempty subset of the support of z i and v i , respectively.For example, if X 1,i equals one, β 1 (z) and ω (v) are identifiable as long as X 2,i contains relevant instruments for z i or θ (•) is nonzero over at least one nonempty subset of the support of X i .Rewriting model (2.4) in matrix form gives (2.5) where Y n = y n,1 , . . ., y n,n and E = [ε 1 , . . ., ε n ] are both an n × 1 vector, W n is an n × n spatial weight matrix with zero diagonal elements and w ij,n being its i, j th element, ρ(Z) ≡ diag {ρ(z 1 ), . . ., ρ(z n )} is an n × n diagonal matrix, X 1 =diag X 1,1 , . . ., X 1,n is an n × (nd β ) block diagonal matrix, and β(Z) = β(z 1 ) , . . ., β(z n ) and ω(V) = [ω(v 1 ), . . ., ω(v n )] are vectors of dimension nd β and n, respectively.To ensure the spatial stationarity of our model, we assume that (2.6) If (2.6) holds true, the reduced form of model (2.5) becomes

The sieve estimators
This article proposes a sieve estimator, where we approximate all the unknown functions by linear sieve estimators.Specifically, let p 1 (•) , p 2 (•) , . . .be an orthonormal basis functions, and we, respectively, approximate ω (•), ρ (•), and where p l x j for x = x 1 , . . ., x d x for any l.1 As v i 's are unknown in model (2.4), we first estimate θ (•) from model (2.2).Applying the sieve approximation gives where b i = θ (X i ) − θ * (X i ) captures the series approximation error.The OLS estimator of α θ is denoted by αθ = s n s n −1 s n Z, where we denote Z = [z 1 , . . ., z n ] and Then, we calculate the residuals ṽi = z i − α θ s n (X i ) for i = 1, 2, . . ., n.The unknown v i 's in model (2.4) will be replaced by ṽi 's in the subsequent estimations.
Below, Sections 3.1 and 3.2 explain how to construct the nonparametric 2SLS and GMM estimators, respectively.Section 3.3 derives the consistency and asymptotic normality results of the proposed estimators.Note that the estimator for ω (•) is the demeaned estimator to ensure E [ω (v i )] = 0.

Nonparametric 2SLS estimator
Applying the series approximation to model (2.4) yields where r n,i captures all the series approximation errors, and x 1,l,i is the lth element in X 1,i .Rewriting above model in matrix form gives where P z,l =diag p l (z 1 ) , . . ., p l (z n ) and X 1,l =diag 1, x 1,l,1 , . . ., x 1,l,n are both an n × n diagonal matrix, , and R n = r n,1 , . . ., r n,n .If the series approximation bias term, R n , is ignorable, model (3.3) is approximated by a SAR model with κ n spatial lags in the dependent variable constructed from κ n spatial weight matrices, P z,1 W n ,..., and P z,κ n W n .Thus, model (3.3) becomes an increasing-order SAR model.Gupta and Robinson (2015) estimate a higher-order linear endogenous SAR model with an increasing number of regressors and order, while our model (2.1) allows nonlinearity in the endogenous environmental variable, z i .In addition, Gupta and Robinson (2015) estimate their linear SAR model via a 2SLS estimator, while this article considers the 2SLS estimator as well as the GMM estimator using additional quadratic moment conditions.Our Monte Carlo simulation results given in Section 5 show that the semiparametric GMM estimator performs better than the semiparametric 2SLS estimator, as expected.
Denoting S n α ρ = I n − κ n l=1 α ρ,l P z,l W n , we have under a sufficient condition, , which is infeasible due to the unknown parameter vector ϑ.Let Q 1,n be an n × q n feasible instrument matrix for W n Y n , where q n ≥ 1.As S n α ρ where κ Q = κ n q n + d β + 1 .
Our proposed 2SLS estimator of ϑ is defined as where Once we obtain θ, we construct the second-step 2SLS estimator, θopt , from Qn = P z,1 Q1,n , . . ., , respectively.The second-step 2SLS estimator is expected to be more efficient than the first-step estimator, as shown in Section 4.

Nonparametric GMM estimator
In the parametric SAR literature, it is well known that the 2SLS estimator of the spatial lag parameter may not exist for a pure SAR model with no additional exogenous regressors, and that the 2SLS estimator can be inefficient.A similar argument holds true for our semiparametric model.Therefore, we propose a GMM estimator by introducing additional quadratic moments conditions beyond the linear orthogonal moment conditions (3.6).
Similar to Lee (2007), the quadratic moments are in the form of E A n,l E E = 0, where A n,l is an n × n matrix satisfying tr A n,l = 0.The arguments given in Section 3.1 suggest that A n,l = for l = 1, . . ., κ n , which is not feasible due to the unknown parameter vector α ρ .So, the feasible candidates can be A n,l = P z,l W n −n −1 tr P z,l W n I n or A n,l = P z,l W n −diag P z,l W n for l = 1, . . ., κ n , for example, where the second definition of A n,l is used in Section 5 as it is robust to error heteroskedasticity.
We then construct our initial nonparametric GMM estimator: where we denote Qn is the same as that is used to construct the first-step 2SLS estimator, and n is a Note that θ = α ρ , α β,1 , . . ., α β,d β , α ω and that the first-step GMM estimators of ω (•), ρ (•), and Given the initial GMM estimator, θ, and motivated by Lee (2007), we construct a more efficient second-step GMM estimator, θopt .From model (3.4), better instruments for the spatial endogenous term, W n Y n , will be Q1,n = W n S n αρ −1 d β l=1 X 1,l P κ n z αβ,l + P κ n ṽ αω , and better quadratic moments can be constructed from Ân,l = P z,l W n S n αρ −1 −diag P z,l W n S n αρ −1 for l = 1, . . ., κ n .Then, our second-step GMM estimator solves where and

Assumptions and limiting results
Below, we list regularity conditions required for the derivation of consistency and limiting distributions of our proposed estimators.
sequence with finite moments of at least second order, and y i is generated from model (2.1)-(2.3).(ii) The respective support, S z , S v , and S x l of z i , v i , and x i,l are all compact subsets of R for l = 1, . . ., d x .(iii) W n 1 and W n 1 are both finite and (2.6) holds.

Assumption 2. (i)
There exist finite constants c and c such that 0 < c ≤ λ min n, ≤ λ max n, ≤ c < ∞ uniformly in s n , where n, = n −1 E s n s n .(ii) There exist finite constants c and c such that 0 where X equals X with P κ n ṽ being replaced with P κ n v .
(iii) The set of the orthonormal basis functions p 1 (•) , p 2 (•) , . . .are twice continuously differentiable everywhere on its support and max l≥1 max and ω (•) are nonzero over at least one nonempty subset of their respective domain, and β (•), ω (•), and ω (•) are all uniformly bounded over their domain.(iii) There are no functions δ (•) and ϕ (•) such that X 1 δ (z) ≡ ϕ (v), where δ (•) and ϕ (•) are nonzero over at least one nonempty subset of the support of z i and v i , respectively.(iv n , and P κ n r = max 0≤s≤r max x ∂ s P κ n /∂x , where ∂ s P κ n /∂x is the s th (s ≥ 1) partial derivative with respect to x.As n → ∞, (i) s n → ∞ and Assumption 1 (i) defines a spatial autoregressive varying coefficient model with an endogenous environmental variable.The compact support in Assumption 1(ii) is typical in series approximation literature (e.g., Newey, 1997).Allowing infinite support can involve lengthy proofs and the introduction of trimming function (Newey et al., 1999) or weighted sup-norm metric (Blundell et al., 2007;Su and Jin, 2012).Assumption 1(iii) is a standard spatial stationarity assumption in the literature of spatial autoregression.Assumption 2 (i) ensures the existence of the estimator for parameter α θ in model (3.1), while Assumption 2(ii) ensures the existence and nonsingularity of the asymptotic covariance matrix of the proposed initial 2SLS and GMM estimator of model (3.2).Similar assumptions can be found in Newey (1997) and Ozabaci (2014).
Assumption 3 regulates the smoothness of all the unknown coefficients as well as the magnitude of the approximation errors, where the ς -smooth function is [ς ]-times differentiable and its [ς ]th derivative satisfies Hölder's condition with an exponent ς − [ς ] ∈ [0, 1], and [ς ] is the largest integer smaller than ς > 0. Assumptions 3(i)-(ii) reflect the fact that sieve approximation error depends on the number of basis functions used, the smoothness of unknown functions, and the argument dimension of the unknown functions.The B-splines series and orthonormal Hermite series all satisfy Assumption 3(iii) Newey (1997) and Chen (2007).Under Assumptions 4(i)-(ii), model (2.1)-( 2.3) defines a semiparametric model with nonlinear endogeneity and relevant instruments.Assumption 4(iii) is an identification condition, while Assumptions 2(ii) and 4(iv) ensure the existence of the second-step 2SLS and GMM estimators.The boundness of max 1≤i≤n E ε 4+η i |X i , v i in probability is used to derive the asymptotic normality results of our estimator by Lemma A.5 in Lin and Lee (2010).Assumption 5 gives restrictions on the smoothing parameters s n and κ n , and q n , the number of instruments for W n Y n .
Below, we present the consistency and limiting distributions of our proposed estimators and relegate all mathematical proofs to Appendix.
Next, we denote and present the asymptotic normality results of the nonparametric 2SLS estimators.
Theorem 4.1.Under Assumptions 1-6, we have where ρ (z), β (z) and ω (v) refer to the 2SLS or optimal-2SLS estimator, depending on the definition of Qn , and we denote Next, we derive the consistency and asymptotic normality results for our proposed nonparametric GMM estimators.
Assumption 7. (i) The weight matrix, n , defined in (3.7) and (3.8), is a positive definite matrix satisfying 0 Under Assumptions 1-5 and 7(ii), the leading term of g n (ϑ 0 ) is Lemma 4.2.Under Assumptions 1-5, and 7, we have (i Next, we derive the limiting distribution of our proposed GMM estimator.In doing so, we first calculate the variance of ḡn (ϑ 0 ) as follows where n is a κ n × κ n matrix with a typical element equal to π n,l,l =tr n,ε A n,l A n,l n,ε + n,ε A n,l for l, l = 1, . . ., κ n , and n,Q = E Q n n,ε Q n .Following the proof of Lemma 4.2, we have where Now, we are in a position to give the limiting distribution of the semiparametric GMM estimators.
Theorem 4.2.Under Assumptions 1-7, we have where ρ (z), β (z) and ω (v) refer to the GMM or optimal-GMM estimator, depending on the definition of Qn , and n = κ n n n n matrix, assuming that lim n→∞ n has a full rank.
To make statistical inference, we estimate from the (optimal) GMM estimator.We denote the resulted estimator of n and n −1 n,g by ˇ n and n −1 ˇ n,g , respectively.Theorem 4.2 is obtained for a general weight matrix n , from which we observe that n = n −1 n,g gives the most efficient estimation results among all n satisfying Assumption 7(i).
Corollary 4.1.Under the assumptions in Theorem 4.2 and setting n = n −1 n,g , we have )

Monte Carlo simulations
In this section, we run a small Monte Carlo simulation to evaluate the performance of our proposed estimators.Specifically, we generate our data from the following model (5.1) (5.3) for i = 1, . . ., n, where X i = x 1,i , x 2,i is randomly drawn from a bivariate uniform distribution over [0, 1] × [0, 1] with a correlation equal to ρ = 0.25 according to the Plackett distribution, v i 's are i.i.d.N 0, σ 2 v , ε i 's are i.i.d.N 0, 0.25 2 , and {v i } is independent of {ε i }.Also, we set ρ (z) = .75sin (π z), and ω (v) = 0.5v.For the spatial weight matrix, we randomly draw the geographical coordinates of n US counties from a data set for 1980 Presidential election results covering 3,107 US counties in Pace and Barry (1997).2Let d ij be the geodesic between two counties.We then calculate w * ij,n = 1/d ij for i = j and set w * ii,n = 0 for all i = j and w ij,n = w * ij,n / W * n , where W * n is the spectral radius of W * n .We set σ v = 0.25 and 0.5, where the higher σ v , the lower the R 2 from the first-stage regression (5.2), the less relevant the instrument variable (x 1 , x 2 ), and the higher the correlation between u and v.We apply penalized cubic B-splines method to estimate model (5.2) using GAM package in the R software.And, univariate cubic B-splines are used to estimate ρ (•), β (•) and ω (•).In the first-step estimation, we set Q1,n = W n X 1,1 P κ n z , . . ., W n X 1,d β P κ n z , W n P κ n ṽ .In Table 1, we report the sample average of the root mean square errors (RMSEs) of the optimal 2SLS and GMM estimators.We observe these: (i) the RMSEs of the two estimators decrease as the sample size increases, which supports the consistency of our proposed optimal 2SLS and GMM estimators; (ii) as expected, a higher σ v or a higher endogeneity and weaker instrument scenario lead to less accurate estimation; and (iii) the optimal GMM estimator in general performs better than the optimal 2SLS estimator, especially under the higher endogeneity and weaker instrument scenario.
Note that the calculation of the proposed estimators requires determining the number of basis functions used in the first-and second-stage estimation.We subjectively set κ n = 8, 10, and 12 for n = 200, 400, and 800, respectively.3First, we apply the generalized cross-validation (GCV) method to determine the optimal overall number of basis functions in the sieve estimation of the first-stage regression model (5.2).Second, for the estimation of unknown coefficient curves ρ (•), β (•), and ω (•), since the 2SLS estimator only involves linear moment conditions, it is straightforward to apply Zhang (2010) minimax concave penalty (MCP) method to select relevant basis functions using the R software.After obtaining the optimal 2SLS estimator, we therefore apply the MCP method to select the relevant basis functions for each of the unknown coefficient curves.In other words, the MCP-selected basis functions used to estimate ρ (•) may differ from those used to estimate β (•) and ω (•).This penalty method is borrowed from the high-dimensional estimation literature to reduce estimation dimensions (see, e.g., Belloni et al., 2012;Li and Sun, 2023).Evidently, the penalized selection (PS) method is a finer and more flexible selection method than the conventional cross-validation method because the PS method allows different basis functions to be used to estimate different unknown coefficient curves, while the conventional cross-validation method selects the same basis functions for all curves.In this article, we limit the application of the PS method to the optimal 2SLS estimator, as the nonlinear optimization problem associated with the penalized GMM estimator is numerically challenging.Also, we omit the proofs for the penalized optimal 2SLS estimator to save space, as they are essentially similar to the proofs given in Li and Sun (2023).In Table 1, the columns under ρopt_s (•) and βopt_s (•) report the RMSEs for the penalized optimal 2SLS estimator, where we observe that the penalized optimal 2SLS estimator does slightly outperform the optimal 2SLS estimator, although it still performs worse than the optimal GMM estimator in general.

Empirical application
In this section, we will apply our proposed model to study firms production function using stochastic frontier models.Specifically, we downloaded the Baltagi et al. (2016) dataset from the Journal of Applied Econometrics Archive.The dataset contains 12,552 Chinese firms in the chemical industry for the period of 2004 to 2006, which is complied from the Chinese Industrial Enterprises Database survey conducted by China's National Bureau of Statistics (NBS).In this article, we focus on a cross-sectional analysis by working on the data observed in 2006.In particular, our dependent variable, y i , is the log of firm i's sales.The basic production factors include the log of capital used in production, k i , the log of employment l i , the log of material inputs m i , and the share of high-skilled labor h i , where a worker is deemed a highskilled worker if he/she has a university or comparable education.
The following stochastic frontier (SF) model has been widely applied to investigate firm production inefficiency since the seminal work of Aigner et al. (1977): where x i includes production input variables and other own technology shifters, ξ i ≥ 0 is a one-sided error capturing the technical inefficiency, and i is the random error.In practice, it is reasonable to believe that firms can experience productivity/technology spillover via informal conversations, sharing common resources such as capital, skilled labor and specialized suppliers, and local competitive pressure, especially for firms geographically close to each other.Consequently, a firm's productivity can be affected by its neighboring firms' technology/productivity.Such productivity spillover effects are well documented (e.g., Hu et al., 2005;Keller and Yeaple, 2009), but are not captured by the SF model (6.1).Spatial autoregressive stochastic frontier (SARSF) model, is introduced to capture the technology/productivity spillover effects by augmenting the traditional SF model with a spatial lag term, j =i w n,ij y n,j ; see, e.g., Glass et al. (2016).In (6.2), the spatial lag parameter, ρ, is used to measure the magnitude of spillover effects, and ρ = 0 suggests the nonexistence of technology/productivity spillover effects.
In the literature of stochastic frontier modelling, it is common to assume that i is i.i.d.normally distributed and independent of ξ i , which is assumed to be i.i.d.half-normally distributed (e.g., Aigner et al., 1977).Reifschneider and Stevenson (1991) are the first to allow firm-specific characters to affect the technical inefficiency error, assuming ξ i = g (z i ) + e i , where z i contains firm specific inefficiency explanatory variables, and e i is an i.i.d.error.Initiated from Hicks (1935), the quiet life hypothesis implies that firms with market power prefer to operate inefficiently rather than reap all potential rents, as stated in Koetter et al. (2012).This motivates us to set the Herfindahl-Hirschman Index (HHI) as z i . 4As all the firms in our dataset are relatively large with an annual turnover exceeding 5 million yuan (about US $700,000) as explained in Baltagi et al. (2016), it is reasonable to believe that those firms can enjoy local market power to some degree.We therefore construct a variable HHI i as follows: where d ij is the haversine distance between firm i and firm j; I d ij ≤ d * = 1 if d ij ≤ d * holds true and zero otherwise; for a firm j that is located within d * -km from firm i, we calculate the firm j's local market share as , where d * is a constant threshold.We then take HHI i as a proxy for firm i's local market power.2020) assume that the coefficients in model ( 6.2) vary with respect to the firm's aggregate of unskilled and skilled labor.As a firm with significant market power can enjoy bargaining in capital loan and material cost, as well as attract highly skilled labor, we propose the following semiparametric SARSF with an environment variable being HHI i where z i = HHI i and u i = i − e i .Since that the varying intercept coefficient curve, α 0 (•), is not separately identified from the technical inefficiency curve, g (•), we actually estimate the following model where we denote Additionally, being calculated from the firms' sales data in year 2006, HHI i can be endogenous.We therefore assume that where Lhhi i is firm i's HHI observed in year 2005, and X i = x i , Lhhi i .Model (6.3) extends model (6.2) by allowing heterogeneous coefficients and endogeneity of the environmental variable, z i .We use the pre-determined Lhhi i variable as an instrumental variable of HHI i for two reasons: In our data, HHI i and Lhhi i display a very high correlation, with the sample correlation slightly exceeding 0.98; (ii) it is reasonable to assume that v i is uncorrelated with firm i's past local market power given the high persistency of HHI.
For the spatial weight matrix, we define the raw spatial weight by wn,ij = I d ij ≤ d * . 5We then apply the row-normalization method to obtain the final spatial weight matrix denoted by W n .We set d * = 300. 6Our base model is the spatial autoregressive stochastic translog production model and as well as two dummy variables ex i and in i , where ex i = 1 indicates firm i is an exporter, and in i = 1 indicates firm i intensively uses its intangible asset.In conventional stochastic frontier literature, all input materials are typically considered exogenous.However, this exogeneity assumption may be challenged if there is potential feedback from either the statistical noise, i , or the technical inefficiency error, ξ i (see, e.g., Amsler et al., 2016).Since we are using data from Baltagi et al. (2016), we have opted to follow their approach and treat x i as exogenous.Note that our model allows for the possibility of correlation between the input variables (x i ) and the technical inefficiency error (ξ i ), but not with v i after controlling for Lhhi i .7We consider three parametric SARSF models: (i) model (6.2); (ii) model (6.5), and model (6.6) both nest to model (6.3), using firms' specific character, HHI, to explain the technical inefficiency error.Specifically, we define models (6.5) and (6.6) by w n,ij y n,j + u i , (6.5) (6.6) variables in x i to obtain an error term μ i , which is assumed to be correlated with the error term (u i ) in model (6.3); (ii) define the control function as By following these steps, our theory remains valid with the proper adjustment of assumptions.However, due to data limitations, further exploration of the possibility that some of the x i variables are endogenous in model ( 6.3) falls outside the scope of this article.We intend to investigate this aspect in our future research.respectively, where a linear instrument equation is used to obtain model (6.6), HHI i = θ 0 + θ 1 Lhhi i + v i .Essentially, model (6.5) takes HHI i as an exogenous variable, while model (6.6) treats HHI i as an endogenous variable and uses Lhhi i as its instrumental variable.
Our analysis focuses on firms with zero foreign ownership only.In addition, we split the data into two subsamples: data for state-owned firms (SOFs) and data for private-owned firms (POFs), as the state-owned firms and the private-owned firms are expected to be managed differently.After sorting Lhhi i , we notice that there are 307 and 1,452 different values taken by the variable for the SOFs and POFs data, respectively.For each subsample, we remove the firms with Lhhi i falling between the lower and upper 0.5% of its empirical distribution out of the concern of outliers.We then end up with 555 and 9,117 observations for the SOFs and POFs data, respectively.Table 2 reports the summary statistics.On average, the SOFs sold and exported slightly more than the POFs, and the SOFs have higher capital investment, number of employees, share of skilled labors, and intensive use of intangible assets, and stronger local market power than the POFs.
We report the model estimation results for the SOFs and POFs subsamples in Tables 3 and 4, respectively.In both tables, we provide the generalized 2SLS estimates for models (6.2), (6.5), and (6.6) in columns (1), (2), and (3), respectively.The remaining columns present estimation results from the semiparametric model (6.3).Specifically, we report the 10th, 25th, 50th, 75th, and 90th percentiles of each estimated coefficient curve, denoted as 10, 25, 50, 75, and 90, respectively. 8The corresponding standard errors are provided in parentheses.9Additionally, the following first-stage IV regression estimation results indicate that Lhhi enjoys high explanatory power of HHI i : POFs Data: HHI i = 0.0289 + 1.0067Lhhi(0.0067)(0.0023) R 2 = 0.9560 and n = 9, 117 SOFs Data: HHI i = 0.0840 + 1.0527Lhhi(0.0165)(0.0090) R 2 = 0.9611 and n = 555 First, we observe that most of the coefficient estimates are very close among the three parametric models, which is particularly true for the POFs data.The estimates for the spatial lag parameter, ρ, equal 0.063, 0.056, and 0.054 for the SOFs data and 0.029, 0.033, and 0.040 for the POFs data for models (6.2), (6.5), and (6.6), respectively.At the 10% level, ρ is significant for all the models for the SOFs data, but the POFs data give a significant spatial lag parameter for model (6.6) only.Numerically, the stateowned firms exhibit stronger positive sales spillover effects than the private-owned firms.In addition, the coefficients in front of HHI is not significant in either dataset, although the POFs data support a weak endogeneity of HHI with a significant β 3 at the 10% level.
Second, the estimates of ρ (•) in model ( 6.3) run from −0.118 to 0.22 for the POFs data and from 0.073 to 0.109 for the SOFs data.So, 90% of the estimates of the spatial lag curve are larger than the parametric estimate for the SOFs data.For the POFs data, we find that 60% of the estimates are negative.These results indicate that the state-owned firms behave differently from the private-owned firms, where the former show more positive sales spillover effects, while some portion of the private-owned firms exhibit vicious competition.Our results indicate that the semiparametric model is better in capturing coefficient heterogeneity than the parametric model.
Furthermore, we plot the histogram of HHI and the kernel estimates of the density functions of the estimates of ρ (•), β 0 (•), and the control function ω (•) for the POFs and SOFs data in Figures 1 and 2, respectively.All the density functions are estimated by the kernel method.with bandwidth selected by the cross-validation method.Comparing the two figures, we observe an interesting finding: the kernel density functions are all bimodal for the POFs data, which implies that the private-owned firms can be categorized into two latent groups.Splitting the POFs data by the dummy variable, ex i , we re-estimate model (6.3) for the resulted two sub-POFs data and observe that the bimodal feature does not go away.And the bimodal feature exists when we split the data by the dummy variable, in i .Evidently, some economic theory is behind the documented bimodal distribution feature.

Conclusion
This article considers a spatial autoregressive functional coefficient regression model where the environment variable is endogenous.Applying our proposed estimation method to investigate firms' production function, we observe that private-owned firms behave differently from state-owned firms.In particular, the cross-firm sales dependence has a bimodal distribution for the private-owned firms and a unimodal distribution for the state-owned firms.It suggests that the prediction of a firm's production behavior is biased if the prediction is obtained from a model pooling together the private-and state-owned firms data.Additionally, we leave it to our future research to consider the case that both X 1i and z i are endogenous in model (2.1).

Table 1 .
The sample average of RMSEs.

Table 2 .
Data summary statistics.

Table 3 .
Estimation results for the POFs data.

Table 4 .
Estimation results for the SOFs data.