GMM inference in spatial autoregressive models

ABSTRACT In this study, we investigate the finite sample properties of the optimal generalized method of moments estimator (OGMME) for a spatial econometric model with a first-order spatial autoregressive process in the dependent variable and the disturbance term (for short SARAR(1, 1)). We show that the estimated asymptotic standard errors for spatial autoregressive parameters can be substantially smaller than their empirical counterparts. Hence, we extend the finite sample variance correction methodology of Windmeijer (2005) to the OGMME for the SARAR(1, 1) model. Results from simulation studies indicate that the correction method improves the variance estimates in small samples and leads to more accurate inference for the spatial autoregressive parameters. For the same model, we compare the finite sample properties of various test statistics for linear restrictions on autoregressive parameters. These tests include the standard asymptotic Wald test based on various GMMEs, a bootstrapped version of the Wald test, two versions of the C(α) test, the standard Lagrange multiplier (LM) test, the minimum chi-square test (MC), and two versions of the generalized method of moments (GMM) criterion test. Finally, we study the finite sample properties of effects estimators that show how changes in explanatory variables impact the dependent variable.


Introduction
The nite sample properties of generalized method of moments estimators (GMMEs) have been extensively studied in the literature on nonspatial econometric models (in the 1996 special issue of Journal of Business & Economic Statistics). For example, Hansen et al. (1996) show that the estimated asymptotic variance of their continuous updating GMME for the asset pricing models are downward biased in nite samples. For linear and nonlinear models of covariance structures, Altonji and Segal (1996) and Clark (1996) show that the correlation between the optimal weight matrix and sampling errors introduces nite sample bias and that the amount of bias is relatively greater when data have long-tailed and skewed distributions. Furthermore, the optimal GMMEs (OGMMEs) of models of covariance structures have smaller asymptotic standard errors, making inference and speci cation testing problematic. Hall and Horowitz (1996); Brown and Newey (2002) con rm that the rst-order asymptotic theory o en provides poor approximations to the distributions of test statistics obtained from GMM estimators. Similarly, with regard to dynamic panel data models and panel count data models, asymptotic standard errors for the OGMME may also be severely downward biased (more than 20%) relative to the empirical standard deviations (Arellano and Bond, 1991;Blundell and Bond, 1998;Windmeijer, 2005;Bond and Windmeijer, 2005;Windmeijer, 2008).
GMM estimators for the spatial autoregressive models may su er from similar de ciencies because they are essentially multistep estimators. For instance, Prucha (1998, 2010) propose a multistep estimation method that involves a combination of IV and GMM estimation for the spatial model with a spatial autoregressive process in the dependent variable and the disturbance term, commonly referred to as SARAR(1, 1). In terms of estimation, the Kelejian and Prucha approach is straightforward as computations do not require special techniques even for extremely large samples. In principle, the GMM/IV estimator is ine cient relative to the maximum likelihood estimator (MLE) (Prucha, 2014), but in nite samples the di erence can be negligible (Das et al., 2003).
To increase the e ciency of GMMEs, Lee (2007a,b), Lin and Lee (2010), Liu et al. (2010), and Lee and Liu (2010) suggest a set of linear and quadratic moment functions. The reduced forms of spatial models specify expressions for spatial lags of the dependent variable (i.e., endogenous variables) in terms of exogenous variables and the disturbance term. Then, linear moment functions may be constructed on the basis of the deterministic part of the spatial lag terms, and quadratic moment functions may be formulated from the stochastic part of the spatial lag variables. Both linear and quadratic quadratic moment functions are chosen in such a way that the resulting GMME is asymptotically equivalent to the MLE when disturbances are independent and identically distributed (i.i.d.) with a normal density. When disturbances are simply i.i.d., Liu et al. (2010) and Lee and Liu (2010) show that the generalized method of moments (GMM) estimator can be more e cient than the quasi MLE, respectively, for the case of SARAR(1, 1) and SARAR(p, q).
The GMM estimation approach suggested in Lee (2007a,b), Lin and Lee (2010), and Liu et al. (2010) requires initial consistent parameter estimates to obtain not only the optimal weight matrix but also the linear and quadratic moment functions. The asymptotic theory in Lee (2007a,b), Lin and Lee (2010), and Liu et al. (2010) assumes that these estimated matrices are in nitely precise and therefore does not account for the variation stemming from the initial estimator used to construct these matrices. The correction method suggested in Windmeijer (2005) speci cally accounts for the presence of these initial consistent parameter estimates in the optimal weight matrix. Therefore, it can provide improvements for statistical inference. The aforementioned studies on the GMM estimation of spatial models indicate that the GMMEs have desirable nite sample properties in terms of bias and root mean square error. However, their performance in terms of estimated asymptotic standard errors and related test statistics for statistical inference has not been explored. Hence, further research on the performance of the estimated asymptotic standard errors of the GMME in spatial autoregressive models is warranted.
In this study, we evaluate the performance of the asymptotic approximations suggested for the GMM estimators and MLE for the SARAR(1, 1) model. We extend the nite sample correction method of Windmeijer (2005) to our model. The correction is based on a Taylor approximation of the rst-order conditions of the GMM objective function to account for the additional variation in the optimal weight matrix. Therefore, the correction method accounts for the variation stemming from the initial GMM estimation to obtain an estimate of the optimal weight matrix. In a Monte Carlo study, we compare the estimated asymptotic standard errors and the corrected ones with the empirical standard deviations under various scenarios. Our simulation results indicate that the estimates of the asymptotic standard errors based on the OGMME are downward biased for the spatial autoregressive parameters. The amount of bias is generally larger than 5%. For the correction method, the amount of bias is less than 5% in most cases suggesting that the correction method can be useful for statistical inference.
For the SARAR(1, 1) speci cation, we consider various test statistics for testing linear restrictions on spatial autoregressive parameters based on the GMM and ML methods. These tests include standard asymptotic Wald tests based on the MLE and the GMME, a bootstrapped version of the Wald test, two versions of the C(α) test, the standard LM test, the minimum chi-square test (MC), and two versions of the GMM criterion test. We compare the performance of these tests through a Monte Carlo simulation for the two null hypotheses that there is no spatial dependence (i) in the dependent variable or (ii) in the disturbance term. Our results show that the standard asymptotic Wald test based on the OGMME tends to over-reject, whereas the Wald test based on the corrected standard errors has proper size properties. The GMM criterion tests and the C(α) test can be useful for testing the null hypothesis that there is no spatial dependence in the disturbance terms. Our results also show that the C(α) test can be useful to detect the presence of spatial dependence in the dependent variable.
For the same speci cation, we study the nite sample properties of scalar measures suggested by LeSage and Pace (2009) for the partial derivatives that show how changes in explanatory variables impact the dependent variable. We consider (i) the simulation method of LeSage and Pace (2009) and (ii) the Delta method to study the dispersions of the direct, indirect and total e ects. Our results indicate that both methods perform similarly and the dispersions calculated with the corrected standard errors for the e ects estimators can be useful as they have relatively smaller bias.
This article is organized in the following way. Section 2 presents the spatial autoregressive model under consideration and discusses its assumptions. Section 3 reviews the GMM estimation approach and lays out the details of the nite sample correction method for the SARAR(1, 1) speci cation. Section 4 reviews scalar measures suggested for the interpretation of parameters, and shows how their dispersions can be calculated. Section 5 reviews the details of test statistics under consideration in the GMM framework. Section 6 lays out the details of the Monte Carlo design and presents the results. Section 7 closes with concluding remarks. Some of the technical derivations and Monte Carlo results are relegated to a web appendix.

The model speci cation and assumptions
The SARAR(1, 1) speci cation is given by where Y n is the n × 1 vector of a dependent variable, X n is the n × k matrix of non-stochastic exogenous variables with a conformable parameter vector β 0 , W n and M n are the n × n spatial weight matrices of known constants with zero diagonal elements, and ε n is the n×1 vector of disturbances (or innovations). W n Y n and M n u n are known as, respectively, the spatial lag of the dependent variable and the spatial lag of the disturbance term. λ 0 and ρ 0 are the spatial autoregressive parameters. Because spatial data is characterized with triangular arrays, the variables in (2.1) have the subscript n (Kelejian and Prucha, 2010). Let be the parameter space of the model. In order to distinguish the true parameter vector from other possible values in , we state the model with the true parameter vector θ 0 = (ρ 0 , λ 0 , β ′ 0 ) ′ . Furthermore, for notational simplicity we let S n (λ) = (I n − λW n ), R n (ρ) = (I n − ρM n ), G n (λ) = W n S −1 n (λ), H n (ρ) = M n R −1 n (ρ), and G n (ρ, λ) = R n (ρ)G n (λ)R −1 n (ρ). Also, at the true parameter values (ρ 0 , λ 0 ), we denote S n (λ 0 ) = S n , R n (ρ 0 ) = R n , G n (λ 0 ) = G n , H n (ρ 0 ) = H n , and G n = R n G n R −1 n . The consistency and asymptotic normality of the GMME for (2.1) are established under the following assumptions. 1 Assumption 1. The elements ε in of the disturbance term ε n are distributed independently and identically with mean zero and variance σ 2 0 , and E |ε in | 4+ν < ∞ for some ν > 0 for all n and i.
Assumption 2. The spatial weight matrices M n and W n are uniformly bounded in row and column sums in absolute value. Moreover, S −1 n , S −1 n (λ), R −1 n , and R −1 n (ρ) exist and are uniformly bounded in row and column sums in absolute value for all values of ρ and λ in a compact parameter space.
The regularity conditions in Assumptions 1 and 2 are motivated to restrict the spatial autocorrelation in the model at a tractable level (Kelejian and Prucha, 1998). By this assumption, the third and fourth moments of ε in , denoted respectively by µ 3 and µ 4 , exist for all i and n. Assumption 2 also implies that the model in (2.1) represents an equilibrium relation for the dependent variable. By this assumption, the reduced form of the model can be written as Y n = S −1 n X n β 0 + S −1 n R −1 n ε n . 2 In the literature, weight matrices are usually treated as exogenous and xed. Lee (2004Lee ( , 2007b formulates the weight matrix as 1 For interpretations and implications of these assumptions, see Lee (2007a) and Kelejian and Prucha (2010). 2 The uniform boundedness of S −1 n (λ) and R −1 n (ρ) is required for the ML estimator. In the case of GMM estimators, only the uniform boundedness of S −1 n and R −1 n is required .
a function of the sample size, such that the sequence of weight matrices {W n } is uniformly bounded in both row and column sums and its elements w n,ij are O( 1 h n ). The sequence {h n } can be bounded or divergent with the property that lim n→0 h n n = 0, which implies that h n is allowed to diverge only at a rate slower than that of n. For large group interactions for which lim n→∞ h n n = 0, the estimators might not be consistent. In this study, we assume that h n is bounded.
Throughout this study, the vector of moment functions we consider for the GMM estimator takes the form of g(θ 0 ) = ε ′ n P 1n ε n , . . . , ε ′ n P mn ε n , ε ′ n Q n ′ . Moment functions involving the n × n constant matrices P jn for j = 1, . . . , m are known as quadratic moment functions. The moment function Q n ′ ε n is a linear moment function where Q n is an n×r instrument matrix with r ≥ k+1 and has full column rank. The matrices P jn and Q n are chosen in such way that orthogonality conditions of population moment functions are not violated. Let P n be the class of n × n constant matrices with zero trace. The quadratic moment functions involving matrices from P n satisfy the orthogonality conditions when disturbance terms are i.i.d. Assumption 4 states regularity conditions for these matrices. Assumption 5 characterizes the parameter space. 3 Assumption 3. The regressors matrix X n is an n × k matrix consisting of uniformly bounded constant elements. It has full column rank. Moreover, lim n→∞ 1 n X ′ n X n exists and is nonsingular.
Assumption 4. Elements of the IV matrix Q n are uniformly bounded. Matrices P jn for j = 1, . . . , m are uniformly bounded in row and column sums in absolute value.
Assumption 5. The parameter space is a compact subset of R k+2 , and θ 0 ∈ Int( ).

The moment conditions
Equation (2.1) implies Y n = S −1 n (X n β 0 + R −1 n ε n ). Thus, the endogenous variable W n Y n on the righthand side of Eq. (2.1) may be written as W n Y n = W n S −1 n (X n β 0 + R −1 n ε n ) = G n X n β 0 + G n R −1 n ε n . Equation (2.1) may also be written as R n Y n = λ 0 R n W n Y n + R n X n β 0 + ε n . This expression is the basis for the choice of instruments (Lee, 2007a;Liu et al., 2010;Lee and Liu, 2010;Lin and Lee, 2010;Kelejian and Prucha, 2010): the linear moment matrix Q n is constructed from the expectation of Z n = (R n W n Y n , R n X n ), which implies that the orthogonality condition for the population moment function holds: E Q ′ n ε n = Q ′ n E ε n = 0 r×1 . Given consistent initial estimates of λ 0 , ρ 0 , and β 0 , the IV matrix Q n becomes available. In particular, Lee (2003) shows that the 2SLS estimator based on Q n is the best estimator in the sense that its asymptotic covariance matrix is the smallest among the class of 2SLS estimators based on linear moment conditions.
Instruments that lead to quadratic moment functions follow from Assumption 1, in particular that ε in is i.i.d. with variance σ 2 0 . For any P jn ∈ P n , P jn ε n is an instrument that satis es the orthogonality condition for a population moment function: E ε ′ n P ′ jn ε n = E tr(ε ′ n P ′ jn ε n ) = tr P ′ jn E(ε n ε ′ n ) = σ 2 0 tr P jn = 0 for j = 1, . . . , m, where tr(·) returns the trace of an input matrix. Moreover, when P jn is suitably chosen, we have E (P jn ε n ) ′ R n W n Y n = σ 2 0 tr P ′ jn R n G n R −1 n = 0: P jn ε n also instruments R n W n Y n .
The consistency of the GMM estimator does not depend on a particular P jn , but the asymptotic variance-covariance matrix is a function of utilized P jn matrices. Therefore, the asymptotic e ciency of the estimator is used as a criterion for the selection of P jn matrices from P n . Liu et al. (2010) suggest a set of quadratic and linear moment matrices such that the resulting GMME is theoretically more e cient than the quasi MLE. In this approach, the diagonal elements of the best quadratic and linear moment matrices are also used to formulate additional moment functions to increase e ciency.
To describe the best choice of matrices P jn , we de ne the following notation. For the regressors matrix, let X n = R n X n = (R n l n , R n X ⋆ n ), where l n denotes n × 1 vector of ones and X ⋆ n is the submatrix of X n with the intercept column deleted. The submatrix R n X ⋆ n = X ⋆ n is assumed to be n × k ⋆ matrix with column vectors denoted by X ⋆ nl for l = 1, . . . k ⋆ . 4 Let D(·) be the operator which either creates a matrix from the diagonal elements of an input matrix or returns a diagonal matrix if the input is a vector. Furthermore, let vec(·) be the operator that creates a column vector from the elements of an input matrix, and let vec D (·) be the operator that creates a column vector from the diagonal elements of an input matrix. Finally, for any n × n matrix A n , let A (t) n = A n − 1 n tr(A n )I n , and let A (s) n = A n + A ′ n . De ne the following quadratic moment matrices: n ), and P ⋆ l+5,n = D( X ⋆ nl ) (t) for l = 1, . . . , k ⋆ . Note that P ⋆ jn ∈ P n for j = 1, . . . , k ⋆ + 5. The linear moment matrices are , which, one might note, expands on the matrix Q n given above. Then, as shown by Liu et al. (2010), the best set of moment functions is given by where ε n (θ ) = R n (ρ)S n (λ)Y n − R n (ρ)X n β. This set of moment functions is best in the sense that any other moment function that can be added to this set does not increase the asymptotic e ciency of the GMME.
The negative expectation of the gradient matrix is given by The results in (3.1) and (3.2) indicate that consistent estimates of ⋆ n and ⋆ n require consistent estimates of θ 0 , σ 2 0 , µ 3 , and µ 4 . In practice, an initial estimator of θ 0 can be used to recover consistent estimates of these parameters. 6 Letθ 1 be an initial GMME (IGMME). Let n (θ 1 ) denote the estimate of ⋆ n recovered fromθ 1 . Then, the OGMME is de ned bŷ Under Assumptions 1-5, Liu et al. (2010) show that the GMM estimator de ned in (3.3) is consistent and asymptotically normally distributed, namely 7 In the case where X n has no column of ones, R n X n = R n X ⋆ n = X ⋆ n and k ⋆ = k. 5 Lemma 1 in Web Appendix A can be used to derive ⋆ n matrices in this section. 6 An initial estimator of θ 0 can be the one suggested in Prucha (2014). Another candidate is the initial GMM estimator (IGMME) suggested at the end of this section. In addition, the moment matrices also are functions of unknown parameters. With initial estimates, these matrices also became available. 7 The identi cation of parameters in the GMM framework requires lim n→∞ 1 n E(g n (θ 0 )) = 0 (Newey and McFadden, 1994). Lee (2007a) investigates this condition and state the identi cation conditions for parameters. Here, we simply assume that the parameters are identi ed.
An estimate for the covariance of √ n θ ⋆ n − θ 0 is needed for statistical inference. The result in (3.4) suggests that an estimate may be obtained from 1 is an estimate of ⋆ n recovered fromθ ⋆ n . Alternatively, one might form an estimate from 1 where n (θ ⋆ n ) is an estimate of ⋆ n recovered fromθ ⋆ n . Although these two procedures are asymptotically equivalent, little seems to be known about the relative merits of these two procedures in small samples (Newey and McFadden, 1994).
The estimator de ned in (3.3) is the best GMM estimator in the class of OGMME formulated from the set of linear and quadratic moment functions. When disturbances are i.i.d. normal, the best set of moment functions is given by n , and Q n = G n X n β 0 , X n . Liu et al. (2010) show that any other moment function that can be added to this set is redundant. Furthermore, the MLE is also characterized by this set of moment functions, which implies that the GMME based on this set is asymptotically equivalent to the MLE. When disturbances are simply i.i.d., the additional moment functions formulated from D G n , and D X ⋆ nl (t) for l = 1, . . . , k ⋆ increase asymptotic e ciency. The resulting GMME based on the bigger set g ⋆ n (θ ) is asymptotically more e cient than the quasi MLE . Note that, the e ciency gain from these additional moment functions depends on the speci cation of weight matrices and regressors.
For the case of SARAR(1, 0), the best set of moment functions follows from the more general case above. Let δ 0 = λ 0 , β ′ 0 ′ be the true parameter vector. Then, the best set of moments g ⋆ n (δ) uses Similarly, the best set of the moment functions for the case of SARAR(0, 1) may be obtained. Let η 0 = (ρ 0 , β ′ 0 ) ′ be the true parameter vector. Then, the best set of moments g Note that P ⋆ jn and Q ⋆ n also involve unknown parameters. In practice, an initial consistent estimator is used to construct these matrices. For example, an IGMME of θ 0 based on quadratic moment matrices and the linear moment matrix Q n = W n M n X n , W n X n , M n X n , X n may be used to construct these matrices when disturbance terms satisfy Assumption 1. Hence, the optimal GMM estimator de ned in (3.4) becomes feasible when there is an initial consistent estimator of θ 0 . The optimal weight, the quadratic, and linear moment matrices are constructed from the initial estimator. The replacement of P ⋆ jn , Q ⋆ n , and ⋆ n with their consistent estimates does not a ect the asymptotic properties of the optimal GMM estimators. 8 However, it may a ect the small-sample behavior of the estimators.

Finite sample variance correction method
In this section, the nite variance correction methodology of Windmeijer (2005) is extended for our spatial autoregressive model. We provide a general argument by considering the general set of population moment functions g n (θ 0 ) = ε ′ n P 1n ε n , . . . , ε ′ n P mn ε n , ε ′ n Q n ′ , where Q n is an n × r matrix of linear instruments, and P jn ∈ P n for j = 1, . . . , m.
The extension of the nite sample variance correction method to our case is not straightforward for mainly two reasons. First, our generic set of moment functions cannot be written at the observational level in such a way that the resulting moment functions are independent across observations. This arises simply because of the spatial dependence. For the framework considered in Windmeijer (2005), the set of moment functions is de ned at the observational level such that g in (θ ) is the sample moment function for the ith observation. Within this framework, a good candidate for an estimate of n is given by 1 n n i=1 g in (θ n )g ′ in (θ n ), whereθ n is a consistent initial estimator McFadden, 1994, p. 2155). This candidate is no longer valid for our case because of the spatial dependence implied by the model speci cation. The second reason is a byproduct of the rst and is related to the feasibility of the variance correction formula for our spatial model. In our case, an estimate of n is recovered from E g n (θ 0 )g ′ n (θ 0 ) . The correction method is based on the idea of taking a rst-order Taylor approximation of n (θ 1 ) around the true parameter vector. As we will show below, this step is not as straightforward as the one stated in Windmeijer (2005). In what follows, we present the details on how the correction method is extended to our model. Let −1 n be an arbitrary non-stochastic weighting matrix for the GMM objective function. The weighting matrix plays the role of a metric by which the sample moment functions are made as close as possible to zero. Furthermore, assume that n converges to a constant positive de nite matrix . The IGMM estimator is obtained by minimizing the objective function with respect to θ . Let Q n (θ ) = g ′ n (θ ) −1 n g n (θ ) be the objective function when the weight matrix is n . Then, the initial estimator is de ned asθ 1 = argmin θ Q n (θ ). 9 Under our stated assumptions, we have (3.5) Once we have a consistent estimator, we can construct an estimate of n for the optimal GMM estimation. Let n (θ 1 ) be the estimate of n constructed fromθ 1 . Similarly, we will denote the unknown n with n (θ 0 ) in the rest of this section. Then the OGMME is de ned asθ 2 = argmin θ Q n (θ 1 ) . Let C n (θ ) = ∂g n (θ) ∂θ ′ be the matrix of rst derivatives and F n (θ ) = ∂C n (θ) ∂θ be the matrix of second derivatives. 10 The rst and second derivatives of the GMM objective function are given by where B n (θ 0 ) is a (k + 2) × 1 vector and A n (θ 0 ) is a (k + 2) × (k + 2) matrix. When the weight matrix is n in the objective function, these matrices are denoted by B n (θ 0 ) and A n (θ 0 ), respectively. From the de nition of the GMM objective function, we have B n (θ 1 ) (θ 2 ) = 0 (k+2)×1 and B n (θ 1 ) = 0 (k+2)×1 . Then, ignoring the remainder terms, a rst-order Taylor series approximation of (3.6) around θ 0 for the OGMME and IGMME yields, respectively, Note that the expansion in (3.8) is also a function ofθ 1 . Therefore, a further expansion ofθ 1 around θ 0 yields is given by (see Windmeijer, 2005, p. 28) is the bias of an infeasible GMM estimator based on the unknown optimal weight matrix n (θ 0 ) = E g n (θ 0 )g ′ n (θ 0 ) . Hence, the rst term is a function of this bias, which tends to be small and generally remains constant as the number of moment functions increases. The remaining term in T n (θ 0 ) (θ 0 )[•, j] is the dominant term, and it becomes larger as the number of moment functions increases (Windmeijer, 2005, p. 28).
The result in (3.10) may be used to formulate a variance formula forθ 2 . The term T n (θ 0 ) (θ 0 ) θ 1 −θ 0 in (3.10) has an order of O p ( 1 n ) and vanishes as n → ∞, but it does not equal zero in nite samples. Therefore, accounting for this term in a variance formula forθ 2 may improve inference in nite samples. Equations (3.9) and (3.10) result in the following nite sample varianceθ 2 : The variance formula in (3.12) is infeasible as it involves the unknown θ 0 . Windmeijer (2005) circumvents this problem by simply replacing the unknown parameters with their consistent estimates. The same approach cannot be directly adopted for our case as the functional form for the estimate of n is now di erent. An estimate for the unknown term T n (θ 0 ) (θ 0 )[•, j] is recovered from the estimates ofθ 1 andθ 2 . The rst term in T n (θ 0 ) (θ 0 )[•, j] vanishes as B n (θ 0 ) (θ n ) = 0 (k+2)×1 for any consistent estimator θ n . Hence, the rst term of (3.11) does not play any role in the calculation of the variance ofθ 2 . Then, a consistent estimate of the jth column of T n (θ 0 ) (θ 0 ) for j = 1, . . . , k + 2 is given by More explicitly, the terms in (3.13) are given by where C n (θ 2 ) = ∂g n (θ) ∂θ ′ θ 2 and F n (θ 2 ) = ∂C n (θ) ∂θ θ 2 . Note that the estimates C n (θ 2 ), F n (θ 2 ), and g n (θ 2 ) are simply obtained by evaluating C n (θ ), F n (θ ), and g n (θ ) at the consistent estimateθ 2 . On the other hand, the consistent estimates n (θ 1 ) and ∂ n (θ) ∂θ j θ 2 are recovered from the corresponding results derived under the expectation operator.
Next, we provide explicit expressions for E ∂g n (θ) ∂θ j g ′ n (θ ) θ 0 + g n (θ ) ∂g ′ n (θ) ∂θ j θ 0 for j = 1, . . . , k + 2. Let bk(·) be the matrix operator that creates a generalized block diagonal matrix from a given list of matrices. For example, if c = A r×k , B r×l , then bk c = A r×k 0 r×l 0 r×k B r×l . With this new notation, we have the following result with respect to λ: mn , I m is the m × m identity matrix, l m is the m×1 vector of ones, and ⊗ denotes the kronecker product. In a similar fashion, with respect to ρ we obtain . Finally, with respect to β j for j = 1, . . . , k, we obtain where X nj is the jth column of X n . Note that under our stated assumptions, estimates of (3.16), (3.17), and (3.18) are recovered by replacing (θ ′ 0 , σ 2 0 , µ 3 , µ 4 ) ′ with their consistent estimates (θ ′ n ,σ 2 n ,μ 3 ,μ 4 ) ′ . 11 Whenθ 2 is used to estimate (3.16), (3.17), and (3.18), the resulting estimate of E ∂g n (θ) . Then, inserting these results into (3.12) yields an estimate of the variance ofθ 2 that incorporates the variation in the IGMMEθ 1 : Using (3.9), the estimate of variance ofθ 1 is obtained as (3.20) The estimate of the variance ofθ 2 in (3.19) accounts for the variation that stems from the initial estimatorθ 1 . Windmeijer (2005Windmeijer ( , 2008 shows that this nite sample correction provides a better approximation to the empirical standard deviations of the estimator when the estimator itself is not substantially biased. The Monte Carlo studies in Lee (2007a), Lin and Lee (2010), and Liu et al. (2010) indicate that the bias of the best GMME of the spatial autoregressive model is negligible. Hence, the correction formula provided in (3.19) can be useful for statistical inference.

Direct, indirect and total e ects
In this section, we describe the marginal e ects (impact measures) accounting within the context of our spatial model. The partial derivative of the dependent variable with respect to kth explanatory variable yields the following matrix of marginal e ects: where β k0 is the kth component of β 0 . The result in (4.1) indicates that the marginal e ects of a change in X nk is represented by an n × n matrix. The diagonal elements of this matrix ∂Y ni /∂X n,ki contain the own-partial derivatives, while the o -diagonal elements represent the cross-partial derivatives ∂Y nj /∂X n,ki . LeSage and Pace (2009) de ne the average of the main diagonal elements of this matrix as a scalar summary measure of direct e ects, and the average of o -diagonal elements as a scalar summary measure of indirect e ects. The sum of direct and indirect e ects is labeled as the total e ects. We consider two methods for the calculation of dispersions of these impact measures: (i) a simulation approach suggested by LeSage and Pace (2009), and (ii) the Delta method (Debarsy et al., 2015). The simulation approach utilizes the parameter estimates and the estimated covariance matrix of a consistent estimator. Let L n be a lower-triangular matrix recovered from the Cholesky decomposition of Var θ n , and ϑ n be a random vector that has a multivariate standard normal distribution. Then, random draws of the parameter vector are generated according to θ r = L n ϑ r n +θ n , for r = 1, . . . , R. (4.2) For each θ r , the direct, indirect, and total e ects can be calculated using (4.1). The mean and the standard deviation calculated from each sequence of impact measures can be used as the point estimate and the standard error of the corresponding impact measure.
The second approach is the conventional method based on the Delta method. The result in (4.1) indicates that the estimate of direct e ect is 1 n tr S −1 n (λ n )β nk . By the mean value theorem, where A 1n = 1 n tr S −1 n G n β k0 , 1 n tr S −1 n , and B n is the asymptotic covariance of √ n λ n − λ 0 ,β nk − β k0 ′ . The result in (4.3) indicates that the asymptotic variance of direct e ects can be estimated by 1 nÂ 1nBnÂ ′ 1n , whereÂ 1n = 1 n tr S −1 n (λ n )G n (λ n )β nk , 1 n tr S −1 n (λ n ) , andB n is the estimated asymptotic covariance of √ n λ n − λ 0 ,β nk − β k0 ′ .

Hypothesis testing with the GMME
In this section, we consider a Wald test procedure that is familiar within the general GMM framework for testing linear and nonlinear restrictions on the parameter vector θ 0 . We provide details about how to adjust the formulation of this test statistic for the spatial SARAR(1, 1) model. Let r : R p → R q be a twice continuously di erentiable function, and assume that its derivative matrix R = ∂r(θ )/∂θ ′ has rank q. 12 Letθ n be an unrestricted optimal GMME. Then, the Wald test statistic for q linear restrictions of the form r(θ 0 ) = 0 has a limiting chi-square distribution with q degrees of freedom, namely wherer n = r(θ n ) andR n = ∂r(θ) ∂θ ′ θ n . The Wald test statistic may be calculated with the variance formulas given (3.4), (3.5), and (3.19).
We also consider a bootstrapping method for the distribution of the Wald test statistic in our GMM framework for the null hypotheses of H 0 : λ 0 = 0 and H 0 : ρ 0 = 0. Hall and Horowitz (1996), Brown and Newey (2002), and Bond and Windmeijer (2005) outline bootstrap methods to determine nite sampling distribution of Wald statistics. In the spatial econometrics literature, examples of these methods may be found in Anselin (1988), Pinkse andSlade (1998), Fingleton (2009), Burridge and Fingleton (2010), and Jin and fei Lee (2013. According to the bootstrap approach, the empirical distribution of a test statistic is constructed by resampling data. For our model, sampling from Y n , W n Y n , and X n cannot be done without preserving the inherent spatial dependence structure of the model. Instead, Anselin (1988) considers bootstrapping of residuals when disturbances of the model are i.i.d. Given a consistent estimatorθ n , the vector of residuals isε n = R n (ρ n ) S n (λ n )Y n −X nβn . Under Assumption 1, we demean residuals to obtainε n = J nεn , where J n = I n − 1 n l n l ′ n . The bootstrap sample of residuals, denoted by ε * n , is generated with random draws from the empirical distribution of the demeaned residuals. Then, the vector of random draws ε * n is used to generate Y * n = S −1 n (λ n ) X nβn + R −1 (ρ n )ε * n . As a result, a bootstrap sample of (Y * in , X n,i• ′ ) : i = 1, . . . , n , where X n,i• ′ is the ith row of X n , is obtained. For our GMM set up, the bootstrapped Wald test of H 0 : λ 0 = 0 is implemented with the following procedure: (i) Compute the best GMMEθ n stated in (3.3), and the resultingε n = J n R n (ρ n ) S n (λ n )Y n − X nβn . Calculate t λ =λ n /Se(λ n ), where Se(λ n ) is the estimated asymptotic standard errors ofλ n . (ii) Draw ε ⋆ n fromε n through sampling with replacement. Generate Y * n = S −1 n (λ n ) X nβn + R −1 (ρ n )ε * n . (iii) Formulate an initial GMME with the bootstrap version of moment functions: , Q n = W n M n X n , W n X n , M n X n , X n , and ε * n (θ ) = R n (ρ)S n (λ)Y * n − R n (ρ)X n β. De neθ b1 = argmin θ g * ′ n (θ )g * n (θ ). (iv) Usingθ b1 and corresponding residuals, construct an estimate of ⋆ n given in (3.1) and denote it withˆ * n . Usingθ b1 , generate the best set of moment functions g * n (θ ). De ne the bootstrap optimal GMME byθ b2 = argmin θ g * ′ n (θ )ˆ * −1 n g * n (θ ). (v) Usingθ b2 and corresponding residuals, nd an estimate of ⋆ n and denote it withˆ * n . Obtain an estimate of the variance ofθ b2 from (ˆ * ′ nˆ * −1 nˆ * n ) −1 . Obtain the bootstrap standard error ofλ b2 and denote it with Se(λ b2 ). Then, calculate t b λ = λ 2b −λ n /Se(λ b2 ). (vi) Repeat steps (ii)-(v) B times and calculate the bootstrap p-value of p λ = 1 B B b=1 1 |t b λ | > |t λ | . Then, the test rejects the null at nominal size α if p λ < α. As for a test of H 0 : ρ 0 = 0, simply compute t ρ in step (i) and t b ρ in step (v), and evaluate these values similar to step (vi).
Given the result in (3.4), t λ has an asymptotic standard normal distribution, and hence the inference is based on this asymptotic distribution. On the other hand, the distribution function of t b λ obtained from the bootstrap method is an approximation to the exact nite sample distribution function of t λ and can di er from the asymptotic distribution. 13

Design
For the model speci ed in (2.1), we assume that there are are two regressors and no intercept term such that X n = X 1,n , X 2,n and β 0 = (β 10 , β 20 ) ′ = (1, 1) ′ , where X 1,n and X 2,n are n × 1 vectors of variables. The i-th observation of disturbance term ε in is i.i.d. Normal(0, 1). The spatial autoregressive parameters λ 0 and ρ 0 take values from the set K = {−0.6, −0.3, 0, 0.3, 0.6} in order to allow for both weak and strong spatial interactions.
As for X 1,n and X 2,n , we use the county-level data set of Pace and Barry (1997) on the 1980 presidential election. We set X 1,n equal to the standardized value of income per-capita and X 2,n equal to the 13 The Wald test is not the only test statistic that could be used for hypothesis testing. We refer the reader to Web Appendix C for the formulation of ve other test statistics (the LM test, the C(α) test, the minimum chi-square test (MC), and two criterion-based tests (D NW , and D RU )), and we report on their performance in Web Appendix E.
standardized value of the homeownership rate. The data set describes 3,107 U.S. counties, of which we use the rst n observations in the Monte Carlo study. The weight matrix W n is based on the interaction scenario described in Arraiz et al. (2010). The observations are distributed across four quadrants of a space in such a way that the number of observations in each quadrant can be arranged to allow for sparse or dense quadrants. The location of each unit across the space is determined by the xy-coordinates over a square grid. Let m andm be two integers. Then, the units in the northeast quadrant of the space have discrete coordinates satisfying m + 1 ≤ x ≤m and m + 1 ≤ y ≤m, with an increment value of 0.5. For the other quadrants, the location coordinates are integers satisfying 1 ≤ x ≤ m, 1 ≤ y ≤m, and 1 ≤ x ≤m, 1 ≤ y ≤ m. The distance d ij between any two units i and j, located respectively at (x 1 , y 1 ) and (x 2 , y 2 ), is measured by the Euclidean distance given by d ij = (x 1 − x 2 ) 2 + (y 1 − y 2 ) 2 1/2 . Then, the (i, j)th element of the W n , w n,ij , equals 1 if 0 ≤ d ij ≤ 1 and zero otherwise.
Varying For the spatial weight matrix M n , we consider a distance-based binary matrix. Let J i be the set of the nearest 10 observations to the ith observation based on d ij . Then, the (i, j)th element of M n , m n,ij , equals 1 if j ∈ J i and zero otherwise.
Both W n and M n are row normalized. For each speci cation, resampling is repeated 1,000 times. We examine the following estimators: (i) the IGMME, (ii) the OGMME, and (iii) the MLE. For each estimator, we provide bias, estimated asymptotic standard errors (OGMME-Asy.), and empirical standard deviations (Emp.). Besides these results, we also provide the corrected standard errors (OGMME-Crc.) in the case of OGMME. 14

Estimation results
The results are easily interpretable when presented in graphical form, and in doing so, we highlight the standard error estimators. To this end, we compute the percentage deviation of the estimated asymptotic and corrected standard errors from the corresponding empirical standard deviations; i.e., we compute 100 × (Asy. − Emp.)/Emp. and 100 × (Crc. − Emp.)/Emp. We then plot these percentage deviations against 25 combinations of (λ 0 , ρ 0 ) separately for each (n, d) value. Following Bond and Windmeijer (2005, p. 18), we use the term "bias" for these deviations. If the asymptotic distribution of an estimator performs well enough, the bias should be near zero. Conversely, a large bias suggests that, for the given estimator, the asymptotic distribution approximates the nite sample distribution poorly. Figure 1 presents the results forρ n . There is a downward bias in the case of OGMME-Asy., OGMME-Crc., and MLE. The direction of the bias for the IGMME changes from positive to negative as the northeast quadrant density changes from low to high. The estimated asymptotic standard errors of IGMME can exceed the corresponding empirical standard errors by more than 10%. In the case of the MLE, spikes for ρ 0 = −0.6 result in a downward bias of up to 50%. Except for these spikes, the estimated asymptotic standard errors of the MLE are relatively closer than those of OGMME to the corresponding empirical standard deviations. In the case of OGMME, the correction method provides standard errors that are closer to the empirical standard deviations. In Fig. 1(b), where the density of the northeast quadrant is high, all estimators have a downward bias. The size of the bias is the smallest in the case of the correction method for all combinations of (λ 0 , ρ 0 ). In Figs. 1(c) and 1(d), where the results for the large samples are presented, the same pattern is also observed. That is, the standard errors based on the correction method are relatively closer to the corresponding empirical standard deviations.
The results forλ n are presented in Fig. 2. In Figs. 2(a) and 2(b), the magnitude of the percentage deviations are similar for the OGMME-Asy. and the OGMME-Crc. In both gures, the bias is generally less than 5%, and it is relatively larger in Fig. 2(a). As can be seen from these gures, the correction method generally provides an improvement for the statistical inference as they are closer to the empirical standard deviations. Again for the combinations of autoregressive parameters where ρ 0 = −0.6, the MLE results display spikes, but this time with smaller amplitudes than forρ n . The asymptotic method based on the IGMME provides standard error estimates that are biased by 25% or more. As the northeast quadrant density increases, biases become smaller. Biases are generally less than 5% in the case of OGMME-Asy., OGMME-Crc., and MLE for all (λ 0 , ρ 0 ) combinations. We summarize the main ndings of our simulation experiment in the following list with a focus on results reported for the OGMME-Crc. 15 : (i) Across the various (λ 0 , ρ 0 ) combinations, the three estimators of θ 0 have a negligible bias.
(ii) The size of the northeast quadrant in our experiment a ects the precision of estimators. When d is larger, the empirical standard deviations of the IGMME forρ n andλ n are relatively larger. The MLE forλ n ,β 1n , andβ 2n has relatively smaller empirical standard deviations, and the standard errors of the OGMME-Crc. forρ n is relatively smaller. (iii) Forρ n , when n = 485 or n = 486, our results indicate that the downward bias in the corrected standard errors is less than 5%, while the downward bias in the estimated asymptotic standard errors is generally larger than 5%. When n = 945 or n = 974, the corrected standard errors still perform better than the estimated asymptotic standard errors and again they are less than 5% in almost all cases. In order to give an overall picture, we compute for each estimator the proportions of percentage deviations that fall within a (−5%, 5%) interval across all combinations of (λ 0 , ρ 0 ) and (n, d) values. These proportions are (i) 0.33 for the IGMME, (ii) 0.40 for the OGMME-Asy., (iii) 0.89 for the OGMME-Crc., and (iv) 0.67 for the MLE. (iv) The corrected standard errors forλ n has relatively less downward bias than the estimated asymptotic standard errors. If we consider the proportions of percentage deviations that fall within a (−5%, 5%) interval for each estimator, we have (i) 0.27 for the IGMME, (ii) 0.80 for the OGMME-Asy., (iii) 0.88 for the OGMME-Crc., and (iv) 0.84 for the MLE. (v) Forβ 1n , both the OGMME-Asy. and the OGMME-Crc. have a downward bias of more than 5% in some cases. The proportion of percentage deviations that fall within a (−5%, 5%) interval equals (i) 0.57 for the IGMME, (ii) 0.36 for the OGMME-Asy., (iii) 0.24 for the OGMME-Crc., and (iv) 0.93 for the MLE. It is obvious that the MLE is performing best in this case. (vi) Onβ 2n , both the OGMME-Asy. and OGMME-Crc. provide similar results with a bias amount less than 5% in many cases. The proportion of percentage deviations that fall in a (−5%, 5%) interval is (i) 0.48 for the IGMME, (ii) 0.83 for the OGMME-Asy., (iii) 0.95 for the OGMME-Crc., and (iv) 0.92 for the MLE.

Inference results
For each test, we consider the empirical rejection frequencies under the null hypothesis of the form H 0 : λ 0 = 0 or H 0 : ρ 0 = 0. We use p-value plots suggested by Davidson and MacKinnon (1998) to illustrate the actual size of these tests under the null hypothesis for various nominal size values. Figures 3 and 4 are the p-value plots for the null hypothesis of no spatial dependence, respectively, in the disturbance term and in the dependent variable. We present the results for the Wald tests and its bootstrap version for two representative cases and leave the results for other cases and other test statistics to Web Appendix E. In the gures, W 1 , W 2 , W 2c , W m , and W 2b denote respectively the Wald tests based on the IGMME, OGMME-Asy, OGMME-Crc, MLE, and the bootstrap version. Conventional signi cance levels in hypothesis tests are o en chosen to be 10% or smaller. Therefore, we use nominal size values that are less than or equal to 12% in these gures. If the asymptotic approximation used for the distributions of these test statistics perform well, we would expect that the line graphs in these gures to follow closely the 45 • line. Hence, a discrepancy from the 45 • line indicates that the sampling distribution of the test statistic di ers from the asymptotic distribution that is used to establish the critical values.
First, we evaluate properties of tests under the null of H 0 : ρ 0 = 0. For the null hypothesis under consideration, we only present the p-value plots for the case of λ 0 = 0.3 in Fig. 3. The p-value plots for all other cases are provided in the web appendix. The salient properties of these tests are summarized as follows: (i) The Wald test based on the IGMME (W 1 ) in Fig. 3 is not properly sized. It is undersized when the northeast density is low and is over-sized when the northeast density is high. The increase in the sample size does not improve the size property of W 1 but rather makes it worse for the low density case. (ii) The standard Wald test based on the OGMME-Asy (W 2 ) is over-sized in all cases which con rms previous ndings in the literature on non-spatial econometric models. Therefore, inference based on this test could be very misleading. For example, at the nominal size of 6%, this test falsely reject the null in around 10% of the replications. This test performs worse than the W m and W 2c tests in all cases. This result is principally due to the downward bias in the estimated asymptotic standard errors of the OGMME reported in Section 6.2. (iii) For large sample sizes of n = 945 and n = 974, W 2 seems to improve but is over-sized relative to W m and W 2c tests. The size of the northeast quadrant appears to a ect the performance of this test in small sample sizes. (iv) The corrected version based on the OGMME-Crc (W 2c ) outperforms W 2 in all cases, which con rms our results reported in Section 6.2. Moreover, it performs as well as the MLE-based test (W m ). Thus, the correction method improves the size properties of the Wald test, which is consistent with the simulation results presented in Section 6.2. (v) The MLE based Wald test (W m ) has a proper size in all cases: the asymptotic distribution derived under the null hypothesis provides a good approximation to the nite sample distribution. (vi) As the bootstrapped version of the Wald test (W 2b ) is computationally intensive, we have computed results only for the OGMME and only for the cases where n = 485 and n = 486. 16 The W 2b test performs better than the W 2 test and is slightly over-sized for larger nominal size values.
In comparison with the W m and W 2c tests, the size is slightly worse. (vii) The size properties of remaining tests, LM, MC, D NW , D RU , and C(α), are evaluated in detail in Web Appendix. 17 Overall, the D NW and D RU tests perform better than the other tests. The C(α) test is properly sized in many cases and performs better than LM and MC. Next, we evaluate the size properties of these tests under the null of H 0 : λ 0 = 0. For the null hypothesis under consideration, we only present the p-value plots for the case of ρ 0 = 0.3 in Fig. 4. 16 For each resample, the number of bootstrap samples is limited to 99. This choice for the number of bootstrap samples satis es the rule of thumb provided by Davidson and MacKinnon (1998): (99 + 1) × 0.01 is a positive integer. A larger number of bootstrap samples is computationally challenging for the spatial model under consideration. 17 Here, MC denotes the minimum chi-square test, and D NW and D RU denote criterion based tests. For details, see Web Appendix C.
(v) The MLE based Wald test is undersized when (n, d) = (485, 25%) and is slightly over-sized when (n, d) = (945, 25%) for larger nominal values. As the northeast quadrant density increases, the actual sizes converge to the nominal sizes. (vi) The size properties of remaining tests are examined in the Web Appendix. Both LM and MC tests are over-sized in all cases and are not recommended for applied research to test this null hypothesis. The two criterion based tests, D NW and D RU , are properly sized only when there exist negative spatial dependence in the disturbance term. The p-value plots for the C(α) test are close to the 45 • line in many cases, indicating that this test is properly sized.

Results on the e ects estimators
We use the same Monte Carlo setup described in Section 6.1 to study the nite sample properties of the estimators of the impact measures de ned in Section 4. For the sake of brevity, we consider only (n, d) = {(486, 75%), (485, 46%), (485, 25%), (974, 75%), (945, 46%), (945, 25%)}. Besides the IGGME, OGMME, and MLE, we also consider the performance of the multi-step estimator (GS2SLSE) of Drukker et al. (2013) in this analysis. 18 For each impact measure, we provide the bias, the estimated X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n n = 486, d = 75% asymptotic standard errors, the empirical standard deviations, and the rejection rates of Wald test for a nominal size value of 0.05. The simulation results are given in 36 tables presented in Web Appendix G. Here, we summarize these results by providing scalar measures. We start by comparing the bias properties of e ect estimates across estimators. To evaluate the performance of estimators in terms of bias, we calculate the average absolute bias for each impact measure over 25 combinations of (λ 0 , ρ 0 ). These results are given in Table 1. In general, the average absolute bias reported by each estimator is less than 0.05 for each impact measure. The IGMME and GS2SLSE impose slightly larger bias on impact measures. Table 1 also indicates that the Delta method performs slightly better than the simulation method. The magnitude of bias is slightly larger when the density of the northeast quadrant is high.
Next, we compare the estimated standard errors with the empirical standard deviations for each estimator. For the total e ects, we provide the percentage deviations of the estimated standard errors from the empirical standard deviations in Fig. 5 for the case of (n, d) = (486, 75%). Generally, the percentage bias falls within the (−5%, 5%) interval for each estimator, except in the case of IGMME. To give an overall picture, we calculate the proportions of percentage deviations that fall within (−5%, 5%) across all combinations of (λ 0 , ρ 0 ) and (n, d). For the simulation method, these proportions for the total e ects of X 1,n are 0.48 for the IGMME, 0.746 for the OGMME-Asym., 0.68 for the OGMME-Crc., 0.94 for the MLE, and 0.933 for the GS2SLSE, and for X 2,n these rates are 0.526 for the IGMME, 0.80 for the OGMME-Asym., 0.88 for the OGMME-Crc., 0.953 for the MLE, and 0.933 for the GS2SLSE. For the Delta method, these proportions for the total e ects of X 1n are 0.453 for the IGMME, 0.70 for the OGMME-Asym., 0.74 for the OGMME-Crc., 0.913 for the MLE, and 0.926 for the GS2SLSE, and for X 2n these rates are 0.506 for the IGMME, 0.873 for the OGMME-Asym., 0.933 for the OGMME-Crc., 0.92 for the MLE, and 0.94 for the GS2SLSE. While the IGMME performs relatively poorly, the MLE and the GS2SLSE perform relatively better. In general, the OGMME-Crc performs better than OGMME-Asy, which is consistent with our results in Sections 6.2 and 6.3. The simulation method seems to be performing slightly better than the Delta method in many cases. The standard errors of impact estimators seem to be a ected by the density of northeast quadrant, and the proportions of percentage deviations that falls in the (−5%, 5%) interval are relatively higher for the case of (n, d) = (485, 46%). Simulation Delta X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n X 1,n X 2,n n = 486, d = 75% In Table 2, we report the average size distortions of the Wald tests over 25 combinations of (λ 0 , ρ 0 ) for each impact measure for a nominal size of 0.05. 19 The average size distortions are smaller than 0.03 in all cases, and the Wald tests based on IGMME have relatively larger average size distortions. Overall, the average size distortions of the Wald tests based on OGMME-Crc are smaller than those based on OGMME-Asy, which suggests that the correction method can lead to more accurate inference.

Conclusion
In this study, we review the GMM estimation approach for spatial autoregressive models and evaluate the nite sample properties of the GMM estimators through simulation studies. To the best of our knowledge, this study is the rst extensive study to evaluate various GMM inference techniques for spatial autoregressive models. We extend the nite sample correction method of Windmeijer (2005) to the SARAR(1,1) model and formulate a variance formula to improve inference properties of the GMM estimator for detecting the presence of spatial dependence. We show that the corrected standard errors can be a good alternative to their estimated asymptotic counterparts.
Simulation studies con rm that the estimated asymptotic standard errors based on the OGMME can be substantially downward biased and can make inference problematic for the spatial autoregressive parameters. This result is consistent with the simulation results reported in the econometrics literature on the nonspatial models. More speci cally, we evaluate the percentage deviations of the corrected standard errors from the empirical standard deviations under various scenarios. The results show that the bias in the corrected standard errors of OGMME and the e ects estimators (or the impacts estimators) is mostly less than the bias in the estimated asymptotic standard errors of OGMME and the e ects estimators. Hence, the correction method can provide more accurate inference in nite samples.
For the SARAR(1,1) model, we also compare the size properties of various test statistics through simulations. These tests included standard asymptotic Wald tests based on MLE and GMMEs, a bootstrapped version of Wald test, two versions of C(α) test, the standard LM test, the minimum chisquare test, and two versions of GMM criterion test. Our results show that the standard asymptotic Wald test based on the asymptotic standard errors of the OGMME is generally oversized, whereas the Wald test based on the corrected standard errors of the OGMME is properly sized. Our ndings also con rm that the GMM criterion tests and the C(α) test can be useful for detecting the presence spatial dependence. All of these ndings o er guidance to applied researchers who estimate and test spatial models with the GMM estimators.