QML and Efficient GMM Estimation of Spatial Autoregressive Models with Dominant (Popular) Units

Abstract This article investigates QML and GMM estimation of spatial autoregressive (SAR) models in which the column sums of the spatial weights matrix might not be uniformly bounded. We develop a central limit theorem in which the number of columns with unbounded sums can be finite or infinite and the magnitude of their column sums can be if . Asymptotic distributions of QML and GMM estimators are derived under this setting, including the GMM estimators with the best linear and quadratic moments when the disturbances are not normally distributed. The Monte Carlo experiments show that these QML and GMM estimators have satisfactory finite sample performances, while cases with a column sums magnitude of O(n) might not have satisfactory performance. An empirical application with growth convergence in which the trade flow network has the feature of dominant units is provided. Supplementary materials for this article are available online.


Introduction
Spatial econometrics can be applied to empirical fields with strategic interactions or spillover effects. There are various model specifications for cross sectional data, and the spatial autoregressive (SAR) model proposed by Cliff and Ord (1973) is the most popular one. To estimate the SAR model, various estimation methods such as the two-stage least square (2SLS) method by Kelejian and Prucha (1998), quasi-maximum likelihood (QML) estimators by Lee (2004) and generalized method of moments (GMM) estimation by Lee (2007) have been proposed. Recently, there has been much progress in the specification and extension of SAR models, such as the specification of endogenous spatial weights matrix (Qu and Lee 2015;Qu, Lee, and Yang 2021), a SAR model with numerous covariates (Gupta and Robinson 2015;Martellosio and Hillier 2020), a SAR model with limited dependent variables (Xu and Lee 2015a, 2015b, 2018, semiparametric and nonparametric estimation of spatial models (Robinson 2010;Robinson 2011;Su and Jin 2010;Su 2012;Malikov and Sun 2017), etc.
In the established spatial econometrics theory, the row and column sums of the spatial weights matrix are assumed to be uniformly bounded (UB), which is analogous to the absolute summability in the time series literature. While UB row sums are natural in a SAR model, UB column sums might not be natural in some situations. Recently, there is some interest in relaxing this restriction on uniform boundedness in column sums. Behrens, Ertur, and Koch (2012) develop a dual gravity model to investigate the multilateral resistance of trade flows among states and provinces in the United States and Canada in which the theory-based spatial weights matrix is not UB. Jin, Lee CONTACT Jihai Yu jihai.yu@gmail.com Peking University, Beijing, China. Supplementary materials for this article are available online. Please go to www.tandfonline.com/UBES. 1 Other specifications include the spatial Durbin model (SDM) and spatial error model (SEM) in which the error could be a SAR or spatial moving average (SMA) process. A more general specification is a SAR model with SAR process in the error (SARAR) or a SAR model with SMA process in the error (SARMA).
and Yu (2021) develop an asymptotic theory for the estimation of the dual gravity model by using a special feature of the spatial weights matrix in the trade theory. Olejnik and Olejnik (2020) provide a positive solution to relax the UB property of the spatial weights matrix by imposing boundedness in the spectral norm instead. Pesaran and Yang (2020a) introduce the notions of dominant units in networks, and extremum estimators of the degree of dominance in networks are proposed. Pesaran and Yang (2020b) develop a central limit theorem (CLT) for a SAR model with dominant units and investigate a sectoral price change with dominant sectors present in the U.S. economy. In some social network data, there could be superstars in the network such that their in-degrees are high. These are situations in which a spatial weights matrix (interaction matrix) would not be UB in the column sums. When there are dominant (popular) units, the CLT developed by Kelejian and Prucha (2001) using a martingale difference sequence cannot be used directly in spatial econometrics because the stated regular sufficient conditions for the validity of the CLT heavily rely on the UB property of the relevant spatial weights matrix. To address dominant units in a spatial weights matrix W n , Pesaran and Yang (2020b) extend the CLT developed by Kelejian and Prucha (2001) to allow the presence of a finite number of dominant units, which can be applied to GMM estimation in which a spatial weights matrix is used to construct a quadratic moment. The order of the column sums of those dominant units in their extension can only be O(n δ ) with 0 ≤ δ < 1/2. However, when using the best quadratic moment in the GMM setting or using an ML estimation, the quadratic matrix involved would be G n = W n (I n − ρ 0 W n ) −1 instead of W n , which might not have a finite number of unbounded column sums as n increases, unless some conditions on the magnitude of those column sums are specified (see Pesaran and Yang 2020a). This can be the case when nondominant units might eventually influence the dominant units. As the dominant units in W n might influence many other units through some higher levels of neighboring units, those nondominant units would influence many other units via the columns of G n as a connection matrix if ρ 0 were not zero. In addition, the condition 0 ≤ δ < 1/2 could be too restrictive for some empirical applications, for example, the spatial weights matrix in the dual gravity model proposed by Behrens, Ertur, and Koch (2012).
This article aims to develop a CLT that addresses some of the above issues. Specifically, we rely on Liapunov conditions to derive a CLT for linear and quadratic forms of disturbances, where the magnitude of the column sums of those dominant units can be O(n δ ) with 0 ≤ δ < 1, and we allow the possibility that the number of dominant units m n would not be a finite number but increase as n tends to infinity. The CLT can be applied to not only GMM estimators using W n as a quadratic matrix but also relatively more efficient GMM with an additional quadratic matrix based on G n . We investigate the asymptotic distributions of QML and best GMM estimators under the presence of dominant units where the spatial weights matrix is not necessarily UB in the column sums. The asymptotic results in the this article can be applied in various empirical settings in which dominant (popular) units (players) are an issue, such as large social networks with superstars and international trade export flows with large countries.
The article is organized as follows. Section 2 presents an LLN and a CLT for linear and quadratic forms with dominant (popular) units. We show that under the presence of a finite eighth moment, the CLT can be established to allow the column sums to have the magnitude O(n δ ) with 0 ≤ δ < 1. Section 3 derives the asymptotic distribution of QML estimators under unbounded column sums, where the number of unbounded columns can be finite or infinite. Section 4 investigates a GMM estimation and an estimation with the best linear and quadratic moments when the disturbances are not normally distributed. Section 5 presents the Monte Carlo results of QML and GMM estimators and compares their finite sample performances when the number of unbounded columns and magnitude degrees of the column sums vary. Section 6 provides an empirical application of growth convergence in which the spatial weights matrix constructed from international trade flow has the feature of dominant units. Section 7 concludes the article. Due to space limitations, some useful lemmas and proofs are presented in the supplementary materials along with more simulation results.

A CLT with Dominant Units
For a (sequence of n × n) matrix A n , the row sum is A n ∞ = sup 1≤i≤n n j=1 a ij,n , and the column sum is A n 1 = sup 1≤j≤n n i=1 a ij,n . A matrix A n is UB in the row sum in the absolute value if sup n≥1 A n ∞ < ∞. The UB in the row sum of a SAR model is natural as the neighbors' influence would not be extremely large to dominate the outcomes of a spatial unit. This section aims to develop an LLN and a CLT of linear and quadratic forms with dominant units.

CLT Related to W n
An LLN will help us establish the consistency of the estimators. The following proposition presents convergence of relevant statistics by using Chebyshev's inequality.
Proposition 1. Let ε n be an n-dimensional column vector of iid (0, σ 2 0 ) disturbances with finite variance σ 2 0 < ∞ and E|ε ni | 4 < ∞, and d n be an n-dimensional column vector of an exogenous variable with uniformly bounded elements with respect to n. Let {A n } be a sequence of nonstochastic matrices that satisfy sup n A n ∞ < ∞, that is, UB in the row sums, and We notice that if d n is G n X n β 0 where G n is a row sum norm bounded and elements of X n are bounded, the elements of d n will also be uniformly bounded. However, in the uniform convergence in the consistency proof, we might have terms, such as 1 n d n A n ε n , which can have a different order because elements of d n A n are not bounded when the column sum of A n is not bounded. This is our motivation for developing the LLN in Proposition 1.
In the following, we aim to investigate the CLT of 1 √ n ε n A n ε n − σ 2 0 tr(A n ) , where A n is an n × n square matrix. The A n can be a spatial weights matrix W n or its related G n = W n S −1 n , where S n = I n − ρ 0 W n . Lemmas S.1-S.5 in the supplementary materials are helpful in obtaining the magnitude of moments of quadratic forms under the features that A n ∞ = O(1) and A n 1 = O(n δ ) with 0 ≤ δ < 1. The uniform boundedness in the row sum property of A n , that is, A n ∞ = O(1), implies that all elements in A n are uniformly bounded and, hence, A n 1 = O(n). The assumption with 0 ≤ δ < 1 means that an additional condition is imposed in deriving the CLT. When the disturbances are not normally distributed, Lemma S.4-S.5 provide the relevant orders.
Denote B n = 1 2 (A n + A n ), which is symmetric. Then, the quadratic form of CLT can be written as where {X ni , I ni } is an adapted double array stochastic sequence. We observe that this adapted stochastic sequence {X ni , I ni } is a martingale difference sequence as E(X ni |I n,i−1 ) = 0. To prove the martingale difference sequence CLT, we need to establish some well-known regularity conditions, namely, condition (a): → 0 for ν > 0 and condition x,n = 1 n var( n i=1 X ni ). Recall that we consider the situation in which A n is uniformly bounded in the row sum, but its columns can be unbounded. Without loss of generality, assume that the first m columns of A n have unbounded column sums with the magnitudes O(n δ j ) for δ j > 0 and j = 1, . . . , m, where m < n. Thus, by denoting δ = max j=1,...,m δ j , each of these m columns has the column sum of the uniform order O(n δ ), and the remaining n − m columns have the uniform order O(1) in their column sums. 3 Denote b in = (2b i1 , 2b i2 , . . . , 2b i,i−1 , b ii , 0, . . . , 0) and B in = e in b in where e in is the n-dimensional unit column vector with its ith component being 1 and the other entries being 0. 4 Then, X ni can be rewritten as a quadratic form as follows: where the forms of k * i 's and coefficients γ i 's can be inferred from Section S.1 in the supplementary materials. 5 Due to the special form of B in , we have Therefore, from the power expansions and Lemma S.1, we have tr [(B in Pesaran and Yang (2020b) argue that the regularity condition (a) requires 0 ≤ δ < 1 2 , condition (b) requires 0 ≤ δ < 1, and m is a finite number that does not depend on n. By checking the fourth moment of E|X ni |, we show that the conditions (a) with ν = 2 and (b) can be satisfied in the situation with 0 ≤ δ < 1 when m is finite or infinite, that is, in the latter case, m can be a sequence m n that diverges to infinity as n tends to infinity. Our result generalizes the special situation that m is a finite constant as in Pesaran and Yang (2020b) and relaxes the magnitude of the column sums. 6 To obtain a CLT with many dominant units, we first consider the magnitude of the first component in (2.1), that is, . We note that this order holds regardless of whether m n = m is a finite constant independent of n or divergent as n tends to infinity.
Regarding the remaining components in (2.1 ), as we need to consider the moments Eε k ni for k = 3, . . . , 8, we have from Lemma S.5 due to the nonnormality of ε ni . Thus, the magnitude of those components related to the nonnormallity of ε ni is O(n), which is not larger than that of regular components.
Regarding nσ 2 x,n in the denominator in condition (a), because As tr(A 2 n ) ≤ tr(A n A n ) by Cauchy's inequality and n i=1 a 2 ii ≤ tr(A n A n ), we notice that even though A n has some unbounded column sums, tr(A n A n ) = tr(A n A n ) can be obtained from the row sums of the latter, which is O(n), such that σ 2 x,n = O(1). Additionally, the σ 2 x,n is a variance term bounded away from zero, which can be observed by rewriting σ 2 x,n in a positive form as it is a sum of two nonnegative terms, σ 2 ii is bounded away from zero, the claim holds; however, if 1 n n i=1 a 2 ii tends to zero, we consider the second component of σ 2 x,n and realize that 1 n tr(A n + A n ) 2 = 1 n A n + A n 2 e , where · e is the Euclidian norm of the matrix, would usually be bounded away from zero. Hence, condition (a) requires O(n 1+δ ) To further extend the above results to accommodate a SAR model with exogenous variables, we consider a linear and quadratic form instead of a quadratic form only. We define d n = (d 1 , . . . , d n ) as an n×1 vector of exogenous variables. The CLT we are investigating is 1 Proposition 2. Let ε n be an n -dimensional column vector of iid (0, σ 2 0 ) disturbances with finite variance σ 2 0 < ∞ and E|ε ni | 8 < ∞ and d n be an n-dimensional column vector of exogenous variables with bounded elements with respect to n. Let {A n } be a sequence of nonstochastic matrices that satisfies A n ∞ < ∞ and A n 1 = O(n δ ). We define B n = 1 2 (A n + A n ) and σ 2 x,n = 1 n var(ε n B n ε n + d n ε n ). Then, assuming 0 ≤ δ < 1 and lim n→∞ σ 2 x,n > 0, we have The CLT in Proposition 2 complements that in Pesaran and Yang (2020b) as 0 ≤ δ < 1 is needed instead of their condition 0 ≤ δ < 1/2, and m n would not be restricted to be finite. However, instead of the existence of the (4 + η)th moment as these authors assume, we require iid disturbances and the existence of a eighth moment. The requirement on σ 2 x,n would be satisfied as long as 1 n tr(B 2 n ) or 1 n d n d n are not zero in the limit. This CLT can be applied to the GMM estimation of a SAR model when the relevant matrix is W n , and it holds for fixed m and large m n .
When the relevant matrix is G n , we need to obtain the column sum properties of S −1 n and G n . 7 We assume that m n = O(n η m ) in W n , where η m ≥ 0. In the case η m > 0, m n → ∞ as n → ∞; in the case η m = 0, m n = m would be a finite constant.

CLT Related to G n
To investigate whether the CLT in Proposition 2, which is applicable in the case with W n as the matrix in the quadratic form, can be applied to G n , the column sum properties of S −1 n and G n are needed. We notice that S −1 n and G n are still rowsum bounded given that W n is row sum norm bounded (see Pesaran and Yang 2020b). As the first m n columns of W n are unbounded with column sum O(n δ ) and the remaining n − m n columns sum are bounded, we need to consider a situation in which a nonpopular agent 8 in W n would become popular in G n and investigate their column sum orders. By the inverse of the partitioned submatrices and UB property of these nonpopular agents in W n , we can obtain the more specific order of the popular units and these intermediate units for G n . 9 The following Proposition 3 provides the results of the column sums of S −1 n and G n , where G n is the relevant square matrix in the QML and best GMM estimation. Depending on whether a nonpopular unit in W n has influence on the dominant units, we have different features of the column sums of G n . We define ] are the columns of the dominated units. 11 We note that all elements in W n are uniformly O(1) due to the 7 The A n ∞ is UB regardless of the magnitude of the column sums of A n because if A n ∞ is not bounded, in a situation with A n = W n , some row sums of W n will be large in the magnitude order; thus, the spatial coefficient would need to be small and tend to zero as n → ∞. Such a model is not of interest. 8 An agent is popular or dominant in A n if it influences many other units such that the numbers of units being influenced by that agent in A n , that is, the number of nonzero entries in the corresponding column, increase to infinity as n increases, while an agent is nonpopular in A n if the nonzero entries in its column of A n are uniformly bounded in n. 9 This way of analyzing S −1 n is motivated by the analysis reported in Pesaran and Yang (2020b). 10 As pointed out by a referee, denote δ * = min j=1,...,m n δ j and assume the rest (n − m n columns) of the column sums are bounded by a finite K. With j i w ij = i j w ij , m n ×n δ * +(n−m n )×K can't have an order greater than O(n), which implies η m + δ * ≤ 1, so that η m < 1 since δ * > 0. This, in turn, implies that η B = 1 as n − m n = O(n) and m n < n. 11 We observe that δ m and δ B are related to the magnitude orders of the column sums of W n,11 and W n,B1 . As columns sum W n,11 and W n,B1 are a part of the column sum of row sum norm boundedness, and we have a relationship that δ m ≤ η m and δ B ≤ η B .
Proposition 3. Let W n be partitioned as in (2.2) and with respect to W n , we have m n dominant (popular) units, q n intermediate units who will eventually influence some popular units as in certain levels of neighbors, and n − m n − q n nonpopular units who will not influence the m n dominant units in any level of neighbors. Assume that S −1 n,BB is UB in both the row and column sum norms, and q n can be finite or infinite. Then, the properties of the column sums of S −1 n and G n are given below.
(a) The column sum of each column of S −1 n and G n is of the Proposition 3 helps track the column sum orders of the relevant matrix G n such that the CLT developed for W n can be extended to G n . From Proposition 3, we note that for the column sums of the popular units. This result includes the special case of finite m, which corresponds to η m = 0 and δ B = δ and is stated below in Corollary 1.
Corollary 1. Suppose m n = m is finite (not dependent on n) and W n is partitioned as in (2.2) and with respect to W n , we have m dominant (popular) units, q n intermediate units who will eventually influence some popular units, and n − m − q n nonpopular units who will not influence the m dominant units in any level of neighbors. Assume that S −1 n,BB is UB in both the row and column sums and that q n can be finite or infinite. Then, the column sums of S −1 n and G n are provided below. From Corollary 1, regarding the square matrix G n , the column sums of j = m + 1, . . . , n columns could be O(1) or O(n δ ) when m is finite. With the additional restriction W n,1B = 0 such that all remaining units would not connect to the m dominate units, the CLT of martingale differences (used in Pesaran and Yang 2020b) can be applied to the best GMM estimation with G n as the optimal quadratic moment because only the m columns of G n might have column sums of order O(n δ ), but the remaining (n − m) column sums are uniformly bounded. In this case, the CLT requires 0 ≤ δ < 1 but is not restricted to be 0 ≤ δ < 1/2. Without the restrictions W n,1B = 0, even the total number of dominant and intermediate agents in W n is finite, and the stated CLT of martingale difference in Pesaran and Yang (2020b) cannot be applied directly to investigate the asymptotic property of the best GMM estimator with the quadratic matrix G n because the implied dominant agents in G n might not be finite. 12 However, in general, the CLT of martingale difference might still be applied as our proof of the validity of this CLT does not require the number of unbounded columns in G n to be finite.
To investigate the requirement of CLT to hold when the quadratic matrix is G n , we need to investigate how the rate restriction 0 ≤ δ < 1 for W n would change for G n .
Proposition 4. Let ε n be an n-dimensional column vector of iid where all the entries of d n are uniformly bounded. Then, if lim n→∞ σ 2 x,n > 0 and δ B +η m < 1, we have From Proposition 4 , we observe that the rate restriction on G n would be stronger than that on W n . The CLT for W n from Proposition 2 requires 0 ≤ δ < 1, while the rate restriction for G n is δ B + η m < 1. Because η m < 1 (from δ B + η m < 1 and δ B ≥ 0) and η B = 1 from max(η m , η B ) = 1, it is reasonable to set δ B = δ . Then, the condition δ B + η m < 1 for the CLT of G n would be reduced to δ + η m < 1, which is more restrictive than 0 ≤ δ < 1 for the CLT of W n . Therefore, when m n tends to infinity as n tends to infinity, the CLT for G n would more likely hold if m n is small relative to n − m n because η m is small. We notice that the large m n case nests the finite m case such that δ B + η m < 1 collapses to 0 ≤ δ < 1 when m is finite.
Recall that η B is related to the order of the number of nondominant units defined in W n and δ B is related to the column sum order of dominated units on the links from the nondominant units. Due to the row sum boundedness of G n , each element of G n is bounded. Therefore, η B ≥ δ B and η m ≥ δ m . Combined with max(η m , η B ) = 1 and max(δ m , δ B ) = δ, the CLT condition δ B + η m < 1 in Proposition 4 can have the following forms: (i) When m n is finite such that η m = 0, δ B would be equal to δ, and δ B + η m < 1 is reduced to 0 ≤ δ < 1. (ii) When m n is large but relatively smaller than n − m n (such that η m < 1 and η B = 1) and, simultaneously, the column sum order δ m is smaller than or equal to δ B (such that δ B = δ), δ B + η m < 1 is reduced to δ + η m < 1. (iii) When m n is large but relatively smaller than n − m n (such that η m < 1 and η B = 1) but the column sum order δ m is larger than δ B (so that δ m = δ and δ B < δ), δ B + η m < 1 cannot be further simplified.
We notice that, as η m < 1 and η B = 1 such that m n must be small relative to n − m n , case (iii) corresponds to an interesting situation in which even though m n is small relative to n − m n , the column sum of W n,11 is larger than the column sum of W n,B1 , that is, the interaction among the dominant units is much stronger than the interaction between the dominant and nondominant units. Under this situation, we have δ m = δ and δ B < δ. When δ B = 0 such that the dominant units mostly interact with dominant units and only a few nondominant units, the CLT will only require that η m < 1. 13

QML Estimation
We consider the cross sectional (first order) SAR model 14 Y n = ρ 0 W n Y n + X n β 0 + ε n , where Y n is n × 1 vector of dependent variables, X n is an n×k x nonstochastic exogenous variables with the first column being l n = (1, 1, . . . , 1) , W n is a nonstochastic spatial weights matrix and the disturbance ε ni , i = 1, 2, . . . , n, of the n-dimensional vector ε n are iid (0, σ 2 0 ). W n Y n is usually referred to as a spatial lag of Y n . 15 As S n = I n − ρ 0 W n and G n = W n S −1 n , the reduced form of the SAR equation is Y n = S −1 n (X n β 0 +ε n ), and that of the spatial lag W n Y n is W n Y n = G n (X n β 0 + ε n ). With S −1 n = I n + ρ 0 G n and G n = W n S −1 n = S −1 n W n , the reduced form equation of Y n is Y n = ρ 0 (G n X n β 0 ) + X n β 0 + S −1 n ε n .

Likelihood Function
The first order derivatives of the log-likelihood function (3.1) w.r.t. β, ρ and σ 2 are Evaluated at θ 0 = (β 0 , ρ 0 , σ 2 0 ) , by using Y n = ρ 0 (G n X n β 0 ) + X n β 0 + S −1 n ε n , we have where the score with respect to ρ involves G n . Although the CLT in Proposition 2 can be applicable to W n , matrix G n does not have the same structure as W n in terms of the number of unbounded columns because there could be spatial units that are not dominant but may be neighbors of some dominant units in certain neighboring levels. From Proposition 3, we can 13 Notably, δ m ≤ η m . Thus, as δ m = δ, the requirement η m < 1 for G n might still generalize the requirement δ < 1. 14 From proof of Propositions 3 and 4, the CLT can be applied to the quadratic matrix such as W n and G n and similarly W n G n . However, the CLT cannot be directly applied to the quadratic matrix G n G n where more than one matrix inversion (i.e., S −1 n ) is involved. Therefore, a SAR model with SMA or SAR disturbance with dominant units in both regression and disturbances spatial weights matrices might need further investigation. 15 Under heteroscedasticity, the adjusted MLE in Liu and Yang (2015) and the robust GMM in Lin and Lee (2010)  obtain the number of unbounded column sums of G n and their column sum magnitudes; thus, the relevant CLT obtained in Proposition 4 can study the asymptotic distribution of the score vector in (3.3).

Asymptotic Properties of QMLE
For our analysis of QMLE, the following assumptions are made.
Assumption 1. W n is a nonstochastic spatial weights matrix and is UB in row sum in the absolute value. The m n largest column sums of W n are of the order O(n δ ), where 0 ≤ δ < 1.
Assumption 2. sup ρ∈P ρW n ∞ < 1 and sup ρ∈P ρW n,BB ∞ < 1 such that S n (ρ) and S n,BB (ρ) are invertible for all ρ ∈ P, where P is a compact interval. Furthermore, ρ 0 is in the interior of P. 16 Assumption 3. The elements of X n are nonstochastic and bounded, 17 uniformly in n. Additionally, the limit of 1 n X n X n exists and is nonsingular.
Assumption 1 specifies that W n is row sum bounded, but its columns can be unbounded with the order O(n δ ), where 0 ≤ δ < 1. As the row sums in absolute values are bounded for all n, all entries of W n are uniformly bounded, and hence, each column sum in absolute value is at most of order O(n). The invertibility of S n (ρ) and S n,BB (ρ) in Assumption 2 guarantees that the reduced form of the SAR model is valid, and the column sum properties of S −1 n and G n can be analyzed with these assumptions. Furthermore, compactness is a typical condition for asymptotic analyses of nonlinear econometric models. When exogenous variables X n are included in the model, it is convenient to assume that they are uniformly bounded as in Assumption 3. Assumption 4 provides the iid regularity assumption for ε ni . In contrast to the requirement of E|ε ni | 4+η < ∞ for some positive η in the general martingale CLT for the linearquadratic moment in Kelejian and Prucha (2001), here we need the existence of the eighth moment. The existence of the eighth moment is needed for the Liapunov CLT we use. Kelejian and Prucha (2001) established a CLT for the SAR model based on the regularity condition where the spatial weight matrix is UB in both the row and column sum norms. Pesaran and Yang (2020b) generalize the UB in the column sum norm by allowing the column norm to be O(n δ ), where 0 ≤ δ < 1 2 , but the number of such unbounded columns must be finite for all sample sizes. In this article, we extend the CLT proposed by Kelejian and Prucha (2001) and Pesaran and Yang (2020b) in the spatial econometric literature.
From the log-likelihood function (3.1), denote H n = 1 n (X n , G n X n β 0 ) (X n , G n X n β 0 ), which can be considered the 16 Due to the nonlinearity in ρ of the reduced form of the model, the compactness of P is needed. However, the compactness of β and σ 2 is not necessary because the β and σ 2 estimates given ρ are least squares type estimates. 17 If X n is allowed to be stochastic and unbounded, appropriate moment conditions can be imposed instead.
variance matrix of the regressors in the reduced form. The following assumption provides the conditions for the parameter identification.
Assumption 5. Either (a) the limit of H n is nonsingular or (b) S n (ρ)S n (ρ) is linearly independent of S n S n for any ρ = ρ 0 .
The asymptotic distribution of the QMLEθ n can be obtained by the Taylor expansion of ∂ ln L n (θ) ∂θ at θ 0 , where the first order derivative of the concentrated log-likelihood function at the true parameters involves both the linear and quadratic functions of ε n . Its asymptotic distribution can be derived from a CLT of linear-quadratic moments with iid disturbances from Proposition 4. Assumption 6. The limit of 1 n tr(G s nG s n ) − tr 2 (G s n ) is strictly positive, whereG n = G n − trG n n I n andG s n =G n +G n .
Assumption 6 is a condition for the nonsingularity of the limiting information matrix of the QML estimation. Assumption 7 is a condition of the spatial weights matrix W n allowing the CLT related to G n to be applied. 18 When m is finite, we have η m = 0 and δ B = δ; thus, the rate restriction is reduced to 0 ≤ δ < 1.
Hence, the QMLEθ n is consistent, asymptotically normal, and properly centered. When ε ni 's are normally distributed, θ 0 ,n would be equal to zero. Although G n has large column sums, with the eighth moment restriction, we can still use the relevant CLT to derive the asymptotic distribution of the QML estimators, and the rate of convergence of the QML estimators is still √ n. The rate of convergence can also be implied from the order of the elements in the information matrix. Although the column sums of W n and G n are unbounded, their row sums are still bounded, implying that the elements in H n and finite number of groups, as the overall weighted matrix would be too dense. As the column sum boundedness of W n is assumed, there are no dominated units in the model studied in Lee (2004). Under the current setting where dominant units are allowed with proper rates, the QMLE is still √ n-consistent due to the row sum boundedness of relevant matrices and the feature on the orders of those dominant units.

Moment Conditions and Asymptotic Properties of OGMME
With α = (ρ, β ) and ε n (α) = S n (ρ)Y n − X n β, the moment conditions are g n (α) = (ε n (α)P n1 ε n (α), . . . , ε n (α)P np ε n (α), ε n (α)Q n ) (4.1) where the linear IV matrix Q n is constructed from exogenous variables, and the quadratic IV matrix is P nl ε n for l = 1, . . . , p with tr(P nl ) = 0 such that E(ε n P nl ε n ) = σ 2 0 tr(P nl ) = 0. The quadratic moments increase the efficiency of the estimation and are required for the identification of the parameters if the true value of β is zero. As in Hansen's GMM setting (1982), one considers a linear transformation of the moment conditions, a n g n (θ ), where a n is a matrix with a number of rows greater than or equal to (k x + 1), and a n is assumed to converge in probability to a constant full rank matrix a 0 . The GMM estimator is obtained from a minimization of the objective function min α g n (α)a n a n g n (α).
Let d A be the column vector formed by the diagonal elements of any square matrix A, vec(A) be the column vector formed by stacking the columns of A; and A s = A + A . Denote μ 3 and μ 4 as the third and fourth moments of ε ni , and let ω nd = [d P n1 , . . . , d P np ] and ω n = [vec(P s n1 ), . . . , vec(P s np )]. The variance matrix of the moments in (4.1) at the true parameter value is When ε ni is normally distributed, the second component of n will be zero because μ 4 − 3σ 4 0 = 0 and μ 3 = 0 under normality. For the optimal GMM, −1 n is used instead of a n a n . Assume that lim n→∞ 1 n Q n Q n is of full rank q, where q ≥ k x +1. The following assumption summarizes the sufficient conditions for the identification of α 0 from the moment equations lim n→∞ 1 n E(g n (α)) = 0 .
Assumption 8. Either (i) lim n→∞ 1 n Q n (G n X n β 0 , X n ) has the full rank (k + 1), or (ii) lim n∞ 1 n Q n X n has full rank k, lim n→∞ 1 n [tr(P s n1 G n ), . . . , tr(P s np G n )] is linearly independent of lim n→∞ 1 n [tr(G n P n1 G n ), . . . , tr(G n P np G n )] , and lim n→∞ 1 n tr(P s nl G n ) = 0 for some l .
With this identification, the GMM estimatorα is consistent. For its asymptotic distribution, √ n(α − α 0 ) = − ∂g n (α)/∂α n a n a n ∂g n (ᾱ)/∂α n −1 ∂g n (α)/∂α n a n a n g n (α 0 ) √ n by the Taylor expansion, whereᾱ lies betweenα and α 0 . By denoting D n = − 1 n σ 2 0 C pn 0 p×k x Q n G n X n β 0 Q n X n where C pn = (tr(G n P s n1 ), . . . , tr(G n P s np )) = 1 2 ω n vec(G s n ), we have 1 n ∂g n (α) (1). Thus, we have ∂g n (α)/∂α n a T a T ∂g n (ᾱ)/∂α n = D n a n a n D n + O p 1 √ n + o(1). By using the CLT with dominant units, the (linear combination of) the moment condition evaluated at the true parameter values, that is, 1 √ n a n g n (α 0 ), will be asymptotically normally distributed. We obtain the following result.
Assumption 9. The limit of n exists and is a nonsingular matrix.
Theorem 3. Suppose P nl , l = 1, . . . , p satisfies the CLT requirement in Proposition 2. Under Assumptions 1-4 and 7-9, the GMM estimatorα is consistent and plim n→∞ D n a n a n D n −1 D n a n a n n a n a n D n D n a n a n D n −1 ).
With dominant units, the OGMM still converges at the usual √ n rate, and the asymptotic variance matrix is (D n −1 n D n ) −1 . From D n and n in the variance matrix, we observe that even though G n has unbounded column sums, their row is still row sum bounded such that the H n and 1 n tr(G n P s nl ) in D n is still O(1), and those 1 n ω n ω n , 1 n ω nd ω nd , and 1 n Q n Q n in n are still O(1) where Q n is from X n and its spatial expansions. We notice that the GMME has the same asymptotic expressions as in Lee (2007) and Pesaran and Yang (2020b), although their underlying CLTs have different required conditions.

Best Moment Conditions
In this section, we present the best linear and quadratic moments under the setting of dominant units. We define X n2 as a submatrix of X n with the intercept column deleted. For any square matrix C n , let diag(C n ) be the diagonal matrix formed by the diagonal elements in C n , andC n = C n − tr(C n ) n I n , which has a zero trace. For any column vector c n , letc n = c n − 1 n l n l n c n where l n is vector of ones. Denote Q * n1 = X n , Q * n2 = G n X n β 0 , Q * n3 = d G n , and P * n1 =G n , P * n2 = diag(G n ), P * n3 = diag( G n X n β 0 ) and P * n,k+3 = diag( X n2,k ) for k = 1, . . . , k x − 1. From Liu, Lee and Bollinger (2010), the best linear IV is Q * n = [Q * n1 , Q * n2 , Q * n3 ] and the best quadratic matrices are P * n1 , P * n2 , P * n3 , P * n,k+3 for k = 1, . . . , k x − 1.
Under the best linear and quadratic IVs, the variance matrix of the moments in (4.1) at the true parameter value is * and ω * n = [vec(P * s n1 ), . . . , vec(P * s n,k x +2 )]. Let η 3 = μ 3 /σ 3 0 and η 4 = μ 4 /σ 4 0 be the skewness and kurtosis of the disturbance, respectively. The variance matrix of the BGMM estimators becomes * and 19 By using the best linear and quadratic moments, we have the following theorem for the best GMM (BGMM) estimators.
Assumption 10. The limit of * n exists and is a nonsingular matrix.
Theorem 4. Suppose we use the moment conditions under which Q * n assumes the special form in (4.4), and P * n1 =G n , P * n2 = diag(G n ), P * n3 = diag( G n X n β 0 ) and P * n,k+3 = diag( X n2,k ) for k = 1, . . . , k x − 1. Under Assumptions 1-4, 7-8, and 10, the best feasible GMME (BGMME)α * n derived from min α g n (α)ˆ −1 n g n (α), whereˆ −1 n − −1 Similar to OGMME, we notice that the rate of convergence of BGMME is still √ n. Here, even though we have unbounded column sums for G n in Q * n and P * nl , the relevant limit of these matrices can still be O(1) because the row sum is bounded, and the column sum magnitude is O(n δ ) with 0 ≤ δ < 1. By using the best linear and quadratic moment, the BGMME has the smallest variance-covariance matrix. When μ 4 − 3σ 4 0 = 0 and μ 3 = 0, the best linear and quadratic moments are Q * n = [G n X n β 0 , X n ] and P * n = G n − tr(G n ) n I n , which is reduced to the BGMM case in Lee (2007). 19 The P * nρ involves the true parameter ρ 0 , σ 2 0 , μ 3 , and μ 4 , where ρ 0 can be estimated from Theorem 3, and the moments parameters σ 2 0 , μ 3 , and μ 4 can be consistently estimated from the residuals.

Monte Carlo
We investigate the finite sample performance of the QML and GMM estimators of SAR models under dominant (popular) units. The DGP in the simulation is Y n = ρ 0 W n Y n + l n β 01 + X n β 02 + ε n , where X n is an n × 1 vector of random variables generated from an independent standard normal distribution. The regression coefficients are β 01 = β 02 = 1, and the spatial effect coefficient ρ 0 has the value 0.2, 0.5 or 0.8. We have various spatial weights matrix specifications depending on the values of δ and η m , where δ is related to the magnitude of the column sum, and η m is related to the number of unbounded columns. The spatial weights matrix has the structure W n = W n,11 W n,1B W n,B1 W n,BB where W n,11 is the m × m submatrix, and W n,B1 is the (n − m) × m submatrix. We generate W n by a rook matrix specification, 20 then replace the left blocks W .m ≡ (W n,11 , W n,B1 ) with dominant units. We denote [n δ ] as the integer part of n δ . We first set m = 1, and all units in the first [n δ ] rows in W .m are generated from iid uniform (0, 1), while the remaining n − [n δ ] rows in W .m are specified to be zero. 21 Therefore, the column sums of W .m would be O(n δ ) as the sum of [n δ ] uniformly distributed random variables is of the order O(n δ ). The δ equals 1 4 or 1. In another dimension, the number of dominant units m n = [n η m ] also increases, and η m increases from 0 to 1/2. Therefore, in the simulation, we have δ = 1 4 , 1 and m n = 1, [n 1/2 ]. The δ = 1 is included to investigate whether the ML and GMM estimators would break down as the asymptotic properties of these estimators requires 0 ≤ δ < 1. To assure row sum boundedness when m n is large, the final spatial weights matrix is row-normalized. For each set of generated sample observations, we obtain various estimators and evaluate their (finite sample) biases and variances. These estimators are 2SLS in Kelejian and Prucha (1998) using [X n , W n X n , W 2 n X n ] as linear IVs, B2SLS in Lee (2003) using estimated [G n X n β 0 , X n ] as the best linear IV, and OGMM with the linear moments [X n , W n X n , W 2 n X n ] and quadratic moments W n and W 2 n − tr(W 2 n ) n I n . BGMM1 uses the best feasible estimated linear IV [G n X n β 0 , X n ], and the quadratic IV is the consistently estimated G n − tr(G n ) n I n . BGMM2 uses the initially estimated [Q * nβ , Q * nρ ] in (4.4) as the linear IV and P * nρ in (4.5) as the quadratic IV, while BGMM3 uses the initially estimated Q * n as the linear IV and P * nl as the quadratic IV. For the BGMM estimators, the initial consistent estimators to construct the best linear and quadratic moments are 2SLSE. QMLE uses the estimated −1 θ 0 ,n + −1 θ 0 ,n θ 0 ,n −1 θ 0 ,n to obtain the variance matrix of estimators, and θ 0 ,n considers the nonnormality of disturbances. We repeat this 500 times to obtain the empirical bias. The finite sample properties of the estimators of ρ 0 = 0.5 are summarized in Tables S.1-S.2 in the supplementary materials 20 The rook matrix represents a square tessellation with a connectivity of four for the inner fields on a chessboard and two and three in the corner and border fields. Most empirically observed regional structures in spatial econometrics comprise regions with connectivity close to the range of the rook tessellation. 21 In Pesaran and Yang (2020b), W n,11 = 0. Then, the first [n δ ] units in W n,B1 are generated as iid uniform (0, 1), and the remaining units in W n,B1 are zeros. Their specification restricts the interaction among the dominant units to be zero.
with sample sizes of n = 400 and n = 900, and the disturbance vector ε n can be derived from a standard normal distribution or demeaned gamma(2, 1) disturbances in which its kurtosis is 6. Table S.1 shows the result with normal disturbances in the DGP, and Table S.2 shows the result with demeaned Gamma(2, 1) disturbances in the DGP. QMLE and BGMME supposedly have the same asymptotic distribution under normal disturbances. As shown in Table S.1, (i) the GMM and QML estimators have a much smaller SD than those of 2SLS and B2SLS; (ii) the GMM and QML estimators have a similar SD, which is consistent with the theoretic prediction; (iii) the case of δ = 1 would increase the bias and SD for the QMLE of ρ but only increase the SD of the 2SLS and GMM estimators of ρ; however, the bias and SD of various estimators of β do not increase; (iv) when m is large, the patterns are similar to the small m case; (v) the CPs of β and ρ are satisfactory in various cases under δ = 1 4 , but the CPs of ρ are smaller than the theoretical value (95%) of GMMEs and QMLE under δ = 1; and (vi) the cases of n = 400 and n = 900 have similar patterns, while the case of n = 900 yields a smaller bias and SD. Table S.2 shows the results with nonnormal disturbances, and BGMM2 and BGMM3 supposedly are more efficient than the QML estimators and BGMM1 asymptotically. The disturbances are generated as demeaned Gamma(2, 1). We observe that (i) BGMM1 has a similar SD as QMLE, while the SD of BGMM3 is the smallest; interestingly, the SD of BGMM2 is larger than that of QMLE possibly due to estimation precision in μ 3 and μ 4 with a finite sample; (ii) when m is small, the jump to δ = 1 would change various estimators similarly to that shown in Table S.1; and (iii) when m is large, the jump to δ = 1 would increase the bias and SD of all estimators.
To investigate how the performances of various estimates change when δ approaches 1, we also report the results of δ = 3 5 , 3 4 , 9 10 under n = 400. Even though δ is close to 1 and m n = n 2 , the performances of the various estimates are still satisfactory. Then, we investigate the performance of various estimates under ρ = 0.2 and ρ = 0.8. As shown in Tables S3-S4, the results are basically similar to those in the case of ρ = 0.5. When δ = 1, the bias and SD would be larger on average when ρ is larger.
We finally investigate the finite sample performance of various estimators with different values of ρ 0 . When ρ = 0.2, compared with ρ = 0.5, the bias is smaller on average, while the SD is larger, and the resulting RMSE is larger. When ρ = 0.8, compared with ρ = 0.5, both the bias and SD are smaller on average; thus, the resulting RMSE is smaller. These results are derived from a case with 0 ≤ δ < 1. However, when δ = 1, the bias and SD of ρ = 0.2 are smaller than those of ρ = 0.5, and those of ρ = 0.8 are larger on average. 22 22 We also conduct various simulations such as (i) using the same W n in Pesaran and Yang (2020b); (ii) using w ij = 1 2 j where its row sum is 1 as n → ∞ and its number of unbounded columns is not fixed; (iii) the case with a smaller sample size n = 100; and (iv) the case with small β 0 so that linear moment would be weak for 2SLS and B2SLS. Due to space limit, details are provided in the supplementary materials.

An Empirical Illustration
The OECD STAN Bilateral Trade by Industry and End-use (BTDIxE) provides information regarding the values of annual imports and exports across countries worldwide with details concerning the categories of the traded goods and services. By using information extracted from this data set, we construct networks of trade. Combined with the Penn World Table (PWT) of macro variables from 1995 to 2019, we can investigate the conditional growth convergence (see Mankiw, Romer, and Weil 1992;Islam 1995), and a spatial model is employed with the spatial weights matrix constructed from the trade flows among countries. Studies increasingly suggest that spatial spillover is important for explaining economics growth across countries (see Ertur and Koch 2007;Bottazzi and Peri 2007;Mancusi 2008;Behrens, Ertur, and Koch 2012;Yu and Lee 2012, etc.). We empirically examine the international spillover effect of growth via trade partners with the Solow model. In this article, we investigate the networks constructed from production intermediaries imports/exports of 158 reporting countries. 23 According to Pesaran and Yang (2020a), the column sum of the row-normalized network matrix , provides a measure of the network centrality. In addition, the maximal column sum serves as an index of the order of dominance. An extreme estimator is , where d max denotes the maximal column sum. 24 We calculate this index for the imported/exported intermediaries. The characteristics of these networks between 1995 and 2004 are reported in Table 1. We can observe that for imported intermediaries, the average d j increases from 1995 to 2004. The standard deviation of d j 's in 2000 and 2004 is larger than that between 1995 and 1999. The data of the exported intermediaries during the same period show similar features. By setting 2012 as the base year and using the GDP deflator from BEA of the United States, we derive the real values of the imports and exports and calculate the average between 1995 and 1999 23 A few countries are partners with reporters but do not report their trade information. As their trade information is not complete, we remove them from the sample and focus on bilateral trade across the reporting countries. On average, among the reporters, the sum of imports/exports with reporting counties account for more than 90% the imports/exports worldwide. 24 If there is one country with a zero column sum, we use n − 1 instead of n. and the average between 2000 and 2004. We can estimate the degree of dominance of each trade network. 25 We can observe that the orders of dominance in all networks are larger than 0.5. The existence of dominant units can also be graphically illustrated. As shown in the figures in the supplementary materials, there is great variation in the column sums of W n across nations. Regarding the G n matrix with ρ 0 = 0.5, we observe that the magnitude of the column sums is much larger than that of W n , but the number of dominant units is similar to that of W n . Thus, regardless of whether the reporting countries are importing or exporting, the orders of dominance of most countries are close to 0, and only a few countries have a large value for the order of dominance on exports. Compared with the export network, the order of dominance in the import network is even larger. 26 The estimation equation is ln y t 0 +T = α 0 + ρW n ln y t 0 +T + γ ln y t 0 + x t 0 +T β + V t 0 ,t 0 +T , (6.1) where y t 0 +T is a n × 1 vector of the real GDP per capita of n = 158 countries, and x t 0 +T includes the population and capital stock. Here, γ = e −Tφ , where φ is the rate of convergence, and 0 < γ < 1 indicates that convergence to the steady state is direct and involves no oscillations. The W n is constructed from the trade flow of intermediate goods during the period t 0 . We focus on the relationship between the growth rate and initial income level.
Numerous studies have investigated growth convergence and differ by the country set, time period, choice of control variables, and econometric models. For the estimation equation (6.1), when controls are absent, 0 < γ < 1 is known as unconditional β -convergence, and the differences in the per capita incomes are only temporary. When controls are present, 0 < γ < 1 is known as conditional β-convergence, and the income differences are permanent because of cross-country structural heterogeneity. As Eaton andKortum (1999, 2001) and Howitt (2000) argue, there is a need for researchers to develop empirical growth frameworks that include flows of goods, capital and knowledge. We follow this approach by specifying a spatial weights matrix constructed from traded intermediate and final goods.
We conduct cross sectional regressions with different specifications of W n for the period 1995-2019 such that ln y t 0 = ln y 1995 and ln y t 0 +T = ln y 2019 . To examine the relative importance of imports and exports for transmitting growth across countries, we first construct spatial weights matrix W im with bilateral import flows and then repeat the analysis with spatial weights matrix W ex constructed with bilateral export flows. We finally sum the imports and exports to construct the spatial weights matrix W im+ex as the total bilateral trade flow. Various estimation methods, including 2SLS, OGMM, BGMM and MLE, are employed. In the estimation, these spatial weights matrices are designed to have zero diagonals and are row-normalized. The results are presented in Table 2. We observe that the estimated rate of convergence is approximately 5% under various specifications of W n . Similar observations are found using different estimation methods.
Notably, most growth regressions find estimated convergence rates of approximately 2% per year. Although the rate of convergence can be as high as 10% in Caselli, Esquivel and Lefort (1996) and 15% and 5% in Evans (1996Evans ( , 1997, the estimates generally range between 1% and 3%. We believe that the estimated rate of convergence of 5% in the current illustration is due to (i) the spatial model approach to growth convergence, which is theoretically higher than the classical growth model (see Ertur and Koch 2007;Yu and Lee 2012), and (ii) the specification of trade flows as the spatial weights matrix, which is more precise and more relevant for the possible technology via traded goods.
As a robustness check, we also present the results obtained when the regressor is constructed from ln y t 0 = (ln y 1995 + ln y 1996 + · · · + ln y 1999 )/5 and ln y t 0 +T = (ln y 2015 + ln y 2016 + · · ·+ln y 2019 )/5. The corresponding studies are presented in the right block in Table 2. The basic results are similar, but the rate of convergence is slightly higher than that in the case with the data constructed from discrete data. To investigate whether the constructions of the spatial weights matrix from intermediate goods or final goods have different implications, we conduct the regression where the relevant W n 's are constructed from final goods. As shown in the supplementary materials, the estimated spatial effect is smaller, but the coefficient of the initial value is nearly the same under various situations. This finding implies that the growth spillover effect relies more on the trade of intermediate goods than of final goods.

Conclusion
Spatial econometrics have assumed that the UB property of column sums is relevant for spatial weights matrix for a long time. This UB assumption excludes many interesting empirical applications in which dominant (popular) units might be present, such as large countries in international trade and superstars in social networks. Pesaran and Yang (2020b) derived a CLT with dominant units in which the corresponding column sums can be large. However, it is assumed that the number of unbounded column sums is finite, and their column sums cannot exceed O(n δ ) with 0 ≤ δ < 1/2. With a higher moment restriction, we can allow the column sum order to be O(n δ ) with 0 ≤ δ < 1, and the number of unbounded column sums can be large. QML and GMM estimation procedures are investigated, and we show their asymptotic properties. Under the dominant unit setting, the rate of convergence of QML and BGMM estimators can still be √ n -consistent and asymptotically normal mainly due to the row sum boundedness of relevant matrices.

Supplementary Materials
Supplementary materials contain algebra for the CLT, proof for theorems, and additional simulation and empirical results.