Unified M-estimation of matrix exponential spatial dynamic panel specification

Abstract In this paper, a unified M-estimation method in Yang (2018) is extended to the matrix exponential spatial dynamic panel specification (MESDPS) with fixed effects in short panels. Similar to the STLE model which includes the spatial lag effect, the space-time effect and the spatial error effect in Yang (2018), the quasi-maximum likelihood (QML) estimation for MESDPS also has the initial condition specification problem. The initial-condition free M-estimator in this paper solves this problem and is proved to be consistent and asymptotically normal. An outer product of martingale difference (OPMD) estimator for the variance-covariance (VC) matrix of the M-estimator is also derived and proved to be consistent. The finite sample property of the M-estimator is studied through an extensive Monte Carlo study. The method is applied to US outward FDI data to show its validity.


Introduction
Dynamic panel data (DPD) models are important elements in Economics literature. Spatial dependence can be incorporated into DPD models to discuss topics in applied economics like regional markets (Keller and Shiue, 2007), labor economics (Foote, 2007) and technological interdependence (Ertur and Koch, 2007). The resulting spatial dynamic panel data (SDPD) models have gained much attention. Some papers (Lee andYu, 2010c, 2013;Xu and Lee, 2019) provide excellent surveys on these models.
Depending on the type of dynamic features allowed in the SDPD model, four categories can be generated (Anselin et al., 2008;Lee and Yu, 2010c): "pure space recursive" with only a spatial time lag, "time-space recursive" with an individual time lag and a spatial time lag, "time-space simultaneous" with an individual time lag and a contemporaneous spatial lag and "time-space dynamic" with all forms of lags. Recent studies have also used the terminology, weak and strong spatial dependence, to refer to the regression models that have spatial lag terms and interactive fixed effects respectively (Chudik et al., 2011;Kuersteiner and Prucha, 2020;Shi and Lee, 2017). While most of the literature in the SDPD models focus on the long panel setting with a large time period T (Anselin et al., 2001;Lee and Yu, 2010a;Yu et al., 2008;Yu and Lee, 2010), the setup with a large cross-sectional unit n and a small time period T, named short panels, has also gained more interest recently. Some papers discuss the likelihood based estimators for short panels (Elhorst, 2010;Su and Yang, 2015;Yang, 2018).
However there is one difficulty with the quasi-maximum likelihood (QML) estimation for the short panel SDPD model: the "initial condition" problem (Hsiao et al., 2002). The first observation for the first differenced data is endogenous in models with fixed effects, no matter whether the initial observation is endogenous or exogenous. To solve this problem, the traditional way is to use the predicted value obtained from the values of regressors (Elhorst, 2010;Hsiao et al., 2002;Su and Yang, 2015). But this method has its disadvantages. First process starting time is unknown and the time-varying regressors need to be trend or first-difference stationary. Second, the method cannot be applied to the SDPD models with spatial lags (SL) because the initial difference contains spatial effect in the exogenous part when expanded using backward substitutions. To deal with this problem, Yang (2018) proposes an initial-condition free M-estimator for the STLE model that includes a dynamic effect, a spatial lag (SL), a space-time effect (STL) and a spatial error (SE). The estimator is derived from a set of estimating equations based on the unbiased adjusted quasi score (AQS) functions and is consistent and asymptotically normal. He also proposes an outer product of martingale difference (OPMD) estimator for the variancecovariance (VC) matrix of the M-estimator and proves that it is consistent.
The matrix exponential spatial specification (MESS) is first proposed by LeSage and Pace (2007). They introduce the cross-sectional MESS and show that it has advantages over the traditional spatial autoregressive (SAR) models: a simpler log-likelihood function without the Jacobian matrix and an unrestricted parameter space for its spatial coefficients. Debarsy et al. (2015) derive the QML estimator and the GMM estimator of the MESS in cross-sectional setting and explore their large sample properties. Similar to the SPD models, MESS can be extended to the panel data models (Figueiredo and Da Silva, 2015;LeSage and Chih, 2018;Zhang et al., 2019). There has been a growing interest in recent literature in using MESS to explore various topics such as technological spillover (LeSage and Pace, 2000), housing price (LeSage and Pace, 2004), third-country effect on FDI (Debarsy et al., 2015), cigarette demands (Figueiredo and Da Silva, 2015), wage rates (LeSage and Chih, 2018) and geographical spillover (Zhang et al., 2019).
In this paper, the M-estimation method in Yang (2018) is extended to the matrix exponential spatial dynamic panel specification (MESDPS) with fixed effect in short panels, which assumes large n and small T and is typical for most real world datasets. Similar to the STLE model, the MESDPS also suffers from the "initial condition" problem. As discussed above, the traditional way of solving this problem, which is to use the predicted value derived from the values of the regressors, does not provide a satisfactory solution. A consistent way to estimate the coefficients and its VC matrix is needed. We first derive a set of conditional quasi score (CQS) functions treating the initial differences as exogenous, even if they are not. Then we modify these score functions to get the adjusted quasi score (AQS) functions which are unbiased. The M-estimator thus is derived by setting the AQS functions equaling to zero. To get a consistent estimate for the VC matrix of the M-estimator, a martingale difference (M.D.) of the AQS functions at the true value is established. The average of the outer product of M.D. (OPMD) is shown to generate a consistent estimate of the VC matrix when being substituted into the "sandwich" estimate of it, which is referred to as the OPMD estimator. In Monte Carlo simulations six types of submodels, MESDPS(1,1,1), MESDPS(1,1,0), MESDPS(1,0,1), MESDPS(0,1,1), MESDPS(1,0,0) and MESDPS(0,1,0) are estimated, where 1's denote the MESS in the dependent variable, the lagged dependent variable and the disturbances respectively. The results show that the M-estimator has good finite sample properties and is robust to the way the initial observation is generated, which implies that it solves the "initial condition" problem. The OPMD estimator of the VC matrix generates asymptotic standard errors that's much closer to the true standard deviation than the other candidates, especially when the disturbance is non-normal, emphasizing its importance in research when the normality of disturbances is in doubt. MESDPS(1,1,1) is applied to US outward FDI data to examine the validity of the model. The estimation results for the STLE model that includes the spatial lag, the space-time effect and the spatial error from Yang (2018) are also reported to emphasize the relation for the spatial coefficients of these two models.
The contribution of this paper is two-fold. First the unified M-estimation method is extended to MESDPS. The unified M-estimation is designed for the STLE model in Yang (2018). The MESDPS and the STLE model are non-nested. So it remains to be explored whether the M-estimation method designed for the STLE model can be extended to the MESDPS. Second, to our best knowledge, this is the first paper to consider MESS in a dynamic panel setting. Previous literature (Figueiredo and Da Silva, 2015;LeSage and Chih, 2018;Zhang et al., 2019) study the MESS in a panel data model. As mentioned previously, the "initial condition" problem remains when the spatial effects in the dynamic panel data model are in forms of the MESS, so consistent estimators for the coefficients and corresponding standard errors need to be designed, which is accomplished in this paper.
The rest of the paper is organized as follows. Section 2 introduces the M-estimation method. Section 3 presents the asymptotic distribution of the M-estimator and introduces the OPMD estimator of its VC matrix. Section 4 presents Monte Carlo simulation results. Section 5 applies the model to US outward FDI. Section 6 concludes. All technical parts and proofs are provided in the online appendix which is available through the journal webpage.

M-estimation of matrix exponential spatial dynamic panel specification
In this section we first discuss the literature that incorporates the MESS in the panel data setting. Although these papers discuss the panel data instead of the dynamic panel data, we include them in the review to underline the importance of our study, i.e., MESDPS has not been explored in the previous literature. The M-estimator and the OPMD estimator thus provide researchers who want to work with the MESDPS a reliable method to estimate the parameters and conduct inference. In the second subsection we present the M-estimation in MESDPS(1,1,1) in short panel. Short panel assumes large n and small T, which is typical for most real world datasets. M-estimation first formulates a set of conditional quasi score (CQS) functions assuming that the initial difference is exogenous, and then modifies it to get a set of adjusted quasi score (AQS) functions which result in consistent parameter estimates.

Matrix exponential spatial dynamic panel specification
The matrix exponential spatial dynamic panel specification with fixed effects is given by where y t is an n Â 1 vector of observations on the dependent variable; W r for r ¼ 1, 2, 3 are three n Â n spatial weight matrices, with corresponding spatial coefficients a r capturing the MESS in the dependent variable, the lagged dependent variable and the disturbances respectively; y tÀ1 is the lagged vector of y t with coefficient s capturing the dynamic effect; X t is an n Â k matrix of time-varying exogenous variables with corresponding coefficient vector b; Z is an n Â p matrix of time-invariant exogenous variables, which might include the intercept, with corresponding coefficient vector c 1 ; l is an n Â 1 vector of unobserved individual fixed effects; k t is the time fixed effects; l n is an n Â 1 vector of 1; and t is a vector of disturbances independent and identically distributed across i and t with mean zero and variance r 2 : The matrix exponential e a r W r is defined as P 1 j¼0 a j r W j r j! for r ¼ 1, 2 and 3 and is always invertible with inverse e Àa r W r (Chiu et al., 1996). The reduced form of the model is given by y t ¼ e Àa 1 W 1 ðsI n þ e a 2 W 2 Þy tÀ1 þ 1 As kindly pointed out by a referee, since the estimation approach is based on the first difference of the model, we cannot estimate the parameters of time-invariant variables. e Àa 1 W 1 ðX t b þ Zc þ l þ k t l n þ e Àa 3 W 3 t ). The model is stationary if all eigenvalues of e Àa 1 W 1 ðsI n þ e a 2 W 2 Þ lie inside the unit circle (Proposition 10.1, Hamilton 1994).
The specification in (2.1) is comprehensive. It incorporates different submodels by setting the spatial coefficients a r ¼ 0 for r ¼ 1, 2 or 3. By setting a 2 ¼ 0, we have MESDPS(1,0,1) with MESS in the dependent variable and the disturbances: Without ðs þ 1Þy tÀ1 , Zc and merging k t l n into X t b, Zhang et al. (2019) study the QML estimation of (2.2) in panel data setting. They allow large n and small or large T and establish the consistency and asymptotic normality under unknown heteroskedasticity when the spatial weight matrices in MESS for y t and u t are commutable, i.e., W 1 W 3 ¼ W 3 W 1 : 2 By setting a 2 ¼ 0 and a 3 ¼ 0, we get MESDPS(1,0,0): 3) without ðs þ 1Þy tÀ1 and Zc: They use the deviation from mean operator to get rid of the individual and time fixed effects and present the ML estimation of the transformed model. This approach, however, results in linearly dependent disturbance terms after transformation. Instead, we can pre-and post-multiply the model by the orthonormal eigenvector matrix of the individual and time mean deviation operators respectively (Lee and Yu, 2010b).
The literature above incorporate MESS into a panel data model. To the best of our knowledge, MESS in a dynamic panel setting has not been studied in the previous literature. The M-estimation proposed in this paper provides consistent and asymptotically normal estimates. The OPMD estimator for the VC matrix is also consistent and provides good finite sample properties. The method is useful for those who want to utilize the MESDPS in empirical research.

The M-estimation of MESDPS
Different from the geometrical decay in the STLE model, (2.1) has an exponential decay. It also has a simpler quasi log-likelihood function without the Jacobian of the transformation. However, they suffer from the "initial condition" problem discussed below.
The comprehensive model in Yang (2018) that includes the spatial lag (SL), the space-time lag (STL) and the spatial error (SE) is denoted as the STLE model. Consider the STLE model given by The log-likelihood function (2.7) and the concentrated log-likelihood function (2.10) are simpler without the Jacobian log jB 1 ðk 1 Þj where B 1 ðk 1 Þ ¼ I TÀ1 B 1 ðk 1 Þ and B 1 ðk 1 Þ ¼ I n À k 1 W 1 : It makes the MESDPS computationally easier, especially for large sample sizes. A correspondence of relation for the parameters also exists for the MESDPS and the STLE model. Consider (2.1), assume the spatial weight matrix is row-normalized and a shock @x tk is applied to all spatial units on the kth independent variable X tk , so that the new variable becomes X tk þ l n @x tk : Then the contemporaneous total impact for the MESDPS is given by @y t ¼ e Àa 1 W 1 l n @x tk b k , so the average contemporaneous total impact is 1 n l 0 n @y t ¼ e Àa 1 @x tk b k : Similarly for the STLE model, the average contemporaneous total impact is given by 1 1Àk 1 @x tk b k : Equating them gives us the relation k 1 ¼ 1 À e a 1 : For y tÀ1 , a shock @ tÀ1 leads to the average total impact e Àa 1 ðs þ e a 2 Þ@ tÀ1 for the MESDPS and qþk 2 1Àk 1 @ tÀ1 for the STLE model. So s þ e a 2 ¼ q þ k 2 : Setting a 2 ¼ 0 and k 2 ¼ 0 gives us q ¼ s þ 1, which implies k 2 ¼ e a 2 À 1: On contrary to the negative relation between a 1 and k 1 , the relation between a 2 and k 2 is positive. When À1 < k 2 < 0, a 2 also takes negative values and vice versa.
The CQML estimatorh ¼ ðb 0 ,r 2 ,s,ã 0 Þ 0 derived above encounters a bias when T is small as shown below. We simplify the notation by denoting R ¼ Rða 3 Þ and R 0 ¼ Rða 30 Þ: Using the simplified log-likelihood function in (2.7), the conditional quasi score (CQS) function SðhÞ ¼ @'ðhÞ @h is derived as We will show that the s, a 1 and a 2 elements of the CQS function (2.11) are biased, meaning their expected values are nonzeros at the true parameter values, leading to the inconsistency of the CQML estimator. First let's make Assumption 1 below.
Assumption 1. For model (2.1), (i) the processes started m periods before the start of data collection, the 0th period, Assumption 1 is the same as the Assumption A in Yang (2018). Compared with the assumptions in previous literature (Elhorst, 2010;Hsiao et al., 2002;Su and Yang, 2015), Assumption 1 requires minimum information about the past processes. It does not require the time-varying regressors to be trend or first-difference stationary. This is one of the advantages of M-estimation, i.e., some restrictive assumptions on the initial values and initial differences are removed. Denote A 21, 0 ¼ A 20 e Àa 10 W 1 : The following lemma is necessary to compute the bias of the CQS function.
21, 0 ðA 21, 0 À I n Þ 2 ::: and D 0 ¼ A 21, 0 À 2I n I n ::: ::: 0 ðA 21, 0 À I n Þ 2 A 21, 0 À 2I n . . . ::: 21, 0 ðA 21, 0 À I n Þ 2 ::: ::: Here we used the fact that it is i.i.d. across i and t, and that e a r0 W r is always invertible for r ¼ 1 and 3. Using Lemma 2.1, we have where W 21, 0 ¼ W 2 e a 20 W 2 e Àa 10 W 1 and B ¼ B I n : These equations imply that Eð @'ðhÞ @s Þ, Eð @'ðhÞ @a 1 Þ and Eð @'ðhÞ @a 2 Þ are nonzero, making s, a 1 and a 2 elements of the CQS function (2.11) biased. The set of CQS functions are estimating functions for the CQML estimator. The consistency of an M-estimator requires that the estimating functions need to have a probability limit of zero at the true parameter values, i.e., plim n!1 1 nT Sðh 0 Þ ¼ 0 (Vaart, 2000). However Lemma 2.1 implies that it does not hold for the CQML estimator. Typically Eð @'ðhÞ @s Þ, Eð @'ðhÞ @a 1 Þ and Eð @'ðhÞ @a 2 Þ are of order n, which implies E½ The bias thus does not vanish in short panels when T is fixed. The bias vanishes when n T ! 0, which refers to a long panel and is not of interest in our study. So the CQML estimation fails to produce consistent estimates.
To have a set of unbiased estimating functions, we modify the CQS functions in (2.11) to get the adjusted quasi score (AQS) functions: The M-estimator derived from the AQS functions is consistent and asymptotically normal, which will be shown in the next section. It is interesting to compare the AQS functions with those for the STLE model in Yang (2018). First the bias term trðD À1 B À1 e Àa 1 W 1 ) in the s element has similar format with that for the q element 3 in the STLE model (with SAR process being replaced by MESS). This means that while the inherent spatial processes are different, the format of the bias that comes from the dynamic effect is not affected by the nature of the spatial structure. Second thing to note is that, similar to the STLE model, the adjustments in the AQS functions are free from MESS in the disturbance term, i.e., e a 3 W 3 does not appear in the trace terms. This implies that the AQS adjustments will not change if MESS in the disturbance term changes to other forms of spatial relationship, e.g., higher order MESS, autoregressive, moving average, etc. Third the adjustments modify the estimation of s, a 1 and a 2 so that they become nonlinear.
To derive the M-estimator, we first solve for the constrained M-estimators of b and r 2 , given f ¼ ðs, a 0 Þ 0 , asb where DûðfÞ ¼ e a 1 W 1 DY À A 2 DY À1 À DXb M ðfÞ: Thenb M ðfÞ andr 2 , M ðfÞ are substituted back into the other four elements of the AQS function S Ã ðhÞ to get the concentrated AQS function: DûðfÞ 0 ðB À1 E 3 ÞDûðfÞ: to derive unconstrained ones, i.e.,bðfÞ ¼b M ðfÞ andr 2 ðfÞ ¼r 2 , M ðfÞ, which are from (2, 3) and (2.17) respectively. The advantage of M-estimation comes from the AQS function (2.15). It adjusts the estimation functions so that they become unbiased. For the CQML estimation, the estimatorsbðfÞ andr 2 ðfÞ are biased because of the spillover from the bias of the estimatorf when being substituted into (2.8) and (2.9).

Asymptotic properties of the M-estimator
In this section we explore the asymptotic properties of the M-estimator. We first prove it is consistent and then derive its asymptotic distribution. To facilitate valid inference, an OPMD estimator of the VC matrix is also proposed. Valid inference can thus be based on the standard errors implied by the OPMD estimator of the VC matrix.

Consistency of the M-estimator
To prove the consistency and to later derive the asymptotic distribution of the M-estimator, we first make some regularity assumptions. Let C n be an n Â n matrix. Then C 0 n , trðC n Þ, C n j j, jC n j jj, c min ðC n Þ and c max ðC n Þ denote the transpose, trace, determinant, Euclidean norm, the smallest and largest eigenvalues of C n respectively.
Assumption 2. Matrices fW 1 g, fW 2 g and fW 3 g are bounded in both row and column sum norms. The diagonal elements of W 1 , W 2 and W 3 are zeroes.
Assumption 3. The time-varying regressors fX t , t ¼ 1, :::, Tg are exogenous with uniformly bounded elements and have full column rank. Also lim n!1 1 nT DX 0 DX exists and is nonsingular.
Assumption 4. There exists a constant d > 0 such that a r j j d for r ¼ 1, 2 and 3, and the true f 0 is in the interior of the parameter space Z. Also there exist a lower bound c a r and an upper bound c a r such that 0 < c a r inf a r 2Z r c min ðe a r W 0 r e a r W r Þ sup a r 2Z r c max ðe a r W 0 r e a r W r Þ c a r < 1 for r ¼ 1, 2 and 3.
Assumption 5. The f it g are i.i.d. with mean zero and variance r 2 , and Ej it j 4þa exists for some a > 0.
Assumption 6. For an n Â n matrix C n uniformly bounded in row and column sums with elements of uniform order g À1 n , and an n Â 1 vector c n with elements of uniform order g À1=2 n , (i) g n n Dy 0 1 C n Dy 1 ¼ O p ð1Þ and g n n Dy 0 1 C n D 2 ¼ O p ð1Þ; (ii) g n n ½Dy 1 À EðDy 1 Þ 0 c n ¼ o p ð1Þ; (iii) g n n ½Dy 0 1 C n Dy 1 À EðDy 0 1 C n Dy 1 Þ ¼ o p ð1Þ; (iv) g n n ½Dy 0 1 C n D 2 À EðDy 0 1 C n D 2 Þ ¼ o p ð1Þ: Assumptions 2-5 are standard in the literature (see, e.g., Lee 2004;Debarsy et al. 2015). Assumption 6 is the same as Assumption F in Yang (2018). It imposes some mild conditions on the initial difference Dy 1 which will be used in the later proofs.
First note that the consistency ofĥ M ¼ ðb To prove the consistency off M , we first define the population counterpart of the AQS function as: Similar to the process of deriving the M-estimator, we can first solve for b M ðfÞ and r 2 , M ðfÞ as: 2 DXðDX 0 R À1 DXÞ À1 DX 0 R À 1 2 and M ¼ I nðTÀ1Þ À P: Then we have The expression will be useful in deriving r 2 , M ðfÞ in (3.3) in the proof for Theorem 3.1 below. Theorem 3.1. Suppose Assumptions 1-7 hold and further the following condition 0 < c DY inf f2Z c min ½Varðe a 1 W 1 DY À A 2 DY À1 Þ sup f2Z c max ½Varðe a 1 W 1 DY À A 2 DY À1 Þ c DY < 1, we haveĥ M ! p h 0 as n ! 1:

Asymptotic distribution of the M-estimator
To derive the asymptotic distribution ofĥ M , we apply the mean value theorem (MVT) to S Ã ðh 0 Þ is asymptotically normal. One thing to note here is that Dy 1 might not be exogenous and is unspecified, so the regular law of large numbers (LLN) and central limit theorem (CLT) for linear-quadratic forms from Kelejian and Prucha (2001) is not sufficient. Instead we use the extended LLN and CLT for bilinear-quadratic forms from Yang (2018) and Su and Yang (2015), which are listed in Lemmas A.3 and A.4 in the online appendix. The following lemma that expresses DY and DY À1 in a convenient format will be crucial in deriving the asymptotic distribution and later a consistent estimate of the VC matrix. Let blkdiagðC 1 , :::, C n Þ be the block diagonal matrix with diagonal n Â n matrices C 1 , :::, C n : Denote A 12, 0 ¼ e Àa 10 W 1 A 20 : Lemma 3.1. Under Assumptions 1, 3 and 5, where Dy 1 ¼ I TÀ1 Dy 1 , G ¼ blkdiag½A 12, 0 , ðA 12, 0 Þ 2 , :::, ðA 12, 0 Þ TÀ1 , G À1 ¼ blkdiag½I n , A 12, 0 , :::, By substituting (3.6) and (3.7) into s, a 1 and a 2 elements and substituting Du ¼ e Àa 30 W 3 D into the b, r 2 and a 3 elements of the AQS function (2.15) at the true value h 0 , we get ðB À1 e a 30 W 3 ÞW 1 e a 1 W 1 d, Using S Ã ðh 0 Þ in (3.8), we can derive the expected score and the variance of the AQS function at the true value to get the asymptotic distribution of the M-estimator.
Theorem 3.2. Suppose assumptions of Theorem 3.1 hold, we have, as n ! 1, are assumed to exist and W Ã ðh 0 Þ is assumed to be positive definite for sufficiently large n.

The OPMD estimator of VC matrix
In this section we derive a feasible estimator for the VC matrix W ÃÀ1 ðh 0 ÞX Ã ðh 0 ÞW ÃÀ1 ðh 0 Þ: Denote the Hessian matrix by H Ã ðhÞ ¼ @S Ã ðhÞ @h 0 : Then a consistent estimate of W Ã ðh 0 Þ is easily derived by substituting the consistent M-estimates in, i.e., W Ã ðĥ M Þ ¼ À 1 nðTÀ1Þ H Ã ðĥ M Þ: The detailed expression of W Ã ðĥ M Þ and the proof for the consistency of it are provided in the proof of Theorem 3.2 in the online appendix.
For X Ã ðh 0 Þ, however, this method does not work. This is because from (3.8) we know that s, a 1 and a 2 elements of S Ã ðh 0 Þ contain the initial difference Dy 1 , which is unspecified. So we need to design a method that is free from the initial condition. Following Yang (2018), we propose an outer product of martingale difference (OPMD) method to consistently estimate X Ã ðh 0 Þ: The OPMD method first transforms S Ã ðh 0 Þ in (3.8) into a sum of vector martingale difference sequence (MDS). Specifically, we will write R 0 r D, D 0 O r D À EðD 0 O r DÞ and D 0 F r Dy 1 À EðD 0 F r Dy 1 Þ for suitable r as sums of MDS. The transformation enables us to write X Ã ðh 0 Þ, which is the variance of the outer product of the sum of elements of a vector MDS, as the expected outer product of the elements of MDS because MDS has mean zero and the terms in the sum are independent (See 3.14 below). Then the averaged sum of the outer product of elements of the estimated vector MDS can be computed to be a consistent estimate of X Ã ðh 0 Þ: For a square matrix A ¼ A u þ A l þ A d , let A u , A l and A d be the upper-triangular, lower-triangular and diagonal matrix of A respectively. In the following we suppress the subscripts in R r , O r and F r for suitable r to simplify notations. Let R t be the n Â k submatrix or n Â 1 subvector of R, where R could be a nðT À 1Þ Â K matrix ðR 1 Þ or nðT À 1Þ Â 1 vector ðR 2 , R 3 and R 4 ). Let O ts and F ts be the n Â n submatrix of nðT À 1Þ Â nðT À 1Þ matrix O and F respectively. Note R t , O ts and F ts are partitioned by t, s ¼ 2, :::, T: Define F þ t ¼ P T s¼2 F ts , for t ¼ 2, :::, T, F þþ 2 ¼ F þ 2 e Àa 10 W 1 e a 30 W 3 , Dy ᭛ 1 ¼ e a 3 W 3 e a 10 W 1 Dy 1 , Dn ¼ ðF þþu Dy 1 : Let d it be the itth diagonal element of BO, where B ¼ B I n is defined after (2.14) in section 2.2. Let fP n, i g be the increasing sequence of r-fields generated by f j1 , :::, jT , j ¼ 1, :::, ig, i ¼ 1, :::, n, n ! 1: Let U n, 0 be the r-field generated by f 0 , Dy 0 g: Define U n, i ¼ U n, 0 P n, i as the r-field on the Cartesian product generated by subset of the form / n, 0 Â p n, i , where / n, 0 2 U n, 0 and p n, i 2 P n, i : We show in the following lemma that S Ã ðh 0 Þ can be written as sums of vector MDS.

12)
and fða 0 1i , a 2i , a 3i Þ 0 , U n, i g n i¼1 forms a vector MDS. Now using Lemma 3.2, for each R r , define a 1ri ¼ P T t¼2 R 0 rit D it for r ¼ 1, 2, 3 and 4; for each O r , define a 2ri ¼ P T t¼2 ðD it Dg rit þ D it D Ã rit À r 2 0 d rit Þ for r ¼ 1, 2, 3, 4 and 5; for each F r , define for r ¼ 1, 2 and 3. Then we can construct a vector a i ¼ ða 0 11i , a 21i , a 31i þ a 12i þ a 22i , À a 32i À a 13i À a 23i , a 33i þ a 14i þ a 24i , a 25i Þ 0 : Here for the first element EðR 0 1 DÞ ¼ 0: For the second element EðD 0 O 1 DÞ ¼ nðTÀ1Þ (3.13) Since Eða i jU n, iÀ1 Þ ¼ 0, fa i , U n, i g form a vector MDS. Together with (3.13), we thus have (3.14) A consistent estimator of X Ã ðh 0 Þ is then given byX  (3.15) and thus The M-estimator and the OPMD estimator of the VC matrix subsume submodels that contain MESS in the dependent variable, the lagged dependent variable and/or the disturbances. Their formats are derived in the online appendix. Different submodels are also explored in the Monte Carlo simulations in the next section.

Monte Carlo simulation
To fully investigate the performance of the M-estimator and the OPMD-based standard error, we establish the following models in the Monte Carlo simulation.
MESDPSð0, 1, 0Þ: The elements of X t is drawn from N(0, 4). Elements of Z and l are drawn from U(0, 1) and N(0, 1) respectively. The spatial weight matrices are based on rook and queen contiguity. To this end, n spatial units are randomly allocated into ffiffiffi n p Â ffiffiffi n p square lattice graph. In the rook contiguity case, w ij ¼ 1 if the jth observation is adjacent (left/right/above or below) to the ith observation on the graph. In the queen contiguity case, w ij ¼ 1 if the jth observation is adjacent to, or shares a border with the ith observation. The weights matrices are then row normalized. Three specifications of the disturbances t are generated: (i) normal, (ii) normal mixture (10% Nð0, 5 2 Þ and 90% N(0, 1)), (iii) standardized gamma (2, 1). Both (ii) and (iii) are standardized to have the same mean and variance with (i). Four sample sizes are considered, corresponding to n ¼ ð49, 100Þ and T ¼ ð3, 7Þ: The values of parameters are b 0 ¼ 10, b 1 ¼ 1, c ¼ 1 and r 2 ¼ 1: For q and a r , r ¼ 1, 2, 3, we select from a set of values ðÀ1:5, À 1:1, À 0:5, À 0:1, 0, 0:5, 1:1, 1:5Þ in different submodels. Each experiment is replicated 1000 times. To compare the performance of the OPMD estimator, we report the empirical standard deviations (sd), OPMD-based standard errors (se), standard errors based onX ÃÀ1 (se) and standard errors based on W ÃÀ1 ðĥ M Þ (b se). Better performance is represented by closer approximation to sd. We only show the results from the full model MESDPS(1,1,1) in the main paper and put the rest of estimated results in the online appendix. Table 1 presents results for the empirical means of the CQMLE and the M-estimator and Table 2 presents the empirical standard deviations and the standard errors for MESDPS (1,1,1). For the empirical means in Table 1, the M-estimator provides closer results to the true values of parameters than the CQMLE in most cases. It gives nearly unbiased results in many cases. For example, when n ¼ 49, T ¼ 3 and ðb 0 , r 2 0 , s 0 , a 10 , a 20 , a 30 Þ ¼ ð1, 1, 0:5, 1:1, 1:1, 1:1Þ, the CQMLE are ð0:9606, 0:8833, 0:4102, 1:0856, 1:1123, 1:1516Þ respectively, leading to the biases of ðÀ0:0394, À 0:1167, À 0:0898, À 0:0144, 0:0123, 0:0516Þ: On the other hand, the M-estimators are ð1:0016, 0:9446, 0:5024, 1:1060, 1:1053, 1:1726Þ respectively, with the biases ð0:0016, À 0:0554, 0:0024, 0:0060, 0:0053, 0:0726Þ: The M-estimator thus provides better results than the CQMLE except for a 3 . For b, s, a 1 and a 2 , the M-estimates are nearly unbiased. For a 3 , the bias of the M-estimator is relatively larger than that of the CQMLE. When n increases to 100, the biases of the CQMLE do not vanish for b, s, a 1 : But when T grows bigger, the biases of the CQMLE are getting smaller. On the other hand, the M-estimators remain unbiased for all n and T, while the bias of a 3 also vanishes as n and T grows bigger. For example, when n ¼ 49 and T ¼ 7, the biases of the CQMLE reduce to ð0:0002, À 0:0176, À 0:0034, 0:0034, 0:0051, 0:0125Þ and the biases of the M-estimator remains small at ð0:0009, À 0:0128, 0:0000, 0:0004, 0:0005, 0:0208Þ: The rational choice of n and T means that the M-estimator is useful in many real-world applications. It brings Note: Disturbance 1 ¼ normal, 2 ¼ normal-mixture and 3 ¼ gamma. Parameters h ¼ ðb, r 2 , s, a 1 , a 2 , a 3 Þ 0 : W 1 , W 2 and W 3 are generated by queen, rook and queen contiguity respectively.
nearly unbiased results for studies with short panels. For the standard errors in Table 2, the OPMD estimator has good performance, exhibiting much closer approximation to the empirical sd than the other two candidates in most cases. The OPMD estimator stays close to the empirical sd for most parameters under all n and T. Paying specific attention to s under disturbance that Table 2. Empirical sd and asymptotic standard errors of M-estimator, MESDPS(1,1,1). follows gamma distribution, we find that the OPMD estimator gives especially better performance than the other two candidates of se. For example, when ðb, r 2 , s, a 1 , a 2 , a 3 Þ ¼ ð1, 1, 0:5, 1:1, 1:1, 1:1Þ and n ¼ 49, T ¼ 3, under the gamma disturbances, sd for r 2 is 0.218. The OPMD based se ¼ 0:196: The other two candidates have estimatesse ¼ 0:139 and b se ¼ 0:149: We can see that se is much closer to sd than the other two candidates. This highlights the importance of conducting inference using the OPMD estimator when the normality of the disturbance is in doubt. Overall the M-estimator and the OPMD-based estimator for the standard error provide unbiased estimates and exhibit good finite sample properties.
The estimation results for other submodels are provided in the online appendix. The main conclusion does not change. The CQMLE is biased while the M-estimator provides nearly unbiased estimates in most cases, regardless of n and T. The OPMD estimator for the VC matrix provides closer approximation to sd in most cases than the other two candidates, especially when the disturbance is non-normal.

Empirical application to US outward FDI
In this section we apply the M-estimation method to US foreign direct investment to explore its usefulness. Recent literature explore third market as a determinant of bilateral FDI. Coughlin and Segev (2000) is the first paper to study FDI using spatial econometrics. They find a positive spatial lag (SL) and spatial error (SE) effect for China's inward FDI for neighboring regions. Baltagi et al. (2007) use the industries and countries FDI data to explore the knowledge-capital model of US outbound FDI using generalized moments (GM) estimators. They find that the spatial coefficients are significant while evidence of various modes of FDI emerges. Blonigen et al. (2007) study the US outward FDI by including spatial lag in the model and find that the estimates of the traditional determinants of FDI are robust to the inclusion of spatial lag. They find a positive and significant spatial lag using the whole sample which suggests complex-vertical motivations for MNE activity. Garretsen and Peeters (2009) apply a spatial lag model (SLM) and spatial error model (SEM) for Dutch FDI and find positive and significant spatial effects in both. Debarsy et al. (2015) utilize a cross-sectional MESS model on Belgium's outward FDI and find evidence of pure vertical FDI. They argue that this is because Belgium has high production costs such as labor. In our study, the focus will be placed on the spatial coefficients since the dynamic nature of the model changes the situation in a significant way.
The model to be estimated is a dynamic panel framework, Here LFDI t is the log of stock of outward FDI from US to host countries in year t. FDI are US outward positions (stocks) from International Direct Investment Statistics. The independent variables are a set of host country variables which includes log of GDP (LGDP), log of population (LPOP), log of an investment risk variable (LRISK), which is found to be important in the International Finance literature, and a surrounding-market potential variable (MP). We follow Garretsen and Peeters (2009) and compute it as the distance-weighted sums of other countries' GDP in the sample where the distance is the bilateral distance between capitals from Mayer and Zignago (2011). GDP and population data are extracted from the World Bank's World Development Indicators (WDI). Risk is the inverse of an investment profile index from International Country Risk Guide. We also add a time trend /tl n to capture the time-series variation. Table 4 contains the summary statistics of these variables. The spatial weight matrix is an inverse arc-distance between capitals of host countries. Similar to Blonigen et al. (2007), we multiply the weights by the shortest distance between capitals (80.98 km between capitals of Estonia and Finland). The same spatial weight matrix will be applied to all spatial processes. As discussed in section 2.2, there exists a relation between the spatial coefficients in STLE model and MESDPS, 4 i.e., k 1 ¼ 1 À e a 1 , q ¼ s þ 1 and k 2 ¼ e a 2 À 1: The M-estimation results of STLE specification is the comprehensive model which contains the spatial lag effect, dynamic effect, space-time effect and spatial error effect. It corresponds to our MESDPS(1,1,1). See section 2.2 for the detailed model specification.
corresponding models in Yang (2018) are thus also reported to highlight the relation in interpretations of the two methods. Table 5 summarizes the estimation results. We run two specifications: the STLE model and MESDPS (1,1,1). The STLE models are based on Yang (2018). Both specifications contain the CQMLE and the M-estimator.
We make three important observations. First we would like to emphasize the fact that the results capture the expected relation between spatial coefficients. In Table 5, the coefficient estimate for dynamic effects of the CQMLE for the STLE model is 0.4756 and for MESDPS(1,1,1) is À0.5619. For the M-estimator they are 0.7038 and À0.2911 respectively. They satisfy the relation q ¼ s þ 1: For W 1 , which represents the spatial lag in the STLE model and MESS in MESDPS(1,1,1) for the dependent variables, we find that the signs of the CQMLE of coefficients are positive and negative respectively. For the CQMLE, the STLE model has a coefficient of 0.3887 and MESDPS(1,1,1) has a coefficient of À0.4772. On the other hand, for the M-estimators the coefficient estimates are À0.1818 and 0.2569 respectively. These are in line with the relation k 1 ¼ 1 À e a 1 : For W 2 , we find that the coefficient estimates have the same signs. The CQMLE is À0.2388 for the STLE model and À0.2116 for MESDPS (1,1,1). The M-estimator has estimates 0.0574 for the STLE model and 0.1489 for the MESDPS(1,1,1). Combined with their magnitudes, the expected relation k 2 ¼ e a 2 À 1 holds. For W 3 , the CQMLE for the STLE model is À0.7770 and for MESDPS(1,1,1) is 0.7255. 5 For the M-estimator they are À0.1237 and À0.0191 respectively. Thus the results confirm our proposed relation between the coefficient estimates in the theory.
The second observation is that the inclusion of dynamic effects makes the coefficients of host country variables insignificant compared with the panel data case. In Blonigen et al. (2007) where the data from 1983 to 1998 are used, the signs for LGDP is positive and for LPOP and RISK are negative (see table 3 on p. 1315). The estimates are mostly significant in their study except MP variable. In our study, however, adding in a lagged dependent variable changes the model estimates extensively. Although the estimates (except LPOP) have the same signs with those in Blonigen et al. (2007), they are no longer significant. The sign for spatial lag of LFDI stays significant but becomes negative. The significance of coefficient estimate for LFDI tÀ1 tells us that the dynamic effect is a relatively important variable in explaining the variation in LFDI. The spatial terms are also significant in most cases.
The third observation is the difference between of the CQMLE and the M-estimator. While in most cases they have same signs in respective groups, their magnitudes differ. For example, the estimate for LGDP is 0.6958 for the CQMLE and 0.2545 for the M-estimator in the STLE model. This tells us that the M-estimator might correctly captures the impact of LGDP on LFDI. Although we do not have a reference in this field to examine its validity, the difference does tell us that we need to be careful in using the CQMLE which provide biased results. As kindly pointed out by a referee, note here 0:7255 > ln2, which implies the corresponding STLE coefficient is 1 À e 0:7255 ¼ À1:0658: This is one of the advantages of the MESDPS compared with the STLE model: the parameter space of the MESS coefficient for the disturbance term is unrestricted.
To investigate the impact measures, we compute the average direct impacts for the STLE model and MESDPS(1,1,1) and summarize them in Table 6. We can see that the CQMLE has similar average direct impacts for the STLE model and MESDPS(1,1,1) model for the 4 independent variables. This is also the case for the M-estimator. Their OPMD ses are also similar, which shows that the M-estimation method works for both MESDPS and the STLE model and provides similar impact measures.

Conclusion
In this paper we propose a consistent M-estimator to estimate the matrix exponential spatial dynamic panel specification (MESDPS) with fixed effects in short panels. To the best of our knowledge, this is the first paper to tackle this problem. The comprehensive model includes matrix exponential in the dependent variable, the lagged dependent variable and the disturbances. We also propose an OPMD estimator for the VC matrix. Valid inference can be based on the standard error derived from the OPMD estimator, especially when the normality of the disturbance is in doubt. The method can be applied to submodels and works well. The method is free from the initial condition specification and simple to use. It provides scholars a reliable way to conduct empirical research. Future research might focus on modifying the type of disturbance in the model to heteroskedastic.