Estimation of Matrix Exponential Unbalanced Panel Data Models with Fixed Effects: An Application to US Outward FDI Stock

Abstract In this article, we consider a matrix exponential unbalanced panel data model that allows for (i) spillover effects using matrix exponential terms, (ii) unobserved heterogeneity across entities and time, and (iii) potential heteroscedasticity in the error terms across entities and time. We adopt a likelihood based direct estimation approach in which we jointly estimate the common parameters and fixed effects. To ensure that our estimator has the standard large sample properties, we show how the score functions should be suitably adjusted under both homoscedasticity and heteroscedasticity. We define our suggested estimator as the root of the adjusted score functions, and therefore our approach can be called the M-estimation approach. For inference, we suggest an analytical bias correction approach involving the sample counterpart and plug-in methods to consistently estimate the variance-covariance matrix of the suggested M-estimator. Through an extensive Monte Carlo study, we show that the suggested M-estimator has good finite sample properties. In an empirical application, we use our model to investigate the third country effects on the U.S. outward foreign direct investment (FDI) stock at the industry level.


Introduction
Unbalanced (or incomplete) panel data are common occurrence in empirical analysis due to early exits, late entries, dormant periods of economic activity, early recordings, and so on.This is also the case for empirical studies using spatial panel data models where missing entities at a given time period do not generate any spillover effects to their neighbors.Researchers dealing with unbalanced panels often turn them to balanced panels by sub-setting the original panels to simplify econometric analysis.This article aims to fill the gap in available estimation and inference methods for the spatial unbalanced panel data models.We consider a general spatial specification that allows for (i) spatial dependence specified through matrix exponential terms, (ii) unobserved heterogeneity across entities and time, and (iii) potential heteroscedasticity in the error terms across entities and time.We will refer to this model as the matrix exponential unbalanced panel data model with fixed effects.
The Matrix Exponential Spatial Specification (MESS) was introduced by LeSage and Pace (2007) as an alternative to spatial autoregressive specification, and has several features that make it more convenient for estimation (Leonard and Hsu 1992;Chiu, Leonard, and Tsui 1996).In a MESS type model, the spatial dependence in the dependent variable and/or the disturbance term is formulated through a matrix exponential term of the form e αW , where α is a scalar spatial parameter, W is an n × n spatial weights matrix whose elements are O(1/h n ) uniformly, and h n is a sequence that can be bounded or divergent (Lee 2004).Therefore, the MESS imposes an exponential rate of decay for the cross-sectional dependence.Moreover, since matrix exponential terms are always invertible, there is no need to impose any restrictions on the parameter space of the spatial parameters to ensure the existence of the reduced form of the model.Finally, the likelihood based estimation of a MESS type model has the computational advantage because the likelihood function does not involve any matrix determinant terms that need to be evaluated at each iteration during estimation.Among others, see also LeSage and Pace (2009), Debarsy, Jin, and Lee (2015), and Yang, Dogan, and Taşpınar (2021).
We consider the likelihood based estimation of the matrix exponential unbalanced panel data model with fixed effects.It is well known in the literature that the Quasi Maximum Likelihood Estimator (QMLE) of the panel data regression models with entity fixed effects suffers from the incidental parameter problem when panels are short, that is, the time dimension is short (Neyman and Scott 1948).With both entity and time fixed effects present, a transformation approach that can wipe out the fixed effects from the model offers a feasible estimation approach.However, this approach does not extend to the unbalanced panel data models with time varying spatial weights matrices (Lee and Yu 2010).Also, the transformation approach requires row-normalized weights matrices, which we will not dictate in this article.Instead of a transformation approach, we adopt a likelihood-based direct approach in which we jointly estimate the common parameters and fixed effects.
In the likelihood-based direct estimation approach, we first concentrate out the fixed effects from the quasi log-likelihood function.Then, for the remaining parameters of the model, we derive their corresponding score functions and their expectations at the true parameter vector.We show that when the number of time periods is fixed, the expectations of the score functions may not tend to zero, suggesting that the probability limit of the score functions is not zero.This result suggests that the QMLE is inconsistent unless T is large.Therefore, we adjust these scores by subtracting their respective expectations from them and then use these adjusted score functions for consistent estimation.We define our suggested estimator as the root of these adjusted score functions, and therefore our approach can be called the M-estimation as suggested by Yang (2018) and Li and Yang (2021).In the case of heteroscedasticity, we also show that the QMLE may not be consistent, and therefore we adopt a similar M-estimation strategy as in the case of the homoscedastic specification by suitably adjusting the score functions so that their probability limit becomes zero under an unknown form of heteroscedasticity.
We formally establish the consistency and the asymptotic normality of the proposed M-estimator under both homoscedastic and heteroscedastic cases.The variancecovariance matrix of the proposed M-estimator takes a sandwich form.We propose an approach based on the sample counterpart and plug-in methods for consistent estimation of the variancecovariance matrix of the proposed M-estimator.We show that the expectation of the Hessian matrix can be estimated consistently by its sample counterpart, evaluated at consistent estimates, in both homoscedastic and heteroscedastic cases.In the case of the variance of the adjusted score functions, we show that the plug-in estimator can be inconsistent, and suggest an analytical bias correction in both homoscedastic and heteroscedastic cases.In an extensive Monte Carlo study, we assess the finite sample properties of the proposed estimator and the inference method.Our results attest that the proposed estimators perform satisfactorily in terms of finite sample bias and inference.
In an empirical application, we use our model to investigate the third country effects on the U.S. outward Foreign Direct Investment (FDI) stock at the industry level.In the literature, there are a few noteworthy empirical studies that employed unbalanced spatial panel data models.Among others, see Baltagi and Chang (1994), Baltagi, Song, and Jung (2001), Bai, Liao, and Yang (2015), Wansbeek and Kapteyn (1989), Antweiler (2001), Davis (2002), and Wooldridge (2019).Our unbalanced panel data cover 47 host countries across 10 industries over the period 2008-2014.Our model allows for three types of spatial interaction: (i) spatial dependence in FDI specified through a matrix exponential term (Coughlin and Segev 2000;Blonigen et al. 2007;Debarsy, Jin, and Lee 2015), (ii) spatially weighted thirdcountry determinants of FDI that are motivated by the threefactor knowledge capital model considered in Baltagi, Egger, and Pfaffermayr (2007), and (iii) spatial dependence in the error terms specified through a matrix exponential term in order to account for the transmission of shocks across host countries.Also, our model allows for the presence of an unknown form of heteroscedasticity in the error terms of the model.Our estimation results show evidence for the presence of the third country effects on the U.S. outward FDI stock.
Theoretically, our article belongs to the recent expanding literature on the specification and estimation of spatial panel data models.Our article is closely related to Lee and Yu (2010) and Meng and Yang (2021).Lee and Yu (2010) consider a direct likelihood-based estimation approach for a balanced spatial panel data model with both entity and time fixed effects, and show that the variance parameter cannot be consistently estimated when the number of time periods is fixed.Similarly, Meng and Yang (2021) consider a direct likelihood-based estimation approach for an unbalanced spatial panel data model that allows for entity and time fixed effects, as well as potential heteroscedasticity in the error terms.Following Yang (2018), they propose an M-estimator for their model and establish the large sample properties of the M-estimator.Wang and Lee (2013) consider a panel data model with random effects and spatially lagged outcome variable in which some observations on the dependent variable are missing at random.Using an imputation approach for missing observations, they propose a nonlinear least squares approach and a generalized method of moments approach for the estimation of the model.Our article differs from these papers since we specify the spatial dependence through matrix exponential terms instead of spatial autoregressive processes.
In the literature, it is well known that the Quasi Maximum Likelihood Estimator (QMLE) of spatial autoregressive models may not be consistent in the presence of an unknown form of heteroscedasticity (Lin and Lee 2010;Kelejian and Prucha 2010;Liu and Yang 2015).However, in the case of MESS type models, the QMLE remains to be consistent under an unknown form of heteroscedasticity provided that the panels are balanced, the spatial weights matrices are time invariant and commute, and the model involves only entity fixed effects (Debarsy, Jin, and Lee 2015;Zhang, Feng, and Jin 2019).We show that the QMLE of the matrix exponential unbalanced panel data model with entity and time fixed effects may not be consistent.Following Yang (2018), we suggest a heteroscedasticity robust M-estimator based on suitably adjusted score functions for estimation.Our estimator has the standard large sample properties irrespective of whether the number of time periods and h n are bounded or unbounded.
The rest of the article is organized as follows.In Section 2, we provide the specification details of the matrix exponential unbalanced panel data model with fixed effects.In Section 3, we present the details of the M-estimation methodology for the homoscedastic specification and establish the formal results for the large sample properties of the proposed estimator.In Section 4, we extend the main ideas of the M-estimation methodology to the heteroscedastic specification and establish the formal results.In Section 5, we investigate the finite sample performance of the proposed estimators through an extensive Monte Carlo study.In Section 6, we provide our empirical application on the third country effects on the U.S. outward FDI stock at the industry level.In Section 7, we offer concluding remarks.All technical details are left to a supplementary web appendix.

Model Specification
The matrix exponential unbalanced panel data model requires stacking observed units for a given time period so that the spatial dependence among these units can be accommodated.This stacking scheme is similar to Wansbeek and Kapteyn (1989) for their unbalanced panel data error component model.Then, the matrix exponential unbalanced panel data model that has the spatial dependence in the dependent variable and in the error terms can be written as (2.1) Here, y t = (y 1t , . . ., y n t t ) is the n t × 1 dependent variable and n t is the number of spatial units available at time t, W t , and M t are n t × n t spatial weights matrices that have zero diagonal elements, X t is the n t × k matrix of exogenous variables with the matching parameter vector β 0 , u t = (u 1t , . . ., u n t t ) is the n t × 1 vector of regression error terms, and t = ( 1t , . . ., n t t ) is the n t × 1 vector of idiosyncratic error terms (or innovations).
In a given time period, we assume that there can be at most n spatial entities, and we use the n × 1 vector μ 0 to denote the entity fixed effects.Thus, C t μ 0 denotes the entity fixed effects for the spatial units observed at time t, where C t is the n t × n sub-matrix obtained by deleting the rows corresponding to the missing entities at time t from the n × n identity matrix I n .
The time fixed effect at period t is denoted by λ t0 , and l n t is the n t × 1 vector of ones.The matrix exponential terms e α 0 W t and e τ 0 M t in (2.1) are defined as e α 0 W t = ∞ i=0 (α 0 W t ) i /i! and e τ 0 M t = ∞ i=0 (τ 0 M t ) i /i!, where α 0 and τ 0 are the scalar spatial parameters.These terms are always invertable, and the respective inverses are given by e −α 0 W t and e −τ 0 M t .Thus, the reduced form of (2.1) is y t = e −α 0 W t X t β 0 + C t μ 0 + λ t 0 l n t + e −τ 0 M t t and imposes no restrictions on the parameter space of the spatial parameters.

Estimation under Homoscedasticity
In this section, we consider the quasi maximum likelihood estimation of (2.3) under the following assumption.
Assumption 1.The { it } are iid across i and t with mean zero and variance σ 2 0 , and E| it | 4+ < ∞ for some > 0.
Under this assumption, the elements of the innovation term are iid across i and t.The moment condition E| it | 4+ < ∞ is required for the application of the central limit theorem for linear and quadratic forms (Kelejian and Prucha 2001) and ζ = (α, τ ) .Then, under Assumption 1, the quasi log-likelihood function of the model can be expressed as where (β, ζ , δ) = e τ M e αW y − Xβ − Cδ and | • | is the determinant operator.Since the spatial weights matrices have zero diagonal elements, we have ln e αW = T t=1 ln e αtr(W t ) = 0 and ln e τ M = T t=1 ln e τ tr(M t ) = 0, where tr(•) denotes the trace operator.Thus, (3.1) becomes free of the Jacobin terms and simplifies to . Then, using (3.4), we obtain the following score functions: where W = blkdiag(W 1 , . . ., W T ) and M = blkdiag(M 1 , . . ., M T ).A necessary condition required for the consistency of the QMLE θ is plim N→∞ 1 N S c (θ 0 ) = 0. To investigate this necessary condition, we derive E (S c (θ 0 )) as The order of the elements of E (S c (θ 0 )) is required for the large sample analysis.As such, we make the following assumptions.
Assumption 2. (i) n is large, and T can be large or small, (ii) n t increases with n at the same rate for all t, and (iii) all spatial entities are observed for at least two time periods.Assumption 3. (i) The elements of the time varying spatial weights matrices {W t } and {M t } are at most of order h −1 n uniformly such that h n /n → 0 as n → ∞. (ii) The spatial weights matrices {W t } and {M t } are uniformly bounded in both row sum and column sum matrix norms.
C t e τ M t is bounded in both row sum and column sum matrix norms, uniformly in τ ∈ τ for all s and t, where K t (τ ) = I n 1 for t = 1, and K t (τ ) = I n t − e τ M t l n t l n t e τ M t e τ M t l n t −1 l n t e τ M t for t = 2, . . ., T.
Assumption 2 lays out the asymptotic setting.The first part requires that n is large and T can be large or small, and does not impose any conditions on the relative growth rates of n and T. The second part requires that n t increases with n ensuring that the number of observed spatial units in each period is not small relative to n.The third part ensures that the spatial structure is complete after the entity fixed effects are concentrated out from the model.
Assumption 3 provides the essential properties of the spatial weights matrices.The first part is considered in Lee (2004), and requires that the elements of the spatial weights matrices have order O(h −1 n ) uniformly.Here, h n can be bounded or divergent with the property h n /n → 0 as n → ∞.The second part ensures that the spatial correlation is limited to a manageable degree so that the large sample analysis remains tractable (Kelejian and Prucha 2001).The assumption rules out the cases where as n tends to infinity each unit remains spatially correlated with all remaining (n − 1) units and h n is divergent.For instance, take h n = O(n) and each unit remains spatially correlated with all remaining (n − 1) units.Although Assumption 3 (ii) is satisfied in this case, Assumption 3 (i) is violated.As another example, take h n = O(n 1/2 ) and each unit remains spatially correlated with all remaining (n − 1) units.Although Assumption 3 (i) is satisfied in this case, Assumption 3 (ii) is violated.Such cases can arise for example when the domain of the spatial units is kept fixed and n tends to infinity (Cressie 1993, p. 350).Assumption 4 requires that the parameter space of the spatial parameters is compact, as in Debarsy, Jin, and Lee (2015).Note that the parameter space does not depend on the features of the spatial weights matrices.This assumption and the second part of Assumption 3 imply that the matrix exponential terms in our model are uniformly bounded in both row sum and column sum matrix norms.This can be seen from , which is bounded if |α| and W t are bounded, where • is either the row sum or the column sum matrix norm.
The first part of Assumption 5 states that the minimum and the maximum eigenvalues of e αW e αW and e τ M e τ M are bounded uniformly.Note that these terms are positive definite, so their smallest eigenvalues are positive.In the first part of Assumption 5, we require that the eigenvalues of these terms are strictly uniformly positive over the parameter space.In addition, the largest eigenvalues of e αW e αW and e τ M e τ M are bounded since these terms are uniformly bounded in both row and column sums under Assumptions 3 (ii) and 4. We further require that the eigenvalues of these terms are uniformly bounded above over .Finally, the last part of Assumption 5 ensures that Q C (τ ) is uniformly bounded in both row sum and column sum matrix norms.
Under these assumptions, it follows that 1 1) when either T or h n is unbounded.Therefore, it is important to adjust the score functions such that plim N→∞ 1 N S c (θ 0 ) = 0 holds in all cases.
To that end, we consider the general form of the adjusted quasiscore functions S * (θ 0 ) = S c (θ 0 ) − E (S c (θ 0 )): where From the first two adjusted score functions in S * (θ ), we can formulate the estimators of β and σ 2 for a given ζ value as where  0 (van der Vaart 1998).Given ζ , we can derive the following population counterparts of (3.8) and (3.9) as where where Using the result in (3.14), we state the identification condition in our setting by the following assumption.
Assumption 6 is a high level assumption and can be difficult to verify in practice.However, we can investigate sufficient low level conditions from the requirement that S * c (ζ ) = 0 whenever ζ = ζ 0 .In Section F of the accompanying web appendix, we provide such conditions under which Assumption 6 holds.
Using (3.10) and (3.14), we can express The results in (3.15) coupled with (3.13) will be useful for showing the consistency of ζ * .To that end, we also require the following assumption.
Assumption 7. X is exogenous, with uniformly bounded elements, and has full column rank.Also lim N→∞ 1 N X (τ )X(τ ) exists and is nonsingular, uniformly in τ ∈ τ .
Assumption 7 provides a regularity condition for the exogenous variables (Debarsy, Jin, and Lee 2015).In particular this assumption ensures that Q X (τ ) is uniformly bounded in row sum and column sum matrix norms, uniformly in τ ∈ τ .Under our stated assumptions, the uniform convergence The following theorem formalizes the consistency of θ * under our stated assumptions.
Proof.See Section C in the web appendix.
To derive the asymptotic distribution of θ * , we apply the mean value theorem to S * ( θ * ) = 0 at θ 0 , to obtain , where θ lies between θ 0 and θ * elementwise (Jennrich 1969, Lemma 3) , we can express S * (θ 0 ) in terms of linear and quadratic forms of in the following way: where S(τ 0 ) = e τ 0 M We −τ 0 M .Then, we can apply the Central Limit Theorem (CLT) for linear-quadratic forms in Kelejian and Prucha (2001) to establish the asymptotic normality of 1 S * (θ 0 ).Also, our assumptions ensure that 1 . These results, as shown in the proof, lead to the following theorem.
Theorem 3.2.Under Assumptions 1-7, as N → ∞, we have )) are assumed to exist and * (θ 0 ) is assumed to be positive definite for sufficiently large N.
Proof.See Section C in the web appendix.
Theorem 3.2 indicates that we need a consistent estimate of the variance-covariance matrix for inference.In the case of * (θ 0 ), we can use the observed counterpart given by * ( θ for a ∈ {β, σ 2 , α, τ } and A s = A + A for any square matrix A. Then, it is easy to show that where , and S(τ ) = e τ M We −τ M .Under our stated assumptions, it can be shown that * ( θ * ) = * (θ 0 ) + o p (1) (see the proof of Theorem 3.2 for the details).
In the case of * (θ 0 ), we suggest a plug-in estimator.Let vec D (A) be the vector that contains the diagonal elements of a square matrix A. Define Then, using Lemma A.6 of the web appendix, we can derive the following closedform expressions for the elements of * (θ 0 ): Since φ appears in the elements of * (θ 0 ), we need a consistent estimator of φ to derive a consistent estimator of * (θ 0 ).Let δ * = δ( β * , ζ * ) be the plug-in estimator of δ 0 obtained from (3.3), and ρ and κ be consistent estimators of ρ and κ, respectively.Then, the plug-in estimator of * (θ 0 ) is given by In the next theorem, we show that this plug-in estimator is a biased estimator of * (θ 0 ) under our stated assumptions.
The bias term arises because we may not be able to consistently estimate δ 0 when T is fixed.Under our assumptions, both Q 2 (τ 0 ) and P C (τ 0 ) are bounded in row sum and column sum matrix norms.Moreover, in Lemma A.2 (iii) of the web appendix, we show that the elements of P C (τ 0 ) are O(max{1/n, 1/T}) uniformly.Also, it follows from Lemma A.1 of the web appendix that the elements of Therefore, in settings with large T and bounded h n , or with fixed T and divergent h n , the bias term becomes negligible.However, in settings with fixed T and bounded h n , the bias correction is necessary for valid inference.
The bias corrected estimator is given by ˆ * = * ( θ * ) − Bias * ( τ * ).Note that this bias corrected estimator requires the consistent estimators of ρ and κ.To that end, we start from a consistent estimator for Let q jh be the (j, h)th element of Q C (τ 0 ).Let j and ˜ j be the jth element of and ˜ , respectively.Then, j ) over j and solving for ρ, we obtain (3.20) This result suggests the following sample counterpart as a consistent estimator for ρ: where ˆ j is the jth element of ˆ ( β * , ζ * ) and qjh is the Summing E(˜ 4 j ) over j and solving for κ, we have Thus, we can consider the following sample counterpart of this equation as a consistent estimator for κ: The next theorem shows that ρ and κ are consistent estimators.
Proof.See Section C in the web appendix.

Estimation Under an Unknown Form of Heteroscedasticity
In this section, we allow for both cross-sectional and timeseries heteroscedasticity, and present a robust estimation and inference method for our matrix exponential unbalanced panel data model.Heteroscedasticity in matrix exponential models has been discussed in Debarsy, Jin, and Lee (2015) for crosssectional data, and in Zhang, Feng, and Jin (2019) for balanced panel data.However, in both papers the discussion on the (quasi) likelihood based estimation scheme has been limited to the case where the spatial weights matrix W involving the dependent variable and M involving the error terms are commutative, that is, WM = MW.These papers show that the QMLE is consistent, if the commutativity property holds.However, this desirable feature of the matrix exponential specification extends neither to the matrix exponential unbalanced panel data models nor to the matrix exponential balanced panel data models with two-way error components.Therefore, a robust method for estimation and inference is highly desired for the matrix exponential unbalanced panel data models with an unknown form of heteroscedasticity.The following assumption specifies the form of heteroscedasticity in our setting.
Assumption 8.The { j } for j = 1, . . ., N are independent but not identically distributed (i.n.i.d), that is, As in the case of homoscedasticity, we will compute the expectations of the α and τ elements of the concentrated quasi score function in (3.5) under Assumption 8, and then adjust these elements using their corresponding expectations.First, we consider the score function with respect to α. Recall that G(ζ 0 )y = e τ 0 M φ + , S(τ 0 ) = e τ 0 M We −τ 0 M and ˜ = Q C (τ 0 ) .Then, using the fact that tr(DA) = tr(Ddiag(A)) for a diagonal matrix D and any conformable matrix A, we have where the last equality follows from the fact that diag S (τ 0 ) Thus, our suggested robust adjusted quasi score function for α is the sample counterpart of the result in (4.2): By construction, the quadratic form in (4.3) has an expectation of zero at the true parameter value θ 0 under Assumption 8. Using similar steps for the τ element of the concentrated quasi score function in (3.5), we obtain where A quadratic form that has a zero mean can be constructed from (4.4) as Since e τ 0 M e α 0 W y − Xβ = e τ 0 M Cδ 0 + , the sample counterpart of (4.5) is given by Then, using (4.3) and (4.6) together with the β element of the adjusted quasi score functions in (3.5), which is robust against the unknown heteroscedasticity, we suggest the following robust adjusted quasi score functions under Assumption 8: Note that from (4.1), we have 1 1 and A.2 of the web appendix.Note that if M and W are commutative, then tr S (τ 0 ) = 0, suggesting that 1 N E y e α 0 W W e τ 0 M ˜ = O(1/ max{h n , T}).Also, from (4.4), we have 1 To derive the robust M-estimator from (4.7), we can similarly first solve for β given ζ , which is the same as (3.8): (4.8) Then, substituting β † (ζ ) into the α and τ elements of (4.7), we obtain the concentrated robust adjusted quasi score functions as where where Proof.See Section C in the web appendix.
Proof.See Section C in the web appendix.
We need consistent estimators of † (ω 0 ) and † (ω 0 ) to conduct valid inference.Given the expressions of ∂S † (ω) ∂ω in the proof of Theorem 4.2, † (ω 0 ) can be estimated by its sample counterpart where . The proof for the consistency of † ( ω † ) is provided in the proof of Theorem 4.2.For † (ω 0 ), we first derive its closed-form expression using Lemma A.7 of the web appendix: . When deriving the above terms we used the fact that diag S(τ which implies that the diagonal elements of S(τ 0 ) are zeroes.Similarly, the diagonal elements of Q(τ 0 ) are also zeroes.By Lemma A.7 of the web appendix, all terms involving the 3rd and 4th moments of the disturbances disappear, that is, † (ω 0 ) is free from the 3rd and 4th moments of the disturbances.For convenience of exposition, let us write † (ω 0 ) as † (ω 0 , δ 0 , ).Let δ † be the estimator of δ 0 derived from the RME using (3.3).Let † ( ω † , δ † , ) be the estimator of † (ω 0 ) given .Then, we can derive a bias-corrected estimator for † (ω 0 , δ 0 , ) as follows.
Under our assumptions and Lemma A.2 of the web appendix, it can be shown that Bias † δ (τ 0 , ) = O(1/ max{T, h n }).Therefore, in settings with large T and bounded h n , or with fixed T and divergent h n , the bias term becomes negligible.However, in settings with fixed T and bounded h n , the bias correction is necessary for valid inference.Now we need to provide consistent estimators for the trace terms involving .Since , where denotes the Hadamard product.This implies an estimator for (σ 2 1 , . . ., σ 2 N ) as where A − denotes the generalized inverse of A and Note that elements of † (ω 0 ) takes forms of either tr( C) or tr( A B).We now show the effect of replacing the unknown by ˆ = diag( σ 2 1 , . . ., σ 2 N ) in large samples.
Theorem 4.4.Assume (τ ) = (Q C (τ ) Q C (τ )) −1 exists for τ in a neighborhood of τ 0 and is bounded in row and column sum norms.Let A and B be two N × N matrices that have zero diagonal elements with elements a ij and b ij , respectively, and be bounded in row and column sum norms.Let C be an N × N matrix that has uniformly bounded diagonal elements.Then, Proof.See Section C in the web appendix.

A Monte Carlo Study
In this section, we investigate the finite sample performance of the proposed ME and RME.To this end, we consider the following specification for n entities with elements randomly missing over T periods, e τ 0 M t u t = t , t = 1, 2, . . ., T. (5.1) We first generate n elements for X 1t and X 2t by drawing independently from the standard normal distribution and U(0, √ 12), respectively, where U(0, √ 12) denotes the uniform distribution (0, √ 12).Some of these observations are dropped subsequently when we introduce the unbalancedness.For the entity fixed effects, we consider a Mundlak type specification, μ 0 = 1 T T t=1 X 1t + e t , where the elements of e t are drawn independently from the standard normal distribution.The elements of the λ t0 are drawn independently from the standard normal distribution.
For the spatial weights matrices, we consider the rook and the queen contiguity cases.To this end, we first generate a vector containing a random permutation of the integers from 1 to n without repeating elements.Then, we reshape this vector into an k × m rectangular lattice, where m = n/k.In the rook contiguity case, we set the ijth element of the weights matrix to 1 if the j'th entity is adjacent (left/right/above or below) to the i'th entity on the lattice.In the queen contiguity case, we set the ijth element of the weights matrix to 1 if the j'th entity is adjacent to, or shares a border with the i'th entity.Subsequently, the rows and columns corresponding to the missing entities are deleted from W t and M t when we introduce the unbalancedness, and then we row normalize the weights matrices.
We set it = σ it η it , and consider two cases for η it .In the homoscedastic scenario, we set σ it = 1 for all i and t, and generate η it 's independently from either (i) the standard normal distribution or (ii) the standardized Gamma(2, 1), where Gamma(a, b) denotes the gamma distribution with the shape parameter a and the scale parameter b.In the heteroscedastic scenario, we consider σ 2 it = exp(0.1 + 0.35X 2,it ), where X 2,it is the (i, t)th element of X 2 , and η it 's are again drawn independently from either (i) the standard normal distribution or (ii) the standardized Gamma(2, 1).In the accompanying web appendix, we consider two additional specifications for the variance terms.
The unbalancedness of the panels is introduced by randomly dropping at most 10% of the n entities in each period, while making sure that each entity is observed at least twice over T periods.For n and T we consider the following values, n ∈ {50, 100} and T ∈ {3, 7}.The true parameter values are The number of resamples is 1000 in each experiment.The matrix-vector products approach considered in Yang, Dogan, and Taşpınar ( 2021) can be used for fast estimation of our model.We provide pseudo estimation algorithms describing the steps in the M-estimation in the accompanying web appendix.
The results from the experiments are presented in Tables 1-4.In these tables, we report the empirical mean, the empirical standard deviation (in brackets) and the average of the asymptotic standard error (in square brackets) for the QMLE, ME and RME.The asymptotic standard errors are calculated using the consistent estimators of the variance-covariance matrices of the ME and RME provided in Theorems 3.2 and 4.2, respectively.Below, we summarize our findings from the tables.
1.As expected, the QMLE suffers from the incidental parameter problem.When T = 3, the QMLE of σ 2 is downward biased significantly, close to −40%.As T increases to 7, the bias gets smaller, but there still remains a significant amount of bias, about −20%.2. In general, the ME and the RME show excellent finite sample performance in terms of finite sample bias.This result is not sensitive to the presence of heteroscedasticity and/or the type of distribution chosen for the disturbances.For example, in Table 1 when T = 3 and τ 0 = −1, we observe that the bias of the ME and the RME are very close to each other for all parameters.The largest of the bias values occur for τ in the case where n = 50, W=Queen and M=Rook.In this case, the bias of the ME is about −2.6% and the bias of the RME is about −2.5%.When T increases to 7, we observe that the bias of the ME and the bias of the RME decrease to about −1.1%.
For the other parameters both ME and RME impose almost no bias.3. Similarly, in the small T case, when τ 0 = 1, we observe that the bias of the ME and the RME are very close to each other for all parameters.The largest of the bias values occur for τ in the case where W=Rook and M=Queen.For example, in Table 3 for n = 50, the bias of the ME is about 7.2% and the bias of the RME is about 7.8%.It gets significantly smaller when T increases to 7, and we observe that the bias of the ME and the bias of the RME decrease to about 2%. 4. In terms of finite sample inference, we observe that both the ME and the RME show excellent performance.The reported average values for the asymptotic standard errors are very close to the corresponding empirical standard deviations for all parameters.We also observe that as T increases both the ME and the RME become more precise as both report smaller empirical standard deviations and average asymptotic standard errors.These findings are not sensitive to the choice of weights matrices, and/or the distributional specification for the disturbance term.
5. In the heteroscedastic scenario, we expect that the RME should outperform the ME in terms of finite sample bias and inference.However, the results from Tables 2 and 4 show that our heteroscedasticity specification using the scedastic function does not seem to deteriorate the finite sample properties of the ME significantly.

The Third Country Effects on the U.S. Outward FDI Stock
In this section, we consider an extension of the bilateral specification of the three-factor knowledge capital model suggested in Baltagi, Egger, and Pfaffermayr (2007) to investigate the third country effects on the U.S. outward FDI stock at the industry level.We collect data on the U.S. outward FDI stock in 47 host countries at the industry level between 2008 and 2014.These host countries include developed, developing and undeveloped ones, and thus yielding a comprehensive sample.
In our analysis, each unit is a country-industry pair, denoted by a subscript i.The dependent variable is LFDI it , which is the natural log of U.S. outward FDI stock in country-industry pair i at year t, where i = 1, . . ., n t and t = 1, . . ., T. In our sample, there are seven periods, that is, T = 7, covering the period 2008-2014, and n t varies from period to period, from a minimum of 340 country-industry pairs in 2013 to a maximum of 364 country-industry pairs in 2011.The dataset is thus unbalanced with a total number of observations, N = T t=1 n t = 2454, spanning 47 host countries and 10 industries.The set of independent variables includes a measure of absolute bilateral country size, given by G it = ln(GDP US,t + GDP it ), where the U.S. subscript denotes the parent country and the i subscript denotes the host country.We also include a similarity measure of country size proposed by Helpman and Krugman (1985) and Helpman (1987) ), where d US,i is the physical distance between the capitals of the United States and the host country i.The last independent variable is an indicator of host country investment risk, denoted by R, computed as the inverse of an investment profile index.The sources and the descriptive statistics of our sample data are given in the web appendix.
We augment the model suggested in Baltagi, Egger, and Pfaffermayr (2007) with three types of spatial dependence: (i) spatial dependence in FDI specified through a matrix exponential term, (ii) spatial lags of the third-country determinants of FDI, and (iii) spatial dependence in the error terms specified through a matrix exponential term.Our estimation equation allows for heterogeneity in two dimensions, namely, the country-industry pairs and time dimensions.Thus, our estimation equation takes the following form: The elements of the n t × n t spatial weights matrix W t are specified as w ij = d −1 ij / N j=1 d −1 ij for i = j and w ij = 0 for i = j, and d ij is the bilateral physical distance between capitals of i and j taken from Mayer and Zignago (2011).The countryindustry pairs and time fixed effects are respectively denoted by μ and λ t .We also allow for the presence of an unknown form of heteroscedasticity in the error term t .
We use our suggested M-estimators to estimate the parameters of the model.Table 5 provides the estimation results.The first column contains the estimation results obtained under homoscedasticity assumption from the ME.The second column provides the estimation results obtained from the RME under the assumption of an unknown form of heteroscedasticity.In the case of spatial parameters, the ME reports statistically insignificant results, whereas the RME reports statistically significant results.The RME estimates of α and τ are −0.1692 and 0.2756, respectively.Since the U.S. outward FDI stock is likely to be affected differently by the unobserved factors in different host countries, it is more likely that the disturbance terms are heteroscedastic in our model.The ME reports a negative and insignificant estimate for G, while the RME reports a negative and statistically significant estimate.Both estimators report a positive and statistically significant estimate for the similarity variable, which is in line with prediction of the bilateral specification of the three-factor knowledge capital model suggested in Baltagi, Egger, and Pfaffermayr (2007).In the case of k, h u , , and , both estimators report statistically insignificant estimates.
The third-country effects are captured by the spatially weighted regressors.We can see that most of their coefficient estimates are significant, which means that the third-country effects are important and should not be neglected.Both estimators report negative and statistically significant estimate for the spatial lag of bilateral country size.This result indicates that there can be reallocation of plants from the host country to the third country when there is an increase in the size of third country (Baltagi, Egger, and Pfaffermayr 2007).Both estimators report positive and statistically significant estimates for the spatially weighted similarity variable and the relative skilled labor endowment variable.These results are in line with the horizontal and export-platform FDI modes described in Baltagi, Egger, and Pfaffermayr (2007).Both estimators report positive and statistically significant estimates for the relative unskilled labor endowment, which supports horizontal and vertical FDI modes.Both estimators report negative and statistically insignificant estimates for W , while positive and significant estimates for W .These results support vertical mode in the case of W , and horizontal, export-platform and complex vertical modes in the case of W .The last row of the table provides a pseudo-R 2 measure for model fit.Both estimators report a pseudo-R 2 value of 94.93%.Finally, both estimators report negative and statistically significant estimates for WR.Overall, our results show that the third-country effects are important in the U.S. outward FDI stock.

Conclusion
In this article, we considered a direct likelihood based estimation of the matrix exponential unbalanced panel data model that has the entity and time fixed effects.Our estimation approach allowed for the estimation of the common parameters and fixed effects simultaneously.We first showed that the QMLE of our model may not be consistent under both homoscedastic and heteroscedastic disturbance terms.We then adjusted the score functions under both homoscedastic and heteroscedastic cases, and defined valid M-estimators based on the adjusted score functions.We established the large sample properties of the resulting M-estimators, the ME and the RME, respectively.For inference, we suggested an analytical bias correction approach involving the sample counterpart and plug-in methods to consistently estimate the variance-covariance matrices of the suggested estimators.Through an extensive Monte Carlo study, we showed that our suggested estimators have good finite sample properties.In an empirical application, we showed that there is statistical evidence for the presence of third country effects on the U.S. outward FDI stock.
= 0 for the consistency of the QMLE is satisfied if h n is divergent.However, if h n is bounded, then we should use S † (β, ζ ) for ensuring consistent estimation.
T}) by Lemmas A.1 and A.2 of the web appendix.Thus, under heteroscedasticity, the necessary condition plim N→∞ 1 N S c (θ 0 )
NOTE:We report the empirical mean (standard deviation) [average asymptotic standard error].
, and is given by S it = 1 − s 2 US,t − s 2 it , where s US,t = GDP US,t / GDP US,t + GDP it and s it = GDP it / GDP US,t + GDP it are the share of respective country's GDP in the bilateral GDP.Another independent variable is a measure of relative capital stock defined by k it = ln(K US,t /K it ),

Table 3 .
Homoscedastic case with η it ∼ Gamma(2, 1).K US,t and K it are the capital stocks of the United States and the host country i, respectively.A measure of relative skilled labor endowment is also considered as an independent variable.This measure is defined by h s it = ln(H US,t /H it ), with H US,t and H it denoting the skilled labor endowments in the United States and the host country i, respectively.Similarly, we also have a measure of relative unskilled labor endowment, given by h u it = ln(L US,t /L it ), with L US,t and L it denoting the unskilled labor endowments in the United States and the host country i, respectively.The model includes two interaction terms: it where

Table 5 .
Estimation results for the U.S. outward FDI Stock.