Semiparametric Spatial Autoregressive Panel Data Model with Fixed Effects and Time-Varying Coefficients

Abstract This article considers a semiparametric spatial autoregressive (SAR) panel data model with fixed effects and time-varying coefficients. The time-varying coefficients are allowed to follow unknown functions of time, while the other parameters are assumed to be unknown constants. We propose a local linear quasi-maximum likelihood estimation method to obtain consistent estimators for the SAR coefficient, the variance of the error term, and the nonparametric time-varying coefficients. The asymptotic properties of the proposed estimators are also established. Monte Carlo simulations are conducted to evaluate the finite sample performance of our proposed method. We apply the proposed model to study labor compensation in Chinese cities. The results show significant spatial dependence among cities and the impacts of capital, investment, and the economy’s structure on labor compensation change over time.


Introduction
Panel data analysis has been widely used in many fields of social sciences as it usually enables strong identification and increases estimation efficiency. A comprehensive review about these methodologies can be found in Arellano (2003), Baltagi (2008), and Hsiao (2014). To model spatial dependence, spatial econometric models have drawn a lot of attention as they provided a way to model the cross-sectional dependence with a clear structure and intuitive interpretations.
A class of spatial autoregressive (SAR) models was first proposed in Cliff and Ord (1973). Since then, spatial econometrics has become an active research area in econometrics. One issue with spatial econometric models is that the spatial lag term is endogenous. Various estimation methods have been proposed to deal with this issue, for example, the instrumental variable (IV) method (Kelejian and Prucha 1998), the generalized method of moments (GMM) framework (Kelejian and Prucha 1999) and the quasi-maximum likelihood (QML) method (Lee 2004). More details of spatial econometrics can be found in classic spatial econometrics books, for example, Anselin, Florax, and Rey (2004) and LeSage and Pace (2009). As more temporal data became available, spatial panel data models have received considerable attentions. Spatial panel data models with SAR disturbances are considered in Baltagi, Song, and Koh (2003) and Kapoor, Kelejian, and Prucha (2007). Fingleton (2008) studies a spatial panel data model with a SAR-dependent variable and a spatial moving average-disturbance. Lee and Yu (2010) focus on a spatial panel model with individual fixed effects. More recent studies on spatial dynamic panel data models can be CONTACT Xuan Liang xuan.liang@anu.edu.au Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, ACT 2600, Australia.
Supplementary materials for this article are available online. Please go to www.tandfonline.com/UBES. found in Yu, De Jong, and Lee (2008), Lee and Yu (2014) and Li (2017). A common feature of the aforementioned models is that they are fully parametric with a linear form in regressors, which may lead to model misspecification. To enhance model flexibility, nonparametric and semiparametric spatial econometric models have been studied in the literature. Su and Jin (2010) considered a partially linear SAR model. Su (2012) proposed an SAR model with a nonparametric regressor term. Functionalcoefficient SAR models are also studied in Sun (2016) and Malikov and Sun (2017). These studies are all about crosssectional data. In terms of nonparametric and semi-parametric spatial econometric models for panel data, Zhang and Shen (2015) considered a partially linear SAR panel data model with functional coefficients and random effects. Sun and Malikov's (2018) study a functional-coefficient SAR panel data model with fixed effects. It is worth noting that they focus on the case of large N and finite T. In addition, the functional coefficients in these spatial models are mostly permitted to be unknown smooth functions of some exogenous variables. Sometimes, finding such appropriate exogenous variables in practice is challenging.
It has been noted that especially when the time span of data is long, coefficients of covariates are likely to change over time in many real examples (see discussions in Cai 2007;Silvapulle et al. 2017). The reasons for such changes may include changes in the economic structure or environment, policy reform, or technology development. To accommodate such cases, timevarying coefficient models have been studied extensively in the panel data setting, where the regressor coefficients are allowed to be unknown smooth functions of time (e.g., Li, Chen, and Gao 2011;Chen, Gao, and Li 2012;Robinson 2012). One advantage of time-varying coefficient models is that the time variable can be self-explanatory and naturally captures the nonlinear time variation in the coefficients. To the best of our knowledge, timevarying coefficient models have not been studied in spatial econometrics.
In this article, we propose a semiparametric time-varying coefficient spatial panel data model with fixed effects for large N and large T. Specifically, the coefficient of the spatial lag term in the model is assumed to be constant over time, while the regressor coefficients vary with time. In addition, the regressors can be trending nonstationary. To obtain consistent estimators for both the parametric components and nonparametric time-varying coefficients, we propose a local linear concentrated quasimaximum likelihood estimation (LLQML) method. When the time-varying coefficients are constant and the regressors are stationary, our model reduces to a fully parametric SAR panel data model that has been considered in Lee and Yu (2010). Our model allows only the coefficients of the explanatory variables to be time-varying. A more general setting with a nonparametric coefficient for the spatial lag term would be less restrictive. The model studied in Sun and Malikov (2018) has a functional-coefficient for the spatial lag term but for fixed T. We would like to leave more general models for future research.
Our contributions in this article are summarized as follows.
(i) We propose a semiparametric time-varying coefficient spatial panel data model. This model is suitable for panel data with spatial interaction and time-varying features, as it combines the strengths from different models, including strong identification of panel data models, clear interpretation of crosssectional dependence in spatial econometric models, and the flexibility of time-varying coefficient models. In the existing literature of spatial econometrics, the regressors are often assumed to be nonstochastic (see, e.g., Lee and Yu 2010;Su and Jin 2010). We relax such assumptions so that the regressors can be trending non-stationary, which renders our model and method of estimation more general and practically useful.
(ii) Since the model consists of both unknown parametric and nonparametric components, we propose the LLQML method to consistently estimate the parameters and the unknown time-varying functions by incorporating the local linear estimation (Fan and Gijbels 1996) into the QML estimation. We also establish the consistency and asymptotic normality of the proposed estimators.
(iii) We evaluate the finite-sample performance of our proposed model under several scenarios. We find that our estimators are consistent and robust against different model specifications, including not only time-varying models or non-stationary covariates, but also time-invariant models and stationary covariates. The results also show that if the time-varying coefficients are misspecified as constants, it would lead to severely inconsistent estimation.
(iv) As an empirical application of our model, we analyze the time-varying effects of factors on labor compensation in urban China over 1995-2009, a period that has seen continuous reforms and dramatic changes in the economy. Consistent with our conjecture, the estimated effects show quite strong timevarying features.
The rest of the article is organized as follows. Section 2 discusses the model setting and the estimation procedure. Section 3 lays out the assumptions. Asymptotic properties of the proposed estimators are established in Section 4. We report the results of Monte Carlo simulations and of the empirical application in Sections 5 and 6, respectively. In Section 7, we conclude. Appendix A provides the justification of the identification condition and then gives the proofs of the main theorems. Technical lemmas and their proofs as well as additional numerical results are given in Appendices B-D of the supplementary material.

Model
The model we consider in this article takes the following form: where Y it is the response of location i at time t, is the corresponding d-dimensional timevarying coefficient, α 0,i reflects the unobserved individual fixed effect, the error component e it has mean zero and variance σ 2 0 , and T and N are the time length and the number of spatial units, respectively. In this model, w ij describes the spatial weight of location j to i, which can be a decreasing function of spatial distance between i and j. The scalar parameter ρ 0 is a measure of the strength of spatial dependence. Hence, the term ρ 0 w ij Y jt captures the spatial interaction and X it β 0,t measures the covariate effects over time.
When β 0,t does not vary over time, it reduces to a vector of constants. In this case, model (1) reduces to the traditional SAR panel data model discussed in Lee and Yu (2010). If only some components of β 0,t change over time, model (1) takes the form of a partially time-varying spatial panel data model, which means that a few covariates have effects changing over time while the effects of other covariates stay constant. In this article, we assume that β 0,t is fully nonparametric and follows the specification where β 0 (·) is a d-dimensional vector of unknown smooth functions defined on R d and τ t = t/T ∈ (0, 1]. The same specification is used in Gao (2011) andLi (2012). The reason to rescale time onto the interval (0,1] is for convenience when estimating the model with the kernel method. For the purpose of identifying β 0 (τ t ) when the constant 1 is included in X it , the individual fixed effects are assumed to satisfy N i=1 α 0,i = 0. Such a condition is standard in the literature, for example, Su and Ullah (2006) and Chen, Gao, and Li (2012) with its justification provided in Appendix A.1.
Let 0 n and 1 n be the vectors with n elements of zeros and ones, respectively. Let 0 m 1 ×m 2 denote an m 1 × m 2 matrix with all zero elements and let I m denote an m-dimensional identity matrix. Define an N × N spatial weight matrix W = (w ij ) N×N with zero diagonal elements, that is, (1) can be written as (3) can further be written as In (4), we move the spatial lag term (ρ 0 WY t ) to the left side so that S N (ρ 0 )Y t would be regarded as the new response variable as if ρ 0 were known. The goal is to construct consistent estimators for the spatial coefficient ρ 0 , the variance σ 2 0 , and the unknown time-varying coefficient function β 0 (τ ).

Estimation
The joint quasi log-likelihood function of model (4) can be written as  Su and Ullah (2006) and Su and Jin (2010), we propose the LLQML method, which is a two-step procedure: (i) Estimate β 0 (τ ) for fixed ρ and α by the local linear kernel method (Fan and Gijbels 1996) and denote it as β ρ,α (τ ); (ii) Plug β ρ,α (τ ) into (5), and obtain the QML estimators ρ, σ 2 and α. With ρ and α estimated, the estimator of β 0 (τ ) can then be updated by β ρ, α (τ ). To be more specific: Step 1. Assume that β 0 (·) has continuous derivatives of up to the second order. Let K(·) and h be the kernel function and the smoothing bandwidth, respectively. Applying the Taylor is the first derivative of β 0 (·) and τ ∈ (0, 1]. We also have that X t β 0 (τ t ) ≈ X t β 0 (τ ) + τ t −τ h X t hβ 0 (τ ). According to the local linear kernel method, the estimators of β 0 (τ ) and hβ 0 (τ ) for given (ρ, α) can be obtained by minimizing the weighted loss function L(a, b) with respect to (a , b ) , that is, where the NT × 2d matrix M(τ ) and the NT × NT matrix (τ ) are defined as follows: Taking the first derivative of L(a, b) with respect to (a , b ) and equating it to zero, we obtain , the estimator of the time-varying coefficient β 0 (·) can be expressed by Step 2. In this step, we plug β ρ,α (τ ) into the original loglikelihood (5) and estimate ρ 0 and σ 2 0 by maximizing the quasi log-likelihood function: whereỸ(ρ) = (I NT − S)Y * (ρ) andD = (I NT − S)D are the smoothing versions of Y * (ρ) and D by the NT × NT matrix S =X˜ , in which the NT × dT matrixX is a diagonal block matrix with X t being its tth diagonal block, and dT × NT matrix = ( (τ 1 ) , . . . , (τ T ) ) . Taking the derivative of (7) with respect to α and setting it to be zero, we have Then, taking the derivative of Equation (8) with respect to σ 2 and equating it to zero, we have the estimator of σ 2 0 as the following function of ρ: Replacing σ 2 with σ 2 (ρ) in Equation (8), we obtain the concentrated quasi log-likelihood function: Therefore, we estimate the parameters θ 0 = (ρ 0 , σ 2 0 ) and α 0 by θ = ( ρ, σ 2 ) and α as follows: Finally, the updated estimator of β 0 (τ ) is obtained by plugging ρ and α into Equation (6): In order to establish asymptotic properties for the proposed estimators, we need to introduce the following assumptions.
(ii) Denote v t = (v 1t , . . . , v Nt ) . Suppose that {v t , t ≥ 1} is a strictly stationary sequence with mean zero and α-mixing with mixing coefficient α mix,N (t), and that there exists a function α mix (t) and a constant δ such that α mix,N (t) ≤ α mix (t) and ∞ t=1 α mix (t) δ/(4+δ) < ∞ for some δ > 0. (iii) Let v it be identically distributed in index i for any given t.
In addition, we assume E|v itk | 4+δ < ∞ for k = 1, . . . , d Remark: Assumption 1 is a list of assumptions about the ddimensional explanatory variable X it .
Assumption 1(i) assumes that the time trend g(τ ) is continuous, which is a standard assumption to model the trend in X it . With this structure, the regressors can be either stationary or non-stationary over time. Specially, if g(τ t ) reduces to a constant vector, it covers the case with stationary X it . Otherwise, X it is generally nonstationary. By assuming this, we take the nonstationarity of X it into account when we derive the theoretical properties of the estimators. The reason why g(τ ) is defined over (0, 1] is to scale the time domain to a bounded set, for the same reason as for β 0 (τ ). Note that g(τ ) here can be further generalized to allow for an individual time trend g i (τ ). To make theoretical derivations less complicated, we consider the homogeneous trend. The trend g(τ ) can be estimated by To allow for serial dependence in v t , we impose the stationarity and α-mixingness of v t in Assumption 1(ii) (see, e.g., Fan and Yao 2003;Gao 2007). Since v t consists of N vectors v it for i = 1, . . . , N, its α-mixing coefficient depends on N and hence is denoted as α mix,N (t). We further assume that there exists an upper bound α mix (t). Similar assumption can be found in Chen, Gao, and Li (2012). Moreover, the assumption ∞ t=1 α mix (t) δ/(4+δ) < ∞ is commonly used in the literature; see, for example, Dou, Parrella, and Yao (2016). This assumption is weaker than the exponentially decaying α-mixing coefficient α mix (t) = c α ψ t for 0 < c α < ∞ and 0 < ψ < 1; see, for example, Chen, Gao, and Li (2012) and Chen, Li, and Linton (2019).
It is worth noting that we only assume v it to be identically distributed in index i in Assumption 1(iii), which is weaker than the iid assumption for covariates in Sun and Malikov (2018). It ensures that the cross-sectional dependence of v it in the vector v t can be allowed.
Meanwhile, it is allowed the constant 1 term to be included in X it . When g 1 (τ ) reduces to constant 1 and v it1 degenerates to v it1 ≡ 0, X it1 ≡ 1 is exactly the constant 1 term.
= m j , almost surely for j = 3 and 4.
Remark: Assumption 2 summarizes the conditions on the error term. Assumption 2(ii) ensures that the first and second conditional moments of e t given E t−1 do not depend on the information generated by {v it : i ≥ 1, t ≥ 1}. In particular, Assump- are independent of each other, this result naturally holds for the mean zero error vector e t . Assumption 2(iii) assumes the conditional independence of e it (i = 1, . . . , N) given E t−1 as well as its conditional moments. A sufficient condition for Assumption 2(ii) and (iii) is that e it are independent in both i and t (e.g., see Assumption 2 of Yu, De Jong, and Lee 2008) and {e it } is independent of F V . It is worth noting that the conditional independence of e it in Assumption 2 (iii) along with Assumption 2 (ii) can help form a martingale difference array in both i and t in the theoretical derivations; see, for example, the proof of Theorem 2 in Appendix A.2. Further, this technique of the proof can be adapted to model (3) if a crosssectional dependent random structure is specified. For example, we can still impose Assumption 2 but we replace e t in model (3) by a cross-sectional dependent random error ε t = Le t , where L is a non-stochastic matrix and E(ε t ε t |E t−1 ) = σ 2 0 LL can measure the cross-sectional dependence. If we assume that L is uniformly bounded in both row and column sums in absolute value (analogously to Assumption 4), similar theoretical results can be established but more technical derivations are involved.
Assumption 3. The kernel function K(·) is a continuous and symmetric probability density function with compact support. The bandwidth is assumed to satisfy h → 0 as min(N, T) → ∞, Th → ∞ and NTh 8 → 0.
Remark: Assumption 3 first imposes the conditions on the kernel function, which is common in the literature; see, for example, Chen, Gao, and Li (2012). Conditions on the bandwidth h along with T and N are also considered; see similar conditions in Assumption A5 of Chen, Gao, and Li (2012).
Assumption 4. To be more specific on the dimension of the N × N spatial weight matrix W, we denote W as W N in this assumption. We assume W N is a non-stochastic matrix with zero diagonals and is uniformly bounded in both row sum norm and column sum norm (for short, UB), that is, where is a compact interval with the true value ρ 0 as an interior point. Also, S N (ρ) and S −1 N (ρ) are both UB, uniformly in ρ ∈ .
Remark: Assumptions 4 and 5 are standard assumptions originated from Prucha (1998, 2001) and also used in Lee (2004). When W is row-normalized, a compact subset of (−1, 1) has often been taken as the parameter space for ρ. The UB conditions limit the spatial correlation to a manageable degree. To save space, we refer readers to Kelejian and Prucha (2001) for more discussions.
Assumption 6. The time-varying coefficient β 0 (·) has continuous derivatives of up to the second order.
Remark: Assumption 6 is a mild condition on the smoothness of the functions which is required by the local linear fitting procedure. Such an assumption is common for nonparametric estimation methods, for example, Li and Racine (2007, condit. 2.1), Gao (2007, assump. 2.7) and Assumption A3 of Chen, Gao, and Li (2012). Assumption 7 guarantees the uniform boundedness of the sum of absolute fixed effects.
To proceed, we need to introduce the following notations.
Remark: Assumption 8 is a condition for the identification of ρ 0 , which is similar to Lee (2004, assump. 8), Lee and Yu (2010, assump. 4), and Su and Jin (2010, assump Similar to the explanation of Assumption 7 in Su and Jin (2010), R * N,T can be interpreted as the remainder of R N,T deviated from the timevarying projection onto X it β(τ t ). In practice, one may use the estimate of R,R to check this assumption.
where the existence proofs of the limits are shown in Lemma C.7 of the supplementary material.
where θ 0 = lim N,T→∞ NT,θ 0 with NT,θ 0 being defined by in which p ii and (gp) ii are the ith main diagonal elements of P N,T and P N,T G N,T , respectively, and θ 0 = is positive definite as shown in Lemma C.9.
Since we use the QML method to estimate θ 0 , it relaxes the normality assumption on the error term but adds an additional term to the variance that is a function of the error term's third and fourth moments. If the third and fourth moments are satisfied with m 3 = 0 and m 4 = 3σ 2 0 , the asymptotic covariance matrix in Equation (10) reduces to −1 θ 0 , as shown in the following proposition.
provided that X (τ ) is positive definite for each given τ in Thus, the rate of convergence of β(τ ) is √ NTh, which is standard in the nonparametric estimation. It is also clear that the covariance matrix is related to g(τ ) since it involves the trend of X it . When X it is stationary, the asymptotic covariance matrix in (11) reduces to a constant matrix σ 2 One can use the following sample version to estimate the unknown covariance matrices involved: The consistency of these sample estimators is shown in Lemma C.10 in the supplementary material.

Monte Carlo Simulations
We now conduct a number of simulations to evaluate the finite sample performance and the robustness of our proposed model and estimation under a rich set of scenarios, which are different in stationarity of the covariates, variation in time of the coefficients, and the degree of spatial dependence. The simulated data are generated from the following model: The data generating process for our simulation is summarized below. First, the spatial matrix W in the data generating process is chosen as a "q step head and q step behind" spatial weights matrix as in Kelejian and Prucha (1999) with q = 2 in this section. The procedure is as follows: all the units are arranged in a circle and each unit is affected only by the q units immediately before it and immediately after it with the weight being 1. Then following Kelejian and Prucha (1999), we normalize the spatial weights matrix by letting the sum of each row equal to 1 so that it generates an equal weight influence from all the neighboring units to each unit. Then, the regressor is set to be and v −100 = 0 N . It is obvious that {v it2 } is both serially and cross-sectionally dependent. The error term e it is independent and identically generated from the distribution of N(0, 1) so that where β 0,1 (τ ) and β 0,1 (τ ) represent the time-varying coefficients associated with the constant 1 and X it2 in X it , respectively. Various simulation settings are defined by changing the specification of g(τ ), β 0 (τ ) and ρ 0 . Specifically, we consider the following scenarios: • Set I (Setting of g(τ )): (I-1) g(τ ) = 0; (I-2) g(τ ) = 1 and (I-3) g(τ ) = 2 sin(π τ ); • Set II (Setting of β 0 (τ )): (II-1) β 0 (τ ) = (1, 1) ; (II-2) β 0 (τ ) = (1, 1 + 2τ + 2τ 2 ) , (II-3) β 0 (τ ) = (1 + 3τ , 1 + 2τ + 2τ 2 ) ; • Set III (Setting of spatial coefficient): (III-1) ρ 0 = 0.3, (III-2) ρ 0 = 0.7.
Each of these sets (and combinations of them) will generate data of (i) covariates of different stationarity (Set I): in Sets I-1 and I-2 X it2 is stationary and in Set I-3 X it2 is non-stationary; (ii) coefficient β 0 (τ ) with different time-varying features (Set II): from Set II-1 to Set II-3, β 0 (τ ) changes from time-invariant, partially time-varying to fully time-varying respectively; and (iii) different SAR coefficient or spatial dependence among crosssectional units (Set III). For each scenario, simulations are conducted on 1000 replications. The Epanechnikov kernel K(u) = 3/4(1−u 2 )I(|u| ≤ 1) is used where I(·) is the indicator function. The bandwidth is selected through a leave-one-unit-out crossvalidation method explained in Appendix A.3. The simulated data are first estimated by our proposed model and estimation method, and then estimated by a standard timeinvariant spatial panel data model considered in Lee and Yu (2010) and their proposed estimation. In short, we call it the "Lee-Yu model". Tables 1 and 2 report the means and standard deviations (SDs) (in parentheses) of the bias for the estimates of our model for ρ 0 and σ 2 0 under different settings of g(τ ) and β 0 (τ ), together with those of the Lee-Yu model (with ρ 0 fixed at 0.3, that is, Set III-1). A few comments can be made on the results. First, our estimates of ρ 0 and σ 2 0 are consistent under all settings as the means and SDs of the bias of ρ 0 and σ 2 0 are getting smaller when either N or T is increasing. It shows the robustness of our model against different settings on the covariates and time-varying coefficients.
Second, if the data are generated by a time-invariant process (Set II-1), the estimates of ρ 0 and σ 2 0 from the Lee-Yu model are consistent with smaller biases compared to ours. It makes sense as a time-invariant spatial panel date model is a special case of our model. However, when the coefficients of the covariates have time-varying features (Set II-2 and Set II-3) in the data generating process, the estimates of ρ 0 and σ 2 0 from the Lee-Yu model are not consistent and exhibit large biases. For example, under the combination of Set I-2 and Set II-2, the biases are around 0.27 for ρ 0 and 1.9 for σ 2 0 . When there are more coefficients having time-varying features, (e.g., from Set II-2 to II-3), the biases become larger. These findings confirm that when the time-varying model is misspecified as a time-invariant model, following the estimation of the Lee-Yu model will lead to inconsistent estimates.
Third, comparing different data-generating processes, if the data are generated from a fully time-varying model (Set II-3), our estimates have the smallest biases and SDs, followed by a partially linear model (Set II-2) and then a time-invariant model (Set II-1) given the setting of X it2 . For example, when N = 15, T = 15 and X it2 follows Set I-3, the means and SDs of biases of our estimates for ρ 0 are −0.0301 (0.665), −0.0148 (0.0297), −0.0086 (0.0212), respectively, from Set II-1 to Set II-3.
We also evaluate our model by examining the finite-sample performance of the estimators for the two time-varying coefficients β 0,1 (τ ) and β 0,2 (τ ). In Table 3, the means and SDs (in parentheses) of the mean squared errors (MSEs) of the estimates of the time-varying coefficients β 0,1 (τ ) and β 0,2 (τ ) are reported, where for an estimator β k (τ ) (k = 1, 2), the MSE is defined by The results show that both the means and SDs of MSE for β k (τ ) (k = 1, 2) decrease when either N or T increases, confirming the consistency of our estimators. The results for a different spatial coefficient ρ 0 = 0.7 (Set III-2) are similar to our benchmark case of Set III-1 (ρ 0 = 0.3), and the corresponding simulation results can be found in Tables D.1-D.3 of the supplementary material.

Empirical Application
As a case study, we apply our model to analyze the level of real wages in 159 Chinese cities over the period between 1995 and 2009. The level of real wages measures the demand for labor and is closely related to productivity; see for example Van Biesebroeck (2015) for a literature review and Combes, Démurger, and Li (2017) for another example. We believe that this is an ideal empirical application for our proposed model for the following two reasons. First, China's vast internal urban labor markets are inter-linked. The wage level in each city is not only determined by the characteristics and/or performance of itself, but also depends on those around it. The spatial effects of wages are also discussed in the literature; see, for example, Braid (2002) and Baltagi, Blien, and Wolf (2012). Second, China has experienced unprecedented economic growth and change in the economic structure during the last four decades or so. During the period, the organization of the economy, the environment in which economic agents operated, and perhaps even the agents themselves have changed dramatically. For example, the reforms of State-owned Firm started in 1997 changed the ownership of most of the firms, the way they are managed, the productivity of labor, and how remuneration was determined. The series of reforms in housing, the health system, and education also changed both demand and supply of labor. In other words, it is likely that the economy has experienced "structural changes" over the years and that many of the key relations between economic variables may not remain constant over time.
In this case study, we explain the logarithm of the average wage level of a city by a number of variables, including capital (measured by asset), investment (measured by FDI), and the economic structure of the city (measured by the proportion of industries and sectors). Ideally, a variable reflecting the level of labor input should be included in the model. The only variable that is available to us is the size of the population in each city. The variable appears to be highly correlated with the time trendthe correlation coefficient between the average (log) population Table 3. Means and standard deviations of MSE of β(τ ) = ( β 1 (τ ), β 2 (τ )) (ρ 0 = 0.3, σ 2 0 = 1).
(a) β 1 (τ ) Proportion of employed persons in the manufactural sector out of the nonagricultural sector size of the cities and the time trend is about 0.98. Thus, we chose not to include the variable. The impact of labor input is absorbed by the coefficient of the time trend when the constant term is included in the time-varying model. Table 4 provides the definitions of these variables. Our model captures spatial inter-dependence and potential change in the effects of these explanatory variables. Our data are derived from Statistic Year Books of China (various years), for 1995-2009 (T = 15) and cover 159 (N) cities, including four cities like Beijing, Shanghai, Tianjin, and Chongqing directly administrated by the central government, and other 155 cities at or above prefectural levels in China. A prefectural city in China means a city that is directly controlled by provincial governments. The geographical location of these cities can be found in Figure 1. Following the convention, we divide China into seven regions: East China (EC), South China (SC), Southwest (SW), North China (NC), Northwest China (NW), Central China (CC), and Northeast China (NE). The densities of cities in these regions are quite different that EC has 56 cities, almost one-third of the whole country while it is very sparse in the western region, reflecting the uneven distribution of cities. We use highway distances between pairs of cities in kilometers to measure the spatial distances between cities, as they can reflect economic distances between cities. These data are collected using the service provided by Google Map Services. We specify W as the inverse of highway distances between cities, standardized by its maximum eigenvalue.
As described in the assumptions of Section 3, regressors are allowed to be trending stationary. To check whether the macrolevel regressors considered here are trending stationary, we first  fit the unknown trend in the regressors with the local linear estimation method. After removing the fitted trend, we obtain the residuals in regressors. Then, we conduct the Im-Pesaran-Shin (IPS) panel unit root tests on these residuals. Refer to Im, Pesaran, and Shin (2003, eqs. (4.10) and (4.6)) for the IPS test statistics W tbar and Z tbar , respectively. According to the test statistics and p-values in Table 5, the null hypotheses of panel unit root for these variables are all rejected, indicating that the assumption of the trend stationary regressors is valid for our dataset.
It is known that the choice of bandwidth is important for a nonparametric kernel estimation. We first estimate the model with the optimal bandwidth (h opt = 0.4000) obtained by the leave-one-unit-out cross-validation method, and then compare the results with a number of different bandwidths around it: 2/3h opt , 4/5h opt , 5/4h opt , and 3/2h opt . Figure 2 shows that the results under different bandwidths are consistent. Table 6 reports that the estimates of the spatial coefficient ρ 0 and variance σ 2 0 , which shows that the estimates are quite similar in these specifications. Given the robustness of the results, we decide to use the average bandwidth h ave = 0.4173 of those five bandwidths for the rest of the article. We estimate the model for China as a whole and then for East China (Figure 3). Table 7 reports the estimates of the spatial coefficient ρ 0 , σ 2 0 and the time average of β 0 (τ ) for the whole country and for East China, respectively. The significance of the spatial coefficient estimate reflects the spatial dependence and confirms the existence of spillover effects between cities. From the table we can see that, over the years, FDI contributed positively to the wage level on average, if FDI increases one percent, the average level of real wage would increase by 0.025%. If capital increases by one percent, the average real wage would increase by 0.026%. The estimates also show that economic structure affects wages as well. For example, if the share of the manufactural or service sector increases by one percent, the average real wage would increase by 0.365% or 0.326%, respectively, but the increase in wage would be 0.380% higher if the size of the service sector is one percent larger relative to the manufactural sector. Comparison between the whole country and East China illustrates that the spatial dependence is much bigger in EC than in the whole country. It makes sense as the economic connection is spatially stronger in small regions. The impact of FDI on wage is smaller in EC. This is likely due to a smaller difference in economic development between EC and the rest of the country so that the additional effect of FDI on the local economy is not as large as for less developed parts of the country. The average effect of capital and the share of manufactural in EC are larger than the whole country. This is because it is the densest region in the country and its manufactural sectors are more developed. Figure 4 displays the time-varying coefficient curves for all variables with their 95% confidence bands for the whole country. It shows the parameters of the explanatory variables evolve clearly over time. For example, the impact of FDI on the wage level has been decreasing over time. This can be explained by the fact that in the early stage of reform, foreign investment brought advanced technology and management knowhow, which also push up the labor demand, but as the domestic economy catches up, the impact of FDI on the labor market becomes less important. Meanwhile, the impact of capital on the wage level kept increasing over time. This could be because as  the economy becomes less labor-intensive, as the capital level increases, the demand for work also increases accordingly. The effects of economic structure on wage have also changed over time. These findings confirm that indeed the relations between economic variables changed dramatically during such a period of fast development. Figure 5 shows the time-varying features of these variables and the intercept appears quite strong in EC as well. The results imply that if a time-invariant model were used, the impacts of these variables would be estimated with biases.
In addition, we conduct model diagnostics. To save space, we just report the results of the model for the whole country since the results of the model for the EC region are quite similar. To check the stationarity of residuals, we implement IPS unit root test. Two test statistics are W tbar = −7.3741 and Z tbar = −7.4154 with p-values less than 0.0001. So we reject the null hypothesis of panel unit root and conclude the residuals are   we apply the Benjamini and Hochberg (1995)'s multiple testing procedure that controls the false discovery rate (FDR) of (12) at the rate of 0.05. We also apply the Bonferroni correction method (see Miller Jr 1966) that controls the familywise error rate (FWER) of (12) at the rate of 0.05. Both of these two multiple testing procedures show that the null hypotheses for all cities cannot be rejected, which means there is no serial correlation in the residuals. All the aforementioned diagnostics results support the validity of our regression model.

Conclusion
We have considered a semiparametric SAR panel data model with fixed effects. This model is designed particularly for situations where covariate effects on the dependent variable change over time. The spatial dependence structure between units is assumed to be time-invariant presented by a parametric spatial lag term. To consistently estimate both the parametric  and nonparametric components, we have proposed a local linear concentrated QML estimation method. Asymptotic properties for the estimators have been derived with parametric √ NT and nonparametric √ NTh rate of convergence, respectively, when both the cross-sectional size N and the time length T go to infinity.
The finite-sample performance of our model has been evaluated and compared with those from a time-invariant spatial panel data model using Monte Carlo simulations. The results showed that when the time-varying coefficients are misspecified to be constants, using the standard time-invariant spatial panel data model would lead to inconsistent estimates while our proposed model is always consistent and robust.
We have also applied the proposed model to study labor compensation in Chinese cities. Our results have illustrated that as China became more developed, the impacts of capital, investment, and the structure of the economy on labor compensation have changed over time. The results also imply that for a fast-changing economy such as China, many important economic parameters may not be consistently estimated with a time-invariant model.

Funding
Xuan Liang and Jiti Gao thank to the Australian Research Council Discovery Grants Program under grant numbers: DP150101012 & DP170104421 for its financial support. Xuan Liang also acknowledges the financial support of the ANU RSFAS Cross-Disciplinary Grant.

Appendix A: Justification of Identification Condition
Considering the specification of β 0,t = β 0 (τ t ) in Equation (2), our model (1) becomes where the constant 1 is included in the regressor X it .
Without loss of generality, let . Then, our model becomes Without imposing the assumption α = 0, model (A1) implies that there is an identification issue with the identifiability and then estimability of β 0,1 (τ ). Hence, in order to identify β 0,1 (τ ) we require the condition N i=1 α 0,i = 0. In fact, the advantage of this identification condition is to allow us to estimate β 0,1 (τ ) as a smooth time-varying or trending effect in contrast to the fixed effects structure. Therefore, we believe that our model is more flexible and applicable.

Appendix B: Proofs of Theorems
Proof of Theorem 1. Even though we have the nonparametric terms in our model, the idea for proving the consistency of the parametric estimators and the identification can be adopted from Lee (2004). Define Q N,T (ρ) = max σ 2 E{logL N,T (θ )}, where θ = (ρ, σ 2 ). In order to show the consistency of θ, it suffices to show and the uniqueness identification condition that lim sup by using White (1996) and Lee (2004), where N c (ρ 0 ) is the complement of an open neighborhood of ρ 0 on of diameter .
Observe that Q N, Due to P N,T D = 0 NT,N−1 , P N,T = P N,T and S N,T (ρ) = S N,T + (ρ 0 − ρ)I T ⊗ W, we can rewrite σ 2 * (ρ) as To show Equation (A2), it is sufficient to show that According to Lemma B.2, we know By Equations (A4) and (A5), According to Lemma B.3, we get Equation (A6) so that Equation (A2) holds.
(2) Proof of Equation (A3). Consider an auxiliary SAR panel process: Y t = ρ 0 WY t + e t where e t ∼ N(0 N , σ 2 0 I N ) and t = 1, . . . , T. Denote the log-likelihood of this model as logL a N,T (ρ, σ 2 ). Letσ 2 (ρ) = (NT) −1 σ 2 0 tr{S −1 N,T S N,T (ρ)S N,T (ρ)S −1 N,T }. It can be verified that  Next, we show the asymptotic normality of (NT) −1/2 {Q − E(Q|F V )}. By the Cramér-Wold device, it is sufficient to derive the asymptotic normality of (NT) −1/2 (a 1 , a 2 ){Q − E(Q|F V )}, where (a 1 , a 2 ) is any given two-dimensional constant vector. Note that In order to adapt the proof of Theorem 1 in Kelejian and Prucha (2001) in our case, the conditions for e in Assumption 2 can help us establish a martingale difference array. Accordingly, the central limit theorem for martingale difference arrays used in the proof of Theorem 1 in Kelejian and Prucha (2001) can be applied to derive the asymptotic normality of the linear quadratic form e A N,T e + b N,T e.
To make use of the above constructed martingale difference array to derive the asymptotic normality of the linear quadratic form e A N,T e + b N,T e, we let (A) j 1 j 2 be the (j 1 , j 2 )-th element of matrix A and (b) j be the j-th element of vector b. Note that both A N,T , b N,T ∈ F V because they are functions of {X t , t = 1, . . . , T} and also A N,T is symmetric. Hence we obtain e A N, (j) .
Since A N,T , b N,T ∈ F V , we then have E(Z j |F j−1 ) = 0. This implies that {(Z j , F j ) : 1 ≤ j ≤ NT} forms a martingale difference array. At this stage we can apply the central limit theorem for martingale difference arrays similarly to the proof of Theorem 1 in Kelejian and Prucha (2001), and obtain that (2) Proof of (A8). The second derivative can be obtained as follows: ∂ 2 logL N,T (θ) ∂ρ 2 = −Ttr G 2 N (ρ) − According to Lemma C.5, we can get Thus, Equation (A8) has been proved.