Heterogeneous spatial dynamic panels with an application to US housing data

ABSTRACT This paper proposes two models that incorporate both heterogeneity and multiple sources of spatial correlation for dynamic panels. One uses convex combinations of them to form a single weight matrix. The second one includes explicitly different spatial weight matrices to form a higher order model. We use a Bayesian scheme for model estimation by deriving the full conditional distributions of heterogeneous parameters. Our Monte Carlo experiments demonstrate their finite-sample performance relative to a baseline model. In our empirical study we find the importance of including both geographical and non-geographical information in capturing correlations in real house price growth in the United States.


INTRODUCTION
Correlation among economic variables over time and across sections has long been the focal point of empirical analysis. The notion of autocorrelation or autoregression is the traditional approach to capturing correlation in the time domain and it is also extended to the cross-sectional framework by the so-called spatial autoregression. 1 The paradigm spatial autoregressive (SAR) model incorporates the correlations among different units in econometric models through a spatial weight matrix that characterizes the strength of connections among regions, markets, stores, individuals, etc. A SAR model can be interpreted as a reduced-form regression of the outcome variable on its spatial lag via the weight matrix. For a given unit, its spatial lag can be understood as a weighted average across its possible 'neighbours.' As such, SAR models can be naturally extended to deal with panel data, when at each point of time there is a cross-sectional SAR. Further, dynamics can be straightforwardly introduced so that spatial dynamic panel models can be used when 'lagged' variables, in both time and space, appear on the right-hand side of a panel regression and these lags aim to account for both spatial and temporal correlations. Cross-sectional SAR and spatial dynamic panel models are widely applied to research in various fields of economics, including economic growth, public economics, social networks and real estate economics. 2 The approaches of quasi-maximum likelihood (QML), generalized method of moments (GMM), instrumental variables (IV), indirect inference (II), and Bayesian Markov chain Monte Carlo (MCMC) have been used to estimate these models. 3 Recently, there has been increasing attention in the literature to the estimation of spatial panels with heterogeneous coefficients. It is argued that on many occasions there are theoretical reasons to believe that the strength of spatial dependence would differ greatly across regions. LeSage and Chih (2018) propose the Bayesian MCMC approach to estimating heterogeneous spatial autoregressive (HSAR) panel models. LeSage et al. (2017) apply the Bayesian MCMC procedure to study duopoly pricing in the German gasoline retail market. Autant-Bernard and LeSage (2019) estimate the region-specific knowledge production functions based on a Bayesian HSAR model for 94 NUTS-3 regions in France. Aquaro et al. (2015Aquaro et al. ( , 2021 study the QML estimation of heterogeneous spatial autoregressive dynamic panel (HSARDP) models.
Despite the great interest in heterogeneous models, most current research assumes that the spatial weight matrix in the model is taken as given. However, sometimes there may be multiple candidate weight matrices that are relevant to the question under study. More importantly, different types of spatial correlation characterized by different weight matrices may have different signs and/or magnitudes. For example, sometimes we may want to incorporate information from both the firstand second-order contiguity matrices, where the first-order one considers direct geographical neighbours and the second-order one characterizes neighbours of direct neighbours. It is usually expected that the first-order spillovers from direct neighbours are relatively stronger than the second-order spillovers. Also, in some applications it may make more sense to use both spatial and non-spatial information to generate the spatial weight matrix (e.g., Debarsy & LeSage, 2022). For instance, when we examine the interactions of fiscal policies by local governments, the degree of connection between two regions may depend not only on how close the regions are but also on how similar they are in terms of various aspects of socio-economic conditions. To incorporate the information from multiple sources, Debarsy and LeSage (2022) discuss how to construct a spatial weight matrix using the convex combinations of different types of connectivity matrices. In their approach, the estimated convex combination weights also provide some implications on the relative importance of different information in determining the distance between two units. Lam and Souza (2020) specify the spatial weight matrix by adding a sparse adjustment matrix to the estimated best linear combination of different connectivity matrices based on the adaptive least absolute shrinkage and selection operator (LASSO).
In the heterogeneous framework, with the presence of multiple types of spatial correlation, it is natural and reasonable to allow for heterogeneity in the way that different sources of information are combined. For example, while geographical distance may matter equally for regions A and B, economic distance (or the degree of similarity of economic status) relative to other regions may matter a lot for region A, but not so much for region B, when each region is determining its spending on welfare programmes, depending on their industrial structures, labour market characteristics, government compositions and demographics.
The main contribution of this paper is to take both heterogeneity and multiple sources of spatial correlation into account and model them simultaneously. To achieve this, we propose two approaches. The first approach is to allow for the convex combination weights being heterogeneous across units when a single combined spatial weight matrix is used in HSARDP. We allow for different combination weights for the spatially lagged term and the spatial-temporally lagged term. Our second approach is to use a high-order HSARDP model to incorporate explicitly different information from multiple sources. Note that estimation and inference of the homogeneous version of high-order SAR models have been widely discussed in the literature (e.g., Gupta & Robinson, 2015;Han et al., 2017;Lee & Liu, 2010). However, incorporating heterogeneous spatial parameters allows for a more flexible pattern of spatial correlation, where the magnitudes and signs of different types of spatial spillover can vary across spatial units. Notably, the two approaches give rise to two mathematically equivalent models, but they have totally different implications. We use the Bayesian MCMC scheme for model estimation by deriving the full conditional distributions of various heterogeneous parameters, including the heterogeneous combination weights in the first approach. As an application, we apply our models to the real house price growth rates in 338 metropolitan statistical areas (MSAs) in the United States from 1975 to 2019. We find that when both geographical and non-geographical information are used to capture possible spatial correlations, our models find many more MSAs that display significant positive net spatial parameter estimates and more MSAs where income and population growths have significant positive spill-in impact estimates than otherwise would be found if only geographical information is used.
The remainder of the paper is structured as follows. The next section introduces two models that incorporate both heterogeneity and multiple sources of spatial correlation. We derive the full conditional distributions of various parameters, which are needed for us to implement the Bayesian MCMC cycle for model estimation. The third section presents results from some Monte Carlo experiments by comparing a baseline model and our models. The fourth section contains our empirical study of US housing price growth. The last section concludes.

MODEL SPECIFICATION AND ESTIMATION
The standard HSARDP can be written as, for i = 1, · · · , N , t = 1, · · · , T , where 1 it is the heteroskedastic idiosyncratic error term (with mean 0 and variance s 2 i ), y w it = N j=1 w ij y jt defines a composite variable consisting of weighted outcome from neighbouring units, c i and f i represent the spatial dependence parameters for unit i, capturing the contemporaneous and temporal effects on units i's outcome from its neighbouring units, respectively, l i is the first-order autoregressive parameter that relates unit i's outcome to its lagged value, b i is a k × 1 parameter vector associated with the exogenous regressors x it . Heterogeneity is modelled explicitly by indexing the parameters by i. In this set-up, a single spatial weight matrix is used on the assumption that we are sure about how to define neighbours based on a single measure of spatial contiguity or connectivity. On many occasions there may exist different ways to quantify the degree of connectivity and even if a single measure is agreed on, we may not be sure about what threshold value of this measure should be used to define neighbours. Thus, in this paper we propose two approaches in the presence of possibly multiple sources or measures of spatial correlation. The first is to combine possible spatial weight matrices and form a single weight matrix, where the combination weights are also allowed to be heterogeneous across units. The second approach is to use a higher order HSARDP model, where all possible weight matrices are used. In what follows, S = Dg(s 2 1 , · · · , s 2 N ), Dg( · ) denotes an operator that stacks diagonally its arguments in order such that it results in a matrix, dg( · ) with a matrix argument stacks in order its argument's diagonal elements to form a column vector, 1(·) is the indicator function that takes on value 1 if its argument is true, and 0 otherwise, and L is the lag operator that shifts a time-indexed element/vector one period backward.

HSARDP with weight matrices combined
Suppose there are in total q number of possible row-normalized weight matrices {W s } q s=1 . Correspondingly, the (i, j)-th entries of W s are denoted by w (s) ij . Now define y w it (g i ) = q s=1 g is N j=1 w (s) ij y jt , where 0 ≤ g is ≤ 1 and q s=1 g is = 1 are the combination weights. This constitutes for unit i a composite term from neighbouring units of q different natures with corresponding weights g is , s = 1, · · · , q. Note that the vector of weights g i = (g i1 , · · · , g iq ) ′ is heterogeneous across i. This means that a unit may view differently the importance of various neighbours. We also define a composite lagged variable y w ij y j,t−1 , where 0 ≤ d is ≤ 1 and q s=1 d is = 1, to allow for heterogeneous combination of neighbours that lag in both time and space. We treat g i and d i as two different sets of parameters, recognizing that the importance of various contemporaneous neighbours may be different from that of neighbours from yesterday. Under this specification, we have: which has the parameter vector u For the homogeneous SAR with a single convexly combined weight matrix from row-normalized weight matrices, Debarsy and LeSage (2022) point out that the weight matrices cannot be highly correlated and that the spatial parameter cannot be equal to zero for identification purpose. We expect that similar conditions on the weight matrices need to be imposed in our heterogeneous framework and that Pr (c i = 0) = 0 and Pr (f i = 0) = 0 for each i. It is beyond the scope of this paper to derive a full set of conditions for identification. When there are only two candidate weight matrices, however, it is relatively straightforward to show sufficient conditions for identification of the spatial dependence parameters and combination weights. Since g i affects only the i-th row of W (g), denoted by w i (g i ), and c i is the coefficient on w i (g i )y •t , where y •t = (y 1t , · · · , y Nt ) ′ , we can look at the identification issue for (c i , g ′ i ) ′ separately for each i. When q = 2, g i = (g i1 , 1 − g i1 ) ′ , and therefore we only need to identify (c i , g i1 ) ′ . Suppose (c i , g i1 ) ′ is not identifiable. In other words, there exist two different sets of (c i , g i1 ) ′ , which we respectively denote as (c (a) where w (s) i is the i-th row of W s . Since both weight matrices are row-sum normalized, by multiplying a column vector of ones to both sides of (3), we obtain: Therefore, we have c (a) We can then obtain g (a) i . In other words, as long as the i-th row of W 1 and W 2 are not identical, (c i , g i1 ) ′ is identifiable, assuming that both weight matrices are row-sum normalized and that the spatial coefficients are non-zero. Following the same procedure for (f i , d i1 ) ′ , we see that similar results hold. 4 Let i as its i-th row, C = Dg(c 1 , · · · , c N ), F = Dg(f 1 , · · · , f N ), L = Dg(l 1 , · · · , l N ) ′ and B = Dg(b ′ 1 , · · · , b ′ N ). Then we can write (2) compactly as: where A = I N − CW (g) and C = FW (d) + L. In (6), the matrix W (g) captures contemporaneous spatial effects from different sources, whereas spatial-temporal effects are absorbed in W (d).
Given the exogenous x •t , if we assume stationarity (so that y •t and y •t−1 have the same distribution), then E( and Cov(y •t , y •t−1 ) = CV. Note that the condition for stationarity is that the characteristic roots of the matrix A −1 C are within the unit circle. For a homogeneous spatial dynamic panel with one spatial weight matrix (q = 1), Elhorst (2012) discusses the implied parameter con- such that the characteristic roots of the matrix A −1 C are inside the unit circle. 5 (When we put h i [ H, it means that h i is in the relevant subspace of H. The same applies to g i [ H and d i [ H.) Thus, under the stationarity condition, we can put down the joint sample log likelihood function of y = (y ′ •1 , · · · , y ′ •T ) ′ as: where E(y) consists of (A − C) −1 Bx •t , t = 1, · · · , T and Y is an NT × NT tri-block-diagonal matrix with V, CV and VC ′ spanning its main diagonal, super-diagonal and sub-diagonal blocks, respectively. On the other hand, conditional on The joint sample log likelihood function of (y ′ •0 , y ′ ) ′ can then be put as: We could establish asymptotic results of the QML estimator that maximizes (7) or (8) following the lines of Aquaro et al. (2021) and a necessary condition for consistency of the estimator is that T 1, namely, the so-called large panels. 6 In the following we focus on instead the Bayesian approach of estimation and inference. Given the priors and likelihood function, we can derive the full conditional distribution for each set of parameters and start the MCMC cycle. Instead of using the likelihood function (7) that is based on the unconditional distribution, which involves the inverse of the NT × NT matrix Y, we treat y •0 as a vector of additional parameters and use the likelihood function (8).
For parameters indexed by i, we can assign the following prior distributions and they are independent across i. We assign a normal prior b For the autoregressive parameters c i , f i , and l i , we assign uniform (on the interval (−1, 1)) priors (subject to the stationarity parameter constraint). For the combination parameters, g i DIR(g 1 , · · · , g q ) and d i DIR(d 1 , · · · , d q ), namely, Dirichlet distributions with corresponding parameters. If no useful prior information regarding the combination weights is available, we can set uniform priors for g i and d i , namely, g 1 = · · · = g q = 1 and d 1 = · · · = d q = 1. Lastly, the prior distribution of y •0 is specified as N(Jx •0 , I N ), where J = Dg(j ′ 1 , · · · , j ′ N ). The prior mean and variance of y •0 are based on the derived unconditional mean and variance of y •t by imposing To facilitate the presentation of the full conditional distributions for each set of parameters in the MCMC cycle, we use the following notation to denote cross-sectional terms: Obviously, First, the full conditional distribution of each b i is: Next, the full conditional distribution of each s 2 i is: It follows that (s 2 i |y, This is not a standard distribution and we need to use the Metropolis-Hastings algorithm to sample. However, the standard Metropolis-Hastings algorithm used by LeSage and Chih (2018) turns out to be computationally inefficient for sampling h i . Hence, we are going to instead apply the adaptive Metropolis (AM) algorithm adopted by Han and Lee (2016), which specifies the covariance matrix of the proposal distribution based on the historical MCMC draws of h i . To be more specific, the proposal distribution under the AM algorithm of Roberts and Rosenthal (2009) for the r th iteration during the burn-in period is given as i ) denotes the historical draws from the previous r − 1 iterations, and D is the sample covariance matrix of the historical draws h [1:r−1] i . Therefore, after the first six iterations, the proposal distribution is a combination of two normal distributions, where the first component is based on the historical MCMC draws and the second component is aimed at preventing the procedure from producing a singular proposal covariance matrix. The combination weight may be set as z = 0.05 following Roberts and Rosenthal (2009). By making use of the information from historical draws, the AM algorithm can lead to more efficient mixing and faster convergence of the MCMC draws. After the burn-in period, the proposal distribution will no longer be adjusted and will be fixed as the proposal distribution used in the last burn-in iteration. Furthermore, the acceptance probability of a candidate h c i is calculated as: where h † i is the current draw of h i . The full conditional distribution of each g i is: The conditional distribution of each d i can be derived similarly. Again, since the full conditional distributions for each g i and d i are both non-standard, we resort to the AM algorithm. Lastly, the full conditional distribution of y •0 is: To summarize, starting from some initial values b (0) , s 2(0) , h (0) , g (0) , d (0) and y (0) •0 , we can implement the Gibbs sampler by sampling through the full conditional distributions of where R denotes the total number of simulations. 7 Point estimates for parameters are from the posterior means and 'significance' may be signified by the highest posterior density Bayesian credible intervals.

High-order HSARDP
Instead of using the combination of multiple connectivity matrices to specify a single spatial weight matrix, high-order SAR models treat these connectivity matrices as different spatial weight matrices. A high-order HSARDP can be written as follows: where . We can derive similarly the sample log likelihood function, either the unconditional or conditional one as in the previous subsection. The cross-section T × 1 vector of errors is defined accordingly as 2q+1) such that the characteristic roots of the matrix A −1 C are inside the unit circle. (When we put k i [ K, it means that k i is in the relevant subspace of K.) The vector of parameters in (16) can be collected as The cycle of the MCMC algorithm outlined in the previous section carries over to b i , s 2 i and k i for each i. When sampling k i conditional on y, k −i , b, s 2 , y •0 , we need to maintain the stationarity condition k i [ K. Mathematically, one can think of (6) as a restricted version of (16). In (6), the first term on the right-hand side is ij y jt and in (16) (Similar analysis applies to the second terms in (6) and (16) 1], spatial connections from all sources may matter and they have the same direction (determined by the sign of c i ) for each unit under (6). Yet, for the higher order HSARDP, the spatial parameters on different weight matrices for the same unit can have different signs.

Marginal effects
In addition to the point estimates obtained using the MCMC procedure, we may also be interested in estimating the marginal effects of exogenous explanatory variables on the outcome variable. Let B l = Dg(b 1l , · · · , b Nl ) and x •tl = (x 1tl , · · · , x Ntl ) ′ , for each l [ {1, · · · , k}, where b il and x itl denote the l-th elements of b i and x it , respectively, i [ {1, · · · , N } and t [ {1, · · · , T }. As discussed previously, under stationarity and given x •t , we can obtain E(y •t ) = (A − C) −1 Bx •t . The partial derivative matrix of E(y •t ) with respect to the values of l th explanatory variable is then: Again, note that the definitions of matrices A and C under HSARDP with combined matrices are different from those under the high-order one. The i-th diagonal element of M l , denoted by m (l) ii , estimates the direct effect of the l-th covariate on the i-th unit's outcome; the sum of all the elements of the i-th row, excluding m (l) ii , of M l estimates the total marginal effects on unit i's outcome of the l-th covariate from all the other units; the sum of all the elements of the i-th column of M l , excluding m (l) ii , estimates the total marginal effects of unit i's l-th covariate on the outcomes of all the other units. Following LeSage et al. (2017) and LeSage and Chih (2018), we can interpret them as the direct impact (DI), indirect spill-in impact (SII), and indirect spill-out impact (SOI), respectively. Numerically, we can stack these measures of impact of unit i's, i = 1, · · · , N , l-th covariate as N × 1 vectors, namely, dg(M l ), M l 1 − dg(M l ) and (1 ′ M l ) ′ − dg(M l ), respectively, where 1 is an N × 1 vector of ones. We can calculate them from each simulated draw and use the posterior means as point estimates of them. Similarly, as before, we signify their levels of significance using the highest posterior density credible intervals.

MONTE CARLO EXPERIMENTS
In this section we present the results from a Monte Carlo study to investigate the small-sample properties of the Bayesian estimator under three different model specifications, named as models A, B and C, respectively. Model A refers to a standard HSARDP as specified by (1); model B is an HSARDP with heterogeneous convex combinations of two weight matrices (see (2)); model C is a high-order HSARDP (see (16)). In the experiments we include a constant term and one exogenous regressor in all three model specifications. 8 Two weight matrices are considered for model B when they are combined as a single weight matrix and the same matrices are used for model C (which is now a second-order HSARDP): W 1 is a first-order contiguity matrix, where w (1) ij = 1 if j = i − 1 or j = i + 1, and 0 otherwise; W 2 is a second-order contiguity matrix where w (2) ij = 1 if j = i − 2 or j = i + 2, and 0 otherwise. Model A uses W 1 as its weight matrix. To make it comparable with models A and B, we define pseudoparameters under model C, namely, c i = c (1) i + c (2) i and f i = f (1) i + f (2) i , and report the average biases and root mean squared errors (RMSEs) of the estimated c i and f i across the three models. We set the number of replications to be 1000, and for each replication, we set the number of simulations for the MCMC sampler as 5000 with the first 2000 as the burn-in draws. We include results when T = 25, 50, 100 and N = 25, 50, 100 in this section and provide additional results when N = 200 in the supplemental data online. The true parameter values, values of the exogenous regressor and the error terms are redrawn in each replication of data simulation. We denote for each region i the constant as a i and the coefficient of the single exogenous regressor as b i . Moreover, we choose the values of prior hyperparameters such that the priors are relatively uninformative. 9 Table 1 lists parameter configurations we use across the three models.
Since in practice we do not know the data-generating process (DGP), we first discuss the issue of model selection before presenting the bias and RMSE results. We use the observeddata deviance information criterion (DIC) (Chan & Grant, 2016) for this purpose. In our model specifications, DIC is defined as DIC = D + p D , where D = −2E y •0 ,u [ ln P(y|y •0 , u)|y], p D = D −D andD = −2 ln P(y|ŷ •0 ,û).ŷ •0 andû denote the corresponding point estimates based on posterior means. D can be used to measure model fit, and p D is called the effective number of parameters, which can be used to measure model  Heterogeneous spatial dynamic panels with an application to US housing data complexity. Moreover, we approximate the expectation term in D using the average of ln P(y|y •0 , u) over all the posterior draws of y •0 and u. Table 2 presents the model selection results for each DGP-N − T combination. When the DGP is model A, we see that model A is strongly favoured over model C. Although DIC may not strongly favour model A over model B when T is small, it would perform significantly better as T goes up, and the true model is strongly favoured when T = 100. When the DGP is generated from model B or model C, DIC strongly favours the true model under all N − T combinations. Therefore, our results indicate that in practice when one is uncertain about the choice of weight matrices, whether being a single one of single source, or a single one of multiple sources combined, or multiple ones of multiple sources, the DIC seems to be a reliable model selection criterion. Table 3 contains the average bias and RMSE results when the DGP is model A. Under this specification, g i = d i = (1, 0) ′ in model B and c (2) i = f (2) i = 0 in model C. This experimental design corresponds to the scenario when a researcher may be using more than one measure of connectivity when in fact there is only one. Model C has 2N (q − 1) = 2N additional parameters relative to model A, pertaining to the second-order spatial and spatial-temporal coefficients, but has the same number of parameters as model B (N (2q + 1) + N (k + 1) = 3N + 2N (q − 1) + N (k + 1)). Thus, we would expect that for a given N − T combination, the RMSEs associated with the estimated c i and f i from models B and C are higher than those from model A. This is evidenced in Table 3. We also see increased RMSEs in the estimated l i , b i and s 2 i , but to a much less extent. Lastly, we observe that for a fixed N , when T goes up, the biases and RMSEs across the three models all go down. Table 4 reports the bias and RMSE when the DGP is generated according to model B. Immediately, we see that ignoring multiple sources of spatial correlation and using the baseline HSARDP leads to much higher average RMSEs, especially in terms of estimating c i , f i , a i and s 2 i and when T is relatively large. Comparing results from model B with those from model C, we see that the former estimates c i and f i with less bias and lower RMSE and also fares in general better in estimating other parameters.
We now turn to the situation when model C is the true one, as reported in Table 5. As before, by assuming only one source of spatial connectivity, model A estimates all the parameters with much higher RMSEs and gives significant bias in estimating s 2 i . Also, model C performs typically better than model B in estimating all the parameters, which is expected because model C is , respectively, where R = 1000 is the number of simulations.
, respectively, where R = 1000 is the number of simulations.
, respectively, where R = 1000 is the number of simulations. more flexible than model B, given that model C would allow the spatial correlations characterized by various weight matrices to have different signs.

AN EMPIRICAL STUDY
In this section we investigate the spatial-temporal dynamics in real house price changes among different MSAs in the United States. The data used are similar to those in Aquaro et al. (2021) and Yang (2021), but we have updated their quarterly data and the extended sample period runs from 1975:Q2 to 2019:Q4. The Freddie Mac House Price Index (FMHPI), consumer price index (CPI), population and nominal per capita income for 377 MSAs are used to construct the growth rates. They are de-factored and de-seasonalized using the same method as in Aquaro et al. (2021) with the aim of filtering out the effects of seasonal trends and national factors so that the estimated parameters only reflect the local influences and spillovers. For comparison purpose, the same baseline model as in Aquaro et al. (2021) is used, namely, model (1), where an adjacency matrix is used as the spatial weight matrix such that two MSAs are considered as neighbours if the distance between their centres is within 75 miles. With such a geographical distance threshold, 39 MSAs in the sample do not have any neighbours. Excluding these 39 MSAs, we have a panel of N = 338 and T = 179. The outcome variable y it is the quarterly growth rate in real house prices and the explanatory variables x it include quarterly growth rates of population and real per capita disposable income. Accordingly, we put where a i is for the constant term. This spatial weight matrix with a geographical distance threshold of 75 miles is denoted by W g1 .
In addition to W g1 , we consider two different candidate weight matrices, recognizing that spatial correlation may travel well beyond the 75-mile ring and it may also be of non-geographical nature. W g2 characterizes neighbours such that the corresponding entries are non-zero when their distances are greater than 75 miles but less than or equal to 150 miles. Therefore, this matrix represents a second distance ring of relatively more remote neighbouring areas. W n is constructed such that two MSAs are regarded as 'neighbours' if the correlation coefficient of their estimated residuals under the baseline model is significantly positive. In other words, W n would treat two MSAs as neighbours if there is some positive correlation in their local housing markets that is not captured by the geographical weight matrix W g1 . The significance of correlation coefficients is based on the testing procedure in Yang (2021).
We use models A-C to designate again the baseline model (1), HSARDP with combined matrices (2) and higher order HSARDP (16) (with q = 2), respectively. Under models B and C, we can choose two different sets of weight matrices: {W g1 , W g2 } and {W g1 , W n }. 10 For example, under model B, the estimated combination weights based on W g1 and W n would represent the relative importance of geographical and non-geographical information in specifying the spatial weight matrix. It can be expected that the spatial correlations characterized by different candidate weight matrices may have different magnitudes and/or signs, and that there may be heterogeneity in the relative importance of different types of spatial correlation.
In estimation, we continue to use the relatively uninformative priors as in the previous section. We use 12, 000 iterations in the MCMC cycle with the first 2000 discarded. Figure 1 presents the posterior means of estimated c i + f i under the baseline model, where (and also in Figures 2 and 3) MSAs in dark grey have positive estimates and those in light grey have negative estimates. 11 If the same colour scheme is used, Figure 1 will be very similar to that in Aquaro et al. (2021, fig . 1a), which represents their QML estimates for the net spatial parameters based on the same spatial weight matrix. Table 6 summarizes the estimation results under the baseline model and models B and C with two different sets of weight matrices. The baseline model shows that 153 of the 338 MSAs have significantly positive net spatial parameter estimates, while 33 have significantly negative i and tend to be more significant. When using W g1 and W g2 as the candidate weight matrices, models B and C provide comparable results and they are similar to those from model A. Although incorporating multiple geographical weight matrices based on different distance thresholds do affect the signs and/or magnitudes of net spatial parameter estimates for some MSAs, it seems that it does not change much the pattern of spatial correlation. The story is quite different, however, when we instead use W g1 and W n that incorporate both geographical and non-geographical information. Now there are many more MSAs with significantly positive c i + f i under model B or C compared with the baseline model. This makes sense because when we incorporate different types of spatial correlation based on both geographical and non-geographical information, we may be able to capture more interactions among local housing markets. It is highly possible that there are some unobservable economic, social or migration factors leading to spillover effects among local housing markets even though these markets are not close to each other in the geographical sense. In particular, we find that there are 54 MSAs whose net spatial parameter estimates are insignificant under the baseline model but become significantly positive under model C with W g1 and W n . When comparing model B with W g1 and W n against the baseline model, we find that 58 MSAs, including, for example, San Francisco-Oakland-Hayward, have significantly positive net spatial parameter estimates even though they are insignificant under model A. Intuitively, consider Los Angeles-Long Beach-Anaheim that is more than 150 miles from San Francisco-Oakland-Hayward. Given that they are of similar economic, social and demographic characteristics, one would naturally expect that the housing market of Los Angeles-Long Beach-Anaheim has some impact on that of San Francisco-Oakland-Hayward, even though they are not geographical neighbours. If we only use the spatial weight matrices based on geographical distance, we fail to capture the spatial correlation between the two markets. Figures 2 and 3 respectively present the estimates of net spatial parameters under models B and C when we use W g1 and W n as the candidate weight matrices. In comparison with Figure 1, which corresponds to results under the baseline model, we see that there is less heterogeneity across MSAs in the signs of c i + f i once we incorporate the non-geographical information.
Based on the DIC values from Table 6, we see that the baseline model is the least favoured and using both geographical and non-geographical information is strongly preferred to using only geographical information. Also, model C is slightly favoured over model B. This is expected because as shown in the previous section, model C tends to be more flexible than model B. 12 Table 7 reports the estimated marginal effects of income growth and population growth on house price growth. Notably, there are many more MSAs with significantly positive indirect spill-in impact estimates under models B and C with W g1 and W n compared with cases when only geographical information is used to capture spatial correlation, while the direct and spillout impact estimates are quite stable across different specifications. Recall that in this set-up a positive spill-in impact implies that increases in population and income in MSAs close-by and/or of similar characteristics can create upward pressure on an MSA's real house price. When we use only geographical distance to define neighbours, we would underestimate the number of neighbours that can exert positive pressure on an MSA's housing market.    It would be interesting to investigate the degree of heterogeneity in the combination weights if we can settle down with using model B. Figures 4 and 5 present the estimates of combination weights g i and d i under model B with W g1 and W n as the candidate weight matrices. 13 In Figures  4 and 5, for MSAs that are in dark grey, the g i (d i ) estimates are greater than 0.5, which means that geographical distance is more relevant for specifying the spatial weight matrix for the spatial lag (spatial temporal lag). In contrast, for the MSAs that are shown in light grey, the non-geographical distance is more relevant. We can see from Figures 4 and 5 that there is strong evidence of heterogeneity in the estimated combination weights across different MSAs. Notably, it seems that higher importance of non-geographical distance is usually the case in economically highly active areas, such as New York-Newark-Jersey City, Boston-Cambridge-Newton, Chicago-Naperville-Elgin, Dallas-Fort Worth-Arlington, Seattle-Tacoma-Bellevue, San Francisco-Oakland-Hayward, etc. Table 8 presents the group average estimates of combination weights by region under the same model specification. In particular, when specifying the spatial weight matrix for the spatial lag, non-geographical distance is relatively more important for MSAs in Mideast, New England and Southwest, where there are a number of economically highly active and densely populated MSAs. In contrast, geographical distance is relatively more important for MSAs in the Rocky Mountain region.

CONCLUSIONS
In this paper we introduce heterogeneity in the specification of spatial weight matrix into the heterogeneous SAR dynamic panel models by incorporating heterogeneous convex combinations of different connectivity matrices. A high-order model is also proposed as an alternative approach to incorporating multiple types of spatial correlation characterized by different weight matrices. We propose a Bayesian MCMC estimation procedure and conduct Monte Carlo experiments to compare the finite-sample performance of these two models relative to the standard model that incorporates only a single source of spatial correlation. It is found that the DIC performs very well in picking up the true model when one is uncertain about the choice of spatial weight matrices. When there are multiple sources of correlation in the true model, the baseline model can lead to substantially biased estimation results. We apply our extended models to study real house price changes among 338 MSAs in the United States, where two geographical and one non-geographical connectivity matrices are considered. Based on the model selection criterion DIC, models taking into account both geographical and non-geographical correlation are strongly preferred and they find many more MSAs that have significant positive net spatial parameter estimates and more MSAs whose income and population growths have significant positive spill-in impact estimates.
For future research, it may be interesting to impose Bayesian shrinkage on the heterogeneous coefficients or the combination weights on different spatial weight matrices. This would allow for heterogeneous selection of the most relevant spatial weight matrices for different spatial units. Most Bayesian LASSO papers focus on imposing shrinkage on the coefficients of explanatory variables (b), while only a small number of papers apply shrinkage to the spatial dependence parameters. Lam and Souza's (2020) approach allows for the selection of connectivity matrices in the homogeneous coefficient setting. Under the heterogeneous framework, a potential challenge is how to impose stationary condition on the spatial dependence parameters. We need to specify the shrinkage priors which can accommodate this condition. Most of the frequently used shrinkage priors are based on normal or beta distributions and there does not seem to be a straightforward way to take care of the stationarity condition.
suggest that model (2) may be subject to identification concerns when we have weight matrices that are highly similar. We thank a referee for bringing the issue of identification to our attention and the editor-in-chief for suggesting the additional simulations. 5 Although each g i contains q elements, there are only q − 1 parameters that need to be estimated given the constraint q s=1 g is = 1. Similarly, there are q − 1 parameters in each d i . 6 The large-T asymptotic result may be restrictive for many empirical researchers when dealing with micro-level data. Intuitively, large-T asymptotics are needed since there are so many parameters arising from unit-level heterogeneity. We thank a referee for pointing this out. Simulation results given in the third section and additional results in the supplemental data online suggest that the performance of the Bayesian estimator is not that sensitive to N . 7 When implementing the Gibbs sampler, we actually sample h i , g i and d i together in a single block to increase computational efficiency. This would allow us to evaluate the stationarity condition only once instead of three times for each i within each simulation, although it may take larger number of simulations for the sampler to converge. Moreover, unlike the other sets of parameters, we do not need to update y •0 for each i [ {1, 2, . . . , N } in each simulation because it does not vary across different units. 8 When simulating data, the values of exogenous regressor for each spatial unit i and time t are independently sampled from a standard normal distribution. 9 We set j i = 0, v 2 i = 100 and a i = b i = 0 for each i [ {1, · · · , N }. 10 It would be less intuitive if we choose a third set {W g2 , W n } since W n is based on the residuals from the baseline model using W g1 . There are 11 MSAs that have no neighbours as defined in the second geographical weight matrix W g2 . For these 11 MSAs, we set the corresponding weights g i2 = d i2 = 0 in model B and c (2) i = f (2) i = 0 in model C. Similarly, we did the same for the 15 MSAs that have no neighbours as defined in W n when the second set {W g1 , W n } is used for models B and C. 11 Figures for the heterogeneous estimates in this section are all generated from the R code of Aquaro et al. (2021). The sum c i + f i is of particular interest because it represents the net spatial effect from both contemporaneous and temporal components. Furthermore, the category 'No-Neigh' in the figure legends consists of the 39 MSAs without any neighbour as defined in W g1 . 12 When model B is a restricted version of model C, it implies that for each i, c (s) i , s = 1, · · · , q, are of the same sign, and that for each i, f (s) i , s = 1, · · · , q, also have the same sign. We may use the post-convergence posterior draws under model C to see what proportion of the draws having c (s) i and f (s) i satisfying these constraints for all i. For our empirical data, regardless of which of the two different sets of weight matrices is used, we get a p-value of 0 (in the sense that 0% of the draws meet these constraints). This is also consistent with the DIC result, which favours model C. As a referee pointed out, the DIC approach is more general and useful for comparing many different aspects of model specifications. 13 Given q = 2 and the requirement that weights sum up to 1, g i and 1 − g i are the weights attached to W g1 and W n , respectively, for the i-th MSA's composite term y w it on the righthand side of (2). Similarly, d i and 1 − d i are those for y w it−1 .