Bayesian analysis of a dynamic multivariate spatial ordered probit model

ABSTRACT Spatial econometrics has few studies on multivariate ordinal responses. This study proposes a dynamic multivariate spatial ordered probit (DMSOP) model, which is the first attempt to capture temporal and spatial dependencies simultaneously for multivariate ordinal responses. The parameters are calculated using Bayesian inference based on Markov chain Monte Carlo sampling. The DMSOP model performs effectively with the simulated data. Furthermore, the DMSOP model is applied to two response variables, namely, the life satisfaction and self-rated health of adults in 25 provinces in China. The empirical results show that the model can effectively measure the spatial and temporal dependencies for multivariate ordinal responses.


INTRODUCTION
When dependent variables are ordinal discrete responses, the ordered probit (OP) model is required.The traditional estimation method of OP models is the maximum likelihood (ML) method.The accuracy of the ML method is questionable when the sample size is small.Albert and Chib (1993) solved this problem by introducing latent continuous variables, which can be easily obtained using truncated normal distributions through Bayesian inference.Following Albert and Chib (1993), Bayesian inference and latent continuous variables have been widely applied to OP models (Chen & Dey, 2000;Hasegawa, 2010;Jeliazkov et al., 2008;LeSage, 1999).It should be noted that when Bayesian statistical inference is carried out for complex models, Bayes factor cannot be defined under improper prior distributions or Jeffreys-Lindley paradox (Li et al., 2014(Li et al., , 2015;;Li & Yu, 2012).
Additionally, many dependent variables are spatially dependent.For the sake of studying the spatial dependence of discrete response variables, in recent years the econometric models mainly focus on logit and probit models.Krisztin et al. (2022) proposed a Bayesian approach for estimating a (hierarchical) Bayesian spatial logit model for analysing land-use change and discussed the relevance of dealing with spatial dependence.Zhang et al. (2021) introduced the Bayesian spatial multinomial logit model to investigate spatial correlations in freeway crash injury severity.Khattak and Khattak (2021) used the Bayesian hierarchical logit model to capture the spatial and unobserved heterogeneity in consumer preference.Since McMillen (1992) first introduced the spatial probit (SP) models to address the spatial dependence for discrete responses, many studies have focused on the study of SP model.Skevas et al. (2022) used the Bayesian spatial autoregressive probit model to research how the role of peers impacted farmers' decision to adopt unmanned aerial vehicles (UVAs).Wei et al. (2021) introduced the Bayesian spatial random parameters logit model to research the spatial correlation of the vehicle crash severity.
Some researchers examined the ML method to obtain the parameter estimates of SP models (Anselin, 2013;Arbia, 2006;LeSage, 1999;Wang et al., 2013), but found them difficult to estimate (Beron & Vijverberg, 2004).LeSage (2000) introduced Albert and Chib's (1993) method in spatial econometrics and proposed the Markov chain Monte Carlo (MCMC) method to estimate the parameters of SP models.Furthermore, Smith and LeSage (2004) introduced a new Bayesian SP model wherein the spatial dependencies are addressed in the error using an autoregressive structure.All discrete responses of the above SP models are binary (0 and 1), and when ordinal, these models can be extended to the spatial ordered probit (SOP) model.
Besides spatial dependencies, temporal dependencies also widely exist.The dynamic or panel OP model is needed to capture this characteristic.Hasegawa (2009) proposed a Bayesian dynamic panel OP model and applied it to estimate the subjective well-being of Japanese consumers.In spatial econometrics, Kakamu and Wago (2007) introduced a Bayesian panel SP model, which Kakamu et al. (2010) used to estimate the regional business cycle in Japan.In the case of ordered responses, Wang and Kockelman (2009) extended Smith and LeSage's (2004) model to the dynamic spatial ordered probit (DSOP) model.
All the above-mentioned studies involve univariate models.In the case of multivariate response variables, Chen and Dey (2000), Chib and Greenberg (1998), Hasegawa (2010) and Jeliazkov et al. (2008) illustrated multivariate probit models with binary and ordinal response variables.In spatial econometrics, Baltagi et al. (2016) proposed a panel bivariate SP model, wherein the spatial dependencies are reflected in the latent dependent variables.
Following the dynamic OP model with spatial dependencies in Wang and Kockelman (2009) and models for multivariate ordinal response variables in Gao (2021), this study proposes a dynamic multivariate spatial ordered probit (DMSOP) model, wherein the spatial dependencies are explained using an additive error specification, as in Smith and LeSage (2004).To the best of our knowledge, the DMSOP model is the first attempt to capture temporal and spatial dependencies simultaneously for multivariate ordinal responses.
The remainder of the paper is structured as follows.Section 2 introduces the model specification for the DMSOP model.Section 3 explains details of the Bayesian inference for the model.Section 4 presents the model for the simulated dataset and its simulation results.Section 5 explains the empirical data and MCMC settings, and discusses the empirical results for the two ordinal response variables: life satisfaction and self-rated health.Section 6 concludes.

MODEL SPECIFICATION
This study proposes a DMSOP model, which is the first attempt to simultaneously capture spatial and temporal dependencies for multivariate ordinal responses.Here, a notation with four subscripts is needed to capture all the model's features.First, two subscripts are introduced to quantify the spatial dependencies, as in Smith and LeSage (2004).One subscript i is used to index spatial regions (i = 1, . . ., m), and the other subscript k is used to index individuals inside each region (k = 1, . . ., n i ).Therefore, the total number of observations is m i=1 n i = n.Second, one subscript t is introduced to index time periods, which is used to quantify the temporal dependencies, as in Wang and Kockelman (2009).Finally, the DMSOP model is proposed for multivariate ordinal responses, so we need to introduce one subscript s to index different response variables.
In the DMSOP model, the above four subscripts are required to index the observed response variable, that is, y ikst .Following Gao (2021) and Wang and Kockelman (2009), the observed response variable y ikst is ordinal and has the following relationship with the latent response variable z ikst : where l s indexes the classifications of the response variable y ikst .Additionally, the cut-points g s,l s are specified, as in Chen and Dey (2000): As in Smith and LeSage (2004), the latent response variable z in SP models is assumed to be affected by exogenous variables x, and after controlling for all the exogenous variables x, the residual d is assumed to be spatially autocorrelated.In dynamic model, the value of response variables z t may not only depend on contemporaneous exogenous variables x t , but also on their previous period's value z t−1 (at the same location and neighbouring locations).Similarly, if spatial autocorrelation still exists in the residuals d after controlling all the temporally lagged z t−1 and contemporaneous variables x t , the temporal and spatial dependencies can be described by the DSOP model proposed by Wang and Kockelman (2009).
In this study, the response variables are multivariate.Any response variable is assumed to be affected by itself in the last period, along with all other response variables in the last period.This assumption is reasonable and meaningful.For example, suppose the response variables are life satisfaction and self-rated health.These two responses are ordered and correlated.Life satisfaction is affected not only by contemporaneous exogenous individuals' conditions, but also by the last period's life satisfaction and the last period's self-rated health.
In this dynamic study, the initial time is set to t = 0, so all parameters including subscript 0 represent the parameter value at the initial time t = 0, rather than the real value of this parameter.Based on the specification of the temporal and spatial dependencies in the DSOP model (Wang & Kockelman, 2009), as well as the above assumptions of multivariate correlated response variables, the latent response variable z ikst in the DMSOP model is specified as follows: where l sg is the temporal coefficient between z ikst and z ikg(t−1) .S g=1 l sg z ikg(t−1) reflects the effects of all the variables in the last time period.Commonly, S g=1 |l sg | ≤ 1 is to guarantee the system's stability.x ikst = (x iks1 , x iks2 , . . ., x iksq s ) ′ is a q s -dimensional vector of the explanatory variables.Correspondingly, b s = (b s1 , b s2 , . . ., b sq s ) ′ is a q s -dimensional vector of the regression parameters for x ikst .d ist and j ikst represent regional and individualistic effects, respectively, for the sth response variable in period t.We assume that d ist is time-invariant, as in Wang and Kockelman (2009): for all t = 0, 1, . . ., T .
Therefore, (3) can be rewritten as: where d is represents all unobserved common effects for observations within region i for the sth response variable.In this DMSOP model, the response variables are the multivariate ordered variables of an observation, and the regional effect d is is used to capture all unobserved common effects for observations within region i.Therefore, the spatial coefficient r could be assumed to be constant for all s in the above spatial autoregressive model as in Gao (2021).The following spatial autoregressive model can be used when d is is spatially correlated: where r is the spatial coefficient, which reflects the degree of spatial influence.Weights w ij reflect the closeness between regions i and j.A commonly used definition is the queen contiguity.Weights w ′ ij are row normalized as in Gao (2021).Note that the 'island' situation (no contiguity region) is not suitable in this model, because it is very different from the data-generating process of spatially connected regions.The disturbance e is captures the regional effects that are not spatially correlated.We assume that e is follows an independent and identically distributed (i.i.d.) normal distribution as: Observations can be stacked by the multivariate ordinal responses (please refer to the detailed explanation in Appendix A.1 in the supplemental data online).Note that, the multivariate responses are not independent, and thus the individualistic effects j ikt = (j ik1t , . . ., j ikst , . . ., j ikSt ) ′ are assumed to be normal variates with zero means and a common covariance matrix V , that is: where V is an S × S symmetric positive definite covariance matrix.

with each
Bayesian analysis of a dynamic multivariate spatial ordered probit model 465 , and the other notations of the DMSOP model in matrix form are explained in Appendix A.2 in the supplemental data online.From ( 7) and ( 9), by defining Therefore, the density of d is: From ( 6) and ( 8), we can derive that z t has a multivariate normal distribution, that is: , and stack all the observations by time, then (6) can be expressed as: , and 1 T is a T -dimensional vector of ones.

PARAMETER ESTIMATION
3.1.Prior distributions of parameters Following Smith and LeSage (2004), we introduce the following prior distributions of b 0 , b and s 2 s : where IG(e 0 , g 0 ) denotes an inverse-gamma distribution with shape e 0 and scale g 0 .
In this study, to distinguish prior distribution from posterior distribution, p(•) is introduced to represent prior distribution.The beta prior distribution is as follows: where 1(•) is an indicator function.Moreover, when a 0 = b 0 = 1, the prior of r is a uniform distribution, and thus, this prior is more flexible than the uniform prior.
In this DMSOP model, the temporal coefficient l sg is a crucial and new parameter compared with the MSOP model (Gao, 2021).The property of the temporal coefficient l sg is similar to the spatial coefficient r, and thus uniform prior can be used.Furthermore, the prior of temporal coefficient in Wang and Kockelman ( 2009) is uniformly distributed.In view of the complexity of the joint posterior of the time coefficients and previous studies, the following uniform prior is used: In spatial econometrics, there are few Bayesian models for multivariate response variables.Therefore, this paper will draw on the prior setting for V in the non-spatial multivariate probit models as in Hasegawa (2010) and Jeliazkov et al. (2008); that is, where IW(n 0 , Q 0 ) denotes an inverse-Wishart distribution with degree of freedom n 0 and S × S scale matrix Q 0 .
Because g has the constraint of −1 = g s,0 ,

MCMC sampler
The joint posterior density function of the parameters is as follows: Given z and g, the ys are conditionally i.i.d.; and thus: where Based on the joint posterior density function and prior distributions, we can obtain the full conditional distributions (FCDs) of all parameters.The parameters are then estimated by MCMC sampling.The MCMC estimation scheme requires starting with the initial values of the parameters which we define b 0 0 , b 0 , d 0 , r 0 , s 0 s , V 0 , L 0 , z 0 and the cut-points g 0 .We then sample sequentially from the following set of FCDs of the parameters: (1) Sample b 0 , b and d using the multivariate normal distributions as explained in Appendix B.1-B.3, respectively, in the supplemental data online.(2) Because the FCD of r is not a standard distribution, we can use a Metropolis-Hastings (MH) algorithm to obtain the samples.A viable alternative to the MH step is a Griddy-Gibbs step, which has much better mixing and convergence properties than a MH step.Following Ritter and Tanner (1992), Smith and LeSage (2004) and LeSage and Pace (2009), we use the Griddy-Gibbs step to obtain the sampling values of r.The core of this method is to achieve a grid approximation to the conditional posterior for r first, then draw from this using inversion.For the detailed sampling process, see the descriptions in Appendices B.4 in the supplemental data online.
(3) Sample s 2 s using a Chi-square distribution as explained in Appendix B.5 in the supplemental data online.(4) Sample V using an inverse-Wishart distribution as explained in Appendix B.6 in the supplemental data online.(5) Based on the property of multivariate normal distribution, we can obtain the FCD of L, which is complex and cannot be used directly for sampling.We should extract the element l sg from L, and sample l sg individually by using a truncated normal distribution (for details, see Appendix B.7 in the supplemental data online).In each draw, the value of l sg needs to be limited to (−1, 1).Moreover, we only accept those l sg where S g=1 |l sg | ≤ 1 for s = 1, . . ., S. (6) Sample z ik0 , z ikt for t = 1, . . ., T − 1 and z ikT using the respective truncated normal distributions as explained in Appendix B.8 and B.9, respectively, in the supplemental data online.(7) The sample of g can be obtained using a uniform distribution (Wang & Kockelman, 2009).Cowles (1996) pointed out that the MCMC sampling of g using a uniform distribution is not stable.Following Cowles (1996), Chen and Dey (2000) offered a better way to obtain parameters g to remove the ordering constraint using the one-to-one map.In this paper, the sampling of g uses transformations as in Gao (2021) where k s,l s is a hyperparameter, which can be sampled using a MH algorithm as explained in Appendix B.10 in the supplemental data online.
After obtaining a sufficient number of sampling values, the last step of MCMC estimation scheme is to discard a certain number of initial sampling values as burn-ins, and retain the subsequent sampling values to obtain a stable posterior estimation of parameters.

Simulated dataset
We implemented a simulation study to demonstrate the validity and accuracy of the DMSOP model.We set the number of response variables S = 2 (y 1 and y 2 ) for ease of understanding and explanation.Additionally, we assumed that there are L 1 = L 2 = 5 ordered categories for the two response variables.
In the simulation study, the sample size and the number of time period may affect the estimated results in time series and spatial econometrics.Following Wang and Kockelman (2009), the time-series T is set at 7 (initial time is set at 0), the number of region m is set at 30.
Because there is a limitation to the total number of observations in the DMSOP model, and thus individuals in each region n i is set at 10 in different experiments.We assume that the 30 regions are located as shown in Figure 1.The weight matrix is generated based on queen contiguity.For example, region 15 is adjacent to regions 8, 9, 10, 14, 16, 20, 21 and 22.It is then rowstandardized.
In the simulated dataset, each response variable has q s = 2 explanatory variables.Different response variables may have a common set or subset of regressors, and thus the explanatory variables could be set the same for different responses as in Gao (2021).In an empirical study, different response variables may need different explanatory variables.Additionally, the model becomes more complex when the explanatory variables are different.Therefore, such variables are set to be different in the simulated dataset.They are generated using a standard uniform distribution, U(0, 1).Note that the simulation does not include the intercept.The explanatory variables are set to be different for the two response variables.b and b 0 are set the same in different experiments as follows: b = (0.25, 0.51, 0.33, 0.46) ′ , b 0 = (0.47, 0.50, 0.49, 0.53) ′ .
The vector of regional effects d is generated using the multivariate normal distribution: where B r = I ms − r(W ⊗ I S ), and spatial coefficient r is set as 0.3, 0.5 and 0.7 in different experiments.s 2 is set the same in different experiments as (0.1, 0.2) ′ .Because there are two response variables, the DMSOP model may be sensitive to the setting of covariance v 12 and v 21 : the greater the covariance, the stronger the interaction of the response variables.To verify the stability of the model, v 12 and v 21 are set at 0.01 (weak interaction), 0.25 (moderate interaction), and 0.9 (strong interaction) in different experiments.Thus, covariance matrix V = 0.10 0.01 0.01 0.10 , V = 0.50 0.25 0.25 0.50 , and V = 1.0 0.9 0.9 1.0 in different experiments.Based on the settings of V , the vector of individualistic effects j t is generated using the multivariate normal distribution j t N(0, I n ⊗ V ), for t = 0, 1, . . ., T .
There are three settings each for the spatial coefficient r, covariance matrix V , and temporal coefficient matrix L; and thus, there are 27 sets of simulation data, and each set include 30 * 10 * 2 * (7 + 1) = 4800 records (m = 30, n i = 10, S = 2, T = 7).In this study, the validity of the DMSOP model is tested through the different settings of the above parameters.If the model is valid for all the 27 datasets, the validity is passed.

Simulation results
In the simulation, we use the following less informative priors: b 0 b N(1, 10 12 I); , where IG denotes an inverse-gamma distribution, not an inverse Gaussian distribution.The posterior results are calculated using MCMC sampling.
In this study, the accuracy of estimation is tested by the error between the true value and the estimated value.The main parameters we focus on are b, L and r.The mean squared error (MSE) is used to describe their estimation accuracy as follows: where Y represent the final valid sampling number (total sampling numbersampling number in burn-in period), b (w) , l (w) sg and r (w) represent the final wth simulation results.When using the MCMC method, there is an obvious trade-off between accuracy and computation time, and this trade-off becomes more pressing when the sample size and number of choice variables increase.Table 1 shows the parameter estimation results of 3000 iterations and 11,000 iterations under three typical parameter settings: low setting, medium setting and high setting.The computation time of one experiment for 11, 000 iterations is approximately 20, 882 seconds.The computation time of 11, 000 iterations is 3.7 times that of 3000 iterations; however, there are no obvious improvements in simulation results between these two  1).For example, MSE all (the sum of MSE of all 22 estimated parameters listed in Table 1) only decreases from 0.0623 to 0.0620 for the low setting.
The convergence diagnosis is another method to demonstrate the required number of iterations.Figure 2 shows the convergence diagnosis of three main parameters by using trace plot, ergodic mean plot and autocorrelation function plot for 3,000 and 11, 000 iterations, respectively.Through comparison, it can be seen that the convergence diagnosis of 3,000 iterations and 11, 000 iterations is consistent, so the Markov chain under 3,000 iterations is also convergent.
Finally, each MCMC simulation is run for 3, 000 iterations for trade-off, and the first 1,000 samples are discarded as the burn-in period, because the sampled parameter distributions appear to stabilize after 1,000 iterations.Thus, the parameters are calculated based on the final 2,000 runs.
In addition to the number of iterations, the sample size also affects the computational time.Furthermore, the sample size may have an impact on the estimation accuracy.simulation results with different sample sizes, from which it can be seen that the estimation bias for n = 600 is substantially lower than that for n = 300.For example, when V = 0.10 0.01 0.01 0.10 , L = 0.15 0.03 0.015 0.12 , and r = 0.3, MSE all decreases from 0.0623 to 0.0385.Thus, the larger the sample size, the higher the accuracy of the estimate.Table 3 gives partial simulation results of the posterior bias and true values as well as the MSE of b, L and r.Tables C.1-C.3 in Appendix C in the supplemental data online provide posterior means and true values as well as the MSE of all the estimated parameters.
From Table 3, as the covariance matrix V increases, MSE b , MSE L and MSE r tend to be larger.In other words, the magnitudes of coefficients tend to exhibit greater bias.For example, when L = 0.15 0.03 0.015 0.12 and r = 0.3, with the increase of V , MSE b increases from 0.0047 to 0.0147, MSE L from 0.0009 to 0.0061, and MSE r from 0.0275 to 0.0335.One reason for this phenomenon is that as the covariance matrix increases, the interaction of latent response variables is enhanced, which adds uncertainty to the model, leading to greater bias.
As the spatial coefficient r increases, MSE b and MSE L tend to increase; that is, the magnitudes of coefficients b and L tend to exhibit greater bias.However, a noteworthy tendency is apparent; as the spatial coefficient r increases, MSE r tends to decrease.For example, when V = 1.0 0.9 0.9 1.0 and L = 0.45 0.09 0.045 0.36 , with the increase of r, MSE b increases from 0.0129 to 0.0150 and MSE L increases from 0.0096 to 0.0148; however, MSE r decreases from 0.0318 to 0.0128.Therefore, the model can accurately capture high spatial dependency.

Figure 1 .
Figure 1.Location of regions in the simulated dataset.

Table 1 .
Simulation results of 3000 iterations and 11,000 iterations.
settings (Table Table 2 gives the

Table 2 .
Simulation results of 300 and 600 samples.Posterior bias is the difference between the posterior mean and the true value.

Table 3 .
Simulation results of partial parameters.
Note: a Posterior bias is the difference between the posterior mean and the true value.

Table 5 .
Summary statistics of data.
Note: a The value in (•) denotes the relative frequency.

Table 6 .
Posterior estimates of the parameters.