An urban cellular automata model based on a spatiotemporal non-stationary neighborhood

Abstract Spatiotemporal modeling has long been a major concern of geographic information science. Even though previous research has shown the importance of temporal and spatial information in quantifying the neighborhood effects of urban cellular automata (CA) models, constructing a spatiotemporal non-stationary neighborhood remains a challenge, due to the complexity of the spatiotemporal models. In this study, we introduced spatiotemporal modeling into the neighborhood of an urban CA model and constructed a geographically and temporally weighted neighborhood (GTWN). A corresponding approach to optimizing the bandwidth of the GTWN was also developed. Taking Beijing and Wuhan in China as examples, the GTWN-CA model was employed to simulate their urban expansion. The experimental results indicate that the GTWN-CA model has a better and performance than other CA models whose neighborhood is constructed based on the assumption of temporal or spatial stationarity, highlighting the advantages of spatiotemporal modeling in quantifying the neighborhood effect. Compared with the commonly used CA model with a homogeneous neighborhood (HON-CA), in terms of the figure of merit (FoM), the calibration accuracy of the GTWN-CA model was improved by 0.87% in Beijing and 5.4% in Wuhan, and the validation accuracy was improved by 7.9% in Beijing and 8.9% in Wuhan.


Introduction
World urbanization trends have been irreversible since the second half of the 20th century (Chen et al. 2013).In many rapidly urbanizing areas, disorderly urban expansion has led to a series of serious issues, such as traffic congestion, environmental pollution and resource constraints (Miller and Hutchins 2017).Urban expansion models provide a scientific tool for preventing and controlling potential risks brought about by future urbanization (Barreira-Gonz� alez et al. 2019, Li et al. 2020).Cellular automata (CA) models with a bottom-up mechanism can simulate the dynamic processes of complex systems and have been widely used to simulate urban expansion (Tobler 1970, White and Engelen 1993, Batty and Xie 1994, Li and Gong 2016, Liu et al. 2021).
In recent decades, many scholars have worked to improve the CA models for urban expansion simulation, to obtain more realistic simulation results (Clarke et al. 1997, Couclelis 2006, Liu et al. 2008, Feng et al. 2011, Li et al. 2013, Zhang et al. 2023).Transition rules are the core of CA models, and their calibration is the focus of CA modeling research (Sant� e et al. 2010, Zhang andWang 2022).The transition rules include four parts: development suitability, neighborhood effect, land-use constraints and random disturbances (Blecic et al. 2015, Zeng et al. 2023b).Among these parts, development suitability and the neighborhood effect are hot topics in urban CA modeling research.Urbanization usually occurs in areas suitable for development.The accurate characterization of development suitability can improve the simulation accuracy of the model.Many scholars have employed different models to extract the relationship between urban development and various spatial factors to calculate the development suitability (Al-Kheder et al. 2008, Arsanjani et al. 2013, Garc� ıa et al. 2013, Chen et al. 2020).These models excel at extracting geographical laws from samples (Zhang and Xia 2022).In this way, geographical laws are incorporated into the CA model, which significantly improves effectiveness of the model.In addition, the agglomeration effect is a fundamental factor that leads to the formation and expansion of cities (Han et al. 2019).The neighborhood effect refers to the local interaction between the central cell and surrounding cells, characterizing the local agglomeration effect of the urban expansion process.(Clarke et al. 1997, Liao et al. 2016, Zeng et al. 2023a).Calibration of the neighborhood effect can be divided into the definition of the neighborhood configuration and its quantification (White andEngelen 2000, Zhang andWang 2021).A homogeneous neighborhood (HON) is the simplest neighborhood configuration, which assumes that all the surrounding cells within the neighborhood have the same impact on the central cell.By conducting a sensitivity analysis for the neighborhood size, the neighborhood effect of a HON can be accurately quantified (Almeida et al. 2008, Wang et al. 2021).Although a HON is simple and easy to implement, its definition relies on the assumption of spatial stationarity, which is often difficult to support in geographic research (Brunsdon et al. 1998).For example, compared to cells that are closer to the center cell, cells that are farther from the center cell typically have less impact on them.Consequently, scholars have integrated geographical laws into their neighborhoods to further calibrate the neighborhood effect.
Spatial heterogeneity is an essential feature of land-use change (Feng & Tong 2019).Some studies have broken the assumption of spatial stationarity and constructed spatially heterogeneous neighborhoods (Liao et al. 2014, Feng & Tong 2019).Among these different neighborhoods, the distance decay neighborhood constructed based on the first law of geography is considered to be able to characterize the neighborhood effect more accurately (ZiaeeVafaeyan et al. 2018).However, due to the complexity of its quantification, the distance decay neighborhood has always been difficult to apply widely.Consequently, Zeng et al. (2023a) proposed a method to quantify the distance decay neighborhood, which revealed the decay trend of the neighborhood effect in space.However, these studies only focused on the spatial variation of the neighborhood effect, ignoring the temporal non-stationary nature of the neighborhood effect.Li et al. (2020) proposed a trend-adjusted neighborhood which considers the historical pathway of urban expansion and uses the information of the temporal context of cells to calibrate the neighborhood effect, successfully improving the CA model.However, the trend-adjusted neighborhood still lacks an objective and scientific method to quantify it, and it ignores the spatial variation of the neighborhood effect.
Location and time are important determinants of urban development.Based on the existing research, we can conclude that both spatial and temporal information are crucial for the calibration of the neighborhood effect.Location is an important factor because of the spatial dependence between land use.For example, urban development often takes place near existing urban land.According to the first law of geography, this spatial dependence decreases as distance increases (Tobler 1970).In addition, satellite observations have confirmed that urban expansion often follows its historical pathway, and recently developed areas are more likely to continue to be developed (Gong et al. 2019;Huang et al. 2020).Consequently, the temporal information of urban cells should be considered in urban CA models.Integrating spatial and temporal information into the neighborhood is the key to further calibrating the neighborhood effect.However, it remains a challenge to construct a spatiotemporal non-stationary neighborhood, due to the complexity of the spatiotemporal models.In fact, currently, there is still a lack of a modeling method that can simultaneously integrate temporal and spatial information into the neighborhood.
The research by Li et al. (2020) suggests that, in the neighborhood of urban CA models, urban cells that were developed in more recent years have larger impacts than those developed in earlier years.Consequently, we integrated temporal information into the neighborhood and constructed a temporally weighted neighborhood (TWN).In the TWN, newly developed cells have larger impacts on the central cell, and these impacts decrease as the temporal distance increases.To integrate both temporal and spatial information into the neighborhood, we also constructed a geographically and TWN (GTWN), where the temporal distance is coupled with the spatial distance in the distance decay neighborhood.To verify the effectiveness of the GTWN, we used a GTWN-based CA model to simulate the urban expansion of Beijing in China.For the sake of comparison, the distance decay neighborhood is referred to as the geographically weighted neighborhood (GWN) in the subsequent content of this article.

Cellular automata (CA) model
We employed a constrained CA model in this study (Li and Yeh 2000, Huang et al. 2009, Zhang and Xia 2022).To avoid the influence of randomness on the experimental results, a random term was not considered.The synthesized probability of the model can be described as: where P t i is the synthesized probability of cell i converting to an urban cell at the t-th iteration; S i is the development suitability of cell i; N t i is the neighborhood effect of cell i; and C i represents the land-use constraints of cell i.In this study, we considered that a water body cannot be developed into urban land.The actual year, half a year, quarter or month can be considered as an iteration of the CA model.The research by Li et al. (2013) has shown that increasing the iterations introduces additional errors.Furthermore, if the number of iterations is too small, this can result in an excessive number of cells being developed in each iteration, and many cells with a probability of 0 will also be developed, leading to simulation errors.Consequently, referring to the existing research (Zhang andXia 2022, Wang et al. 2021), we conducted the simulation on a semi-annual basis, i.e., 20 iterations from 2000 to 2010 and 10 iterations from 2010 to 2015.

Artificial neural network (ANN) model
An artificial neural network (ANN) consists of multiple layers of neurons and is designed by mimicking the structure of biological neurons (Schmidhuber 2015).ANN models are commonly used to obtain the development suitability for the CA simulation of urban expansion (Basse et al. 2014, Huang et al. 2009, Yang et al. 2019).The architecture of the ANN model used in this study is shown in Figure 1.The input layer has 10 neurons representing 10 driving factors.The hidden layer has three layers, including 32, 16 and 16 neurons.Before the neurons in each hidden layer are input to the next layer, they need to be normalized and non-linear transformed.Normalization makes the network training stabler and smoother (Santurkar et al. 2018).A linear rectification functionthe rectified linear unit (ReLu) -was selected as the activation function.To avoid overfitting, a dropout layer that randomly discards 20% of the network was set in the last hidden layer.Finally, the output layer has 1 neuron.During training, 80% of the samples were used for the training and 20% for the validation.The adaptive moment estimation (Adam) optimization algorithm was selected as the optimizer (Kingma and Ba 2014).The learning rate was set to decrease with the increase of the epochs.When epochs � 30, the learning rate was 1e-3; when epochs >30, the learning rate was 1e-4.The weight decay was set to 1e-4 and the batch size was set to 128.

Neighborhood configuration
From Section 2.1, it can be seen that the neighborhood is an important component of the CA model, and different neighborhood configurations will generate different neighborhood effects, which will affect the simulation results of the CA model.In the following, we introduce the four neighborhood configurations considered in this study.

Homogeneous neighborhood (HON)
In a HON, all the neighborhood cells have the same impact on the central cell.The neighborhood effect can be calculated as: where t represents the tth iteration, and t ¼ 1, 2, … , iters; U i t is the number of urban cells in the neighborhood of cell I at the tth iteration; and n is the neighborhood size.

Geographically weighted neighborhood (GWN)
In a GWN, the impact of the neighborhood cells on the central cell is a function of their geographic location, which can be described as: weight S ðu, vÞ ¼f ðDiss ðu, vÞ Þ (3) Diss ðu, vÞ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ðu − iÞ 2 þ ðv − jÞ 2 q (4) where weight S ðu, vÞ is the weight at location ðu, vÞ; Diss ðu, vÞ is the Euclidean distance from location ðu, vÞ to the central cell; ði, jÞ is the location of the central cell; and f ðÞ is a kernel function, which is used to map the relationship between distance and weight.According to the first law of geography, the farther away a cell is from the central cell, the less impact it has on the central cell.Based on this, a distance decay function, such as a Gaussian decay function or a bi-square decay function, is often used as the kernel function.Referring to the research of Zeng et al. (2023a), the bi-square decay function is usually able to better characterize the neighborhood decay effect in space and has high robustness.Consequently, the bi-square decay function was selected as the kernel function of the GWN, which can be described as (5) where Diss represents the Euclidean distance; and bws represents the spatial bandwidth, which determines the decay rate of the kernel function with distance.The smaller the bandwidth, the faster the decay rate.The construction process of the GWN is shown in Figure 2.

Temporally weighted neighborhood (TWN)
In a TWN, the impact of the neighborhood cells on the central cell is a function of their development time.Since the GWN can abstract geographic information into the spatial distance, the TWN can also be constructed by abstracting temporal information into the temporal distance, which can be described as:  2015), the Gaussian distance decay function was selected as the kernel function of the TWN, which can be described as: where Dist represents the temporal distance, and bwt represents the temporal bandwidth.The construction process of the TWN is shown in Figure 3.

Geographically and temporally weighted neighborhood (GTWN)
In the above, we described how the GWN and TWN are built.The construction of the GTWN is more complex because time and space are in different dimensions, and an appropriate mathematical model needs to be constructed to combine them, i.e., where � can represent different operators.Referring to the method of the GTWN model, Equation (10) shows an example of a spatiotemporal weighting function: where f S ðÞ and f T ðÞ are the spatial kernel function and temporal kernel function, respectively.In this study, the bi-square decay function was selected as the spatial kernel function, and the Gaussian decay function was selected as the temporal kernel function.The construction process of the GTWN is shown in Figure 4.

Bandwidth optimization
For geographic spatiotemporal weighted models, bandwidth optimization is a crucial step.However, how do we determine the most appropriate bandwidth?For the CA model of urban growth, the optimal bandwidth is the bandwidth that can most accurately simulate urban expansion.Consequently, our optimization idea is to optimize the bandwidth using the optimization model, with the simulation accuracy of the CA model as the optimization goal and the bandwidth as the decision variable.Genetic algorithms (GAs) have been widely used in optimization problems (Deb andTiwari 2008, Perez et al. 2018).Consequently, a GA was employed to optimize the spatial and temporal bandwidths of the GWN, TWN and GWTN.As both the GWN and TWN can be considered as special cases of the GTWN, we take the GTWN as an example here to illustrate the bandwidth optimization process.The figure of merit (FoM) can used to evaluate the simulation accuracy of a CA model.Consequently, we used the FoM to characterize the fitness of the individuals in the GA model.The calculation process for the fitness is shown in Figure 5.The fitness of each individual can be obtained by inputting its spatiotemporal bandwidth into the model.The architecture of the GA model is shown in Figure 6 and its specific steps can be summarized as follows: 1. Initialize population.Generate a set of initial solutions.Each solution contains two decision variables, i.e. the spatial bandwidth and temporal bandwidth.2. Calculate the fitness.3. Obtain the offspring population.The excellent individuals in the population are selected for recombination and mutation to obtain the offspring population.4. Calculate the fitness of the offspring population. 5. Merge the offspring population and the parent population, and select individuals with high fitness for the next generation of the population.6.Judge whether the termination conditions are met.If not, the population will continue to be optimized as the parent population.If the conditions are met, the optimal individual in the population will be output.The termination conditions are that the maximum iteration is reached or the evolution stops.
To simplify the computational complexity, the temporal bandwidth is limited to a positive integer.After referring to the existing research and conducting multiple experimental tests, the population number was set to 10 and the recombination probability and the mutation probability were set to 0.7 and 1, respectively (Zeng et al. 2023a).

Figure of merit (FoM)
The FoM is a widely recognized indicator that can accurately measure the simulation accuracy of CA models (Tong and Feng 2020).It is calculated by the ratio of the intersection of the observed change and simulated change to their union: where FN represents the number of false negative cells, i.e. observed as urban but simulated as non-urban; TP represents the number of true positive cells, i.e. observed as urban and simulated as urban; and FP represents the number of false positive cells, i.e. observed as non-urban but simulated as urban.The larger the FoM, the higher the simulation accuracy of the model.

Kappa
The kappa is calculated as: where p o is the prediction consistency, namely, the proportion of accurately projected cells to total cells in the study area; p e is the random consistency; T 1 and T 0 represent the number of urban and non-urban cells in the simulation, respectively; N 1 and N 0 represent the number of urban and non-urban cells in the observation, respectively; and n represents the number of total cells in the study area.The larger the kappa, the higher the simulation accuracy of the CA model.

Experimental process
Figure 7 shows the flowchart of the experimental process.First, the spatial variables were input into the ANN model for training, to obtain the development suitability.
Then, through the sensitivity analysis, the optimal neighborhood size for the HON was found and the GA model was used to optimize the remaining three neighborhood configurations.Next, the CA models based on the different neighborhood configurations were used to simulate urban expansion.Finally, we evaluated the simulation results to identify the optimal neighborhood configuration.

Study areas and data
Beijing (115 � 20 0 -117 � 30 0 E, 39 � 28 0 -41 � 05 0 N), located in the north of China, is the capital of China.The total area of Beijing is 16410 km 2 , with 62% mountainous area and 38% plain area.The terrain of Beijing is high in the northwest and low in the southeast, with an average altitude of 43.5 m.Wuhan (113 � 41 0 �115 � 3 0 E, 29 � 58 0 �31 � 22 0 N), located in central China, is the capital of Hubei province.The total area of Wuhan is 8569.15km 2 , with the terrain mainly composed of plains, accounting for 81.9% of the total area.As Beijing and Wuhan have experienced rapid urbanization in the past 20 years, they are ideal study areas for urban expansion research (He et al. 2008, 2011, Yang et al. 2018, Zhang et al. 2022).Figure 8   of which are considered to have a significant impact on urban expansion (Agyemang and Silva 2019, Wang et al. 2013, Zhang and Xia 2022).The details of the spatial variables are provided in Table 1 and Figure 9.All the spatial variables were normalized to [0,1].

Implementation program
We constructed CA models based on the four different neighborhood configurations, namely, HON-CA, GWN-CA, TWN-CA and GTWN-CA.First, the urban expansion laws from 2000 to 2010 were extracted to calibrate the CA models, including employing the ANN model to obtain the development suitability, conducting a sensitivity analysis for the neighborhood size of the HON-CA model, and optimizing the bandwidths of the weighted neighborhoods.We then employed the calibrated CA models to simulate the urban expansion between 2010 and 2015.Finally, the performance of the CA models was evaluated with regard to both the calibration accuracy and validation accuracy.

The development suitability
After many attempts, a sampling rate of 5% was selected for the sampling, with positive samples being developed cells and negative samples being undeveloped cells.
The samples were input into the ANN model for training.

Sensitivity analysis for the neighborhood size of the HON-CA model
We tested the simulation accuracy of the HON-CA model in the range of neighborhood sizes from 3 to 21.The results are shown in Figure 12.For both Beijing and Wuhan, the simulation accuracy decreases as the neighborhood size increases, and the optimal neighborhood size is 3. Consequently, for the HON-CA model, the optimal local interaction domain of the central cell is a square with a side length of 90 m.In this study, the impervious surface data were used for the research, and the expansion of impervious surfaces was approximated to the urban expansion.Many studies have found that increasing the neighborhood size appropriately can allow the central cell to more fully accept the peripheral information and improve the simulation accuracy.(Liao et al. 2016, Zhang et al. 2022, 2023, Zeng et al. 2023a).However, the impervious surface data exhibit a different pattern.The reason for this is that the urban patches in the impervious surface data are very fragmented, with a large number of small independent patches, and a large neighborhood is not good for simulating subtle spatial changes (Zhang et al. 2022).

Bandwidth optimization for the GWN-CA model
To find the optimal spatial bandwidth of the GWN-CA model, we coupled the GWN-CA model with the GA model.Based on the result of the neighborhood sensitivity analysis for the HON-CA model, we noted that the scope of the local interaction domain was very small, so the search range for the spatial bandwidth was set to [0,10].The optimization result is shown in Figure 13.For Beijing, the model converges in the 9th generation, with an objective value of 0.2853 and an optimized spatial bandwidth of 2.1890.For Wuhan, the model converges in the 8th generation, with an objective value of 0.2343 and an optimized spatial bandwidth of 2.0118.Figure 14 shows the relationship between the weight and spatial distance in the GWN at this time.Consequently, in the GWN-CA model of Beijing, the optimal local interaction domain of the central cell is a circle with a radius of 65.67 m.The neighborhood effect decays from 1 to 0 in the range of 65.67 m.In the GWN-CA model of Wuhan, the optimal local interaction domain of the central cell is a circle with a radius of 60.35 m.The neighborhood effect decays from 1 to 0 in the range of 60.35 m.In line with the results of the HON, the local interaction domain of the GWN is also very small, and the neighborhood effect decays sharply with the increase of the spatial distance.

Bandwidth optimization for the TWN-CA model
To find the optimal temporal bandwidth of the TWN-CA model, we coupled the TWN-CA model with the GA model.For the TWN, in the spatial dimension, we still used a HON.Therefore, there were two decision variables for the TWN-CA-GA model, namely, the neighborhood size and the temporal bandwidth.For the neighborhood size, the search range was set to [3,21].Considering that the earliest development time in the land-use data was 1985, which was 30 years from 2015, we set the search range for the spatial bandwidth to [0, 30].The optimization result is shown in Figure 15.For Beijing, the model converges in the 9th generation, with an objective value of 0.286.After optimization, the temporal bandwidth is 29 and the neighborhood size is 3.For Wuhan, the model converges in the 9th generation, with an objective value of 0.2471.After optimization, the temporal bandwidth is 11 and the neighborhood size is 3.
Figure 16 shows the relationship between the weight and temporal distance in the TWN at this time.Consequently, in the TWN-CA model of Beijing, for the spatial dimension, the optimal local interaction domain of the central cell is a square with a side length of 90 m.For the temporal dimension, the neighborhood effect decays slowly over the temporal distance.Within 30 years, the neighborhood effect decays from 1 to about 0.6.In the TWN-CA model of Wuhan, for the spatial dimension, the optimal local interaction domain of the central cell is a square with a side length of  90 m.For the temporal dimension, the neighborhood effect decays faster than in Beijing over the temporal distance.Within 30 years, the neighborhood effect decays from 1 to about 0.

Bandwidth optimization for the GTWN-CA model
To find the optimal temporal bandwidth of the GTWN-CA model, we coupled the GTWN-CA model with the GA model.The search range for the spatial bandwidth was set to [0,10] and the search range for the temporal bandwidth was set to [0,30].
The optimization result is shown in Figure 17.For Beijing, the model converges in the 8th generation, with an objective value of 0.291.After optimization, the temporal bandwidth is 18 and the spatial bandwidth is 1.4168.For Wuhan, the model converges in the 16th generation, with an objective value of 0.2489.After optimization, the temporal bandwidth is 7 and the spatial bandwidth is 1.89.Figure 18 shows the relationship between the weight and temporal and spatial distance in the GTWN at this time.Consequently, in the GTWN-CA model of Beijing, for the spatial dimension, the optimal local interaction domain of the central cell is a circle with a radius of 42.5 m.The neighborhood effect decays from 1 to 0 in the range of 42.5 m.For the temporal dimension, the neighborhood effect decays faster than that of the TWN of  Beijing.Within 30 years, the neighborhood effect decays from 1 to about 0.3.In the GTWN-CA model of Wuhan, for the spatial dimension, the optimal local interaction domain of the central cell is a circle with a radius of 56.7 m.The neighborhood effect decays from 1 to 0 in the range of 56.7 m.For the temporal dimension, the neighborhood effect decays faster than that of the TWN of Wuhan.Within 30 years, the neighborhood effect decays from 1 to about 0. Compared with the GWN and TWN, in both Beijing and Wuhan, the spatiotemporal bandwidth of the GTWN decreases, indicating that, after coupling, the neighborhood effect decays faster in space and time.

Comparative analysis of the simulation accuracy of the four CA models
Through the above work, we calibrated the four CA models.In this section, we describe how we employed the calibrated CA models to simulate the urban expansion from 2010 to 2015, to verify the models' ability to simulate future urban expansion.Three indicators -FoM, kappa, and OA -are used here to characterize the simulation accuracy.commonly used neighborhood configuration, due to its simple architecture, but the simulation accuracy of the HON-CA model is low.Compared with the HON-CA model, for Beijing, the FoM of the GTWN-CA model is improved by 0.87% in calibration, and the FoM is improved by 7.9% in validation.For Wuhan, the calibration accuracy of the GTWN-CA model is the highest, and its validation accuracy is slightly lower than that of the TWN-CA model.This outcome can be attributed to the fact that the optimal spatial bandwidth of the GTWN is relatively small, measuring only 1.89.A smaller spatial bandwidth implies that the influence of spatial information on the GTWN is comparatively weaker.Consequently, although the GTWN-CA model demonstrates a superior performance in the calibration phase, it may not be as effective in transferring this accuracy to the validation phase, due to the limited impact of the spatial information.Compared with the HON-CA model, the FoM of the GTWN-CA model is improved by 5.4% in calibration, and the FoM is improved by 8.9% in validation.Due to the temporal non-stationarity of urban expansion patterns, a decrease in simulation accuracy can be expected when using a CA model to predict future urban expansion, compared to its performance during the calibration process (Wang et al. 2021).However, the decrease in validation accuracy for the TWN-CA and GTWN-CA models is comparatively smaller than that for the HON-CA and GWN-CA models.Consequently, the incorporation of temporal information within the neighborhood is shown to be beneficial in simulating future urban expansion.Moreover, in most cases, the simulation accuracy of the GTWN-CA model is higher than that of the TWN-CA model, indicating that simultaneously integrating temporal and spatial information into the neighborhood can further improve the performance of the CA model.
In Figure 19, we compare the simulation results of the four models with the real urban morphology.The results indicate that the differences between the four simulation results are not significant, with only minor differences in subtle areas.By comparing some of the spatial details, it can be found that, in the actual rapidly urbanizing areas, the GTWN-CA model develops the most cells, and the development path is more similar to reality.Overall, the simulation results of the GTWN-CA model are closer to the reality.

Discussion
With the increasing recognition of the importance of spatial and temporal information in explaining geographical phenomena and processes, spatiotemporal modeling has attracted widespread research interest in recent years.Spatial heterogeneity and temporal non-stationarity are widely found in geographical processes.Consequently, many geographic models have attempted to break the assumption of spatiotemporal stationarity and have achieved good results (Huang et al. 2010, Fotheringham et al. 2015).In this study, we incorporated temporal information into the neighborhood of the CA model to construct a temporally non-stationary neighborhood (TWN).The experimental results showed that the validation accuracy of the TWN-CA model is higher than that of the HON-CA model, indicating that the experimental hypothesis that urban cells that were developed in more recent years have larger impacts than those developed in earlier years is valid.Constructing a temporally non-stationary neighborhood is of positive significance for urban CA modeling.This result is consistent with the findings of Li et al. (2020).Previous studies have demonstrated the importance of spatial information in calibrating the neighborhood effect (Liao et al. 2014, Zeng et al. 2023a).To simultaneously integrate temporal and spatial information into the neighborhood, we introduced spatiotemporal modeling into the neighborhood of the urban CA model and constructed a spatiotemporal non-stationary neighborhood (GTWN).The results obtained in this study indicate that, in most cases, the simulation accuracy of the GTWN-CA model is better than that of the TWN-CA model, indicating that simultaneously integrating temporal and spatial information into the neighborhood can further improve the performance of the CA model.
The agglomeration effect serves as a fundamental driver leading to the formation and growth of cities (Han et al. 2019).Current neighborhoods in CA models mostly focus on characterizing the spatial agglomeration effect, which assumes that urban expansion is more likely to occur near urban land (Liao et al. 2016, Zhang et al. 2022).However, recent advancements in remote sensing have revealed new insights.Scholars, through analyzing long-term remote sensing images, have observed that urban expansion often follows historical trajectories, implying that areas adjacent to recently developed regions are more likely to undergo further development (Gong et al. 2019;Huang et al. 2020).This observation underscores the temporal continuity inherent in urban expansion.Consequently, when characterizing the neighborhood effect in a CA model, it is imperative to consider not only the spatial agglomeration effect but also the temporal continuity of urban expansion.Grounded in this understanding, in the GTWN, we make the assumption that urban cells with smaller spatial and temporal distances to the central cell exert a greater influence on the central cell.This assumption is based on the premise that nearby cells, both in terms of geographic proximity and temporal proximity, are more likely to share similar development patterns and dynamics.Consequently, their impact on the central cell's stage is considered more significant than that of more distant cells.Spatial information refers to the geographical location of urban cells provided by land-use data.To calculate the spatial distance, we extract spatial information from land-use data, namely, the spatial location of each urban cell.Similarly, in order to calculate the temporal distance, we extract temporal information from long-term land-use data, i.e., the year in which each urban cell was developed.Therefore, spatiotemporal information is crucial for the construction of the GTWN.
We assessed the performance of the CA models from two phases: calibration and validation.In the calibration phase, we calibrated the models based on the urban expansion pattern from 2000 to 2010.In the validation phase, the calibrated models were used to simulate the urban expansion from 2010 to 2015.Consequently, the calibration accuracy represents the fitting ability of the model to reproduce the urban expansion that has occurred, while the validation accuracy represents the prediction ability of the model to predict the future urban expansion.By observing the simulation accuracy of these two phases, we can effectively prevent the over-simulation issue in the model.This refers to a situation where a model exhibits an exceptional calibration accuracy but fails to demonstrate a comparable advantage in validation accuracy.The ultimate goal of building an urban CA model is to predict the future urban expansion, which means that the evaluation of the validation accuracy is crucial.To ensure the generalizability of the experimental results, we tested the models in two study areas: Beijing and Wuhan.In terms of the FoM, although the calibration accuracy improvement of the GTWN-CA model was not significant, with 0.87% in Beijing and 5.4% in Wuhan, its validation accuracy showed a greater improvement in both regions, with 7.9% in Beijing and 8.9% in Wuhan.This indicates that the GTWN-CA model performs excellently in predicting future urban expansion trends, providing better decision support for urban planning and management.
Impervious surface data were used in this study as these data can provide longterm information on the land-use changes from 1985 to 2017, which was crucial for this study.Due to the convenient access and sufficient information, imperious surface data have been widely used in urban expansion research (Omurakunova et al. 2020, Qian and Wu 2019, Gong et al. 2019).However, compared to normal land-use data, the urban patches in impervious surface data are often more fragmented, with a large number of independent patches.Figure 20 compares the urban land-use maps of Beijing in 2015 obtained after reclassification from the different data sources.To the left is China's land use and land cover change (CNLUCC) dataset (http://www.resdc.cn),and to the right is the impervious surface data.Clearly, the urban patches in the left image are more concentrated and organized, while the urban patches in the right image are more fragmented.The mechanism of the CA model always enables nonurban cells to be developed in a centralized manner, which makes the CA model less adept at generating fragmented patches.Consequently, the FoM of the simulation result generated by the impervious surface data is relatively low.In the experimental results, the optimal neighborhood size for the HON-CA model is 3, and the optimal spatial bandwidth of the GWN and GTWN does not exceed 2.2, indicating that this feature of impervious surfaces data leads to a small spatial local interaction domain in the neighborhood, and the neighborhood effect in space is relatively weak.This, to some extent, limits the improvement of simulation accuracy by the spatiotemporal non-stationary neighborhood (GTWN).To test the robustness of the GTWN-CA model on different datasets, we collected the CNLUCC data of Beijing over five periods: 1995,  3, and the specific experimental results are presented in the Supplementary Material.The results indicate that the GTWN-CA model performs well with the CNLUCC data.In the future, with the increasing availability of long-term land-use data, the advantages of the GTWN-CA model will become more significant.
In this study, we utilized three metrics -FoM, OA and kappa -to assess the simulation accuracy of the CA models.In the obtained simulation results, there was a minimal difference in the values of OA and kappa, compared to FoM, across the different models.The reason behind this lies in the fact that the values of OA and kappa are intimately associated with the quantity of unchanged urban cells and non-urban cells within the study area.Given that the number of cells undergoing state changes is significantly lower than the number of state-persistent cells, the responsiveness of OA and kappa to variations in the simulation accuracy of the CA model is relatively diminished.It is noteworthy that kappa can generate misleading or flawed evaluation outcomes (Pontius and Millones 2011).When assessing a CA model, it is crucial to exclude cells that do not undergo change, as these cells would be correctly captured by CA without much effort (Gao et al. 2022).Consequently, the FoM that excludes state-persistent cells is a more appropriate metric as it focuses on the net change in the urban cells (Pontius et al. 2008).This focused assessment enables a more precise and meaningful evaluation of the CA model's performance in simulating urban dynamics.
Bandwidth optimization is a key step in spatiotemporal modeling, and an inappropriate bandwidth can even reduce the performance of the model.However, bandwidth optimization typically has high computational complexity.In this study, to reduce the computational complexity, the temporal bandwidth was limited to a positive integer.Modelers can flexibly adjust the accuracy of the optimal bandwidth based on their own hardware conditions and accuracy requirements.For example, if the hardware conditions are good and a high simulation accuracy is expected, the spatial and temporal bandwidths can both be set to positive real numbers.In contrast, the spatial and temporal bandwidths can both be set to positive integers.After obtaining the optimal bandwidth, it is incorporated into the corresponding spatial and temporal kernel functions, and the changes in the function curve approximately depict the decay process of the neighborhood effect in the temporal and spatial dimensions.From a spatial perspective, the spatial bandwidth of the GTWN-CA model is smaller than that of the GWN-CA model, indicating that the neighborhood effect of the GTWN-CA model decays faster in the spatial dimension.From a temporal perspective, the temporal bandwidth of the GTWN-CA model is smaller than that of the TWN-CA model, indicating that the neighborhood effect of the GTWN-CA model decays faster in the temporal dimension.This seems to be an interesting phenomenon, in that the coupling of spatial and temporal decay effects makes the neighborhood effects in all dimensions decay more severely.Furthermore, it should be noted that the focus of this study was on the neighborhood of the urban CA model.Consequently, we used a conventional model, the ANN model, to obtain the development suitability.In fact, the GTWN could be easily coupled with other methods for modeling, including some recently developed urban CA models, such as the Bayesian spatially-varying coefficients (BSVC) model and the geographically weighted ANN (GWANN) model (Chen et al. 2020, Zeng et al. 2023b).

Conclusion
In this study, we constructed four neighborhood configurations based on different spatiotemporal stationarity assumptions to quantify the neighborhood effects of the CA model.Taking Beijing and Wuhan as examples, the four models, i.e., HON-CA, GWN-CA, TWN-CA, and GTWN-CA, were used for urban expansion simulation.The experimental results indicated that the GTWN-CA model based on the spatiotemporal non-stationary assumption has the best simulation performance.Compared with the commonly used HON-CA model, the calibration accuracy of the GTWN-CA model was improved by 0.87% in Beijing and 5.4% in Wuhan, and the validation accuracy was improved by 7.9% in Beijing and 8.9% in Wuhan.Consequently, this represents a successful attempt to introduce spatiotemporal modeling into the neighborhood of urban CA models.The GTWN-CA model can more accurately simulate the process of urban expansion and provide decision-making support for sustainable management and conservation of urban land resources.However, limitations exist in this study.In the GTWN, the optimization of the temporal and spatial bandwidth is carried out at the same spatial and temporal scales.In the future, we will make further progress by incorporating scale effects into the neighborhood of the CA models.

Figure 1 .
Figure 1.The architecture of the ANN model.

)
Dist ðu, vÞ ¼ T simulation − T develop (7) where weight ðu, vÞ is the weight at location ðu, vÞ; Dist ðu, vÞ is the temporal distance from location ðu, vÞ to the central cell; and T simulation is the time represented by the current iteration.Since the available land-use data reflect annual land-use changes, we chose year as the temporal unit.For example, the simulation starts in 2000 and takes half a year as an iteration.If the current iteration is 4, then T simulation is 2002, for which the calculation steps are 2000 þ 4 � 0:5: T develop is the time when cell ðu, vÞ was developed into urban.f ðÞ is a kernel function.Referring to the research of Huang et al. (2010) and Fotheringham et al. (

Figure 2 .
Figure 2. The construction process of the GWN.

Figure 3 .
Figure 3.The construction process of the TWN.

Figure 4 .
Figure 4.The construction process of the GTWN.

Figure 5 .
Figure 5.The calculation process for the fitness.

Figure 6 .
Figure 6.The architecture of the GA model.
shows the urban growth of Beijing and Wuhan during 2000-2015.From the figure, the historical pathway of the urban expansion can be seen, where the green area extends outward along the black area and the red area extends outward along the green area.Gong et al. (2019) mapped impervious surfaces over the whole of China during 1985-2017 at an annual frequency.The data were of a 30-m resolution, and the overall accuracy (OA) reached more than 90%.We selected urban impervious surface data for this study, excluding the surrounding rural areas.These data can provide sufficient temporal information to support the construction of the GTWN.Through the clipping and reclassification tools of ArcGIS Pro, we obtained the specific time when each urban cell was developed and generated urban expansion maps of Beijing for three dates: 2000, 2010 and 2015.Considering the scientific nature of the modeling and the availability of data, 10 spatial variables were selected for the training of the ANN, all

Figure 7 .
Figure 7.The flowchart of the experimental process.

Figure 8 .
Figure 8.The urban growth of Beijing and Wuhan during 2000-2015.
Figure 10 shows the change of the loss during the training.For Beijing, the model gradually converged after the 36th epoch, and we chose the network of the 39th epoch as the optimal result.For Wuhan, the model gradually converged after the 34th epoch, and we chose the network of the 39th epoch as the optimal result.By inputting the spatial variables of each cell in the study areas into the trained ANN models, we obtained the development suitability (Figure 11).The development suitability in the southeast of Beijing is relatively high, while the development suitability in the northwest is relatively low.The development suitability of the central region in Wuhan is relatively high, while the marginal areas have a relatively low development suitability.

Figure 10 .
Figure 10.The change of the loss during the training (a) Beijing (b) Wuhan.

Figure 12 .
Figure 12.Sensitivity analysis for the neighborhood size of the HON-CA model (a) Beijing (b) Wuhan.

Figure 13 .
Figure 13.The optimization result of the GWN-CA model.

Figure 14 .
Figure 14.The relationship between weight and spatial distance in the GWN.

Figure 15 .
Figure 15.The optimization result of the TWN-CA model.

Figure 16 .
Figure 16.The relationship between weight and temporal distance in the TWN.

Figure 17 .
Figure 17.The optimization result of the GTWN-CA model.

Figure 18 .
Figure 18.The relationship between weight and temporal and spatial distance in the GTWN.

Figure 19 .
Figure 19.The reality and simulation results for the urban expansion.

Figure 20 .
Figure 20.Data comparison between the impervious surface data and the CNLUCC data.

Table 1 .
Details of the spatial variables.

Table 2 .
The accuracy of the four CA models.

Table 3 .
The simulation accuracy for the CNLUCC data.