A neural network model to optimize the measure of spatial proximity in geographically weighted regression approach: a case study on house price in Wuhan

Abstract The estimation of spatial heterogeneity within real estate markets holds significant importance in house price modelling. However, employing a single or straightforward distance to measure spatial proximity is probably insufficient in complex urban areas, thereby resulting in an inadequate modelling of spatial heterogeneity. To address this issue, this paper incorporates multiple distance measures within a neural network framework to achieve an optimized measure of spatial proximity (OSP). Consequently, a geographically neural network weighted regression model with optimized measure of spatial proximity (osp-GNNWR) is devised for the purpose of spatially heterogeneous modeling. Trained as a unified model, osp-GNNWR obviates the need for separate pretraining of OSP. This enables OSP to delineate the modeled spatial process through a post hoc calculated value. Through simulation experiments and a real-world case study on house prices, the proposed model reaches more accurate descriptions of diverse spatial processes and exhibits better overall performance. The interpretable results of the case study in Wuhan demonstrate the efficacy of the osp-GNNWR model in addressing spatial heterogeneity within real estate markets, suggesting its potential for modelling and predicting complex geographical phenomena.


Introduction
Housing is an essential component of human well-being and development. By studying the housing market, researchers can gain insights into some of the most pressing social, economic, or political issues facing society. The hedonic model (Lancaster 1966, Rosen 1974) provides a theoretical foundation for the structural analysis of market prices for complex goods from a consumption perspective. Within this framework, the worth of a commodity lies in its useful properties and characteristics. The hedonic model has been extensively applied to empirical studies of the dynamics of the housing market and has demonstrated superior interpretability (Witte et al. 1979). Since spatially heterogeneous socio-economic and demographic conditions can lead to functional disequilibria of the real estate market, spatial heterogeneity should be considered in modelling house prices (Helbich et al. 2014).
The geographically weighted regression (GWR) model (Brunsdon et al. 1996) and the geographically neural network weighted regression (GNNWR) model (Du et al. 2020) are both local weighted regression models for spatially heterogeneous processes, and are frequently used in studies of house price modelling (Yu 2006, Du et al. 2018, Wang et al. 2022a, Wang et al. 2022b). The localized estimates of these models are commonly derived through computations reliant on the measure of spatial proximity among samples. Hence, the precision of local weighted regression hinges primarily on the accuracy in measuring spatial proximity.
The original GWR and GNNWR models use the Euclidean distance (ED) to measure spatial proximity. But in real-world scenarios like house prices or other complex urban processes, spatial proximity is not merely a straight line in a homogeneous geometric space. Models based solely on ED may be inadequate for characterizing the spatial relationships between regression points, particularly when the spatial proximity measure is biased. For an accurate measure of spatial proximity, scholars have introduced the Minkowski distance as a general distance measure in Euclidean space (Lu et al. 2016, Comber et al. 2020). Moreover, non-Euclidean distance measures have been incorporated into local weighted regression models, such as road network distance and travel time (Lu et al. 2014, Cao et al. 2019). In addition to the physical space, He et al. (2023) have proposed a topological distance formed from the network flow to measure the proximity within the space of flows. Results from these studies indicate that choosing an appropriate measure of spatial proximity can enhance the fitness and interpretability of the regression model. Furthermore, the diverse mechanisms of house price factors lead to varied spatial patterns in distance measures as well. Consequently, there is no one-size-fits-all measure of spatial proximity, and employing a single distance measure for spatial proximity across the entire study area is likely to yield inaccuracies. Therefore, integrating multiple distance measures to formulate an accurate measure of spatial proximity tailored to specific scenarios holds the potential for enhancing spatial regression models significantly.
This study aims to devise a framework that estimates spatial proximity using multiple distance measures and to incorporate it within the GNNWR model. Firstly, we introduce a neural network model to formulate an optimized measure of spatial proximity (OSP) based on various distance measures. Secondly, we propose a geographically neural network weighted regression with optimized measure of spatial proximity (osp-GNNWR) to capture spatial heterogeneity in complex spatial relationships. Thirdly, the performance of the osp-GNNWR model is assessed against GWR and GNNWR with both Euclidean and non-Euclidean distances, employing simulated and real house price datasets for analysis.
The remainder of this article is organized as follows. In Section 2, we provide an overview of the osp-GNNWR model. Section 3 conducts a simulation experiment to evaluate the efficacy of osp-GNNWR. Section 4 presents and discusses the outcomes of a case study of house prices in Wuhan, China. Finally, the concluding section offers a summary of our findings and suggests potential avenues for future research.

GWR and GNNWR models
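GWR is an important technique in spatial non-stationary modelling. Unlike the ordinary least squares (OLS) model, GWR represents a localized regression approach that employs specific local spatial weights to quantify spatial heterogeneity:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{1}$$

where, at point $i$ with coordinates $(u_i, v_i)$, $x_{ik}$ and $\beta_k(u_i, v_i)$ are the k-th independent variable and its regression coefficient at this point. The coefficients can be estimated with the weighted least squares method as follows:

$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i) X\right)^{-1} X^{T} W(u_i, v_i)\, y \tag{2}$$

Based on Equation (2), the estimated value $\hat{y}_i$ of the dependent variable at point $i$ can be derived with the corresponding independent variables $x_i$:

$$\hat{y}_i = x_i^{T} \hat{\beta}(u_i, v_i) = x_i^{T} \left(X^{T} W(u_i, v_i) X\right)^{-1} X^{T} W(u_i, v_i)\, y \tag{3}$$

Since $y$, $X$ and $x_i$ are all observed and determined values, the estimate $\hat{y}_i$ is directly influenced by the spatial weighting matrix $W(u_i, v_i)$. This matrix is calculated from a distance-decay kernel function such as the Gaussian (Equation 4) or bi-square (Equation 5) kernel, based on the distances $d_{ij}$ from sample $i$ to the surrounding samples, in which a selected bandwidth $b$ determines the decay rate of the regression weights.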
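The weighted least squares estimate described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the synthetic data, and the specific Gaussian form `exp(-(d / b)^2)` are assumptions made for illustration (Gaussian kernels appear in the literature with slightly different constants).

```python
import numpy as np

def gaussian_kernel(d, b):
    """One common Gaussian distance-decay kernel (assumed form): w = exp(-(d / b)^2)."""
    return np.exp(-(d / b) ** 2)

def gwr_fit_point(X, y, coords, i, b):
    """Local weighted least squares at sample i: (X^T W X)^{-1} X^T W y."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distances d_ij
    w = gaussian_kernel(d, b)                       # diagonal of W(u_i, v_i)
    XtW = X.T * w                                   # X^T W without forming diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)        # local coefficient estimate
    return beta, X[i] @ beta                        # coefficients and fitted value

# Hypothetical data: the slope of x1 drifts from west to east.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x1])
y = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x1

beta_hat, y_hat = gwr_fit_point(X, y, coords, i=0, b=1.0)
```

A small bandwidth `b` lets the estimate follow local variation, while a very large `b` gives every sample nearly equal weight, so the local fit approaches the global OLS fit.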
Considering that an artificial neural network can approximate arbitrary continuous nonlinear functions (Funahashi 1989, Chen and Billings 1992), a spatial weighted neural network (SWNN) method was formulated to incorporate a neural network representation of the spatial weighting matrix within the GNNWR model (Du et al. 2020). This method demonstrates proficiency in learning complex distance-weight mappings, thereby mitigating the risk of underfitting encountered when employing predetermined kernel functions:

$$W(u_i, v_i) = \mathrm{SWNN}\left([d_{i1}, d_{i2}, \ldots, d_{iN}]^{T}\right)$$

where $[d_{i1}, d_{i2}, \ldots, d_{iN}]^{T}$ represents the distance vector between point $i$ and all $N$ sample points.
Based on a concept similar to the GWR model, the GNNWR model uses SWNN to emulate the spatially heterogeneous mechanism of parameters and calculate their respective weights over space. According to Equation (3), the estimates of the dependent variable in GNNWR can be calculated with:

$$\hat{y} = S_{GNNWR}\, y \tag{7}$$

where $S_{GNNWR}$ is the hat matrix of the GNNWR model. According to Equation (7), the GNNWR model is able to calculate the estimated values directly from observations. Leveraging the neural network's capacity to accommodate nonlinear features, it facilitates the automatic and precise construction of the spatial weighting matrix.

Optimized measure of spatial proximity
Spatial proximity plays a pivotal role in geographical analysis and spatial statistics, determining the correlations among diverse geographical entities. Typically, scholars employ 'distance' as a metric to gauge spatial proximity, and various distance measures are proposed to suit different application contexts. The simplest expression is the Euclidean distance (ED), also known as the linear distance, which measures the length of a line segment connecting two points. In a two-dimensional plane with coordinates, the Euclidean distance between points $P_i(u_i, v_i)$ and $P_j(u_j, v_j)$ can be calculated as follows:

$$d_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$$

In practical applications, geometric distance measures face limitations due to impediments posed by natural terrain and human-made structures, hindering direct linear interactions among geographic entities. In these circumstances, adopting a shortest-path distance, such as road network distance or travel duration (TD), proves to be a more precise indicator of spatial proximity. The shortest-path distance is the minimum length among all possible paths between two points within a specified path network and cost function (Smith 1989).
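As a small illustration of the Euclidean distance between sample points, the following sketch computes the full pairwise distance matrix with NumPy broadcasting. The function name and the coordinates are hypothetical.

```python
import numpy as np

def euclidean_distance_matrix(coords):
    """Pairwise Euclidean distances between rows of an (N, 2) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 2) coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))         # (N, N) distance matrix

# Hypothetical projected coordinates (u, v) of three samples.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_distance_matrix(coords)
```

A shortest-path measure such as travel duration cannot be derived from coordinates alone; it has to come from a routing service or a network analysis and would be stored in a matrix of the same shape.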
While individual spatial proximity measures offer situational advantages, their utility may be constrained when employing only a single representation within intricate, multi-scale scenarios influenced by numerous factors. For instance, housing prices can be influenced by diverse elements, encompassing natural factors like vegetation or noise, along with socio-economic factors such as education or sanitation, where the former typically exerts influence along straight-line distances and the latter through road networks. Moreover, each factor may have varying scales of impact, complicating the establishment of a definitive measure of spatial proximity.
Hence, we aim to build an optimized measure of spatial proximity by integrating multiple distance measures. Here, the optimized measure of spatial proximity $p_{ij}$ between samples $i$ and $j$ is defined as:

$$p_{ij} = f_{SP}\left(d_{ij}^{1}, d_{ij}^{2}, \ldots, d_{ij}^{n}\right)$$

where $d_{ij}^{n}$ is a certain distance measure between samples $i$ and $j$, such as Euclidean distance, Manhattan distance, or travel duration; and $f_{SP}$ represents the fusion function of spatial proximity. It should be noted that $f_{SP}$ is not defined with a certain expression. The fusion function can take the form of a weighted sum, Euclidean norm, neural network, or any other suitable function, as long as it can map the input distances into a non-negative value that represents spatial proximity.
Since distance measures have varying definitions, the input distances for $f_{SP}$ may exhibit diverse magnitudes and units. $f_{SP}$ must therefore be able to execute a complex nonlinear transformation that maps these distances into a unified measure of spatial proximity. To realize the mapping process, we propose a neural network technique termed the spatial proximity neural network (SPNN), aiming to convert the formulation of a specific function into a data-driven procedure by training a neural network:

$$p_{ij} = \mathrm{SPNN}\left([d_{ij}^{1}, d_{ij}^{2}, \ldots, d_{ij}^{n}]^{T}\right)$$

where $[d_{ij}^{1}, d_{ij}^{2}, \ldots, d_{ij}^{n}]^{T}$ is the vector of input distance measures between samples $i$ and $j$. Specifically, we employ a fully connected network as the realization of the SPNN (Figure 1) due to its capability in discerning intricate patterns and relationships within the diverse input distances.
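The idea of a fully connected network that fuses several distance measures into one non-negative proximity value can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the single hidden layer, its width, the ReLU activation, the softplus output (used here to keep the result non-negative), the random initialization, and the distance values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def init_spnn(n_dist, hidden, rng):
    """Randomly initialized parameters of a one-hidden-layer network (assumed layout)."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_dist, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def spnn_forward(params, d):
    """Map distance vectors of shape (..., n_dist) to proximities of shape (...)."""
    h = np.maximum(0.0, d @ params["W1"] + params["b1"])  # ReLU hidden layer
    z = h @ params["W2"] + params["b2"]                   # linear output layer
    return np.logaddexp(0.0, z)[..., 0]                   # softplus keeps output >= 0

rng = np.random.default_rng(0)
params = init_spnn(n_dist=2, hidden=8, rng=rng)

# Hypothetical pairwise distances for three samples: ED in km, TD in minutes.
ed = np.array([[0.0, 1.2, 3.4], [1.2, 0.0, 2.5], [3.4, 2.5, 0.0]])
td = np.array([[0.0, 6.0, 15.0], [6.0, 0.0, 9.0], [15.0, 9.0, 0.0]])
d = np.stack([ed, td], axis=-1)   # shape (3, 3, 2): one distance vector per pair
p = spnn_forward(params, d)       # shape (3, 3): one proximity value per pair
```

With untrained weights the output has no meaning; in the model described here the parameters would be learned jointly with the rest of the regression network rather than fitted on their own.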

osp-GNNWR model
Based on the concept of OSP and its neural network-based fusion method, we propose the osp-GNNWR model, depicted in Figure 2. Within this model, SPNN is integrated into the GNNWR model to enhance the assessment of spatial proximity. The output of SPNN is connected to SWNN as the spatial input to calculate the spatial weights. Hence, the spatial weighting matrix at point $i$ can be computed as follows:

$$W(u_i, v_i) = \mathrm{SWNN}\left([p_{i1}, p_{i2}, \ldots, p_{iN}]^{T}\right)$$

where $[p_{i1}, p_{i2}, \ldots, p_{iN}]^{T}$ collects the SPNN outputs between point $i$ and all $N$ sample points. Like the GNNWR model, the estimates $\hat{y}$ can then be calculated from the resulting hat matrix in the same way as in Equation (7).

During the neural network training phase, an optimization algorithm is employed to iteratively adjust the network's weights and biases, minimizing the loss function that gauges the disparity between the predictions and the actual values. Nevertheless, the output of the SPNN encapsulates spatial proximity in an abstract manner, lacking concrete reference values. Consequently, formulating a specific loss function for SPNN and directly training the network becomes unfeasible.
However, upon integration of SPNN into the osp-GNNWR model, the overall objective function primarily focuses on minimizing the error between estimated and real values of sample regression points. As a result, there arises no necessity to formulate a distinct loss function or independently train SPNN. Once the entire neural network attains an acceptable threshold of regression error, the well-trained biases and weights of SPNN contribute to calculating OSP values, granting it the capability to provide quantitative descriptions and analyses of spatial relationships.
As depicted in Figure 3, the procedural outline for the training and validation of the osp-GNNWR model unfolds as follows:
Step 1: Extraction of dependent and independent variables for constructing the regression model.
Step 2: Random division of the dataset into train, validation, and test sets in appropriate proportions.
Step 3: Calculation of sample distances as spatial information within the osp-GNNWR model.
Step 4: Establishment of the osp-GNNWR model incorporating network structure and hyper-parameters, utilizing the input variables and spatial information.
Step 5: Acquisition of mini-batch data from the train set, employing the gradient descent algorithm for training, and assessing the goodness-of-fit with a loss function such as the mean squared error (MSE).
Step 6: Evaluation for completion of current epoch; if not completed, return to Step 5.
Step 7: Evaluation of the loss function on the validation set to determine potential overfitting. If the loss improves upon the previous optimum, a new superior model is retained; otherwise, the count for overfitting tolerance is incremented.
Step 8: Assessment for reaching the maximum value in overfitting tolerance or number of epochs; upon reaching the limit, the training ceases, and the latest superior model is evaluated using the test set. Otherwise, iteration continues from Step 5.
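The control flow of Steps 2 and 5 to 8 can be sketched with a deliberately simple stand-in model. The sketch below trains a plain linear regression with mini-batch gradient descent instead of the osp-GNNWR network; the split proportions, learning rate, batch size, tolerance value, and the choice to reset the tolerance counter after an improvement are assumptions made for illustration.

```python
import numpy as np

def train_with_early_stopping(X, y, lr=0.1, batch_size=32, max_epochs=500,
                              patience=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)                       # Step 2: random split
    n_test, n_val = int(0.15 * n), int(0.15 * n)   # assumed proportions
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]

    def mse(w, rows):
        return float(np.mean((X[rows] @ w - y[rows]) ** 2))

    w = np.zeros(X.shape[1])
    best_w, best_val, bad_epochs = w.copy(), np.inf, 0
    for epoch in range(max_epochs):
        order = rng.permutation(train)
        for start in range(0, len(order), batch_size):   # Steps 5 and 6
            rows = order[start:start + batch_size]
            grad = 2.0 * X[rows].T @ (X[rows] @ w - y[rows]) / len(rows)
            w = w - lr * grad                            # gradient of the MSE loss
        val_loss = mse(w, val)                           # Step 7
        if val_loss < best_val:
            best_w, best_val, bad_epochs = w.copy(), val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                   # Step 8: tolerance reached
                break
    return best_w, mse(best_w, test)   # retained model evaluated on the test set

# Hypothetical data: y = 1 + 2 x plus small noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.05, size=200)
best_w, test_mse = train_with_early_stopping(X, y)
```

Only the model with the lowest validation loss is kept, so the test error reported at the end refers to that retained model and not to the final state of the weights.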

Evaluation method
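To facilitate model performance comparisons, evaluation metrics such as the determination coefficient ($R^2$), root mean square error (RMSE), and mean absolute error (MAE) serve as quantitative indicators. The formulae for these metrics are provided below:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

where $y_i$ and $\hat{y}_i$ are the observed and estimated values at sample $i$, $\bar{y}$ is the mean of the observations, and $N$ is the number of samples. In addition, Moran's I-statistic (Moran 1948) is used to measure the spatial differences in the prediction performance of the localized models:

$$I = \frac{N}{S_0} \cdot \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} w_{i,j} (z_i - \bar{z})(z_j - \bar{z})}{\sum_{i=1}^{N} (z_i - \bar{z})^2}$$

where $z_i$ is the evaluated value at sample $i$, $\bar{z}$ is its mean, and $S_0$ is the sum of spatial weights $w_{i,j}$:

$$S_0 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_{i,j}$$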
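The evaluation metrics named above can be computed in a few lines of NumPy. The sketch uses the standard textbook definitions of the determination coefficient, RMSE, MAE and Moran's I; the function names, the toy values and the binary neighbor weights are hypothetical.

```python
import numpy as np

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def morans_i(z, w):
    """Moran's I of values z under an (N, N) spatial weight matrix w."""
    z = z - z.mean()
    s0 = w.sum()                              # sum of all spatial weights
    return len(z) / s0 * (z @ w @ z) / (z @ z)

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 4.0])

# Four locations on a line; immediate neighbors get weight 1 (hypothetical weights).
w = np.zeros((4, 4))
for i in range(3):
    w[i, i + 1] = w[i + 1, i] = 1.0

scores = (r2(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
moran = morans_i(y, w)
```

Applied to model residuals, a Moran's I close to zero indicates that prediction errors show little spatial clustering.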

Simulation dataset
To assess the fitting precision of the osp-GNNWR model, a 64 × 64 spatially heterogeneous simulation dataset was generated following the methodology proposed by Harris et al. (2010). It is crucial that the simulated dataset demonstrates heterogeneity across spaces beyond the Euclidean space, so that the efficacy of OSP can be demonstrated. Specifically, the Z-order curve (Morton 1966) was employed to introduce an additional heterogeneous space. Within this space, spatial proximity is measured through the Z-order distance (Z-dist), representing the difference between sequential positions along the curve. The observation $y_i(u_i, v_i, z_i)$ is determined by its position in both Euclidean space and Z-order space through the spatially varying regression coefficients $\beta_0$, $\beta_1$ and $\beta_2$. By iteratively generating independent variables $x_1$ and $x_2$ using a uniformly distributed random function, we derived 50 simulated datasets for experiment replication. The distributions of the dependent variable $y$ and the regression coefficients are depicted in Figure 4. Notably, $\beta_1$ and $\beta_2$ vary with the Euclidean and Z-order distances, respectively, while $\beta_0$ demonstrates a blended pattern involving both Euclidean and Z-order distances.
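To make the Z-order distance concrete, the following sketch computes a Morton index for integer grid cells by bit interleaving and takes the Z-dist as the absolute difference of two indices. The bit ordering (bits of `u` on even positions, bits of `v` on odd positions) and the function names are assumptions; the paper does not state which convention was used.

```python
def morton_index(u, v, bits=6):
    """Position of grid cell (u, v) along a Z-order curve; bits=6 covers a 64 x 64 grid."""
    z = 0
    for k in range(bits):
        z |= ((u >> k) & 1) << (2 * k)        # bits of u go to even positions
        z |= ((v >> k) & 1) << (2 * k + 1)    # bits of v go to odd positions
    return z

def z_dist(p, q, bits=6):
    """Difference between the sequential positions of two cells along the curve."""
    return abs(morton_index(*p, bits) - morton_index(*q, bits))

near_on_curve = z_dist((1, 1), (2, 0))     # neighboring cells, consecutive on the curve
far_on_curve = z_dist((31, 0), (32, 0))    # neighboring cells across a block boundary
```

The second pair shows why this space differs from the Euclidean one: the two cells are adjacent on the grid, yet they lie far apart along the curve because they fall on opposite sides of a block boundary.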

Model assessment
Models including OLS, GWR, GNNWR, and osp-GNNWR were applied to the simulated datasets for analysis. In the GWR model, both Euclidean distance and Z-order distance served as control measures for distance evaluation. A test set comprising 15% of the dataset was randomly selected to impartially assess model suitability, while the remaining 85% was allocated for training and validation to construct regression models. As the training process of GNNWR and osp-GNNWR can take a long time, we first trained the models on a randomly selected simulated dataset (whose evaluation results are shown in Table 1) and fitted the remaining simulated datasets with the trained model.
Table 1 indicates that the GWR models, utilizing either Euclidean or Z-order distances for spatial proximity, outperform OLS on both the train and test sets. The osp-GNNWR model demonstrates superior performance, as evidenced by higher $R^2$ values and lower RMSE and MAE across both the train and test sets.
Figure 5 illustrates the estimated distribution of regression coefficients $\beta_0$ and $\beta_2(z_i)$ as determined by GWR, GNNWR, and osp-GNNWR. While these localized regression models effectively capture the general spatial distribution pattern of the regression coefficients, certain intricate features pose challenges for the GWR and GNNWR models. Notably, the abrupt variation in the $\beta_2$ coefficient along the Z-order curve delineates 16 distinct square subregions within the simulation space. GWR and GNNWR approximate the $\beta_2$ distribution primarily through Euclidean distances, whereas osp-GNNWR accurately delineates these 16 subregions with distinct boundaries utilizing the input Z-dist. Moreover, concerning the $\beta_0$ coefficient influenced by both Euclidean and Z-order distances, the osp-GNNWR model integrates these distances via SPNN, preserving the straight linear division caused by the Z-dist as well as the rounded gradual transition originating from the Euclidean distance.
Moreover, the F-test outlined in Appendix A reveals significant differences in the accuracy of estimated coefficients among diverse models. Table 2 offers a quantitative assessment of regression coefficient precision through the Pearson correlation coefficient (PCC) and RMSE for each model. Specifically, osp-GNNWR and GNNWR exhibit superior coefficient estimation compared to GWR. Particularly for coefficients influenced by Z-order distance, such as $\beta_0$ and $\beta_2$, osp-GNNWR displays heightened fitting accuracy. These results on the simulated dataset indicate that SPNN has an exceptional generalization ability for input distances. Given the inability of a sole Euclidean metric to encapsulate the spatial patterns of geographic processes in the real world, it is reasonable to posit that the regression outcomes generated by osp-GNNWR offer a more precise depiction of actual spatial heterogeneity within the influencing factors.

Study area and data
Wuhan, situated in central China as the capital city of Hubei Province, lies at the confluence of the Han River and the Yangtze River. Characterized by a humid subtropical climate and abundant rainfall, the city boasts numerous rivers, lakes, and ponds, posing challenges in assessing spatial proximity.
Being the largest and most densely populated city in Central China, Wuhan hosts a thriving real estate market, affording adequate house price data for constructing a comprehensive model specific to Wuhan's real estate dynamics. A dataset comprising 968 distinct estate samples was compiled for this purpose, drawn from authentic transaction records of second-hand residential properties within Wuhan throughout 2019, obtained from Anjuke (https://wuhan.anjuke.com). To ensure data quality, all records were cleaned and special property types like villas were excluded. The study area and dataset for this paper are shown in Figure 6. We select Euclidean distance and travel duration as the input distances for the SPNN, given their common usage in analyzing urban housing prices. The former was computed directly using coordinates within the projected coordinate system (WGS 84 World Mercator), while the latter relied on driving time calculated via Amap's path planning API (https://lbs.amap.com/api/webservice/guide/api/direction).
Drawing insights from prior research (Glumac et al. 2019, Kang et al. 2021), we categorize housing price influencers into three domains: structural attributes, neighborhood features, and transportation accessibility. To construct the house price regression model, ten representative indicators were chosen as the independent variables. Detailed variables illustrating these three factors are presented in Table 3.
It is important to highlight that within the neighborhood and transportation indicators, only PD is quantified in meters to assess its environmental and recreational attributes (Ramírez-Juidías et al. 2022). Conversely, other indicators are measured in walking minutes to gauge the accessibility of community facilities. To mitigate heteroskedasticity and standardize variable magnitudes, a natural logarithmic transformation was applied to both dependent and independent variables. Furthermore, properties equipped with inherent living facilities were standardized to a value of 0. Table 4 presents an overview of these variables.

Experiment implementation
Models including OLS, GWR, GNNWR and osp-GNNWR are applied to the house price dataset, with 15% utilized as a test set for assessing the unbiasedness and predictive ability of the models, while the remaining 85% serves as both training and validation sets for regression model generation.
Specifically, the GWR model adopts a fixed Gaussian kernel function based on prior studies demonstrating superior performance in house price analysis (Wang et al. 2022b). Both GNNWR and osp-GNNWR models employ a 10-fold cross-validation method to maximize sample utilization within the training and validation sets. The spatial weight matrix is calculated using a three-layer SWNN, with an additional two-layer SPNN incorporated into osp-GNNWR to optimize spatial proximity, balancing efficiency and fitting capacity. To counter overfitting or oscillating loss scenarios, a learning rate warmup and decay strategy is implemented during mini-batch gradient descent. Furthermore, a grid search technique is utilized to select hyperparameters, and the structures and hyperparameters of the neural-network-based models are outlined in Table 5. Notably, the osp-GNNWR model demonstrates sensitivity to dropout and L2 regularization hyperparameters. A dropout ratio of 0.5 or higher combined with L2 regularization is recommended to alleviate potential overfitting issues.
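A learning rate warmup and decay strategy can take many forms. The sketch below shows one simple variant, a linear warmup followed by a stepwise exponential decay; the function name and every numeric value are hypothetical and do not reproduce the settings listed in Table 5.

```python
def learning_rate(step, base_lr=0.01, warmup_steps=100, decay_rate=0.99, decay_every=50):
    """Linear warmup to base_lr, then multiply by decay_rate every decay_every steps."""
    if step < warmup_steps:
        # Ramp up linearly so early updates on a randomly initialized network stay small.
        return base_lr * (step + 1) / warmup_steps
    n_decays = (step - warmup_steps) // decay_every
    return base_lr * decay_rate ** n_decays

schedule = [learning_rate(s) for s in (0, 99, 100, 150, 200)]
```

The warmup phase damps oscillation of the loss at the start of training, and the decay phase reduces the step size as the model approaches a minimum.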

Model performance evaluation
Table 6 displays the comparative performance of OLS, GWR, GNNWR, and osp-GNNWR models. OLS, as a global model, lacks the capacity to incorporate spatial characteristics in regression parameters, leading to notably inferior statistical metrics compared to localized weighted models. The GNNWR-based models exhibit superior performance over the traditional GWR model in both training and test sets. This suggests that the refined spatial weight matrix offered by SWNN significantly aids in capturing underlying house price trends. Similar to GWR (Lu et al. 2014), the efficacy of the GNNWR model is influenced by the choice of spatial proximity measure. In this case study of Wuhan house prices, the GNNWR model exhibits improved fitness and interpretability when using travel duration for measuring spatial proximity, as opposed to Euclidean distance.
The osp-GNNWR model achieves the highest training $R^2$ alongside the lowest RMSE and MAE, with values of 0.852, 0.127, and 0.086, respectively. This superior performance extends to the test set, indicating enhanced generalization and predictive capability on unseen data. Notably, compared to GNNWR(TD), the osp-GNNWR model raises the testing $R^2$ from 0.737 to 0.793, accompanied by reductions in RMSE from 0.168 to 0.149 and in MAE from 0.125 to 0.109. These results signify that the integration of OSP enhances the fitting and predictive performance of the osp-GNNWR model, establishing it as the most effective approach among the examined models for estimating spatial heterogeneity in house prices in Wuhan.

Spatial residual analysis and assessment
GNNWR-based models have an advantage in estimating elements with spatial heterogeneity, necessitating an evaluation of these models' performance from a spatial context. To achieve this, we conducted a spatial analysis of the models' predictive ability with the residuals derived from the GNNWR and osp-GNNWR models.
Figure 7(c) illustrates the frequency distributions of these residuals, demonstrating bell-shaped distributions with respective medians closely approximating 0. Notably, the osp-GNNWR model exhibits the narrowest box within the residual distribution and demonstrates the smallest absolute values for both the minimum and maximum residuals. This observation implies a higher probability for the osp-GNNWR model to yield better predictive results with smaller residuals compared to alternative models.
Moreover, we investigated the spatial disparities in the efficacy of the osp-GNNWR and GNNWR models by computing the residual difference. The residual difference is the difference between the absolute residuals of these models, so that overvaluation and undervaluation are accounted for equally. A negative value indicates a more accurate estimation by the osp-GNNWR model compared to the GNNWR model. The magnitude of this negative value further signifies the degree of advantage conferred by the osp-GNNWR model. The resulting maps portraying the residual difference between the osp-GNNWR and GNNWR models are depicted in Figure 7, with the GNNWR model employing the ED in Figure 7(a) and the TD in Figure 7(b).
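The residual difference described above amounts to subtracting the absolute residuals of the two models sample by sample. The sketch below assumes the ordering osp-GNNWR minus GNNWR, which matches the statement that negative values favor osp-GNNWR; the function name and the values are hypothetical.

```python
import numpy as np

def residual_difference(y, y_hat_osp, y_hat_gnnwr):
    """|residual of osp-GNNWR| - |residual of GNNWR|; negative favors osp-GNNWR."""
    return np.abs(y - y_hat_osp) - np.abs(y - y_hat_gnnwr)

# Hypothetical observed and predicted values at three test points.
y = np.array([10.0, 12.0, 9.0])
y_hat_osp = np.array([10.5, 11.0, 9.2])
y_hat_gnnwr = np.array([11.0, 12.5, 9.1])

diff = residual_difference(y, y_hat_osp, y_hat_gnnwr)
osp_better = diff < 0   # True where osp-GNNWR is closer to the observation
```

Taking absolute values first means that an overestimate and an underestimate of the same size count equally, so the sign of the result reflects only which model is closer to the observation.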
According to these visual representations, the osp-GNNWR model demonstrates superior prediction accuracy over the GNNWR model in regions characterized by intricate natural features such as rivers, lakes, and ponds, specifically the western bank of Lake Tangxun in Jiangxia District and the shoreline of Hougong Lake in Caidian District. Additionally, the osp-GNNWR model indicates reduced residuals in the vicinity of the confluence of the Han and Yangtze rivers, an area distinguished by intricate river structures, bridges, and tunnels, leading to inaccuracies in measuring spatial proximity using ED. The presence of complex multi-tiered road networks also contributes to errors in TD assessments. Conversely, the utilization of the osp-GNNWR model, incorporating an enhanced assessment of spatial proximity through the OSP, results in more accurate predictions within this region. Furthermore, in newly developed regions like Huangpi District and Xinzhou District, characterized by predominant high-grade expressways within the road network, the osp-GNNWR model exhibits improved performance compared to the GNNWR model utilizing ED. This enhancement arises because the spatial closeness between sample points surpasses their actual physical separation, a situation in which OSP-based assessment demonstrates superiority over a single distance measure of spatial proximity.

Capabilities of OSP
The SPNN outputs derived from the trained osp-GNNWR model on the test set are extracted, representing the OSP between test points and training points. The spatial pattern of OSP can be expressed by contrasting it with ED. The disparity $d$ between OSP and ED is quantified as the difference between $OSP_{norm}$ and $ED_{norm}$, where $OSP_{norm}$ and $ED_{norm}$ are normalized by min-max normalization.
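The comparison of OSP with ED after min-max normalization can be sketched as follows. The exact expression of the disparity is not reproduced in the text above, so the signed difference used here (normalized OSP minus normalized ED) is an assumption, as are the function names and the sample values.

```python
import numpy as np

def min_max(a):
    """Rescale values linearly to the range [0, 1]."""
    a = np.asarray(a, dtype=float)
    return (a - a.min()) / (a.max() - a.min())

def disparity(osp, ed):
    """Assumed form: normalized OSP minus normalized ED."""
    return min_max(osp) - min_max(ed)

# Hypothetical OSP values and Euclidean distances (meters) from one test point
# to three training points.
osp = np.array([2.0, 4.0, 10.0])
ed = np.array([100.0, 300.0, 500.0])
d = disparity(osp, ed)
```

Normalization puts both measures on the same unitless scale, so a nonzero value reflects a difference in relative ordering or spacing between OSP and ED and not a difference in units.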
Based on Figure 7, we incorporated elements delineating the disparity $d$ and constructed Figure 8 accordingly. Within Figure 8, the size of each test point represents the mean disparity from that specific point to the training points, wherein a larger point denotes a more substantial difference between OSP and ED. Figure 8 demonstrates a notably heightened disparity in the peripheral urban regions compared to the central locales. Generally, as the Euclidean distance between two points increases, the absolute discrepancy between OSP and ED intensifies. Over longer spatial intervals, the discrepancy between TD and ED amplifies due to heightened variability within the road network. Considering OSP as a fusion of ED and TD, any significant divergence between TD and ED is consequently reflected.
Furthermore, as exemplified by the instance in Figure 8(a), there are more pronounced variations among the outer points concerning the OSP and ED. However, the disparities in the direction indicated by the arrow are notably reduced. This alignment corresponds to the elevated Dongfeng Avenue along Wuhan's Second Ring Road, representing the shortest travel route in this specific direction. TD achieves its minimal value, consequently reducing the disparity. Conversely, points from the central area (Figure 8b) exhibit smaller and uniformly distributed discrepancies across all directions. This uniform distribution can be attributed to the well-established transportation infrastructure in the downtown area, ensuring consistent travel durations in all directions. Simultaneously, the intricate yet well-developed road network contributes to more pronounced differences between TD and ED, resulting in the most substantial disparities in Figure 8b appearing within the innermost circle of the radial lines.

Spatial non-stationarity of parameter coefficients
The coefficients derived from the osp-GNNWR model offer a workable perspective for examining the spatial heterogeneity within Wuhan's housing prices. As presented in Table 7, these coefficients exhibit varying signs and a broad range of values across variables. Notably, their respective F-test values (Du et al. 2020) significantly surpass the critical threshold of the F-distribution at a 0.01 significance level. This robustly rejects the null hypothesis, confirming the presence of significant spatial heterogeneity within the parameter coefficients. In particular, we have selected the UA and SA parameters for a more in-depth analysis of factor spatial heterogeneity. The coefficients of UA range from -0.186 to 0.030, with a standard deviation of 0.026, indicating noteworthy spatial heterogeneity in the influence of universities on housing prices. Similarly, the SA parameter demonstrates an uneven negative correlation with housing prices, evidenced by a coefficient mean and standard deviation of -0.338 and 0.067, respectively. To explore the potential spatial heterogeneity in the impact of universities and subways, we have depicted the distribution of their coefficients in Figure 9.
Land-centered speculative construction (Li et al. 2014) and the land-tuition-leverage strategy (Shen 2022) in China lead to a noticeable increase in land prices of university towns. Moreover, universities and research institutions bring better livability and create a prosperous rental market, which further inflates house prices in the surrounding areas (Wen et al. 2018). Hongshan District, as the location of a university town in Wuhan, is a prime example of how proximity to a university can impact house prices. According to Figure 9(a), the UA parameter (i.e. the coefficient measuring the effect of distance to the university on housing prices) in the center of Hongshan District is significantly higher than in other areas. This suggests a positive impact of universities on housing prices within this locale, with prices exhibiting an upward trend in proximity to these educational institutions. The findings underscore the substantial influence of universities within the housing market of Hongshan District, wherein the accessibility of university amenities and the demand from students and faculty contribute to price escalation. Conversely, in districts with fewer universities, the association between university presence and housing prices lacks significance.
In the case of the SA parameter, as depicted in Figure 9(b), a negative correlation between housing price and subway station accessibility is observed. Specifically, the closer a house is to a subway station, the higher its price. This correlation is much more pronounced in suburban regions and in densely populated residential zones situated within city centers. In suburbs, access to public transportation is often limited, yet there exists a substantial demand for long-distance commuting. Closer proximity to subway stations provides better transportation access (Li et al. 2019), which is an attractive feature for residents, particularly those without a private vehicle. Furthermore, the construction of a subway station brings more commercial facilities to the suburbs, which revitalizes the surrounding neighborhoods and boosts house prices (Tan et al. 2019). In densely populated residential areas, congested surface transportation compels residents to favor the subway, thereby magnifying the impact of the subway system on housing prices.

Conclusions
In this research, we introduce a neural network approach aimed at integrating multiple distances to attain an optimized measure of spatial proximity applicable to real estate markets and multifaceted processes. Combining this OSP with the GNNWR model, we formulate the osp-GNNWR model to delineate spatial heterogeneity. This model facilitates the precise calculation of the OSP, enabling practical exploration and prediction of complex spatial proximity scenarios.
Our experimentation concerning house prices in Wuhan demonstrates that the osp-GNNWR model surpasses OLS, GWR, and GNNWR models in terms of both fitting and prediction accuracy. In contrast to the GNNWR model, osp-GNNWR exhibits superior interpretability and robustness in predicting house prices across diverse districts characterized by varying socio-economic conditions. These results underscore the significance of a more precise measure of spatial proximity in capturing underlying spatial patterns. Furthermore, our model effectively captures the spatially heterogeneous relationship between house prices and influencing factors, enabling an exploration of the spatial distribution of different influential factors. Overall, these findings highlight the efficacy of OSP in enhancing the osp-GNNWR model's capacity to characterize spatial heterogeneity, thereby advancing the modeling of complex spatial relationships within real estate markets.
Nonetheless, this study retains several limitations that warrant future attention. The training process of osp-GNNWR demands datasets with adequate samples to avoid overfitting or underfitting, which can lead to increased prediction errors due to excessive reliance on specific data points or insufficient representation of the target function. While larger datasets help reduce prediction errors and capture underlying patterns, the computational intensity of osp-GNNWR poses challenges, including heightened memory consumption and prolonged training times. Therefore, efforts should be made to improve the model's generalization capacity on smaller datasets and to streamline its complexity without compromising prediction accuracy. One potential strategy involves exploring alternative fusion functions beyond neural networks, which can not only reduce computational resource usage but also incorporate OSP into GWR and related methodologies. Besides, more experiments at larger scales and with various combinations of input distances are needed to evaluate the model performance under different scenarios. Additionally, while corrections have been proposed to ensure a conservative test of significance of the parameter estimates in GWR (Byrne et al. 2009, da Silva and Fotheringham 2016), it remains to be explored and demonstrated whether similar corrections can be applied to osp-GNNWR.
include spatial-temporal modeling and deep learning. His contributions to this paper include model implementation, experiments, writing and revision of the original manuscript.
Wenying Cen is an undergraduate student of Zhejiang University, Hangzhou, China, majoring in Geographic Information Science. Her current research interests include spatial-temporal modeling and deep learning. Her contributions to this paper include visualization and revision of the original manuscript.
Sensen Wu received his Ph.D. degree in cartography and geographic information system from Zhejiang University, Hangzhou, China, in 2018 and is working as an Associate Professor in the School of Earth Sciences, Zhejiang University. His current research interests include spatial-temporal analysis, remote sensing, and deep learning. His contributions to this paper include the conception of this research, the supervision of the experiments and writing, and the research resource.

Appendix A.
Significance test for the accuracy of estimated regression coefficients across models
To assess the statistical significance of differences in the accuracy of regression coefficients estimated by various models, we conduct an F-test on the errors of estimated regression coefficients $D = \mathrm{abs}(\hat{\beta} - \beta)$ within the tested models using simulated data. The null and alternative hypotheses guiding this analysis are formulated as follows:
where $D_1$ and $D_2$ represent the errors in estimated regression coefficients within two examined models. Denoting the covariance matrices of these models as $S_1$ and $S_2$, respectively, the combined covariance matrix can be formulated as follows:
Construct the test statistic $F$ as:
where $n_1 = n_2 = 4096$ is the number of samples, and $p = 3$ is the number of coefficients. The value of $T^2$ can be calculated with:
The results of the inter-model assessment are presented in Table A1, where the F-test statistics demonstrate compelling support for rejecting the null hypothesis. Consequently, we infer notable distinctions in the accuracy of fitting the regression coefficients among the models.
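The quantities described in this appendix (a combined covariance matrix, a $T^2$ value, and an F statistic depending on the sample sizes and the number of coefficients) correspond to the standard two-sample Hotelling $T^2$ test. The following Python sketch assumes that this is the intended test and illustrates it on simulated coefficient errors. The function names, the small pure-Python linear solver, and the half-normal toy errors are illustrative assumptions, not the authors' implementation or data.

```python
import random


def mean_vector(rows):
    """Column means of an n x p list of lists."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - m[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b]
    return [[c / (n - 1) for c in row] for row in cov]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, p))
        x[i] = (aug[i][p] - s) / aug[i][i]
    return x


def hotelling_two_sample(d1, d2):
    """Standard two-sample Hotelling T^2 test (assumed form):
    S   = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2)
    T^2 = n1 n2 / (n1 + n2) * (m1 - m2)' S^{-1} (m1 - m2)
    F   = (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) * T^2, with (p, n1 + n2 - p - 1) d.o.f.
    """
    n1, n2, p = len(d1), len(d2), len(d1[0])
    s1, s2 = covariance_matrix(d1), covariance_matrix(d2)
    pooled = [[((n1 - 1) * s1[a][b] + (n2 - 1) * s2[a][b]) / (n1 + n2 - 2)
               for b in range(p)] for a in range(p)]
    m1, m2 = mean_vector(d1), mean_vector(d2)
    diff = [u - v for u, v in zip(m1, m2)]
    z = solve(pooled, diff)  # z = S^{-1} (m1 - m2)
    t2 = n1 * n2 / (n1 + n2) * sum(u * v for u, v in zip(diff, z))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


def simulate_errors(sigma, n=4096, p=3, seed=0):
    """Hypothetical absolute coefficient errors |N(0, sigma)| for n samples, p coefficients."""
    rng = random.Random(seed)
    return [[abs(rng.gauss(0.0, sigma)) for _ in range(p)] for _ in range(n)]


# Toy comparison: a "more accurate" model (smaller errors) against a less accurate one.
errors_model_1 = simulate_errors(0.05, seed=1)
errors_model_2 = simulate_errors(0.10, seed=2)
t2, f_stat, dof = hotelling_two_sample(errors_model_1, errors_model_2)
print(f"T^2 = {t2:.1f}, F = {f_stat:.1f}, degrees of freedom = {dof}")
```

With $n_1 = n_2 = 4096$ and $p = 3$, the statistic in this sketch is compared against an F-distribution with (3, 8188) degrees of freedom, whose critical value at the 0.01 level is roughly 3.78. The pooled covariance presumes that both models' error vectors share a common covariance structure, which is an assumption of the standard test rather than something stated in the text.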

Figure 1. Design of the neural network to achieve optimized measure of spatial proximity.

Figure 4. Simulated dataset with high spatial heterogeneity. The distribution of y values is a special case in a particular repetition.

Figure 5. Averaged estimation of regression coefficients for the 50 simulated data sets: from top to bottom are $\beta_0(u_i, v_i, z_i)$, $\beta_1(u_i, v_i)$ and $\beta_2(z_i)$.

Figure 8. Discrepancies between OSP and ED. The color of the points represents the residual difference between osp-GNNWR and GNNWR(ED), and the size represents the average discrepancy between OSP and ED at that point. (a) Discrepancies at an outer test point with larger average discrepancy; (b) Discrepancies at a central test point with smaller average discrepancy.

Figure 9. Parameter coefficient estimates of the osp-GNNWR model: (a) the coefficient estimates of University Accessibility parameter; (b) the coefficient estimates of Subway Accessibility parameter.
Chen received his Ph.D. degree in cartography and geographic information system from Zhejiang University, Hangzhou, China, in 2023 and is working as an Assistant Professor in the School of Earth Sciences, Zhejiang University. His current research interests include spatial-temporal big data and machine learning. He contributed to the supervision of the experiments.
Jin Qi received his Ph.D. degree in remote sensing and geographic information system from Zhejiang University, Hangzhou, China, in 2021 and is working as a Research Associate in the School of Earth Sciences, Zhejiang University. His fields are remote sensing, machine learning, and oceanography. He contributed to the supervision of the experiments and writing.
Bo Huang is currently a Chair Professor in the Department of Geography, The University of Hong Kong. His research interests include spatial-temporal statistics, unified satellite image fusion, and multi-objective spatial optimization. He contributed to the conception of the research idea and supervision of writing.
Zhenhong Du is currently a Professor in the School of Earth Sciences, Zhejiang University. He is the Director of the Institute of Geography and Spatial Information, Zhejiang University, and is also the Deputy Director of Zhejiang Provincial Key Laboratory of Geographic Information System. His research interests include remote sensing, spatial-temporal big data & artificial intelligence. He contributed to the resource of this research.

Table 1. Evaluation indicators for experiment models on the randomly selected simulated dataset.

Table 2. Overall PCC and RMSE between the estimated coefficients and the true coefficients on the 50 simulated datasets. The best indicators are marked in italics and underlined, and the second-best indicators are identified in italics.

Table 3. Independent variables for house price estimation in Wuhan.

Table 4. Overview of dependent and independent variables with logarithmic transformation.

Table 6. Evaluation indicators for experiment models on Wuhan house price dataset.

Table 7. Summary of parameter coefficients of the osp-GNNWR model.

Table A1. Results of inter-model F-test.