Uncovering the association between traffic crashes and street-level built-environment features using street view images

Abstract
Investigating the relationship between built-environment factors and roadway safety is crucial for preventing road traffic accidents. Although studies have analyzed traffic-related built-environment factors using predetermined zonal units, conclusive evidence on the relationship between streetscape features and traffic accidents at the fine-grained road-segment level is still lacking. The widespread availability of large-scale street view images now makes it possible to analyze urban built environments automatically and at scale. Therefore, this study aimed to investigate the relationship between streetscape features and traffic accidents at the road-segment level using street view images. Specifically, we employed semantic image segmentation to extract streetscape elements from urban street view images and then created traffic crash-related variables at the road-segment level, including street-level built-environment variables, traffic variables, land-use indices, and proximity characteristics. Finally, we adopted a classification-then-regression strategy to model the number of traffic crashes while accounting for zero inflation and spatial heterogeneity. Our findings suggest that streetscape features can effectively reflect built-environment characteristics at the road-segment level. Moreover, a comparison with existing models demonstrates the superior performance of the proposed modeling method. The results provide insight for developing effective planning strategies to improve traffic safety.


Introduction
Identifying the characteristics associated with road traffic safety in urban areas is a longstanding problem in traffic safety research and has received increasing attention from traffic engineers and officials, safety professionals, and urban planners (Aarts and van Schagen 2006; Abdel-Aty et al. 2013; Chen and Shen 2019). For example, multinational safety initiatives, such as Vision Zero (Elvebakk 2007), the Roadmap to a Single European Transport Area (European Commission 2011), and the Decade of Action for Road Safety 2021-2030 (World Health Organization 2020), have been implemented to explore variations in traffic risk across urban sites. Many researchers have evaluated the association between crash frequency and the configuration of the built environment to identify significant crash predictors (Ewing and Dumbaugh 2009; Merlin et al. 2020). Several significant built-environment characteristics, such as vehicle miles traveled (Tasic et al. 2016), population (Wang et al. 2019), accessibility to destinations (Ewing and Cervero 2010), and land-use characteristics (Pulugurtha et al. 2013), have been addressed in many studies.
Relevant research has verified fundamental links between traffic safety and various built-environment and socioeconomic factors at a macroscopic scale, as well as the influence of the built environment on human behavior. Various predetermined zonal units of analysis, such as square grids (Asadi et al. 2022), census block groups and tracts (Zhang et al. 2015), and traffic analysis zones, have been adopted to investigate local heterogeneous impacts in safety research (Obelheiro et al. 2020). However, crash frequency at a fine-grained scale, such as the street scale, remains largely unexplored.
In metropolitan areas, the streets of a road network connect various urban functional regions. Meanwhile, road networks constrain human movement and travel behavior (Shen and Karimi 2016). Ignoring built-environment factors at the street level is therefore a shortcoming of related research; for example, it hampers the accurate identification of significant crash predictors (Loo and Yao 2013; Ulak et al. 2019). Street units are promising substitutes for areal or zonal units and can provide fine-grained knowledge (e.g., streetscape characteristics) (Zhu et al. 2017). Furthermore, using predetermined areal or zoning schemes, such as macroscopic traffic analysis units, may result in bias and faulty statistical inference. In addition, the modifiable areal unit problem (MAUP) limits, to some extent, the scale of analysis (Zhu et al. 2017; Obelheiro et al. 2020).
The rapid growth of crowdsourcing and map services over the past decade has greatly improved the capacity to gather vast volumes of geotagged multimedia data, such as street view images (Liu et al. 2020). Street view images capture real-world scenery from a pedestrian's viewpoint, including both man-made and natural landscapes, which makes them a promising data source for auditing built settings at a fine-grained street level (Kang et al. 2020; Verma et al. 2020; Sun et al. 2022). The wide application of state-of-the-art machine learning techniques, particularly the successful use of image segmentation algorithms in computer vision (Ibrahim et al. 2020; Minaee et al. 2022), enables researchers to explore detailed built-environment characteristics objectively and quantitatively using street view images.
Recent studies have demonstrated the advantages of using street view images to gauge subjective human perceptions of metropolitan areas (Qiu et al. 2022), synthesize location semantics (Zhang et al. 2018; Fang et al. 2021), investigate public health issues (Egli et al. 2019; Kang et al. 2020), and estimate urban crime and safety (Zhang et al. 2020; Luo et al. 2022). However, few studies have used street view images to examine the impact of the built environment on traffic crashes. In this context, street view images offer new, yet largely unexplored, opportunities to assess the association between traffic safety and the urban built environment at a fine-grained street scale.
This study proposes a methodological framework that evaluates the relationship between the streetscape built environment and roadway safety at a refined road-segment level. In particular, this study focused on the following two main research questions:
1. How can street view images with street-level built-environment features help in traffic crash prediction, and how can predictive performance be improved using a modified model?
2. Which built-environment characteristics are significantly associated with traffic crash frequency?
The answers to these questions, framed within the emerging perspective of urban visual intelligence (Fan et al. 2023; Xu et al. 2022), will contribute to a better understanding of the role of the visual aspects of the city. Moreover, the results of this study enrich location-based geographic information system (GIS) research and provide a fine-grained perspective for transportation planning and safer travel applications. The proposed roadway safety assessment framework can provide empirical evidence for urban geospatial analytics and new insights for formulating effective planning strategies.

Built environment and traffic safety
Understanding the important factors that influence traffic crashes in urban areas is a key topic in traffic safety and crash prevention. One consensus in the current literature is that built-environment factors may either contribute to or mitigate traffic safety problems (Aarts and van Schagen 2006; Ewing and Cervero 2010). The built environment can be divided into land-based and street-based characteristics (Merlin et al. 2020). The placement, layout, land-use mix, and physical design of buildings with respect to roadways are examples of land-based characteristics of a built environment. The layout of streets and the surrounding streetscapes, crossings, and street networks are examples of street-based built-environment features.
Traffic-safety outcomes can be measured by the number of traffic crashes or of traffic-related deaths and injuries. Various variables describing the built environment and their effects on crash safety have been widely discussed in the literature. For example, transportation and traffic characteristics (e.g., vehicle miles traveled (VMT), road length, intersections, and driving speed) are important conceptual and practical indicators because they are strongly related to traffic safety (Hadayeghi et al. 2006; Huang et al. 2016; Tasic et al. 2016). Socioeconomic and demographic indicators are equally or even more important measures of exposure for assessing traffic safety; traffic crashes are more likely to occur in places with larger populations (Kim et al. 2006; Wang et al. 2019). In addition, land-use patterns are key factors in traffic crashes. According to Pulugurtha et al. (2013), traffic crashes are more common in commercial and institutional districts than in residential areas. Saha et al. (2020) argued that land-use patterns influence the behaviors of people living in different urban functional regions, thereby increasing or reducing the number of traffic crashes.

Crash prediction model
Many studies have attempted to predict the number of traffic crashes. The crash prediction model (CPM) is widely used in traffic safety estimation and urban spatial modeling. Because crash data are non-negative counts with skewed distributions, considerable early research demonstrated the advantages of global count models, such as variants of the Poisson model (Chiou and Fu 2013) and the negative binomial model (Ladron de Guevara et al. 2004), for predicting crash frequency. However, global models assume that the relationships between the dependent and independent variables are constant over the entire study region; that is, the model parameters do not change across locations. In the presence of geographical heterogeneity, this fixed-relationship assumption may be implausible (De Marsily et al. 2005; Anselin 2010).
The geographically weighted Poisson regression (GWPR) method extends conventional global models by allowing the parameters to vary locally. This helps researchers understand how the relationships between variables vary across space, making the results easier to interpret (Fotheringham et al. 2017). Accordingly, numerous in-depth studies have predicted the number of traffic crashes using the GWPR model (Hadayeghi et al. 2010; Li et al. 2013; Al-Hasani et al. 2021; Soroori et al. 2021). However, none of the abovementioned methods can address spatial nonstationarity and zero inflation simultaneously (Obelheiro et al. 2020); thus, a new prediction strategy is needed.

Studies associating traffic safety with streetscape built environments
Discerning the association between the built environment and roadway safety is a longstanding question in urban studies and planning. Investments in traffic safety infrastructure may be a particularly cost-effective way to reduce traffic-related morbidity and mortality and improve population health. For example, many major cities, such as New York City, have developed high-profile plans to install traffic signals and re-engineer dozens of roads and intersections. Earlier research suggests that improving lighting, adding speed bumps, or maintaining pavement markings can significantly enhance pedestrian safety (Tester et al. 2004; Retting et al. 2003); however, these findings have not been extensively replicated (Mooney et al. 2016).
Previous studies have evaluated traffic crashes at the neighborhood level through environmental audits based on field investigations (Hanson et al. 2013; Mooney et al. 2016; Cai et al. 2022). This approach defines various indicators to provide qualitative explanations for crash frequencies and rates. Obelheiro et al. (2020) investigated the relationship between road network features and the number of crashes in newly developed traffic-safety zones, using the density of intersection types, traffic signals, and the proportions of different road types as indicators of road network and infrastructure design. However, traditional environmental auditing methods struggle to quantify spatially heterogeneous traffic-related variables because they are costly in terms of both time and human resources.
Recently, multidisciplinary work has focused on formally representing the built environment and physical space, highlighting the link between urban spatial patterns and human movement (Luo and MacEachren 2014; Ibrahim et al. 2020). Traditional survey-based research, such as interviews and questionnaires, is difficult to conduct across large urban areas (Luo et al. 2019; Martin and Schuurman 2020). By contrast, street view images have shown great potential for large-scale, automatic perception of urban built environments (Middel et al. 2019; Qiao and Yuan 2021). For example, Qiu et al. (2022) extracted 30 streetscape elements from street view images to predict housing prices. Additionally, Fang et al. (2021) explored the potential for mapping land use and complex geospatial relationships between street view images and land parcels. Moreover, Egli et al. (2019) used Google Maps to explore the link between unhealthy food and beverage advertising and children's obesity. Because analyzing traffic-related built-environment characteristics is important for reducing traffic accidents and related injuries, street view images have both practical and conceptual value in this domain. Despite this progress, the use of street view images to evaluate traffic-related risk remains limited, and their potential remains untapped.

Critical analysis of existing studies
An extensive review of the existing literature identified three critical issues that must be adequately addressed. First, traffic-safety policy and research have primarily focused on infrastructure, vehicle safety standards, and road-user behavior; built-environment factors derived from street view imagery have not been fully explored. Second, existing studies have used zip code-level geographic units or square-kilometer grids to predict traffic safety. However, research has revealed significant street-to-street variation in traffic crash rates, even within the same neighborhood (Castro and De Santos-Berbel 2015; Rui et al. 2016). This variability underscores the importance of analyzing crash rates at the street level, particularly when designing safety-based routing systems for users. Third, spatial nonstationarity and zero inflation have not been considered simultaneously in crash prediction modeling.

Study area
The study area was the Kowloon Peninsula in the Hong Kong Special Administrative Region (HKSAR). The Kowloon Peninsula (47 km²) comprises five administrative districts: Kowloon City, Kwun Tong, Sham Shui Po, Wong Tai Sin, and Yau Tsim Mong (Figure 1). The eastern and western parts of the peninsula are densely populated industrial areas, the north is a residential area, and the south is a prominent commercial area. According to the 2021 Hong Kong Census (Census and Statistics Department 2021), the Kowloon Peninsula accounts for 30.1% of Hong Kong's total population. Moreover, the study area contains many streets with distinct visual appearances and is one of the areas of Hong Kong where traffic crashes are common (Hong Kong Police Force 2021).

Data collection
The primary datasets for this study were traffic crash count data and street view images. Traffic crash count data were derived from the official records of the Transport Department of the HKSAR in 2020. The original data contain 13,458 crashes from 2017 to 2019, comprising 7,192 vehicle collision with vehicle (VCV), 2,845 vehicle collision with pedestrian (VCP), and 3,421 single-vehicle collision (SVC) crashes. Each record contains basic spatial information (e.g., latitude and longitude) of the accident and additional attributes, such as the date, time, and severity of the accident. Table 1 lists summary statistics of the original traffic crash count data. The total number of traffic crashes changed only marginally from 2017 to 2019. The most frequent crash type was VCV, followed by SVC and VCP. Accordingly, the frequencies of VCV, VCP, and SVC crashes were used separately as dependent variables to investigate the different impacts on each crash type. Figure 2 compares road segments with zero and nonzero crashes.
Street view images were obtained from Tencent Map, one of China's leading map service providers, through its public application programming interface (API). Sampling locations for street-level images were placed along the road network at 100-m intervals. To capture the surrounding streetscape, street view images were acquired with four camera headings (0°, 90°, 180°, and 270°) at each sampling location. Figure 1 shows an example of street-level images captured from four views at a sampled location. Distinct streetscape elements, including trees, buildings, concrete roads, and sky, appear in the different views. Approximately 29,900 street view images were collected at 7,482 sampling locations.
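The sampling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes road geometries are polylines in a projected, metre-based coordinate system, and the function names are hypothetical. It places one sampling location every 100 m along a polyline and pairs each location with the four camera headings; the actual Tencent Map API request is deliberately left out.

```python
import math


def sample_along_polyline(coords, interval=100.0):
    """Return points spaced `interval` metres apart along a polyline.

    coords: list of (x, y) vertices in a projected, metre-based CRS
    (an assumption; longitude/latitude must be projected first).
    The first vertex is always included as a sampling location.
    """
    samples = [coords[0]]
    carried = 0.0  # distance travelled since the last sample
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = interval - carried  # offset of the next sample on this segment
        while pos <= seg_len:
            t = pos / seg_len
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += interval
        # Distance from the last sample to the end of this segment.
        carried = seg_len - (pos - interval)
    return samples


HEADINGS = (0, 90, 180, 270)  # four camera headings per location, as in the text


def image_requests(points, headings=HEADINGS):
    """One (x, y, heading) tuple per image to download (API call omitted)."""
    return [(x, y, h) for (x, y) in points for h in headings]


road = [(0.0, 0.0), (250.0, 0.0)]
points = sample_along_polyline(road)
print(points)                       # [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
print(len(image_requests(points)))  # 12
```

Four images per location is also consistent with the reported totals: 7,482 locations × 4 headings = 29,928 images, i.e., approximately 29,900.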
Supporting data, including the urban road network and points-of-interest (POI) data, were obtained from OpenStreetMap (OSM) in July 2019. The OSM road network primarily covers primary, secondary, residential, and other drivable roads in the study area. To simplify the analysis, single-direction roads were retained, roads with erroneous information were discarded, and the centerlines of bidirectional roads were extracted to replace the complex bidirectional road geometries. After preprocessing, 2,004 road-network edges were extracted from the study area. In addition, 3,459 POI records were collected as auxiliary information reflecting land-use characteristics. Each original POI record includes basic information such as name, latitude, longitude, and category.

Extracting streetscape elements using semantic image segmentation
Semantic image segmentation was performed to extract streetscape elements from street view images. Semantic image segmentation assigns a semantic class (e.g., a streetscape element such as road or tree) to every pixel of an image; that is, it is a pixel-wise prediction task (Liu et al. 2019). The pyramid scene parsing network (PSPNet) (Zhao et al. 2017), a state-of-the-art deep neural network with high segmentation accuracy, was used in this study.
PSPNet uses a pyramid pooling module that exploits global context information by aggregating context from different regions. Owing to its strong context aggregation and reliable prediction accuracy, PSPNet has frequently been applied to extract semantic information from street view images automatically in multiple fields (Zhang et al. 2018; Helbich et al. 2019; Xia et al. 2021). The original PSPNet model was trained on the ADE20K dataset and achieved 79.7% pixel-wise accuracy across 150 semantic object classes (Zhou et al. 2019). In that study, the original PSPNet framework was introduced and fine-tuned using 1,000 randomly selected street view images of Hong Kong, reaching a prediction accuracy of 78%. To simplify the analysis, this study considered the top nine semantic object classes most closely related to streetscape elements: roads, buildings, trees, sky, sidewalks, cars, walls, fences, and plants. Figure 3 shows the process of extracting streetscape elements from street view images using the PSPNet model.

Creating traffic crash-related variables
In this study, traffic crash-related variables, including the dependent and explanatory variables, were created at the road-segment level. The road centerlines were first extracted to obtain a simplified road network of the study area. A 100-m geographic buffer was then created around each road segment to spatially join the traffic crash-related variables. The variables are described in the following sections.

Dependent variables: weighted number of traffic crashes
The number of traffic crashes was used as the dependent variable. The crash count inside the buffer area of a road segment was used as a proxy for the intensity of traffic crashes. Because longer roads are typically associated with more traffic crashes, the crash counts were normalized by road length. Three crash types, VCV, VCP, and SVC, were modeled separately. The normalized traffic crash count $C_i$ was defined as

$$C_i = \frac{N_i}{L_i}$$

where $N_i$ denotes the original traffic crash count at road segment i and $L_i$ denotes the road length (km).
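A minimal sketch of the buffer join and the normalization $C_i = N_i / L_i$, assuming planar coordinates in metres and treating a crash as inside the 100-m buffer when its distance to the road centerline is at most 100 m. How crashes falling in the overlapping buffers of neighbouring segments were handled is not stated in the text; this sketch simply counts them for every segment whose buffer contains them. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polyline_length_km(coords):
    """Length of a polyline given in metres, returned in kilometres."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(coords, coords[1:])) / 1000.0


def normalized_crash_count(crashes, road, buffer_m=100.0):
    """C_i = N_i / L_i for one road segment given as a polyline."""
    def in_buffer(p):
        return any(point_segment_distance(p, a, b) <= buffer_m
                   for a, b in zip(road, road[1:]))
    n_i = sum(1 for p in crashes if in_buffer(p))
    return n_i / polyline_length_km(road)


road = [(0.0, 0.0), (500.0, 0.0)]                    # a 0.5-km segment
crashes = [(100.0, 50.0), (200.0, -99.0), (300.0, 150.0), (580.0, 0.0)]
print(normalized_crash_count(crashes, road))         # 6.0 (3 crashes / 0.5 km)
```

The point at (300, 150) lies 150 m from the centerline and is excluded, while (580, 0) lies 80 m beyond the segment's end and is included, because the buffer also extends around the endpoints.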

Explanatory variables: streetscape element ratio
Street-level built-environment variables, namely streetscape element indicators, were used as the primary explanatory variables. After semantic segmentation of the street view images, the cover ratio of each streetscape element in the segmented images was obtained, as shown in Figure 3(c), and these cover ratios were used as proxies for the streetscape element indicators. For each road segment, the indicator was the average cover ratio over the street view images of all sampling locations within its buffer area. The indicator of streetscape element j for road segment i was defined as

$$R_{i,j} = \frac{1}{4k}\sum_{m=1}^{k}\sum_{v=1}^{4}\frac{N_{j}^{(m,v)}}{N}$$

where k is the number of images per view (i.e., sampling locations) within the buffer area of road segment i, v indexes the view (front, back, left, and right), N denotes the number of pixels in one street view image (the segmented images were 512 × 512 pixels, so N = 512 × 512 = 262,144), and $N_j^{(m,v)}$ denotes the number of pixels of element j in the image taken from view v at sampling location m.
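The streetscape element indicator is an average of per-image pixel shares. A minimal NumPy sketch, assuming each segmented image is available as a 2-D array of integer class labels; the label ids in the hypothetical `CLASS_IDS` mapping are placeholders, not the actual ADE20K indices, and only three of the nine elements are listed for brevity.

```python
import numpy as np

# Hypothetical label ids; map them to the segmentation model's real class ids.
CLASS_IDS = {"road": 0, "building": 1, "tree": 2}


def cover_ratios(mask, class_ids=CLASS_IDS):
    """Share of pixels in one segmented image belonging to each element (N_j / N)."""
    n_pixels = mask.size  # N, e.g. 512 * 512 = 262,144
    return {name: np.count_nonzero(mask == cid) / n_pixels
            for name, cid in class_ids.items()}


def segment_indicator(masks, class_ids=CLASS_IDS):
    """Mean cover ratio over all images (k locations x 4 views) in one buffer.

    masks: flat list of the segmented images belonging to one road segment.
    """
    per_image = [cover_ratios(m, class_ids) for m in masks]
    return {name: float(np.mean([r[name] for r in per_image])) for name in class_ids}


# Two toy 4 x 4 masks: half road / half building, and all building.
a = np.zeros((4, 4), dtype=int)
a[2:, :] = 1
b = np.ones((4, 4), dtype=int)
print(segment_indicator([a, b]))  # {'road': 0.25, 'building': 0.75, 'tree': 0.0}
```

Averaging over the flat list of all 4k images is equivalent to the double sum over locations and views divided by 4k.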

Explanatory variables: mixed land-use variables
In addition, traffic crash-related socioeconomic variables were created using the POI data. Previous studies have shown that land-use configurations are significantly correlated with traffic crashes (Ewing and Cervero 2010). Therefore, proxy variables for mixed land use were created as auxiliary factors, based on the hypothesis that the degree of mixing of urban functions reflects the regional urban functional structure. These indices evaluate mixed land use in the POI context from different aspects and allow the impacts of land-use variables on traffic crashes to be tested. Details of the mixed land-use variables are provided in Supplementary Appendix A1.
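The actual indices are defined in Supplementary Appendix A1 and are not reproduced here. As an illustration only, one widely used mixed land-use measure is the normalized Shannon entropy of POI categories; the sketch below computes it for the POIs falling within one road segment's buffer. Treat it as an assumed example of this family of indices, not as the study's definition.

```python
import math
from collections import Counter


def entropy_land_use_mix(poi_categories, n_categories=None):
    """Normalized Shannon entropy of the POI categories in one buffer.

    Returns 0.0 when all POIs share one category and 1.0 when the
    n_categories categories are equally represented. By default,
    n_categories is the number of distinct categories observed.
    """
    counts = Counter(poi_categories)
    k = n_categories if n_categories is not None else len(counts)
    if k < 2 or not counts:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)


print(entropy_land_use_mix(["food", "food", "food"]))      # 0.0
print(round(entropy_land_use_mix(["food", "shop"]), 6))    # 1.0
```

Passing a fixed `n_categories` (for example, the number of POI categories in the whole dataset) keeps the index comparable across road segments.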

Modeling traffic crashes
Traffic crash count data frequently contain many zeros because many road segments have few or no crashes (Figure 2). In statistical terms, when the observed number of zeros exceeds what a Poisson or negative binomial distribution would predict, the data are zero-inflated. This zero-inflation phenomenon, if ignored, introduces significant bias into the modeled spatial distribution of traffic crashes; it should therefore be modeled explicitly to correct the regression relationships between geographic variables and traffic crash counts (Huang and Chin 2010; Lukusa and Phoa 2020). A model that ignores zero inflation can yield inaccurate and invalid parameter estimates when fitted to crash count data with 'many zeros'. In addition, because the effects of environmental factors vary with location, spatial modeling methods provide substantially improved estimates of the number of traffic crashes (Lord and Mannering 2010). Thus, this study adopted a transfer learning strategy, specifically a classification-then-regression strategy, to address zero inflation and spatial heterogeneity separately. First, the number of traffic crashes on each road segment was binarized: 1 if at least one crash occurred (regardless of how many) and 0 if no crash occurred. A multilayer perceptron (MLP) neural network was then trained as a discriminative classifier to estimate the probability of traffic crashes. Second, embedding features were extracted from the hidden layer of the MLP; this embedding representation carries prior knowledge from the crash classification and was used as the input features of a GWPR model, which estimates the spatially heterogeneous relationships with the traffic crash counts.
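The zero-inflation check and the binarization step can be illustrated with a small sketch. It compares the observed share of zero-crash segments with the share implied by a Poisson distribution with the same mean, $P(Y=0)=e^{-\bar{y}}$. This is a rough diagnostic that ignores covariates, not the test used in the study, and the counts are made-up toy values.

```python
import math


def zero_share_vs_poisson(counts):
    """Observed share of zeros vs. the Poisson expectation exp(-mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    return observed, math.exp(-mean)


def binarize(counts):
    """Stage-one labels: 1 if a segment had at least one crash, else 0."""
    return [1 if c > 0 else 0 for c in counts]


toy_counts = [0, 0, 0, 0, 0, 0, 0, 3, 5, 2]  # toy data, mean = 1.0
observed, expected = zero_share_vs_poisson(toy_counts)
print(observed, round(expected, 3))  # 0.7 0.368
print(binarize(toy_counts))          # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

Here 70% of the toy segments have no crashes, roughly twice the 37% a Poisson model with the same mean would predict; this kind of excess is what the classification stage is meant to absorb.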
The MLP neural network model (Murtagh 1991) is a feed-forward artificial neural network widely used in many application domains of urban studies (Delashmit and Manry 2005). MLPs have been proven mathematically to be universal approximators, which makes them suitable for learning prior knowledge from the original features. In this study, a 'vanilla' MLP (Figure 4) was pretrained on a binary classification task that predicts crash occurrence from the traffic-related explanatory variables, in order to capture prior knowledge in crash count data with 'many zeros'. The model consists of an input layer, a hidden layer, and an output layer. Subsequently, the hidden-layer representation of the MLP (the embedding representation) was extracted and used to fit the number of traffic crashes in the regression analysis.
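A minimal NumPy sketch of the pretraining step, written from scratch rather than with whatever library the authors used (it is not named). It assumes a ReLU hidden layer, a sigmoid output, binary cross-entropy loss, and mini-batch SGD. The hidden size (12), learning rate (0.01), and epoch count (100) follow the values reported in the Modeling performance section; the batch size, initialization, and all function names are assumptions. After training, the hidden-layer activations serve as the per-segment embedding.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=12, lr=0.01, epochs=100, batch_size=32, seed=0):
    """Train a one-hidden-layer MLP classifier (crash vs. no crash) with SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = {
        "W1": rng.normal(0.0, np.sqrt(2.0 / d), size=(d, n_hidden)),  # He init
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx].reshape(-1, 1)
            z1 = xb @ p["W1"] + p["b1"]
            h = np.maximum(z1, 0.0)                 # ReLU hidden layer
            prob = _sigmoid(h @ p["W2"] + p["b2"])  # P(crash)
            # Gradient of the mean binary cross-entropy w.r.t. the output logit.
            g_out = (prob - yb) / len(idx)
            g_hid = (g_out @ p["W2"].T) * (z1 > 0)  # back through the ReLU
            p["W2"] -= lr * (h.T @ g_out)
            p["b2"] -= lr * g_out.sum(axis=0)
            p["W1"] -= lr * (xb.T @ g_hid)
            p["b1"] -= lr * g_hid.sum(axis=0)
    return p


def embed(p, X):
    """Hidden-layer activations: one n_hidden-dimensional embedding per segment."""
    return np.maximum(X @ p["W1"] + p["b1"], 0.0)


def predict_proba(p, X):
    """Predicted probability that each segment has at least one crash."""
    return _sigmoid(embed(p, X) @ p["W2"] + p["b2"]).ravel()


rng = np.random.default_rng(1)
X = rng.normal(size=(400, 12))               # 12 standardized features (toy data)
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy crash / no-crash labels
params = train_mlp(X, y)
Z = embed(params, X)                         # shape (400, 12): GWPR-stage inputs
```

The embedding matrix `Z`, rather than the raw features, is what would be passed on to the regression stage in this classification-then-regression setup.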
To explore spatial heterogeneity in the association between traffic crashes and the independent variables, a GWPR model was introduced to capture the spatially varying relationships. The GWPR model allows the parameters to vary with the spatial location $u_i$ of road segment i when predicting the number of traffic crashes from the explanatory variables describing that segment. The model can be written as

$$\ln(Y_i) = \beta_0(u_i) + \sum_{k=1}^{K} \beta_k(u_i) X_{ik}$$

where $\ln(Y_i)$ denotes the natural logarithm of the expected crash count for road segment i, $X_{ik}$ is the kth explanatory variable (k = 1, 2, 3, ..., K) for road segment i, and $\beta_k(u_i)$ is a function of the spatial location $u_i = (x_i, y_i)$, the two-dimensional coordinates of road segment i. The GWPR model accounts for spatial heterogeneity through the estimated parameters $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_K)$, which vary between road segments; the parameters are estimated by local maximum likelihood. Two common evaluation metrics were used to assess model performance: the Akaike information criterion (AIC) (Sakamoto et al. 1986) and the corrected Akaike information criterion (AICc) (Pirdavani et al. 2014). Lower AIC and AICc values indicate a better balance between goodness of fit and model complexity, that is, better modeling performance.
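The local estimation can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the software used in the study: it uses a fixed Gaussian kernel with a user-supplied bandwidth (bandwidth selection, typically by minimizing AICc, is omitted), fits each local model by iteratively reweighted least squares (IRLS) on the kernel-weighted Poisson log-likelihood, and adds textbook AIC/AICc helpers. For a GWPR model, the parameter count `k` in those helpers would be replaced by the model's effective number of parameters, which this sketch does not compute. All function names are hypothetical.

```python
import math

import numpy as np


def gaussian_kernel(coords, i, bandwidth):
    """Fixed Gaussian kernel weights of all segments relative to segment i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def weighted_poisson_irls(X, y, w, n_iter=100, tol=1e-8):
    """Maximize the w-weighted Poisson log-likelihood (log link) by IRLS.

    X must include a leading column of ones for the intercept beta_0.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = math.log(y.mean() + 1e-9)  # start from the overall mean rate
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * (w * mu)             # kernel weights times IRLS weights
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def gwpr(coords, X, y, bandwidth):
    """Local coefficients: row i holds (beta_0, ..., beta_K) at location u_i."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.vstack([
        weighted_poisson_irls(X1, y, gaussian_kernel(coords, i, bandwidth))
        for i in range(len(y))
    ])


def poisson_aic(y, mu, k):
    """AIC = -2 log L + 2k for a Poisson model with fitted means mu."""
    loglik = float(np.sum(y * np.log(mu) - mu)) - sum(math.lgamma(v + 1.0) for v in y)
    return -2.0 * loglik + 2.0 * k


def aicc(aic, n, k):
    """Small-sample corrected AIC."""
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(300, 2))   # toy segment locations (m)
x = rng.normal(size=(300, 1))
slope = 0.2 + 0.6 * coords[:, 0] / 1000        # true effect grows west to east
y = rng.poisson(np.exp(0.5 + slope * x[:, 0])).astype(float)
B = gwpr(coords, x, y, bandwidth=200.0)        # B[:, 1] tracks the local slope
```

With a very large bandwidth, every kernel weight approaches 1 and each local fit collapses to the same global Poisson GLM, which is the sense in which GWPR extends the global model.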

Descriptive spatial analysis
Table 2 lists the final set of variables and their summary statistics. Figure 5 shows the spatial distributions of the traffic crashes, which differ markedly among crash types. For example, most VCP crashes occur in the western Kowloon Peninsula (the downtown areas of the Sham Shui Po and Yau Tsim Mong districts), whereas most SVC crashes are concentrated in the eastern Kowloon Peninsula (Wong Tai Sin and Kwun Tong districts). The Kowloon City district has fewer traffic crashes. In addition, the spatial autocorrelation results in Table 3 indicate that traffic crashes in the study area exhibit a significant spatially clustered pattern. Accordingly, building on prior work (Zhang et al. 2015; Su et al. 2022), a geographically weighted model was used to further investigate the spatial effects on traffic crashes.
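The statistic behind Table 3 is not named in this section; global Moran's I is one common choice for testing whether crash counts cluster in space. A minimal sketch, assuming a user-supplied spatial weight matrix (for example, adjacency between road segments) with a zero diagonal; permutation-based significance testing is omitted.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(values, dtype=float)
    W = np.asarray(weights, dtype=float)
    z = x - x.mean()
    n, s0 = len(x), W.sum()
    return (n / s0) * float(z @ W @ z) / float(z @ z)


# Four segments in a row; neighbours share an edge.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(round(morans_i([1, 1, 0, 0], W), 4))  # 0.3333 (clustered)
print(round(morans_i([1, 0, 1, 0], W), 4))  # -1.0 (dispersed)
```

Under no spatial autocorrelation, the expected value is -1/(n - 1); values well above it, as in the first example, indicate clustering.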
The streetscape elements with the largest proportions were buildings, roads, and sky, with mean values of 36.23%, 27.15%, and 11.36%, respectively (Table 2). Figure 6 shows the spatial distributions of the proportions of the nine types of streetscape elements: roads, buildings, trees, sky, sidewalks, cars, walls, fences, and plants. The spatial patterns of these elements differ from one another. For example, the downtown areas of the Sham Shui Po, Yau Tsim Mong, and Kowloon City districts have high proportions of building elements in the streetscape but low proportions of tree and sky elements, indicating that the connections between these elements and traffic crashes are ambiguous.

Modeling performance
In this study, an MLP neural network model was used to predict the probability of traffic crashes. Following previous exploratory studies, a stochastic gradient descent (SGD) optimizer was used to train the MLP model. The maximum number of training epochs was set to 100 to fit the MLP model sufficiently. The number of neurons in the hidden layer was set to 12, matching the number of original features. The learning rate was set to 0.01, and the other parameters were left at their default values. The MLP model was compared with common machine learning models, namely linear regression (LR) and random forest (RF) classification (Jordan and Mitchell 2015).
LR serves as a useful baseline for comparing the performance of more complex models, and RF can capture nonlinear relationships between the dependent and independent variables. Binary classification accuracy was used to evaluate and compare modeling performance. Table 4 summarizes the results. The MLP model outperformed the other machine learning models: its prediction accuracies for SVC, VCV, and VCP crashes were all higher than those of the other two models. Therefore, the MLP model fits the probability of traffic crashes well, providing a sound basis for the subsequent regression analysis.
A comparison was performed to assess the performance and effectiveness of integrating the embedding representation with the GWPR model. The four panels are as follows:
1. GLM: generalized linear model. Using the GLM as a baseline allows fair and consistent comparisons with more advanced or complex models; here it was used to demonstrate the advantage of the GWPR model in capturing spatial heterogeneity. All parameters in the GLM were assumed to be fixed.
2. NBM: negative binomial regression model. The NBM is often used as a baseline in statistical analysis, particularly for count or overdispersed data. It is specifically designed for count data, in which the response variable is the number of occurrences of an event (a traffic crash in this study) within a given unit of observation. Because it accounts for overdispersion, the NBM typically fits count data better than simpler models such as Poisson regression. All parameters in the NBM were assumed to be fixed.
3. $\mathrm{GWPR}_o$: GWPR model with the original features as input. The original features include the street-level built-environment and land-use variables listed in Table 2.
4. $\mathrm{GWPR}_e$: GWPR model with the embedding features as input. The embedding features were extracted from the hidden layer of the MLP model described above for predicting the probability of traffic crashes.
Table 5 presents the comparative results of the models in fitting the traffic crash counts. Two observations follow. (1) The GWPR model outperformed the GLM and NBM in fitting the number of traffic crashes. This is consistent with Li et al. (2013), who showed that GWPR effectively captures spatially nonstationary relationships and outperforms the GLM and NBM in predicting fatal crashes in individual counties in California. (2) The GWPR model with embedding feature inputs achieved a better fit in terms of both AIC and AICc for all crash types, reducing these metrics by approximately 10.7-18.7%. This result indicates that the prior knowledge extracted from the pretrained MLP model is strongly associated with the number of traffic crashes.
Table 6 shows the ablation experiments for fitting traffic crash counts with the $\mathrm{GWPR}_e$ panel, in which each pair of adjacent experiments forms a controlled comparison. For example, using traffic variables together with proximity characteristics (Exp. 2) yields a better result than using traffic variables only (Exp. 1), with an AICc 4.7% lower. Adding land-use indices to capture land-use design configurations (Exp. 3) further decreases the AICc from 22512.183 to 19524.655. In addition, Exp. 4 tested the contribution of built-environment characteristics derived from streetscape element ratios: the AICc improved by 12.7%, decreasing from 19524.655 to 17040.876. The ablation results show that every component of the proposed framework improves the fit of traffic crash counts, and that the street-level built-environment variables extracted from street view images make the most notable contribution.
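The percentage improvements quoted for the ablation are relative reductions in AICc. A one-line helper reproduces the Exp. 3 to Exp. 4 figure from the AICc values in the text; applied to the Exp. 2 to Exp. 3 step, the same formula gives about 13.3%.

```python
def relative_reduction(before, after):
    """Percentage decrease of a lower-is-better metric such as AICc."""
    return 100.0 * (before - after) / before


print(round(relative_reduction(19524.655, 17040.876), 1))  # 12.7 (Exp. 3 -> Exp. 4)
print(round(relative_reduction(22512.183, 19524.655), 1))  # 13.3 (Exp. 2 -> Exp. 3)
```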

Effects of streetscape element variables
Figure 7 shows the global effects of the original features on the different types of traffic crashes, estimated with the GWPR model. Figure 8 shows sample street view images from the study area that are positive and negative for traffic crashes.
A comparison of the sizes of the red and blue bars reveals the magnitude of the effects of the streetscape element ratio variables on the different crash types. Cars were found to be the most influential streetscape element for crashes of all types combined: a higher proportion of cars is associated with more traffic crashes. By contrast, higher proportions of urbanized elements, such as buildings, walls, fences, and sidewalks, are associated with fewer traffic crashes. Buildings are the most influential streetscape element for VCP crashes. These results reflect differences in streetscape environments between urban and rural areas. In noncentral areas of the city, where roads typically have wide surfaces and abundant greenery on both sides (e.g., highways), more SVC and VCV crashes are expected, owing to fatigue or high-speed driving. However, fewer VCP crashes occur in these areas because fewer people live near the roads.
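Because the GWPR model uses a log link, the coefficients behind these bars are easiest to read as multiplicative effects on the expected crash count. The sketch below converts a coefficient into a rate ratio and a percentage change; the coefficient values mentioned afterwards are hypothetical, and whether a one-unit change means one percentage point or one standard deviation depends on how the variables were scaled, which is an assumption here.

```python
import math

def rate_ratio(beta: float, delta: float = 1.0) -> float:
    """Multiplicative change in the expected crash count for a `delta` change in one predictor.

    Under a log link, mu = exp(x'beta), so shifting one predictor by delta while
    holding the others fixed multiplies mu by exp(beta * delta).
    """
    return math.exp(beta * delta)

def percent_change(beta: float, delta: float = 1.0) -> float:
    """The same effect expressed as a percentage change in the expected count."""
    return 100.0 * (rate_ratio(beta, delta) - 1.0)
```

For example, a hypothetical coefficient of 0.2 gives a rate ratio of about 1.22 (roughly 22% more expected crashes per unit increase), whereas -0.2 gives about an 18% decrease. For GWPR these are local effects, so the same conversion applies location by location.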
VCV crashes are more likely to occur on roads with higher proportions of road surface and cars, consistent with the expectation that wide road surfaces carrying many vehicles produce more vehicle-to-vehicle crashes. High proportions of buildings, trees, and sky are associated with fewer SVC and VCV crashes, and the proportion of buildings has a larger effect on SVC crashes than on VCV crashes. In contrast, VCP crashes are more frequent where the proportion of buildings is high. We believe that the proportion of buildings reflects the heights and densities of building groups, and hence the level of urbanization, rather than the effect of the buildings themselves. The proportion of buildings, together with the density of surrounding buildings, can therefore serve as a proxy for residential activity, traffic flow around a street, and diversity in built-environment design, all of which influence traffic safety. For VCP crashes, high proportions of trees and sky are associated with fewer crashes, as expected. These results provide a visual description of the road segments on which traffic crashes occur and point to an underlying relationship between street-level built-environment characteristics and the different crash types.
Analyzing the joint contribution of the streetscape element variables in combination with the other characteristics yields several further observations. The results show that proximity to special-purpose buildings (such as schools, stations, and stores) in central areas with dense building stock is likely to increase the risk of pedestrian-related traffic crashes. We also found that, in addition to the proportion of road surface, road density had a significant effect on all crash types, and that the proportions of the different surrounding road types had different effects on the crash types. This pattern is similar to the findings of Asadi et al. (2022), who showed that land-use variables were correlated differently with different types of traffic crashes.
The streetscape element ratio variables can reflect built-environment characteristics at the road-segment level, and combinations of several streetscape element variables are significant crash predictors. For example, most sidewalks are found in metropolitan areas with both residential and commercial activity (Figure 6), and the amount of pedestrian movement has been shown to be closely associated with sidewalks. The proportion of sidewalks may therefore serve as a proxy for the volume of people outdoors, which would explain its significant correlation with pedestrian-related crashes. In addition, the visual enclosure of outdoor spaces (e.g., green and blue spaces) can be described by combining the proportions of sky, trees, and plants (Yin and Wang 2016; Gong et al. 2018). Visual enclosure has a slight effect on traffic crashes, possibly by indirectly influencing driver psychology and behavior (Helbich et al. 2019).
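A streetscape element ratio is simply the share of pixels assigned to a class in a segmented image. The following sketch computes these ratios from a label map, averages them over the images sampled along one road segment, and sums sky, tree, and plant shares as one simple enclosure-related summary. The label ids, the per-segment averaging, and the summed indicator are all assumptions for illustration; the actual ids depend on the segmentation model's label set.

```python
import numpy as np

# Hypothetical label ids; real ids depend on the segmentation model's label set.
LABELS = {"road": 0, "sidewalk": 1, "building": 2, "wall": 3, "fence": 4,
          "tree": 5, "plant": 6, "sky": 7, "car": 8}

def element_ratios(label_map: np.ndarray) -> dict:
    """Share of pixels assigned to each streetscape class in one segmented image."""
    total = label_map.size
    return {name: np.count_nonzero(label_map == idx) / total
            for name, idx in LABELS.items()}

def segment_ratios(label_maps: list) -> dict:
    """Average the per-image ratios over all images sampled along one road segment."""
    per_image = [element_ratios(m) for m in label_maps]
    return {name: float(np.mean([r[name] for r in per_image])) for name in LABELS}

def sky_green_share(ratios: dict) -> float:
    """Hypothetical combined sky + tree + plant share, one simple enclosure-related summary."""
    return ratios["sky"] + ratios["tree"] + ratios["plant"]
```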

Discussion and conclusions
The aim of this study was to investigate the relationship between traffic crashes and streetscape characteristics. Previous studies on traffic-related built-environment factors, crash prediction models, and recent applications of street view images were reviewed. On this basis, this study proposes a novel method to spatially model traffic crash counts at the road-segment level. First, a deep-learning method was used to extract streetscape element ratios from street view images. Subsequently, a series of traffic crash-related variables was created. To address both zero inflation and spatial heterogeneity, an MLP neural network model was pretrained to estimate the probability of traffic crashes. The prior knowledge learned for this crash classification task was then extracted from the MLP and used to fit a GWPR model. The results indicate that the proposed method outperformed the baseline and comparison methods. Finally, the spatial effects of the streetscape element variables on the different crash types were analyzed. The contributions of this study are as follows:
1. We employed semantic image segmentation of street view imagery to examine the relationship between street-level urban built environments and roadway safety in a scalable and efficient manner.
2. We extended previous literature by distinguishing the various types of crashes arising at the street level.
3. The proposed approach examines the associations between various crash types and street-level built environment variables, traffic variables, land-use indices, and proximity characteristics, while accounting for zero inflation and spatial heterogeneity.
To the best of our knowledge, this is the first attempt to address these factors comprehensively.
Additionally, we explored potential reasons for the superior performance of the proposed method. Predicting the number of traffic crashes is typically treated as a regression problem. However, such regression tasks are challenging because of the complexity, uncertainty, and noise in the data, which make them difficult to train with traditional machine-learning or statistical models. To address this issue, we employed a classification-then-regression strategy. The classification step solves a simpler task (whether a crash occurs) that can be learned more accurately, and in doing so extracts valuable information from the explanatory variables. The subsequent regression step then focuses on modeling the spatial heterogeneity and zero inflation in the traffic crash data, thereby improving the performance of the regression model.
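The classification-then-regression idea can be sketched as follows, assuming a single tanh hidden layer trained by plain full-batch gradient descent (the architecture in Figure 4 may differ). The network learns whether a road segment has any crash, and its hidden-layer activations are then taken as embedding features for the count model. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def _sigmoid(z):
    # Clip logits to avoid overflow in exp for very confident predictions.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_mlp_classifier(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer MLP for P(crash count > 0), trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden-layer embedding
        p = _sigmoid(H @ w2 + b2)       # predicted crash probability
        g = (p - y) / n                 # d(mean binary cross-entropy)/d(logit)
        grad_w2 = H.T @ g
        grad_b2 = g.sum()
        dA = np.outer(g, w2) * (1.0 - H ** 2)  # back-propagate through tanh
        grad_W1 = X.T @ dA
        grad_b1 = dA.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return W1, b1, w2, b2

def embed(X, W1, b1):
    """Hidden-layer activations used as input features for the downstream count model."""
    return np.tanh(X @ W1 + b1)

def predict_proba(X, params):
    W1, b1, w2, b2 = params
    return _sigmoid(embed(X, W1, b1) @ w2 + b2)
```

The binary target would be `counts > 0`, and the matrix returned by `embed` would then replace (or augment) the original explanatory variables in the count regression, as in the embedding-input GWPR model.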
The GWPR model is a spatial regression technique that extends traditional Poisson regression to account for spatial heterogeneity. Although GWPR offers several advantages, it also has limitations that warrant consideration. First, GWPR can be computationally intensive and time consuming, especially for large datasets or many spatial units, because a separate regression equation is estimated for each location; this can become impractical for extensive spatial domains or limited computational resources. Second, although the bandwidth is commonly selected by minimizing AICc or a cross-validation score, GWPR does not provide an automatic criterion for choosing the most appropriate set of predictors or the form of the spatial weighting kernel. Researchers must rely on their judgment and prior knowledge to decide which variables to include and how to specify the spatial weights, and this subjectivity can lead to bias or suboptimal model specifications if not carefully addressed.
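The "separate regression for each location" structure, and hence the computational cost, can be seen in a minimal GWPR sketch: at every location, a Poisson regression is fitted by IRLS with each observation's usual weight multiplied by a Gaussian kernel weight that decays with distance. This sketch assumes a fixed, user-supplied bandwidth rather than one selected by AICc, and the function names are hypothetical.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all observations relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def local_poisson_fit(X, y, w, n_iter=50, tol=1e-8):
    """Weighted Poisson IRLS: kernel weights w multiply the usual IRLS weights mu."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        XtW = X.T * (w * mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def gwpr(X, y, coords, bandwidth):
    """Fit one local Poisson regression per location; returns an (n, p) array of coefficients."""
    return np.vstack([local_poisson_fit(X, y, gaussian_weights(coords, i, bandwidth))
                      for i in range(len(y))])
```

Each of the n local fits touches all n observations, so the cost grows roughly quadratically with the number of spatial units, before any bandwidth search is added on top.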
The knowledge acquired from this study can be used to develop spatially responsive strategies to reduce traffic crashes and injuries. Recognizing the association between the street-level built environment and traffic crashes can help in formulating effective plans for resilient human settlements. This study also enriches place-based GIS research by demonstrating how such geographical datasets can be leveraged to gain insights for traffic-safety assessment. Our analysis provides insights into the spatial distribution of traffic crash counts and their relationship with built-environment variables, with implications for transportation planning, an area of interest to GIS researchers.
Future studies should jointly analyze features from multiple domains, supported by multimodal traffic crash data, over an extended study area. Because of the limited availability of multitemporal street view images, this research relied on spatial rather than spatiotemporal modeling. Nonetheless, it paves the way for modeling traffic crashes at the road-segment level using street view images.

Figure 1. Study area: Kowloon Peninsula. The top-left picture shows the location of the Kowloon Peninsula in the HKSAR. The right pictures are examples of street view images at the sampling points.

Figure 2. Comparison of zero-value and non-zero-value road segments. The y-axis represents the percentage of counts of the different road segments.

Figure 3. Extracting streetscape elements using semantic image segmentation: (a) PSPNet architecture (revised from Zhao et al. 2017), (b) workflow for extracting semantic objects from street view images, and (c) real-life examples of street view images in Hong Kong (obtained from the Tencent Map API) and the corresponding semantic segmentation results.

Figure 4. Architecture of the MLP neural network model for binary classification of traffic crashes.
Proportion of sky (%): reflects blue space and the openness of outdoor spaces.
Proportion of sidewalk (%): reflects the volume of outdoor crowds (pedestrians).
Proportion of car (%): reflects the volume of traffic flow.
Proportion of wall (%): reflects the openness of outdoor spaces.
Proportion of fence (%): reflects the openness of outdoor spaces.
Proportion of plant (%): reflects the thickness of vegetation and green space.

Figure 6. Spatial distributions of the streetscape element variables.

Figure 7. Mean estimated parameters of the GWPR model with original features. More summary statistics for the GWPR parameter estimates can be found in Supplementary Appendix A2.

Figure 8. Samples of street view images that are positive and negative for traffic crashes.

Table 1. Traffic crash types and their statistics in the study area.
Note: Crash types: single-vehicle collision and vehicle collision with object (SVC), vehicle collision with pedestrian (VCP), and vehicle collision with vehicle (VCV). Detailed descriptions of the traffic crash types are available in Pilkington and Kinra (2005).

Table 2. Descriptive statistics of the dependent and explanatory variables.

Table 3. Spatial autocorrelation summary based on the global Moran's index.

Table 4. Summary of binary classification accuracy in predicting the probability of traffic crashes.
Note: LR: linear regression classification; RF: random forest classification. Bold scores indicate the best value in the corresponding column.

Table 5. Summary of regression results for fitting traffic crash counts. Scores in bold indicate the best-fitting value in the corresponding row.

Table 6. Ablation experiment results for fitting the counts of all traffic crash types using the GWPR_e panel.