Measuring Physical Disorder in Urban Street Spaces: A Large-Scale Analysis Using Street View Images and Deep Learning

Physical disorder is associated with negative outcomes in economic performance, public health, and social stability, such as the depreciation of property, mental stress, fear, and crime. A limited but growing body of literature considers physical disorder in urban space, especially the topic of identifying physical disorder at a fine scale. There is currently no effective and replicable way of measuring physical disorder at a fine scale for a large area with low cost, however. To fill the gap, this article proposes an approach that takes advantage of the massive volume of street view images as input data for virtual audits and uses a deep learning model to quantitatively measure the physical disorder of urban street spaces. The results of implementing this approach with more than 700,000 streets in Chinese cities—which, to our knowledge, is the first attempt globally to quantify the physical disorder in such large urban areas—validate the effectiveness and efficiency of the approach. Through this large-scale empirical analysis in China, this article makes several theoretical contributions. First, we expand the factors of physical disorder, which were previously neglected in U.S. studies. Second, we find that urban physical disorder presents three typical spatial distributions—scattered, diffused, and linear concentrated patterns—which provide references for revealing the development trends of physical disorder and making spatial interventions. Finally, our regression analysis between physical disorder and street characteristics identified the factors that could affect physical disorder and thus enriched the theoretical underpinnings.

Physical disorder, or neighborhood physical disorder in urban space, refers to the disturbance in residents' lives and public spaces caused by observable or perceptible visual signs (Skogan 1990). Evidence indicates that physical disorder is an external representation of urban decay and is associated with crimes, such as theft, robbery, and prostitution (Skogan 1990; Ross and Jang 2000). Additionally, the epidemiology community has found that residents who live in environments with higher levels of physical disorder tend to suffer from greater stress and fear, which could trigger unhealthy behaviors (e.g., binge drinking; Keyes et al. 2012) and adverse outcomes (e.g., obesity; Burdette and Hill 2008). Therefore, the physical disorder of urban space has become an interdisciplinary subject at the nexus of urban planning, social science, and public health, attracting attention from both academics and practitioners.
With the expansion of our knowledge of urban streets, the function of streets has been redefined from mobility to livability, and streets have become important components of urban space that support various human activities. Given that urban streets as public spaces exert significant impacts on quality of life, the spatial quality of urban streets has become of great interest. Quantifying the quality of urban space, such as physical disorder, remains a central issue for both researchers and planners, however. Conventional measurements of physical disorder rely heavily on systematic social observations (SSOs) or neighborhood audits (Sampson and Raudenbush 1999), which are expensive, time-consuming, and sometimes dangerous (Grubesic et al. 2018); thus, they are usually limited in both geographic and temporal coverage. In addition, potential bias exists when different auditors observe the same space, as they assess the environment based on what they visually observe, which might not reflect the objective physical environment. Due to the rapid development of online mapping services, street view images (SVIs) that record the urban landscape along streets have become publicly accessible, providing an alternative approach for auditing physical disorder. As a new type of data source, SVIs not only break through spatial-temporal limitations, but also provide abundant information on urban street landscapes (F. Zhang et al. 2018) and have been used in virtual auditing studies (e.g., Mooney et al. 2016; Jiang et al. 2018). Although evidence is accumulating that virtual auditing with SVIs is an effective alternative in quantifying and understanding our built environment, the studies that use such data and methods in physical disorder identification remain limited. In addition, existing studies that use SVIs for virtual auditing still require extensive manual auditing of each image, which can now be performed much more efficiently by computer vision and image processing. Moreover, the existing literature on assessing physical disorder is based either on Western cities or on a limited number of sample areas, which also encourages us to initiate a large-scale analysis in the Chinese context.
In the past decade, the advancement of deep learning models has resulted in great progress in image processing. Due to its ability to automatically learn image features and accurately recognize various objects, deep learning could supersede conventional manual audits and provide an opportunity for the large-scale, automatic measurement of physical disorder using SVI data. Aiming to fill the potential gaps in quantifying physical disorder, this article proposes an approach that combines SVIs and deep learning to automatically measure the physical disorder of urban streets, which can manifest in the SVIs as abandoned buildings, vacant lots covered by ruderal vegetation and trash, and so on. The results of implementing this methodology on more than 700,000 streets in 264 cities in China (to the best of our knowledge, the first attempt to use large amounts of SVI data and deep learning algorithms in quantifying large-scale, street-level physical disorder for a number of cities) validate the effectiveness and efficiency of the method. In addition to the methodological innovation, there are several theoretical contributions through the large-scale empirical study in China. First, we enriched the indicator system of physical disorder, which has been mostly ignored in previous studies in a U.S. context. Second, we find that urban physical disorder exhibits three spatial characteristics: monocentric, sectoral, and polycentric patterns. Further, the probability of disorder in street space is positively associated with street length, functional combination, and distance to urban centers, but negatively correlated with the functional density of streets and the density of intersections, which provides opportunities for locating disordered streets, assessing development trends, and making spatial interventions.

Measuring Physical Disorder with Conventional Methods
Physical disorder is recognized as the presence of specific visual items on the street and exerts a negative impact on individuals' perceptions of the comfort and safety of the space (Franzini et al. 2008). For example, in the Project on Human Development in Chicago Neighborhoods, approximately 90 percent of the respondents believed that garbage or broken glass on the street or sidewalks indicated disorder (Sampson and Raudenbush 1999). The presence of window bars on buildings made nearly half of the people in one survey more likely to judge the space as being disordered (Day et al. 2006). Therefore, to achieve a full understanding of the impacts of physical disorder and its mechanisms, it is necessary to conduct a large-scale and comprehensive measurement of these disorder-related visual items.
Traditional methods of measuring physical disorder include telephone interviews, field questionnaires, and SSOs, all of which are both labor- and cost-intensive processes. Moreover, auditors sometimes have to take risks to enter potentially dangerous areas, as physical disorder has been found to be associated with crime (Taylor, Shumaker, and Gottfredson 1985; Perkins, Meeks, and Taylor 1992). Sampson and Raudenbush (1999) carried out an SSO with the aid of video-recording technology, which partially solved the problem of in-person auditing. Such a method, however, can neither be applied to large-scale measurements nor deliver comparable results across time and space (Jones, Pebley, and Sastry 2011).
Owing to the availability of massive SVI data and the development of high-resolution omnidirectional image technology, virtual auditing through SVIs has become an effective alternative to environmental auditing, offering wider spatial coverage, higher SVI update frequency, and lower acquisition cost. Physical disorder studies have also switched to virtual auditing methods using SVIs (e.g., Clarke et al. 2010; Mooney et al. 2014; Bader et al. 2015; Quinn et al. 2016; Mayne, Pellissier, and Kershaw 2019). For example, Quinn et al. (2016) visualized the distribution of physical disorder in New York City through virtual audits and found that wide swaths of disorder were concentrated in most of the Bronx and the northernmost part of the borough, away from the city center. Beyond SVIs from online map platforms, scholars have also tried to collect primary SVIs at the human scale. Using wearable cameras and virtual audits, Z. Zhang et al. (2021) and Li et al. (2022) assessed small-scale, individual exposure to urban greenery and neighborhood physical disorder, respectively. Such studies using SVIs and virtual audits, however, depend on manual audits of the images, and their geographic and temporal constraints remain significant.

Auditing Street View Images with Deep Learning Models
The development of high-performance computing systems and the availability of large-scale annotated data sets offer a new opportunity for the large-scale, automated processing of SVIs using deep learning algorithms (F. Zhang et al. 2018). Convolutional neural networks, as classic deep neural networks, can be trained for feature extraction of an SVI and then effectively identify elements in the street space, including but not limited to sidewalks, vehicles, buildings, and green plants (Shen et al. 2018; Tang and Long 2019). One such network, Cambridge University's SegNet, uses a method that enables a deep learning framework to conduct semantic segmentation of SVIs, resulting in high-quality segmentation (Kendall, Badrinarayanan, and Cipolla 2015). In addition, Suel et al. (2019) applied deep learning to street view imagery and revealed that London's highest income areas are located in the city center and the southwest, where the scores of spatial quality are the highest. Due to the availability of the massive volume of SVI data and the advances in deep learning algorithms, using deep learning to extract considerable potential information from SVIs has also been employed in the evaluation of street greenery (Helbich et al. 2019; Ye et al. 2019; Z. Zhang et al. 2021), public health (Nguyen et al. 2020), and street safety (Naik et al. 2014), all of which laid methodological foundations for our study on physical disorder assessment using SVI and deep learning. The lack of large-scale analysis on physical disorder is an additional reason we carried out this study.

Method

Study Area
For Chinese cities, the administrative boundary is much larger than their spatial region, leading the cities to include both urban areas and rural areas. Thus, this research adopted Ma and Long's (2020) spatial city data, which use communities as basic administrative units and the data of urban built-up areas to identify urbanized areas. Superimposing the available street view imagery over the spatial city boundary, we finally used the data set of 264 prefecture-level or above cities in China (see Figure 1), which includes 769,407 streets in total and covers most urban areas of China with the latest SVIs.

Data
The SVIs in 264 Chinese cities are the core data set in this study. We used the 2014 roads and streets of China obtained from Amap (https://mobile.amap.com/), a local road-navigation firm based in Beijing, for crawling SVIs. We clipped the data set to the study area to derive the street segments within it and removed highways and bridges to avoid street-level images captured on them, resulting in 769,407 streets with a total length of 155,063 km. Then the Tencent Maps application programming interface (API) was used to query and download SVIs (see https://lbs.qq.com/panostatic_v1/guide-getImage.html, accessed June 27, 2022). The sampling points on the streets were first generated on the map at an interval of 100 m (50 m in Beijing). Each point has latitude, longitude, and other parameters to facilitate the crawling processes. Furthermore, by concatenating the parameters, including size (1,280 × 720), pitch (0), heading (0, 90, 180, and 270 for four directional SVIs), and the request key, we could obtain a URL to open the street view Web site pages. Afterward, with the support of Python, we downloaded the SVIs in four directions (north, west, east, and south) for each sampling point. Finally, a total of 4,876,952 SVIs in 2015 were collected, representing 1,219,238 sampling points on all the streets within 264 Chinese cities.
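The URL concatenation described above can be sketched roughly as follows. This is only an illustration: the endpoint `BASE_URL`, the query parameter names, the key placeholder, and the example coordinates are all assumptions made for the sketch and are not taken from the Tencent Maps documentation; only the size, pitch, and four heading values come from the text.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only; the real
# street view API may use a different URL and different query keys.
BASE_URL = "https://example.invalid/streetview/image"
HEADINGS = (0, 90, 180, 270)  # four directional SVIs per sampling point


def build_svi_urls(lat, lng, key, size="1280x720", pitch=0):
    """Return one request URL per heading for a single sampling point."""
    urls = []
    for heading in HEADINGS:
        params = {
            "size": size,
            "location": f"{lat},{lng}",
            "pitch": pitch,
            "heading": heading,
            "key": key,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# Example coordinates and key are placeholders, not values from the study.
urls = build_svi_urls(39.9042, 116.4074, key="YOUR_KEY")
```

Each sampling point yields four URLs, which is consistent with the reported ratio of 4,876,952 images to 1,219,238 sampling points.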

Defining and Identifying Physical Disorder in Chinese Cities
The first step is to create an indicator system of physical disorder for Chinese cities. First, in accordance with the previous studies (Sampson and Raudenbush 1999; Day et al. 2006; Quinn et al. 2016), a checklist containing a comprehensive set of physical disorder factors was constructed. Second, seven auditors with professional backgrounds in either architecture or urban planning conducted a few field surveys in random streets in Beijing, and also performed a virtual audit with 1,000 randomly selected SVIs (from all 4,876,952 SVIs in 264 cities) to compare what they found with the indicators in the checklist. Factors that were found in neither the SVIs nor the field surveys, such as empty bottles and cigarette butts, were removed from the checklist. Some new factors that are commonly seen in the sample SVIs or field surveys and that are also manifestations of physical disorder according to the definition, such as illegal or temporary buildings, street vendors, and unpaved roads, were added to the list. Finally, a refined list with five categories and fifteen detailed factors was created to represent the physical disorder indicators in urban streets (see Figure 2). For each factor, a sample image is provided to demonstrate how the factor manifests and can be identified in the SVIs.

Virtual Auditing of the Training Set
The second step is to generate a training set for the deep learning model. To minimize the statistical error and potential bias caused by cognitive differences between auditors, the auditors first received formal research and training materials and then performed experimental audits on 5,000 sample images (randomly selected from the 4,876,952 SVIs in 264 Chinese cities). Interrater reliability (IRR) was calculated using the Kappa index (Vanwolleghem et al. 2016), and the experimental audits continued until it was above 80 percent (Landis and Koch 1977). The formal scoring was then performed on 200,000 SVIs randomly selected from all the SVIs to identify physical disorder. Based on a predefined auditing guideline for each physical disorder factor, the auditors provided a general judgment on whether any factor of physical disorder existed in each SVI. If a physical disorder factor was identified, the image was assigned a score of 1; otherwise, it was given a score of 0. Only images labeled 1 for the fifteen categories of physical disorder were selected for training. In Table 1, the second column lists the number of training images for each category. All images labeled 1 are included in the training. The distribution of each category is shown as a percentage, that is, the number of images labeled 1 out of the total of 200,000 labeled images (see Table 1).
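As a rough illustration of the reliability check, the sketch below computes Cohen's kappa for two raters who assign 0/1 scores to the same images. The text only names "the Kappa index", so the two-rater form is an assumption here (with seven auditors, a pairwise or multi-rater variant may have been used), and the label lists are made-up examples.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same images with 0/1."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty image set")
    n = len(labels_a)
    # Observed agreement: share of images with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    expected = 0.0
    for category in set(labels_a) | set(labels_b):
        expected += (labels_a.count(category) / n) * (labels_b.count(category) / n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up scores for ten images (assumption, not study data).
rater_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
rater_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
kappa = cohen_kappa(rater_1, rater_2)
```

In this toy case the raters disagree on one image out of ten, which gives a kappa of about 0.78, just under the 0.80 threshold, so under the procedure described above another round of experimental auditing would follow.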

Identifying Physical Disorder Using a Deep Learning Model
Because this study deals with a huge amount of data, a lightweight and efficient machine learning algorithm is needed to make fast predictions. MobileNet is our first choice due to the advantages of fewer parameters, less computation, and shorter inference time. The structure of MobileNet V3-small is listed in Table 2 (Howard et al. 2019). In the model, the width multiplier, input resolution, batch size, and learning rate are set to 1, 224, 16, and 0.0001, respectively. Using the trained model, we obtain a classification result for each image, indicating the probability of the existence of a certain physical disorder factor.

Identifying Multiscale Spatial Patterns of Physical Disorder
The deep learning results of physical disorder are aggregated at three levels: sampling points, streets, and cities. Specifically, the physical disorder probability of each sampling point is the arithmetic average of the disorder probability of the four directional SVIs (see Equation 1; the overall disorder probability of each image is the arithmetic average of the disorder probability of fifteen factors). The disorder probability of one street is the arithmetic average of the disorder probability of sampling points contained in the street (Equation 2). The disorder probability of one city is the arithmetic average of the disorder probability of sampling points contained within the central urban area of the city (Equation 3).

$$D_{\mathrm{point}_i} = \frac{1}{4} \sum_{j=1}^{4} p_j \tag{1}$$

$$D_{\mathrm{street}_k} = \frac{1}{n} \sum_{i=1}^{n} D_{\mathrm{point}_i} \tag{2}$$

$$D_{\mathrm{city}_l} = \frac{1}{m} \sum_{i=1}^{m} D_{\mathrm{point}_i} \tag{3}$$

where $D_{\mathrm{point}_i}$ is the disorder probability of sampling point $i$; $p_j$ is the disorder probability of the SVI in direction $j$, $j = 1, 2, 3, 4$, referring to the north, south, west, and east directions, respectively; $D_{\mathrm{street}_k}$ is the disorder probability of street $k$; $n$ is the number of sampling points in street $k$; $D_{\mathrm{city}_l}$ is the disorder probability of city $l$; and $m$ is the number of sampling points in city $l$.
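The three aggregation levels can be sketched in a few lines. The function names and the numeric values below are illustrative assumptions; only the averaging rules follow the description of Equations 1 to 3.

```python
from statistics import fmean


def image_probability(factor_probs):
    """Overall disorder probability of one SVI: mean over its factor probabilities."""
    return fmean(factor_probs)


def point_probability(direction_probs):
    """Equation 1: mean over the four directional SVIs of a sampling point."""
    if len(direction_probs) != 4:
        raise ValueError("expected four directional SVIs per sampling point")
    return fmean(direction_probs)


def street_probability(point_probs):
    """Equation 2: mean over the sampling points on one street."""
    return fmean(point_probs)


def city_probability(point_probs):
    """Equation 3: mean over all sampling points in the city's central urban area."""
    return fmean(point_probs)


# Illustrative values only (not from the study).
points_street_a = [point_probability([0.1, 0.3, 0.2, 0.4]),
                   point_probability([0.0, 0.0, 0.2, 0.2])]
points_street_b = [point_probability([0.5, 0.5, 0.7, 0.3])]

street_a = street_probability(points_street_a)
street_b = street_probability(points_street_b)
city = city_probability(points_street_a + points_street_b)
```

Note that the city value is averaged over sampling points rather than over streets, so a street with many sampling points weighs more in the city score than a short one.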
We use the tenfold cross-validation method by randomly dividing the data set into ten mutually exclusive subsets of similar size, using 90 percent as the training set each time and the remaining 10 percent as the testing set. During the training, cross-entropy is used for classification, and after training, the accuracy rate is used for validation and testing. The accuracy is calculated by dividing the number of correct items by the number of all items. Cross-entropy is formulated as follows:

$$H(p, q) = -\sum_{i=1}^{C} p_i \log q_i \tag{4}$$

where $C$ represents the number of factors, $p_i$ refers to the ground truth, and $q_i$ refers to the prediction.
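A minimal sketch of the validation procedure is given below, assuming a simple shuffled split into ten folds; the helper names, the random seed, and the example vectors are illustrative and not part of the study's code.

```python
import math
import random


def ten_fold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs over mutually exclusive, similar-size folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy between ground truth p and prediction q over C factors."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(p_i * math.log(max(q_i, eps)) for p_i, q_i in zip(p, q))


def accuracy(y_true, y_pred):
    """Number of correct items divided by the number of all items."""
    return sum(t == y for t, y in zip(y_true, y_pred)) / len(y_true)


splits = list(ten_fold_splits(100))
loss = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

With 100 items, each of the ten rounds trains on 90 items and tests on the remaining 10, matching the 90/10 split described above.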
Having obtained the predicted results of physical disorder in Chinese cities, we further explored the distribution and formation mechanism of physical disorder. A regression analysis was conducted using the level of physical disorder as the dependent variable and selected street attributes as the independent variables. The regression analysis reveals the associations between physical disorder in urban street spaces and street length, function density, function mix, junction density, and distance to the city center, which further helps us to find clues regarding how physical disorder appears and how to renovate the street space. The model is expressed by Equation 5:

$$Y_i = \beta_0 + \sum_{j=1}^{5} \beta_j X_{ij} + \varepsilon_i \tag{5}$$

where $Y_i$ is the dependent variable, the disorder probability of street $i$; $X_{ij}$ is the $j$th influencing factor of street $i$; $\beta_0$ is the constant and $\beta_j$ is the coefficient (weight) of the corresponding influencing factor; and $\varepsilon_i$ is the random error term.
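To make the linear model concrete, the following sketch fits ordinary least squares through the normal equations in plain Python. It is only an assumption about how such a regression might be estimated; the study does not state its software for this step. The data are synthetic, with two hypothetical predictors standing in for the five street attributes.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(x_rows, y):
    """Least-squares fit of y = b0 + sum_j b_j * x_j; returns [b0, b1, ...]."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic streets (assumption): columns are a hypothetical length measure and
# a hypothetical function-density measure; y is generated without noise.
x_rows = [[1, 0], [0, 1], [2, 1], [1, 3], [3, 2]]
y = [0.1 + 0.02 * a - 0.03 * b for a, b in x_rows]
coefficients = fit_ols(x_rows, y)
```

Because the synthetic response is noise-free, the fit recovers the generating intercept and slopes; with real street data the coefficients would be estimates with standard errors, which this sketch does not compute.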
In addition, spatial analysis tools in ArcGIS, such as the spatial join, hot spot analysis, and natural breaks methods, are also used to conduct the data analysis and then identify the patterns of disordered street space in Chinese cities.

Table 1 shows the performance of the MobileNet V3 model for each factor on the sample data set, with an average accuracy of 72.77 percent for the overall task. The accuracy for abandoned buildings and damaged public interface was slightly higher, at 84.1 percent. The accuracy of illegal or temporary buildings and buildings with damaged structure in the model was relatively low, at below 60 percent. The model performance can be further enhanced, however, by collecting more labeled data.

Identifying Features of Physical Disorder at the Street Level
The deep learning results (the data set supporting the conclusions is available in the Acknowledgment) show that physical disorder is quite common in Chinese cities: nearly three quarters of the streets (74.2 percent) have at least one physical disorder factor (score > 0). The overall disorder value is low, however, at 0.213 on average, implying the presence of about three disorder factors. Furthermore, the probability distribution of street factors follows an exponential distribution, with most streets either having no factors or a low probability of physical disorder. In contrast, a few streets have a high probability of physical disorder factors (see Figure 3). Two typical images that represent a mean and high level of probability are illustrated in Figure 4. Figure 5 offers a closer look at the deep learning results and compares the factor rankings with the preceding similar studies. As shown in Figure 5, one significant distinction is that some factors that were crucial in the preceding studies, which were mostly focused on U.S. cities, are relatively trivial in our study (e.g., housing vacancies and abandoned buildings). In contrast, the newly added factors that reflect the unique urban landscape with poor space quality had a higher probability (e.g., buildings with damaged facades and broken roads). Among all fifteen factors, buildings with damaged facades, broken roads, graffiti or illegal advertisement, and broken infrastructure are the main manifestations of street physical disorder in Chinese cities, with relatively higher disorder probabilities of 0.159, 0.134, 0.129, and 0.113, respectively. Having obtained the predicted results of street physical disorder in 264 Chinese cities, we saw a great opportunity to further explore the influencing factors of physical disorder, because the sample is large and represents the whole Chinese urban system.
In accordance with the literature, we selected five variables that represent street space attributes, including street length, function density, function mix, junction density, and distance to the city center, and then used a linear regression model to analyze the association between street variables and the streets' physical disorder score (Table 3).
The regression results show that the disorder probability of street space has positive correlations with the street length, function mix, and distance to the city center and negative correlations with the street function density and junction density. In other words, our results indicate that longer and more diverse streets, or those farther from the city center, might contain more disorder factors, thus leading to a greater probability of disorder occurrence. Streets with more points of interest and intersections are found to have a lower level of physical disorder. By calculating the average score for each street segment, we developed scoring maps for the physical disorder at the city level (Figure 6). The study labeled the top 5 percent of streets with the highest probability of physical disorder among all sampled streets across the country. These spaces with a higher probability of physical disorder are evenly and widely distributed across the city without apparent spatial aggregation. Consistent with the regression results, however, these streets are generally distributed in areas far from the city center. In addition, because housing prices are lower than in urban centers, large numbers of urban villages and old residential areas are located in regions far from downtown, where residents are less concerned about spatial quality.

City Level: Physical Disorder Maps of Street Space in China
We used Jenks natural breaks to divide the disorder probability of cities into five categories and then labeled the cities with disorder probabilities higher than 0.33 as the most disordered category (see Figure 6, and full city list in Appendix). The most disordered cities are scattered across the country: they appear in the northern region of China as well as among central and southern cities. Cities with better urban environments (i.e., lower physical disorder scores) are located in western and eastern China, either in regions with scenic landscapes or in the highly developed coastal Yangtze River Delta region.
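For readers unfamiliar with natural breaks, the sketch below shows a generic dynamic-programming formulation of the criterion (minimizing the within-class sum of squared deviations in one dimension). It is not the ArcGIS implementation used in the study, whose internals may differ, and the input values are made up.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class under 1-D natural breaks (min within-class SSD)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    s1 = [0.0]
    s2 = [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):  # sum of squared deviations of data[i:j], with j > i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i
    # Walk the cut table backwards to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = cut[k][j]
    return bounds[::-1]


# Made-up city scores (assumption) with three obvious groups.
scores = [0.10, 0.11, 0.12, 0.30, 0.31, 0.32, 0.60, 0.61]
breaks = jenks_breaks(scores, 3)
```

The triple loop is quadratic in the number of values per class, which is inexpensive for a few hundred city scores; for street-level data a library implementation would be a more practical choice.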
Regarding the spatial distribution of physical disorder at the city scale, although most urban physical disorder is relatively scattered within the city, the physical disorder hot spots reveal three typical patterns of distribution in most cities: (a) scattered (109 out of 264), (b) diffused along the urban expansion direction (89 out of 264), and (c) linear concentrated along arterial roads (42 out of 264). Twenty-four cities combined Characteristics b and c (see Figure 7).
a. Among the scattered cities, some are low-administrative-level ones with a small overall spatial scale and relatively simple road network, and some are tourist cities, which have a better overall street environment due to better construction and maintenance (e.g., Leshan).
b. Most medium-sized cities are facing rapid urban expansion with limited urban resources. The expansion occurs in peripheral areas, mostly urban-rural junctions or suburbs yet to be developed, which show a diffused distribution of physical disorder, with more nonurban landscapes such as farmland and countryside (e.g., Taiyuan).
c. The third pattern refers to cities where the overall physical disorder is moderate, and places with a high probability of physical disorder are linearly distributed along the arterial roads, especially the common but poorly maintained wide roads, expressways, and so on. In addition, some provincial capital cities or large cities also show the characteristics of linear concentration along the arterial roads and a polycentric spatial type, which are consistent with the existing multicentered urban structure (e.g., Nanjing).
Even with the existence of the aforementioned spatial distribution patterns, however, the dominant factor contributing to the spatial quality of each city varies, which deserves further research.

Concluding Remarks
This article proposes an approach combining a deep learning algorithm and SVIs to quantitatively measure the physical disorder of urban street spaces. Fifteen physical disorder factors in five categories (architectural, commercial, road, greening, and other infrastructure aspects) were defined and manually annotated in the training SVI set. The pretrained MobileNet V3 model performed well, with an average accuracy of 72.77 percent. To further test the approach, a case study was conducted in 264 Chinese cities. Based on the prediction results of the deep learning model, the study developed a series of maps showing the physical disorder of urban street spaces in Chinese cities and revealed significant patterns of these phenomena at both the street and city scales. At the street level, (1) physical disorder is commonly seen in the street spaces of Chinese cities, although the spatial quality remains moderate for most streets; (2) the low spatial quality in architecture, roads, and commerce is the main manifestation of the street space disorder in China; and (3) street length, functional diversity, and distance from the city center are all found to be positively associated with physical disorder of a street. At the city scale, physical disorder of street spaces exhibits three spatial patterns: monocentric, linear concentration along arterial roads, and polycentric distribution across the city.

Discussion
This study makes contributions in several areas. First, the attention to physical disorder has experienced quick growth during the past ten years, according to our keyword search on Google Scholar, but theories (e.g., mechanism and the externalities of physical disorder), measurement (e.g., conventional manual audit or questionnaire), and empirical analysis (limited case cities) all face limitations in the existing literature. Our study has clear goals to enrich the related research and to overcome these limitations. Second, our approach to quantifying physical disorder with SVIs and deep learning not only overcomes the limitations of the conventional auditing and field survey, but also provides a tool that can be applied to almost any place where SVIs are available, for large-scale analysis at a lower cost. Third, such a large-scale empirical analysis that includes most urban areas in China, which represent the Chinese urban system, gives us valuable information to understand the characteristics (e.g., degree, spatial distribution) of the physical disorder at different scales, and enables us to reveal the mechanisms behind the appearance of physical disorder. All of these factors help us learn more about physical disorder and enrich the theories in urban space.
In practice, by identifying the physical disorder in urban street spaces in a large number of cities, as well as providing the visualization maps for each city, we believe this article could assist local governments in evaluating the street space quality of their cities, and especially help them to identify the existing problems that cause the physical disorder in specific streets, which might have already imposed negative externalities, such as decreased rent and urban vitality and increased stress and crime. The combination of the SVI data and deep learning also sheds light on a more comprehensive understanding of the urban built environment at multiple scales including location, street, and city, which provides refined support for policymaking in projects, such as urban renewal, street revitalization, and community development, thus helping improve the overall quality of life. In addition, local residents could also benefit from these physical disorder maps by learning more about their surrounding urban spaces and better engaging in improving them or adjusting their behaviors and location choice.
Notably, there are also a few limitations worth further discussion in the near future. First, the SVIs have spatial and temporal limitations. Spatially, they lack data on inaccessible local backroads and alleys. In addition, street views generally cover only central urban areas (i.e., densely populated areas). Temporally, commercial street views are slow to update and urban renewal is not reflected in a timely manner. For example, Tencent Street View images have not been updated since 2015. In the near future, we will be able to access more SVI data from Baidu Maps, which is another major map service company in China, to further expand our study. Another alternative is an active urban sensing strategy, which could help alleviate this problem by engaging independent agents, such as drivers, cyclists, and pedestrians, to collect street views. Second, using a binary indicator to identify physical disorder has its limitations; for example, one single abandoned building on the street must have a different physical disorder effect than a row of abandoned buildings. In this pilot study, though, we first attempted to answer the question of whether physical disorder exists in urban street spaces and paid little attention to how much disorder the streets exhibit. Future work could use the SegNet model, another deep learning algorithm, which would enable us not only to identify the existence of physical disorder factors, but also to know the area proportion of each factor, such as the area of the unkempt facades or graffiti, and how many street vendors can be seen. Annotation for image segmentation models, however, requires depicting the shape of each object, which results in a huge workload, so this direction still faces a huge challenge due to the lack of a large-scale annotation database.
The auditing guidance of this study is available as supplemental material. It will help readers to understand how the auditors identify the physical disorder in the process of virtual auditing. It also provides guidance for the research community in replicating the approach in future studies on physical disorder in other cities.