Trend surface analysis of geographic flows

Abstract An origin-destination (OD) flow is the movement of objects from an origin to a destination. Determining how the flows vary across geographic locations helps understand the mechanism of flow distributions; however, it has rarely been studied. Here, we propose a trend surface model with polynomial functions to quantify the flow distribution with coordinates in the flow space. This model assumes that an observed data-record is composed of the trend value and the residual, and is represented by the orthogonal polynomial with O and D coordinates as independent variables and flow properties as dependent variables. The simulation experiments based on the linear and quadratic models indicated that the trend surface function could reflect the increasing/decreasing variation of flows with OD locations (i.e. flow trends) in different patterns. Applying this model to a case study of taxi OD flows in the broad Central Business District of Beijing, we found that the flows exhibited a rising trend toward the southwest. The trend surface characteristics are associated with the distributions of urban functional patches, where the workplaces and residences increased toward the southwest in the study area. Notably, the spatial deviations of trend surface model can help in identifying site pairs that attract flows at a high density (e.g. commerce centers and big communities), facilitating the planning of public transportation to mitigate the congestion.


Introduction
The movement of an object between two locations can be defined as a geographical flow (Song et al. 2020), namely, as the logistics services in cities (Ducret et al. 2016), the daily commute from residence to workplace (Liu et al. 2012, Guo et al. 2021, and the information exchange between mobile phone users (Ahas et al. 2010). Such flow data can be denoted as an orderly connected point pair, composed of an origin (O) and a destination (D) (LeSage and Pace 2008, Guo and Zhu 2014, Zhao et al. 2020, Shu et al. 2021. Previous methods on flow modeling are mainly based on the network space and the Euclidean space. In the network analysis, the sites of flows are abstracted as nodes and the interactions between sites are regarded as edges in the network, which leads to the lack of spatial information. The method of modeling flows in the four-dimensional (4D) Euclidean space is easy for mathematical calculation, but makes it difficult to explain the distance and flow patterns. To alleviate these drawbacks, we define the flow as a 4D point in the OD flow space, which is formed by the Cartesian product of origins' 2D geographic space and destinations' 2D geographic space (Gao et al. 2018, Song et al. 2020, Shu et al. 2021, Guo et al. 2021. This method improves the interpretability of the model and helps to quantify the spatial distribution of flow. Due to the interactions and high-dimensional attributes of flows, the spatial distribution of flows could be complex. In geographical space, the distribution of point data may display trends with geographical coordinates, such as the deposition elements in the sedimentary environment Krumbein 1962, Radley andAllen 2012) and harmful particulate matter of 2.5 microns (PM 2.5 ) in the atmosphere (Lee et al. 2012). Detecting the trend pattern of geographical objects is essential to simulate the continuous distribution characteristic in a wide region and identify the abnormal ones which deviate from the trend. Similar to geographical points, the flow data may also display trends with coordinates. Referring to the point density, which is defined as the excepted number of points within a unit area, the flow density can be defined as the expected number of flows within a pair of unit regions. For example, the number of traffic flows between OD regions could reflect the interaction intensity between these two locations. Consider a scenario where the living and working places of a group of residences are located along the same road, where the workplace density increases from left to right while the residence density decreases from left to right. If the commute of morning refers to a flow from a residence to a workplace and that of evening reverses, the density of commuting flows would increase along the road from left to right during the morning rush hours and then become reversed in the evening. This increasing/decreasing variation of flows with coordinates can be defined as the flow trend. An increasing flow trend towards west/east indicates that there would be more traffic flows gathered along the road, which could explain the flow direction along which the congestion may generate. In this regard, to quantify how the trends of flows vary across geographic locations is important to explain the direction of flow aggregation, and then, understand the mechanism of flow distribution. In this study, we raise the scientific question about the flow data, in particular, do flow trends exist? and how to model the trends? Thus far, few studies have focused on the flow trends with spatial locations, which is a novel research topic for analyzing the flow patterns.
Generally, studies on geographic flow distribution can be divided into two categories. The first category concerns the spatial correlation and heterogeneity (aggregation), while the second aims to determine the relationship between the influencing factors and geographic flow distributions. The first category applies classic spatial statistical methods to measure the spatial correlation and spatial heterogeneity of flows.
For instance, the Moran's I can be used to assess the spatial autocorrelation in flows based on a migration model (Black 2010) and to measure the global and local spatial autocorrelation of flows by defining the flow as vectors (Liu et al. 2015). The Moran's I can be extended further as the BiFlowLISA (a local indicator of spatial association of bivariate flow data), to measure the spatial association of flows (Tao and Thill 2020). Moreover, the Getis-Ord G i statistic can be applied to identify the local spatial association for flow data (Berglund and Karlstr€ om 1999). To measure the spatial heterogeneity of flow, the multidimensional spatial scan statistics (Gao et al. 2018) and the classical local K-function (Tao and Thill 2016) were proposed to discern flow hotspots, in consideration of the multiscale issue and relative importance control. Then, the upgraded Flow Cross K-function method assessed spatial dependencies of bivariate spatial flow patterns (Tao and Thill 2019). However, the benchmarks of these K-function models stemming from Monte Carlo simulations may be incapable of accurately quantifying the maximal aggregation scale. Similar to the advantages of the L-function in 2D space over the K function, the L-function for flows (Shu et al. 2021) has an expected mean of zero (0) for all distances (scales) under the null hypothesis of CSR of flows, which enables the L-function to detect the maximal aggregation scale more easily and accurately. Furthermore, subsequent studies have detected arbitrarily shaped clusters in OD flows by using ant colony optimization (Song et al. 2019) and a method based on density domain decomposition (Song et al. 2020). Although these spatial statistical methods focus on the property of flow distributions globally or locally, they do not provide models that capture the flow trends varying with location.
The second category delves into studies on the influencing factors of geographic flow distribution, which were mainly based on the spatial interaction regression models. Fischer and Griffith (2008) have applied the spatial interaction models of the gravity type to deal with the issue of spatial autocorrelation among flow residuals. These spatial interaction regression models consider three types of variables: (1) the variables that characterize the origin of a flow, (2) the variables that reflect the destination of a flow, and (3) variables that express the separation between origin and destination. For instance, Bolduc et al. (1992) have identified the influence of socio-economic variables, associated with the origin and destination on the flow volume. Furthermore, Zhang et al. (2019) calibrated the flow-focused spatial interaction modeling by using ordinary least squares (OLS) regression and negative binomial (NB) regression methods. They reported that the pulling effect of economic development at the origin site was stronger on traffic flows compared with the pushing effect in economically wealthier destination sites. Kordi and Fotheringham (2016) have proposed spatially weighted interaction models for detecting, visualizing, and analyzing spatial non-stationarity in spatial interaction processes. Yang et al. (2019a) utilized the geographically weighted regression (GWR) to quantify the inter-relationships between distance decay coefficients, land use distribution, and traffic facilities. Moreover, in some other studies, the common characteristics of relationships between human convergence-divergence patterns and land use were revealed by considering the influence of the scales of spatial analysis units (Yang et al. 2016, Yang et al. 2019b. Although the relationship between the flow distribution and influencing factors have been extensively studied, how the geographical locations effect on the flow trends remains unclear remains unclear. The research aim of this study is to detect the flow variation with locations by extending the trend surface model, a spatial statistical method which can effectively reflect the distribution and variation trend of spatial data in geology (Miller 1956, Ni et al. 2017, resources Krumbein 1962, Radley andAllen 2012), environment (Lee et al. 2012), and diseases (La Vecchia et al. 1990). This model can quantitatively reflect the flow trends with geographical locations (Miller 1956, Lee et al. 2012. Furthermore, the residual variation from the trend surface could elucidate the intensive interactions between site pairs, which do not follow the flow trends, thus, facilitating the optimization of public transportation planning (Chen et al. 2013). The challenges of expanding existing trend surface model from 2D space to flow space are to (1) find an appropriate construction to describe the flow trend surface; (2) explain the flow trend surface; and (3) visualize the flow trend surface. To achieve these, firstly, the trend surface is constructed by using the quaternion polynomial function, which helps to quantify the spatial distribution of flow with locations. Secondly, in the simulation experiments, we depict different flow distributions with simulated linear or quadratic trends, as flow trends in a real-world dataset could be complex to comprehend. Thirdly, the visualization of the flow trend surface is achieved by using an OD map (Wood et al. 2010) with contours that learns from idea of spatial nesting, making it intuitive to arrange the flow trend in a geographic space.
The remainder of this paper is organized as follows. In Section 2, the basic concepts of flow and flow space are defined, and the trend surface model of flow in the OD flow space is introduced. In Section 3, we implement the simulation experiments with datasets composed of flows with trends and noise to test the capability of the trend surface model in estimating different flow trends. In Section 4, a case study with taxi OD data in Beijing is conducted to elucidate the trend surface of residents' travel flow during different times. Finally, Section 5 presents the conclusions and future work prospects.

Trend surface model in flow space
The trend surface model has been traditionally used by earlier studies to simulate the distribution characteristics and regional variation trends of geographic points in 2 D space (Radley andAllen 2012, Ni et al. 2017). To extend this model to flows, we first define an OD flow space and calculate the flow density in this section. Further, we proposed a flow trend surface model in the flow space.

Basic concepts and definitions
To accurately consider the flow data, our study defined a flow as a 4D point in the flow space, which is formed by the Cartesian product of two 2D planes (R 2 Â R 2 ) (Gao et al. 2018, Song et al. 2020, Shu et al. 2021, Guo et al. 2021. The two planes are the O and D planes, where the O and D points are located, respectively. Although the O plane and the D plane in the OD space are logically two different planes for the mathematical analysis, they usually share the same 2-D study area in geographical space (Shu et al. 2021).  (Guo et al. 2021). It can be illustrated by the OD matrix, which reflects the Cartesian product space by combining the O-unit and D-unit. In Figure  1(d), the flow points are distributed in the units in column O 1 , as O 1 is the only unit where the origins of flows gather. Although the OD matrix can accurately reflect the aggregation of the flows between regions, the spatial dimension cannot be elucidated by using it. Hence, the flow in the flow space was expressed using an OD map. This is a visualization method that learns from the idea of nesting, which in this case, involves inserting D-units within O-units to display their spatial relationships in a single map (Wood et al. 2010). For instance, the light orange points in subunit D 4 in O 1 in the flow space (Figure 1(e)) represent the flows starting from U 1 and ending at U 4 in the geographical space (Figure 1(a)). Notably, they can easily be identified by the spatial location of the O and D units in the OD map. The subunit can also record the flows attributes between the OD unit pairs, such as the number of flows.
To calculate the observed value of our model, several basic concepts of OD flows are introduced.

Definition 1 volume of flow zone
The flow zone is a subset of the flow space, which is composed of a pair of OD regions in the 2D geographical space ( Figure 2). As it is difficult to visualize a 4-D As for an orange flow Þ , the area of O i neighbor region with side length r, shown as the blue rectangle A, denoted S O , can be calculate as S O ¼ r 2 ; the area of D i neighbor region with side length r, shown as the blue rectangle B, denoted S D , can be calculate as S D ¼ r 2 : Then, the volume of the flow zone with F i as center and r as side length, denoted V, can be calculated as

Definition 2 flow density：
The flow density is defined as the intensity of the flow interaction between two locations. Similar to the definition of the point density in 2D geographical space, the flow density, denoted k, is the expected number of flows per unit volume in the 4D flow space (Shu et al. 2021). In practice, k can be estimated by the formula k ¼ n=V, where n is the number of flows within a flow zone, V is the volume of flow zone.
According to the definition above, the flow density at < x oi , y oi , x di , y di > can be expressed by the density of flows within the flow volume with center flow F i and side length r in the flow space.

Trend surface model of flow
Trend surface analysis can be conducted to uncover a fitted mathematical surface, for simulating the distribution characteristics and regional variation trends of geographic elements. The model assumes that the actual observed data is composed of two parts: the trend value and the residual value (Allen and Krumbein 1962). The trend value reflects the wide regional change in observations, which is governed by a range of global systemic factors. The residual value reflects the local change in observations, which is affected by some partial or random factors.
To elucidate the variation trend of flow density with location, we define the trend surface model of flow by Equation (1): where i is the serial number of OD flows in the study area, k i denotes the density of flow with domain centroid at < x oi , y oi , x di , y di > ,k i is the trend surface function, and the corresponding surface calculated by this is the trend surface of flow, and e i is the residual, denoting the deviation of flow density from the trend surface. The polynomial function is the most common form of trend surface, which enables to quantify the flow distribution with coordinates. Here, we use the polynomial function to construct the trend surface (that is to say, the proposed model can only test the trend surface by using the pre-defined form of polynomial functions). In general, the quaternionic polynomial of degree l should contain the same terms as the expansions of ðx o þ y o þ x d þ y d Þ l : According to the distribution of flow data, there could be linear, quadratic, cubic, or even higher degree of trend surface. While in many cases, the linear and quadratic trend components can unravel most of the trend variation in a small area, while the importance of cubic or higher degree trend components is being generally diminished (Allen and Krumbein 1962). Therefore, we consider only the linear and quadratic models in this study. But we still can improve these simplified trend surfaces into higher degree to analyze more complex distribution trends of mobility flows in a similar way.
The linear surface of flow, including four linear components, is fitted by Equation (2) The quadratic surface of flow, including four linear components plus ten quadratic components, is fitted by Equation (3): þa 9 x oi y oi þ a 10 x oi x di þ a 11 x oi y di þ a 12 y oi x di þ a 13 y oi y di þ a 14 x di y di wherek i denotes the trend surface function of flow density, x oi and y oi are the origin coordinates of flow, x di and y di are the destination coordinates of flow, a 0 is a constant, and a 1 , a 2 , . . . a 14 are the polynomial coefficients of the trend surface function.
It should be noted that in the flow space, once the origin point O is fixed, the variables x oi and y oi become constant terms. Then, the 4D trend surface degenerates into a 2D trend surface with x di and y di as variables in the geographic space. This sub-trend surface can be visualized as a nested D space in the OD map (Figure 1(e)). In practice, these coefficients can be calculated by minimizing the sum of squared residuals based on the least squares method (LSM). Then, we use the goodness of fit R 2 (coefficient of determination) to explain how the flow trend surface satisfies the real phenomenon, in which the closer R 2 is to 1, the better the fit of the trend function. Referring to the regression analysis model, the trend surface function of flow has an expected mean of zero (0) for all coefficients under the null hypothesis. If the result from F-test is significant, we should reject the null hypothesis and accept that the trend surface is available, otherwise, the trend surface is unavailable.

Simulation experiments
The purposes of simulation experiments are to test the capability of the trend surface model in detecting different flow trends, as well as to simulate flow trends with predefined patterns as flow trends in a real-world dataset could be complex to comprehend. To this end, we added noises to the flow dataset with trends. Then, the trend surface model was applied to the newly generated simulated dataset. Note that the effectiveness of the model can be determined by comparing the trend surface function before and after adding noise. We chose the OD map as the visualization method for the trend surface of flow, which enables simple identification of trends in the OD relationships (Wood et al. 2010). To better visualize the continuous trend of flow density, the contours are plotted in the subunit in the nested D-space. As a result, the OD map with contours is constructed.

The design of simulation experiments
Similar to 2D points, the variation of flow density (or other flow properties) with coordinates can be quantified by the coefficients of polynomial trend surface functions. For instance, a positive coefficient denotes an increasing trend and a negative coefficient denotes a decreasing trend. Then, the simulation experiments generated flows with trends using different predefined function. The experimental area was set as a rectangular area of 8 km Â 8 km with the origin located at the lower left corner of the area. Note that the horizontal axis represents X coordinates and the vertical axis represents Y coordinates. Five groups of simulation experiments were designed in this area, which correspond to different distribution trends of flow density (i.e. different trend surface functions), including two groups of linear trends and three groups of nonlinear trends. The steps for the experiments are described below.
1. Firstly, we generated the simulation data. The experimental area was gridded into a group of 4 Â 4 units with 2 km intervals in the geographical space. Then, the flow density from the unit U A to unit U B can be calculated by substituting the center coordinates of the units O A x A , y A ð Þ and O B x B , y B ð Þ into the predefined trend surface function. 2000 simulated flows with trends were generated between each unit pair according to the flow density. 2. Secondly, random flows were produced into the former dataset to generate the simulated experiment flow dataset. Note that the number of random flows was 10% of the former flow dataset. Each simulated dataset was composed of 2000 flows with trend and 200 noise flows. Then, the density of the flow between the units was recalculated. 3. Finally, the LSM method was applied to retrieve the trend surface function of the flows. Each experiment was simulated 100 times repeatedly (e.g. the coefficients of the trend surface function were virtually the average values of 100 simulations). The goodness of fit of the trend surface model was tested using R 2 .
Simulations 1 and 2 are the linear simulations of flows, where the flow density increases (or decreases) linearly along the horizontal X direction or the vertical Y direction. The corresponding trend surface function contains only the linear components. The following functions were used to create the raw simulated linear trends: Simulation 1 z ¼ 10 À 0:5x o À 0:5y o À 0:5x d À 0:5y d Simulation 2 Simulations 3-5 were nonlinear simulations of flows, where the flow density increases (or decreases) nonlinearly from one side toward the edges of the area. The corresponding trend surface of flow contained both linear and quadratic components. The following functions were used to create the raw simulated nonlinear trends: Simulation 5 Figure 3 shows the distributions of the simulated flow datasets and trend surfaces of the flows. The lines in Figure 3(a-e) represent the individual simulated flows in the experimental area. In Figure 3(f-j), the vectors between the domain centroids of the units denote the center flows between the OD unit pairs. Fundamentally, the higher the flow density, the wider and darker the vector is. To clarify the location relationship between the O and D sites, the so-called 'home cell' (see the marked subunit with red frame in the OD map from Figure 3(k-o)) refers to the relative position of O site in each nested D space. Foremost, to identify the flow trend surface characteristics with OD map, we concerned the large O-unit for its overall color change. Note that the depth of color is relative to the density of the flow from the corresponding O unit, whereby the higher the flow density, the darker the unit. Then, we focused on the nested D space within the O unit. The contours in the D space indicate how the flow density trend is affected by D's location. The detailed simulation results are summarized in Table 1.

Simulation results
In Simulation 1, the flow density linearly increased as the geographic coordinates increased (see Figure 3(a,f)). All the coefficients of x o , y o , x d , y d in the trend surface function were $0.5, indicating that the flow density increases by 0.5, for every 1 km increase in any X O , Y O , X D , Y D directions. In Figure 3(k), the colors on the OD map indicate a uniform darkened trend from bottom left to top right both in the O space and in the nested D space, thereby, suggesting the increasing trend of flow density.
The flow density in Simulation 2 linearly decreased as the geographic coordinates increased (see Figure 3(b,g)). Notably, this result is opposite to Simulation 1. The reduction tendency affected by the O coordinates is similar to that affected by the D coordinates in both the X and Y directions, according to the semblable coefficients of The flow density in Simulation 4 reveals a nonlinear increasing trend from the center to the area's edges (see Figure 3(d,i)). Without cross terms, the trend surface function can be approximated in the form ofẑ ¼ 6 þ 0:06ðx o À4Þ 2 þ 0:06ðy o À4Þ 2 þ 0:06ðx d À4Þ 2 þ 0:06ðy d À4Þ 2 , suggesting that the flow with minimum density was from ð4, 4Þ to ð4, 4Þ: Accordingly, the OD map displayed in Figure 3(n) demonstrates that the large O unit close to the center was lighter, thus indicating that the flow density was lower for the flows starting from the center. Meanwhile, the color of the D subunit near the edge of the area was darker, suggesting that the flows ending at the edge had higher flow densities. The flow density in Simulation 5 exhibited an increasing trend toward the center (see Figure 3(e,j)). The trend surface function of Simulation 5 can be roughly transformed into the form ofẑ ¼ 14 À 0:06ðx o À4Þ 2 À 0:06 y o À 4 ð Þ 2 À 0:06 x d À 4 ð Þ 2 À 0:06ðy d À4Þ 2 , suggesting that the flow with maximum density is fromð4, 4Þ to ð4, 4Þ: This result is opposite to that found in Simulation 4. The OD map in Figure 3(o) demonstrates that the color of the large O unit close to the center is darker than the D subunit near the edge of the area. This indicates that the density of flow from the center unit to the center was higher than the flow density between units around the edges.
All the trend surface functions of the flows with noise were consistent with the original functions (R 2 > 0.8, significance at 0.001 level). Overall, the simulation results indicate that the method is effective and can be used to unravel the linear and quadratic trend surface of flows with different characteristics. It should be noted that the simulation experiments here only considered the linear and quadratic trend surface of flows. For flows with complex distribution, we can change the degree of trend surface function, like cubic, quartic, or maybe higher to better satisfy their real trends, when higher value of R 2 is adopted for the trend analysis.

Case study
Due to the spatial difference of urban function districts, residents' travel flow is diverse, and the variation trends of the flow density can change during different periods. Identifying the flow trend surface is important to analyze the relationship between the flow distribution and spatial locations. The flow trends functions can quantify the variation of flow density with locations, and then explain the direction of flow aggregation or along which the congestion may generate. The flow trend surface, on the one hand, can reveal the overall distribution of travel flows, on the other hand, may indicate some factors that underpin the pattern (say the similar distribution of residences and the workplaces). As a result, the flow trend may provide some indications regarding urban planning in two aspects. First, some measures should be taken to maintain the balance between residence and workplace for the sake of reducing the abnormal flow trend. Second, the transportation system should be modified to accommodate the trend (say the road resources supporting those flows should be adjusted to adopt the flow trend) to mitigate the congestion. To elucidate the trend of travel flows, we applied the trend surface model of flow in Beijing using taxi OD data. Notably, taxis play an important role in Beijing for intra-urban transportation. In this section, we introduce the study area, taxi OD data, and the results of the trend surface analysis of flow are shown.

Study area and data preparation
We selected the Beijing broad-CBD (Central Business District) area as the study region, with the CBD as a core in the center, extending northward and eastward. The broad CBD is a popular commuter area, with intense traffic pressure within the eastern urban area of Beijing in the morning and evening rush hours. We surmise the existence of the increasing trend of flow density toward the inner 4 th Ring Road in the morning rush hours, and an expansion trend of high flow density toward the outer 4 th Ring Road in the evening rush hours. These trends are potentially attributed to the distribution of urban functional patches of workplaces and residences. We used the trend surface model for detecting potentially diverse trends of flows in the broad-CBD area during different periods. Then, we validated the proposed flow trend model. To simplify the calculation, we designed an 8 km Â 8 km rectangular area as the entire scope of the broad-CBD area, as shown in Figure 4.
The flow data we used was collected from the travel OD data of >25,000 taxis in Beijing covering 15 workdays, from 8 October to 29 October 2014 (excluding the missing data from 16 October). Each OD record contains a unique trip ID, pick-up/drop-off timestamps, and pick-up/drop-off locations projecting from Beijing's 54 coordinate system. The OD flows with a travel distance from the O site to the D site <1 km (which may result from tower signal switching) or an average travel speed of >120 km/h (the speed limit of a highway in China) were removed. Finally, we extracted $242,000 OD flows in the broad-CBD area, with an average of $16,000 trips per day. Travels in the morning rush hours from 7:00 to 9:00 am accounted for 8% of the dataset, and those in the evening rush hours from 5:00 to 7:00 pm accounted for 10%. To simply visualize the distribution of taxi OD flows in the study area, we illustrate only 5% randomly sampled data during the morning and evening rush hours in Figure 3. Given the sparsity of the flow data, the sampled observation data in the trend surface model should not be too clustered. Due to this, we calculated the flow density within 2 km Â 2 km in the experiment. The point of interest (POI) 2014 data was obtained from NavInfo, which is a leading provider of digital maps and navigation services in China. To identify the urban functional patches of workplaces and residencies, we extracted 708 POIs for workplaces and 4713 POIs for residences in the broad-CBD area, according to POI types.

Results and discussion
The linear and quadratic models defined above were the basis for obtaining the trend surfaces of taxi OD flows during different times. The goodness of fit of the models is shown in Table 2, while the coefficients of the trend surface functions are listed in Table 3. Table 2 shows that the quadratic model fits better for the trend surface of taxi OD flows in the broad-CBD area. We found that the R 2 goodness of fit of the models was 0.840 during the morning rush hours and 0.747 during the evening rush hours (0.001 level significance). This finding indicates that 84.0% and 74.7% of the variation in the flow density can be explained by the OD location.
The linear terms with higher absolute values of coefficients, compared with the nonlinear terms were essential in both morning and evening trend surface functions (see Table 3). The flow density generally decreased as the coordinates increased due to the negative coefficients of x o , y o , x d , y d : However, the reduction tendency in the X direction was stronger than that in the Y direction, noting that the absolute values of x o and x d were higher than y o and y d : Meanwhile, the quadratic components of To elucidate the trend characteristics of the flow density, we visualized the trend surfaces as the OD maps with contours ( Figure 5). Moreover, we referred to some intuitive phenomena of geographical objects to make the results more understandable. There are researches discovering that the land use is relevant to the flow distribution (Yang et al. 2016, Yang et al. 2019b. On this basis, we demonstrated the kernel density estimation of POIs of workplaces and residences in Figure 6 to analyze the relationship between trend surfaces and urban functional patches. Figure 5(a) shows the trend characteristics of flow. As seen, the flow density during morning rush hours exhibited a decreasing trend as the coordinates increased in both the X and Y directions. The overall color of the O-unit is darker in the left parts of the  OD map, which indicates that the density of flows starting from the western parts of the study area were higher. We identified more residences located within the East 4th Ring Road (e.g. Yong'anli, Sihui, Tianshuiyuan) ( Figure 6(b)). For the nested D space in each O-unit, the darker area was also mostly clustered in the west. The further southwestern the destination is, the higher the flow density is. We suggest that the driver of this trend is related to the increasing trend of the office facility density to the southwest (e.g. CBD core area, Hujialou, and Sanyuanqiao), as shown in Figure 6(a). The trend surface of flow during evening rush hours was similar to that in the morning, where the flow density decreased as the coordinates increased ( Figure 5(b)). Likewise, this pattern emerged due to the clustered distribution of residences and  workplaces in the southwestern parts of the study area ( Figure 6). As seen, the contours with high flow density in the D space extended toward the northeast. However, as the origin moved northwards, more flows ended in the northwestern region. This pattern was driven by more residences located outside the East 4th Ring Road (located in the northeast part of the study area) than workplaces ( Figure 6).
The residual surface, shown in Figure 7 was calculated by subtracting the trend value from the actual observed value of the flow density. It shows the spatial deviation of the trend surface model, where a positive value denotes an underestimation error (shown as red), and a negative value denotes an overestimation error (shown as blue). During morning rush hours, the flows with underestimated density mainly start from the southeast region and end at the CBD core zone, the popular workplace (see vectors in Figure 7(a)). The underestimations of the flows during evening rush hours are mainly from the west or south of the region to the Dongdaqiao area, where many residences are located (see vectors in Figure 7(b)).
In general, the trend surfaces of flows in the broad-CBD area exhibited a decreasing trend as the coordinates increased during both time periods. Furthermore, we added a supplementary experiment about the trend surface of O/D points to better conclude the characteristic of flow trend, and found they were similar with the detected flow trends in the case study (see the supplementary material). We then analyzed the distribution of urban functional patches and found it was similar with the flow trend. That may address the knowledge gap, explaining why the flow trend varies with geographical location. In the morning rush hour, the flow density tended to increase toward the southwest due to the aggregation of office facilities within the 4th Ring Road. During evening rush hours, the regions with high flow density extended to the northeast because of the abundant residences outside the 4th Ring Road. The spatial deviation of the trend surface model was normally associated with the facilities pairs that can attract numerous flows (e.g. CBD core zone and Dongdaqiao residence). To relieve the traffic pressure in the inner ring area of the broad-CBD, associated workplaces and residences could be built in the outer ring area and more public transportation stations could be added to locations with intensive interactions.

Conclusions
To determine the relationship between flow distribution and geographic locations is important to understand the mechanism of flow distributions. In this study, we proposed a flow trend surface model of to quantify the flow trends with coordinates. To this end, we defined an OD flow as a 4D point in the OD flow space, which is formed by the Cartesian product of two 2D planes, the O-plane, where O points are located, and the D-plane, where D points are located. Then, the trend surface model in the flow space (the function of which is signified by the orthogonal polynomial) was modified from the traditional method for geographical points by regression analysis. The visualization of the trend surface of flow was achieved using an OD map with contours to depict flow densities between OD units arranged in a geographic space.
The simulated and the real-world flow experiments indicated that the trend surface model performed well in demonstrating the trend characteristic of flow. Overall, the model was able to accurately identify the trend surface of flows with different linear or quadratic characteristics via the simulation experiments. Notably, the coefficients of the trend surface function can unravel the increasing or decreasing trend of flow density with the O and D coordinates. In the case study, we used the trend surface model to identify the diverse trends of flows in the broad-CBD area during different times. The goodness of fits of the morning model (R 2 was 0.840) and the evening model (R 2 was 0.747) demonstrated that the quadratic model performed better, compared with the linear models (R 2 was 0.554 in the morning model and 0.472 in the evening model). According to the trend surfaces, the flow density in the study area exhibited a decreasing trend as the coordinates increased. We then analyzed the distribution of urban functional patches and found it was similar with the flow trend. In the morning rush hour, the flow density seemingly increased toward the southwest, where office facilities are abundant. In the evening rush hour, the regions with high flow density extended from the existing southwest aggregation toward the northeast, where many residences are located. The spatial deviation from the trend surface aids the identification of site pairs that can attract plenty of flows (e.g. commerce centers and big communities). Such finding can lay the foundation for policy making, aimed at relieving traffic pressure, such as building the associated workplaces and residences in the outer ring area to release some commute flows from southwestern inner ring area, or adding public transportation stations to locations with intensive interactions.
However, the trend surface model has certain limitations. Firstly, the resolution of the grid used in the study was set as 2 km, considering the sparsity of taxi flow data. Reducing the resolution may lead to the heterogeneous distribution of observed samples (some regions were without data coverage), and thus weakens the model's performance. Secondly, when the study area is extended and the flow distribution is multiple, a higher-order model would be required to elucidate a more complex trend. The criterion for the polynomial construction points out 35 components for the cubic model, which may lead to coefficient redundancy and increase computational complexity. Hence, distinct forms of the trend surface function, such as Fourier series and trigonometric polynomials, require further investigation.
Compared with the nonparametric methods, such as kernel density estimation, the trend surface model is presented by the polynomial with origin and destination coordinates as independent variables and flows density as dependent variables, making it possible to quantify the flow trend by the trend surface function. Furthermore, the method of flow trend surface analysis could quantify the flow distribution with locations, and then be expanded to numerous applications with various datasets (e.g. smartcard data, mobile records, population migrations, and typhoon tracks). For instance, it can be applied for capturing travel behaviors for commuting groups by other modes, investigating the dynamic of human mobility between cities, and estimating the maximum wind speed of the typhoon in its birth and death process.
Prof. Peijun Du is a professor at Nanjing University. His research interests include machine learning and geoscientific analysis in remote sensing images.