10.6084/m9.figshare.1328421.v2 Tao Pei Weiyi Wang Hengcai Zhang Ting Ma Yunyan Du Chenghu Zhou Density-based clustering for data containing two types of points Taylor & Francis Group 2015 origin destination cluster 2 D density domain taxicab trip data method point data type od 2015-03-24 09:31:37 Journal contribution https://tandf.figshare.com/articles/journal_contribution/Density_based_clustering_for_data_containing_two_types_of_points/1328421 <div><p>When only one type of point is distributed in a region, clustered points can be seen as an anomaly. When two different types of points coexist in a region, they overlap at different places with various densities. In such cases, the meaning of a cluster of one type of point may change if points of the other type show different densities within the same cluster. For example, for the origins and destinations (OD) of taxicab trips, the clustering of both in the morning may indicate a transportation hub, whereas clustered origins and sparse destinations (a hot spot where taxis are in short supply) could suggest a densely populated residential area. Previous clustering methods cannot make this distinction, which motivates a clustering method for two types of points. This paper first defines a two-component cluster as a group containing two types of points, at least one of which exhibits clustering. We then propose a density-based method for identifying two-component clusters. The method is divided into four steps. The first estimates the clustering scale of the point data. The second transforms the point data into the 2D density domain, where the x and y axes represent the local densities of the two types of points around each point. The third determines the thresholds for extracting the clusters, and the fourth generates two-component clusters using a density-connectivity mechanism. 
The method is applied to taxicab trip data in Beijing. Three types of two-component clusters are identified: high-density origins and destinations, high-density origins and low-density destinations, and low-density origins and high-density destinations. The clustering results are verified by the spatial relationship between the cluster locations and their land-use types over different periods of the day.</p></div>
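The four steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the clustering scale or the thresholds are estimated, so the sketch takes `radius`, `t_origin`, and `t_dest` as user-supplied parameters (an assumption). The function names, the `"O"`/`"D"` point labels, and the rule that only points of the same density type are connected into one cluster are likewise hypothetical choices made for illustration.

```python
from collections import deque
from math import hypot


def local_densities(points, radius):
    """Map each point to the 2D density domain.

    `points` is a list of (x, y, kind) tuples, kind being "O" or "D".
    Returns one (origin_count, destination_count) pair per point, counting
    all points within `radius`, including the point itself.
    """
    densities = []
    for x, y, _ in points:
        n_o = n_d = 0
        for u, v, kind in points:
            if hypot(x - u, y - v) <= radius:
                if kind == "O":
                    n_o += 1
                else:
                    n_d += 1
        densities.append((n_o, n_d))
    return densities


def density_type(density, t_origin, t_dest):
    """Classify a point by thresholding its two local densities."""
    high_o = density[0] >= t_origin
    high_d = density[1] >= t_dest
    if high_o and high_d:
        return "high-O/high-D"
    if high_o:
        return "high-O/low-D"
    if high_d:
        return "low-O/high-D"
    return None  # both densities low: treated as noise


def two_component_clusters(points, radius, t_origin, t_dest):
    """Group non-noise points of the same density type that are chained
    together by neighbours within `radius` (assumed connectivity rule).

    Returns a list of (density_type, member_indices) pairs.
    """
    densities = local_densities(points, radius)
    types = [density_type(d, t_origin, t_dest) for d in densities]
    assigned = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if types[seed] is None or assigned[seed]:
            continue
        assigned[seed] = True
        members, queue = [], deque([seed])
        while queue:  # breadth-first expansion from the seed
            i = queue.popleft()
            members.append(i)
            xi, yi, _ = points[i]
            for j, (xj, yj, _) in enumerate(points):
                if (not assigned[j] and types[j] == types[seed]
                        and hypot(xi - xj, yi - yj) <= radius):
                    assigned[j] = True
                    queue.append(j)
        clusters.append((types[seed], members))
    return clusters
```

Called on a list of labelled points, `two_component_clusters(points, radius, t_origin, t_dest)` returns one entry per cluster, each tagged with one of the three types named in the abstract. The pairwise distance loops are quadratic in the number of points, so a spatial index would be needed for data on the scale of real taxicab trips.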