Density-based clustering for data containing two types of points

Pei, Tao; Wang, Weiyi; Zhang, Hengcai; Ma, Ting; Du, Yunyan; Zhou, Chenghu

doi:10.6084/m9.figshare.1328421.v2

Version 2 2015-03-24, 09:31

Version 1 2015-02-01, 00:00

journal contribution

posted on 2015-03-24, 09:31 authored by Tao Pei, Weiyi Wang, Hengcai Zhang, Ting Ma, Yunyan Du, Chenghu Zhou

When only one type of point is distributed in a region, clustered points can be seen as an anomaly. When two different types of points coexist in a region, they overlap at different places with various densities. In such cases, the meaning of a cluster of one type of point may change depending on the density of the other type of point within the same cluster. For example, for the origins and destinations (OD) of taxicab trips, the clustering of both in the morning may indicate a transportation hub, whereas clustered origins and sparse destinations (a hot spot where taxis are in short supply) could suggest a densely populated residential area. Previous clustering methods cannot identify this distinction, so a clustering method for two types of points is worth studying. This paper first defines a two-component cluster as a group containing two types of points, at least one of which exhibits clustering. We then propose a density-based method for identifying two-component clusters. The method is divided into four steps. The first estimates the clustering scale of the point data. The second transforms the point data into the 2D density domain, where the x and y axes represent, for each point, the local densities of the two types of points around it. The third determines the thresholds for extracting the clusters, and the fourth generates two-component clusters using a density-connectivity mechanism. The method is applied to taxicab trip data in Beijing. Three types of two-component clusters are identified: high-density origins and destinations, high-density origins and low-density destinations, and low-density origins and high-density destinations. The clustering results are verified by the spatial relationship between the cluster locations and their land-use types over different periods of the day.
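The second and fourth steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fixed search radius, the count-based density, the two thresholds, and the rule that only neighbouring points of the same density category are linked are all assumptions made for illustration. Steps one and three (estimating the clustering scale and determining the thresholds) are not reproduced; their results are simply passed in as `radius`, `thr0` and `thr1`.

```python
import numpy as np

def local_densities(points, labels, radius):
    """Count neighbours of type 0 and of type 1 within `radius` of every point.

    Returns an (n, 2) array (the assumed "2D density domain") and the boolean
    neighbourhood matrix. The point itself is not counted.
    """
    diff = points[:, None, :] - points[None, :, :]
    within = np.sqrt((diff ** 2).sum(axis=2)) <= radius
    np.fill_diagonal(within, False)
    d0 = (within & (labels == 0)[None, :]).sum(axis=1)
    d1 = (within & (labels == 1)[None, :]).sum(axis=1)
    return np.column_stack([d0, d1]), within

def two_component_clusters(points, labels, radius, thr0, thr1):
    """Assumed simplification of threshold-based extraction plus connectivity.

    Categories: 0 = low/low (noise), 1 = high/high,
                2 = high type 0 and low type 1, 3 = low type 0 and high type 1.
    Points of the same non-noise category that lie within `radius` of each
    other are chained into one cluster; noise keeps cluster id -1.
    """
    dens, within = local_densities(points, labels, radius)
    high0 = dens[:, 0] >= thr0
    high1 = dens[:, 1] >= thr1
    category = np.where(high0 & high1, 1,
                        np.where(high0, 2, np.where(high1, 3, 0)))
    cluster = np.full(len(points), -1)
    next_id = 0
    for seed in range(len(points)):
        if category[seed] == 0 or cluster[seed] != -1:
            continue
        cluster[seed] = next_id
        stack = [seed]
        while stack:  # depth-first expansion over same-category neighbours
            i = stack.pop()
            for j in np.flatnonzero(within[i]):
                if cluster[j] == -1 and category[j] == category[seed]:
                    cluster[j] = next_id
                    stack.append(j)
        next_id += 1
    return dens, category, cluster

# Toy data (made up): type 0 stands for origins, type 1 for destinations.
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
points = np.vstack([
    offsets,                      # origins near (0, 0)
    offsets + 0.02,               # destinations near (0, 0)
    offsets + 10.0,               # origins near (10, 10), no destinations
    [[5.0, 5.0], [20.0, 20.0]],   # one isolated origin, one isolated destination
])
labels = np.array([0] * 5 + [1] * 5 + [0] * 5 + [0, 1])

dens, category, cluster = two_component_clusters(points, labels,
                                                 radius=0.5, thr0=3, thr1=3)
print(category)
print(cluster)
```

In this toy example the group near (0, 0) plays the role of a high-density origin and destination cluster, while the group near (10, 10) has dense origins and no destinations. The pairwise distance matrix keeps the sketch short but grows quadratically with the number of points, so a spatial index would be needed for data on the scale of real taxicab trips.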