A data-dependent dissimilarity measure: An effective alternative to distance measures

2017-12-11T22:30:56Z (GMT) by SUNIL ARYAL
In data mining, the task-specific performance of conventional distance-based similarity measures varies significantly across data distributions because these measures are data-independent and sensitive to the units and scales of measurement. This thesis investigates a measure in which the similarity of two instances is determined by the distribution of the data. It introduces a new (dis)similarity measure that is data-dependent and robust to units and scales of measurement. An empirical evaluation across a wide range of datasets shows that the new measure produces better, or at least more consistent, task-specific performance than widely used distance-based measures, particularly on high-dimensional datasets.
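The contrast between a data-independent distance and a data-dependent dissimilarity can be illustrated with a small sketch. The abstract does not define the thesis's measure, so the code below is not the author's method: `mass_dissimilarity` is a hypothetical stand-in that scores two instances by the fraction of data points lying between them in each dimension, and the toy dataset (height and weight values) and all function names are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Conventional data-independent distance: uses only the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mass_dissimilarity(a, b, data):
    """Hypothetical data-dependent measure (an assumption, not the thesis's
    definition): for each dimension, the fraction of data points whose value
    lies in the closed interval spanned by a and b, averaged over dimensions."""
    n = len(data)
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        lo, hi = min(x, y), max(x, y)
        total += sum(1 for p in data if lo <= p[i] <= hi) / n
    return total / len(a)

def nearest_index(qi, data, dist):
    """Index of the point closest to data[qi] under dist, excluding qi itself."""
    candidates = (j for j in range(len(data)) if j != qi)
    return min(candidates, key=lambda j: dist(data[qi], data[j]))

# Toy data: (height in metres, weight in kilograms). Values are made up.
data_m = [(1.60, 60.0), (1.90, 61.0), (1.62, 64.0), (1.75, 80.0), (1.50, 50.0)]
# Same instances with height expressed in centimetres instead.
data_cm = [(h * 100, w) for h, w in data_m]

euclid_m = nearest_index(0, data_m, euclidean)
euclid_cm = nearest_index(0, data_cm, euclidean)
mass_m = nearest_index(0, data_m, lambda a, b: mass_dissimilarity(a, b, data_m))
mass_cm = nearest_index(0, data_cm, lambda a, b: mass_dissimilarity(a, b, data_cm))

print("Euclidean nearest neighbour (m, cm):", euclid_m, euclid_cm)
print("Mass-based nearest neighbour (m, cm):", mass_m, mass_cm)
```

On this toy data, the Euclidean nearest neighbour of the first instance changes when height is converted from metres to centimetres, while the rank-based stand-in returns the same neighbour in both cases, because it depends only on how the data are ordered in each dimension and not on the magnitude of the values.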