rsta20160293_si_005.txt (18.71 kB)

8d synthetic dataset labels from Clustering: how much bias do we need?

Version 2 2020-10-15, 08:40

Version 1 2017-03-31, 06:57

dataset

posted on 2017-03-31, 06:57 authored by Tom Lorimer, Jenny Held, Ruedi Stoop

Scientific investigations in medicine and beyond increasingly require observations to be described by more features than can be simultaneously visualized. Simply reducing the dimensionality by projections destroys essential relationships in the data. Similarly, traditional clustering algorithms introduce data bias that prevents detection of natural structures expected from generic nonlinear processes. We examine how these problems can best be addressed, focusing in particular on two recent clustering approaches, Phenograph and Hebbian learning clustering, applied to synthetic and natural data examples. Our results reveal that minimizing clustering bias is essential even for very basic questions, but that results can benefit further from biased post-processing.