rsta20160293_si_005.txt (18.71 kB)

8d synthetic dataset labels from Clustering: how much bias do we need?

Version 2 2020-10-15, 08:40
Version 1 2017-03-31, 06:57
dataset
posted on 2017-03-31, 06:57 authored by Tom Lorimer, Jenny Held, Ruedi Stoop
Scientific investigations, in medicine and beyond, increasingly require observations to be described by more features than can be simultaneously visualized. Simply reducing the dimensionality by projections destroys essential relationships in the data. Similarly, traditional clustering algorithms introduce data bias that prevents detection of natural structures expected from generic nonlinear processes. We examine how these problems can best be addressed, focusing in particular on two recent clustering approaches, Phenograph and Hebbian learning clustering, applied to synthetic and natural data examples. Our results reveal that even for very basic questions, minimizing clustering bias is essential, but that results can benefit further from biased post-processing.

    Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences
