Finding the Number of Normal Groups in Model-Based Clustering via Constrained Likelihoods

Dataset posted on 2017-10-16, 18:47, authored by Andrea Cerioli, Luis Angel García-Escudero, Agustín Mayo-Iscar and Marco Riani

Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For this purpose, complexity-penalized likelihood approaches, such as the well-known BIC and ICL criteria, have been introduced in model-based clustering. However, without any constraint on the cluster scatter matrices, the classification and mixture likelihoods considered in these approaches are unbounded. Constraints also prevent the traditional EM and CEM algorithms from being trapped in spurious local maxima. A sensible way of setting such constraints is to require the maximal ratio between the eigenvalues of the scatter matrices to be no larger than a fixed constant c ≥ 1. A new penalized likelihood criterion is proposed that takes into account the higher model complexity entailed by a higher value of c. Based on this criterion, a novel and fully automated procedure is provided that leads to a small ranked list of optimal (k, c) pairs. A new plot, called the “car-bike” plot, is introduced to give a concise summary of the solutions. The performance of the procedure is assessed both on empirical examples and through a simulation study, as a function of cluster overlap. Supplemental materials for the article are available online.
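The eigenvalue-ratio constraint on the scatter matrices can be illustrated with a minimal sketch. It assumes, as in the TCLUST literature, that the ratio is taken jointly over the eigenvalues of all k scatter matrices, and that each matrix is symmetric positive semi-definite. The function names `eigenvalue_ratio` and `satisfies_constraint` are hypothetical. The sketch only checks whether a given set of scatter matrices satisfies the constraint for a given c; it does not implement the authors' constrained EM/CEM step or their penalized criterion.

```python
import numpy as np


def eigenvalue_ratio(scatter_matrices):
    """Largest over smallest eigenvalue, pooled across all k scatter matrices.

    Assumption: each matrix is symmetric positive semi-definite.
    """
    # eigvalsh exploits symmetry and returns real eigenvalues
    eigvals = np.concatenate([np.linalg.eigvalsh(S) for S in scatter_matrices])
    smallest = eigvals.min()
    if smallest <= 0:
        # A singular scatter matrix gives an infinite ratio,
        # so no finite c can be satisfied
        return np.inf
    return eigvals.max() / smallest


def satisfies_constraint(scatter_matrices, c):
    """Check the (assumed pooled) constraint max/min eigenvalue <= c, with c >= 1."""
    if c < 1:
        raise ValueError("c must be >= 1")
    return eigenvalue_ratio(scatter_matrices) <= c


# Usage: two 2x2 scatter matrices with pooled eigenvalues 1, 2, 4 and 0.5
S = [np.diag([1.0, 2.0]), np.diag([4.0, 0.5])]
print(eigenvalue_ratio(S))           # pooled ratio 4 / 0.5
print(satisfies_constraint(S, 10.0))
print(satisfies_constraint(S, 5.0))
```

Under this pooled reading, c = 1 forces every eigenvalue of every cluster to be equal (spherical clusters with a common scale), while larger values of c progressively relax the constraint toward the unconstrained, unbounded case. This is why a larger c corresponds to higher model complexity.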

Journal of Computational and Graphical Statistics