A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large-scale simulation study, the behavior of the proposed approach is investigated and comparison with well-established finite mixtures is provided. The performance of this novel family of models is also illustrated on artificial and real data.

Parsimonious mixtures of multivariate contaminated normal distributions

PUNZO, ANTONIO
;
2016-01-01

Abstract

A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large-scale simulation study, the behavior of the proposed approach is investigated and comparison with well-established finite mixtures is provided. The performance of this novel family of models is also illustrated on artificial and real data.
2016
Contaminated normal distribution; Contamination; EM algorithm
File in questo prodotto:
File Dimensione Formato  
Punzo & McNicholas (2016) - Biometrical Journal.pdf

solo gestori archivio

Descrizione: Articolo principale
Tipologia: Versione Editoriale (PDF)
Dimensione 1.19 MB
Formato Adobe PDF
1.19 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/20265
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 81
  • ???jsp.display-item.citation.isi??? 59
social impact