Hypothesis testing for mixture model selection

PUNZO, ANTONIO
2016-01-01

Abstract

Gaussian mixture models with eigen-decomposed covariance structures, i.e. the Gaussian parsimonious clustering models (GPCM), make up the most popular family of mixture models for clustering and classification. Although the GPCM family has been used for almost 20 years, selecting the best member of the family in a given situation remains a troublesome problem. Likelihood ratio (LR) tests are developed to tackle this problem; given a number of mixture components, these LR tests compare each member of the family to the heteroscedastic model under the alternative hypothesis. Along the way, a novel maximum likelihood estimation procedure is developed for two members of the GPCM family. Simulations show that the χ² reference distribution provides a reasonable approximation for the LR statistics when the sample size is not too small and the mixture components are sufficiently well separated; for the remaining configurations, a parametric bootstrap approach is also discussed and evaluated. Furthermore, a closed testing procedure, having the defined LR tests as local tests, is considered to identify, in a straightforward way, a unique model in the general family. In contrast with the information criteria that are often employed in the literature as ‘black boxes’, this procedure relies on only one subjective element, the significance level, whose meaning is clear to everyone. Simulation results are presented to investigate the performance of the procedure in situations with gradual departure from the homoscedastic model, as well as its robustness with respect to elliptical departures from normality in each mixture component. Finally, the advantages of the procedure are illustrated via applications to some well-known data sets.
2016
Closed testing procedures; eigen decomposition; Gaussian mixtures
Files in this item:
File: Punzo, Browne & McNicholas (2016) - JSCS.pdf
Access: archive managers only
Description: Main article
Type: Publisher's version (PDF)
Size: 2.38 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/34517
Citations
  • PMC ND
  • Scopus 24
  • Web of Science (ISI) 19