One of the challenges in cluster analysis is the evaluation of the obtained clustering results withoutusingauxiliaryinformation.Tothisend,acommonapproachistouseinternalvalid- ity criteria. For mixtures of linear regressions whose parameters are estimated by maximum likelihood, we propose a three-term decomposition of the total sum of squares as a starting point to define some internal validity criteria. In particular, three types of mixtures of regres- sions are considered: with fixed covariates, with concomitant variables, and with random covariates. A ternary diagram is also suggested for easier joint interpretation of the three terms of the proposed decomposition. Furthermore, local and overall coefficients of deter- mination are respectively defined to judge how well the model fits the data group-by-group but also taken as a whole. Artificial data are considered to find out more about the proposed decomposition, including violations of the model assumptions. Finally, an application to real data illustrates the use and the usefulness of these proposals.

Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition

Ingrassia S.;Punzo A.
2020-01-01

Abstract

One of the challenges in cluster analysis is the evaluation of the obtained clustering results withoutusingauxiliaryinformation.Tothisend,acommonapproachistouseinternalvalid- ity criteria. For mixtures of linear regressions whose parameters are estimated by maximum likelihood, we propose a three-term decomposition of the total sum of squares as a starting point to define some internal validity criteria. In particular, three types of mixtures of regres- sions are considered: with fixed covariates, with concomitant variables, and with random covariates. A ternary diagram is also suggested for easier joint interpretation of the three terms of the proposed decomposition. Furthermore, local and overall coefficients of deter- mination are respectively defined to judge how well the model fits the data group-by-group but also taken as a whole. Artificial data are considered to find out more about the proposed decomposition, including violations of the model assumptions. Finally, an application to real data illustrates the use and the usefulness of these proposals.
2020
Cluster validation, EM algorithm, Maximum likelihood, Mixtures of regressions, Model-based clustering, Ternary diagram
File in questo prodotto:
File Dimensione Formato  
Ingrassia & Punzo (2020) - JC.pdf

solo gestori archivio

Descrizione: Articolo principale
Tipologia: Versione Editoriale (PDF)
Dimensione 1.11 MB
Formato Adobe PDF
1.11 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/371719
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 10
  • ???jsp.display-item.citation.isi??? 11
social impact