On the use of the sub-Gaussian α-stable distribution in the cluster-weighted model

Ingrassia S; Punzo A
2019-01-01

Abstract

The Gaussian cluster-weighted model (CWM) is a mixture of regression models with random covariates that allows for flexible clustering of a random vector composed of a response variable and a set of covariates. In each mixture component, a Gaussian distribution is adopted both for the covariates and for the response given the covariates. To make the approach robust to atypical observations, we propose replacing the Gaussian distribution with the sub-Gaussian α-stable (SGαS) distribution, an elliptical generalization of the Gaussian distribution with one additional parameter, α, that governs the tail weight. The resulting SGαS CWM can accommodate both outliers and leverage points, concepts of primary importance in robust regression analysis. An advantage over the t distribution is that the tails of the SGαS distribution can be heavier, which provides robustness even to gross atypical observations. A new algorithm, based on a combination of stochastic and conditional expectation maximizations, is used to obtain maximum likelihood estimates of the model parameters. Simulated and real data are used to illustrate the proposal and to compare it with CWMs based on the Gaussian and t distributions.
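As an illustration of how α governs the tail weight, the following minimal sketch draws from an SGαS distribution using its standard representation as a Gaussian scale mixture, X = μ + √A · G, where G is Gaussian and A is a positive (α/2)-stable variable. This is not the estimation algorithm of the paper. The function names `positive_stable` and `sample_sgas`, the use of Kanter's representation to draw A, and the normalization (chosen so that α → 2 recovers the Gaussian with covariance `sigma`) are all assumptions made for this example.

```python
import numpy as np


def positive_stable(index, size, rng):
    """Draw A > 0 with Laplace transform E[exp(-s*A)] = exp(-s**index).

    Uses Kanter's representation, valid for 0 < index < 1 (assumed here).
    """
    u = rng.uniform(0.0, np.pi, size)   # uniform angle on (0, pi)
    w = rng.exponential(1.0, size)      # unit-rate exponential
    first = np.sin(index * u) / np.sin(u) ** (1.0 / index)
    second = (np.sin((1.0 - index) * u) / w) ** ((1.0 - index) / index)
    return first * second


def sample_sgas(alpha, mu, sigma, size, rng):
    """Draw `size` vectors X = mu + sqrt(A) * G, with G ~ N(0, sigma).

    A is positive stable with index alpha / 2, so smaller alpha means
    heavier tails; this sketch requires 0 < alpha < 2.
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 0 and 2")
    mu = np.asarray(mu, dtype=float)
    a = positive_stable(alpha / 2.0, size, rng)
    g = rng.multivariate_normal(np.zeros(mu.size), sigma, size)
    return mu + np.sqrt(a)[:, None] * g


rng = np.random.default_rng(0)
heavy = sample_sgas(1.2, [0.0, 0.0], np.eye(2), 10_000, rng)
light = sample_sgas(1.9, [0.0, 0.0], np.eye(2), 10_000, rng)
# Extreme quantiles of the radius show the effect of alpha on the tails.
print(np.quantile(np.linalg.norm(heavy, axis=1), 0.999))
print(np.quantile(np.linalg.norm(light, axis=1), 0.999))
```

Comparing the two printed quantiles gives a rough feel for how lowering α produces more extreme observations, which is the behavior the abstract relies on for robustness.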
Keywords: Cluster-weighted model, Sub-Gaussian α-stable, Model-based clustering, Mixture models, Mixtures of regressions
Files in this item:
Zarei, Mohammadpour, Ingrassia & Punzo (2019) - IJST-TA.pdf

Access: archive managers only
Description: Main article
Type: Publisher's version (PDF)
Size: 871.46 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/323954