On the use of the sub-Gaussian α-stable distribution in the cluster-weighted model
Ingrassia S; Punzo A
2019-01-01
Abstract
The Gaussian cluster-weighted model (CWM) is a mixture of regression models with random covariates that allows flexible clustering of a random vector composed of a response variable and a set of covariates. In each mixture component, a Gaussian distribution is adopted both for the covariates and for the response given the covariates. To make the approach robust to atypical observations, we propose replacing the Gaussian distribution with the sub-Gaussian α-stable (SGαS) distribution, an elliptical generalization of the Gaussian distribution with one additional parameter, α, that governs the tail weight. The resulting SGαS CWM can accommodate both outliers and leverage points, concepts of primary importance in robust regression analysis. Compared with the t-distribution, the SGαS distribution can have heavier tails, which provides robustness even to gross atypical observations. A new algorithm, combining stochastic and conditional expectation-maximization steps, is used to obtain maximum likelihood estimates of the model parameters. Simulated and real data are used to illustrate the proposal and to compare it with CWMs based on Gaussian and t distributions.
File: Zarei, Mohammadpour, Ingrassia & Punzo (2019) - IJST-TA.pdf (main article; publisher's version; Adobe PDF, 871.46 kB; access restricted to archive managers)
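The role of α described in the abstract can be illustrated with a minimal sampling sketch. It is not taken from the paper: it relies on the standard scale-mixture representation of a sub-Gaussian α-stable vector, X = μ + √A · G, where G is a zero-mean Gaussian vector with covariance Σ and A is a positive (α/2)-stable variable, here normalized so that E[exp(-sA)] = exp(-s^(α/2)). Under that assumption the characteristic function of X − μ is exp(-(t'Σt/2)^(α/2)); the paper's own parameterization may differ by a scale constant. The function names `positive_stable` and `sample_sgas`, and the use of Kanter's representation to draw A, are choices made for this illustration only.

```python
import math
import random


def positive_stable(a, rng):
    """Draw A > 0 with Laplace transform E[exp(-s*A)] = exp(-s**a), 0 < a < 1.

    Uses Kanter's representation with U ~ Uniform(0, pi) and E ~ Exp(1).
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    u = rng.uniform(0.0, math.pi)
    e = rng.expovariate(1.0)
    while u == 0.0 or e == 0.0:  # guard against division by zero
        u = rng.uniform(0.0, math.pi)
        e = rng.expovariate(1.0)
    first = math.sin(a * u) / math.sin(u) ** (1.0 / a)
    second = (math.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a)
    return first * second


def sample_sgas(alpha, mu, chol, rng):
    """Draw one vector from the assumed SGaS scale-mixture representation.

    alpha : tail index in (0, 2]; alpha = 2 gives the Gaussian case.
    mu    : location vector (list of floats).
    chol  : lower-triangular Cholesky factor of Sigma (list of rows).
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    # For alpha = 2 the mixing variable is degenerate at 1 (plain Gaussian).
    mix = 1.0 if alpha == 2.0 else positive_stable(alpha / 2.0, rng)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    # g = chol @ z is Gaussian with covariance Sigma = chol @ chol^T.
    g = [sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(len(mu))]
    scale = math.sqrt(mix)
    return [m + scale * gi for m, gi in zip(mu, g)]


if __name__ == "__main__":
    rng = random.Random(1)
    mu = [0.0, 0.0]
    chol = [[1.0, 0.0], [0.5, 1.0]]
    for alpha in (2.0, 1.5, 1.0):
        draws = [sample_sgas(alpha, mu, chol, rng) for _ in range(5000)]
        largest = max(abs(v) for x in draws for v in x)
        print(f"alpha={alpha}: largest |coordinate| in 5000 draws = {largest:.1f}")
```

Running the script prints the largest absolute coordinate observed for each α; smaller values of α should typically produce far more extreme draws, which is the heavy-tail behaviour the abstract refers to.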
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.