Mixtures of multivariate contaminated normal regression models

Mazza, A; Punzo, A
2020-01-01

Abstract

Mixtures of regression models (MRMs) are widely used to investigate the relationship between variables coming from several unknown latent homogeneous groups. Usually, the conditional distribution of the response in each mixture component is assumed to be (multivariate) normal (MN-MRM). To robustify the approach with respect to possible elliptical heavy-tailed departures from normality due to the presence of mild outliers, the multivariate contaminated normal MRM is introduced here. In addition to the parameters of the MN-MRM, each mixture component has one parameter controlling the proportion of outliers and one specifying the degree of contamination with respect to the response variable(s). Crucially, these parameters do not have to be specified a priori, which adds flexibility to our approach. Furthermore, once the model is estimated and the observations are assigned to the groups, a finer intra-group classification into typical points and (mild) outliers can be obtained directly. Identifiability conditions are provided, an expectation-conditional maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo experiments and compared with other procedures. The performance of this novel family of models is also illustrated on artificial and real data, with particular emphasis on applications in allometric studies.
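The per-component idea described in the abstract can be sketched in a few lines. The following is only an illustration under simplifying assumptions, not the authors' implementation: it treats a single mixture component with a univariate response and a simple linear mean, and the names `outlier_prop` (proportion of outliers), `inflation` (degree of contamination, a variance multiplier greater than 1), `regression_mean`, and `outlier_probability` are hypothetical and not taken from the paper. In the paper these parameters are estimated, whereas here they are fixed by hand.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def regression_mean(x, intercept, slope):
    """Linear conditional mean of the response given one covariate (assumed form)."""
    return intercept + slope * x


def contaminated_normal_pdf(y, mean, var, outlier_prop, inflation):
    """Two-part density: typical points plus a same-mean, inflated-variance part."""
    typical = (1.0 - outlier_prop) * normal_pdf(y, mean, var)
    outlying = outlier_prop * normal_pdf(y, mean, inflation * var)
    return typical + outlying


def outlier_probability(y, mean, var, outlier_prop, inflation):
    """Posterior probability that y comes from the inflated-variance part."""
    outlying = outlier_prop * normal_pdf(y, mean, inflation * var)
    return outlying / contaminated_normal_pdf(y, mean, var, outlier_prop, inflation)


# Illustrative values only (assumptions, not estimates from the paper).
mean = regression_mean(x=2.0, intercept=1.0, slope=0.5)
for y in (2.1, 8.0):
    p = outlier_probability(y, mean, var=1.0, outlier_prop=0.1, inflation=20.0)
    label = "mild outlier" if p > 0.5 else "typical point"
    print(f"y={y}: P(outlier)={p:.4f} -> {label}")
```

Thresholding the posterior probability at 0.5 is one simple way to obtain the intra-group split into typical points and mild outliers. The multivariate case would replace the variance with a covariance matrix and the squared residual with a Mahalanobis distance.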
Keywords: Contaminated normal distribution, Mixtures of regression models, Model-based clustering
Files in this item:

File: Mazza & Punzo (2020) - SP.pdf
Access: repository managers only
Description: Main article
Type: Publisher's version (PDF)
Size: 734.7 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/323925
Citations
  • PubMed Central: not available
  • Scopus: 29
  • Web of Science (ISI): 26