Outlier Detection via Contaminated Mixture Distributions

Punzo, Antonio
2013-01-01

Abstract

Contaminated mixture distributions are parameterized to indicate the proportion of outliers and the degree of contamination. By their nature, they provide a natural method for outlier detection and are very attractive for mixture model-based clustering and classification. The first contribution of this paper is a mixture model in which each component is itself a contaminated Gaussian distribution. To introduce parsimony, a family of fourteen mixtures of contaminated Gaussian distributions is developed by applying constraints to the eigen-decomposed component covariance matrices. Amongst other things, this approach is an effective alternative to trimmed clustering. An expectation-conditional maximization (ECM) algorithm is used to find maximum likelihood estimates of the parameters and thereby classify the observations. The second contribution of this paper is a mixture model in which each component is itself a shifted asymmetric Laplace distribution. This approach makes it possible to carry out robust clustering when the data are skewed. Again, an ECM algorithm is used for parameter estimation. Our novel approaches are applied to artificial and real data to illustrate some of their advantages. Amongst them, and in contrast to the trimmed clustering approach, we have: 1) each observation has a posterior probability of belonging to a particular group and, within each group, of being an outlier or not; 2) the models do not require pre-specification of quantities such as the proportion of observations to trim; 3) the approach can easily be used in high dimensions; 4) model-based classification is permitted in addition to clustering; and 5) (for the second contribution only) non-elliptical clusters can be accommodated.
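To make advantage 1) concrete, here is a minimal univariate sketch. It assumes the common parameterization of a contaminated Gaussian component, f(x) = alpha * N(x; mu, sigma2) + (1 - alpha) * N(x; mu, eta * sigma2), where alpha in (0, 1) is the proportion of good points and eta > 1 inflates the variance of the outlying ones. It computes only the two posterior probabilities named in the abstract (group membership and, within the group, being a good point). It is not the authors' ECM algorithm, it does not cover the eigen-decomposed multivariate family or the shifted asymmetric Laplace model, and the class and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ContaminatedGaussian:
    """Univariate contaminated Gaussian component (hypothetical container).

    mu, sigma2: mean and variance of the "good" points
    alpha:      proportion of good points, assumed in (0, 1)
    eta:        variance inflation for outlying points, assumed > 1
    """
    mu: float
    sigma2: float
    alpha: float
    eta: float


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Gaussian density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_pdf(x: float, c: ContaminatedGaussian) -> float:
    """alpha * N(x; mu, sigma2) + (1 - alpha) * N(x; mu, eta * sigma2)."""
    good = c.alpha * normal_pdf(x, c.mu, c.sigma2)
    bad = (1.0 - c.alpha) * normal_pdf(x, c.mu, c.eta * c.sigma2)
    return good + bad


def posteriors(x: float, weights: list[float],
               comps: list[ContaminatedGaussian]) -> list[tuple[float, float]]:
    """For each component g, return (z_g, v_g), where
    z_g = posterior probability that x belongs to group g, and
    v_g = posterior probability that x is a good (non-outlying) point,
          given that it belongs to group g.
    Plain densities are used (no log-space safeguards), so points extremely
    far from every component could underflow; acceptable for a sketch.
    """
    dens = [w * contaminated_pdf(x, c) for w, c in zip(weights, comps)]
    total = sum(dens)
    out = []
    for d, c in zip(dens, comps):
        z = d / total
        v = c.alpha * normal_pdf(x, c.mu, c.sigma2) / contaminated_pdf(x, c)
        out.append((z, v))
    return out


if __name__ == "__main__":
    # Illustrative parameters only; these are not estimates from the paper.
    weights = [0.5, 0.5]
    comps = [
        ContaminatedGaussian(mu=0.0, sigma2=1.0, alpha=0.95, eta=20.0),
        ContaminatedGaussian(mu=10.0, sigma2=1.0, alpha=0.95, eta=20.0),
    ]
    for x in (0.3, 6.0):
        post = posteriors(x, weights, comps)
        g = max(range(len(post)), key=lambda k: post[k][0])  # most probable group
        z, v = post[g]
        label = "outlier" if v < 0.5 else "good"  # assumed 0.5 threshold
        print(f"x={x}: group {g + 1} (z={z:.3f}), P(good | group)={v:.3f} -> {label}")
```

With these illustrative parameters, the point at 0.3 is placed in the first group with a high probability of being good, while the point at 6.0 is placed in the second group but receives a low probability of being good, so it is flagged under the 0.5 threshold, which is an assumption made here for illustration.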
Year: 2013
ISBN: 9788867871179
Files in this item:
Punzo, McNicholas, Morris & Browne - CLADAG 2013.pdf
Access: archive managers only
License: not specified
Size: 1.54 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/96795