Outlier Detection via Contaminated Mixture Distributions
PUNZO, ANTONIO;
2013-01-01
Abstract
Contaminated mixture distributions are parameterized to indicate the proportion of outliers and the degree of contamination. By their nature, they provide a natural method for outlier detection and are very attractive for mixture model-based clustering and classification. The first contribution of this paper is to introduce a mixture model whereby each mixture component is itself a contaminated Gaussian distribution. To introduce parsimony, a family of fourteen mixtures of contaminated Gaussian distributions is developed by applying constraints to eigen-decomposed component covariance matrices. This approach is, amongst other things, an effective alternative to trimmed clustering. An expectation-conditional maximization (ECM) algorithm is used to find maximum likelihood estimates of the parameters and thereby give classifications for the observations. The second contribution of this paper is to introduce a mixture model whereby each mixture component is itself a shifted asymmetric Laplace distribution. This approach permits robust clustering when the data are skewed. Again, an ECM algorithm is used for parameter estimation. Our novel approaches are applied to artificial and real data in order to illustrate some of their advantages. Amongst them, and in contrast to the trimmed clustering approach, we have: 1) each observation has a posterior probability of belonging to a particular group and, inside each group, of being an outlier or not, 2) the models do not require pre-specification of quantities such as the proportion of observations to trim, 3) the approach can be easily used in high dimensions, 4) model-based classification is permitted in addition to clustering, and 5) (in the second contribution only) we can account for non-elliptical clusters.

File | Size | Format | Access | License
---|---|---|---|---
Punzo, McNicholas, Morris & Browne - CLADAG 2013.pdf | 1.54 MB | Adobe PDF | Archive managers only | Not specified
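The contaminated Gaussian component mentioned in the abstract can be illustrated with a minimal univariate sketch. The abstract does not give the parameterization, so the form below is an assumption: a two-part density in which a proportion `alpha` of "good" points has variance `var` and the remaining points share the same mean but have the inflated variance `eta * var`, with `eta > 1`. The function names and the symbols `alpha` and `eta` are illustrative and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_pdf(x, mean, var, alpha, eta):
    """Assumed contaminated Gaussian density.

    alpha: proportion of good (non-outlying) points, between 0 and 1.
    eta:   variance inflation factor for the outlying part, greater than 1.
    """
    good = alpha * normal_pdf(x, mean, var)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return good + bad


def outlier_probability(x, mean, var, alpha, eta):
    """Posterior probability that x comes from the inflated-variance part."""
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return bad / contaminated_pdf(x, mean, var, alpha, eta)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    for x in (0.0, 2.0, 6.0):
        p = outlier_probability(x, mean=0.0, var=1.0, alpha=0.95, eta=20.0)
        print(f"x = {x}: P(outlier | x) = {p:.4f}")
```

Under these assumptions, points near the mean receive a small outlier probability and points far in the tails receive a probability close to one, with no trimming proportion fixed in advance. In a mixture of such components, this within-component probability would sit alongside the usual posterior probability of group membership.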
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.