Cluster-weighted models (CWMs) are mixtures of regression models with random co- variates. However, besides having recently become rather popular in statistics and data mining, there is still a lack of support for CWMs within the most popular statistical suites. In this paper, we introduce flexCWM, an R package specifically conceived for fitting CWMs. The package supports modeling the conditioned response variable by means of the most common distributions of the exponential family and by the t distribution. Covariates are allowed to be of a mixed-type and parsimonious modeling of multivariate normal covariates, based on the eigenvalue decomposition of the component covariance matrices, is supported. Furthermore, either the response or the covariates distributions can be omitted, yielding to mixtures of distributions and mixtures of regression models with fixed covariates, respectively. The expectation-maximization (EM) algorithm is used to obtain maximum-likelihood estimates of the parameters and likelihood-based information criteria are adopted to select the number of groups and/or the parsimonious model. For the component regression coefficients, standard errors and significance tests are also provided. Parallel computation can be used on multicore PCs and computer clusters, when several models have to be fitted. To exemplify the use of the package, applications to artificial and real datasets, included into the package, are presented.

flexCWM: A Flexible Framework for Cluster-Weighted Models

Mazza, A.
;
Punzo, A.;Ingrassia, S.
2018-01-01

Abstract

Cluster-weighted models (CWMs) are mixtures of regression models with random co- variates. However, besides having recently become rather popular in statistics and data mining, there is still a lack of support for CWMs within the most popular statistical suites. In this paper, we introduce flexCWM, an R package specifically conceived for fitting CWMs. The package supports modeling the conditioned response variable by means of the most common distributions of the exponential family and by the t distribution. Covariates are allowed to be of a mixed-type and parsimonious modeling of multivariate normal covariates, based on the eigenvalue decomposition of the component covariance matrices, is supported. Furthermore, either the response or the covariates distributions can be omitted, yielding to mixtures of distributions and mixtures of regression models with fixed covariates, respectively. The expectation-maximization (EM) algorithm is used to obtain maximum-likelihood estimates of the parameters and likelihood-based information criteria are adopted to select the number of groups and/or the parsimonious model. For the component regression coefficients, standard errors and significance tests are also provided. Parallel computation can be used on multicore PCs and computer clusters, when several models have to be fitted. To exemplify the use of the package, applications to artificial and real datasets, included into the package, are presented.
2018
Cluster-weighted models
EM algorithm
Mixture models
Model-based clustering
Random covariates
File in questo prodotto:
File Dimensione Formato  
Mazza, Punzo & Ingrassia (2018) - JSS (flexCWM).pdf

accesso aperto

Descrizione: Articolo principale
Tipologia: Versione Editoriale (PDF)
Dimensione 774.67 kB
Formato Adobe PDF
774.67 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/299746
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 25
  • ???jsp.display-item.citation.isi??? 24
social impact