Local and overall R-squared measures for mixtures of regression models

Salvatore Ingrassia
2023-01-01

Abstract

We propose a three-term decomposition of the total sum of squares for mixtures of linear regressions as a starting point for defining internal validity criteria. Three types of mixtures of regressions are considered: with fixed covariates, with concomitant variables, and with random covariates (Cluster-Weighted Models). Local and overall coefficients of determination are then defined to judge how well the model fits the data, both group by group and taken as a whole. The decomposition is further extended to deviance measures for mixtures of GLMs. At the sample level, we introduce an additive normalized decomposition of the total deviance into three terms, each evaluating a different aspect of the fitted model: the cluster separation on the dependent variable, the proportion of the total deviance explained by the fitted model, and the proportion of the total deviance that remains unexplained. The approach is illustrated on real datasets. In particular, the proposed fit measures are used to assess and interpret clusters of COVID-19 spread in Italy.
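To make the idea of a three-term decomposition concrete, the following is a minimal sketch rather than the author's implementation. It assumes soft membership weights `z` whose rows sum to one and fits each group by weighted least squares with an intercept, which is an assumption standing in for whatever estimation procedure the paper uses. The function name `soft_ss_decomposition`, the term names (between, explained, residual), and the simulated data are all hypothetical.

```python
import numpy as np

def soft_ss_decomposition(X, y, z):
    """Split the total sum of squares into between-group, explained and
    residual parts, using soft weights z (n x G, rows summing to 1).
    Assumption: each group is fitted by weighted least squares with intercept."""
    n, G = z.shape
    Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
    ybar = y.mean()
    tss = np.sum((y - ybar) ** 2)              # total sum of squares
    between = explained = residual = 0.0
    for g in range(G):
        w = z[:, g]
        sw = np.sqrt(w)
        # Weighted least squares via row scaling by sqrt(weights)
        beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
        fit = Xd @ beta
        ybar_g = np.sum(w * y) / np.sum(w)     # weighted group mean of y
        between += np.sum(w) * (ybar_g - ybar) ** 2
        explained += np.sum(w * (fit - ybar_g) ** 2)
        residual += np.sum(w * (y - fit) ** 2)
    return tss, between, explained, residual

# Hypothetical data: two groups with different regression lines
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
label = rng.integers(0, 2, size=n)
y = np.where(label == 0, 1.0 + 2.0 * x, 6.0 - 1.0 * x) + 0.3 * rng.normal(size=n)

# Illustrative soft weights: 0.9 on the generating group, 0.1 on the other
z = np.full((n, 2), 0.1)
z[np.arange(n), label] = 0.9

tss, between, explained, residual = soft_ss_decomposition(x, y, z)
shares = np.array([between, explained, residual]) / tss   # normalized terms
print(shares, shares.sum())
```

Under these assumptions the three terms add up to the total sum of squares, so the normalized shares sum to one and can be read as cluster separation on the response, explained proportion, and unexplained proportion.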
Year: 2023
ISBN: 978-84-16829-90-3

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/618470