Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions

Punzo, Antonio; Ingrassia, Salvatore
2021

Abstract

Despite recent methodological advances in hidden Markov regression models and a rapid increase in their application across a wide range of empirical settings, complex clustering-based research questions that involve the contribution of the covariate set to the classification and the presence of atypical observations are often addressed while ignoring the possible effects of wrong model assumptions. Hidden Markov regression models with random covariates (HMRMRCs) have recently been proposed as an improvement over the classical fixed-covariates approach, allowing the covariates to contribute to the underlying clustering structure. To make the approach more flexible when all the random variables considered are continuous, HMRMRCs are here defined focusing on three multivariate elliptical distributions: the normal (reference distribution), the t, and the contaminated normal. The latter two, heavy-tailed generalizations of the normal distribution, are introduced to protect the reference model against the occurrence of mildly atypical points and also allow their automatic detection. Identifiability conditions are provided, EM-based algorithms are outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through Monte Carlo experiments with the aim of showing the consequences of wrong model assumptions on parameter estimates and inferred clustering. Artificial and real data analyses are provided to investigate the models' behavior in the presence of heterogeneity and atypical observations.
Hidden Markov models, Multivariate outcome, Atypical observations, Clustering, Heavy-tailed distributions
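
As a rough illustration of how a contaminated normal distribution can flag mildly atypical points, the sketch below treats the univariate case only, whereas the paper works with multivariate distributions inside a hidden Markov model. The parametrization (a proportion `alpha` of "good" points and a variance inflation factor `eta` greater than 1), the function names, and the 0.5 decision threshold are assumptions made for this example and are not taken from the abstract.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of a univariate normal with mean mu and variance sigma2."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def contaminated_normal_pdf(x, mu, sigma2, alpha, eta):
    """Two-component mixture with a common mean.

    'Good' points have variance sigma2 and weight alpha; 'bad' points
    have inflated variance eta * sigma2 and weight 1 - alpha.
    (Assumed parametrization, univariate simplification.)
    """
    if not (0.0 < alpha < 1.0) or eta <= 1.0:
        raise ValueError("need 0 < alpha < 1 and eta > 1")
    good = alpha * normal_pdf(x, mu, sigma2)
    bad = (1.0 - alpha) * normal_pdf(x, mu, eta * sigma2)
    return good + bad


def prob_good(x, mu, sigma2, alpha, eta):
    """Posterior probability that x was generated by the 'good' component."""
    good = alpha * normal_pdf(x, mu, sigma2)
    return good / contaminated_normal_pdf(x, mu, sigma2, alpha, eta)


def is_atypical(x, mu, sigma2, alpha, eta, threshold=0.5):
    """Flag x as atypical when its 'good' posterior falls below threshold.

    The 0.5 threshold is an illustrative choice.
    """
    return prob_good(x, mu, sigma2, alpha, eta) < threshold
```

With `mu=0`, `sigma2=1`, `alpha=0.9`, and `eta=10`, a point at the mean is assigned to the "good" component, while a point five standard deviations away is flagged as atypical. In a full HMRMRC, a computation of this kind would be carried out per hidden state with multivariate densities and estimated parameters.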

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.11769/371716