Heckman Selection-Contaminated Normal Model

Lim H.; Ordonez J. A.; Punzo A.; Lachos V. H.

2026-01-01

Abstract

The Heckman selection model is one of the best-known econometric models for analyzing data subject to sample selection. It corrects sample selection bias under the assumption of bivariate normal error terms. However, real data often depart from this assumption when heavy tails and/or atypical observations are present. Recently, the assumption has been relaxed by adopting the more flexible Student’s t-distribution, which has appealing statistical properties. This article introduces a novel Heckman selection model that uses a bivariate contaminated normal distribution for the error terms. We present an efficient Expectation Conditional Maximization (ECM) algorithm for parameter estimation, with closed-form expressions at the E-step based on formulas for the truncated multivariate normal distribution. The point identifiability of the proposed model is also discussed, and its properties are examined. Through simulation studies, we compare the proposed model with its normal and Student’s t counterparts and investigate its finite-sample properties and the effect of varying the missing rate. Results from two real data analyses demonstrate the usefulness and effectiveness of the model. The proposed algorithms are implemented in the R package HeckmanEM.
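
To make the data-generating mechanism described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation and not the HeckmanEM API. It assumes one common parameterization of the bivariate contaminated normal: with probability `contam_prob` an error pair is drawn from a normal whose covariance is multiplied by `inflation` (greater than 1), and otherwise from the reference bivariate normal with outcome variance `sigma**2`, selection variance 1, and correlation `rho`. The function name, the parameter names, and the two covariates are hypothetical choices made for illustration; the paper's own notation may differ.

```python
import math
import random


def simulate_heckman_cn(n, beta, gamma, sigma, rho, contam_prob, inflation, seed=0):
    """Simulate a Heckman selection sample with contaminated normal errors.

    Hypothetical helper for illustration only.
    Outcome equation:   y = beta[0] + beta[1]*x + e1
    Selection equation: s = gamma[0] + gamma[1]*x + gamma[2]*w + e2
    y is observed only when s > 0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # covariate shared by both equations
        w = rng.gauss(0.0, 1.0)  # covariate in the selection equation only

        # Correlated pair: Var(e1) = sigma^2, Var(e2) = 1, Cov(e1, e2) = rho*sigma.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e2 = z2
        e1 = sigma * (rho * z2 + math.sqrt(1.0 - rho ** 2) * z1)

        # Contamination: inflate the covariance of an atypical observation,
        # which amounts to scaling both errors by sqrt(inflation).
        contaminated = rng.random() < contam_prob
        scale = math.sqrt(inflation) if contaminated else 1.0
        e1 *= scale
        e2 *= scale

        selected = gamma[0] + gamma[1] * x + gamma[2] * w + e2 > 0
        y = beta[0] + beta[1] * x + e1
        rows.append({
            "x": x,
            "w": w,
            "y": y if selected else None,  # outcome is missing when not selected
            "selected": selected,
            "contaminated": contaminated,
        })
    return rows


sample = simulate_heckman_cn(
    n=500, beta=(1.0, 0.5), gamma=(0.3, 1.0, -0.5),
    sigma=1.0, rho=0.6, contam_prob=0.1, inflation=10.0, seed=1,
)
missing_rate = sum(not r["selected"] for r in sample) / len(sample)
print(f"missing rate: {missing_rate:.2f}")
```

Changing the selection intercept `gamma[0]` shifts the share of unobserved outcomes, which is one simple way to mimic the varying missing rates mentioned above. Fitting the model to such data would be done with the R package HeckmanEM, whose interface is not shown here.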

Year: 2026

Keywords: ECM algorithm; Heckman selection model; Multivariate contaminated normal; R package HeckmanEM

Appears in categories: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/705353
