Heckman Selection-Contaminated Normal Model
Punzo A.
2026-01-01
Abstract
The Heckman selection model is one of the best-known econometric models for the analysis of data with sample selection. It is designed to correct sample selection bias under the assumption of bivariate normal error terms. However, real data diverge from this assumption in the presence of heavy tails and/or atypical observations. Recently, this assumption has been relaxed via the more flexible Student’s t-distribution, which has appealing statistical properties. This article introduces a novel Heckman selection model that uses a bivariate contaminated normal distribution for the error terms. We present an efficient Expectation Conditional Maximization algorithm for parameter estimation, with closed-form expressions at the E-step based on formulas for the truncated multivariate normal distribution. The point identifiability of the proposed model is also discussed, and its properties are examined. Through simulation studies, we compare the proposed model with its normal and Student’s t counterparts and investigate its finite-sample properties and its behavior as the missing rate varies. Results from two real data analyses showcase the usefulness and effectiveness of our model. The proposed algorithms are implemented in the R package HeckmanEM.


