
Bayesian analysis of heavy-tailed Heckman selection models using Hamiltonian Monte Carlo

Author

Listed:
  • Heeju Lim

    (University of Connecticut, Department of Statistics)

  • Victor E. Lachos

    (University of São Paulo, Department of Applied Mathematics and Statistics)

  • Victor H. Lachos

    (University of Connecticut, Department of Statistics)

Abstract

The Heckman selection model is widely used in econometrics and other social sciences to address sample selection bias. A common assumption in Heckman selection models is that the error terms are independent draws from a bivariate normal distribution. However, real-world data often deviate from this assumption and exhibit heavy-tailed behavior, which can lead to inconsistent estimates if not properly addressed. In this paper, we propose a Bayesian analysis of Heckman selection models that replaces the Gaussian assumption with well-known members of the class of scale mixtures of normal distributions, such as the Student’s-t and contaminated normal distributions. For these complex structures, posterior simulations are obtained with Stan’s default No-U-Turn sampler. Through extensive simulation studies, we compare the performance of Heckman selection models with normal, Student’s-t and contaminated normal errors. We also demonstrate the broad applicability of this methodology by applying it to medical care and labor supply data. The proposed algorithms are implemented in the R package HeckmanStan.
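
To make the setting in the abstract concrete, the following is a minimal sketch in Python (not the HeckmanStan interface, and not the paper's estimation code). It simulates a Heckman selection model whose errors are bivariate Student's-t, generated through the scale mixture of normals representation, and then shows that ordinary least squares on the selected sample alone misses the true coefficients. The function names, the parameter values, the rule that an outcome is observed when the latent selection index is positive, and the unit scale of the selection error are all assumptions made for this illustration.

```python
import numpy as np


def simulate_heckman_t(n, beta, gamma, sigma, rho, nu, rng):
    """Simulate a Heckman selection model with bivariate Student's-t errors.

    Hypothetical helper for illustration only. The outcome is observed
    when the latent selection index is positive (assumed convention).
    """
    # Outcome covariates: intercept plus one regressor.
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    # Selection covariates: same as x plus one extra (exclusion) variable.
    w = np.column_stack([x, rng.normal(size=n)])

    # Scale matrix of the errors; the selection error has unit scale.
    scale = np.array([[sigma**2, rho * sigma],
                      [rho * sigma, 1.0]])

    # Scale mixture of normals: Z ~ N(0, scale), U ~ Gamma(nu/2, rate=nu/2),
    # so Z / sqrt(U) is bivariate Student's-t with nu degrees of freedom.
    z = rng.multivariate_normal(np.zeros(2), scale, size=n)
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    eps = z / np.sqrt(u)[:, None]

    y_latent = x @ beta + eps[:, 0]
    selected = (w @ gamma + eps[:, 1]) > 0
    # Outcomes of non-selected units are missing.
    y = np.where(selected, y_latent, np.nan)
    return x, w, y, selected


def naive_ols(x, y, selected):
    """Least squares on the selected sample only, ignoring the selection step."""
    coef, *_ = np.linalg.lstsq(x[selected], y[selected], rcond=None)
    return coef


rng = np.random.default_rng(2026)
beta = np.array([1.0, 0.5])        # assumed true outcome coefficients
gamma = np.array([0.0, 0.5, 1.0])  # assumed true selection coefficients
x, w, y, selected = simulate_heckman_t(
    20_000, beta, gamma, sigma=1.0, rho=0.7, nu=4.0, rng=rng
)
print("share selected:", selected.mean())
print("naive OLS:", naive_ols(x, y, selected), "true beta:", beta)
```

With a positive error correlation, units that are selected tend to have positive outcome errors, so the naive intercept is pulled above its true value. This is the bias that a joint model of the outcome and selection equations is meant to remove.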

Suggested Citation

  • Heeju Lim & Victor E. Lachos & Victor H. Lachos, 2026. "Bayesian analysis of heavy-tailed Heckman selection models using Hamiltonian Monte Carlo," Computational Statistics, Springer, vol. 41(2), pages 1-34, February.
  • Handle: RePEc:spr:compst:v:41:y:2026:i:2:d:10.1007_s00180-026-01717-7
    DOI: 10.1007/s00180-026-01717-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-026-01717-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.




    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lachos, Victor H. & Prates, Marcos O. & Dey, Dipak K., 2021. "Heckman selection-t model: Parameter estimation via the EM-algorithm," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    2. Emmanuel O. Ogundimu, 2022. "Regularization and variable selection in Heckman selection model," Statistical Papers, Springer, vol. 63(2), pages 421-439, April.
    3. Gustavo Rocha & Reinaldo Arellano-Valle & Rosangela Loschi, 2015. "Maximum likelihood methods in a robust censored errors-in-variables model," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(4), pages 857-877, December.
    4. Saulo, Helton & Vila, Roberto & Cordeiro, Shayane S. & Leiva, Víctor, 2023. "Bivariate symmetric Heckman models and their characterization," Journal of Multivariate Analysis, Elsevier, vol. 193(C).
    5. Helton Saulo & Roberto Vila & Shayane S. Cordeiro, 2022. "Symmetric generalized Heckman models," Papers 2206.10054, arXiv.org.
    6. Divan A. Burger & Sean van der Merwe & Emmanuel Lesaffre & Peter C. le Roux & Morgan J. Raath‐Krüger, 2023. "A robust mixed‐effects parametric quantile regression model for continuous proportions: Quantifying the constraints to vitality in cushion plants," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 77(4), pages 444-470, November.
    7. Karling, Maicon J. & Durante, Daniele & Genton, Marc G., 2024. "Conjugacy properties of multivariate unified skew-elliptical distributions," Journal of Multivariate Analysis, Elsevier, vol. 204(C).
    8. Abbas Mahdavi & Anthony F. Desmond & Ahad Jamalizadeh & Tsung-I Lin, 2024. "Skew Multiple Scaled Mixtures of Normal Distributions with Flexible Tail Behavior and Their Application to Clustering," Journal of Classification, Springer;The Classification Society, vol. 41(3), pages 620-649, November.
    9. Gabriele Perrone & Gabriele Soffritti, 2023. "Seemingly unrelated clusterwise linear regression for contaminated data," Statistical Papers, Springer, vol. 64(3), pages 883-921, June.
    10. Gideon Danso-Abbeam & Gilbert Dagunga & Dennis Sedem Ehiakpor, 2019. "Adoption of Zai technology for soil fertility management: evidence from Upper East region, Ghana," Journal of Economic Structures, Springer;Pan-Pacific Association of Input-Output Studies (PAPAIOS), vol. 8(1), pages 1-14, December.
    11. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    12. Ding, Peng, 2014. "Bayesian robust inference of sample selection using selection-t models," Journal of Multivariate Analysis, Elsevier, vol. 124(C), pages 451-464.
    13. Azzalini, Adelchi, 2022. "An overview on the progeny of the skew-normal family— A personal perspective," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    14. Moreno Bevilacqua & Christian Caamaño‐Carrillo & Reinaldo B. Arellano‐Valle & Víctor Morales‐Oñate, 2021. "Non‐Gaussian geostatistical modeling using (skew) t processes," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 212-245, March.
    15. Natalia Khorunzhina & Jean-François Richard, 2019. "Finite Gaussian Mixture Approximations to Analytically Intractable Density Kernels," Computational Economics, Springer;Society for Computational Economics, vol. 53(3), pages 991-1017, March.
    16. Sugasawa, Shonosuke & Kobayashi, Genya, 2022. "Robust fitting of mixture models using weighted complete estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    17. Okhli, Kheirolah & Jabbari Nooghabi, Mehdi, 2021. "On the contaminated exponential distribution: A theoretical Bayesian approach for modeling positive-valued insurance claim data with outliers," Applied Mathematics and Computation, Elsevier, vol. 392(C).
    18. Christian E. Galarza & Tsung-I Lin & Wan-Lun Wang & Víctor H. Lachos, 2021. "On moments of folded and truncated multivariate Student-t distributions based on recurrence relations," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(6), pages 825-850, August.
    19. Francisco H. C. Alencar & Christian E. Galarza & Larissa A. Matos & Victor H. Lachos, 2022. "Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 521-557, September.
    20. Gagnon, Philippe & Hayashi, Yoshiko, 2023. "Theoretical properties of Bayesian Student-t linear regression," Statistics & Probability Letters, Elsevier, vol. 193(C).
