
A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Author

Listed:
  • Cindy Xin Feng

    (Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University)

Abstract

Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of such data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models have received little investigation. In this article, we review the zero-inflated and hurdle models and highlight how their data-generating processes differ. We also conduct simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
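
The difference in data-generating processes mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's simulation design: the function names, the Poisson count component, and the parameter values (a zero probability of 0.3 and a Poisson mean of 2.0) are all assumptions chosen for illustration. In the zero-inflated process, zeros arise both from a structural-zero component and from the count distribution itself; in the hurdle process, every zero comes from the binary part, and positive counts come from a zero-truncated distribution.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def zip_draw(rng, pi, lam):
    """Zero-inflated Poisson: structural zero with probability pi,
    otherwise an ordinary Poisson draw (which may also be zero)."""
    if rng.random() < pi:
        return 0
    return poisson_draw(rng, lam)


def hurdle_draw(rng, p_zero, lam):
    """Hurdle Poisson: zero with probability p_zero, otherwise a draw
    from a Poisson truncated at zero (rejection sampling)."""
    if rng.random() < p_zero:
        return 0
    while True:
        y = poisson_draw(rng, lam)
        if y > 0:
            return y


# Illustrative (assumed) parameter values, not taken from the article.
rng = random.Random(0)
n = 100_000
zip_zero_share = sum(zip_draw(rng, 0.3, 2.0) == 0 for _ in range(n)) / n
hurdle_zero_share = sum(hurdle_draw(rng, 0.3, 2.0) == 0 for _ in range(n)) / n
print(zip_zero_share, hurdle_zero_share)
```

Under these assumed values, the theoretical share of zeros is 0.3 + 0.7·exp(-2) ≈ 0.395 for the zero-inflated process and exactly 0.3 for the hurdle process, so the same mixing probability implies different zero frequencies in the two models.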

Suggested Citation

  • Cindy Xin Feng, 2021. "A comparison of zero-inflated and hurdle models for modeling zero-inflated count data," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-19, December.
  • Handle: RePEc:spr:jstada:v:8:y:2021:i:1:d:10.1186_s40488-021-00121-4
    DOI: 10.1186/s40488-021-00121-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1186/s40488-021-00121-4
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Mullahy, John, 1986. "Specification and testing of some modified count data models," Journal of Econometrics, Elsevier, vol. 33(3), pages 341-365, December.
    2. Vuong, Quang H, 1989. "Likelihood Ratio Tests for Model Selection and Non-nested Hypotheses," Econometrica, Econometric Society, vol. 57(2), pages 307-333, March.
    3. Lizhen Xu & Andrew D Paterson & Williams Turpin & Wei Xu, 2015. "Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-30, July.
    4. Brian Neelon & Pulak Ghosh & Patrick F. Loebs, 2013. "A spatial Poisson hurdle model for exploring geographic variation in emergency department visits," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 176(2), pages 389-413, February.
    5. C.X. Feng & C.B. Dean, 2012. "Joint analysis of multivariate spatial count and zero‐heavy count outcomes using common spatial factor models," Environmetrics, John Wiley & Sons, Ltd., vol. 23(6), pages 493-508, September.
    6. D. Böhning & E. Dietz & P. Schlattmann & L. Mendonça & U. Kirchner, 1999. "The zero‐inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 162(2), pages 195-209.

    Citations



    Cited by:

    1. Daiho Uhm & Sunghae Jun, 2022. "Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples," Future Internet, MDPI, vol. 14(7), pages 1-11, July.
    2. Coffey Stephanie & Elliott Michael R., 2023. "Predicting Days to Respondent Contact in Cross-Sectional Surveys Using a Bayesian Approach," Journal of Official Statistics, Sciendo, vol. 39(3), pages 325-349, September.
    3. Camila Pareja Yale & Hugo Tsugunobu Yoshida Yoshizaki & Luiz Paulo Fávero, 2022. "A New Zero-Inflated Negative Binomial Multilevel Model for Forecasting the Demand of Disaster Relief Supplies in the State of Sao Paulo, Brazil," Mathematics, MDPI, vol. 10(22), pages 1-11, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. L. Elbakidze & Y. H. Jin, 2015. "Are Economic Development and Education Improvement Associated with Participation in Transnational Terrorism?," Risk Analysis, John Wiley & Sons, vol. 35(8), pages 1520-1535, August.
    2. T. Martin Lukusa & Shen-Ming Lee & Chin-Shang Li, 2016. "Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 79(4), pages 457-483, May.
    3. Greene, William, 2007. "Functional Form and Heterogeneity in Models for Count Data," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(2), pages 113-218, August.
    4. Christopher J. W. Zorn, 1998. "An Analytic and Empirical Examination of Zero-Inflated and Hurdle Poisson Specifications," Sociological Methods & Research, vol. 26(3), pages 368-400, February.
    5. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    6. Ulf-G. Gerdtham, 1997. "Equity in Health Care Utilization: Further Tests Based on Hurdle Models and Swedish Micro Data," Health Economics, John Wiley & Sons, Ltd., vol. 6(3), pages 303-319, May.
    7. Soutik Ghosal & Timothy S. Lau & Jeremy Gaskins & Maiying Kong, 2020. "A hierarchical mixed effect hurdle model for spatiotemporal count data and its application to identifying factors impacting health professional shortages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(5), pages 1121-1144, November.
    8. Stefano Mainardi, 2003. "Testing convergence in life expectancies: count regression models on panel data," Prague Economic Papers, Prague University of Economics and Business, vol. 2003(4), pages 350-370.
    9. J. M. C. Santos Silva, 2001. "A score test for non-nested hypotheses with applications to discrete data models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(5), pages 577-597.
    10. repec:fgv:epgrbe:v:66:n:1:a:3 is not listed on IDEAS
    11. Daniel Biftu Bekalo & Dufera Tejjeba Kebede, 2021. "Zero-Inflated Models for Count Data: An Application to Number of Antenatal Care Service Visits," Annals of Data Science, Springer, vol. 8(4), pages 683-708, December.
    12. Tousifur Rahman & Partha Jyoti Hazarika & M. Masoom Ali & Manash Pratim Barman, 2022. "Three-Inflated Poisson Distribution and its Application in Suicide Cases of India During Covid-19 Pandemic," Annals of Data Science, Springer, vol. 9(5), pages 1103-1127, October.
    13. Zaida C. Quiroz & Marcos O. Prates & Håvard Rue, 2015. "A Bayesian approach to estimate the biomass of anchovies off the coast of Perú," Biometrics, The International Biometric Society, vol. 71(1), pages 208-217, March.
    14. Christian Balcells, 2022. "Determinants of firm boundaries and organizational performance: an empirical investigation of the Chilean truck market," Journal of Evolutionary Economics, Springer, vol. 32(2), pages 423-461, April.
    15. Iyer, S. & Weeks, M., 2004. "Multiple Social Interaction and Reproductive Externalities: An Investigation of Fertility Behaviour in Kenya," Cambridge Working Papers in Economics 0461, Faculty of Economics, University of Cambridge.
    16. Antoni, Manfred, 2011. "Lifelong learning inequality? The relevance of family background for on-the-job training," IAB-Discussion Paper 201109, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    17. Teresa Bago d'Uva, 2006. "Latent class models for utilisation of health care," Health Economics, John Wiley & Sons, Ltd., vol. 15(4), pages 329-343, April.
    18. Majo, M.C., 2010. "A microeconometric analysis of health care utilization in Europe," Other publications TiSEM 1cf5fd2f-8146-4ef8-8eb5-e, Tilburg University, School of Economics and Management.
    19. Camila Pareja Yale & Hugo Tsugunobu Yoshida Yoshizaki & Luiz Paulo Fávero, 2022. "A New Zero-Inflated Negative Binomial Multilevel Model for Forecasting the Demand of Disaster Relief Supplies in the State of Sao Paulo, Brazil," Mathematics, MDPI, vol. 10(22), pages 1-11, November.
    20. Sisira Sarma & Wayne Simpson, 2006. "A microeconometric analysis of Canadian health care utilization," Health Economics, John Wiley & Sons, Ltd., vol. 15(3), pages 219-239, March.
    21. R. Martínez-Espiñeira, 2007. "‘Adopt a Hypothetical Pup’: A Count Data Approach to the Valuation of Wildlife," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 37(2), pages 335-360, June.
