Printed from https://ideas.repec.org/a/spr/jstada/v8y2021i1d10.1186_s40488-021-00121-4.html

A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Author

Listed:
  • Cindy Xin Feng

    (Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University)

Abstract

Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models have received little investigation. In this article, we review the zero-inflated and hurdle models and highlight their differences in terms of their data generating processes. We also conduct simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
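
As a rough illustration of the difference in data generating processes that the abstract refers to, the sketch below writes out the usual textbook probability mass functions for a zero-inflated Poisson and a hurdle Poisson model. It is not taken from the article: the function names, the parameter names (`lam`, `pi`, `p0`), and the choice of a Poisson count component are assumptions made only for this example.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson (illustrative parameterization).

    With probability pi the observation is a structural zero; otherwise it
    is drawn from Poisson(lam), which can itself produce a zero.
    """
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, lam, p0):
    """Hurdle Poisson (illustrative parameterization).

    A zero occurs with probability p0; all positive counts come from a
    Poisson(lam) truncated at zero, so every zero has a single source.
    """
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, lam) / (1 - math.exp(-lam))
```

Under these definitions the zero-inflated form can only add zeros relative to the Poisson component, whereas the hurdle form models the zero probability separately, so it can sit above or below the Poisson zero probability.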

Suggested Citation

  • Cindy Xin Feng, 2021. "A comparison of zero-inflated and hurdle models for modeling zero-inflated count data," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-19, December.
  • Handle: RePEc:spr:jstada:v:8:y:2021:i:1:d:10.1186_s40488-021-00121-4
    DOI: 10.1186/s40488-021-00121-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1186/s40488-021-00121-4
    File Function: Abstract
    Download Restriction: no

    Citations

    Cited by:

    1. Daiho Uhm & Sunghae Jun, 2022. "Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples," Future Internet, MDPI, vol. 14(7), pages 1-11, July.
    2. Camila Pareja Yale & Hugo Tsugunobu Yoshida Yoshizaki & Luiz Paulo Fávero, 2022. "A New Zero-Inflated Negative Binomial Multilevel Model for Forecasting the Demand of Disaster Relief Supplies in the State of Sao Paulo, Brazil," Mathematics, MDPI, vol. 10(22), pages 1-11, November.
    3. Stephanie Coffey & Michael R. Elliott, 2023. "Predicting Days to Respondent Contact in Cross-Sectional Surveys Using a Bayesian Approach," Journal of Official Statistics, Sciendo, vol. 39(3), pages 325-349, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. T. Martin Lukusa & Shen-Ming Lee & Chin-Shang Li, 2016. "Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 79(4), pages 457-483, May.
    2. L. Elbakidze & Y. H. Jin, 2015. "Are Economic Development and Education Improvement Associated with Participation in Transnational Terrorism?," Risk Analysis, John Wiley & Sons, vol. 35(8), pages 1520-1535, August.
    3. Mozhaeva, Irina, 2022. "Inequalities in utilization of institutional care among older people in Estonia," Health Policy, Elsevier, vol. 126(7), pages 704-714.
    4. João M. C. Santos Silva & Silvana Tenreyro & Frank Windmeijer, 2015. "Testing Competing Models for Non-negative Data with Many Zeros," Journal of Econometric Methods, De Gruyter, vol. 4(1), pages 1-18, January.
    5. Greene, William, 2007. "Functional Form and Heterogeneity in Models for Count Data," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(2), pages 113-218, August.
    6. Christopher J. W. Zorn, 1998. "An Analytic and Empirical Examination of Zero-Inflated and Hurdle Poisson Specifications," Sociological Methods & Research, vol. 26(3), pages 368-400, February.
    7. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    8. Niklas Elert, 2014. "What determines entry? Evidence from Sweden," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(1), pages 55-92, August.
    9. Abbas Moghimbeigi & Mohammed Reza Eshraghian & Kazem Mohammad & Brian Mcardle, 2008. "Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros," Journal of Applied Statistics, Taylor & Francis Journals, vol. 35(10), pages 1193-1202.
    10. Ulf-G. Gerdtham, 1997. "Equity in Health Care Utilization: Further Tests Based on Hurdle Models and Swedish Micro Data," Health Economics, John Wiley & Sons, Ltd., vol. 6(3), pages 303-319, May.
    11. Soutik Ghosal & Timothy S. Lau & Jeremy Gaskins & Maiying Kong, 2020. "A hierarchical mixed effect hurdle model for spatiotemporal count data and its application to identifying factors impacting health professional shortages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(5), pages 1121-1144, November.
    12. Stefano Mainardi, 2003. "Testing convergence in life expectancies: count regression models on panel data," Prague Economic Papers, Prague University of Economics and Business, vol. 2003(4), pages 350-370.
    13. Samuel Muehlemann & Juerg Schweri & Rainer Winkelmann & Stefan C. Wolter, 2007. "An Empirical Analysis of the Decision to Train Apprentices," LABOUR, CEIS, vol. 21(3), pages 419-441, September.
    14. J. M. C. Santos Silva, 2001. "A score test for non-nested hypotheses with applications to discrete data models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(5), pages 577-597.
    15. David Todem & Wei‐Wen Hsu & KyungMann Kim, 2023. "Nonparametric scanning tests of homogeneity for hierarchical models with continuous covariates," Biometrics, The International Biometric Society, vol. 79(3), pages 2063-2075, September.
    16. repec:fgv:epgrbe:v:66:n:1:a:3 is not listed on IDEAS
    17. Daniel Biftu Bekalo & Dufera Tejjeba Kebede, 2021. "Zero-Inflated Models for Count Data: An Application to Number of Antenatal Care Service Visits," Annals of Data Science, Springer, vol. 8(4), pages 683-708, December.
    18. Tousifur Rahman & Partha Jyoti Hazarika & M. Masoom Ali & Manash Pratim Barman, 2022. "Three-Inflated Poisson Distribution and its Application in Suicide Cases of India During Covid-19 Pandemic," Annals of Data Science, Springer, vol. 9(5), pages 1103-1127, October.
    19. Zaida C. Quiroz & Marcos O. Prates & Håvard Rue, 2015. "A Bayesian approach to estimate the biomass of anchovies off the coast of Perú," Biometrics, The International Biometric Society, vol. 71(1), pages 208-217, March.
    20. Melvyn Weeks & Sriya Iyer, 2004. "Multiple social interactions and reproductive externalities: An investigation of fertility behaviour in Kenya," Econometric Society 2004 Latin American Meetings 143, Econometric Society.
    21. William Greene, 2007. "Discrete Choice Modeling," Working Papers 07-6, New York University, Leonard N. Stern School of Business, Department of Economics.
