Modelling Excess Zeros in Count Data: A New Perspective on Modelling Approaches

Authors

  • John Haslett
  • Andrew C. Parnell
  • John Hinde
  • Rafael de Andrade Moral

Abstract

We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: over‐dispersion (OD) models and zero‐inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer to these as implicit and explicit ZI models, respectively. Although sometimes seen as competing approaches, they can be complementary; OD is a consequence of ZI modelling, and ZI is a by‐product of OD modelling. The central objective in such analyses is often inference on the effect of covariates on the mean, in light of the apparent excess of zeros in the counts. Typically, the modelling of the excess zeros per se is a secondary objective, and there are choices to be made between, and within, the OD and ZI approaches. The contribution of this paper is primarily conceptual. We contrast, descriptively, the impact on zeros of the two approaches. We further offer a novel descriptive characterisation of alternative ZI models, including the classic hurdle and mixture models, by providing a unifying theoretical framework for their comparison. This in turn leads to a novel and technically simpler ZI model. We develop the underlying theory for univariate counts and touch on its implications for multivariate count data.
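To make the contrast between the two approaches concrete, here is a minimal Python sketch (not the authors' code, and not their new ZI model) of four standard univariate count distributions: the Poisson; a negative binomial as one common OD model (the NB2 parametrisation with size k is an assumption chosen here); the zero-inflated Poisson (ZIP) mixture; and the Poisson hurdle. All function names and parameter values are hypothetical and chosen only for illustration. Holding the mean at 2, the OD model puts extra mass at zero without any explicit zero component, while the ZIP mixture pushes the variance above the mean, echoing the point that each approach produces a by-product of the other.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson P(Y = y) with mean lam > 0, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def negbin_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial (NB2, an assumed OD model): mean mu, variance mu + mu**2 / k."""
    log_coef = math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
    return math.exp(log_coef + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))


def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson (mixture): a structural zero with probability pi,
    otherwise a Poisson(lam) draw, which may itself be zero."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def hurdle_poisson_pmf(y: int, lam: float, p0: float) -> float:
    """Poisson hurdle: P(Y = 0) = p0; positive counts follow a
    zero-truncated Poisson(lam)."""
    if y == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))


def mean_var(pmf, ymax: int = 100):
    """Mean and variance from a pmf truncated at ymax (ample for small means)."""
    probs = [pmf(y) for y in range(ymax + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mu, pi = 2.0, 0.3
    lam_zip = mu / (1.0 - pi)  # ZIP mean is (1 - pi) * lam, so this matches mu
    models = {
        "Poisson": lambda y: poisson_pmf(y, mu),
        "NegBin k=1": lambda y: negbin_pmf(y, mu, 1.0),
        "ZIP pi=0.3": lambda y: zip_pmf(y, lam_zip, pi),
    }
    for name, pmf in models.items():
        m, v = mean_var(pmf)
        print(f"{name:<11} P(0)={pmf(0):.3f}  mean={m:.3f}  var={v:.3f}")

    # A ZIP is a hurdle model whose zero probability is pi + (1 - pi) * exp(-lam).
    p0 = zip_pmf(0, lam_zip, pi)
    same = all(
        math.isclose(hurdle_poisson_pmf(y, lam_zip, p0), zip_pmf(y, lam_zip, pi))
        for y in range(30)
    )
    print("ZIP equals hurdle with matched p0:", same)
```

Both the negative binomial and the ZIP raise P(0) above the Poisson value e^{-2} and make the variance exceed the mean. The last check illustrates a standard identity: any ZIP can be written as a hurdle model with p0 = pi + (1 - pi)e^{-lam}, whereas the hurdle is more general because it also allows p0 below e^{-lam} (zero deflation).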

Suggested Citation

  • John Haslett & Andrew C. Parnell & John Hinde & Rafael de Andrade Moral, 2022. "Modelling Excess Zeros in Count Data: A New Perspective on Modelling Approaches," International Statistical Review, International Statistical Institute, vol. 90(2), pages 216-236, August.
  • Handle: RePEc:bla:istatr:v:90:y:2022:i:2:p:216-236
    DOI: 10.1111/insr.12479

    Download full text from publisher

    File URL: https://doi.org/10.1111/insr.12479

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christian Kleiber & Achim Zeileis, 2016. "Visualizing Count Data Regressions Using Rootograms," The American Statistician, Taylor & Francis Journals, vol. 70(3), pages 296-303, July.
    2. Soutik Ghosal & Timothy S. Lau & Jeremy Gaskins & Maiying Kong, 2020. "A hierarchical mixed effect hurdle model for spatiotemporal count data and its application to identifying factors impacting health professional shortages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(5), pages 1121-1144, November.
    3. Nan-Ting Liu & Feng-Chang Lin & Yu-Shan Shih, 2020. "Count regression trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 5-27, March.
    4. Livio Finos & Fortunato Pesarin, 2020. "On zero-inflated permutation testing and some related problems," Statistical Papers, Springer, vol. 61(5), pages 2157-2174, October.
    5. Bae, S. & Famoye, F. & Wulu, J.T. & Bartolucci, A.A. & Singh, K.P., 2005. "A rich family of generalized Poisson regression models with applications," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 69(1), pages 4-11.
    6. Cristian Roner & Claudia Di Caterina & Davide Ferrari, 2021. "Exponential Tilting for Zero-inflated Interval Regression with Applications to Cyber Security Survey Data," BEMPS - Bozen Economics & Management Paper Series BEMPS85, Faculty of Economics and Management at the Free University of Bozen.
    7. Ana María Martínez-Rodríguez & Antonio Conde-Sánchez & María José Olmo-Jiménez, 2019. "A new approach to truncated regression for count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 103(4), pages 503-526, December.
    8. Moritz Berger & Gerhard Tutz, 2021. "Transition models for count data: a flexible alternative to fixed distribution models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1259-1283, October.
    9. A. Baccini & L. Barabesi & M. Cioni & C. Pisani, 2014. "Crossing the hurdle: the determinants of individual scientific performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 2035-2062, December.
    10. Gozde Ozonder & Eric J. Miller, 2021. "Longitudinal analysis of activity generation in the Greater Toronto and Hamilton Area," Transportation, Springer, vol. 48(3), pages 1149-1183, June.
    11. Alberto Baccini & Lucio Barabesi & Martina Cioni & Caterina Pisani, 2013. "Crossing the hurdle: the determinants of individual scientific performance," Department of Economics University of Siena 691, Department of Economics, University of Siena.
    12. Olivier Finance & Clémentine Cottineau, 2019. "Are the absent always wrong? Dealing with zero values in urban scaling," Environment and Planning B, vol. 46(9), pages 1663-1677, November.
    13. Luiz Paulo Fávero & Joseph F. Hair & Rafael de Freitas Souza & Matheus Albergaria & Talles V. Brugni, 2021. "Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships," Mathematics, MDPI, vol. 9(10), pages 1-28, May.
    14. Boncinelli, Fabio & Bartolini, Fabio & Casini, Leonardo, 2018. "Structural factors of labour allocation for farm diversification activities," Land Use Policy, Elsevier, vol. 71(C), pages 204-212.
    15. Scott, Ryan P. & Scott, Tyler A., 2019. "Investing in collaboration for safety: Assessing grants to states for oil and gas distribution pipeline safety program enhancement," Energy Policy, Elsevier, vol. 124(C), pages 332-345.
    16. Bilgic, Abdulbaki & Florkowski, Wojciech J., 2003. "Truncated-At-Zero Count Data Models With Partial Observability: An Application To The Freshwater Fishing Demand In The Southeastern U.S," 2003 Annual Meeting, February 1-5, 2003, Mobile, Alabama 35185, Southern Agricultural Economics Association.
    17. Greene, William, 2007. "Functional Form and Heterogeneity in Models for Count Data," Foundations and Trends® in Econometrics, now publishers, vol. 1(2), pages 113-218, August.
    18. Gauss Cordeiro & Josemar Rodrigues & Mário Castro, 2012. "The exponential COM-Poisson distribution," Statistical Papers, Springer, vol. 53(3), pages 653-664, August.
    19. Cho, Daegon & Hwang, Youngdeok & Park, Jongwon, 2018. "More buzz, more vibes: Impact of social media on concert distribution," Journal of Economic Behavior & Organization, Elsevier, vol. 156(C), pages 103-113.
    20. Ali Arab, 2015. "Spatial and Spatio-Temporal Models for Modeling Epidemiological Data with Excess Zeros," IJERPH, MDPI, vol. 12(9), pages 1-13, August.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.