Printed from https://ideas.repec.org/p/bsl/wpaper/2014-13.html

Visualizing Count Data Regressions Using Rootograms

Author

Listed:
  • Kleiber, Christian

    (University of Basel)

  • Zeileis, Achim

Abstract

The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here we show that rootograms are also useful for diagnosing and treating issues such as overdispersion and/or excess zeros in regression models for count data. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, e.g., in finite mixture models. Two empirical illustrations are included, one from ethology, the other from public health. The former employs a negative binomial hurdle regression, the latter a two-component finite mixture of negative binomial models. A further illustration involving underdispersion and an R implementation of our tools are available in the R package 'countreg'.
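As a rough illustration of the idea in the abstract, the sketch below computes the quantities behind a hanging rootogram for a Poisson model: the square roots of observed and expected frequencies for each count, with each bar hung from the expected value. It is not the authors' implementation (that is the R package 'countreg'); the function names `poisson_pmf` and `rootogram_table`, the Poisson assumption, and the toy data are all hypothetical choices made for this example. The optional `weights` argument is one plausible reading of the weighted version mentioned above, where both observed and expected frequencies are weighted sums over observations.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of count k given mean mu (mu must be > 0)."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def rootogram_table(y, mu, weights=None, max_count=None):
    """Return one row per count j = 0..max_count for a hanging rootogram.

    y       : observed counts
    mu      : fitted Poisson means, one per observation
    weights : optional observation weights (assumed form of the weighted
              rootogram; defaults to 1 for every observation)

    Each row is (j, sqrt_observed, sqrt_expected, bar_bottom), where the
    bar hangs from sqrt_expected down to bar_bottom. A bar_bottom below
    zero means the model underpredicts count j; above zero, it overpredicts.
    """
    if weights is None:
        weights = [1.0] * len(y)
    if max_count is None:
        max_count = max(y)

    rows = []
    for j in range(max_count + 1):
        # Observed frequency: (weighted) number of observations equal to j.
        observed = sum(w for yi, w in zip(y, weights) if yi == j)
        # Expected frequency: (weighted) sum of fitted probabilities of j.
        expected = sum(w * poisson_pmf(j, m) for m, w in zip(mu, weights))
        sqrt_obs = math.sqrt(observed)
        sqrt_exp = math.sqrt(expected)
        rows.append((j, sqrt_obs, sqrt_exp, sqrt_exp - sqrt_obs))
    return rows


# Toy data with many zeros and one large count. For an intercept-only
# Poisson model the fitted mean is the sample mean for every observation.
y = [0, 0, 0, 0, 1, 1, 2, 5]
mu_hat = [sum(y) / len(y)] * len(y)

for j, sqrt_obs, sqrt_exp, bottom in rootogram_table(y, mu_hat):
    print(f"count={j}  sqrt(obs)={sqrt_obs:.3f}  sqrt(exp)={sqrt_exp:.3f}  bar bottom={bottom:.3f}")
```

Working on the square-root scale keeps deviations at small expected frequencies visible, which is why bars crossing the zero line at count 0 or in the right tail are a quick visual hint of excess zeros or overdispersion.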

Suggested Citation

  • Kleiber, Christian & Zeileis, Achim, 2014. "Visualizing Count Data Regressions Using Rootograms," Working papers 2014/13, Faculty of Business and Economics - University of Basel.
  • Handle: RePEc:bsl:wpaper:2014/13

    Download full text from publisher

    File URL: https://edoc.unibas.ch/42891/1/Rootograms_Kleiber_2014.13_final.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Cameron,A. Colin & Trivedi,Pravin K., 2013. "Regression Analysis of Count Data," Cambridge Books, Cambridge University Press, number 9781107667273.
    2. Zeileis, Achim & Kleiber, Christian & Jackman, Simon, 2008. "Regression Models for Count Data in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(i08).
    3. Mullahy, John, 1997. "Heterogeneity, Excess Zeros, and the Structure of Count Data Models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 12(3), pages 337-350, May-June.
    4. Mullahy, John, 1986. "Specification and testing of some modified count data models," Journal of Econometrics, Elsevier, vol. 33(3), pages 341-365, December.
    5. Deb, Partha & Trivedi, Pravin K, 1997. "Demand for Medical Care by the Elderly: A Finite Mixture Approach," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 12(3), pages 313-336, May-June.
    6. Fox, John & Hong, Jangman, 2009. "Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 32(i01).
    7. Stasinopoulos, D. Mikis & Rigby, Robert A., 2007. "Generalized Additive Models for Location Scale and Shape (GAMLSS) in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 23(i07).
    8. Fox, John, 2003. "Effect Displays in R for Generalised Linear Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 8(i15).
    9. Leisch, Friedrich, 2004. "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i08).
    10. Grün, Bettina & Leisch, Friedrich, 2008. "FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i04).
    11. Cameron, A. Colin & Trivedi, Pravin K., 1990. "Regression-based tests for overdispersion in the Poisson model," Journal of Econometrics, Elsevier, vol. 46(3), pages 347-364, December.
    12. R. A. Rigby & D. M. Stasinopoulos, 2005. "Generalized additive models for location, scale and shape," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 54(3), pages 507-554, June.

    Citations


    Cited by:

    1. Balakrishnan, Srijith & Lim, Taehoon & Zhang, Zhanmin, 2022. "A methodology for evaluating the economic risks of hurricane-related disruptions to port operations," Transportation Research Part A: Policy and Practice, Elsevier, vol. 162(C), pages 58-79.
    2. Lagona, Francesco & Padovano, Fabio, 2021. "How does legislative behavior change when the country becomes democratic? The case of South Korea," European Journal of Political Economy, Elsevier, vol. 69(C).
    3. Cornelia Fuetterer & Thomas Augustin & Christiane Fuchs, 2020. "Adapted single-cell consensus clustering (adaSC3)," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(4), pages 885-896, December.
    4. Yaşar Tonta & Müge Akbulut, 2020. "Does monetary support increase citation impact of scholarly papers?," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1617-1641, November.
    5. Chiara Bocci & Laura Grassini & Emilia Rocco, 2021. "A multiple inflated negative binomial hurdle regression model: analysis of the Italians’ tourism behaviour during the Great Recession," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1109-1133, October.
    6. Adrian Richter & Julia Truthmann & Jean-François Chenot & Carsten Oliver Schmidt, 2021. "Predicting Physician Consultations for Low Back Pain Using Claims Data and Population-Based Cohort Data—An Interpretable Machine Learning Approach," IJERPH, MDPI, vol. 18(22), pages 1-14, November.
    7. Cornelius Fritz & Göran Kauermann, 2022. "On the interplay of regional mobility, social connectedness and the spread of COVID‐19 in Germany," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 400-424, January.
    8. Candelon, Bertrand & Joëts, Marc & Mignon, Valérie, 2023. "What Makes Econometric Ideas Popular: The Role of Connectivity," LIDAM Discussion Papers LFIN 2023005, Université catholique de Louvain, Louvain Finance (LFIN).
    9. Gozde Ozonder & Eric J. Miller, 2021. "Longitudinal analysis of activity generation in the Greater Toronto and Hamilton Area," Transportation, Springer, vol. 48(3), pages 1149-1183, June.
    10. Virgili, Auriane & Racine, Mélanie & Authier, Matthieu & Monestiez, Pascal & Ridoux, Vincent, 2017. "Comparison of habitat models for scarcely detected species," Ecological Modelling, Elsevier, vol. 346(C), pages 88-98.
    11. Marcelo Bourguignon & Rodrigo M. R. Medeiros, 2022. "A simple and useful regression model for fitting count data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(3), pages 790-827, September.
    12. Bilal Barakat, 2017. "Generalised count distributions for modelling parity," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 36(26), pages 745-758.
    13. Brutti, Zelda & Montolio, Daniel, 2021. "Preventing criminal minds: Early education access and adult offending behavior," Journal of Economic Behavior & Organization, Elsevier, vol. 191(C), pages 97-126.
    14. Thorsten Simon & Georg J. Mayr & Nikolaus Umlauf & Achim Zeileis, 2018. "Lightning Prediction Using Model Output Statistics," Working Papers 2018-14, Faculty of Economics and Statistics, Universität Innsbruck.
    15. Evangelos Papadias & Vassilis Detsis & Antonis Hadjikyriacou & Apostolos G. Papadopoulos & Christoforos Vradis & Christos Chalkias, 2023. "Long-Term Dynamics of Viticultural Landscape in Cyprus—Four Centuries of Expansion, Contraction and Spatial Displacement," Land, MDPI, vol. 12(6), pages 1-23, May.
    16. Francesco Lagona & Fabio Padovano, 2020. "How does legislative behavior change when the country becomes democratic? The case of South Korea," Economics Working Paper from Condorcet Center for political Economy at CREM-CNRS 2020-02-ccr, Condorcet Center for political Economy.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zeileis, Achim & Kleiber, Christian & Jackman, Simon, 2008. "Regression Models for Count Data in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(i08).
    2. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    3. Ana María Martínez-Rodríguez & Antonio Conde-Sánchez & María José Olmo-Jiménez, 2019. "A new approach to truncated regression for count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 103(4), pages 503-526, December.
    4. Moritz Berger & Gerhard Tutz, 2021. "Transition models for count data: a flexible alternative to fixed distribution models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1259-1283, October.
    5. John Haslett & Andrew C. Parnell & John Hinde & Rafael de Andrade Moral, 2022. "Modelling Excess Zeros in Count Data: A New Perspective on Modelling Approaches," International Statistical Review, International Statistical Institute, vol. 90(2), pages 216-236, August.
    6. repec:jss:jstsof:27:i08 is not listed on IDEAS
    7. Stefano Mainardi, 2003. "Testing convergence in life expectancies: count regression models on panel data," Prague Economic Papers, Prague University of Economics and Business, vol. 2003(4), pages 350-370.
    8. Livio Finos & Fortunato Pesarin, 2020. "On zero-inflated permutation testing and some related problems," Statistical Papers, Springer, vol. 61(5), pages 2157-2174, October.
    9. Bach, Philipp & Farbmacher, Helmut & Spindler, Martin, 2018. "Semiparametric count data modeling with an application to health service demand," Econometrics and Statistics, Elsevier, vol. 8(C), pages 125-140.
    10. Sisira Sarma & Wayne Simpson, 2006. "A microeconometric analysis of Canadian health care utilization," Health Economics, John Wiley & Sons, Ltd., vol. 15(3), pages 219-239, March.
    11. Margarita E. Romero Rodríguez & Enrique Los Arcos & Victor Cano Fernández & Miguel Sánchez Padrón, 2001. "Modelo para datos de recuentro de corte transversal con exceso de ceros. Aplicación a citas patentes," Documentos de trabajo conjunto ULL-ULPGC 2001-05, Facultad de Ciencias Económicas de la ULPGC.
    12. Valérie Mignon & Marc Joëts & Bertrand Candelon, 2023. "What Makes Econometric Ideas Popular: The Role of Connectivity," Working Papers hal-04343996, HAL.
    13. Nan-Ting Liu & Feng-Chang Lin & Yu-Shan Shih, 2020. "Count regression trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 5-27, March.
    14. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    15. Yixuan Zou & Jan Hannig & Derek S. Young, 2021. "Generalized fiducial inference on the mean of zero-inflated Poisson and Poisson hurdle models," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-15, December.
    16. Sergi Jiménez‐Martín & José M. Labeaga & Maite Martínez‐Granado, 2002. "Latent class versus two‐part models in the demand for physician services across the European Union," Health Economics, John Wiley & Sons, Ltd., vol. 11(4), pages 301-321, June.
    17. repec:jss:jstsof:36:i07 is not listed on IDEAS
    18. Gozde Ozonder & Eric J. Miller, 2021. "Longitudinal analysis of activity generation in the Greater Toronto and Hamilton Area," Transportation, Springer, vol. 48(3), pages 1149-1183, June.
    19. Olivier Finance & Clémentine Cottineau, 2019. "Are the absent always wrong? Dealing with zero values in urban scaling," Environment and Planning B, vol. 46(9), pages 1663-1677, November.
    20. Francesco Zuniga & Tomasz J. Kozubowski & Anna K. Panorska, 2021. "A new trivariate model for stochastic episodes," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-21, December.
    21. Yixuan Wang & Jianzhu Li & Ping Feng & Rong Hu, 2015. "A Time-Dependent Drought Index for Non-Stationary Precipitation Series," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(15), pages 5631-5647, December.
    22. Luiz Paulo Fávero & Joseph F. Hair & Rafael de Freitas Souza & Matheus Albergaria & Talles V. Brugni, 2021. "Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships," Mathematics, MDPI, vol. 9(10), pages 1-28, May.

    More about this item

    Keywords

    rootogram; visualization; goodness of fit; count data; Poisson regression; negative binomial regression; hurdle model; finite mixture

    JEL classification:

    • C25 - Mathematical and Quantitative Methods > Single Equation Models; Single Variables > Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C52 - Mathematical and Quantitative Methods > Econometric Modeling > Model Evaluation, Validation, and Selection
    • C87 - Mathematical and Quantitative Methods > Data Collection and Data Estimation Methodology; Computer Programs > Econometric Software

