
The effect of data aggregation on dispersion estimates in count data models

Authors

  • Errington Adam
  • Einbeck Jochen
  • Cumming Jonathan

    (Department of Mathematical Sciences, Durham University, Durham, UK)

  • Rössler Ute
  • Endesfelder David

    (Bundesamt für Strahlenschutz (BfS), Oberschleissheim, Germany)

Abstract

For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation has received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used, for instance, in radiation biodosimetry for the calibration of dose-response curves.
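
The aggregation effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions of our own, not the authors' code, data or scenarios: cells are grouped into hypothetical samples that share a multiplicative gamma-distributed effect (standing in for "unexplained heterogeneity"), the model is intercept-only with a log link, and the quasi-Poisson dispersion is estimated by the Pearson chi-square statistic divided by the residual degrees of freedom. The names `simulate_and_fit`, `pearson_dispersion`, `sigma2` and `cells_per_sample` are hypothetical.

```python
import numpy as np


def pearson_dispersion(y, mu, n_params):
    """Quasi-Poisson dispersion estimate: Pearson chi-square / residual df."""
    return float(np.sum((y - mu) ** 2 / mu) / (y.size - n_params))


def simulate_and_fit(n_samples=50, cells_per_sample=200, rate=0.5,
                     sigma2=0.1, seed=1):
    rng = np.random.default_rng(seed)
    # Assumed heterogeneity model: one gamma effect per sample with mean 1
    # and variance sigma2 (sigma2 = 0 gives pure Poisson data).
    if sigma2 > 0:
        u = rng.gamma(shape=1.0 / sigma2, scale=sigma2, size=n_samples)
    else:
        u = np.ones(n_samples)
    # Raw data: one count per cell, shape (n_samples, cells_per_sample).
    raw = rng.poisson(lam=(rate * u)[:, None],
                      size=(n_samples, cells_per_sample))

    # Intercept-only Poisson fit on raw cell counts: lambda_hat = mean count.
    y_raw = raw.ravel()
    lam_raw = y_raw.mean()
    phi_raw = pearson_dispersion(y_raw, np.full(y_raw.size, lam_raw), 1)

    # Aggregated data: total count per sample, exposure = number of cells.
    y_agg = raw.sum(axis=1)
    exposure = np.full(n_samples, cells_per_sample)
    lam_agg = y_agg.sum() / exposure.sum()
    phi_agg = pearson_dispersion(y_agg, exposure * lam_agg, 1)

    # The Poisson standard error of log(lambda_hat) is 1/sqrt(total count)
    # in both fits; quasi-Poisson rescales it by sqrt(phi_hat).
    se_pois = 1.0 / np.sqrt(y_raw.sum())
    return {
        "lam_raw": lam_raw, "lam_agg": lam_agg,
        "phi_raw": phi_raw, "phi_agg": phi_agg,
        "se_raw": se_pois * np.sqrt(phi_raw),
        "se_agg": se_pois * np.sqrt(phi_agg),
    }


for s2 in (0.0, 0.1):
    print(s2, simulate_and_fit(sigma2=s2))
```

In both runs the rate estimate from the raw and the aggregated data is the same, since it depends only on the total count and the total exposure. Under pure Poisson data both dispersion estimates are close to 1. With the assumed gamma heterogeneity, the raw-data estimate is only about 1 + λσ², whereas the aggregated estimate is about 1 + mλσ² for m cells per sample, so the quasi-Poisson standard error grows after aggregation.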

Suggested Citation

  • Errington Adam & Einbeck Jochen & Cumming Jonathan & Rössler Ute & Endesfelder David, 2022. "The effect of data aggregation on dispersion estimates in count data models," The International Journal of Biostatistics, De Gruyter, vol. 18(1), pages 183-202, May.
  • Handle: RePEc:bpj:ijbist:v:18:y:2022:i:1:p:183-202:n:12
    DOI: 10.1515/ijb-2020-0079

    Download full text from publisher

    File URL: https://doi.org/10.1515/ijb-2020-0079
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maria Iannario, 2015. "Detecting latent components in ordinal data with overdispersion by means of a mixture distribution," Quality & Quantity: International Journal of Methodology, Springer, vol. 49(3), pages 977-987, May.
    2. Molenberghs, Geert & Verbeke, Geert & Iddi, Samuel & Demétrio, Clarice G.B., 2012. "A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data," Journal of Multivariate Analysis, Elsevier, vol. 111(C), pages 94-109.
    3. Steven Abrams & Marc Aerts & Geert Molenberghs & Niel Hens, 2017. "Parametric overdispersed frailty models for current status data," Biometrics, The International Biometric Society, vol. 73(4), pages 1388-1400, December.
    4. Aeberhard, William H. & Cantoni, Eva & Heritier, Stephane, 2017. "Saddlepoint tests for accurate and robust inference on overdispersed count data," Computational Statistics & Data Analysis, Elsevier, vol. 107(C), pages 162-175.
    5. Sami Mestiri & Abdeljelil Farhat, 2021. "Using Non-parametric Count Model for Credit Scoring," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 19(1), pages 39-49, March.
    6. I. Gijbels & I. Prosdocimi & G. Claeskens, 2010. "Nonparametric estimation of mean and dispersion functions in extended generalized linear models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 19(3), pages 580-608, November.
    7. I. Gijbels & I. Prosdocimi, 2011. "Smooth estimation of mean and dispersion function in extended generalized additive models with application to Italian induced abortion data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(11), pages 2391-2411, December.
    8. Iddi, Samuel & Molenberghs, Geert, 2012. "A combined overdispersed and marginalized multilevel model," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1944-1951.
    9. Cory Anderson & Shuai Zhou & Guangqing Chi, 2023. "Population-Wide Vaccination Hesitancy among the Amish: A County-Level Study of COVID-19 Vaccine Adoption and Implications for Public Health Policy and Practice," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 42(4), pages 1-24, August.
    10. Iraj Kazemi & Fatemeh Hassanzadeh, 2021. "Marginalized random-effects models for clustered binomial data through innovative link functions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 197-228, June.
    11. Croux, C. & Gijbels, I. & Prosdocimi, I., 2010. "Robust Estimation of Mean and Dispersion Functions in Extended Generalized Additive Models," Other publications TiSEM a188c2bc-8a96-44c9-b1e6-0, Tilburg University, School of Economics and Management.
    12. Borges, Patrick & Rodrigues, Josemar & Balakrishnan, Narayanaswamy & Bazán, Jorge, 2014. "A COM–Poisson type generalization of the binomial distribution and its properties and applications," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 158-166.
    13. Nasim Vahabi & Anoshirvan Kazemnejad & Somnath Datta, 2018. "A Marginalized Overdispersed Location Scale Model for Clustered Ordinal Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 103-134, December.
    14. Jussiane Nader Gonçalves & Wagner Barreto-Souza, 2020. "Flexible regression models for counts with high-inflation of zeros," METRON, Springer;Sapienza Università di Roma, vol. 78(1), pages 71-95, April.
    15. Aregay, Mehreteab & Shkedy, Ziv & Molenberghs, Geert, 2013. "A hierarchical Bayesian approach for the analysis of longitudinal count data with overdispersion: A simulation study," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 233-245.
    16. Oludare Ariyo & Emmanuel Lesaffre & Geert Verbeke & Adrian Quintero, 2022. "Bayesian Model Selection for Longitudinal Count Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 516-547, November.
    17. William H. Greene & David A. Hensher, 2008. "Modeling Ordered Choices: A Primer and Recent Developments," Working Papers 08-26, New York University, Leonard N. Stern School of Business, Department of Economics.
    18. Lee, Dae-Jin & Durbán, María, 2008. "Smooth-car mixed models for spatial count data," DES - Working Papers. Statistics and Econometrics. WS ws085820, Universidad Carlos III de Madrid. Departamento de Estadística.
    19. Rahma Abid & Célestin C. Kokonendji & Afif Masmoudi, 2021. "On Poisson-exponential-Tweedie models for ultra-overdispersed count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(1), pages 1-23, March.
    20. Adrián Quintero-Sarmiento & Edilberto Cepeda-Cuervo & Vicente Núñez-Antón, 2012. "Estimating infant mortality in Colombia: some overdispersion modelling approaches," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1011-1036, October.
