Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach


  • Björn Stollenwerk (Helmholtz Zentrum München (GmbH))

  • Thomas Welchowski (Helmholtz Zentrum München (GmbH); Universitätsklinikum Bonn)

  • Matthias Vogl (Helmholtz Zentrum München (GmbH))

  • Stephanie Stock (University of Cologne)


Abstract

Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, we applied it to aggregated German sickness fund data from 1999, covering over 7.3 million insured persons, in the context of chronic lung disease. To assess the gain in numerical efficiency, we compared the computational time of the proposed approach with that of corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the proposed method was reasonably fast (19 min). In contrast, with patient-level data, computational time increased disproportionately with sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80% for 6 million subjects). The gain in computational efficiency of the proposed COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.
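The five steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a weighted linear regression stands in for the GAM so that the example runs with the Python standard library only, and the data, the covariates (an age decade centred at 50 and a prevalence indicator), the cost parameters, and all function names are hypothetical. What it shows is why aggregation helps: when covariates are constant within a cell, a fit on cell means weighted by cell counts reproduces the individual-level fit, and the residual variance needed for the uncertainty step can be recovered from per-cell sums and sums of squares.

```python
import random
from collections import defaultdict


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def wls(rows, ys, ws):
    """Weighted least squares; returns the coefficients and the matrix X'WX."""
    p = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(p)]
    return solve(xtwx, xtwy), xtwx


def simulate(n, seed=1):
    """Hypothetical subjects: (age decade centred at 50, prevalent 0/1, cost)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = rng.randint(-3, 3)
        d = 1 if rng.random() < 0.1 else 0
        data.append((g, d, 2000 + 200 * g + 1500 * d + rng.gauss(0, 300)))
    return data


def aggregate(data):
    """Step 1: collapse subjects into cells holding n, sum and sum of squares."""
    cells = defaultdict(lambda: [0, 0.0, 0.0])
    for g, d, cost in data:
        cell = cells[(g, d)]
        cell[0] += 1
        cell[1] += cost
        cell[2] += cost * cost
    return dict(cells)


def coi_from_cells(cells):
    """Steps 2 to 5, using the aggregated cells only."""
    keys = sorted(cells)
    rows = [[1.0, float(g), float(d)] for g, d in keys]
    ns = [cells[k][0] for k in keys]
    means = [cells[k][1] / cells[k][0] for k in keys]

    # Step 2: fit on cell means, weighted by cell size (stand-in for the GAM).
    beta, xtwx = wls(rows, means, ns)

    # Step 3: predicted cost for any covariate row.
    def pred(row):
        return sum(b * x for b, x in zip(beta, row))

    # Step 4: predicted cost with minus without the disease, over prevalent cells.
    excess = sum(n * (pred([1.0, g, 1.0]) - pred([1.0, g, 0.0]))
                 for (g, d), n in zip(keys, ns) if d == 1)

    # Step 5: residual variance from the cell sums, propagated to the total.
    rss = sum(cells[k][2] - 2 * pred(r) * cells[k][1] + cells[k][0] * pred(r) ** 2
              for k, r in zip(keys, rows))
    sigma2 = rss / (sum(ns) - len(beta))
    var_prev = sigma2 * solve(xtwx, [0.0, 0.0, 1.0])[2]
    n_prev = sum(n for (g, d), n in zip(keys, ns) if d == 1)
    return beta, excess, n_prev * var_prev ** 0.5


data = simulate(20000)
cells = aggregate(data)
beta, excess, se = coi_from_cells(cells)
print(len(data), "subjects collapse into", len(cells), "cells")
print("excess cost:", round(excess), "+/-", round(1.96 * se))
```

In this linear stand-in the excess cost is simply the number of prevalent subjects times the prevalence coefficient, so the error propagation in step 5 reduces to scaling that coefficient's standard error. With a smooth, nonlinear model the same step would need the full covariance of the predictions.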

Suggested Citation

  • Björn Stollenwerk & Thomas Welchowski & Matthias Vogl & Stephanie Stock, 2016. "Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 17(3), pages 235-244, April.
  • Handle: RePEc:spr:eujhec:v:17:y:2016:i:3:d:10.1007_s10198-015-0667-z
    DOI: 10.1007/s10198-015-0667-z

    More about this item


    Keywords: Cost-of-illness; Massive data; Generalized additive models; Error propagation

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
    • C5 - Mathematical and Quantitative Methods - - Econometric Modeling

