
Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets

Author

Listed:
  • Nhung Nghiem (University of Otago)
  • June Atkinson (University of Otago)
  • Binh P. Nguyen (Victoria University of Wellington)
  • An Tran-Duy (Melbourne School of Population and Global Health, University of Melbourne)
  • Nick Wilson (University of Otago)

Abstract

Objectives: To optimise planning of public health services, the impact of high-cost users needs to be considered. However, most existing statistical models of costs omit many of the clinical and social variables that are associated with elevated health care resource use and are increasingly available in administrative data. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD).

Methods: We used nationally representative linked datasets in New Zealand to predict which prevalent CVD cases had the highest costs, that is, those in the top quintiles by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbours (KNN) and random forest) with that of traditional regression models.

Results: The machine learning models were far more accurate than the logistic models in predicting high health-cost users. The F1 score (the harmonic mean of sensitivity and positive predictive value) of the machine learning models ranged from 30.6% to 41.2%, compared with 8.6–9.1% for the logistic models. Previous health costs, income, age, chronic health conditions, deprivation, and receiving a social security benefit were among the most important predictors of high-cost users with CVD.

Conclusions: This study provides additional evidence that machine learning, together with big data, can be used in health economics to identify new risk factors and to predict high-cost users with CVD. As such, machine learning may assist with health services planning and preventive measures to improve population health while potentially saving healthcare costs.
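The F1 score reported in the Results combines sensitivity and positive predictive value into one number. The sketch below shows the standard formula in Python. The function names and the input values are hypothetical illustrations, not code or figures from the study.

```python
def f1_from_rates(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and positive predictive value (precision)."""
    if sensitivity + ppv == 0:
        return 0.0  # both rates are zero, so F1 is defined as zero here
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Equivalent form using true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical values for illustration only (not results from the paper):
# a model that finds 30% of true high-cost users, with 60% of its flags correct.
print(f"F1 from rates:  {f1_from_rates(sensitivity=0.30, ppv=0.60):.3f}")
print(f"F1 from counts: {f1_from_counts(tp=30, fp=20, fn=70):.3f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a model cannot reach a high F1 by doing well on only one of sensitivity or positive predictive value.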

Suggested Citation

  • Nhung Nghiem & June Atkinson & Binh P. Nguyen & An Tran-Duy & Nick Wilson, 2023. "Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets," Health Economics Review, Springer, vol. 13(1), pages 1-13, December.
  • Handle: RePEc:spr:hecrev:v:13:y:2023:i:1:d:10.1186_s13561-023-00422-1
    DOI: 10.1186/s13561-023-00422-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1186/s13561-023-00422-1
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marica Iommi & Savannah Bergquist & Gianluca Fiorentini & Francesco Paolucci, 2022. "Comparing risk adjustment estimation methods under data availability constraints," Health Economics, John Wiley & Sons, Ltd., vol. 31(7), pages 1368-1380, July.
    2. Anell, Anders & Dackehag, Margareta & Dietrichson, Jens & Ellegård, Lina Maria & Kjellsson, Gustav, 2022. "Better Off by Risk Adjustment? Socioeconomic Disparities in Care Utilization in Sweden Following a Payment Reform," Working Papers 2022:15, Lund University, Department of Economics, revised 12 Mar 2024.
    3. Richard C. van Kleef & René C. J. A. van Vliet, 2022. "How to deal with persistently low/high spenders in health plan payment systems?," Health Economics, John Wiley & Sons, Ltd., vol. 31(5), pages 784-805, May.

    More about this item

    Keywords

    Machine learning; High-cost users; CVD cost prediction; Health and social administrative data; New Zealand;

    JEL classification:

    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • I15 - Health, Education, and Welfare - - Health - - - Health and Economic Development
    • N37 - Economic History - - Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy - - - Africa; Oceania
