Printed from https://ideas.repec.org/a/plo/pone00/0279540.html

The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data

Authors
  • Benedikt Langenberger
  • Timo Schulte
  • Oliver Groene

Abstract

Our aim was to predict future high-cost patients with machine learning using healthcare claims data. We applied a random forest (RF), a gradient boosting machine (GBM), an artificial neural network (ANN) and a logistic regression (LR) to predict high-cost patients in the following year. To this end, we used routinely collected sickness fund claims and cost data from the years 2016, 2017 and 2018. Various specifications of each algorithm were trained and cross-validated on training data (n = 20,984) with claims and cost data from 2016 and outcomes from 2017. The best-performing specification of each algorithm was selected based on its performance on the validation data. For performance comparison, the selected models were applied to unseen test data with features from 2017 and outcomes from 2018 (n = 21,146). Measured by the area under the receiver operating characteristic curve (AUC), the RF performed best, with an AUC of 0.883 (95% confidence interval (CI): 0.872–0.893) on the test data, followed by the GBM (AUC = 0.878; 95% CI: 0.867–0.889). The GBM and the RF significantly outperformed the ANN (AUC = 0.846; 95% CI: 0.834–0.857) and the LR (AUC = 0.839; 95% CI: 0.826–0.852). All ML algorithms and the LR performed ‘good’ (i.e., 0.8 ≤ AUC < 0.9). Using routinely collected sickness fund claims and cost data, we developed machine learning models that predict high-cost patients with ‘good’ performance. Tree-based models performed best and outperformed the ANN and the LR.
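The abstract compares models by test-set AUC with 95% confidence intervals and calls 0.8 ≤ AUC < 0.9 ‘good’. As an illustration only, the sketch below computes an AUC with the rank-sum (Mann-Whitney U) identity and a percentile-bootstrap CI in plain Python. This page does not state how the authors computed their intervals, so the bootstrap is an assumption, and the function names and the 0/1 label encoding are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used for the point estimate.

```python
import random
from typing import List, Sequence, Tuple


def auc_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = high-cost patient, 0 = not high-cost (hypothetical encoding).
    scores: predicted probability (or any score) of being high-cost.
    Tied scores receive their average rank, so ties count as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Find the block order[i..j] of tied scores.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap CI for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            aucs.append(auc_rank(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def is_good(auc: float) -> bool:
    """'Good' discrimination as defined in the abstract: 0.8 <= AUC < 0.9."""
    return 0.8 <= auc < 0.9


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc_rank(y, p))  # 0.75
    # Test-set AUCs reported in the abstract; all four fall in the 'good' band.
    for name, auc in [("RF", 0.883), ("GBM", 0.878), ("ANN", 0.846), ("LR", 0.839)]:
        print(name, auc, is_good(auc))
```

The rank-based formula runs in O(n log n), so it scales to test sets of the size reported here (n = 21,146), although 1,000 bootstrap resamples in pure Python will still take a while.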

Suggested Citation

  • Benedikt Langenberger & Timo Schulte & Oliver Groene, 2023. "The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data," PLOS ONE, Public Library of Science, vol. 18(1), pages 1-16, January.
  • Handle: RePEc:plo:pone00:0279540
    DOI: 10.1371/journal.pone.0279540

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0279540
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0279540&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0279540?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Eric French & Elaine Kelly & Pieter Bakx & Owen O'Donnell & Eddy Doorslaer, 2016. "Spending on Health Care in the Netherlands: Not Going So Dutch," Fiscal Studies, Institute for Fiscal Studies, vol. 37, pages 593-625, September.
    2. Muchlinski, David & Siroky, David & He, Jingrui & Kocher, Matthew, 2016. "Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data," Political Analysis, Cambridge University Press, vol. 24(1), pages 87-103, January.
    3. Steven B. Cohen, 2016. "The concentration of health care expenditures in the U.S. and predictions of future spending," Journal of Economic and Social Measurement, IOS Press, issue 2, pages 167-189.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bakx, Pieter & Wouterse, Bram & van Doorslaer, Eddy & Wong, Albert, 2020. "Better off at home? Effects of nursing home eligibility on costs, hospitalizations and survival," Journal of Health Economics, Elsevier, vol. 73(C).
    2. Songul Cinaroglu, 2020. "Modelling unbalanced catastrophic health expenditure data by using machine‐learning methods," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 27(4), pages 168-181, October.
    3. Hua Chen & Xiaobo Peng & Menghan Shen, 2021. "Concentration and Persistence of Healthcare Spending: Evidence from China," Sustainability, MDPI, vol. 13(11), pages 1-17, May.
    4. David Siroky & Carolyn M. Warner & Gabrielle Filip-Crawford & Anna Berlin & Steven L. Neuberg, 2020. "Grievances and rebellion: Comparing relative deprivation and horizontal inequality," Conflict Management and Peace Science, Peace Science Society (International), vol. 37(6), pages 694-715, November.
    5. Bonekamp, Johan & Wouterse, Bram, 2023. "Do different shocks in health matter for wealth?," Journal of Health Economics, Elsevier, vol. 87(C).
    6. Friedrich Breyer & Normann Lorenz, 2021. "The “red herring” after 20 years: ageing and health care expenditures," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 22(5), pages 661-667, July.
    7. Zhaochen He & John Camobreco & Keith Perkins, 2022. "How he won: Using machine learning to understand Trump’s 2016 victory," Journal of Computational Social Science, Springer, vol. 5(1), pages 905-947, May.
    8. Kárpáti, Daniel, 2023. "Essays in finance & health," Other publications TiSEM 5505e140-1f4d-4f61-a5a5-e, Tilburg University, School of Economics and Management.
    9. Marie K. Schellens & Salim Belyazid, 2020. "Revisiting the Contested Role of Natural Resources in Violent Conflict Risk through Machine Learning," Sustainability, MDPI, vol. 12(16), pages 1-29, August.
    10. Kallestrup-Lamb, Malene & Marin, Alexander O.K. & Menon, Seetha & Søgaard, Jes, 2024. "Aging populations and expenditures on health," The Journal of the Economics of Ageing, Elsevier, vol. 29(C).
    11. Julio López Laborda & Carmen Marín González & Jorge Onrubia, 2020. "Observatorio sobre el reparto de los impuestos y las prestaciones entre los hogares españoles. Quinto informe – Sanidad y educación, 2013 - 2017," Studies on the Spanish Economy eee2020-28, FEDEA.
    12. Felix Ettensperger, 2020. "Comparing supervised learning algorithms and artificial neural networks for conflict prediction: performance and applicability of deep learning in the field," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(2), pages 567-601, April.
    13. Macis, Luca & Tagliapietra, Marco & Meo, Rosa & Pisano, Paola, 2024. "Breaking the trend: Anomaly detection models for early warning of socio-political unrest," Technological Forecasting and Social Change, Elsevier, vol. 206(C).
    14. Antonietta di Salvatore & Mirko Moscatelli, 2024. "Improving survey information on household debt using granular credit databases," Questioni di Economia e Finanza (Occasional Papers) 839, Bank of Italy, Economic Research and International Relations Area.
    15. Vestby, Jonas & Buhaug, Halvard & von Uexkull, Nina, 2021. "Why do some poor countries see armed conflict while others do not? A dual sector approach," World Development, Elsevier, vol. 138(C).
    16. Breyer, Friedrich & Lorenz, Normann & Ihle, Peter, 2020. "Aging and Health Care Expenditure: A non-parametric approach," VfS Annual Conference 2020 (Virtual Conference): Gender Economics 224635, Verein für Socialpolitik / German Economic Association.
    17. Miszczyńska Katarzyna M. & Miszczyński Piotr M., 2020. "Inpatient Costs in the Perspective of Polish Health Policy: Scenario Analysis," South East European Journal of Economics and Business, Sciendo, vol. 15(2), pages 43-56, December.
    18. Stefano Benati & Matteo Bon & Filippo Nardi, 2025. "Exploring the predictors of the populist vote using random forests," Quality & Quantity: International Journal of Methodology, Springer, vol. 59(2), pages 1393-1426, April.
    19. Ku, Arthur Lin & Qiu, Yueming (Lucy) & Lou, Jiehong & Nock, Destenie & Xing, Bo, 2022. "Changes in hourly electricity consumption under COVID mandates: A glance to future hourly residential power consumption pattern with remote work in Arizona," Applied Energy, Elsevier, vol. 310(C).
    20. Gallego, Jorge & Rivero, Gonzalo & Martínez, Juan, 2021. "Preventing rather than punishing: An early warning model of malfeasance in public procurement," International Journal of Forecasting, Elsevier, vol. 37(1), pages 360-377.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0279540. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by reporting it through the CitEc form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the provider: plosone. General contact details of provider: https://journals.plos.org/plosone/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.