The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data

Author

Listed:
  • Benedikt Langenberger
  • Timo Schulte
  • Oliver Groene

Abstract

Our aim was to predict future high-cost patients with machine learning using healthcare claims data. We applied a random forest (RF), a gradient boosting machine (GBM), an artificial neural network (ANN) and a logistic regression (LR) to predict high-cost patients in the following year. To this end, we used routinely collected sickness fund claims and cost data from the years 2016, 2017 and 2018. Various specifications of each algorithm were trained and cross-validated on training data (n = 20,984) with claims and cost data from 2016 and outcomes from 2017. The best-performing specification of each algorithm was selected based on validation dataset performance. For performance comparison, the selected models were applied to unseen data with features from 2017 and outcomes from 2018 (n = 21,146). The RF was the best-performing algorithm as measured by the area under the receiver operating characteristic curve (AUC), with a value of 0.883 (95% confidence interval (CI): 0.872–0.893) on test data, followed by the GBM (AUC = 0.878; 95% CI: 0.867–0.889). The ANN (AUC = 0.846; 95% CI: 0.834–0.857) and LR (AUC = 0.839; 95% CI: 0.826–0.852) were significantly outperformed by the GBM and the RF. All ML algorithms and the LR achieved 'good' performance (i.e. 0.9 > AUC ≥ 0.8). We were able to develop machine learning models that predict high-cost patients with 'good' performance using routinely collected sickness fund claims and cost data. We found that tree-based models performed best and outperformed the ANN and LR.
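
The evaluation metric in the abstract can be illustrated with a minimal sketch. It computes the AUC from labels and predicted scores, attaches a percentile-bootstrap 95% interval, and checks the 'good' band (0.9 > AUC ≥ 0.8) quoted above. The function names, the toy data and the choice of a percentile bootstrap are assumptions made for illustration only; the abstract does not state how the authors computed their confidence intervals, and this is not their code.

```python
import random


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with only one class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def is_good(auc_value):
    """'Good' performance band as defined in the abstract: 0.9 > AUC >= 0.8."""
    return 0.8 <= auc_value < 0.9


# Hypothetical toy data: 1 = high-cost patient in the following year.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.5, 0.9]

toy_auc = auc(toy_labels, toy_scores)
print(toy_auc)  # 0.875 (14 of 16 pairs ranked correctly)
print(is_good(toy_auc))  # True
print(bootstrap_ci(toy_labels, toy_scores))
```

The pairwise AUC above is quadratic in the sample size, which is fine for a toy example; at the sample sizes reported in the abstract a rank-based or library implementation would be the practical choice.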

Suggested Citation

  • Benedikt Langenberger & Timo Schulte & Oliver Groene, 2023. "The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data," PLOS ONE, Public Library of Science, vol. 18(1), pages 1-16, January.
  • Handle: RePEc:plo:pone00:0279540
    DOI: 10.1371/journal.pone.0279540

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0279540
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0279540&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0279540?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Eric French & Elaine Kelly & Pieter Bakx & Owen O'Donnell & Eddy Doorslaer, 2016. "Spending on Health Care in the Netherlands: Not Going So Dutch," Fiscal Studies, Institute for Fiscal Studies, vol. 37, pages 593-625, September.
    2. Muchlinski, David & Siroky, David & He, Jingrui & Kocher, Matthew, 2016. "Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data," Political Analysis, Cambridge University Press, vol. 24(1), pages 87-103, January.
    3. Cohen, Steven B., 2016. "The concentration of health care expenditures in the U.S. and predictions of future spending," Journal of Economic and Social Measurement, IOS Press, issue 2, pages 167-189.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bakx, Pieter & Wouterse, Bram & van Doorslaer, Eddy & Wong, Albert, 2020. "Better off at home? Effects of nursing home eligibility on costs, hospitalizations and survival," Journal of Health Economics, Elsevier, vol. 73(C).
    2. Songul Cinaroglu, 2020. "Modelling unbalanced catastrophic health expenditure data by using machine‐learning methods," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 27(4), pages 168-181, October.
    3. Ku, Arthur Lin & Qiu, Yueming (Lucy) & Lou, Jiehong & Nock, Destenie & Xing, Bo, 2022. "Changes in hourly electricity consumption under COVID mandates: A glance to future hourly residential power consumption pattern with remote work in Arizona," Applied Energy, Elsevier, vol. 310(C).
    4. Hua Chen & Xiaobo Peng & Menghan Shen, 2021. "Concentration and Persistence of Healthcare Spending: Evidence from China," Sustainability, MDPI, vol. 13(11), pages 1-17, May.
    5. David Siroky & Carolyn M. Warner & Gabrielle Filip-Crawford & Anna Berlin & Steven L. Neuberg, 2020. "Grievances and rebellion: Comparing relative deprivation and horizontal inequality," Conflict Management and Peace Science, Peace Science Society (International), vol. 37(6), pages 694-715, November.
    6. Gallego, Jorge & Rivero, Gonzalo & Martínez, Juan, 2021. "Preventing rather than punishing: An early warning model of malfeasance in public procurement," International Journal of Forecasting, Elsevier, vol. 37(1), pages 360-377.
    7. Friedrich Breyer & Normann Lorenz, 2021. "The “red herring” after 20 years: ageing and health care expenditures," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 22(5), pages 661-667, July.
    8. Bonekamp, Johan & Wouterse, Bram, 2023. "Do different shocks in health matter for wealth?," Journal of Health Economics, Elsevier, vol. 87(C).
    9. Normann Lorenz & Peter Ihle & Friedrich Breyer, 2020. "Aging and Health Care Expenditures: A Non-Parametric Approach," CESifo Working Paper Series 8216, CESifo.
    10. Cäzilia Loibl & Wändi Bruine de Bruin & Barbara Summers & Simon McNair & Pieter Verhallen, 2022. "Which financial stressors are linked to food insecurity among older adults in the United Kingdom, Germany, and the Netherlands? An exploratory study," Food Security: The Science, Sociology and Economics of Food Production and Access to Food, Springer;The International Society for Plant Pathology, vol. 14(2), pages 533-556, April.
    11. Zhaochen He & John Camobreco & Keith Perkins, 2022. "How he won: Using machine learning to understand Trump’s 2016 victory," Journal of Computational Social Science, Springer, vol. 5(1), pages 905-947, May.
    12. Kárpáti, Daniel, 2023. "Essays in finance & health," Other publications TiSEM 5505e140-1f4d-4f61-a5a5-e, Tilburg University, School of Economics and Management.
    13. Pieter Bakx & Bram Wouterse & Eddy (E.K.A.) van Doorslaer & Albert Wong, 2018. "Better off at home? Effects of a nursing home admission on costs, hospitalizations and survival," Tinbergen Institute Discussion Papers 18-060/V, Tinbergen Institute.
    14. Phil Henrickson, 2020. "Predicting the costs of war," The Journal of Defense Modeling and Simulation, vol. 17(3), pages 285-308, July.
    15. Marie K. Schellens & Salim Belyazid, 2020. "Revisiting the Contested Role of Natural Resources in Violent Conflict Risk through Machine Learning," Sustainability, MDPI, vol. 12(16), pages 1-29, August.
    16. Kallestrup-Lamb, Malene & Marin, Alexander O.K. & Menon, Seetha & Søgaard, Jes, 2024. "Aging populations and expenditures on health," The Journal of the Economics of Ageing, Elsevier, vol. 29(C).
    17. John Cuffe & Sudip Bhattacharjee & Ugochukwu Etudo & Justin C. Smith & Nevada Basdeo & Nathaniel Burbank & Shawn R. Roberts, 2019. "Using Public Data to Generate Industrial Classification Codes," NBER Chapters, in: Big Data for Twenty-First-Century Economic Statistics, pages 229-246, National Bureau of Economic Research, Inc.
    18. Wouterse, B. & Hussem, A. & Wong, A., 2018. "The effect of co-payments in Long Term Care on the distribution of payments, consumption, and risk," Health, Econometrics and Data Group (HEDG) Working Papers 18/24, HEDG, c/o Department of Economics, University of York.
    19. Krabbe-Alkemade, Yvonne & Makai, Peter & Shestalova, Victoria & Voesenek, Tessa, 2020. "Containing or shifting? Health expenditure decomposition for the aging Dutch population after a major reform," Health Policy, Elsevier, vol. 124(3), pages 268-274.
    20. Julio López Laborda & Carmen Marín González & Jorge Onrubia, 2020. "Observatorio sobre el reparto de los impuestos y las prestaciones entre los hogares españoles. Quinto informe – Sanidad y educación, 2013 - 2017," Studies on the Spanish Economy eee2020-28, FEDEA.
