Printed from https://ideas.repec.org/a/gam/jijerp/v19y2022i20p13672-d949324.html

Development and Evaluation of Machine Learning-Based High-Cost Prediction Model Using Health Check-Up Data by the National Health Insurance Service of Korea

Author

Listed:
  • Yeongah Choi

    (Department of Big Data Analytics, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea)

  • Jiho An

    (Department of Big Data Analytics, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea)

  • Seiyoung Ryu

    (Department of Big Data Analytics, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea)

  • Jaekyeong Kim

    (Department of Big Data Analytics, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea;
    School of Management, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea)

Abstract

This study analyzed socioeconomic, medical treatment, and health check-up data from 2010 to 2017 held by the National Health Insurance Service (NHIS) of Korea. Socioeconomic, treatment, and health check-up data from a given year are used to develop a model that predicts high medical expenses in the following year. A distinguishing feature of the study is that it derives the important variables associated with high-cost users of medical services in Korea from data on the items of the nationally administered health check-up. High-cost prediction was framed as a supervised classification task, and performance was evaluated using logistic regression, random forest, and XGBoost, which previous studies have reported to offer the best performance and explanatory power among machine learning algorithms. The experimental results show that the XGBoost model performed best, with 77.1% accuracy. The contribution of this study is to identify the variables that affect the prediction of high-cost medical expenses by analyzing medical bills with the health check-up variables and the major disease groups of the Korea Classification Disease (KCD) as input variables. The study confirmed that musculoskeletal disorders (M) and respiratory diseases (J), the most frequently treated diseases, are important KCD disease groups for predicting future high cost in Korea. It also confirmed that malignant neoplasms (C), which have a high medical cost per treatment, are a disease group associated with future high medical cost. Unlike previous studies, these results come from an analysis of data on all diseases, so the study is expected to be more meaningful when compared with results from other national health check-up data.
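
The year-to-next-year setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the toy values, the `top_fraction` cutoff used to define "high cost", and the function names are all assumptions made for illustration, since the abstract does not state how the high-cost label was defined. The sketch pairs each person's features from one year with a binary label computed from that person's cost in the following year, and includes a plain accuracy function of the kind behind the reported 77.1% figure.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy records: (person_id, year, annual_cost, KCD groups treated).
Record = Tuple[int, int, float, FrozenSet[str]]

records: List[Record] = [
    (1, 2016, 100.0, frozenset({"M"})),
    (1, 2017, 900.0, frozenset({"C"})),
    (2, 2016, 50.0, frozenset({"J"})),
    (2, 2017, 60.0, frozenset()),
    (3, 2016, 500.0, frozenset({"C"})),
    (3, 2017, 700.0, frozenset({"C"})),
    (4, 2016, 20.0, frozenset()),
    (4, 2017, 30.0, frozenset()),
]


def high_cost_threshold(costs: List[float], top_fraction: float) -> float:
    """Smallest cost that still falls in the top `top_fraction` of `costs`."""
    ranked = sorted(costs, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def build_pairs(recs: List[Record], top_fraction: float = 0.1):
    """Pair year-t features with a year-(t+1) high-cost label (assumed cutoff)."""
    by_key = {(pid, year): (cost, groups) for pid, year, cost, groups in recs}

    # One threshold per calendar year, taken over everyone observed that year.
    costs_by_year: Dict[int, List[float]] = {}
    for _, year, cost, _ in recs:
        costs_by_year.setdefault(year, []).append(cost)
    thresholds = {
        year: high_cost_threshold(costs, top_fraction)
        for year, costs in costs_by_year.items()
    }

    pairs = []
    for (pid, year), (cost, groups) in sorted(by_key.items()):
        following = by_key.get((pid, year + 1))
        if following is None:
            continue  # no next-year outcome, so no label can be built
        label = int(following[0] >= thresholds[year + 1])
        features = {
            "cost": cost,
            "M": int("M" in groups),  # musculoskeletal disorders
            "J": int("J" in groups),  # respiratory diseases
            "C": int("C" in groups),  # malignant neoplasms
        }
        pairs.append((features, label))
    return pairs


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

The resulting feature and label pairs are the kind of input a classifier such as logistic regression, random forest, or XGBoost would be trained on. Accuracy alone can flatter a model when high-cost cases are rare, so the cutoff chosen for the label matters when comparing results.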

Suggested Citation

  • Yeongah Choi & Jiho An & Seiyoung Ryu & Jaekyeong Kim, 2022. "Development and Evaluation of Machine Learning-Based High-Cost Prediction Model Using Health Check-Up Data by the National Health Insurance Service of Korea," IJERPH, MDPI, vol. 19(20), pages 1-16, October.
  • Handle: RePEc:gam:jijerp:v:19:y:2022:i:20:p:13672-:d:949324

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/19/20/13672/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/19/20/13672/
    Download Restriction: no

    References listed on IDEAS

    1. Dimitris Bertsimas & Margrét V. Bjarnadóttir & Michael A. Kane & J. Christian Kryder & Rudra Pandey & Santosh Vempala & Grant Wang, 2008. "Algorithmic Prediction of Health-Care Costs," Operations Research, INFORMS, vol. 56(6), pages 1382-1392, December.
    2. I. Duncan & M. Loginov & M. Ludkovski, 2016. "Testing Alternative Regression Frameworks for Predictive Modeling of Health Care Costs," North American Actuarial Journal, Taylor & Francis Journals, vol. 20(1), pages 65-87, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sungchul Park & Anirban Basu, 2018. "Alternative evaluation metrics for risk adjustment methods," Health Economics, John Wiley & Sons, Ltd., vol. 27(6), pages 984-1010, June.
    2. Adam Maidman & Lan Wang, 2018. "New semiparametric method for predicting high‐cost patients," Biometrics, The International Biometric Society, vol. 74(3), pages 1104-1111, September.
    3. Lennon, Conor, 2021. "Are the costs of employer-sponsored health insurance passed on to workers at the individual level?," Economics & Human Biology, Elsevier, vol. 41(C).
    4. Sriubaite, I. & Harris, A. & Jones, A.M. & Gabbe, B., 2020. "Economic Consequences of Road Traffic Injuries. Application of the Super Learner algorithm," Health, Econometrics and Data Group (HEDG) Working Papers 20/20, HEDG, c/o Department of Economics, University of York.
    5. Alexandre Vimont & Henri Leleu & Isabelle Durand-Zaleski, 2022. "Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 23(2), pages 211-223, March.
    6. Florian Buchner & Jürgen Wasem & Sonja Schillo, 2017. "Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?," Health Economics, John Wiley & Sons, Ltd., vol. 26(1), pages 74-85, January.
    7. Fabio Baione & Davide Biancalana & Paolo De Angelis, 2020. "A Risk Based approach for the Solvency Capital requirement for Health Plans," Papers 2011.09254, arXiv.org.
    8. William, Jananie & Loong, Bronwyn & Hanna, Dana & Parkinson, Bonny & Loxton, Deborah, 2022. "Lifetime health costs of intimate partner violence: A prospective longitudinal cohort study with linked data for out-of-hospital and pharmaceutical costs," Economic Modelling, Elsevier, vol. 116(C).
    9. D. Cattel & R. C. Kleef & R. C. J. A. Vliet, 2017. "A method to simulate incentives for cost containment under various cost sharing designs: an application to a first-euro deductible and a doughnut hole," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 18(8), pages 987-1000, November.
