
Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France

Author

Listed:
  • Alexandre Vimont

    (Public Health Expertise (PHE)
    Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153)

  • Henri Leleu

    (Public Health Expertise (PHE))

  • Isabelle Durand-Zaleski

    (Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153)

Abstract

Background: Innovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performance of a simple neural network (NN) and a random forest (RF) with that of a generalized linear model (GLM) for the prediction of medical costs at the individual level.

Methods: A 1/97 representative sample of the French National Health Data Information System was used. The predictors selected were demographic information, pre-existing conditions, the Charlson comorbidity index, and healthcare service use and costs. The predictive performance of each model was compared using individual-level metrics (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)) and distribution-level metrics, on different sets of covariates, in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.

Results: We included 510,182 subjects alive on 31 December 2015. Mean annual costs were 1894€ (standard deviation 9326€; median 393€; interquartile range 95€ to 1480€), including zero-claim subjects. All models performed similarly after adjustment for demographics. The RF model performed better on the other sets of covariates (pre-existing conditions, resource counts and past-year costs). On the full model, RF reached an adj-R2 of 47.5%, a MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, a MAE of 1635€ and 1660€, and a HiR of 58% and 55%, respectively. The RF model outperformed GLM and NN for most conditions and for high-cost subjects.

Conclusions: RF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, GLM was well suited with demographics, conditions and base-year cost.
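
The individual-level metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The adj-R2 and MAE functions use the standard textbook formulas. The abstract does not define the hit ratio, so `hit_ratio` below assumes a hypothetical definition: the share of individuals whose predicted cost falls within a fixed absolute tolerance of the observed cost. All function names, the `tolerance` parameter and the sample figures are illustrative assumptions.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted costs."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def adjusted_r2(actual, predicted, n_predictors):
    """R-squared penalised for the number of predictors in the model."""
    n = len(actual)
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def hit_ratio(actual, predicted, tolerance):
    """ASSUMED definition (not given in the abstract): share of individuals
    whose prediction lies within `tolerance` euros of the observed cost."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up annual costs in euros, for illustration only.
    observed = [0.0, 100.0, 400.0, 1500.0, 8000.0]
    forecast = [50.0, 150.0, 300.0, 1000.0, 6000.0]
    print("MAE:", mean_absolute_error(observed, forecast))
    print("adj-R2:", adjusted_r2(observed, forecast, n_predictors=1))
    print("HiR:", hit_ratio(observed, forecast, tolerance=100.0))
```

Because healthcare costs are heavy-tailed, a few high-cost subjects can dominate squared-error metrics such as adj-R2, which is one reason to report MAE and a hit-type metric alongside it.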

Suggested Citation

  • Alexandre Vimont & Henri Leleu & Isabelle Durand-Zaleski, 2022. "Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 23(2), pages 211-223, March.
  • Handle: RePEc:spr:eujhec:v:23:y:2022:i:2:d:10.1007_s10198-021-01363-4
    DOI: 10.1007/s10198-021-01363-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10198-021-01363-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Jason Brown & Mark Duggan & Ilyana Kuziemko & William Woolston, 2014. "How Does Risk Selection Respond to Risk Adjustment? New Evidence from the Medicare Advantage Program," American Economic Review, American Economic Association, vol. 104(10), pages 3335-3364, October.
    2. Mark McClellan, 2011. "Reforming Payments to Healthcare Providers: The Key to Slowing Healthcare Cost Growth While Improving Quality?," Journal of Economic Perspectives, American Economic Association, vol. 25(2), pages 69-92, Spring.
    3. Ellis, Randall P. & Martins, Bruno & Zhu, Wenjia, 2017. "Demand elasticities and service selection incentives among competing private health plans," Journal of Health Economics, Elsevier, vol. 56(C), pages 352-367.
    4. Frank, Richard G. & Glazer, Jacob & McGuire, Thomas G., 2000. "Measuring adverse selection in managed health care," Journal of Health Economics, Elsevier, vol. 19(6), pages 829-854, November.
    5. Grégoire Lagasnerie & Anne-Sophie Aguadé & Pierre Denis & Anne Fagot-Campagna & Christelle Gastaldi-Menager, 2018. "The economic burden of diabetes to French national health insurance: a new cost-of-illness method based on a combined medicalized and incremental approach," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 19(2), pages 189-201, March.
    6. Sungchul Park & Anirban Basu, 2018. "Alternative evaluation metrics for risk adjustment methods," Health Economics, John Wiley & Sons, Ltd., vol. 27(6), pages 984-1010, June.
    7. Andrew M. Jones & James Lomas & Peter T. Moore & Nigel Rice, 2016. "A quasi-Monte-Carlo comparison of parametric and semiparametric regression methods for heavy-tailed and non-normal data: an application to healthcare costs," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 179(4), pages 951-974, October.
    8. Florian Buchner & Jürgen Wasem & Sonja Schillo, 2017. "Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?," Health Economics, John Wiley & Sons, Ltd., vol. 26(1), pages 74-85, January.
    9. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    10. Glazer, Jacob & McGuire, Thomas G., 2006. "Optimal quality reporting in markets for health plans," Journal of Health Economics, Elsevier, vol. 25(2), pages 295-310, March.
    11. S. H. C. M. van Veen & R. C. van Kleef & W. P. M. M. van de Ven & R. C. J. A. van Vliet, 2018. "Exploring the predictive power of interaction terms in a sophisticated risk equalization model using regression trees," Health Economics, John Wiley & Sons, Ltd., vol. 27(2), pages 1-12, February.
    12. Colin Cameron, A. & Windmeijer, Frank A. G., 1997. "An R-squared measure of goodness of fit for some common nonlinear regression models," Journal of Econometrics, Elsevier, vol. 77(2), pages 329-342, April.
    13. I. Duncan & M. Loginov & M. Ludkovski, 2016. "Testing Alternative Regression Frameworks for Predictive Modeling of Health Care Costs," North American Actuarial Journal, Taylor & Francis Journals, vol. 20(1), pages 65-87, January.
    14. Borislava Mihaylova & Andrew Briggs & Anthony O'Hagan & Simon G. Thompson, 2011. "Review of statistical methods for analysing healthcare resources and costs," Health Economics, John Wiley & Sons, Ltd., vol. 20(8), pages 897-916, August.
    15. Manning, Willard G. & Mullahy, John, 2001. "Estimating log models: to transform or not to transform?," Journal of Health Economics, Elsevier, vol. 20(4), pages 461-494, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Yao & Joan Schmit & Julie Shi, 2019. "Promoting sustainability for micro health insurance: a risk-adjusted subsidy approach for maternal healthcare service," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 44(3), pages 382-409, July.
    2. Toni Mora & Joan Gil & Antoni Sicras-Mainar, 2012. "The Influence of BMI, Obesity and Overweight on Medical Costs: A Panel Data Approach," Working Papers 2012-08, FEDEA.
    3. Tor Iversen & Eline Aas & Gunnar Rosenqvist & Unto Häkkinen & on behalf of the EuroHOPE study group, 2015. "Comparative Analysis of Treatment Costs in EUROHOPE," Health Economics, John Wiley & Sons, Ltd., vol. 24(S2), pages 5-22, December.
    4. Ma, Ching-to Albert & Mak, Henry Y., 2015. "Information disclosure and the equivalence of prospective payment and cost reimbursement," Journal of Economic Behavior & Organization, Elsevier, vol. 117(C), pages 439-452.
    5. Marica Iommi & Savannah Bergquist & Gianluca Fiorentini & Francesco Paolucci, 2022. "Comparing risk adjustment estimation methods under data availability constraints," Health Economics, John Wiley & Sons, Ltd., vol. 31(7), pages 1368-1380, July.
    6. Julie Shi & Yi Yao & Gordon Liu, 2018. "Modeling individual health care expenditures in China: Evidence to assist payment reform in public insurance," Health Economics, John Wiley & Sons, Ltd., vol. 27(12), pages 1945-1962, December.
    7. Samuel Sebsibie & Workineh Asmare & Tessema Endalkachew, 2015. "Agricultural Technology Adoption and Rural Poverty: a Study on Smallholders in Amhara Regional State, Ethiopia," Ethiopian Journal of Economics, Ethiopian Economics Association, vol. 23(2), December.
    8. Timothy J. Layton & Randall P. Ellis & Thomas G. McGuire, 2015. "Assessing Incentives for Adverse Selection in Health Plan Payment Systems," Boston University - Department of Economics - Working Papers Series wp2015-024, Boston University - Department of Economics.
    9. Bergquist, Savannah L. & Layton, Timothy J. & McGuire, Thomas G. & Rose, Sherri, 2019. "Data transformations to improve the performance of health plan payment methods," Journal of Health Economics, Elsevier, vol. 66(C), pages 195-207.
    10. Zhiyuan Hou & Ellen Van de Poel & Eddy Van Doorslaer & Baorong Yu & Qingyue Meng, 2014. "Effects Of Ncms On Access To Care And Financial Protection In China," Health Economics, John Wiley & Sons, Ltd., vol. 23(8), pages 917-934, August.
    11. Caballer-Tarazona, Vicent & Guadalajara-Olmeda, Natividad & Vivas-Consuelo, David, 2019. "Predicting healthcare expenditure by multimorbidity groups," Health Policy, Elsevier, vol. 123(4), pages 427-434.
    12. Mark Braverman & Sylvain Chassang, 2016. "Data-Driven Incentive Alignment in Capitation Schemes," Working Papers 073_2015, Princeton University, Department of Economics, Econometric Research Program..
    13. Sungchul Park & Anirban Basu, 2018. "Alternative evaluation metrics for risk adjustment methods," Health Economics, John Wiley & Sons, Ltd., vol. 27(6), pages 984-1010, June.
    14. Mark Braverman & Sylvain Chassang, 2020. "Data-Driven Incentive Alignment in Capitation Schemes," Working Papers 2020-60, Princeton University. Economics Department..
    15. Toni Mora & Joan Gil & Antoni Sicras-Mainar, 2015. "The influence of obesity and overweight on medical costs: a panel data perspective," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 16(2), pages 161-173, March.
    16. Mona Aghdaee & Bonny Parkinson & Kompal Sinha & Yuanyuan Gu & Rajan Sharma & Emma Olin & Henry Cutler, 2022. "An examination of machine learning to map non‐preference based patient reported outcome measures to health state utility values," Health Economics, John Wiley & Sons, Ltd., vol. 31(8), pages 1525-1557, August.
    17. Jay Dev Dubey, 2021. "Measuring Income Elasticity of Healthcare-Seeking Behavior in India: A Conditional Quantile Regression Approach," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 19(4), pages 767-793, December.
    18. Kaushik Ghosh & Irina Bondarenko & Kassandra L Messer & Susan T Stewart & Trivellore Raghunathan & Allison B Rosen & David M Cutler, 2020. "Attributing medical spending to conditions: A comparison of methods," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-17, August.
    19. Kurt Lavetti & Thomas DeLeire & Nicolas R. Ziebarth, 2023. "How do low‐income enrollees in the Affordable Care Act marketplaces respond to cost‐sharing?," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 90(1), pages 155-183, March.

    More about this item

    Keywords

    Predictive analytics; Machine learning; Cost containment; Healthcare management; Healthcare costs; Random forest; Neural network;

    JEL classification:

    • I11 - Health, Education, and Welfare - Health - Analysis of Health Care Markets
    • I13 - Health, Education, and Welfare - Health - Health Insurance, Public and Private
    • I15 - Health, Education, and Welfare - Health - Health and Economic Development
