
Machine Learning Applications for Predicting High-Cost Claims Using Insurance Data

Author

Listed:
  • Esmeralda Brati

    (Department of Statistics and Applied Informatics, Faculty of Economy, University of Tirana, 1010 Tirana, Albania)

  • Alma Braimllari

    (Department of Statistics and Applied Informatics, Faculty of Economy, University of Tirana, 1010 Tirana, Albania)

  • Ardit Gjeçi

    (Department of Economics and Finance, University of New York Tirana, 1000 Tirana, Albania)

Abstract

Insurance is essential for financial risk protection, but claim management is complex and requires accurate classification and forecasting strategies. This study empirically evaluates the performance of seven classification algorithms (Logistic Regression, Decision Tree, Random Forest, XGBoost, K-Nearest Neighbors, Support Vector Machine, and Naïve Bayes) in predicting high-cost insurance claims. It analyses the claim, vehicle, and insured-party variables that influence the classification of high-cost claims. The dataset comprises 802 observations of bodily injury claims from the motor liability portfolio of a private insurance company in Albania, covering the period from 2018 to 2024. To evaluate and compare the models, we used classification accuracy (CA), area under the curve (AUC), the confusion matrix, and error rates. Random Forest performed best, achieving the highest classification accuracy (CA = 0.8867) with an AUC of 0.9437 and the lowest error rates, followed by the XGBoost model. Logistic Regression showed the weakest performance. Key predictive factors in high-cost claim classification include claim type, deferred period, vehicle brand, and age of driver. These findings highlight the potential of machine learning models for improving claim classification and risk assessment and for refining underwriting policy.
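The evaluation criteria named in the abstract (confusion matrix, CA, error rate, AUC) can be illustrated with a minimal sketch. This is not the authors' code and does not use the study's data: the labels, scores, 0.5 threshold, and function names below are hypothetical, chosen only to show how each metric is computed for a binary "high-cost claim" classifier.

```python
from typing import List, Tuple

def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, where 1 = high-cost claim."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

def classification_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """CA = share of correctly classified claims; the error rate is 1 - CA."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)

def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels and predicted probabilities of "high-cost".
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 (not taken from the paper).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

ca = classification_accuracy(y_true, y_pred)
error_rate = 1 - ca
area = auc(y_true, scores)
```

CA and the error rate depend on the chosen threshold, whereas AUC is computed from the score ranking alone, which is one reason the two are usually reported together.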

Suggested Citation

  • Esmeralda Brati & Alma Braimllari & Ardit Gjeçi, 2025. "Machine Learning Applications for Predicting High-Cost Claims Using Insurance Data," Data, MDPI, vol. 10(6), pages 1-22, June.
  • Handle: RePEc:gam:jdataj:v:10:y:2025:i:6:p:90-:d:1680916

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/10/6/90/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/10/6/90/
    Download Restriction: no
