
Predictive performance of count regression models versus machine learning techniques: A comparative analysis using an automobile insurance claims frequency dataset

Author

  • Gadir Alomair

Abstract

Accurate forecasting of claim frequency in automobile insurance is essential for insurers to assess risks effectively and establish appropriate pricing policies. Traditional methods typically rely on a Poisson distribution for modeling claim counts; however, this approach can be inadequate because of frequent zero-claim periods, which lead to zero inflation in the data. Zero inflation occurs when more zeros are observed than expected under standard Poisson or negative binomial (NB) models. While machine learning (ML) techniques have been explored for predictive analytics in other contexts, their application to zero-inflated insurance data remains limited. This study investigates the utility of ML in improving forecast accuracy under conditions of zero inflation, a data characteristic common in automobile insurance. The research involved a comparative evaluation of several models, including Poisson, NB, zero-inflated Poisson (ZIP), hurdle Poisson, zero-inflated negative binomial (ZINB), hurdle negative binomial, random forest (RF), support vector machine (SVM), and artificial neural network (ANN), on an insurance dataset. The performance of these models was assessed using mean absolute error (MAE). The results reveal that the SVM model outperforms the others in predictive accuracy, particularly in handling zero inflation, followed by the ZIP and ZINB models. In contrast, the traditional Poisson and NB models showed lower predictive capabilities. By addressing the challenge of zero inflation in automobile claim data, this study offers insights into improving the accuracy of claim frequency predictions. Although this study is based on a single dataset, the findings provide valuable perspectives on enhancing prediction accuracy and improving risk management practices in the insurance industry.
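The abstract contrasts plain Poisson counts with zero-inflated and hurdle alternatives and ranks models by mean absolute error. The Python sketch below writes out the textbook probability mass functions for the zero-inflated Poisson (ZIP) and hurdle Poisson models, a crude excess-zeros check, and MAE. It is an illustration under stated assumptions, not the study's code: the function names, the parameters `lam`, `pi` and `p_zero`, and the toy claim counts are hypothetical, and the negative binomial variants and the ML models (RF, SVM, ANN) are not shown.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson(lam) claim count."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson (textbook form): with probability pi the count is a
    structural zero; otherwise it is drawn from Poisson(lam), which can also give 0."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_poisson_pmf(k: int, lam: float, p_zero: float) -> float:
    """Hurdle Poisson (textbook form): all zeros come from a binary part with
    P(Y = 0) = p_zero; positive counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated


def excess_zeros(counts: list[int]) -> float:
    """Observed share of zeros minus the share implied by a Poisson whose
    lambda is the sample mean (the Poisson maximum-likelihood estimate).
    A positive value is a rough sign of zero inflation, not a formal test."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)  # Poisson P(Y = 0) at lambda = mean
    return observed - expected


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """MAE = (1/n) * sum |y_i - yhat_i|, the metric named in the abstract."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: claim counts for ten policies, eight with no claims.
    claims = [0] * 8 + [1, 3]
    print(round(excess_zeros(claims), 2))  # prints 0.13
    print(mean_absolute_error([0, 1, 2], [0.5, 1, 1]))  # prints 0.5
```

With the toy counts, eight of ten policies have zero claims, while a Poisson with the same mean (0.4) implies about 67% zeros, so `excess_zeros` returns roughly 0.13. A positive gap of this kind is the pattern the abstract calls zero inflation; a careful comparison would fit the competing models and use a formal criterion (for example a Vuong test) rather than this heuristic.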

Suggested Citation

  • Gadir Alomair, 2024. "Predictive performance of count regression models versus machine learning techniques: A comparative analysis using an automobile insurance claims frequency dataset," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-12, December.
  • Handle: RePEc:plo:pone00:0314975
    DOI: 10.1371/journal.pone.0314975

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0314975
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0314975&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Thomas Poufinas & Periklis Gogas & Theophilos Papadimitriou & Emmanouil Zaganidis, 2023. "Machine Learning in Forecasting Motor Insurance Claims," Risks, MDPI, vol. 11(9), pages 1-19, September.
    2. Carina Clemente & Gracinda R. Guerreiro & Jorge M. Bravo, 2023. "Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting," Risks, MDPI, vol. 11(9), pages 1-20, September.
    3. Rafaella L. S. Nascimento & Roberta A. de A. Fagundes & Renata M. C. R. Souza, 2022. "Statistical Learning for Predicting School Dropout in Elementary Education: A Comparative Study," Annals of Data Science, Springer, vol. 9(4), pages 801-828, August.
    4. Winfried Pohlmeier & Volker Ulrich, 1995. "An Econometric Model of the Two-Part Decisionmaking Process in the Demand for Health Care," Journal of Human Resources, University of Wisconsin Press, vol. 30(2), pages 339-361.
    5. Michel Denuit & Pierre Devolder & Anne‐Cécile Goderniaux, 2007. "Securitization of Longevity Risk: Pricing Survivor Bonds With Wang Transform in the Lee‐Carter Framework," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 74(1), pages 87-113, March.
    6. Alinta Ann Wilson & Antonio Nehme & Alisha Dhyani & Khaled Mahbub, 2024. "A Comparison of Generalised Linear Modelling with Machine Learning Approaches for Predicting Loss Cost in Motor Insurance," Risks, MDPI, vol. 12(4), pages 1-29, March.
    7. Banghee So, 2024. "Enhanced gradient boosting for zero-inflated insurance claims and comparative analysis of CatBoost, XGBoost, and LightGBM," Scandinavian Actuarial Journal, Taylor & Francis Journals, vol. 2024(10), pages 1013-1035, November.
    8. Rainer Winkelmann, 2008. "Econometric Analysis of Count Data," Springer Books, Springer, edition 0, number 978-3-540-78389-3, July.
    9. Alex Jose & Angus S. Macdonald & George Tzougas & George Streftaris, 2024. "Interpretable zero-inflated neural network models for predicting admission counts," Annals of Actuarial Science, Cambridge University Press, vol. 18(3), pages 644-674, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gregori Baetschmann & Rainer Winkelmann, 2014. "A dynamic hurdle model for zero-inflated count data: with an application to health care utilization," ECON - Working Papers 151, Department of Economics - University of Zurich.
    2. Borislava Mihaylova & Andrew Briggs & Anthony O'Hagan & Simon G. Thompson, 2011. "Review of statistical methods for analysing healthcare resources and costs," Health Economics, John Wiley & Sons, Ltd., vol. 20(8), pages 897-916, August.
    3. Helmut Farbmacher, 2013. "Extensions Of Hurdle Models For Overdispersed Count Data," Health Economics, John Wiley & Sons, Ltd., vol. 22(11), pages 1398-1404, November.
    4. Stefan Boes, 2010. "Count Data Models with Correlated Unobserved Heterogeneity," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 37(3), pages 382-402, September.
    5. Aysit Tansel & Halil Ibrahim Keskin, 2017. "Education Effects on Days Hospitalized and Days out of Work by Gender: Evidence from Turkey," IZA Discussion Papers 11210, Institute of Labor Economics (IZA).
    6. Pierre-Henri Bono & Quentin David & Rodolphe Desbordes & Loriane Py, 2022. "Metro infrastructure and metropolitan attractiveness," Regional Science and Urban Economics, Elsevier, vol. 93(C).
    7. Irina Mozhaeva, 2022. "Inequalities in utilization of institutional care among older people in Estonia," Health Policy, Elsevier, vol. 126(7), pages 704-714.
    8. Kalle Hirvonen & John Hoddinott, 2017. "Agricultural production and children's diets: evidence from rural Ethiopia," Agricultural Economics, International Association of Agricultural Economists, vol. 48(4), pages 469-480, July.
    9. Noel Perceval Assogba & Daowei Zhang, 2020. "An Economic Analysis of Tropical Forest Resource Conservation in a Protected Area," Sustainability, MDPI, vol. 12(14), pages 1-12, July.
    10. Riccardo Crescenzi & Carlo Pietrobelli & Roberta Rabellotti, 2012. "Innovation Drivers, Value Chains and the Geography of Multinational Firms in European Regions," LEQS – LSE 'Europe in Question' Discussion Paper Series 53, European Institute, LSE.
    11. Marco Dueñas & Giorgio Fagiolo, 2013. "Modeling the International-Trade Network: a gravity approach," Journal of Economic Interaction and Coordination, Springer;Society for Economic Science with Heterogeneous Interacting Agents, vol. 8(1), pages 155-178, April.
    12. Maria Rosaria Carillo & Erasmo Papagni & Alessandro Sapio, 2013. "Do collaborations enhance the high-quality output of scientific institutions? Evidence from the Italian Research Assessment Exercise," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 47(C), pages 25-36.
    13. Simona Gamba & Laura Magazzini & Paolo Pertile, 2021. "R&D and market size: Who benefits from orphan drug legislation?," Journal of Health Economics, Elsevier, vol. 80(C).
    14. Erik Schokkaert & Tom Van Ourti & Diana De Graeve & Ann Lecluyse & Carine Van de Voorde, 2010. "Supplemental health insurance and equality of access in Belgium," Health Economics, John Wiley & Sons, Ltd., vol. 19(4), pages 377-395, April.
    15. Anikó Bíró, 2014. "Supplementary private health insurance and health care utilization of people aged 50+," Empirical Economics, Springer, vol. 46(2), pages 501-524, March.
    16. Fen-Ying Chen & Sharon S. Yang & Hong-Chih Huang, 2022. "Modeling pandemic mortality risk and its application to mortality-linked security pricing," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 341-363.
    17. Shiferaw Gurmu, 1998. "Generalized hurdle count data regression models," Economics Letters, Elsevier, vol. 58(3), pages 263-268, March.
    18. Kalle Hirvonen & John F. Hoddinott, 2014. "Agricultural production and children’s diets: Evidence from rural Ethiopia," ESSP working papers 69, International Food Policy Research Institute (IFPRI).
    19. Partha Deb & Pravin K. Trivedi, 2002. "The structure of demand for health care: latent class versus two-part models," Journal of Health Economics, Elsevier, vol. 21(4), pages 601-625, July.
    20. Darcy Steeg Morris & Kimberly F. Sellers, 2022. "A Flexible Mixed Model for Clustered Count Data," Stats, MDPI, vol. 5(1), pages 1-18, January.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.