Printed from https://ideas.repec.org/a/eee/insuma/v106y2022icp13-32.html

Imbalanced learning for insurance using modified loss functions in tree-based models

Author

Listed:
  • Hu, Changyue
  • Quan, Zhiyu
  • Chong, Wing Fung

Abstract

Tree-based models have gained momentum in insurance claim loss modeling; however, the point mass at zero and the heavy tail of the insurance loss distribution make it challenging to apply conventional methods directly to claim loss modeling. With a simple illustrative dataset, we first demonstrate how the traditional tree-based algorithm's splitting function fails to cope with a large proportion of data with zero responses. To address the imbalance issue in such loss modeling, this paper modifies the traditional splitting function of the Classification and Regression Tree (CART). In particular, we propose two novel modified loss functions, namely, the weighted sum of squared error and the sum of squared Canberra error. These modified loss functions impose a significant penalty on grouping observations with non-zero responses together with those with zero responses during the splitting procedure, and thus significantly enhance their separation. Finally, we examine and compare the predictive performance of the modified tree-based models with that of the traditional model on synthetic datasets that imitate insurance losses. The results show that the modification leads to substantially different tree structures and improved prediction performance.
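
The separation effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the paper's exact definitions may differ. The sketch assumes that the squared Canberra error between a response y and a node prediction c is ((y - c) / (|y| + |c|))^2, taken as 0 when both are zero, and that the node prediction is the node mean. The function names (`sse`, `squared_canberra`, `best_split`) and the toy data are hypothetical. The weighted sum of squared error is not sketched here because its weighting scheme is not given in the abstract.

```python
from math import inf
from typing import Callable, List, Sequence, Tuple

Loss = Callable[[Sequence[float], float], float]

def sse(y: Sequence[float], c: float) -> float:
    """Conventional sum of squared errors around the node prediction c."""
    return sum((v - c) ** 2 for v in y)

def squared_canberra(y: Sequence[float], c: float) -> float:
    """Assumed form of the sum of squared Canberra errors.

    Each term is ((y_i - c) / (|y_i| + |c|))^2; a 0/0 term is taken as 0.
    A zero response paired with a non-zero prediction contributes exactly 1.
    """
    total = 0.0
    for v in y:
        denom = abs(v) + abs(c)
        if denom > 0:
            total += ((v - c) / denom) ** 2
    return total

def node_loss(y: Sequence[float], loss: Loss) -> float:
    """Loss of one node, using the node mean as prediction (an assumption)."""
    c = sum(y) / len(y)
    return loss(y, c)

def best_split(x: Sequence[float], y: Sequence[float], loss: Loss) -> Tuple[float, float]:
    """Exhaustive search for the threshold on x minimizing left + right loss."""
    pairs = sorted(zip(x, y))
    xs: List[float] = [p[0] for p in pairs]
    ys: List[float] = [p[1] for p in pairs]
    best_thr, best_val = None, inf
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold fits between equal feature values
        val = node_loss(ys[:k], loss) + node_loss(ys[k:], loss)
        if val < best_val:
            best_thr, best_val = (xs[k - 1] + xs[k]) / 2, val
    return best_thr, best_val

# Toy data: four zero responses, two small losses, one large loss.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 100.0]

print(best_split(x, y, sse))               # threshold 6.5
print(best_split(x, y, squared_canberra))  # threshold 4.5
```

On this toy data the squared-error criterion isolates the single large loss and leaves the zeros grouped with the small non-zero losses, whereas the assumed Canberra criterion splits exactly at the boundary between zero and non-zero responses, because every zero response placed in a node with a non-zero mean costs a full unit of loss.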

Suggested Citation

  • Hu, Changyue & Quan, Zhiyu & Chong, Wing Fung, 2022. "Imbalanced learning for insurance using modified loss functions in tree-based models," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 13-32.
  • Handle: RePEc:eee:insuma:v:106:y:2022:i:c:p:13-32
    DOI: 10.1016/j.insmatheco.2022.04.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167668722000555
    Download Restriction: Full text for ScienceDirect subscribers only



    Citations



    Cited by:

    1. Yaojun Zhang & Lanpeng Ji & Georgios Aivaliotis & Charles Taylor, 2023. "Bayesian CART models for insurance claims frequency," Papers 2303.01923, arXiv.org, revised Dec 2023.
    2. Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
    2. Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
    3. Freek Holvoet & Katrien Antonio & Roel Henckaerts, 2023. "Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff," Papers 2310.12671, arXiv.org, revised Oct 2023.
    4. Kevin Kuo & Daniel Lupton, 2020. "Towards Explainability of Machine Learning Models in Insurance Pricing," Papers 2003.10674, arXiv.org.
    5. Thomas Poufinas & Periklis Gogas & Theophilos Papadimitriou & Emmanouil Zaganidis, 2023. "Machine Learning in Forecasting Motor Insurance Claims," Risks, MDPI, vol. 11(9), pages 1-19, September.
    6. Eduardo Ramos-Pérez & Pablo J. Alonso-González & José Javier Núñez-Velázquez, 2020. "Stochastic reserving with a stacked model based on a hybridized Artificial Neural Network," Papers 2008.07564, arXiv.org.
    7. Gu, Zheng & Li, Yunxian & Zhang, Minghui & Liu, Yifei, 2023. "Modelling economic losses from earthquakes using regression forests: Application to parametric insurance," Economic Modelling, Elsevier, vol. 125(C).
    8. Crevecoeur, Jonas & Robben, Jens & Antonio, Katrien, 2022. "A hierarchical reserving model for reported non-life insurance claims," Insurance: Mathematics and Economics, Elsevier, vol. 104(C), pages 158-184.
    9. Gao, Lisa & Shi, Peng, 2022. "Leveraging high-resolution weather information to predict hail damage claims: A spatial point process for replicated point patterns," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 161-179.
    10. Wei Qian & Craig A. Rolling & Gang Cheng & Yuhong Yang, 2019. "On the Forecast Combination Puzzle," Econometrics, MDPI, vol. 7(3), pages 1-26, September.
    11. Qian, Wei & Rolling, Craig A. & Cheng, Gang & Yang, Yuhong, 2022. "Combining forecasts for universally optimal performance," International Journal of Forecasting, Elsevier, vol. 38(1), pages 193-208.
    12. Dong-Young Lim, 2021. "A Neural Frequency-Severity Model and Its Application to Insurance Claims," Papers 2106.10770, arXiv.org, revised Feb 2024.
    13. Kristian Buchardt & Christian Furrer & Oliver Lunding Sandqvist, 2022. "Transaction time models in multi-state life insurance," Papers 2209.06902, arXiv.org, revised Feb 2023.
    14. Catalina Lozano-Murcia & Francisco P. Romero & Jesus Serrano-Guerrero & Jose A. Olivas, 2023. "A Comparison between Explainable Machine Learning Methods for Classification and Regression Problems in the Actuarial Context," Mathematics, MDPI, vol. 11(14), pages 1-20, July.
    15. Marian Reiff & Erik Šoltés & Silvia Komara & Tatiana Šoltésová & Silvia Zelinová, 2022. "Segmentation and estimation of claim severity in motor third-party liability insurance through contrast analysis," Equilibrium. Quarterly Journal of Economics and Economic Policy, Institute of Economic Research, vol. 17(3), pages 803-842, September.
    16. Maciak, Matúš & Okhrin, Ostap & Pešta, Michal, 2021. "Infinitely stochastic micro reserving," Insurance: Mathematics and Economics, Elsevier, vol. 100(C), pages 30-58.
    17. Łukasz Delong & Mario V. Wüthrich, 2020. "Neural Networks for the Joint Development of Individual Payments and Claim Incurred," Risks, MDPI, vol. 8(2), pages 1-34, April.
    18. Viktor Stojkoski & Petar Jolakoski & Igor Ivanovski, 2021. "The short‐run impact of COVID‐19 on the activity in the insurance industry in the Republic of North Macedonia," Risk Management and Insurance Review, American Risk and Insurance Association, vol. 24(3), pages 221-242, September.
    19. Trufin, Julien & Denuit, Michel, 2021. "Boosting cost-complexity pruned trees On Tweedie responses: the ABT machine," LIDAM Discussion Papers ISBA 2021015, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    20. Eduardo Ramos-Pérez & Pablo J. Alonso-González & José Javier Núñez-Velázquez, 2022. "Mack-Net model: Blending Mack's model with Recurrent Neural Networks," Papers 2205.07334, arXiv.org.

    More about this item

    Keywords

    Predictive model of insurance claims; Imbalanced learning; Custom loss; Canberra distance; Regression tree; Tree-based algorithms;

    JEL classification:

    • G22 - Financial Economics - - Financial Institutions and Services - - - Insurance; Insurance Companies; Actuarial Studies
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
    • C02 - Mathematical and Quantitative Methods - - General - - - Mathematical Economics
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
    • O30 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - General
