Imbalanced learning for insurance using modified loss functions in tree-based models

Authors

Hu, Changyue
Quan, Zhiyu
Chong, Wing Fung

Abstract

Tree-based models have gained momentum in insurance claim loss modeling; however, the point mass at zero and the heavy tail of the insurance loss distribution make it challenging to apply conventional methods directly to claim loss modeling. With a simple illustrative dataset, we first demonstrate how the traditional tree-based algorithm's splitting function fails to cope with a large proportion of data with zero responses. To address the imbalance issue presented in such loss modeling, this paper aims to modify the traditional splitting function of the Classification and Regression Tree (CART). In particular, we propose two novel modified loss functions, namely, the weighted sum of squared error and the sum of squared Canberra error. These modified loss functions impose a significant penalty on grouping observations of non-zero response with those of zero response during the splitting procedure, and thus significantly enhance their separation. Finally, we examine and compare the predictive performance of such modified tree-based models to the traditional model on synthetic datasets that imitate insurance loss. The results show that such modifications lead to substantially different tree structures and improved prediction performance.
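The abstract names the two modified splitting losses but does not give their formulas, so the following Python sketch is only an illustration of how such losses could plug into a CART-style split search. The functional forms are assumptions, not the authors' definitions: `weighted_sse` up-weights non-zero responses by a hypothetical factor `nonzero_weight`, and `squared_canberra` sums the squared Canberra distance between each response and the node mean. The helper `best_split` and the toy data are likewise hypothetical.

```python
import numpy as np


def sse(y):
    """Classic CART node loss: squared deviations from the node mean."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0


def weighted_sse(y, nonzero_weight=5.0):
    """ASSUMED form: squared deviations from the node mean, with non-zero
    responses up-weighted by a hypothetical factor `nonzero_weight`."""
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        return 0.0
    w = np.where(y != 0, nonzero_weight, 1.0)
    return float(np.sum(w * (y - y.mean()) ** 2))


def squared_canberra(y):
    """ASSUMED form: sum of squared Canberra distances |y - m| / (|y| + |m|)
    between each response and the node mean m, with 0/0 treated as 0."""
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        return 0.0
    m = y.mean()
    denom = np.abs(y) + abs(m)
    ratio = np.divide(np.abs(y - m), denom, out=np.zeros_like(y), where=denom > 0)
    return float(np.sum(ratio ** 2))


def best_split(x, y, loss):
    """Exhaustive search over midpoints of one feature; returns the threshold
    minimizing loss(left child) + loss(right child), and that minimum."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]
    best_t, best_val = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate equal feature values
        val = loss(y[:i]) + loss(y[i:])
        if val < best_val:
            best_t, best_val = (x[i - 1] + x[i]) / 2, val
    return best_t, best_val


# Toy data (hypothetical): mostly zero responses, two positive claims.
x = np.arange(1, 7)
y = np.array([0, 0, 0, 0, 10, 12])
for name, loss in [("SSE", sse), ("weighted SSE", weighted_sse),
                   ("squared Canberra", squared_canberra)]:
    print(name, best_split(x, y, loss))
```

Under the assumed Canberra form, every zero response in a node whose mean is positive contributes the maximum per-observation penalty of 1, which is one way a loss can discourage mixing zero and non-zero responses. On this small, cleanly separable toy set all three losses pick the same threshold; the sketch does not reproduce the paper's datasets or results.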

Suggested Citation

Hu, Changyue & Quan, Zhiyu & Chong, Wing Fung, 2022. "Imbalanced learning for insurance using modified loss functions in tree-based models," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 13-32.

Handle: RePEc:eee:insuma:v:106:y:2022:i:c:p:13-32
DOI: 10.1016/j.insmatheco.2022.04.010

Citations

Cited by:

Saskia Puspa Kenaka & Andi Cakravastia & Anas Ma’ruf & Rully Tri Cahyono, 2025. "Enhancing Intermittent Spare Part Demand Forecasting: A Novel Ensemble Approach with Focal Loss and SMOTE," Logistics, MDPI, vol. 9(1), pages 1-25, February.
Zhang, Yaojun & Ji, Lanpeng & Aivaliotis, Georgios & Taylor, Charles, 2024. "Bayesian CART models for insurance claims frequency," Insurance: Mathematics and Economics, Elsevier, vol. 114(C), pages 108-131.
Yang Qiao & Chou-Wen Wang & Wenjun Zhu, 2024. "Machine learning in long-term mortality forecasting," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 49(2), pages 340-362, April.
Yaojun Zhang & Lanpeng Ji & Georgios Aivaliotis & Charles Taylor, 2023. "Bayesian CART models for insurance claims frequency," Papers 2303.01923, arXiv.org, revised Dec 2023.
Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
Dong, Panyi & Quan, Zhiyu, 2025. "Automated machine learning in insurance," Insurance: Mathematics and Economics, Elsevier, vol. 120(C), pages 17-41.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
Yang Qiao & Chou-Wen Wang & Wenjun Zhu, 2024. "Machine learning in long-term mortality forecasting," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 49(2), pages 340-362, April.
Freek Holvoet & Katrien Antonio & Roel Henckaerts, 2023. "Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff," Papers 2310.12671, arXiv.org, revised Jan 2025.
Kevin Kuo & Daniel Lupton, 2020. "Towards Explainability of Machine Learning Models in Insurance Pricing," Papers 2003.10674, arXiv.org.
Thomas Poufinas & Periklis Gogas & Theophilos Papadimitriou & Emmanouil Zaganidis, 2023. "Machine Learning in Forecasting Motor Insurance Claims," Risks, MDPI, vol. 11(9), pages 1-19, September.
Eduardo Ramos-Pérez & Pablo J. Alonso-González & José Javier Núñez-Velázquez, 2020. "Stochastic reserving with a stacked model based on a hybridized Artificial Neural Network," Papers 2008.07564, arXiv.org.
Freek Holvoet & Christopher Blier-Wong & Katrien Antonio, 2025. "A multi-view contrastive learning framework for spatial embeddings in risk modelling," Papers 2511.17954, arXiv.org.
Gu, Zheng & Li, Yunxian & Zhang, Minghui & Liu, Yifei, 2023. "Modelling economic losses from earthquakes using regression forests: Application to parametric insurance," Economic Modelling, Elsevier, vol. 125(C).
Loeys, Stijn & Boute, Robert N. & Antonio, Katrien, 2025. "The use of IoT sensor data to dynamically assess maintenance risk in service contracts," European Journal of Operational Research, Elsevier, vol. 324(2), pages 454-465.
Jan Janoušek & Michal Pešta, 2025. "Bagging and regression trees in individual claims reserving," Statistical Papers, Springer, vol. 66(4), pages 1-26, June.
Gao, Lisa & Shi, Peng, 2022. "Leveraging high-resolution weather information to predict hail damage claims: A spatial point process for replicated point patterns," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 161-179.
Wei Qian & Craig A. Rolling & Gang Cheng & Yuhong Yang, 2019. "On the Forecast Combination Puzzle," Econometrics, MDPI, vol. 7(3), pages 1-26, September.
Qian, Wei & Rolling, Craig A. & Cheng, Gang & Yang, Yuhong, 2022. "Combining forecasts for universally optimal performance," International Journal of Forecasting, Elsevier, vol. 38(1), pages 193-208.
Jaiswal, Rachana & Gupta, Shashank & Tiwari, Aviral Kumar, 2024. "Big data and machine learning-based decision support system to reshape the vaticination of insurance claims," Technological Forecasting and Social Change, Elsevier, vol. 209(C).
Yaojun Zhang & Lanpeng Ji & Georgios Aivaliotis & Charles C. Taylor, 2024. "Bayesian CART models for aggregate claim modeling," Papers 2409.01908, arXiv.org, revised Aug 2025.
Dong-Young Lim, 2021. "A Neural Frequency-Severity Model and Its Application to Insurance Claims," Papers 2106.10770, arXiv.org, revised Mar 2025.
Simon Hatzesberger & Iris Nonneman, 2025. "Advanced Applications of Generative AI in Actuarial Science: Case Studies Beyond ChatGPT," Papers 2506.18942, arXiv.org.
Kristian Buchardt & Christian Furrer & Oliver Lunding Sandqvist, 2022. "Transaction time models in multi-state life insurance," Papers 2209.06902, arXiv.org, revised Feb 2023.
Catalina Lozano-Murcia & Francisco P. Romero & Jesus Serrano-Guerrero & Jose A. Olivas, 2023. "A Comparison between Explainable Machine Learning Methods for Classification and Regression Problems in the Actuarial Context," Mathematics, MDPI, vol. 11(14), pages 1-20, July.

More about this item

JEL classification:

G22 - Financial Economics - - Financial Institutions and Services - - - Insurance; Insurance Companies; Actuarial Studies
C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
C02 - Mathematical and Quantitative Methods - - General - - - Mathematical Economics
C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
O30 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - General
