Insurance loss modeling with gradient tree-boosted mixture models

Author

Listed:
  • Hou, Yanxi
  • Li, Jiahong
  • Gao, Guangyuan

Abstract

In actuarial practice, the finite mixture model is a widely applied statistical method for modeling insurance losses. Although the Expectation-Maximization (EM) algorithm is usually an essential tool for estimating the parameters of mixture models, it suffers from other issues that cause unstable predictions. For example, feature engineering and variable selection are two crucial modeling issues that are challenging for mixture models because they involve several component models. Avoiding overfitting is another technical concern when predicting future losses. To address these issues, we propose an Expectation-Boosting (EB) algorithm, which uses gradient boosting decision trees to adaptively increase the likelihood in the second step. The proposed EB algorithm can estimate both the mixing probabilities and the component parameters non-parametrically and in an overfitting-sensitive manner, and it performs automated feature engineering, model fitting, and variable selection simultaneously, which fully exploits the predictive power of the feature space. Moreover, the proposed algorithm can be combined with parallel computation methods to improve computational efficiency. Finally, we conduct two simulation studies to demonstrate the good performance of the proposed algorithm and present an empirical analysis of claim amounts for illustration.
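
The idea of replacing the second EM step with a boosting update can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: a two-component Gaussian mixture on the log-loss scale, a single feature, hand-written regression stumps standing in for gradient boosting decision trees, and closed-form weighted updates for the component means and standard deviations (the abstract says the EB algorithm boosts the component parameters as well). The names `fit_stump`, `eb_fit`, `n_iter`, and `lr` are hypothetical.

```python
import numpy as np

def fit_stump(x, g):
    """Least-squares regression stump on one feature x for target g."""
    best = (np.inf, x.min(), g.mean(), g.mean())
    for s in np.quantile(x, np.linspace(0.1, 0.9, 9)):  # candidate splits
        left = x <= s
        if left.all() or not left.any():
            continue
        lv, rv = g[left].mean(), g[~left].mean()
        sse = ((g[left] - lv) ** 2).sum() + ((g[~left] - rv) ** 2).sum()
        if sse < best[0]:
            best = (sse, s, lv, rv)
    _, s, lv, rv = best
    return lambda z: np.where(z <= s, lv, rv)

def normal_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def eb_fit(x, y, n_iter=50, lr=0.3):
    """Sketch of an expectation-boosting loop (assumed form, see lead-in)."""
    F = np.zeros_like(y)                    # logit of P(component 1 | x)
    mu = np.quantile(y, [0.25, 0.75])       # crude initial component means
    sigma = np.array([y.std(), y.std()])
    trees, history = [], []
    for _ in range(n_iter):
        p = sigmoid(F)
        # Expectation step: posterior probability of component 1.
        d0 = (1 - p) * normal_pdf(y, mu[0], sigma[0])
        d1 = p * normal_pdf(y, mu[1], sigma[1])
        r = d1 / (d0 + d1)
        # Boosting step: r - p is the gradient of the expected
        # complete-data log-likelihood with respect to the logit F.
        tree = fit_stump(x, r - p)
        F = F + lr * tree(x)
        trees.append(tree)
        # Simplification: closed-form weighted updates for mu and sigma.
        w = np.stack([1 - r, r])
        mu = (w * y).sum(axis=1) / w.sum(axis=1)
        sigma = np.sqrt((w * (y - mu[:, None]) ** 2).sum(axis=1) / w.sum(axis=1))
        p_new = sigmoid(F)
        dens = (1 - p_new) * normal_pdf(y, mu[0], sigma[0]) \
            + p_new * normal_pdf(y, mu[1], sigma[1])
        history.append(np.log(dens).sum())  # observed-data log-likelihood
    predict = lambda z: sigmoid(lr * sum(t(z) for t in trees))
    return predict, mu, sigma, history

# Synthetic log-losses: the "large loss" component is more likely when x > 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
high = rng.random(2000) < np.where(x > 0, 0.9, 0.1)
y = rng.normal(np.where(high, 4.0, 0.0), 1.0)
predict, mu, sigma, history = eb_fit(x, y)
```

In this sketch `predict` returns the fitted mixing probability of the second component for new feature values, and `history` records the training log-likelihood after each iteration. A practical version would presumably track the log-likelihood on held-out data to decide when to stop adding trees, since the training likelihood alone does not reveal overfitting.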

Suggested Citation

  • Hou, Yanxi & Li, Jiahong & Gao, Guangyuan, 2025. "Insurance loss modeling with gradient tree-boosted mixture models," Insurance: Mathematics and Economics, Elsevier, vol. 121(C), pages 45-62.
  • Handle: RePEc:eee:insuma:v:121:y:2025:i:c:p:45-62
    DOI: 10.1016/j.insmatheco.2024.12.007
    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016766872400132X
    Download Restriction: Full text for ScienceDirect subscribers only


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi, Xuefei & Xu, Xingbai & Feng, Zhenghui & Peng, Heng, 2025. "Component selection and variable selection for mixture regression models," Computational Statistics & Data Analysis, Elsevier, vol. 206(C).
    2. Delong, Łukasz & Lindholm, Mathias & Wüthrich, Mario V., 2021. "Gamma Mixture Density Networks and their application to modelling insurance claim amounts," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 240-261.
    3. Počuča, Nikola & Jevtić, Petar & McNicholas, Paul D. & Miljkovic, Tatjana, 2020. "Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models," Insurance: Mathematics and Economics, Elsevier, vol. 94(C), pages 79-93.
    4. Gustavo Alexis Sabillón & Luiz Gabriel Fernandes Cotrim & Daiane Aparecida Zuanetti, 2023. "A data-driven reversible jump for estimating a finite mixture of regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(1), pages 350-369, March.
    5. Marco Berrettini & Giuliano Galimberti & Saverio Ranciati, 2023. "Semiparametric finite mixture of regression models with Bayesian P-splines," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 745-775, September.
    6. Ye, Mao & Lu, Zhao-Hua & Li, Yimei & Song, Xinyuan, 2019. "Finite mixture of varying coefficient model: Estimation and component selection," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 452-474.
    7. Sphiwe B. Skhosana & Salomon M. Millard & Frans H. J. Kanfer, 2023. "A Novel EM-Type Algorithm to Estimate Semi-Parametric Mixtures of Partially Linear Models," Mathematics, MDPI, vol. 11(5), pages 1-20, February.
    8. Alessandro Staino & Emilio Russo & Massimo Costabile & Arturo Leccadito, 2023. "Minimum capital requirement and portfolio allocation for non-life insurance: a semiparametric model with Conditional Value-at-Risk (CVaR) constraint," Computational Management Science, Springer, vol. 20(1), pages 1-32, December.
    9. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2016. "Multivariate mixtures of Erlangs for density estimation under censoring," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 22(3), pages 429-455, July.
    10. Sijia Xiang & Weixin Yao, 2018. "Semiparametric mixtures of nonparametric regressions," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(1), pages 131-154, February.
    11. You, Na & Dai, Hongsheng & Wang, Xueqin & Yu, Qingyun, 2024. "Sequential estimation for mixture of regression models for heterogeneous population," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    12. Bae, Taehan & Miljkovic, Tatjana, 2024. "Loss modeling with the size-biased lognormal mixture and the entropy regularized EM algorithm," Insurance: Mathematics and Economics, Elsevier, vol. 117(C), pages 182-195.
    13. Fung, Tsz Chai & Badescu, Andrei L. & Lin, X. Sheldon, 2019. "A class of mixture of experts models for general insurance: Theoretical developments," Insurance: Mathematics and Economics, Elsevier, vol. 89(C), pages 111-127.
    14. Li, Zhengxiao & Wang, Fei & Zhao, Zhengtang, 2024. "A new class of composite GBII regression models with varying threshold for modeling heavy-tailed data," Insurance: Mathematics and Economics, Elsevier, vol. 117(C), pages 45-66.
    15. Bladt, Martin & Yslas, Jorge, 2023. "Robust claim frequency modeling through phase-type mixture-of-experts regression," Insurance: Mathematics and Economics, Elsevier, vol. 111(C), pages 1-22.
    16. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    17. Fisher, Mark & Jensen, Mark J., 2022. "Bayesian nonparametric learning of how skill is distributed across the mutual fund industry," Journal of Econometrics, Elsevier, vol. 230(1), pages 131-153.
    18. Xue, Jiacheng & Yao, Weixin, 2022. "Machine Learning Embedded Semiparametric Mixtures of Regressions with Covariate-Varying Mixing Proportions," Econometrics and Statistics, Elsevier, vol. 22(C), pages 159-171.
    19. Abbas Khalili & Farhad Shokoohi & Masoud Asgharian & Shili Lin, 2023. "Sparse estimation in semiparametric finite mixture of varying coefficient regression models," Biometrics, The International Biometric Society, vol. 79(4), pages 3445-3457, December.
    20. Hoshino Tadao & Yanagi Takahide, 2022. "Estimating marginal treatment effects under unobserved group heterogeneity," Journal of Causal Inference, De Gruyter, vol. 10(1), pages 197-216, January.
