Printed from https://ideas.repec.org/a/gam/jrisks/v9y2021i3p53-d517868.html

Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance

Author

Listed:
  • Yves Staudt

    (Department Alpine Region Development, Institute for Tourism and Leisure, University of Applied Sciences of the Grisons, Comercialstrasse 19, 7000 Chur, Switzerland
    Center of Data Analysis, Simulation and Visualization, Department Applied Future Technologies, University of Applied Sciences of the Grisons, Ringstrasse 34, 7000 Chur, Switzerland
    These authors contributed equally to this work.)

  • Joël Wagner

    (Department of Actuarial Science, Faculty of Business and Economics (HEC Lausanne), University of Lausanne, Extranef, 1015 Lausanne, Switzerland
    Swiss Finance Institute, University of Lausanne, 1015 Lausanne, Switzerland
    These authors contributed equally to this work.)

Abstract

To calculate non-life insurance premiums, actuaries traditionally rely on separate severity and frequency models that use covariates to explain the claims loss exposure. In this paper, we focus on the claim severity. First, we build two reference models, a generalized linear model and a generalized additive model, relying on a log-normal distribution of the severity and including the most significant factors. In doing so, we relate the continuous variables to the response in a nonlinear way. In the second step, we tune two random forest models, one for the claim severity and one for the log-transformed claim severity, where the latter requires a back-transformation of the predicted results. We compare the prediction performance of the different models using the relative error, the root mean squared error and the goodness-of-lift statistics in combination with goodness-of-fit statistics. In our application, we rely on a dataset of a Swiss collision insurance portfolio covering the loss exposure of the period from 2011 to 2015 and including observations from 81 309 settled claims with a total amount of CHF 184 million. In the analysis, we use the data from 2011 to 2014 for training and the data from 2015 for testing. Our results indicate that a log-normal transformation of the severity does not lead to performance gains with random forests. However, random forests with a log-normal transformation are the preferred choice for explaining right-skewed claims. Finally, when considering all indicators, we conclude that the generalized additive model has the best overall performance.
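
The abstract notes that a model fitted to the log-transformed severity needs its predictions transformed back to the original scale. The following is a minimal sketch of that step, not the authors' implementation. It assumes homoscedastic log-normal errors, so that the mean on the original scale is exp(mu + sigma^2 / 2), and it uses synthetic data rather than the Swiss portfolio. The function names and the aggregate definition of the relative error are assumptions made for illustration; the paper may define its indicators differently.

```python
import math
import random
import statistics


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared))


def relative_error(observed, predicted):
    """Aggregate relative error: (total predicted - total observed) / total observed.

    Assumed definition for this sketch only.
    """
    return (sum(predicted) - sum(observed)) / sum(observed)


def retransform_lognormal(log_prediction, sigma2):
    """Back-transform a log-scale prediction under a log-normal assumption.

    exp(log_prediction) alone estimates the median; adding sigma2 / 2
    in the exponent estimates the mean on the original scale.
    """
    return math.exp(log_prediction + sigma2 / 2.0)


# Synthetic right-skewed severities (hypothetical parameters, not real claims).
rng = random.Random(0)
claims = [rng.lognormvariate(7.0, 1.0) for _ in range(20000)]

# Stand-in for a model trained on log(severity): a constant log-scale prediction.
log_claims = [math.log(c) for c in claims]
mu_hat = statistics.fmean(log_claims)
sigma2_hat = statistics.variance(log_claims)

naive = [math.exp(mu_hat)] * len(claims)
corrected = [retransform_lognormal(mu_hat, sigma2_hat)] * len(claims)

rel_naive = relative_error(claims, naive)
rel_corrected = relative_error(claims, corrected)
rmse_naive = rmse(claims, naive)
rmse_corrected = rmse(claims, corrected)

print(f"relative error, naive exp():  {rel_naive:+.3f}")
print(f"relative error, corrected:    {rel_corrected:+.3f}")
print(f"RMSE, naive exp():            {rmse_naive:.1f}")
print(f"RMSE, corrected:              {rmse_corrected:.1f}")
```

In this sketch the naive back-transformation underestimates the total claim amount, because exponentiating a log-scale mean targets the median of a right-skewed distribution. With heteroscedastic errors a single variance term would no longer be adequate, which is the retransformation problem discussed in the Manning (1998) reference listed below.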

Suggested Citation

  • Yves Staudt & Joël Wagner, 2021. "Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance," Risks, MDPI, vol. 9(3), pages 1-28, March.
  • Handle: RePEc:gam:jrisks:v:9:y:2021:i:3:p:53-:d:517868

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/9/3/53/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/9/3/53/
    Download Restriction: no

    References listed on IDEAS

    1. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    2. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2014. "Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape," Insurance: Mathematics and Economics, Elsevier, vol. 55(C), pages 225-249.
    3. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2014. "Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape," LIDAM Reprints ISBA 2014006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    4. Manning, Willard G., 1998. "The logged dependent variable, heteroscedasticity, and the retransformation problem," Journal of Health Economics, Elsevier, vol. 17(3), pages 283-295, June.
    5. Denuit, Michel & Lang, Stefan, 2004. "Non-life rate-making with Bayesian GAMs," Insurance: Mathematics and Economics, Elsevier, vol. 35(3), pages 627-647, December.
    6. Eling, Martin, 2014. "Fitting asset returns to skewed distributions: Are the skew-normal and skew-student good models?," Insurance: Mathematics and Economics, Elsevier, vol. 59(C), pages 45-56.
    7. Victor Chernozhukov & Christian Hansen & Martin Spindler, 2015. "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics, Annual Reviews, vol. 7(1), pages 649-688, August.
    8. Denuit, Michel & Hainaut, Donatien & Trufin, Julien, 2020. "Effective Statistical Learning Methods for Actuaries II : Tree-Based Methods and Extensions," LIDAM Reprints ISBA 2020035, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    9. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," LIDAM Reprints ISBA 2019046, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    10. Katrien Antonio & Emiliano Valdez, 2012. "Statistical concepts of a priori and a posteriori risk classification in insurance," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 96(2), pages 187-224, June.
    11. Ai, Chunrong & Norton, Edward C., 2000. "Standard errors for the retransformation problem with heteroscedasticity," Journal of Health Economics, Elsevier, vol. 19(5), pages 697-718, September.
    12. Quan Zhiyu & Valdez Emiliano A., 2018. "Predictive analytics of insurance claims using multivariate decision trees," Dependence Modeling, De Gruyter, vol. 6(1), pages 377-407, December.
    13. Edward W. Frees & Gee Lee & Lu Yang, 2016. "Multivariate Frequency-Severity Regression Models in Insurance," Risks, MDPI, vol. 4(1), pages 1-36, February.
    14. Jean-Philippe Boucher & Michel Denuit & Montserrat Guillén, 2007. "Risk Classification for Claim Counts," North American Actuarial Journal, Taylor & Francis Journals, vol. 11(4), pages 110-131.
    15. Grubinger, Thomas & Zeileis, Achim & Pfeiffer, Karl-Peter, 2014. "evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 61(i01).
    16. Edward W. Frees, 2015. "Analytics of Insurance Markets," Annual Review of Financial Economics, Annual Reviews, vol. 7(1), pages 253-277, December.
    17. Manning, Willard G. & Mullahy, John, 2001. "Estimating log models: to transform or not to transform?," Journal of Health Economics, Elsevier, vol. 20(4), pages 461-494, July.
    18. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," Insurance: Mathematics and Economics, Elsevier, vol. 89(C), pages 128-139.
    19. Daniela Laas & Hato Schmeiser & Joël Wagner, 2016. "Empirical Findings on Motor Insurance Pricing in Germany, Austria and Switzerland," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 41(3), pages 398-431, July.
    20. Dalkilic, Turkan Erbay & Tank, Fatih & Kula, Kamile Sanli, 2009. "Neural networks approach for determining total claim amounts in insurance," Insurance: Mathematics and Economics, Elsevier, vol. 45(2), pages 236-241, October.
    21. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," LIDAM Discussion Papers ISBA 2019006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).

    Citations


    Cited by:

    1. Mogens Steffensen, 2022. "Special Issue “Risks: Feature Papers 2021”," Risks, MDPI, vol. 10(3), pages 1-2, March.
    2. Carina Clemente & Gracinda R. Guerreiro & Jorge M. Bravo, 2023. "Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting," Risks, MDPI, vol. 11(9), pages 1-20, September.
    3. Ahmed, Hanan, 2022. "Extreme value statistics using related variables," Other publications TiSEM 246f0f13-701c-4c0d-8e09-e, Tilburg University, School of Economics and Management.
    4. Anja Breuer & Yves Staudt, 2022. "Equalization Reserves for Reinsurance and Non-Life Undertakings in Switzerland," Risks, MDPI, vol. 10(3), pages 1-41, March.
    5. Zuleyka Díaz Martínez & José Fernández Menéndez & Luis Javier García Villalba, 2023. "Tariff Analysis in Automobile Insurance: Is It Time to Switch from Generalized Linear Models to Generalized Additive Models?," Mathematics, MDPI, vol. 11(18), pages 1-16, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2021. "Concordance Probability for Insurance Pricing Models," Risks, MDPI, vol. 9(10), pages 1-26, October.
    2. Denuit, Michel & Trufin, Julien & Verdebout, Thomas, 2021. "Testing for more positive expectation dependence with application to model comparison," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 163-172.
    3. George Tzougas, 2020. "EM Estimation for the Poisson-Inverse Gamma Regression Model with Varying Dispersion: An Application to Insurance Ratemaking," Risks, MDPI, vol. 8(3), pages 1-23, September.
    4. Mihaela Covrig & Iulian Mircea & Gheorghita Zbaganu & Alexandru Coser & Alexandru Tindeche, 2015. "Using R In Generalized Linear Models," Romanian Statistical Review, Romanian Statistical Review, vol. 63(3), pages 33-45, September.
    5. Tzougas, George, 2020. "EM estimation for the Poisson-Inverse Gamma regression model with varying dispersion: an application to insurance ratemaking," LSE Research Online Documents on Economics 106539, London School of Economics and Political Science, LSE Library.
    6. Willame, Gireg & Trufin, Julien & Denuit, Michel, 2023. "Boosted Poisson regression trees: A guide to the BT package in R," LIDAM Discussion Papers ISBA 2023008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    7. Tzougas, George & Vrontos, Spyridon D. & Frangos, Nickolaos E., 2015. "Risk classification for claim counts and losses using regression models for location, scale and shape," LSE Research Online Documents on Economics 70921, London School of Economics and Political Science, LSE Library.
    8. Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
    9. Denuit, Michel & Trufin, Julien & Verdebout, Thomas, 2021. "Testing for more positive expectation dependence with application to model comparison," LIDAM Discussion Papers ISBA 2021021, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    10. Mihaela DAVID, 2014. "Modeling The Frequency Of Claims In Auto Insurance With Application To A French Case," Review of Economic and Business Studies, Alexandru Ioan Cuza University, Faculty of Economics and Business Administration, issue 13, pages 69-85, June.
    11. Aivars Spilbergs & Andris Fomins & Māris Krastiņš, 2022. "Multivariate Modelling of Motor Third Party Liability Insurance Claims," European Journal of Business Science and Technology, Mendel University in Brno, Faculty of Business and Economics, vol. 8(1), pages 5-18.
    12. Denuit, Michel & Trufin, Julien, 2021. "Lorenz curve, Gini coefficient, and Tweedie dominance for autocalibrated predictors," LIDAM Discussion Papers ISBA 2021036, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    13. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    14. Denuit, Michel & Charpentier, Arthur & Trufin, Julien, 2021. "Autocalibration and Tweedie-dominance for insurance pricing with machine learning," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 485-497.
    15. Denuit, Michel & Trufin, Julien, 2022. "Autocalibration by balance correction in nonlife insurance pricing," LIDAM Discussion Papers ISBA 2022041, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    16. Denuit, Michel & Legrand, Catherine, 2016. "Risk Classification in Life Insurance: Extension to Continuous Covariates," LIDAM Discussion Papers ISBA 2016045, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    17. Shengkun Xie & Anna T. Lawniczak, 2018. "Estimating Major Risk Factor Relativities in Rate Filings Using Generalized Linear Models," IJFS, MDPI, vol. 6(4), pages 1-14, October.
    18. Devriendt, Sander & Antonio, Katrien & Reynkens, Tom & Verbelen, Roel, 2021. "Sparse regression with Multi-type Regularized Feature modeling," Insurance: Mathematics and Economics, Elsevier, vol. 96(C), pages 248-261.
    19. Wakker, Peter P. & Yang, Jingni, 2021. "Concave/convex weighting and utility functions for risk: A new light on classical theorems," Insurance: Mathematics and Economics, Elsevier, vol. 100(C), pages 429-435.
    20. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
