
Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance

Authors

Listed:
  • Yves Staudt

    (Department Alpine Region Development, Institute for Tourism and Leisure, University of Applied Sciences of the Grisons, Comercialstrasse 19, 7000 Chur, Switzerland
    Center of Data Analysis, Simulation and Visualization, Department Applied Future Technologies, University of Applied Sciences of the Grisons, Ringstrasse 34, 7000 Chur, Switzerland
    These authors contributed equally to this work.)

  • Joël Wagner

    (Department of Actuarial Science, Faculty of Business and Economics (HEC Lausanne), University of Lausanne, Extranef, 1015 Lausanne, Switzerland
    Swiss Finance Institute, University of Lausanne, 1015 Lausanne, Switzerland
    These authors contributed equally to this work.)

Abstract

For calculating non-life insurance premiums, actuaries traditionally rely on separate severity and frequency models that use covariates to explain the claims loss exposure. In this paper, we focus on the claim severity. First, we build two reference models, a generalized linear model and a generalized additive model, relying on a log-normal distribution of the severity and including the most significant factors. Thereby, we relate the continuous variables to the response in a nonlinear way. Second, we tune two random forest models, one for the claim severity and one for the log-transformed claim severity, where the latter requires a transformation of the predicted results. We compare the prediction performance of the different models using the relative error, the root mean squared error and the goodness-of-lift statistics in combination with goodness-of-fit statistics. In our application, we rely on a dataset of a Swiss collision insurance portfolio covering the loss exposure of the period from 2011 to 2015 and including observations from 81 309 settled claims with a total amount of CHF 184 million. In the analysis, we use the data from 2011 to 2014 for training and the data from 2015 for testing. Our results indicate that a log-normal transformation of the severity does not lead to performance gains with random forests. However, random forests with a log-normal transformation are the preferred choice for explaining right-skewed claims. Finally, when considering all indicators, we conclude that the generalized additive model has the best overall performance.
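
The abstract notes that a model fitted to the log-transformed severity requires a transformation of its predictions. The following Python sketch illustrates why, under assumptions that are not taken from the paper: claims are simulated from a homoscedastic log-normal distribution with hypothetical parameters, the "model" is a constant fitted on the log scale, and the relative error is defined here as the deviation of the total predicted amount from the total observed amount. It is not the authors' implementation, and the paper's exact metric definitions may differ.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted severities."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def relative_error(actual, predicted):
    """Deviation of total predicted from total observed amount.

    Assumption: one common definition, not necessarily the paper's.
    """
    total_actual = sum(actual)
    return (sum(predicted) - total_actual) / total_actual


def naive_backtransform(mu):
    """exp of the log-scale mean: the log-normal median, not the mean."""
    return math.exp(mu)


def lognormal_mean(mu, sigma2):
    """Mean of a log-normal variable: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma2 / 2.0)


def demo(n=20000, mu=7.0, sigma=1.0, seed=0):
    """Compare naive and corrected back-transformations on simulated claims.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    claims = [rng.lognormvariate(mu, sigma) for _ in range(n)]

    # "Fit" on the log scale: sample mean and variance of log(claims).
    logs = [math.log(c) for c in claims]
    mu_hat = sum(logs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in logs) / (n - 1)

    naive = [naive_backtransform(mu_hat)] * n
    corrected = [lognormal_mean(mu_hat, sigma2_hat)] * n

    return {
        "rel_naive": relative_error(claims, naive),
        "rel_corrected": relative_error(claims, corrected),
        "rmse_naive": rmse(claims, naive),
        "rmse_corrected": rmse(claims, corrected),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

In this setting the naive back-transformation systematically underestimates the total claim amount, because exp of the log-scale mean estimates the median of a right-skewed distribution rather than its mean. The correction term sigma^2 / 2 is valid only under the log-normal, constant-variance assumption made here.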

Suggested Citation

  • Yves Staudt & Joël Wagner, 2021. "Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance," Risks, MDPI, vol. 9(3), pages 1-28, March.
  • Handle: RePEc:gam:jrisks:v:9:y:2021:i:3:p:53-:d:517868

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/9/3/53/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/9/3/53/
    Download Restriction: no

    References listed on IDEAS

    1. Denuit, Michel & Hainaut, Donatien & Trufin, Julien, 2020. "Effective Statistical Learning Methods for Actuaries II : Tree-Based Methods and Extensions," LIDAM Reprints ISBA 2020035, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    2. Dalkilic, Turkan Erbay & Tank, Fatih & Kula, Kamile Sanli, 2009. "Neural networks approach for determining total claim amounts in insurance," Insurance: Mathematics and Economics, Elsevier, vol. 45(2), pages 236-241, October.
    3. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    4. Eling, Martin, 2014. "Fitting asset returns to skewed distributions: Are the skew-normal and skew-student good models?," Insurance: Mathematics and Economics, Elsevier, vol. 59(C), pages 45-56.
    5. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," Insurance: Mathematics and Economics, Elsevier, vol. 89(C), pages 128-139.
    6. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2014. "Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape," Insurance: Mathematics and Economics, Elsevier, vol. 55(C), pages 225-249.
    7. Katrien Antonio & Emiliano Valdez, 2012. "Statistical concepts of a priori and a posteriori risk classification in insurance," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 96(2), pages 187-224, June.
    8. Jean-Philippe Boucher & Michel Denuit & Montserrat Guillén, 2007. "Risk Classification for Claim Counts," North American Actuarial Journal, Taylor & Francis Journals, vol. 11(4), pages 110-131.
    9. Grubinger, Thomas & Zeileis, Achim & Pfeiffer, Karl-Peter, 2014. "evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 61(i01).
    10. Daniela Laas & Hato Schmeiser & Joël Wagner, 2016. "Empirical Findings on Motor Insurance Pricing in Germany, Austria and Switzerland," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 41(3), pages 398-431, July.
    11. Victor Chernozhukov & Christian Hansen & Martin Spindler, 2015. "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics, Annual Reviews, vol. 7(1), pages 649-688, August.
    12. Quan Zhiyu & Valdez Emiliano A., 2018. "Predictive analytics of insurance claims using multivariate decision trees," Dependence Modeling, De Gruyter, vol. 6(1), pages 377-407, December.
    13. Manning, Willard G., 1998. "The logged dependent variable, heteroscedasticity, and the retransformation problem," Journal of Health Economics, Elsevier, vol. 17(3), pages 283-295, June.
    14. Denuit, Michel & Lang, Stefan, 2004. "Non-life rate-making with Bayesian GAMs," Insurance: Mathematics and Economics, Elsevier, vol. 35(3), pages 627-647, December.
    15. Edward W. Frees, 2015. "Analytics of Insurance Markets," Annual Review of Financial Economics, Annual Reviews, vol. 7(1), pages 253-277, December.
    16. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," LIDAM Discussion Papers ISBA 2019006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    17. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2014. "Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape," LIDAM Reprints ISBA 2014006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    18. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," LIDAM Reprints ISBA 2019046, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    19. Ai, Chunrong & Norton, Edward C., 2000. "Standard errors for the retransformation problem with heteroscedasticity," Journal of Health Economics, Elsevier, vol. 19(5), pages 697-718, September.
    20. Edward W. Frees & Gee Lee & Lu Yang, 2016. "Multivariate Frequency-Severity Regression Models in Insurance," Risks, MDPI, vol. 4(1), pages 1-36, February.
    21. Manning, Willard G. & Mullahy, John, 2001. "Estimating log models: to transform or not to transform?," Journal of Health Economics, Elsevier, vol. 20(4), pages 461-494, July.

    Citations



    Cited by:

    1. Zuleyka Díaz Martínez & José Fernández Menéndez & Luis Javier García Villalba, 2023. "Tariff Analysis in Automobile Insurance: Is It Time to Switch from Generalized Linear Models to Generalized Additive Models?," Mathematics, MDPI, vol. 11(18), pages 1-16, September.
    2. Mogens Steffensen, 2022. "Special Issue “Risks: Feature Papers 2021”," Risks, MDPI, vol. 10(3), pages 1-2, March.
    3. Ahmed, Hanan, 2022. "Extreme value statistics using related variables," Other publications TiSEM 246f0f13-701c-4c0d-8e09-e, Tilburg University, School of Economics and Management.
    4. Anja Breuer & Yves Staudt, 2022. "Equalization Reserves for Reinsurance and Non-Life Undertakings in Switzerland," Risks, MDPI, vol. 10(3), pages 1-41, March.
    5. Carina Clemente & Gracinda R. Guerreiro & Jorge M. Bravo, 2023. "Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting," Risks, MDPI, vol. 11(9), pages 1-20, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2021. "Concordance Probability for Insurance Pricing Models," Risks, MDPI, vol. 9(10), pages 1-26, October.
    2. Mihaela Covrig & Iulian Mircea & Gheorghita Zbaganu & Alexandru Coser & Alexandru Tindeche, 2015. "Using R In Generalized Linear Models," Romanian Statistical Review, Romanian Statistical Review, vol. 63(3), pages 33-45, September.
    3. Willame, Gireg & Trufin, Julien & Denuit, Michel, 2023. "Boosted Poisson regression trees: A guide to the BT package in R," LIDAM Discussion Papers ISBA 2023008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    4. Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
    5. Denuit, Michel & Trufin, Julien & Verdebout, Thomas, 2021. "Testing for more positive expectation dependence with application to model comparison," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 163-172.
    6. George Tzougas, 2020. "EM Estimation for the Poisson-Inverse Gamma Regression Model with Varying Dispersion: An Application to Insurance Ratemaking," Risks, MDPI, vol. 8(3), pages 1-23, September.
    7. Tzougas, George, 2020. "EM estimation for the Poisson-Inverse Gamma regression model with varying dispersion: an application to insurance ratemaking," LSE Research Online Documents on Economics 106539, London School of Economics and Political Science, LSE Library.
    8. Tzougas, George & Vrontos, Spyridon D. & Frangos, Nickolaos E., 2015. "Risk classification for claim counts and losses using regression models for location, scale and shape," LSE Research Online Documents on Economics 70921, London School of Economics and Political Science, LSE Library.
    9. Denuit, Michel & Trufin, Julien & Verdebout, Thomas, 2021. "Testing for more positive expectation dependence with application to model comparison," LIDAM Discussion Papers ISBA 2021021, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    10. Aivars Spilbergs & Andris Fomins & Māris Krastiņš, 2022. "Multivariate Modelling of Motor Third Party Liability Insurance Claims," European Journal of Business Science and Technology, Mendel University in Brno, Faculty of Business and Economics, vol. 8(1), pages 5-18.
    11. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    12. Denuit, Michel & Trufin, Julien, 2022. "Autocalibration by balance correction in nonlife insurance pricing," LIDAM Discussion Papers ISBA 2022041, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    13. Devriendt, Sander & Antonio, Katrien & Reynkens, Tom & Verbelen, Roel, 2021. "Sparse regression with Multi-type Regularized Feature modeling," Insurance: Mathematics and Economics, Elsevier, vol. 96(C), pages 248-261.
    14. Aktaev, Nurken E. & Bannova, K.A., 2022. "Mathematical modeling of probability distribution of money by means of potential formation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 595(C).
    15. Michel Denuit & Christian Y. Robert, 2021. "Risk sharing under the dominant peer‐to‐peer property and casualty insurance business models," Risk Management and Insurance Review, American Risk and Insurance Association, vol. 24(2), pages 181-205, June.
    16. Denuit, Michel & Robert, Christian Y., 2021. "Risk sharing under the dominant peer-to-peer property and casualty insurance business models," LIDAM Discussion Papers ISBA 2021001, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    17. Hao Yu, 2017. "China’s medical savings accounts: an analysis of the price elasticity of demand for health care," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 18(6), pages 773-785, July.
    18. Sarra Ghaddab & Manel Kacem & Christian Peretti & Lotfi Belkacem, 2023. "Extreme severity modeling using a GLM-GPD combination: application to an excess of loss reinsurance treaty," Empirical Economics, Springer, vol. 65(3), pages 1105-1127, September.
    19. Deprez, Laurens & Antonio, Katrien & Boute, Robert, 2023. "Empirical risk assessment of maintenance costs under full-service contracts," European Journal of Operational Research, Elsevier, vol. 304(2), pages 476-493.
    20. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
