Printed from https://ideas.repec.org/a/gam/jrisks/v10y2022i4p83-d791726.html

Variable Selection Algorithm for a Mixture of Poisson Regression for Handling Overdispersion in Claims Frequency Modeling Using Telematics Car Driving Data

Author

Listed:
  • Jennifer S. K. Chan

    (School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia)

  • S. T. Boris Choy

    (Discipline of Business Analytics, The University of Sydney, Sydney, NSW 2006, Australia)

  • Udi Makov

    (Actuarial Research Center, University of Haifa, Haifa 3498838, Israel)

  • Ariel Shamir

    (Efi Arazi School of Computer Science, Reichman University, Herzliya 4610101, Israel)

  • Vered Shapovalov

    (Actuarial Research Center, University of Haifa, Haifa 3498838, Israel)

Abstract

In automobile insurance, it is common to adopt a Poisson regression model to predict the number of claims as part of the actuarial pricing process. The Poisson assumption can rarely be justified, often because of overdispersion, and alternative models are often considered, typically zero-inflated models, which are special cases of finite mixture distributions. Finite mixture regression modeling of telematics data is challenging to implement because the huge number of covariates makes the essential variable selection, needed to attain a model with good predictive power without overfitting, computationally prohibitive. This paper aims to devise an algorithm that can carry out variable selection in the presence of a large number of covariates. This is achieved by generating sub-samples of the data corresponding to each component of the Poisson mixture, wherein variable selection is applied after the Poisson assumption has been enhanced by controlling the number of zero claims. The resulting algorithm is assessed by measuring the out-of-sample AUC (Area Under the Curve), a machine learning measure of predictive power. Finally, the application of the algorithm is demonstrated using claim history data and telematics data describing driving behavior. It transpires that, unlike alternative algorithms related to Poisson regression, the proposed algorithm is both implementable and achieves an improved AUC (0.71). The proposed algorithm allows more accurate pricing in an era in which telematics data are used for automobile insurance.
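
As a rough illustration of two ideas in the abstract, the sketch below treats a zero-inflated Poisson as a two-component Poisson mixture (one component with rate 0) and shows that such a mixture is overdispersed, then computes a rank-based AUC. It is not the paper's variable selection algorithm; the function names, weights, rates, scores and labels are all hypothetical values chosen for illustration.

```python
import math


def poisson_pmf(k, rate):
    """P(N = k) for a Poisson count; a rate of 0 puts all mass on k = 0."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def mixture_pmf(k, weights, rates):
    """P(N = k) under a finite mixture of Poisson components."""
    return sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))


def mixture_mean_var(weights, rates):
    """Mean and variance of the mixture, from the component moments."""
    mean = sum(w * r for w, r in zip(weights, rates))
    # For a Poisson component, E[N^2] = rate + rate^2.
    second_moment = sum(w * (r + r * r) for w, r in zip(weights, rates))
    return mean, second_moment - mean * mean


def auc(scores, labels):
    """Share of (claim, no-claim) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical zero-inflated Poisson: 70% structural zeros, 30% Poisson(0.5).
weights, rates = [0.7, 0.3], [0.0, 0.5]
mean, var = mixture_mean_var(weights, rates)
print(f"P(N=0) = {mixture_pmf(0, weights, rates):.4f}")
print(f"mean = {mean:.4f}, variance = {var:.4f}")

# Hypothetical predicted scores and observed claim indicators.
print(f"AUC = {auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]):.2f}")
```

With these illustrative values the mixture has mean 0.15 and variance 0.2025, so the variance exceeds the mean, which a single Poisson distribution cannot reproduce.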

Suggested Citation

  • Jennifer S. K. Chan & S. T. Boris Choy & Udi Makov & Ariel Shamir & Vered Shapovalov, 2022. "Variable Selection Algorithm for a Mixture of Poisson Regression for Handling Overdispersion in Claims Frequency Modeling Using Telematics Car Driving Data," Risks, MDPI, vol. 10(4), pages 1-10, April.
  • Handle: RePEc:gam:jrisks:v:10:y:2022:i:4:p:83-:d:791726

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/10/4/83/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/10/4/83/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tzougas, George & Hoon, W. L. & Lim, J. M., 2019. "The negative binomial-inverse Gaussian regression model with an application to insurance ratemaking," LSE Research Online Documents on Economics 101728, London School of Economics and Political Science, LSE Library.
    2. Montserrat Guillen & Ana M. Pérez-Marín & Mercedes Ayuso & Jens Perch Nielsen, 2018. "Exposure to risk increases the excess of zero accident claims frequency in automobile insurance," IREA Working Papers 201810, University of Barcelona, Research Institute of Applied Economics, revised May 2018.
    3. Montserrat Guillen & Jens Perch Nielsen & Mercedes Ayuso & Ana M. Pérez‐Marín, 2019. "The Use of Telematics Devices to Improve Automobile Insurance Rates," Risk Analysis, John Wiley & Sons, vol. 39(3), pages 662-672, March.
    4. Marjan Qazvini, 2019. "On the Validation of Claims with Excess Zeros in Liability Insurance: A Comparative Study," Risks, MDPI, vol. 7(3), pages 1-17, June.
    5. Jean-Philippe Boucher & Roxane Turcotte, 2020. "A Longitudinal Analysis of the Impact of Distance Driven on the Probability of Car Accidents," Risks, MDPI, vol. 8(3), pages 1-19, September.
    6. Mihaela DAVID, 2014. "Modeling The Frequency Of Claims In Auto Insurance With Application To A French Case," Review of Economic and Business Studies, Alexandru Ioan Cuza University, Faculty of Economics and Business Administration, issue 13, pages 69-85, June.
    7. Payandeh Najafabadi Amir T. & MohammadPour Saeed, 2018. "A k-Inflated Negative Binomial Mixture Regression Model: Application to Rate–Making Systems," Asia-Pacific Journal of Risk and Insurance, De Gruyter, vol. 12(2), pages 1-31, July.
    8. Lluís Bermúdez & Dimitris Karlis & Isabel Morillo, 2020. "Modelling Unobserved Heterogeneity in Claim Counts Using Finite Mixture Models," Risks, MDPI, vol. 8(1), pages 1-13, January.
    9. Donatella Porrini & Giulio Fusco & Cosimo Magazzino, 2020. "Black boxes and market efficiency: the effect on premiums in the Italian motor-vehicle insurance market," European Journal of Law and Economics, Springer, vol. 49(3), pages 455-472, June.
    10. Gregori Baetschmann & Rainer Winkelmann, 2014. "A dynamic hurdle model for zero-inflated count data: with an application to health care utilization," ECON - Working Papers 151, Department of Economics - University of Zurich.
    11. Mihaela COVRIG & Dumitru BADEA, 2017. "Some Generalized Linear Models for the Estimation of the Mean Frequency of Claims in Motor Insurance," ECONOMIC COMPUTATION AND ECONOMIC CYBERNETICS STUDIES AND RESEARCH, Faculty of Economic Cybernetics, Statistics and Informatics, vol. 51(4), pages 91-107.
    12. Jean‐Philippe Boucher & Michel Denuit & Montserrat Guillen, 2009. "Number of Accidents or Number of Claims? An Approach with Zero‐Inflated Poisson Models for Panel Data," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 76(4), pages 821-846, December.
    13. Claudia Czado & Holger Schabenberger & Vinzenz Erhardt, 2014. "Non nested model selection for spatial count regression models with application to health insurance," Statistical Papers, Springer, vol. 55(2), pages 455-476, May.
    14. George Tzougas, 2020. "EM Estimation for the Poisson-Inverse Gamma Regression Model with Varying Dispersion: An Application to Insurance Ratemaking," Risks, MDPI, vol. 8(3), pages 1-23, September.
    15. Mihaela Covrig & Iulian Mircea & Gheorghita Zbaganu & Alexandru Coser & Alexandru Tindeche, 2015. "Using R In Generalized Linear Models," Romanian Statistical Review, Romanian Statistical Review, vol. 63(3), pages 33-45, September.
    16. Ping Zeng & Yongyue Wei & Yang Zhao & Jin Liu & Liya Liu & Ruyang Zhang & Jianwei Gou & Shuiping Huang & Feng Chen, 2014. "Variable selection approach for zero-inflated count data via adaptive lasso," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(4), pages 879-894, April.
    17. David Mihaela & Jemna Dănuţ-Vasile, 2015. "Modeling the Frequency of Auto Insurance Claims by Means of Poisson and Negative Binomial Models," Scientific Annals of Economics and Business, Sciendo, vol. 62(2), pages 151-168, July.
    18. Miguel Santolino & Jean-Philippe Boucher, 2009. "Modelling the disability severity score in motor insurance claims: an application to the Spanish case," IREA Working Papers 200902, University of Barcelona, Research Institute of Applied Economics, revised Jan 2009.
    19. Banghee So & Jean-Philippe Boucher & Emiliano A. Valdez, 2021. "Synthetic Dataset Generation of Driver Telematics," Risks, MDPI, vol. 9(4), pages 1-19, March.
    20. Tzougas, George, 2020. "EM estimation for the Poisson-Inverse Gamma regression model with varying dispersion: an application to insurance ratemaking," LSE Research Online Documents on Economics 106539, London School of Economics and Political Science, LSE Library.
