Printed from https://ideas.repec.org/p/arx/papers/2301.12710.html

Machine Learning with High-Cardinality Categorical Features in Actuarial Applications

Author

Listed:
  • Benjamin Avanzi
  • Greg Taylor
  • Melantha Wang
  • Bernard Wong

Abstract

High-cardinality categorical features are pervasive in actuarial data (e.g. occupation in commercial property insurance). Standard categorical encoding methods like one-hot encoding are inadequate in these settings. In this work, we present a novel _Generalised Linear Mixed Model Neural Network_ ("GLMMNet") approach to the modelling of high-cardinality categorical features. The GLMMNet integrates a generalised linear mixed model in a deep learning framework, offering the predictive power of neural networks and the transparency of random effects estimates, the latter of which cannot be obtained from the entity embedding models. Further, its flexibility to deal with any distribution in the exponential dispersion (ED) family makes it widely applicable to many actuarial contexts and beyond. We illustrate and compare the GLMMNet against existing approaches in a range of simulation experiments as well as in a real-life insurance case study. Notably, we find that the GLMMNet often outperforms or at least performs comparably with an entity embedded neural network, while providing the additional benefit of transparency, which is particularly valuable in practical applications. Importantly, while our model was motivated by actuarial applications, it can have wider applicability. The GLMMNet would suit any applications that involve high-cardinality categorical variables and where the response cannot be sufficiently modelled by a Gaussian distribution.
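
As a rough illustration of the contrast the abstract draws, the sketch below compares one-hot encoding with a shrunken random-intercept estimate for a high-cardinality feature. It is not the GLMMNet: it assumes a Gaussian response, known variance components (`sigma2_u`, `sigma2_e`), and no other covariates, whereas the model described above handles any ED-family response inside a neural network. All function names and parameters here are hypothetical.

```python
from collections import defaultdict


def one_hot(levels):
    """Return (rows, vocab): one indicator column per distinct level."""
    vocab = sorted(set(levels))
    index = {level: j for j, level in enumerate(vocab)}
    rows = [
        [1.0 if index[level] == j else 0.0 for j in range(len(vocab))]
        for level in levels
    ]
    return rows, vocab


def shrunken_effects(y, levels, sigma2_u, sigma2_e):
    """Random-intercept estimates under a Gaussian simplification.

    Assumes y = mu + u[level] + noise, with u ~ N(0, sigma2_u) and
    noise ~ N(0, sigma2_e), both variances known (an assumption made
    for illustration). mu is approximated by the overall mean of y.
    """
    overall = sum(y) / len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, level in zip(y, levels):
        sums[level] += value - overall
        counts[level] += 1
    effects = {}
    for level, n in counts.items():
        # Shrinkage weight: close to 1 for well-observed levels,
        # close to 0 for levels with few observations.
        weight = n * sigma2_u / (n * sigma2_u + sigma2_e)
        effects[level] = weight * sums[level] / n
    return effects


if __name__ == "__main__":
    y = [1.0, 1.0, 1.0, 5.0]
    occupation = ["clerk", "clerk", "clerk", "welder"]
    rows, vocab = one_hot(occupation)
    print(len(vocab), "one-hot columns")
    print(shrunken_effects(y, occupation, sigma2_u=1.0, sigma2_e=1.0))
```

One-hot encoding adds a column for every level regardless of how much data supports it, while the shrunken estimates pull sparsely observed levels toward zero and remain directly readable as per-level effects.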

Suggested Citation

  • Benjamin Avanzi & Greg Taylor & Melantha Wang & Bernard Wong, 2023. "Machine Learning with High-Cardinality Categorical Features in Actuarial Applications," Papers 2301.12710, arXiv.org.
  • Handle: RePEc:arx:papers:2301.12710

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2301.12710
    File Function: Latest version
    Download Restriction: no


