Printed from https://ideas.repec.org/p/arx/papers/2301.12710.html

Machine Learning with High-Cardinality Categorical Features in Actuarial Applications

Authors

  • Benjamin Avanzi
  • Greg Taylor
  • Melantha Wang
  • Bernard Wong

Abstract

High-cardinality categorical features are pervasive in actuarial data (e.g. occupation in commercial property insurance). Standard encoding methods such as one-hot encoding are inadequate in these settings. In this work, we present a novel _Generalised Linear Mixed Model Neural Network_ ("GLMMNet") approach to modelling high-cardinality categorical features. The GLMMNet integrates a generalised linear mixed model into a deep learning framework, offering both the predictive power of neural networks and the transparency of random effects estimates, which entity embedding models cannot provide. Further, its ability to handle any distribution in the exponential dispersion (ED) family makes it applicable to many actuarial contexts and beyond. We illustrate the GLMMNet and compare it against existing approaches in a range of simulation experiments and in a real-life insurance case study. Notably, we find that the GLMMNet often outperforms, or at least performs comparably to, an entity-embedded neural network, while offering the additional benefit of transparency, which is particularly valuable in practice. Importantly, although our model was motivated by actuarial applications, it can be applied more widely: the GLMMNet would suit any application that involves high-cardinality categorical variables and a response that cannot be adequately modelled by a Gaussian distribution.
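
The abstract contrasts one-hot encoding with random effects estimates. As a hedged illustration of how random effects cope with sparse categories and remain directly interpretable, the sketch below computes predicted random intercepts for the simplest Gaussian one-way random-effects model, y_ij = mu + u_j + e_ij, with u_j ~ N(0, sigma2_u) and e_ij ~ N(0, sigma2), treating the variances as known. This is not the GLMMNet: per the abstract, the GLMMNet embeds a GLMM in a neural network and supports any ED-family response. The function name `shrunken_random_intercepts` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def shrunken_random_intercepts(y, groups, sigma2, sigma2_u, mu=None):
    """Predict random intercepts u_j in the Gaussian one-way model
    y_ij = mu + u_j + e_ij, with u_j ~ N(0, sigma2_u) and e_ij ~ N(0, sigma2).

    The variance components are treated as known (hypothetical inputs); a real
    mixed model would estimate them from the data.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    if mu is None:
        # Simple plug-in estimate of the overall intercept.
        mu = y.mean()
    # Map each observation to its category index and count observations per category.
    levels, inverse, counts = np.unique(groups, return_inverse=True, return_counts=True)
    group_means = np.bincount(inverse, weights=y) / counts
    # Credibility-style shrinkage weight in [0, 1): sparse categories are pulled towards 0.
    weights = counts * sigma2_u / (counts * sigma2_u + sigma2)
    u_hat = weights * (group_means - mu)
    return dict(zip(levels.tolist(), u_hat.tolist()))


# Category "a" has three observations, category "b" only one.
y = [10.0, 12.0, 11.0, 30.0]
groups = ["a", "a", "a", "b"]
u = shrunken_random_intercepts(y, groups, sigma2=4.0, sigma2_u=4.0, mu=15.0)
print(u)  # {'a': -3.0, 'b': 7.5}
```

Category "b", with a single observation, keeps only half of its raw deviation from mu (weight 0.5), whereas category "a" keeps three quarters (weight 0.75). Each estimate can be read off per category. By contrast, in a Gaussian model with only this feature, an unpenalised one-hot encoding would fit each category's raw mean, however few observations it has.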

Suggested Citation

  • Benjamin Avanzi & Greg Taylor & Melantha Wang & Bernard Wong, 2023. "Machine Learning with High-Cardinality Categorical Features in Actuarial Applications," Papers 2301.12710, arXiv.org.
  • Handle: RePEc:arx:papers:2301.12710

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2301.12710
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. repec:cup:astinb:v:49:y:2019:i:01:p:1-3_00 is not listed on IDEAS
    2. David M. Blei & Alp Kucukelbir & Jon D. McAuliffe, 2017. "Variational Inference: A Review for Statisticians," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 859-877, April.
    3. Richman, Ronald, 2021. "AI in actuarial science – a review of recent advances – part 2," Annals of Actuarial Science, Cambridge University Press, vol. 15(2), pages 230-258, July.
    4. Kevin Kuo & Ronald Richman, 2021. "Embeddings and Attention in Predictive Modeling," Papers 2104.03545, arXiv.org.
    5. Al-Mudafer, Muhammed Taher & Avanzi, Benjamin & Taylor, Greg & Wong, Bernard, 2022. "Stochastic loss reserving with mixture density neural networks," Insurance: Mathematics and Economics, Elsevier, vol. 105(C), pages 144-174.
    6. Richman, Ronald & Wüthrich, Mario V., 2021. "A neural network extension of the Lee–Carter model to multiple populations," Annals of Actuarial Science, Cambridge University Press, vol. 15(2), pages 346-366, July.
    7. Richman, Ronald, 2021. "AI in actuarial science – a review of recent advances – part 1," Annals of Actuarial Science, Cambridge University Press, vol. 15(2), pages 207-229, July.
    8. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    9. Florian Pargent & Florian Pfisterer & Janek Thomas & Bernd Bischl, 2022. "Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features," Computational Statistics, Springer, vol. 37(5), pages 2671-2692, November.
    10. Roel Henckaerts & Marie-Pier Côté & Katrien Antonio & Roel Verbelen, 2021. "Boosting Insights in Insurance Tariff Plans with Tree-Based Machine Learning Methods," North American Actuarial Journal, Taylor & Francis Journals, vol. 25(2), pages 255-285, April.
    11. Antonio, Katrien & Beirlant, Jan, 2007. "Actuarial statistics with generalized linear mixed models," Insurance: Mathematics and Economics, Elsevier, vol. 40(1), pages 58-76, January.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jaiswal, Rachana & Gupta, Shashank & Tiwari, Aviral Kumar, 2024. "Big data and machine learning-based decision support system to reshape the vaticination of insurance claims," Technological Forecasting and Social Change, Elsevier, vol. 209(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yang Qiao & Chou-Wen Wang & Wenjun Zhu, 2024. "Machine learning in long-term mortality forecasting," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 49(2), pages 340-362, April.
    2. Giorgio Alfredo Spedicato & Christophe Dutang & Quentin Guibert, 2025. "Adjusting Manual Rates to Own Experience: Comparing the Credibility Approach to Machine Learning," Post-Print hal-04821310, HAL.
    3. Freek Holvoet & Christopher Blier-Wong & Katrien Antonio, 2025. "A multi-view contrastive learning framework for spatial embeddings in risk modelling," Papers 2511.17954, arXiv.org.
    4. Katrien Antonio & Christophe Dutang & Andreas Tsanakas, 2021. "Editorial," Post-Print hal-04748464, HAL.
    5. Jaiswal, Rachana & Gupta, Shashank & Tiwari, Aviral Kumar, 2024. "Big data and machine learning-based decision support system to reshape the vaticination of insurance claims," Technological Forecasting and Social Change, Elsevier, vol. 209(C).
    6. Simon Hatzesberger & Iris Nonneman, 2025. "Advanced Applications of Generative AI in Actuarial Science: Case Studies Beyond ChatGPT," Papers 2506.18942, arXiv.org.
    7. Hung-Tsung Hsiao & Chou-Wen Wang & I.-Chien Liu & Ko-Lun Kung, 2024. "Mortality improvement neural-network models with autoregressive effects," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 49(2), pages 363-383, April.
    8. Li, Li & Li, Han & Panagiotelis, Anastasios, 2025. "Boosting domain-specific models with shrinkage: An application in mortality forecasting," International Journal of Forecasting, Elsevier, vol. 41(1), pages 191-207.
    9. Ronald Richman & Salvatore Scognamiglio & Mario V. Wuthrich, 2024. "The Credibility Transformer," Papers 2409.16653, arXiv.org.
    10. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    11. Bansal, Prateek & Krueger, Rico & Graham, Daniel J., 2021. "Fast Bayesian estimation of spatial count data models," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    12. Bram van Os, 2023. "Information-Theoretic Time-Varying Density Modeling," Tinbergen Institute Discussion Papers 23-037/III, Tinbergen Institute.
    13. Patrick Toman & Nalini Ravishanker & Nathan Lally & Sanguthevar Rajasekaran, 2025. "Forecasting Robust Gaussian Process State Space Models for Assessing Intervention Impact in Internet of Things Time Series," Forecasting, MDPI, vol. 7(2), pages 1-20, May.
    14. Benjamin Avanzi & Yanfeng Li & Bernard Wong & Alan Xian, 2022. "Ensemble distributional forecasting for insurance loss reserving," Papers 2206.08541, arXiv.org, revised Jun 2024.
    15. Aleksandar Arandjelović & Julia Eisenberg, 2024. "Optimal risk mitigation by deep reinsurance," Papers 2408.06168, arXiv.org, revised Nov 2025.
    16. Weerasinghe, Chaya & Loaiza-Maya, Rubén & Martin, Gael M. & Frazier, David T., 2025. "ABC-based forecasting in misspecified state space models," International Journal of Forecasting, Elsevier, vol. 41(1), pages 270-289.
    17. David T. Frazier & Ruben Loaiza-Maya & Gael M. Martin, 2021. "Variational Bayes in State Space Models: Inferential and Predictive Accuracy," Papers 2106.12262, arXiv.org, revised Feb 2022.
    18. Jamotton, Charlotte & Hainaut, Donatien, 2024. "Latent Dirichlet Allocation for structured insurance data," LIDAM Discussion Papers ISBA 2024008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    19. Diego Zappa & Gian Paolo Clemente & Francesco Della Corte & Nino Savelli, 2023. "Editorial on the Special Issue on Insurance: complexity, risks and its connection with social sciences," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(2), pages 125-130, December.
    20. Zhu, Felix & Dong, Yumo & Huang, Fei, 2025. "Data-rich economic forecasting for actuarial applications," Insurance: Mathematics and Economics, Elsevier, vol. 124(C).

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2301.12710. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of the provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.