
Latent Dirichlet Allocation for structured insurance data

Authors

  • Jamotton, Charlotte (Université catholique de Louvain, LIDAM/ISBA, Belgium)
  • Hainaut, Donatien (Université catholique de Louvain, LIDAM/ISBA, Belgium)

Abstract

This article explores the application of Latent Dirichlet Allocation (LDA) to structured tabular insurance data. LDA is a probabilistic topic modelling approach initially developed in Natural Language Processing (NLP) to uncover the underlying structure of unstructured textual data. It was designed to represent textual documents as mixtures of latent (hidden) topics, and topics as mixtures of words. This study introduces LDA's document-topic distribution as a soft clustering tool for unsupervised learning tasks in the actuarial field. By defining each topic as a risk profile, and by treating insurance policies as documents and the modalities of categorical covariates as words, we show how LDA can be extended beyond textual data and can offer a framework to uncover underlying structures within insurance portfolios. Our experimental results and analysis highlight how modelling policies by their topic cluster membership, and identifying the dominant modalities within each risk profile, can give insights into the prominent risk factors contributing to higher or lower claim frequencies.

Suggested Citation

  • Jamotton, Charlotte & Hainaut, Donatien, 2024. "Latent Dirichlet Allocation for structured insurance data," LIDAM Discussion Papers ISBA 2024008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
  • Handle: RePEc:aiz:louvad:2024008

    Download full text from publisher

    File URL: https://dial.uclouvain.be/pr/boreal/en/object/boreal%3A285770/datastream/PDF_01/view
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francesca Perla & Salvatore Scognamiglio, 2023. "Locally-coherent multi-population mortality modelling via neural networks," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 46(1), pages 157-176, June.
    2. Benjamin Avanzi & Greg Taylor & Melantha Wang & Bernard Wong, 2023. "Machine Learning with High-Cardinality Categorical Features in Actuarial Applications," Papers 2301.12710, arXiv.org.
    3. Darren Shannon & Tim Jannusch & Florian David‐Spickermann & Martin Mullins & Martin Cunneen & Finbarr Murphy, 2021. "Connected and autonomous vehicle injury loss events: Potential risk and actuarial considerations for primary insurers," Risk Management and Insurance Review, American Risk and Insurance Association, vol. 24(1), pages 5-35, March.
    4. Ramon Alemany & Catalina Bolance & Montserrat Guillen, 2014. "Accounting for severity of risk when pricing insurance products," Working Papers 2014-05, Universitat de Barcelona, UB Riskcenter.
    5. Albarrán Lozano, Irene & Alonso, Pablo J. & Grané Chávez, Aurea, 2011. "Profile identification via weighted related metric scaling : an application to dependent Spanish children," DES - Working Papers. Statistics and Econometrics. WS ws113628, Universidad Carlos III de Madrid. Departamento de Estadística.
    6. Ongaro, A. & Migliorati, S., 2013. "A generalization of the Dirichlet distribution," Journal of Multivariate Analysis, Elsevier, vol. 114(C), pages 412-426.
    7. Qiu, Shi-Fang & Poon, Wai-Yin & Tang, Man-Lai, 2016. "Confidence intervals for an ordinal effect size measure based on partially validated series," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 170-192.
    8. Shuang Yin & Guojun Gan & Emiliano A. Valdez & Jeyaraj Vadiveloo, 2021. "Applications of Clustering with Mixed Type Data in Life Insurance," Risks, MDPI, vol. 9(3), pages 1-19, March.
    9. Nicholas Bett & Juma Kasozi & Daniel Ruturwa, 2022. "Temporal Clustering of the Causes of Death for Mortality Modelling," Risks, MDPI, vol. 10(5), pages 1-34, May.
    10. Li, Huiqiong & Tian, Guoliang & Tang, Niansheng & Cao, Hongyuan, 2018. "Assessing non-inferiority for incomplete paired-data under non-ignorable missing mechanism," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 69-81.
    11. Nguyen, H.D. & Gouno, E., 2020. "Bayesian inference for Common cause failure rate based on causal inference with missing data," Reliability Engineering and System Safety, Elsevier, vol. 197(C).
    12. Tian, Guo-Liang & Tang, Man-Lai & Yuen, Kam Chuen & Ng, Kai Wang, 2010. "Further properties and new applications of the nested Dirichlet distribution," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 394-405, February.
    13. Shengkun Xie & Kun Shi, 2023. "Generalised Additive Modelling of Auto Insurance Data with Territory Design: A Rate Regulation Perspective," Mathematics, MDPI, vol. 11(2), pages 1-24, January.
