
Automated machine learning in insurance

Author

Listed:
  • Dong, Panyi
  • Quan, Zhiyu

Abstract

Machine Learning (ML) has gained popularity in actuarial research and insurance industrial applications. However, the performance of most ML tasks heavily depends on data preprocessing, model selection, and hyperparameter optimization, which are considered to be intensive in terms of domain knowledge, experience, and manual labor. Automated Machine Learning (AutoML) aims to automatically complete the full life-cycle of ML tasks and provides state-of-the-art ML models without human intervention or supervision. This paper introduces an AutoML workflow that allows users without domain knowledge or prior experience to achieve robust and effortless ML deployment by writing only a few lines of code. This proposed AutoML is specifically tailored for the insurance application, with features like the balancing step in data preprocessing, ensemble pipelines, and customized loss functions. These features are designed to address the unique challenges of the insurance domain, including the imbalanced nature of common insurance datasets. The full code and documentation are available on the GitHub repository.
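
The balancing step mentioned in the abstract can be illustrated with a minimal sketch of random oversampling, one common way to handle imbalanced claim data. This is not the authors' implementation or the API of their package; the function name `random_oversample` and the toy data are hypothetical, chosen only to show the idea of duplicating minority-class records until the classes are equally represented.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy portfolio (made-up values): label 1 = policy with a claim, 0 = no claim.
X = [[25, 1200.0], [40, 800.0], [33, 950.0], [51, 700.0], [29, 1500.0], [60, 650.0]]
y = [0, 0, 0, 0, 1, 0]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

In practice, a step like this would be applied only to the training split, so that duplicated records never leak into the data used for evaluation.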

Suggested Citation

  • Dong, Panyi & Quan, Zhiyu, 2025. "Automated machine learning in insurance," Insurance: Mathematics and Economics, Elsevier, vol. 120(C), pages 17-41.
  • Handle: RePEc:eee:insuma:v:120:y:2025:i:c:p:17-41
    DOI: 10.1016/j.insmatheco.2024.10.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167668724001057
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.insmatheco.2024.10.002?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Peiris, Hashan & Jeong, Himchan & Kim, Jae-Kwang & Lee, Hangsuck, 2024. "Integration of traditional and telematics data for efficient insurance claims prediction," ASTIN Bulletin, Cambridge University Press, vol. 54(2), pages 263-279, May.
    2. Pedro Guerra & Mauro Castelli, 2021. "Machine Learning Applied to Banking Supervision a Literature Review," Risks, MDPI, vol. 9(7), pages 1-24, July.
    3. Brian Hartman & Rebecca Owen & Zoe Gibbs, 2020. "Predicting High-Cost Health Insurance Members through Boosted Trees and Oversampling: An Application Using the HCCI Database," North American Actuarial Journal, Taylor & Francis Journals, vol. 25(1), pages 53-61, July.
    4. O. Y. Bakhteev & V. V. Strijov, 2020. "Comprehensive analysis of gradient-based hyperparameter optimization algorithms," Annals of Operations Research, Springer, vol. 289(1), pages 51-65, June.
    5. Okine, A. Nii-Armah & Frees, Edward W. & Shi, Peng, 2022. "Joint Model Prediction And Application To Individual-Level Loss Reserving," ASTIN Bulletin, Cambridge University Press, vol. 52(1), pages 91-116, January.
    6. Jared Cummings & Brian Hartman, 2022. "Using Machine Learning to Better Model Long-Term Care Insurance Claims," North American Actuarial Journal, Taylor & Francis Journals, vol. 26(3), pages 470-483, August.
    7. Roxane Turcotte & Jean-Philippe Boucher, 2024. "GAMLSS for Longitudinal Multivariate Claim Count Models," North American Actuarial Journal, Taylor & Francis Journals, vol. 28(2), pages 337-360, April.
    8. So, Banghee & Boucher, Jean-Philippe & Valdez, Emiliano A., 2021. "Cost-Sensitive Multi-Class Adaboost For Understanding Driving Behavior Based On Telematics," ASTIN Bulletin, Cambridge University Press, vol. 51(3), pages 719-751, September.
    9. Arthur Charpentier & Romuald Élie & Carl Remlinger, 2023. "Reinforcement Learning in Economics and Finance," Computational Economics, Springer;Society for Computational Economics, vol. 62(1), pages 425-462, June.
    10. Dennis Bams & Thorsten Lehnert & Christian C. P. Wolff, 2009. "Loss Functions in Option Valuation: A Framework for Selection," Management Science, INFORMS, vol. 55(5), pages 853-862, May.
    11. Qi Wang & Yue Ma & Kun Zhao & Yingjie Tian, 2022. "A Comprehensive Survey of Loss Functions in Machine Learning," Annals of Data Science, Springer, vol. 9(2), pages 187-212, April.
    12. Jeong, Himchan, 2024. "Tweedie multivariate semi-parametric credibility with the exchangeable correlation," Insurance: Mathematics and Economics, Elsevier, vol. 115(C), pages 13-21.
    13. Banghee So, 2024. "Enhanced gradient boosting for zero-inflated insurance claims and comparative analysis of CatBoost, XGBoost, and LightGBM," Scandinavian Actuarial Journal, Taylor & Francis Journals, vol. 2024(10), pages 1013-1035, November.
    14. de Jong, Piet & Heller, Gillian Z., 2008. "Generalized Linear Models for Insurance Data," Cambridge Books, Cambridge University Press, number 9780521879149, January-April.
    15. Ma, Liye & Sun, Baohong, 2020. "Machine learning and AI in marketing – Connecting computing power to human insights," International Journal of Research in Marketing, Elsevier, vol. 37(3), pages 481-504.
    16. Zhang, Yaojun & Ji, Lanpeng & Aivaliotis, Georgios & Taylor, Charles, 2024. "Bayesian CART models for insurance claims frequency," Insurance: Mathematics and Economics, Elsevier, vol. 114(C), pages 108-131.
    17. Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
    18. Edward W. Frees & Gee Lee & Lu Yang, 2016. "Multivariate Frequency-Severity Regression Models in Insurance," Risks, MDPI, vol. 4(1), pages 1-36, February.
    19. Peng Shi & Wei Zhang & Kun Shi, 2024. "Leveraging Weather Dynamics in Insurance Claims Triage Using Deep Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(546), pages 825-838, April.
    20. Hu, Changyue & Quan, Zhiyu & Chong, Wing Fung, 2022. "Imbalanced learning for insurance using modified loss functions in tree-based models," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 13-32.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Šoltés Erik & Zelinová Silvia & Bilíková Mária, 2019. "General Linear Model: An Effective Tool For Analysis Of Claim Severity In Motor Third Party Liability Insurance," Statistics in Transition New Series, Statistics Poland, vol. 20(4), pages 13-31, December.
    2. Park, Sojung C. & Kim, Joseph H.T. & Ahn, Jae Youn, 2018. "Does hunger for bonuses drive the dependence between claim frequency and severity?," Insurance: Mathematics and Economics, Elsevier, vol. 83(C), pages 32-46.
    3. Kaiwen Wang & Jiehui Ding & Kristen R. Lidwell & Scott Manski & Gee Y. Lee & Emilio Xavier Esposito, 2019. "Treatment Level and Store Level Analyses of Healthcare Data," Risks, MDPI, vol. 7(2), pages 1-22, April.
    4. Erik Šoltés & Silvia Zelinová & Mária Bilíková, 2019. "General Linear Model: An Effective Tool For Analysis Of Claim Severity In Motor Third Party Liability Insurance," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 13-31, December.
    5. Marian Reiff & Erik Šoltés & Silvia Komara & Tatiana Šoltésová & Silvia Zelinová, 2022. "Segmentation and estimation of claim severity in motor third-party liability insurance through contrast analysis," Equilibrium. Quarterly Journal of Economics and Economic Policy, Institute of Economic Research, vol. 17(3), pages 803-842, September.
    6. Oh, Rosy & Lee, Kyung Suk & Park, Sojung C. & Ahn, Jae Youn, 2020. "Double-counting problem of the bonus–malus system," Insurance: Mathematics and Economics, Elsevier, vol. 93(C), pages 141-155.
    7. Fung, Tsz Chai & Badescu, Andrei L. & Lin, X. Sheldon, 2019. "A class of mixture of experts models for general insurance: Theoretical developments," Insurance: Mathematics and Economics, Elsevier, vol. 89(C), pages 111-127.
    8. Gómez-Déniz, E., 2016. "Bivariate credibility bonus–malus premiums distinguishing between two types of claims," Insurance: Mathematics and Economics, Elsevier, vol. 70(C), pages 117-124.
    9. Yang Lu, 2019. "Flexible (panel) regression models for bivariate count–continuous data with an insurance application," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1503-1521, October.
    10. Avanzi, Benjamin & Taylor, Greg & Wong, Bernard & Yang, Xinda, 2021. "On the modelling of multivariate counts with Cox processes and dependent shot noise intensities," Insurance: Mathematics and Economics, Elsevier, vol. 99(C), pages 9-24.
    11. Chenglong Ye & Lin Zhang & Mingxuan Han & Yanjia Yu & Bingxin Zhao & Yuhong Yang, 2022. "Combining Predictions of Auto Insurance Claims," Econometrics, MDPI, vol. 10(2), pages 1-15, April.
    12. Yoganathan, Vignesh & Osburg, Victoria-Sophie, 2024. "The mind in the machine: Estimating mind perception's effect on user satisfaction with voice-based conversational agents," Journal of Business Research, Elsevier, vol. 175(C).
    13. Xisong Jin, 2018. "How much does book value data tell us about systemic risk and its interactions with the macroeconomy? A Luxembourg empirical evaluation," BCL working papers 118, Central Bank of Luxembourg.
    14. Aivars Spilbergs & Andris Fomins & Māris Krastiņš, 2022. "Multivariate Modelling of Motor Third Party Liability Insurance Claims," European Journal of Business Science and Technology, Mendel University in Brno, Faculty of Business and Economics, vol. 8(1), pages 5-18.
    15. Deprez, Laurens & Antonio, Katrien & Boute, Robert, 2021. "Pricing service maintenance contracts using predictive analytics," European Journal of Operational Research, Elsevier, vol. 290(2), pages 530-545.
    16. Adriana Bruscato Bortoluzzo & Danny Pimentel Claro & Marco Antonio Leonel Caetano & Rinaldo Artes, 2009. "Estimating Claim Size and Probability in the Auto-insurance Industry: The Zero-adjusted Inverse Gaussian (ZAIG) Distribution," Business and Economics Working Papers 056, Unidade de Negocios e Economia, Insper.
    17. Martin Branda, 2014. "Optimization Approaches to Multiplicative Tariff of Rates Estimation in Non-Life Insurance," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 31(05), pages 1-17.
    18. Jeonghwan Kim & Woojoo Lee, 2019. "On testing the hidden heterogeneity in negative binomial regression models," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 82(4), pages 457-470, May.
    19. Lu Yang & Claudia Czado, 2022. "Two‐part D‐vine copula models for longitudinal insurance claim data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(4), pages 1534-1561, December.
    20. Ghosh, Sourav & Yadav, Sarita & Devi, Ambika & Thomas, Tiju, 2022. "Techno-economic understanding of Indian energy-storage market: A perspective on green materials-based supercapacitor technologies," Renewable and Sustainable Energy Reviews, Elsevier, vol. 161(C).
