
The impact of class imbalance in logistic regression models for low-default portfolios in credit risk

Authors
  • Willem D. Schutte
  • Charl Pretorius
  • Neill Smit
  • Leandra van der Merwe
  • Robert Maxwell

Abstract

In this paper, we study how class imbalance, typical of low-default credit portfolios, affects the performance of logistic regression models. Using a simulation study with controlled data-generating mechanisms, we vary (i) the level of class imbalance and (ii) the strength of association between the predictors and the response. The results show that, for a given strength of association, achievable classification accuracy deteriorates markedly as the event rate decreases, and the optimal classification cut-off shifts with the level of imbalance. In contrast, the Gini coefficient is comparatively stable with respect to class imbalance once sample sizes are sufficiently large, even when classification accuracy is strongly affected. As a practical guideline, we summarise attainable classification performance as a function of the event rate and strength of association between the predictors and the response.
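The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the single standard-normal predictor, the sample size, the chosen event rates and slopes, the Newton-Raphson fit, and the use of Youden's J to pick a cut-off are all assumptions made for illustration, and every function name below is hypothetical. The intercept controls the event rate (class imbalance), the slope stands in for the strength of association, and the Gini coefficient is computed as 2 * AUC - 1.

```python
import numpy as np

def sigmoid(t):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def intercept_for_rate(rate, slope, rng, m=100_000):
    """Bisect for the intercept giving the target event rate (Monte Carlo estimate)."""
    z = rng.standard_normal(m)
    lo, hi = -20.0, 20.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if sigmoid(mid + slope * z).mean() < rate:
            lo = mid  # event rate too low: raise the intercept
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n, intercept, slope, rng):
    """One N(0, 1) predictor and a Bernoulli response from a logistic model."""
    x = rng.standard_normal(n)
    y = (rng.random(n) < sigmoid(intercept + slope * x)).astype(int)
    return x, y

def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood fit of (intercept, slope) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def gini(scores, y):
    """Gini = 2 * AUC - 1, with AUC from the rank-sum statistic (assumes no ties)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    return 2.0 * auc - 1.0

def best_cutoff(prob, y):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1 (an assumed criterion)."""
    best_c, best_j = None, -1.0
    for c in np.quantile(prob, np.linspace(0.01, 0.99, 99)):
        pred = prob >= c
        j = pred[y == 1].mean() + (~pred)[y == 0].mean() - 1.0
        if j > best_j:
            best_c, best_j = float(c), float(j)
    return best_c, best_j

rng = np.random.default_rng(0)
results = {}
for rate in (0.10, 0.01):        # level of class imbalance
    for slope in (0.5, 2.0):     # strength of association
        b0 = intercept_for_rate(rate, slope, rng)
        x, y = simulate(50_000, b0, slope, rng)
        beta = fit_logistic(x, y)
        eta = beta[0] + beta[1] * x
        cutoff, j = best_cutoff(sigmoid(eta), y)
        results[(rate, slope)] = {
            "event_rate": float(y.mean()),
            "gini": float(gini(eta, y)),
            "cutoff": cutoff,
            "youden_j": j,
        }
        print(rate, slope, results[(rate, slope)])
```

Comparing the rows for a fixed slope shows how the selected cut-off and the Gini coefficient respond to a change in event rate under these assumptions. The printed values depend on the seed and on the illustrative settings, so they should not be read as reproducing the paper's results.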

Suggested Citation

  • Willem D. Schutte & Charl Pretorius & Neill Smit & Leandra van der Merwe & Robert Maxwell, 2026. "The impact of class imbalance in logistic regression models for low-default portfolios in credit risk," Papers 2602.19663, arXiv.org.
  • Handle: RePEc:arx:papers:2602.19663

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2602.19663
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. King, Gary & Zeng, Langche, 2001. "Logistic Regression in Rare Events Data," Political Analysis, Cambridge University Press, vol. 9(2), pages 137-163, January.
    2. Thomas, Lyn C., 2009. "Consumer Credit Models: Pricing, Profit and Portfolios," OUP Catalogue, Oxford University Press, number 9780199232130.
    3. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    4. Misuk Kim & Kyu-Baek Hwang, 2022. "An empirical evaluation of sampling methods for the classification of imbalanced data," PLOS ONE, Public Library of Science, vol. 17(7), pages 1-22, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dendramis, Y. & Tzavalis, E. & Varthalitis, P. & Athanasiou, E., 2020. "Predicting default risk under asymmetric binary link functions," International Journal of Forecasting, Elsevier, vol. 36(3), pages 1039-1056.
    2. Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
    3. Murphy, Sinnott & Sowell, Fallaw & Apt, Jay, 2019. "A time-dependent model of generator failures and recoveries captures correlated events and quantifies temperature dependence," Applied Energy, Elsevier, vol. 253(C), pages 1-1.
    4. Angel M. Morales & Patrick Tarwater & Indika Mallawaarachchi & Alok Kumar Dwivedi & Juan B. Figueroa-Casas, 2015. "Multinomial logistic regression approach for the evaluation of binary diagnostic test in medical research," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 16(2), pages 203-222, June.
    5. F. Gauthier & D. Germain & B. Hétu, 2017. "Logistic models as a forecasting tool for snow avalanches in a cold maritime climate: northern Gaspésie, Québec, Canada," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 89(1), pages 201-232, October.
    6. Douglas Cumming & Lars Hornuf & Moein Karami & Denis Schweizer, 2023. "Disentangling Crowdfunding from Fraudfunding," Journal of Business Ethics, Springer, vol. 182(4), pages 1103-1128, February.
    7. Cristian David Correa-Álvarez & Juan Carlos Salazar-Uribe & Luis Raúl Pericchi-Guerra, 2023. "Bayesian multilevel logistic regression models: a case study applied to the results of two questionnaires administered to university students," Computational Statistics, Springer, vol. 38(4), pages 1791-1810, December.
    8. Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
    9. Lo Turco, Alessia & Maggioni, Daniela, 2018. "Effects of Islamic religiosity on bilateral trust in trade: The case of Turkish exports," Journal of Comparative Economics, Elsevier, vol. 46(4), pages 947-965.
    10. Matija Kovacic & Claudio Zoli, 2021. "Ethnic distribution, effective power and conflict," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 57(2), pages 257-299, August.
    11. Blackman, Allen & Guerrero, Santiago, 2012. "What drives voluntary eco-certification in Mexico?," Journal of Comparative Economics, Elsevier, vol. 40(2), pages 256-268.
    12. Jacob Ausderan, 2018. "Reassessing the democratic advantage in interstate wars using k-adic datasets," Conflict Management and Peace Science, Peace Science Society (International), vol. 35(5), pages 451-473, September.
    13. Paul Poast, 2013. "Issue linkage and international cooperation: An empirical investigation," Conflict Management and Peace Science, Peace Science Society (International), vol. 30(3), pages 286-303, July.
    14. Yerko Rojas, 2017. "Evictions and short-term all-cause mortality: a 3-year follow-up study of a middle-aged Swedish population," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 62(3), pages 343-351, April.
    15. Mehrez Ben Slama & Dhafer Saidane & Hassouna Fedhila, 2012. "How to identify targets in the M&A banking operations? Case of cross-border strategies in Europe by line of activity," Review of Quantitative Finance and Accounting, Springer, vol. 38(2), pages 209-240, February.
    16. Marcin Chlebus, 2014. "One-day prediction of state of turbulence for financial instrument based on models for binary dependent variable," Ekonomia journal, Faculty of Economic Sciences, University of Warsaw, vol. 37.
    17. Lorenzo Cassi & Anne Plunket, 2014. "Proximity, network formation and inventive performance: in search of the proximity paradox," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(2), pages 395-422, September.
    18. Trent Geisler & Herman Ray & Ying Xie, 2023. "Finding the Proverbial Needle: Improving Minority Class Identification Under Extreme Class Imbalance," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 192-212, April.
    19. Dugan, Spencer August & Utne, Ingrid Bouwer, 2025. "Improved identification of maritime risk-influencing factors using AIS data in regression analysis," Reliability Engineering and System Safety, Elsevier, vol. 262(C).
    20. Adriana Bruscato Bortoluzzo & Danny Pimentel Claro & Marco Antonio Leonel Caetano & Rinaldo Artes, 2009. "Estimating Claim Size and Probability in the Auto-insurance Industry: The Zero-adjusted Inverse Gaussian (ZAIG) Distribution," Business and Economics Working Papers 056, Unidade de Negocios e Economia, Insper.
