NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class

Authors

  • Seongil Han
  • Haemin Jung

Abstract

Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model’s decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. 
These findings highlight NATE’s capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
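
The abstract names two ingredients that are easy to illustrate in isolation: rebalancing the classes by oversampling, and scoring with MCC and F1 rather than accuracy alone. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses plain random duplication of minority rows as a stand-in for NATE's non-parametric oversampler, and the function names (`random_oversample`, `confusion_counts`, `f1_score`, `mcc`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest.

    Assumption: a simple stand-in for illustration, NOT the oversampler used by NATE.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 8 non-defaults (0) and 2 defaults (1).
labels = [0] * 8 + [1] * 2
rows = [[float(i)] for i in range(10)]

balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)

# A classifier that always predicts the majority class looks fine on accuracy
# (8 of 10 correct) but scores zero on both F1 and MCC.
majority_pred = [0] * len(labels)
baseline_f1 = f1_score(*confusion_counts(labels, majority_pred))
baseline_mcc = mcc(*confusion_counts(labels, majority_pred))

# Hypothetical predictions from some other classifier on 8 held-out cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
example_f1 = f1_score(*confusion_counts(y_true, y_pred))
example_mcc = mcc(*confusion_counts(y_true, y_pred))
```

In practice, oversampling would be applied to the training split only, so that duplicated rows do not leak into the data used to compute AUC, MCC, and F1.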

Suggested Citation

  • Seongil Han & Haemin Jung, 2024. "NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-24, December.
  • Handle: RePEc:plo:pone00:0316454
    DOI: 10.1371/journal.pone.0316454

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0316454
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0316454&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    2. Wosnitza, Jan Henrik, 2022. "Calibration alternatives to logistic regression and their potential for transferring the dispersion of discriminatory power into uncertainties of probabilities of default," Discussion Papers 04/2022, Deutsche Bundesbank.
    3. Marcin Chlebus, 2014. "One-day prediction of state of turbulence for financial instrument based on models for binary dependent variable," Ekonomia journal, Faculty of Economic Sciences, University of Warsaw, vol. 37.
    4. Fiorella De Fiore & Leonardo Gambacorta & Cristina Manea, 2023. "Big techs and the credit channel of monetary policy," BIS Working Papers 1088, Bank for International Settlements.
    5. Raffaele Manini & Oriol Amat, 2018. "Credit scoring for the supermarket and retailing industry: analysis and application proposal," Economics Working Papers 1614, Department of Economics and Business, Universitat Pompeu Fabra.
    6. Enrique Batiz‐Zuk & Fabrizio López‐Gallo & Abdulkadir Mohamed & Fátima Sánchez‐Cajal, 2022. "Determinants of loan survival rates for small and medium‐sized enterprises: Evidence from an emerging economy," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 27(4), pages 4741-4755, October.
    7. Aïda Kammoun & Imen Triki, 2016. "Credit Scoring Models for a Tunisian Microfinance Institution: Comparison between Artificial Neural Network and Logistic Regression," Review of Economics & Finance, Better Advances Press, Canada, vol. 6, pages 61-78, February.
    8. Kritzinger, Nico & van Vuuren, Gary Wayne, 2021. "Non-capital calibration of bureau scorecards," The Quarterly Review of Economics and Finance, Elsevier, vol. 79(C), pages 260-271.
    9. Husam Rjoub & Tomiwa Sunday Adebayo & Dervis Kirikkaleli, 2023. "Blockchain technology-based FinTech banking sector involvement using adaptive neuro-fuzzy-based K-nearest neighbors algorithm," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 9(1), pages 1-23, December.
    10. Zhiyong Li & Xinyi Hu & Ke Li & Fanyin Zhou & Feng Shen, 2020. "Inferring the outcomes of rejected loans: an application of semisupervised clustering," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(2), pages 631-654, February.
    11. Nartey Menzo, Benjamin Prince & Mogre, Diana & Asuamah Yeboah, Samuel, 2024. "Beyond Income: The Complexities of Credit Risk in Developing Countries," MPRA Paper 122364, University Library of Munich, Germany, revised 20 Sep 2024.
    12. George Xianzhi Yuan & Huiqi Wang, 2019. "The general dynamic risk assessment for the enterprise by the hologram approach in financial technology," International Journal of Financial Engineering (IJFE), World Scientific Publishing Co. Pte. Ltd., vol. 6(01), pages 1-48, March.
    13. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    14. Sant'Anna, Dário A.L.M. & Figueiredo, Paulo N., 2024. "Fintech innovation: Is it beneficial or detrimental to financial inclusion and financial stability? A systematic literature review and research directions," Emerging Markets Review, Elsevier, vol. 60(C).
    15. Kiviat, Barbara, 2019. "Credit Scoring in the United States," economic sociology. perspectives and conversations, Max Planck Institute for the Study of Societies, vol. 21(1), pages 33-42.
    16. Singh, Ramendra Pratap & Singh, Ramendra & Mishra, Prashant, 2021. "Does managing customer accounts receivable impact customer relationships, and sales performance? An empirical investigation," Journal of Retailing and Consumer Services, Elsevier, vol. 60(C).
    17. Ha-Thu Nguyen, 2015. "How is credit scoring used to predict default in China?," EconomiX Working Papers 2015-1, University of Paris Nanterre, EconomiX.
    18. Karol Przanowski, 2014. "Credit acceptance process strategy case studies - the power of Credit Scoring," Papers 1403.6531, arXiv.org.
    19. Ha-Thu Nguyen, 2014. "Default Predictors in Credit Scoring - Evidence from France’s Retail Banking Institution," EconomiX Working Papers 2014-26, University of Paris Nanterre, EconomiX.
    20. Jackelyn Hwang & Elizabeth Kneebone & Vasudha Kumar, 2023. "Recent Findings on Residential Instability in Oakland," Community Development Research Brief, Federal Reserve Bank of San Francisco, vol. 2023(02), pages 1-33, February.
