
NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class

Author

Listed:
  • Seongil Han
  • Haemin Jung

Abstract

Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model’s decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. 
These findings highlight NATE’s capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
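
The abstract reports results in terms of AUC, MCC, and F1 Score after rebalancing the classes by oversampling. As a rough illustration of what those pieces compute, here is a minimal standard-library sketch. It is not the authors' implementation: the abstract does not say which oversampling method NATE uses, so plain random duplication of minority rows stands in for it, no gradient boosting model is trained, and every function name below (random_oversample, confusion_counts, f1_score, mcc, auc) is hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (assumed stand-in method)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: 8 "good" borrowers (0) and 2 "bad" borrowers (1).
X_toy = [[i] for i in range(10)]
y_toy = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X_toy, y_toy)
```

In a real pipeline the oversampling step would be applied to the training split only, so that duplicated minority rows never leak into the data used to compute the metrics.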

Suggested Citation

  • Seongil Han & Haemin Jung, 2024. "NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-24, December.
  • Handle: RePEc:plo:pone00:0316454
    DOI: 10.1371/journal.pone.0316454

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0316454
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0316454&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Majid Bazarbash, 2019. "FinTech in Financial Inclusion: Machine Learning Applications in Assessing Credit Risk," IMF Working Papers 2019/109, International Monetary Fund.
    2. Anderson, Raymond, 2007. "The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation," OUP Catalogue, Oxford University Press, number 9780199226405, December.
    3. Lars Ole Hjelkrem & Petter Eilif de Lange, 2023. "Explaining Deep Learning Models for Credit Scoring with SHAP: A Case Study Using Open Banking Data," JRFM, MDPI, vol. 16(4), pages 1-19, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    2. Fiorella De Fiore & Leonardo Gambacorta & Cristina Manea, 2023. "Big techs and the credit channel of monetary policy," BIS Working Papers 1088, Bank for International Settlements.
    3. Aïda Kammoun & Imen Triki, 2016. "Credit Scoring Models for a Tunisian Microfinance Institution: Comparison between Artificial Neural Network and Logistic Regression," Review of Economics & Finance, Better Advances Press, Canada, vol. 6, pages 61-78, February.
    4. Husam Rjoub & Tomiwa Sunday Adebayo & Dervis Kirikkaleli, 2023. "Blockchain technology-based FinTech banking sector involvement using adaptive neuro-fuzzy-based K-nearest neighbors algorithm," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 9(1), pages 1-23, December.
    5. Nartey Menzo, Benjamin Prince & Mogre, Diana & Asuamah Yeboah, Samuel, 2024. "Beyond Income: The Complexities of Credit Risk in Developing Countries," MPRA Paper 122364, University Library of Munich, Germany, revised 20 Sep 2024.
    6. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    7. Singh, Ramendra Pratap & Singh, Ramendra & Mishra, Prashant, 2021. "Does managing customer accounts receivable impact customer relationships, and sales performance? An empirical investigation," Journal of Retailing and Consumer Services, Elsevier, vol. 60(C).
    8. Ha-Thu Nguyen, 2015. "How is credit scoring used to predict default in China?," EconomiX Working Papers 2015-1, University of Paris Nanterre, EconomiX.
    9. Ha-Thu Nguyen, 2014. "Default Predictors in Credit Scoring - Evidence from France’s Retail Banking Institution," EconomiX Working Papers 2014-26, University of Paris Nanterre, EconomiX.
    10. Jackelyn Hwang & Elizabeth Kneebone & Vasudha Kumar, 2023. "Recent Findings on Residential Instability in Oakland," Community Development Research Brief, Federal Reserve Bank of San Francisco, vol. 2023(02), pages 1-33, February.
    11. Huang, Yiping & Li, Zhenhua & Qiu, Han & Tao, Sun & Wang, Xue & Zhang, Longmei, 2023. "BigTech credit risk assessment for SMEs," China Economic Review, Elsevier, vol. 81(C).
    12. Narayanamurthy, Gopalakrishnan & Jayanth, R Sai Shiva & Moser, Roger & Schaefers, Tobias & Nagendra, Narayan Prasad, 2025. "Data-driven digital transformation for uncertainty reduction – Application of satellite imagery analytics in institutional crop credit management," International Journal of Production Economics, Elsevier, vol. 280(C).
    13. Tanja Verster & Erika Fourie, 2023. "The Changing Landscape of Financial Credit Risk Models," IJFS, MDPI, vol. 11(3), pages 1-15, August.
    14. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    15. Andrés Alonso Robisco & José Manuel Carbó Martínez, 2022. "Measuring the model risk-adjusted performance of machine learning algorithms in credit default prediction," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-35, December.
    16. Rais Ahmad Itoo & A. Selvarasu & José António Filipe, 2015. "Loan Products and Credit Scoring by Commercial Banks (India)," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 5(1), pages 851-851.
    17. Bátiz-Zuk Enrique & Mohamed Abdulkadir & Sánchez-Cajal Fátima, 2021. "Exploring the sources of loan default clustering using survival analysis with frailty," Working Papers 2021-14, Banco de México.
    18. Galina A. Timofeeva & Yana A. Bozhalkina, 2018. "Dependence of a Loan Portfolio Structure on a Cut-Off Level in a Scoring Model," Journal of New Economy, Ural State University of Economics, vol. 19(2), pages 24-35, April.
    19. Jairaj Gupta & Nicholas Wilson & Andros Gregoriou & Jerome Healy, 2014. "The value of operating cash flow in modelling credit risk for SMEs," Applied Financial Economics, Taylor & Francis Journals, vol. 24(9), pages 649-660, May.
    20. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
