A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards

Authors

  • Rivalani Hlongwane
  • Kutlwano Ramabao
  • Wilson Mongwe

Abstract

Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
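
The pipeline outlined in the abstract (discretize numerical variables, one-hot encode the bins, attribute the model output with Shapley values, then sum the attributions within each predictor variable group) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `toy_model` function is a hypothetical stand-in for a fitted tree ensemble, the variable names, bin edges, and all-zeros baseline are assumptions made for illustration, and the Shapley values are computed by exact enumeration over feature subsets rather than by a tree-specific algorithm.

```python
from itertools import combinations
from math import factorial

# Hypothetical bin edges (ascending cut points); not taken from the paper.
BIN_EDGES = {"age": [25, 40], "income": [30000]}


def one_hot_bin(value, edges):
    """Discretize `value` against ascending `edges`, then one-hot encode the bin."""
    index = sum(value >= edge for edge in edges)
    return [1 if i == index else 0 for i in range(len(edges) + 1)]


def toy_model(z):
    """Hypothetical stand-in for a fitted ensemble (includes one interaction term).

    z = [age<25, 25<=age<40, age>=40, income<30000, income>=30000]
    """
    return 0.5 * z[1] + 0.8 * z[2] + 0.6 * z[4] + 0.4 * z[2] * z[4]


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline`."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without))
    return phi


# Encode one hypothetical applicant and remember which columns form each group.
applicant = {"age": 45, "income": 60000}
groups = {}  # variable name -> column indices in the encoded vector
x = []
for name, edges in BIN_EDGES.items():
    encoded = one_hot_bin(applicant[name], edges)
    groups[name] = range(len(x), len(x) + len(encoded))
    x.extend(encoded)

baseline = [0] * len(x)  # assumed reference point: no bin indicator active
phi = shapley_values(toy_model, x, baseline)

# Sum attributions over the one-hot columns of each variable to get group scores.
group_scores = {name: sum(phi[i] for i in cols) for name, cols in groups.items()}
print(group_scores)
```

Because Shapley values satisfy the efficiency property, the group scores sum to the difference between the model output for the applicant and the output at the baseline, which is what lets them be read like additive scorecard contributions. Exact enumeration is only feasible for a handful of features; with real tree ensembles one would typically rely on a dedicated implementation such as the `shap` package's tree explainer.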

Suggested Citation

  • Rivalani Hlongwane & Kutlwano Ramabao & Wilson Mongwe, 2024. "A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards," PLOS ONE, Public Library of Science, vol. 19(8), pages 1-20, August.
  • Handle: RePEc:plo:pone00:0308718
    DOI: 10.1371/journal.pone.0308718

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0308718
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0308718&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0308718?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    2. Xia Li & Hanghang Zheng & Kunpeng Tao & Mao Mao, 2025. "Implementation of an Asymmetric Adjusted Activation Function for Class Imbalance Credit Scoring," Papers 2501.12285, arXiv.org.
    3. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    4. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    5. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    6. Yu Xia & Ta Xu & Ming-Xia Wei & Zhen-Ke Wei & Lian-Jie Tang, 2023. "Predicting Chain’s Manufacturing SME Credit Risk in Supply Chain Finance Based on Machine Learning Methods," Sustainability, MDPI, vol. 15(2), pages 1-18, January.
    7. Lobna Abid & Afif Masmoudi & Sonia Zouari-Ghorbel, 2018. "The Consumer Loan’s Payment Default Predictive Model: an Application of the Logistic Regression and the Discriminant Analysis in a Tunisian Commercial Bank," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 9(3), pages 948-962, September.
    8. Kim Long Tran & Hoang Anh Le & Thanh Hien Nguyen & Duc Trung Nguyen, 2022. "Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam," Data, MDPI, vol. 7(11), pages 1-12, November.
    9. Liu, Fan & Hua, Zhongsheng & Lim, Andrew, 2015. "Identifying future defaulters: A hierarchical Bayesian method," European Journal of Operational Research, Elsevier, vol. 241(1), pages 202-211.
    10. Chen, Dangxing & Ye, Jiahui & Ye, Weicheng, 2023. "Interpretable selective learning in credit risk," Research in International Business and Finance, Elsevier, vol. 65(C).
    11. Fang, Fang & Chen, Yuanyuan, 2019. "A new approach for credit scoring by directly maximizing the Kolmogorov–Smirnov statistic," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 180-194.
    12. Naveed Chehrazi & Thomas A. Weber, 2015. "Dynamic Valuation of Delinquent Credit-Card Accounts," Management Science, INFORMS, vol. 61(12), pages 3077-3096, December.
    13. Martin Řezáč, 2011. "Advanced empirical estimate of information value for credit scoring models," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 59(2), pages 267-274.
    14. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    15. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    16. Sunghyon Kyeong & Daehee Kim & Jinho Shin, 2021. "Can System Log Data Enhance the Performance of Credit Scoring?—Evidence from an Internet Bank in Korea," Sustainability, MDPI, vol. 14(1), pages 1-12, December.
    17. Xiufang Li & Zhiwang Zhang & Lingyun Li & Hui Pan, 2024. "Combining Feature Selection and Classification Using LASSO-Based MCO Classifier for Credit Risk Evaluation," Computational Economics, Springer;Society for Computational Economics, vol. 64(5), pages 2641-2662, November.
    18. Martin Rezac & Frantisek Rezac, 2011. "How to Measure the Quality of Credit Scoring Models," Czech Journal of Economics and Finance (Finance a uver), Charles University Prague, Faculty of Social Sciences, vol. 61(5), pages 486-507, November.
    19. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    20. Arno Botha & Conrad Beyers & Pieter de Villiers, 2019. "A procedure for loss-optimising default definitions across simulated credit risk scenarios," Papers 1907.12615, arXiv.org, revised Feb 2021.
