
A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards

Authors

  • Rivalani Hlongwane
  • Kutlwano Ramabao
  • Wilson Mongwe

Abstract

Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
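
The abstract describes deriving per-group credit scores from Shapley values computed on binned predictors. The following is a minimal sketch of that general idea, not the authors' procedure: the toy rows, the bin labels, the hand-written `model` function (a stand-in for a fitted XGBoost, random forest, LightGBM, or CatBoost model), and the `points_per_unit` scaling are all hypothetical. It computes exact Shapley values by enumerating coalitions, which is only feasible for a handful of variables, and then averages them per bin to obtain scorecard points.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy data: each applicant is described by two already-binned
# variables. Labels and rows are illustrative, not taken from the paper.
ROWS = [
    {"age": "18-30", "income": "low"},
    {"age": "18-30", "income": "high"},
    {"age": "31+", "income": "low"},
    {"age": "31+", "income": "high"},
]
FEATURES = ["age", "income"]


def model(row):
    """Stand-in for a fitted ensemble: returns a default-risk margin."""
    margin = 0.8 if row["age"] == "18-30" else -0.2
    margin += 0.6 if row["income"] == "low" else -0.4
    # An interaction term, so the model is not purely additive.
    if row["age"] == "18-30" and row["income"] == "low":
        margin += 0.4
    return margin


def coalition_value(row, subset, background):
    """Mean model output with `subset` fixed to the row's bins and the
    remaining variables taken from each background row in turn."""
    total = 0.0
    for b in background:
        mixed = {f: (row[f] if f in subset else b[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(background)


def shapley_values(row, background):
    """Exact Shapley value of each variable for one row (brute force)."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contrib = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = (coalition_value(row, s | {f}, background)
                        - coalition_value(row, s, background))
                contrib += weight * gain
        phi[f] = contrib
    return phi


def scorecard(rows, points_per_unit=-20.0):
    """Average Shapley value per (variable, bin), scaled to points.
    The negative factor (an assumption) gives riskier bins fewer points."""
    sums, counts = {}, {}
    for row in rows:
        phi = shapley_values(row, rows)
        for f in FEATURES:
            key = (f, row[f])
            sums[key] = sums.get(key, 0.0) + phi[f]
            counts[key] = counts.get(key, 0) + 1
    return {k: round(points_per_unit * sums[k] / counts[k], 1) for k in sums}


CARD = scorecard(ROWS)
for (variable, bin_label), points in sorted(CARD.items()):
    print(f"{variable:<8}{bin_label:<8}{points:>7}")
```

An applicant's total score would then be the sum of the points for the bins they fall into, which is what makes the result readable like a conventional scorecard. With real models, an approximation such as a tree-specific Shapley algorithm would replace the brute-force enumeration used here.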

Suggested Citation

  • Rivalani Hlongwane & Kutlwano Ramabao & Wilson Mongwe, 2024. "A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards," PLOS ONE, Public Library of Science, vol. 19(8), pages 1-20, August.
  • Handle: RePEc:plo:pone00:0308718
    DOI: 10.1371/journal.pone.0308718

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0308718
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0308718&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. D. J. Hand & W. E. Henley, 1997. "Statistical Classification Methods in Consumer Credit Scoring: a Review," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 160(3), pages 523-541, September.
    2. Bracke, Philippe & Datta, Anupam & Jung, Carsten & Sen, Shayak, 2019. "Machine learning explainability in finance: an application to default risk analysis," Bank of England working papers 816, Bank of England.
    3. Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
    4. Niklas Bussmann & Paolo Giudici & Dimitri Marinelli & Jochen Papenbrock, 2021. "Explainable Machine Learning in Credit Risk Management," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 203-216, January.
    5. Petr Gurný & Martin Gurný, 2013. "Comparison of Credit Scoring Models on Probability of Default Estimation for US Banks," Prague Economic Papers, Prague University of Economics and Business, vol. 2013(2), pages 163-181.
    6. Anna Cierniak-Emerych & Ewa Mazur-Wierzbicka & Magdalena Rojek-Nowosielska, 2021. "Corporate Social Responsibility in Poland," CSR, Sustainability, Ethics & Governance, in: Samuel O. Idowu (ed.), Current Global Practices of Corporate Social Responsibility, edition 1, pages 287-310, Springer.
    7. Arturs Kalnins, 2018. "Multicollinearity: How common factors cause Type 1 errors in multivariate regression," Strategic Management Journal, Wiley Blackwell, vol. 39(8), pages 2362-2385, August.
    8. Eliana Costa e Silva & Isabel Cristina Lopes & Aldina Correia & Susana Faria, 2020. "A logistic regression model for consumer default risk," Journal of Applied Statistics, Taylor & Francis Journals, vol. 47(13-15), pages 2879-2894, November.
    9. Crook, Jonathan N. & Edelman, David B. & Thomas, Lyn C., 2007. "Recent developments in consumer credit risk assessment," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1447-1465, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    2. Xia Li & Hanghang Zheng & Kunpeng Tao & Mao Mao, 2025. "Implementation of an Asymmetric Adjusted Activation Function for Class Imbalance Credit Scoring," Papers 2501.12285, arXiv.org.
    3. Jonathan K. Budd & Peter G. Taylor, 2015. "Calculating optimal limits for transacting credit card customers," Papers 1506.05376, arXiv.org, revised Aug 2015.
    4. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    5. K Rajaratnam & P Beling & G Overstreet, 2010. "Scoring decisions in the context of economic uncertainty," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(3), pages 421-429, March.
    6. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    7. Trivedi, Shrawan Kumar, 2020. "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, Elsevier, vol. 63(C).
    8. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    9. Lu, Xuefei & Calabrese, Raffaella, 2023. "The Cohort Shapley value to measure fairness in financing small and medium enterprises in the UK," Finance Research Letters, Elsevier, vol. 58(PC).
    10. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    11. Yu Xia & Ta Xu & Ming-Xia Wei & Zhen-Ke Wei & Lian-Jie Tang, 2023. "Predicting Chain’s Manufacturing SME Credit Risk in Supply Chain Finance Based on Machine Learning Methods," Sustainability, MDPI, vol. 15(2), pages 1-18, January.
    12. Lobna Abid & Afif Masmoudi & Sonia Zouari-Ghorbel, 2018. "The Consumer Loan’s Payment Default Predictive Model: an Application of the Logistic Regression and the Discriminant Analysis in a Tunisian Commercial Bank," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 9(3), pages 948-962, September.
    13. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    14. Kim Long Tran & Hoang Anh Le & Thanh Hien Nguyen & Duc Trung Nguyen, 2022. "Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam," Data, MDPI, vol. 7(11), pages 1-12, November.
    15. Linhui Wang & Jianping Zhu & Chenlu Zheng & Zhiyuan Zhang, 2024. "Incorporating Digital Footprints into Credit-Scoring Models through Model Averaging," Mathematics, MDPI, vol. 12(18), pages 1-15, September.
    16. Arno Botha & Conrad Beyers & Pieter de Villiers, 2020. "Simulation-based optimisation of the timing of loan recovery across different portfolios," Papers 2009.11064, arXiv.org, revised Apr 2021.
    17. Babaei, Golnoosh & Giudici, Paolo & Raffinetti, Emanuela, 2023. "Explainable FinTech lending," Journal of Economics and Business, Elsevier, vol. 125.
    18. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    19. Richard Chamboko & Jorge Miguel Bravo, 2020. "A Multi-State Approach to Modelling Intermediate Events and Multiple Mortgage Loan Outcomes," Risks, MDPI, vol. 8(2), pages 1-29, June.
    20. Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
