
Machine Learning for Credit Risk Prediction: A Systematic Literature Review

Author

Listed:
  • Jomark Pablo Noriega

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Financiera QAPAQ, Lima 150120, Peru
    These authors contributed equally to this work.)

  • Luis Antonio Rivera

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Centro de Ciências Exatas e Tecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes 28013-602, Brazil
    These authors contributed equally to this work.)

  • José Alfredo Herrera

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Programme in Biotechnology, Engineering and Chemical Technology, Universidad Pablo de Olavide, 41013 Sevilla, Spain
    These authors contributed equally to this work.)

Abstract

This systematic literature review of Machine Learning (ML) for credit risk prediction starts from the need for financial institutions to use Artificial Intelligence (AI) and ML to assess credit risk by analyzing large volumes of information. We posed research questions about algorithms, metrics, results, datasets, variables, and related limitations in predicting credit risk. To answer them, we searched renowned databases and identified 52 relevant studies in the microfinance credit industry. We identified challenges and approaches in credit risk prediction using ML models; the difficulties with the implemented models include their black-box nature, the need for explainable artificial intelligence, the importance of selecting relevant features, multicollinearity, and imbalance in the input data. In answering the research questions, we found that the Boosted Category is the most researched family of ML models, and that the most commonly used evaluation metrics are Area Under Curve (AUC), Accuracy (ACC), Recall, F1 score (F1), and Precision. Research mainly uses public datasets to compare models, and private ones to generate new knowledge when applied to the real world. The most significant limitation identified is the representativeness of reality, and the variables primarily used in the microcredit industry are Demographic, Operation, and Payment behavior data. This study aims to guide developers of credit risk management tools and software toward the existing ML methods, metrics, and techniques used to forecast credit risk, thereby minimizing possible losses due to default and guiding risk appetite.
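
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the reviewed studies: the labels, scores, the 0.5 threshold, and the function names are hypothetical and chosen only to show how Accuracy, Precision, Recall, F1, and AUC are computed for a binary default label, and why imbalanced data makes Accuracy alone hard to interpret.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, Precision, Recall, and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter is scored above
    a random non-defaulter; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced toy data: 8 repaid loans (0) and 2 defaults (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.6, 0.2, 0.7, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

On this toy data, Accuracy is 0.8 even though only one of the two defaults is caught (Recall 0.5), which is one reason reviews of imbalanced credit data tend to report AUC, Recall, and F1 alongside Accuracy.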

Suggested Citation

  • Jomark Pablo Noriega & Luis Antonio Rivera & José Alfredo Herrera, 2023. "Machine Learning for Credit Risk Prediction: A Systematic Literature Review," Data, MDPI, vol. 8(11), pages 1-17, November.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:11:p:169-:d:1275568

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/11/169/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/11/169/
    Download Restriction: no

    References listed on IDEAS

    1. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    2. Gianfranco Lombardo & Mattia Pellegrino & George Adosoglou & Stefano Cagnoni & Panos M. Pardalos & Agostino Poggi, 2022. "Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks," Future Internet, MDPI, vol. 14(8), pages 1-23, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    2. Dangxing Chen & Luyao Zhang, 2023. "Monotonicity for AI ethics and society: An empirical study of the monotonic neural additive model in criminology, education, health care, and finance," Papers 2301.07060, arXiv.org.
    3. Sun, Weixin & Zhang, Xuantao & Li, Minghao & Wang, Yong, 2023. "Interpretable high-stakes decision support system for credit default forecasting," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    4. Al-Amin Abba Dabo & Amin Hosseinian-Far, 2023. "An Integrated Methodology for Enhancing Reverse Logistics Flows and Networks in Industry 5.0," Logistics, MDPI, vol. 7(4), pages 1-26, December.
    5. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    6. Ana Lorena Jiménez-Preciado & Francisco Venegas-Martínez & Abraham Ramírez-García, 2022. "Stock Portfolio Optimization with Competitive Advantages (MOAT): A Machine Learning Approach," Mathematics, MDPI, vol. 10(23), pages 1-16, November.
    7. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1, March.
    8. Yang Liu & Fei Huang & Lili Ma & Qingguo Zeng & Jiale Shi, 2024. "Credit scoring prediction leveraging interpretable ensemble learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 286-308, March.
    9. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    10. Dangxing Chen, 2022. "Two-stage Modeling for Prediction with Confidence," Papers 2209.08848, arXiv.org.
    11. Kellner, Ralf & Nagl, Maximilian & Rösch, Daniel, 2022. "Opening the black box – Quantile neural networks for loss given default prediction," Journal of Banking & Finance, Elsevier, vol. 134(C).
    12. Dangxing Chen & Weicheng Ye, 2022. "Generalized Gloves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance," Papers 2209.10082, arXiv.org.
    13. Lorena Espina-Romero & José Gregorio Noroño Sánchez & Humberto Gutiérrez Hurtado & Helga Dworaczek Conde & Yessenia Solier Castro & Luz Emérita Cervera Cajo & Jose Rio Corredoira, 2023. "Which Industrial Sectors Are Affected by Artificial Intelligence? A Bibliometric Analysis of Trends and Perspectives," Sustainability, MDPI, vol. 15(16), pages 1-18, August.
    14. Dangxing Chen & Weicheng Ye, 2022. "Monotonic Neural Additive Models: Pursuing Regulated Machine Learning Models for Credit Scoring," Papers 2209.10070, arXiv.org.
    15. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
    16. Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
    17. Chen, Dangxing & Ye, Jiahui & Ye, Weicheng, 2023. "Interpretable selective learning in credit risk," Research in International Business and Finance, Elsevier, vol. 65(C).
    18. Ahmad El Majzoub & Fethi A. Rabhi & Walayat Hussain, 2023. "Evaluating interpretable machine learning predictions for cryptocurrencies," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 30(3), pages 137-149, July.
    19. Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.
    20. Katsafados, Apostolos G. & Leledakis, George N. & Pyrgiotakis, Emmanouil G. & Androutsopoulos, Ion & Fergadiotis, Manos, 2024. "Machine learning in bank merger prediction: A text-based approach," European Journal of Operational Research, Elsevier, vol. 312(2), pages 783-797.
