Printed from https://ideas.repec.org/a/gam/jdataj/v8y2023i11p169-d1275568.html

Machine Learning for Credit Risk Prediction: A Systematic Literature Review

Authors
  • Jomark Pablo Noriega

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Financiera QAPAQ, Lima 150120, Peru
    These authors contributed equally to this work.)

  • Luis Antonio Rivera

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Centro de Ciências Exatas e Tecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes 28013-602, Brazil
    These authors contributed equally to this work.)

  • José Alfredo Herrera

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Programme in Biotechnology, Engineering and Chemical Technology, Universidad Pablo de Olavide, 41013 Sevilla, Spain
    These authors contributed equally to this work.)

Abstract

This systematic literature review examines the use of Machine Learning (ML) for credit risk prediction and argues that financial institutions need Artificial Intelligence (AI) and ML to assess credit risk across large volumes of information. We posed research questions about the algorithms, metrics, results, datasets, variables, and related limitations involved in predicting credit risk. To answer them, we searched well-known databases and identified 52 relevant studies within the microfinance credit industry. We identified challenges and approaches in credit risk prediction with ML models: the black-box nature of the implemented models, the need for explainable artificial intelligence, the importance of selecting relevant features, multicollinearity, and imbalance in the input data. In answering the research questions, we found that the Boosted Category is the most researched family of ML models, and that the most commonly used evaluation metrics are Area Under the Curve (AUC), Accuracy (ACC), Recall, F1 score (F1), and Precision. Research mainly uses public datasets to compare models, and private datasets to generate new knowledge when models are applied to the real world. The most significant limitation identified is how well the data represent reality, and the variables primarily used in the microcredit industry relate to demographics, operations, and payment behavior. This study aims to guide developers of credit risk management tools and software toward the existing ML methods, metrics, and techniques used to forecast credit risk, thereby minimizing possible losses due to default and guiding risk appetite.
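
The five evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the reviewed paper: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, with label 1 standing for a defaulted loan. AUC is computed here through its rank interpretation (the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, ties counted as one half).

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = default (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, Precision, Recall and F1 computed from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"ACC": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 defaulters among 10 loans, i.e. an imbalanced sample.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(classification_metrics(y_true, y_pred))
print("AUC:", round(auc(y_true, scores), 4))
```

On imbalanced data such as this toy sample, Accuracy can look favorable even when a third of the defaulters are missed, which is one reason studies tend to report Recall, F1, and AUC alongside it.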

Suggested Citation

  • Jomark Pablo Noriega & Luis Antonio Rivera & José Alfredo Herrera, 2023. "Machine Learning for Credit Risk Prediction: A Systematic Literature Review," Data, MDPI, vol. 8(11), pages 1-17, November.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:11:p:169-:d:1275568

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/11/169/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/11/169/
    Download Restriction: no

    References listed on IDEAS

    1. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    2. Gianfranco Lombardo & Mattia Pellegrino & George Adosoglou & Stefano Cagnoni & Panos M. Pardalos & Agostino Poggi, 2022. "Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks," Future Internet, MDPI, vol. 14(8), pages 1-23, August.

    Citations



    Cited by:

    1. Brian Daniel Bernhardt & Chiara Marciano & Mario Rosario Guarracino, 2025. "The Impact of Alternative Data on Default Probability: Analyzing the Italian E-commerce Sector with NLP and Network Structures," SN Operations Research Forum, Springer, vol. 6(2), pages 1-30, June.
    2. Jomark Noriega & Luis Rivera & Jorge Castañeda & José Herrera, 2025. "From Crisis to Algorithm: Credit Delinquency Prediction in Peru Under Critical External Factors Using Machine Learning," Data, MDPI, vol. 10(5), pages 1-53, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jomark Noriega & Luis Rivera & Jorge Castañeda & José Herrera, 2025. "From Crisis to Algorithm: Credit Delinquency Prediction in Peru Under Critical External Factors Using Machine Learning," Data, MDPI, vol. 10(5), pages 1-53, April.
    2. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    3. Al-Amin Abba Dabo & Amin Hosseinian-Far, 2023. "An Integrated Methodology for Enhancing Reverse Logistics Flows and Networks in Industry 5.0," Logistics, MDPI, vol. 7(4), pages 1-26, December.
    4. Yusheng Li & Mengyi Sha, 2024. "Two‐stage credit risk prediction framework based on three‐way decisions with automatic threshold learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(5), pages 1263-1277, August.
    5. Simone Narizzano & Marco Orlandi & Antonio Scalia, 2024. "The Bank of Italy’s statistical model for the credit assessment of non-financial firms," Mercati, infrastrutture, sistemi di pagamento (Markets, Infrastructures, Payment Systems) 53, Bank of Italy, Directorate General for Markets and Payment System.
    6. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    7. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    8. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1-15, March.
    9. Yang Liu & Fei Huang & Lili Ma & Qingguo Zeng & Jiale Shi, 2024. "Credit scoring prediction leveraging interpretable ensemble learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 286-308, March.
    10. Shi, Yong & Qu, Yi & Chen, Zhensong & Mi, Yunlong & Wang, Yunong, 2024. "Improved credit risk prediction based on an integrated graph representation learning approach with graph transformation," European Journal of Operational Research, Elsevier, vol. 315(2), pages 786-801.
    11. Li, Zhe & Liang, Shuguang & Pan, Xianyou & Pang, Meng, 2024. "Credit risk prediction based on loan profit: Evidence from Chinese SMEs," Research in International Business and Finance, Elsevier, vol. 67(PA).
    12. Kazim Topuz & Akhilesh Bajaj & Kristof Coussement & Timothy L. Urban, 2025. "Interpretable machine learning and explainable artificial intelligence," Annals of Operations Research, Springer, vol. 347(2), pages 775-782, April.
    13. Emmanuel Flachaire & Sullivan Hué & Sébastien Laurent & Gilles Hacheme, 2024. "Interpretable Machine Learning Using Partial Linear Models," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 86(3), pages 519-540, June.
    14. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    15. Przemyslaw Ruta & Joanna Kubicka & Yurii Vitkovskyi & Marcin Budzinski & Magdalena Dobrzańska-Rzepecka, 2024. "Preparing Polish Micro-Enterprises for the Loss of Liquidity," European Research Studies Journal, European Research Studies Journal, vol. 0(Special B), pages 3-16.
    16. Kellner, Ralf & Nagl, Maximilian & Rösch, Daniel, 2022. "Opening the black box – Quantile neural networks for loss given default prediction," Journal of Banking & Finance, Elsevier, vol. 134(C).
    17. Dangxing Chen & Weicheng Ye, 2022. "Generalized Groves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance," Papers 2209.10082, arXiv.org, revised Jul 2024.
    18. Lorena Espina-Romero & José Gregorio Noroño Sánchez & Humberto Gutiérrez Hurtado & Helga Dworaczek Conde & Yessenia Solier Castro & Luz Emérita Cervera Cajo & Jose Rio Corredoira, 2023. "Which Industrial Sectors Are Affected by Artificial Intelligence? A Bibliometric Analysis of Trends and Perspectives," Sustainability, MDPI, vol. 15(16), pages 1-18, August.
    19. Oyebayo Ridwan Olaniran & Ali Rashash R. Alzahrani & Nada MohammedSaeed Alharbi & Asma Ahmad Alzahrani, 2025. "Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification," Mathematics, MDPI, vol. 13(7), pages 1-25, April.
    20. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
