Printed from https://ideas.repec.org/p/arx/papers/2008.01687.html

Machine Learning approach for Credit Scoring

Authors

  • A. R. Provenzano
  • D. Trifirò
  • A. Datteo
  • L. Giada
  • N. Jean
  • A. Riciputi
  • G. Le Pera
  • M. Spadaccino
  • L. Massaron
  • C. Nordio

Abstract

In this work we build a stack of machine learning models aimed at composing a state-of-the-art credit rating and default prediction system, obtaining excellent out-of-sample performance. Our approach is an excursion through the most recent ML/AI concepts. It starts from natural language processing (NLP) applied to textual descriptions of economic sectors, using embeddings and autoencoders (AE); it continues with the classification of defaultable firms on the basis of a wide range of economic features using gradient boosting machines (GBM); and it calibrates the resulting probabilities, paying due attention to the treatment of unbalanced samples. Finally, we assign credit ratings through genetic algorithms (differential evolution, DE). Model interpretability is achieved by implementing recent techniques such as SHAP and LIME, which explain predictions locally in feature space.
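The abstract mentions calibrating default probabilities with attention to unbalanced samples but gives no formula. One standard correction, sketched here as an illustration and not necessarily the one used in the paper, applies when a classifier is trained on a rebalanced sample (for example, defaults oversampled to a 50% rate): Bayes' rule maps a raw score q back to the population default rate π from the training rate π_s via p = (π/π_s)·q / [(π/π_s)·q + ((1 − π)/(1 − π_s))·(1 − q)]. The function name `correct_for_resampling` and its parameter names are hypothetical.

```python
def correct_for_resampling(q, train_rate, true_rate):
    """Hypothetical helper: map a score q from a model trained at default rate
    `train_rate` back to the population default rate `true_rate`
    (prior-shift correction via Bayes' rule)."""
    if not (0.0 < train_rate < 1.0 and 0.0 < true_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    if not (0.0 <= q <= 1.0):
        raise ValueError("q must be a probability")
    # Reweight the default and non-default terms by the ratio of the priors
    pos = q * true_rate / train_rate
    neg = (1.0 - q) * (1.0 - true_rate) / (1.0 - train_rate)
    return pos / (pos + neg)
```

With a training default rate of 50% and a population rate of 2%, a raw score of 0.5 (equal to the training base rate) maps back to 0.02, while the ordering of scores is preserved, so ranking metrics such as AUC are unaffected.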
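For the final step, the abstract states that ratings are assigned with differential evolution but does not spell out the objective. The sketch below is a hypothetical illustration only: a hand-written DE/rand/1/bin loop searches for K − 1 cut points on the log10-PD scale, and each candidate is scored with an assumed objective (within-class sum of squared deviations plus a large penalty for empty classes). The objective, the penalty value, the log10 scale, and all function names are assumptions, not the authors' criteria.

```python
import bisect
import math
import random

EMPTY_CLASS_PENALTY = 1e6  # assumed: large cost for a rating class with no firms


def within_class_sse(cuts, scores):
    """Assumed objective: sum of squared deviations of scores from their class mean."""
    edges = sorted(cuts)
    classes = [[] for _ in range(len(edges) + 1)]
    for s in scores:
        # bisect_right counts the edges <= s, which is the class index
        classes[bisect.bisect_right(edges, s)].append(s)
    total = 0.0
    for members in classes:
        if not members:
            total += EMPTY_CLASS_PENALTY
            continue
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total


def differential_evolution(objective, bounds, pop_size=30, f=0.8, cr=0.9,
                           generations=150, seed=0):
    """Minimal DE/rand/1/bin minimizer over the box `bounds` (population updated in place)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # pick three distinct individuals, all different from i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip mutant to bounds
                else:
                    trial.append(pop[i][j])
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def assign_ratings(pds, n_classes, seed=0):
    """Return sorted log10-PD cut points and a class per firm (0 = lowest PD)."""
    scores = [math.log10(p) for p in pds]
    bounds = [(min(scores), max(scores))] * (n_classes - 1)
    cuts, cost = differential_evolution(
        lambda c: within_class_sse(c, scores), bounds, seed=seed)
    edges = sorted(cuts)
    ratings = [bisect.bisect_right(edges, s) for s in scores]
    return edges, ratings, cost
```

For example, `assign_ratings([0.001, 0.01, 0.1, 0.3, ...], n_classes=4)` on toy PDs returns three cut points and a class index per firm. A production system would replace the assumed objective with the institution's own rating-scale requirements (for instance, monotone default rates across classes); SciPy's `scipy.optimize.differential_evolution` offers a ready-made optimizer for the same search.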

Suggested Citation

  • A. R. Provenzano & D. Trifirò & A. Datteo & L. Giada & N. Jean & A. Riciputi & G. Le Pera & M. Spadaccino & L. Massaron & C. Nordio, 2020. "Machine Learning approach for Credit Scoring," Papers 2008.01687, arXiv.org.
  • Handle: RePEc:arx:papers:2008.01687

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2008.01687
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Mizen, Paul & Tsoukas, Serafeim, 2012. "Forecasting US bond default ratings allowing for previous and initial state dependence in an ordered probit model," International Journal of Forecasting, Elsevier, vol. 28(1), pages 273-287.
    2. Peter Martey Addo & Dominique Guégan & Bertrand Hassani, 2018. "Credit Risk Analysis using Machine and Deep learning models," Documents de travail du Centre d'Economie de la Sorbonne 18003, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    3. Dominique Guegan & Peter Martey Addo & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01835164, HAL.
    4. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Are post-crisis statistical initiatives completed?, volume 49, Bank for International Settlements.
    5. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan-level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), The use of big data analytics and artificial intelligence in central banking, volume 50, Bank for International Settlements.
    6. Dominique Guegan, 2018. "Credit Risk Analysis Using machine and Deep Learning Models," Post-Print halshs-01889154, HAL.
    7. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis using Machine and Deep learning models," Working Papers 2018:08, Department of Economics, University of Venice "Ca' Foscari".
    8. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis using Machine and Deep Learning models," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01719983, HAL.
    9. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    10. Dirk Tasche, 2003. "A traffic lights approach to PD validation," Papers cond-mat/0305038, arXiv.org.
    11. Petr Gurný & Martin Gurný, 2013. "Comparison of Credit Scoring Models on Probability of Default Estimation for Us Banks," Prague Economic Papers, Prague University of Economics and Business, vol. 2013(2), pages 163-181.
    12. Dominique Guegan & Peter Martey Addo & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Post-Print halshs-01835164, HAL.
    13. Dominique Guegan, 2018. "Credit Risk Analysis Using machine and Deep Learning Models," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01889154, HAL.
    14. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis using Machine and Deep Learning models," Post-Print halshs-01719983, HAL.
    15. Angela Rita Provenzano & Daniele Trifirò & Nicola Jean & Giacomo Le Pera & Maurizio Spadaccino & Luca Massaron & Claudio Nordio, 2019. "An Artificial Intelligence approach to Shadow Rating," Papers 1912.09764, arXiv.org.
    16. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Apostolos Ampountolas & Titus Nyarko Nde & Paresh Date & Corina Constantinescu, 2021. "A Machine Learning Approach for Micro-Credit Scoring," Risks, MDPI, vol. 9(3), pages 1-20, March.
    2. Bojing Feng & Wenfang Xue & Bindang Xue & Zeyu Liu, 2020. "Every Corporation Owns Its Image: Corporate Credit Ratings via Convolutional Neural Networks," Papers 2012.03744, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Apostolos Ampountolas & Titus Nyarko Nde & Paresh Date & Corina Constantinescu, 2021. "A Machine Learning Approach for Micro-Credit Scoring," Risks, MDPI, vol. 9(3), pages 1-20, March.
    2. Roy Cerqueti & Francesca Pampurini & Annagiulia Pezzola & Anna Grazia Quaranta, 2022. "Dangerous liasons and hot customers for banks," Review of Quantitative Finance and Accounting, Springer, vol. 59(1), pages 65-89, July.
    3. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Are post-crisis statistical initiatives completed?, volume 49, Bank for International Settlements.
    4. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan-level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), The use of big data analytics and artificial intelligence in central banking, volume 50, Bank for International Settlements.
    5. Irving Fisher Committee, 2019. "The use of big data analytics and artificial intelligence in central banking," IFC Bulletins, Bank for International Settlements, number 50.
    6. Dan Wang & Zhi Chen & Ionut Florescu, 2021. "A Sparsity Algorithm with Applications to Corporate Credit Rating," Papers 2107.10306, arXiv.org.
    7. Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
    8. Theuri, Joseph & Olukuru, John, 2022. "The impact of Artficial Intelligence and how it is shaping banking," KBA Centre for Research on Financial Markets and Policy Working Paper Series 61, Kenya Bankers Association (KBA).
    9. José Américo Pereira Antunes, 2021. "To supervise or to self-supervise: a machine learning based comparison on credit supervision," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 7(1), pages 1-21, December.
    10. Keerthana Sivamayil & Elakkiya Rajasekar & Belqasem Aljafari & Srete Nikolovski & Subramaniyaswamy Vairavasundaram & Indragandhi Vairavasundaram, 2023. "A Systematic Study on Reinforcement Learning Based Applications," Energies, MDPI, vol. 16(3), pages 1-23, February.
    11. Amirhosein Mosavi & Yaser Faghan & Pedram Ghamisi & Puhong Duan & Sina Faizollahzadeh Ardabili & Ely Salwana & Shahab S. Band, 2020. "Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics," Mathematics, MDPI, vol. 8(10), pages 1-42, September.
    12. Salima Smiti & Makram Soui, 2020. "Bankruptcy Prediction Using Deep Learning Approach Based on Borderline SMOTE," Information Systems Frontiers, Springer, vol. 22(5), pages 1067-1083, October.
    13. Seyyide Doğan & Yasin Büyükkör & Murat Atan, 2022. "A comparative study of corporate credit ratings prediction with machine learning," Operations Research and Decisions, Wroclaw University of Science and Technology, Faculty of Management, vol. 32(1), pages 25-47.
    14. Martin Leo & Suneel Sharma & K. Maddulety, 2019. "Machine Learning in Banking Risk Management: A Literature Review," Risks, MDPI, vol. 7(1), pages 1-22, March.
    15. Nenad Milojević & Srdjan Redzepagic, 2021. "Prospects of Artificial Intelligence and Machine Learning Application in Banking Risk Management," Journal of Central Banking Theory and Practice, Central bank of Montenegro, vol. 10(3), pages 41-57.
    16. Hossein Hassani & Xu Huang & Emmanuel Silva & Mansi Ghodsi, 2020. "Deep Learning and Implementations in Banking," Annals of Data Science, Springer, vol. 7(3), pages 433-446, September.
    17. Kim, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2020. "Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting," European Journal of Operational Research, Elsevier, vol. 283(1), pages 217-234.
    18. Yaseen Ghulam & Kamini Dhruva & Sana Naseem & Sophie Hill, 2018. "The Interaction of Borrower and Loan Characteristics in Predicting Risks of Subprime Automobile Loans," Risks, MDPI, vol. 6(3), pages 1-21, September.
    19. Li-Chen Cheng & Wei-Ting Lu & Benjamin Yeo, 2023. "Predicting abnormal trading behavior from internet rumor propagation: a machine learning approach," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 9(1), pages 1-23, December.
    20. Roman P. Bulyga & Alexey A. Sitnov & Liudmila V. Kashirskaya & Irina V. Safonova, 2020. "Transparency of credit institutions," Entrepreneurship and Sustainability Issues, VsI Entrepreneurship and Sustainability Center, vol. 7(4), pages 3158-3172, June.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.