Printed from https://ideas.repec.org/a/eee/ejores/v295y2021i2p758-771.html

The value of text for small business default prediction: A Deep Learning approach

Authors

  • Stevenson, Matthew
  • Mues, Christophe
  • Bravo, Cristián

Abstract

Compared to consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging, as, often, the same sources of information are not available. Therefore, it is standard policy for a loan officer to provide a textual loan assessment to mitigate limited data availability. In turn, this statement is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the field of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60,000 textual assessments provided by a lender. We consider the performance in terms of the AUC (Area Under the receiver operating characteristic Curve) and Brier Score metrics and find that the text alone is surprisingly effective for predicting default. However, when combined with traditional data, it yields no additional predictive capability, with performance dependent on the text’s length. Our proposed Deep Learning model does, however, appear to be robust to the quality of the text and therefore suitable for partly automating the mSME lending process. We also demonstrate how the content of loan assessments influences performance, leading us to a series of recommendations on a new strategy for collecting future mSME loan assessments.
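The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels and scores below are hypothetical, chosen only to show what AUC and the Brier Score measure. AUC is computed here as the fraction of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score, with ties counted as one half; the Brier Score is the mean squared difference between the predicted default probability and the observed 0/1 outcome.

```python
def auc(y_true, y_score):
    """AUC via pairwise comparison of positives against negatives.

    Equivalent to the area under the ROC curve; ties count as 0.5.
    O(n_pos * n_neg), which is fine for an illustration.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = default, 0 = no default.
defaults = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]

print("AUC:", auc(defaults, predicted))
print("Brier score:", brier_score(defaults, predicted))
```

A higher AUC indicates better ranking of defaulters above non-defaulters, while a lower Brier Score indicates better-calibrated probabilities, so the two metrics capture different aspects of model quality.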

Suggested Citation

  • Stevenson, Matthew & Mues, Christophe & Bravo, Cristián, 2021. "The value of text for small business default prediction: A Deep Learning approach," European Journal of Operational Research, Elsevier, vol. 295(2), pages 758-771.
  • Handle: RePEc:eee:ejores:v:295:y:2021:i:2:p:758-771
    DOI: 10.1016/j.ejor.2021.03.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221721001983
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    2. Jiang, Cuiqing & Lyu, Ximei & Yuan, Yufei & Wang, Zhao & Ding, Yong, 2022. "Mining semantic features in current reports for financial distress prediction: Empirical evidence from unlisted public firms in China," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1086-1099.
    3. Yufei Xia & Lingyun He & Yinguo Li & Nana Liu & Yanlin Ding, 2020. "Predicting loan default in peer‐to‐peer lending using narrative data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(2), pages 260-280, March.
    4. Medina-Olivares, Victor & Calabrese, Raffaella & Dong, Yizhe & Shi, Baofeng, 2022. "Spatial dependence in microfinance credit default," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1071-1085.
    5. Wang, Chao & Wang, Junbo & Wu, Chunchi & Zhang, Yue, 2023. "Voluntary disclosure in P2P lending: Information or hyperbole?," Pacific-Basin Finance Journal, Elsevier, vol. 79(C).
    6. Lisa Crosato & Caterina Liberati & Marco Repetto, 2021. "Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs Credit Default," Papers 2108.13914, arXiv.org, revised Sep 2021.
    7. Wang, Chao & Zhang, Yue & Zhang, Weiguo & Gong, Xue, 2021. "Textual sentiment of comments and collapse of P2P platforms: Evidence from China's P2P market," Research in International Business and Finance, Elsevier, vol. 58(C).
    8. Wang, Shaoda & Ye, Dezhu & Liao, Junyun, 2024. "Politeness matters: The role of polite languages in online peer-to-peer lending," Journal of Business Research, Elsevier, vol. 171(C).
    9. Suyuan Luo & Tsan-Ming Choi, 2024. "Great partners: how deep learning and blockchain help improve business operations together," Annals of Operations Research, Springer, vol. 339(1), pages 53-78, August.
    10. Christopher Gerling & Stefan Lessmann, 2023. "Multimodal Document Analytics for Banking Process Automation," Papers 2307.11845, arXiv.org, revised Nov 2023.
    11. Mahsa Tavakoli & Rohitash Chandra & Fengrui Tian & Cristián Bravo, 2023. "Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams," Papers 2304.10740, arXiv.org, revised Nov 2024.
    12. Azadi, Majid & Yousefi, Saeed & Farzipoor Saen, Reza & Shabanpour, Hadi & Jabeen, Fauzia, 2023. "Forecasting sustainability of healthcare supply chains using deep learning and network data envelopment analysis," Journal of Business Research, Elsevier, vol. 154(C).
    13. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Nydia M. Reyes, 2013. "A Social Approach to Microfinance Credit Scoring," Working Papers CEB 13-013, ULB -- Universite Libre de Bruxelles.
    14. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    15. Wu, Yu & Zhang, Tong, 2021. "Can credit ratings predict defaults in peer-to-peer online lending? Evidence from a Chinese platform," Finance Research Letters, Elsevier, vol. 40(C).
    16. Ma, Qianli & Xu, Lei & Anwar, Sajid & Lu, Zenghua, 2023. "Banking competition and the use of shadow credit: Evidence from lending marketplaces," Global Finance Journal, Elsevier, vol. 58(C).
    17. Jin Huang & Jun Li & Vania Sena, 2023. "Psychological distancing and language intensity in Peer‐to‐Peer lending," Journal of Consumer Affairs, Wiley Blackwell, vol. 57(3), pages 1281-1303, July.
    18. Sarbjit Singh Oberoi & Sayan Banerjee, 2023. "Bankruptcy Prediction of Indian Banks Using Advanced Analytics," Economic Studies journal, Bulgarian Academy of Sciences - Economic Research Institute, issue 4, pages 22-41.
    19. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    20. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
