Printed from https://ideas.repec.org/a/eee/ejores/v302y2022i1p309-323.html

Credit default prediction from user-generated text in peer-to-peer lending using deep learning

Author

Listed:
  • Kriebel, Johannes
  • Stitz, Lennart

Abstract

Digital technologies produce vast amounts of unstructured data that can be stored and accessed by traditional banks and fintech companies. We employ deep learning and several other techniques to extract credit-relevant information from user-generated text on Lending Club. Our results show that even short pieces of user-generated text can improve credit default predictions significantly. The importance of text is further supported by an information fusion analysis. Compared with other approaches that use text, deep learning outperforms them in almost all cases. However, machine learning models combined with word frequencies or topic models also extract substantial credit-relevant information. A comparison of six deep neural network architectures, including state-of-the-art transformer models, finds that the architectures mostly provide similar performance. This means that simpler methods (such as average embedding neural networks) offer performance comparable to more complex methods (such as the transformer networks BERT and RoBERTa) in this credit scoring setting.
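To make the "average embedding" idea mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical embedding table (`EMBEDDINGS`), untrained illustrative parameters (`WEIGHTS`, `BIAS`), and whitespace tokenization; a real model would learn or load the embeddings and train the output layer on loan outcomes. It shows how a loan description of any length is reduced to one fixed-size vector (the mean of its word vectors) before a logistic output layer produces a default probability.

```python
import math

# Hypothetical 3-dimensional word embeddings (assumption for illustration only).
EMBEDDINGS = {
    "pay": [0.2, -0.1, 0.4],
    "off": [0.0, 0.3, -0.2],
    "credit": [0.5, 0.1, 0.0],
    "card": [0.4, 0.0, 0.1],
    "debt": [-0.3, 0.6, 0.2],
}
DIM = 3


def average_embedding(text):
    """Return the mean embedding of all known tokens, or a zero vector if none are known."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def default_probability(text, weights, bias):
    """Apply a logistic output layer to the averaged embedding."""
    x = average_embedding(text)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative, untrained parameters (assumption).
WEIGHTS = [1.0, -0.5, 0.25]
BIAS = -1.0

p = default_probability("Pay off credit card debt", WEIGHTS, BIAS)
print(f"Predicted default probability: {p:.3f}")
```

Because the text is averaged into a single vector, word order is ignored. That is the main simplification relative to sequence models such as the transformer networks the abstract compares against.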

Suggested Citation

  • Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
  • Handle: RePEc:eee:ejores:v:302:y:2022:i:1:p:309-323
    DOI: 10.1016/j.ejor.2021.12.024

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S037722172101078X
    Download Restriction: Full text for ScienceDirect subscribers only

    Citations


    Cited by:

    1. Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Slowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
    2. Mahsa Tavakoli & Rohitash Chandra & Fengrui Tian & Cristián Bravo, 2023. "Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams," Papers 2304.10740, arXiv.org, revised Sep 2023.
    3. Katsafados, Apostolos G. & Leledakis, George N. & Pyrgiotakis, Emmanouil G. & Androutsopoulos, Ion & Fergadiotis, Manos, 2024. "Machine learning in bank merger prediction: A text-based approach," European Journal of Operational Research, Elsevier, vol. 312(2), pages 783-797.
    4. Vairetti, Carla & Aránguiz, Ignacio & Maldonado, Sebastián & Karmy, Juan Pablo & Leal, Alonso, 2024. "Analytics-driven complaint prioritisation via deep learning and multicriteria decision-making," European Journal of Operational Research, Elsevier, vol. 312(3), pages 1108-1118.
    5. Mario Sanz-Guerrero & Javier Arroyo, 2024. "Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending," Papers 2401.16458, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kim, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2020. "Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting," European Journal of Operational Research, Elsevier, vol. 283(1), pages 217-234.
    2. Kolesnikova, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2019. "Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting," IRTG 1792 Discussion Papers 2019-023, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    3. Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Slowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
    4. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    5. Flori, Andrea & Regoli, Daniele, 2021. "Revealing Pairs-trading opportunities with long short-term memory networks," European Journal of Operational Research, Elsevier, vol. 295(2), pages 772-791.
    6. Stevenson, Matthew & Mues, Christophe & Bravo, Cristián, 2021. "The value of text for small business default prediction: A Deep Learning approach," European Journal of Operational Research, Elsevier, vol. 295(2), pages 758-771.
    7. Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2020. "Separating the signal from the noise – Financial machine learning for Twitter," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).
    8. Jabeur, Sami Ben & Gharib, Cheima & Mefteh-Wali, Salma & Arfi, Wissal Ben, 2021. "CatBoost model and artificial intelligence techniques for corporate failure prediction," Technological Forecasting and Social Change, Elsevier, vol. 166(C).
    9. Mahsa Tavakoli & Rohitash Chandra & Fengrui Tian & Cristián Bravo, 2023. "Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams," Papers 2304.10740, arXiv.org, revised Sep 2023.
    10. Alexander Jakob Dautel & Wolfgang Karl Härdle & Stefan Lessmann & Hsin-Vonn Seow, 2020. "Forex exchange rate forecasting using deep recurrent neural networks," Digital Finance, Springer, vol. 2(1), pages 69-96, September.
    11. Tobias Berg & Andreas Fuster & Manju Puri, 2022. "FinTech Lending," Annual Review of Financial Economics, Annual Reviews, vol. 14(1), pages 187-207, November.
    12. Fabian Waldow & Matthias Schnaubelt & Christopher Krauss & Thomas Günter Fischer, 2021. "Machine Learning in Futures Markets," JRFM, MDPI, vol. 14(3), pages 1-14, March.
    13. Kraus, Mathias & Feuerriegel, Stefan & Oztekin, Asil, 2020. "Deep learning in business analytics and operations research: Models, applications and managerial implications," European Journal of Operational Research, Elsevier, vol. 281(3), pages 628-641.
    14. Han, Chulwoo & He, Zhaodong & Toh, Alenson Jun Wei, 2023. "Pairs trading via unsupervised learning," European Journal of Operational Research, Elsevier, vol. 307(2), pages 929-947.
    15. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    16. Van Nguyen, Truong & Zhou, Li & Chong, Alain Yee Loong & Li, Boying & Pu, Xiaodie, 2020. "Predicting customer demand for remanufactured products: A data-mining approach," European Journal of Operational Research, Elsevier, vol. 281(3), pages 543-558.
    17. Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
    18. Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Post-Print hal-04144665, HAL.
    19. Jiang, Cuiqing & Lyu, Ximei & Yuan, Yufei & Wang, Zhao & Ding, Yong, 2022. "Mining semantic features in current reports for financial distress prediction: Empirical evidence from unlisted public firms in China," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1086-1099.
    20. Yufei Xia & Lingyun He & Yinguo Li & Nana Liu & Yanlin Ding, 2020. "Predicting loan default in peer‐to‐peer lending using narrative data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(2), pages 260-280, March.
