Printed from https://ideas.repec.org/p/arx/papers/2312.03194.html

Corporate Bankruptcy Prediction with Domain-Adapted BERT

Author

Listed:
  • Alex Kim
  • Sangwon Yoon

Abstract

This study applies BERT, a representative contextualized language model, to corporate disclosure data to predict impending bankruptcies. Prior literature on bankruptcy prediction mainly focuses on developing more sophisticated prediction methodologies based on financial variables. In contrast, we focus on improving the quality of the input data. Specifically, we employ a BERT model to perform sentiment analysis on MD&A disclosures. We show that BERT outperforms dictionary-based and Word2Vec-based predictions in terms of adjusted R-square in logistic regression, k-nearest neighbor (kNN-5), and linear kernel support vector machine (SVM). Further, instead of pre-training the BERT model from scratch, we apply self-learning with confidence-based filtering to corporate disclosure data (10-K). We achieve an accuracy rate of 91.56% and demonstrate that the domain adaptation procedure brings a significant improvement in prediction accuracy.
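
The self-learning step with confidence-based filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a toy one-dimensional nearest-centroid classifier stands in for BERT, the "sentiment scores" are made-up numbers, and every name (fit_centroids, predict_proba, self_train, threshold, max_rounds) is hypothetical. It only shows the general loop: train on labeled data, pseudo-label the unlabeled pool, keep only predictions whose confidence clears a threshold, and retrain.

```python
import math


def fit_centroids(xs, ys):
    """Return the mean feature value of each class (0 = healthy, 1 = bankrupt)."""
    means = {}
    for label in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means


def predict_proba(means, x):
    """Probability of class 1: a logistic function of the distance gap to the two centroids."""
    d0 = abs(x - means[0])
    d1 = abs(x - means[1])
    return 1.0 / (1.0 + math.exp(d1 - d0))


def self_train(xs, ys, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training loop that only accepts pseudo-labels with confidence >= threshold."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(max_rounds):
        means = fit_centroids(xs, ys)
        kept, rest = [], []
        for x in pool:
            p1 = predict_proba(means, x)
            confidence = max(p1, 1.0 - p1)
            if confidence >= threshold:
                kept.append((x, 1 if p1 >= 0.5 else 0))  # confident pseudo-label
            else:
                rest.append(x)  # filtered out, stays unlabeled
        if not kept:
            break  # nothing confident enough to add; stop early
        xs.extend(x for x, _ in kept)
        ys.extend(y for _, y in kept)
        pool = rest
    return fit_centroids(xs, ys), pool


# Made-up sentiment scores: negative tone is labeled 1 (bankrupt) here by assumption.
labeled_x = [-2.0, -1.5, 1.5, 2.0]
labeled_y = [1, 1, 0, 0]
unlabeled_x = [-3.0, -2.5, 0.1, 2.5, 3.0]

final_means, leftover = self_train(labeled_x, labeled_y, unlabeled_x)
print(final_means, leftover)
```

In this toy run the ambiguous score near zero never clears the threshold, so it is left out of training rather than added with a possibly wrong label; that exclusion is the purpose of the filter.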

Suggested Citation

  • Alex Kim & Sangwon Yoon, 2023. "Corporate Bankruptcy Prediction with Domain-Adapted BERT," Papers 2312.03194, arXiv.org.
  • Handle: RePEc:arx:papers:2312.03194

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2312.03194
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sigrist, Fabio & Leuenberger, Nicola, 2023. "Machine learning for corporate default risk: Multi-period prediction, frailty correlation, loan portfolios, and tail probabilities," European Journal of Operational Research, Elsevier, vol. 305(3), pages 1390-1406.
    2. Yi Cao & Xiaoquan Liu & Jia Zhai & Shan Hua, 2022. "A two‐stage Bayesian network model for corporate bankruptcy prediction," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 27(1), pages 455-472, January.
    3. Bai, Qing & Tian, Shaonan, 2020. "Innovate or die: Corporate innovation and bankruptcy forecasts," Journal of Empirical Finance, Elsevier, vol. 59(C), pages 88-108.
    4. Tian, Shaonan & Yu, Yan, 2017. "Financial ratios and bankruptcy predictions: An international evidence," International Review of Economics & Finance, Elsevier, vol. 51(C), pages 510-526.
    5. Mai, Feng & Tian, Shaonan & Lee, Chihoon & Ma, Ling, 2019. "Deep learning models for bankruptcy prediction using textual disclosures," European Journal of Operational Research, Elsevier, vol. 274(2), pages 743-758.
    6. Serrano-Cinca, Carlos & Gutiérrez-Nieto, Begoña & Bernate-Valbuena, Martha, 2019. "The use of accounting anomalies indicators to predict business failure," European Management Journal, Elsevier, vol. 37(3), pages 353-375.
    7. Mousavi, Mohammad M. & Ouenniche, Jamal & Xu, Bing, 2015. "Performance evaluation of bankruptcy prediction models: An orientation-free super-efficiency DEA-based framework," International Review of Financial Analysis, Elsevier, vol. 42(C), pages 64-75.
    8. Kumar, Rahul & Deb, Soumya Guha & Mukherjee, Shubhadeep, 2020. "Do words reveal the latent truth? Identifying communication patterns of corporate losers," Journal of Behavioral and Experimental Finance, Elsevier, vol. 26(C).
    9. Alessandro Bitetto & Stefano Filomeni & Michele Modina, 2021. "Understanding corporate default using Random Forest: The role of accounting and market information," DEM Working Papers Series 205, University of Pavia, Department of Economics and Management.
    10. Dong, Manh Cuong & Tian, Shaonan & Chen, Cathy W.S., 2018. "Predicting failure risk using financial ratios: Quantile hazard model approach," The North American Journal of Economics and Finance, Elsevier, vol. 44(C), pages 204-220.
    11. Sigrist, Fabio & Hirnschall, Christoph, 2019. "Grabit: Gradient tree-boosted Tobit models for default prediction," Journal of Banking & Finance, Elsevier, vol. 102(C), pages 177-192.
    12. Kerstin Lopatta & Mario Albert Gloger & Reemda Jaeschke, 2017. "Can Language Predict Bankruptcy? The Explanatory Power of Tone in 10‐K Filings," Accounting Perspectives, John Wiley & Sons, vol. 16(4), pages 315-343, December.
    13. Jamal Ouenniche & Kaoru Tone, 2017. "An out-of-sample evaluation framework for DEA with application in bankruptcy prediction," Annals of Operations Research, Springer, vol. 254(1), pages 235-250, July.
    14. Tian, Shaonan & Yu, Yan & Guo, Hui, 2015. "Variable selection and corporate bankruptcy forecasts," Journal of Banking & Finance, Elsevier, vol. 52(C), pages 89-100.
    15. Mohammad Mahdi Mousavi & Jamal Ouenniche, 2018. "Multi-criteria ranking of corporate distress prediction models: empirical evaluation and methodological contributions," Annals of Operations Research, Springer, vol. 271(2), pages 853-886, December.
    16. John Donovan & Jared Jennings & Kevin Koharki & Joshua Lee, 2021. "Measuring credit risk using qualitative disclosure," Review of Accounting Studies, Springer, vol. 26(2), pages 815-863, June.
    17. Ruey-Ching Hwang, 2013. "Forecasting credit ratings with the varying-coefficient model," Quantitative Finance, Taylor & Francis Journals, vol. 13(12), pages 1947-1965, December.
    18. Chen, Peimin & Wu, Chunchi, 2014. "Default prediction with dynamic sectoral and macroeconomic frailties," Journal of Banking & Finance, Elsevier, vol. 40(C), pages 211-226.
    19. Giordani, Paolo & Jacobson, Tor & Schedvin, Erik von & Villani, Mattias, 2014. "Taking the Twists into Account: Predicting Firm Bankruptcy Risk with Splines of Financial Ratios," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 49(4), pages 1071-1099, August.
    20. Li, Chunyu & Lou, Chenxin & Luo, Dan & Xing, Kai, 2021. "Chinese corporate distress prediction using LASSO: The role of earnings management," International Review of Financial Analysis, Elsevier, vol. 76(C).
