IDEAS home Printed from https://ideas.repec.org/a/eee/riibaf/v59y2022ics0275531921001574.html

A novel two-stage hybrid default prediction model with k-means clustering and support vector domain description

Authors
  • Yuan, Kunpeng
  • Chi, Guotai
  • Zhou, Ying
  • Yin, Hailei

Abstract

Default prediction estimates the probability that a firm will default by fitting a prediction model that captures the functional relation between feature data at time t-m and default status at time t. If a defaulting company is misclassified, banks are misled into lending to a “defaulter” and may suffer huge losses; if a non-defaulting company is misclassified, banks risk losing high-quality customers. To support the lending decisions of banks and non-banking financial institutions, this study proposes a two-stage default prediction model that integrates k-means clustering, which partitions the sample, with support vector domain description (SVDD), which predicts default (credit scoring). The model is trained on attribute data at time t-m (m = 1, 2, 3, 4, 5) and default status at time t, so that it can warn of default m years ahead. The results show that the proposed two-stage model predicts more accurately than single-stage models that use only k-means clustering or only support vector domain description, and that it retains predictive ability five years ahead (AUC > 0.85). Further, the study suggests that “retained earnings/total assets”, “financial expenses/gross revenue”, and “type of audit opinion” are three key features in default forecasting for Chinese listed enterprises. This study contributes to multi-stage credit scoring research by demonstrating that combining different methods is worth considering as a way to improve the performance of default prediction models.
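
The two-stage idea in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: stage 1 runs plain Lloyd k-means with a deterministic farthest-first seeding; stage 2 replaces SVDD with its simplest special case, a hard-margin, linear-kernel description (the smallest ball enclosing the non-defaulting firms of each cluster), approximated by an iterative farthest-point update; a firm falling outside the ball of its cluster is flagged as a default. The function names (`kmeans`, `enclosing_ball`, `fit_two_stage`, `predict`) and the two-feature toy data are invented here and are not taken from the paper.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def kmeans(points, k, iters=100):
    """Stage 1: Lloyd's k-means with farthest-first seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[j])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    return centers

def enclosing_ball(points, iters=200):
    """Stage 2 stand-in for SVDD: approximate smallest enclosing ball.
    Each step moves the center a shrinking fraction toward the farthest point."""
    center = list(points[0])
    for i in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius

def fit_two_stage(X, y, k):
    """Cluster all firms, then describe the non-defaulters (y == 0) per cluster."""
    centers = kmeans(X, k)
    balls = []
    for j in range(k):
        normal = [x for x, lab in zip(X, y) if lab == 0 and nearest(x, centers) == j]
        balls.append(enclosing_ball(normal) if normal else None)
    return centers, balls

def predict(model, x):
    """Return 1 (default) if x lies outside the ball of its nearest cluster."""
    centers, balls = model
    ball = balls[nearest(x, centers)]
    if ball is None:
        return 1  # no non-defaulters were observed in this cluster
    center, radius = ball
    return int(math.dist(x, center) > radius)

# Toy data (invented): features observed at t-m, default status observed at t.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 4),
     (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 14)]
y = [0, 0, 0, 0, 1,
     0, 0, 0, 0, 1]

model = fit_two_stage(X, y, k=2)
new_firms = [(0.6, 0.4), (0.5, 3.5), (10.4, 10.6), (10.5, 13.0)]
print([predict(model, x) for x in new_firms])  # -> [0, 1, 0, 1]
```

A kernelized, soft-margin SVDD would replace `enclosing_ball` in a realistic setting, and the number of clusters and the kernel would need to be tuned, as the keywords "Optimal cluster number" and "Optimal kernel function" suggest. Training one such model per lag m would give one early-warning model for each horizon.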

Suggested Citation

  • Yuan, Kunpeng & Chi, Guotai & Zhou, Ying & Yin, Hailei, 2022. "A novel two-stage hybrid default prediction model with k-means clustering and support vector domain description," Research in International Business and Finance, Elsevier, vol. 59(C).
  • Handle: RePEc:eee:riibaf:v:59:y:2022:i:c:s0275531921001574
    DOI: 10.1016/j.ribaf.2021.101536

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0275531921001574
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Yiheng Li & Weidong Chen, 2021. "Entropy method of constructing a combined model for improving loan default prediction: A case study in China," Journal of the Operational Research Society, Taylor & Francis Journals, vol. 72(5), pages 1099-1109, May.
    2. Fitzpatrick, Trevor & Mues, Christophe, 2016. "An empirical comparison of classification algorithms for mortgage default prediction: evidence from a distressed mortgage market," European Journal of Operational Research, Elsevier, vol. 249(2), pages 427-439.
    3. Chen, Yenpao & Guo, Ruey-Ji & Huang, Rao-Li, 2009. "Two stages credit evaluation in bank loan appraisal," Economic Modelling, Elsevier, vol. 26(1), pages 63-70, January.
    4. Le, Hong Hanh & Viviani, Jean-Laurent, 2018. "Predicting bank failure: An improvement by implementing a machine-learning approach to classical financial ratios," Research in International Business and Finance, Elsevier, vol. 44(C), pages 16-25.
    5. Purnanandam, Amiyatosh, 2008. "Financial distress and corporate risk management: Theory and evidence," Journal of Financial Economics, Elsevier, vol. 87(3), pages 706-739, March.
    6. Wanke, Peter & Azad, M.D. Abul Kalam & Barros, C.P., 2016. "Predicting efficiency in Malaysian Islamic banks: A two-stage TOPSIS and neural networks approach," Research in International Business and Finance, Elsevier, vol. 36(C), pages 485-498.
    7. M. Wong, 1984. "Asymptotic properties of univariate sample k-means clusters," Journal of Classification, Springer;The Classification Society, vol. 1(1), pages 255-270, December.
    8. Kalak, Izidin El & Azevedo, Alcino & Hudson, Robert & Karim, Mohamad Abd, 2017. "Stock liquidity and SMEs’ likelihood of bankruptcy: Evidence from the US market," Research in International Business and Finance, Elsevier, vol. 42(C), pages 1383-1393.
    9. Edward I. Altman, 1968. "Financial Ratios, Discriminant Analysis And The Prediction Of Corporate Bankruptcy," Journal of Finance, American Finance Association, vol. 23(4), pages 589-609, September.
    10. Guotai Chi & Zhipeng Zhang, 2017. "Multi Criteria Credit Rating Model for Small Enterprise Using a Nonparametric Method," Sustainability, MDPI, vol. 9(10), pages 1-23, October.
    11. Edward I. Altman, 1968. "The Prediction Of Corporate Bankruptcy: A Discriminant Analysis," Journal of Finance, American Finance Association, vol. 23(1), pages 193-194, March.
    12. Jens Hilscher & Mungo Wilson, 2017. "Credit Ratings and Credit Risk: Is One Measure Enough?," Management Science, INFORMS, vol. 63(10), pages 3414-3437, October.
    13. Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.
    14. du Jardin, Philippe, 2016. "A two-stage classification technique for bankruptcy prediction," European Journal of Operational Research, Elsevier, vol. 254(1), pages 236-252.
    15. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    16. Zhao, Chengguo & Li, Meng & Wang, Jun & Ma, Shujian, 2021. "The mechanism of credit risk contagion among internet P2P lending platforms based on a SEIR model with time-lag," Research in International Business and Finance, Elsevier, vol. 57(C).
    17. Stolbov, Mikhail & Shchepeleva, Maria, 2020. "Systemic risk, economic policy uncertainty and firm bankruptcies: Evidence from multivariate causal inference," Research in International Business and Finance, Elsevier, vol. 52(C).
    18. Geng, Ruibin & Bose, Indranil & Chen, Xi, 2015. "Prediction of financial distress: An empirical study of listed Chinese companies using data mining," European Journal of Operational Research, Elsevier, vol. 241(1), pages 236-247.
    19. Arno de Caigny & Kristof Coussement & Koen W. de Bock, 2018. "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," Post-Print hal-01741661, HAL.
    20. Khediri, Karim Ben & Charfeddine, Lanouar & Youssef, Slah Ben, 2015. "Islamic versus conventional banks in the GCC countries: A comparative study using classification techniques," Research in International Business and Finance, Elsevier, vol. 33(C), pages 75-98.
    21. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W., 2018. "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," European Journal of Operational Research, Elsevier, vol. 269(2), pages 760-772.
    22. Meyer, Paul A & Pifer, Howard W, 1970. "Prediction of Bank Failures," Journal of Finance, American Finance Association, vol. 25(4), pages 853-868, September.
    23. Baofeng Shi & Bin Meng & Hufeng Yang & Jing Wang & Wenli Shi, 2018. "A Novel Approach for Reducing Attributes and Its Application to Small Enterprise Financing Ability Evaluation," Complexity, Hindawi, vol. 2018, pages 1-17, January.
    24. Sermpinis, Georgios & Tsoukas, Serafeim & Zhang, Ping, 2018. "Modelling market implied ratings using LASSO variable selection techniques," Journal of Empirical Finance, Elsevier, vol. 48(C), pages 19-35.
    25. Ballester, Laura & González-Urteaga, Ana & Martínez, Beatriz, 2020. "The role of internal corporate governance mechanisms on default risk: A systematic review for different institutional settings," Research in International Business and Finance, Elsevier, vol. 54(C).
    26. Carvalho, Daniel & Ferreira, Miguel A. & Matos, Pedro, 2015. "Lending Relationships and the Effect of Bank Distress: Evidence from the 2007–2009 Financial Crisis," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 50(6), pages 1165-1197, December.

    Citations


    Cited by:

    1. Abedin, Mohammad Zoynul & Hajek, Petr & Sharif, Taimur & Satu, Md. Shahriare & Khan, Md. Imran, 2023. "Modelling bank customer behaviour using feature engineering and classification techniques," Research in International Business and Finance, Elsevier, vol. 65(C).
    2. Xing, Jin & Chi, Guotai & Pan, Ancheng, 2024. "Instance-dependent misclassification cost-sensitive learning for default prediction," Research in International Business and Finance, Elsevier, vol. 69(C).
    3. Zedda, Stefano & Modina, Michele & Gallucci, Carmen, 2024. "Cooperative credit banks and sustainability: Towards a social credit scoring," Research in International Business and Finance, Elsevier, vol. 68(C).
    4. Shi, Yong & Qu, Yi & Chen, Zhensong & Mi, Yunlong & Wang, Yunong, 2024. "Improved credit risk prediction based on an integrated graph representation learning approach with graph transformation," European Journal of Operational Research, Elsevier, vol. 315(2), pages 786-801.
    5. Li, Zhe & Liang, Shuguang & Pan, Xianyou & Pang, Meng, 2024. "Credit risk prediction based on loan profit: Evidence from Chinese SMEs," Research in International Business and Finance, Elsevier, vol. 67(PA).
    6. Chi, Guotai & Dong, Bingjie & Zhou, Ying & Jin, Peng, 2024. "Long-horizon predictions of credit default with inconsistent customers," Technological Forecasting and Social Change, Elsevier, vol. 198(C).
    7. Chen, Dangxing & Ye, Jiahui & Ye, Weicheng, 2023. "Interpretable selective learning in credit risk," Research in International Business and Finance, Elsevier, vol. 65(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shi, Baofeng & Chi, Guotai & Li, Weiping, 2020. "Exploring the mismatch between credit ratings and loss-given-default: A credit risk approach," Economic Modelling, Elsevier, vol. 85(C), pages 420-428.
    2. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    3. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    4. Dawen Yan & Guotai Chi & Kin Keung Lai, 2020. "Financial Distress Prediction and Feature Selection in Multiple Periods by Lassoing Unconstrained Distributed Lag Non-linear Models," Mathematics, MDPI, vol. 8(8), pages 1-27, August.
    5. Ben Jabeur, Sami & Serret, Vanessa, 2023. "Bankruptcy prediction using fuzzy convolutional neural networks," Research in International Business and Finance, Elsevier, vol. 64(C).
    6. Mohammad Mahdi Mousavi & Jamal Ouenniche, 2018. "Multi-criteria ranking of corporate distress prediction models: empirical evaluation and methodological contributions," Annals of Operations Research, Springer, vol. 271(2), pages 853-886, December.
    7. Jabeur, Sami Ben & Gharib, Cheima & Mefteh-Wali, Salma & Arfi, Wissal Ben, 2021. "CatBoost model and artificial intelligence techniques for corporate failure prediction," Technological Forecasting and Social Change, Elsevier, vol. 166(C).
    8. Mohammad Mahdi Mousavi & Jamal Ouenniche & Kaoru Tone, 2023. "A dynamic performance evaluation of distress prediction models," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(4), pages 756-784, July.
    9. Gianfranco Lombardo & Mattia Pellegrino & George Adosoglou & Stefano Cagnoni & Panos M. Pardalos & Agostino Poggi, 2022. "Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks," Future Internet, MDPI, vol. 14(8), pages 1-23, August.
    10. Bai, Chunguang & Shi, Baofeng & Liu, Feng & Sarkis, Joseph, 2019. "Banking credit worthiness: Evaluating the complex relationships," Omega, Elsevier, vol. 83(C), pages 26-38.
    11. Zeineb Affes & Rania Hentati-Kaffel, 2019. "Predicting US Banks Bankruptcy: Logit Versus Canonical Discriminant Analysis," Computational Economics, Springer;Society for Computational Economics, vol. 54(1), pages 199-244, June.
    12. Mai, Feng & Tian, Shaonan & Lee, Chihoon & Ma, Ling, 2019. "Deep learning models for bankruptcy prediction using textual disclosures," European Journal of Operational Research, Elsevier, vol. 274(2), pages 743-758.
    13. Li, Chunyu & Lou, Chenxin & Luo, Dan & Xing, Kai, 2021. "Chinese corporate distress prediction using LASSO: The role of earnings management," International Review of Financial Analysis, Elsevier, vol. 76(C).
    14. Evan Dudley & Niclas Andrén & Håkan Jankensgård, 2022. "How do firms hedge in financial distress?," Journal of Futures Markets, John Wiley & Sons, Ltd., vol. 42(7), pages 1324-1351, July.
    15. Enrico Supino & Nicola Piras, 2022. "Le performance dei modelli di credit scoring in contesti di forte instabilità macroeconomica: il ruolo delle Reti Neurali Artificiali," MANAGEMENT CONTROL, FrancoAngeli Editore, vol. 2022(2), pages 41-61.
    16. Yanfang Zhang & Mushang Lee, 2019. "A Hybrid Model for Addressing the Relationship between Financial Performance and Sustainable Development," Sustainability, MDPI, vol. 11(10), pages 1-15, May.
    17. Mitroussi, K. & Abouarghoub, W. & Haider, J.J. & Pettit, S.J. & Tigka, N., 2016. "Performance drivers of shipping loans: An empirical investigation," International Journal of Production Economics, Elsevier, vol. 171(P3), pages 438-452.
    18. Chen, Qi & Vashishtha, Rahul, 2017. "The effects of bank mergers on corporate information disclosure," Journal of Accounting and Economics, Elsevier, vol. 64(1), pages 56-77.
    19. Marco Locurcio & Francesco Tajani & Pierluigi Morano & Debora Anelli & Benedetto Manganelli, 2021. "Credit Risk Management of Property Investments through Multi-Criteria Indicators," Risks, MDPI, vol. 9(6), pages 1-23, June.
    20. Pavlos Almanidis & Robin C. Sickles, 2016. "Banking Crises, Early Warning Models, and Efficiency," International Series in Operations Research & Management Science, in: Juan Aparicio & C. A. Knox Lovell & Jesus T. Pastor (ed.), Advances in Efficiency and Productivity, chapter 0, pages 331-364, Springer.

    More about this item

    Keywords

    Default prediction; K-means clustering; Support vector domain description; Optimal cluster number; Optimal kernel function; Big data;

    JEL classification:

    • G17 - Financial Economics - - General Financial Markets - - - Financial Forecasting and Simulation
    • G33 - Financial Economics - - Corporate Finance and Governance - - - Bankruptcy; Liquidation
