
Tree-based heterogeneous cascade ensemble model for credit scoring

Author

Listed:
  • Liu, Wanan
  • Fan, Hong
  • Xia, Meng

Abstract

Credit scoring is an important tool that helps banks and lending companies guard against commercial risk and supports the development of personal credit. Ensemble algorithms have shown promising progress in improving credit scoring. In this study, to meet the challenge of large-scale credit scoring, we propose a heterogeneous deep forest model (Heter-DF), whose design considers base learner selection, the encouragement of diversity among base learners, and ensemble strategies. Heter-DF is designed as a scalable cascading framework that can increase its complexity with the scale of the credit dataset. Moreover, each level of Heter-DF is built from multiple heterogeneous tree-based ensemble base learners, avoiding homogeneous predictions within the ensemble framework. In addition, a weighted voting mechanism is introduced to highlight important information and suppress irrelevant features, making Heter-DF a robust model for credit scoring. Experimental results on four credit scoring datasets and six evaluation metrics show that the cascading framework is a good choice for ensembling tree-based base learners. A comparison between homogeneous and heterogeneous ensembles further demonstrates the effectiveness of Heter-DF. Experiments on different training sets indicate that Heter-DF is a scalable framework that not only handles large-scale credit scoring but also suits settings where only small-scale credit scoring is required. Finally, building on the good interpretability of tree-based structures, a global interpretation of Heter-DF is preliminarily explored.
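
The cascade and weighted-voting ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual deep-forest convention that each level receives the original features concatenated with the class-probability vectors from the previous level, and it assumes the voting weights are supplied by the caller (for example, from validation scores). The names weighted_vote and cascade_predict, and the representation of a learner as a plain callable, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a fitted base learner: any callable that maps a
# feature vector to a vector of class probabilities.
Learner = Callable[[Sequence[float]], List[float]]


def weighted_vote(probas: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Combine class-probability vectors by a weighted average.

    The weights are normalised by their sum, so only their relative
    sizes matter.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per probability vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]


def cascade_predict(x: Sequence[float],
                    levels: Sequence[Sequence[Learner]],
                    weights_per_level: Sequence[Sequence[float]]) -> List[float]:
    """Pass one sample through a cascade of learner groups.

    Assumption: every level sees the original features plus the class
    probabilities emitted by all learners of the previous level. The
    returned value is the weighted vote of the last level.
    """
    features = list(x)
    vote: List[float] = []
    for learners, weights in zip(levels, weights_per_level):
        probas = [learner(features) for learner in learners]
        vote = weighted_vote(probas, weights)
        # Augment the original features with this level's outputs.
        features = list(x) + [v for p in probas for v in p]
    return vote
```

In practice each callable would wrap a different fitted tree-based ensemble, which is what makes a level heterogeneous; the sketch leaves training, the choice of weights, and the rule for adding levels unspecified because the abstract does not describe them.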

Suggested Citation

  • Liu, Wanan & Fan, Hong & Xia, Meng, 2023. "Tree-based heterogeneous cascade ensemble model for credit scoring," International Journal of Forecasting, Elsevier, vol. 39(4), pages 1593-1614.
  • Handle: RePEc:eee:intfor:v:39:y:2023:i:4:p:1593-1614
    DOI: 10.1016/j.ijforecast.2022.07.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0169207022001054
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ijforecast.2022.07.007?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, Diciembre.
    2. Matthias Bogaert & Lex Delaere, 2023. "Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art," Mathematics, MDPI, vol. 11(5), pages 1-28, February.
    3. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    4. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    5. Koen W. de Bock, 2017. "The best of two worlds: Balancing model strength and comprehensibility in business failure prediction using spline-rule ensembles," Post-Print hal-01588059, HAL.
    6. Chengbin Wang & Kuangnan Fang & Chenlu Zheng & Hechao Xu & Zewei Li, 0. "Credit scoring of micro and small entrepreneurial firms in China," International Entrepreneurship and Management Journal, Springer, vol. 0, pages 1-15.
    7. Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Slowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
    8. Michal Pavlicko & Marek Durica & Jaroslav Mazanec, 2021. "Ensemble Model of the Financial Distress Prediction in Visegrad Group Countries," Mathematics, MDPI, vol. 9(16), pages 1-26, August.
    9. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    10. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    11. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    12. Chengbin Wang & Kuangnan Fang & Chenlu Zheng & Hechao Xu & Zewei Li, 2021. "Credit scoring of micro and small entrepreneurial firms in China," International Entrepreneurship and Management Journal, Springer, vol. 17(1), pages 29-43, March.
    13. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    14. Richard Chamboko & Jorge Miguel Bravo, 2020. "A Multi-State Approach to Modelling Intermediate Events and Multiple Mortgage Loan Outcomes," Risks, MDPI, vol. 8(2), pages 1-29, June.
    15. Dagmar Camska & Jiri Klecka, 2020. "Comparison of Prediction Models Applied in Economic Recession and Expansion," JRFM, MDPI, vol. 13(3), pages 1-16, March.
    16. Shen, Feng & Zhang, Xin & Wang, Run & Lan, Dao & Zhou, Wei, 2022. "Sequential optimization three-way decision model with information gain for credit default risk evaluation," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1116-1128.
    17. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    18. Tsukahara, Fábio Yasuhiro & Kimura, Herbert & Sobreiro, Vinicius Amorim & Zambrano, Juan Carlos Arismendi, 2016. "Validation of default probability models: A stress testing approach," International Review of Financial Analysis, Elsevier, vol. 47(C), pages 70-85.
    19. Richard Chamboko & Jorge M. Bravo, 2016. "On the modelling of prognosis from delinquency to normal performance on retail consumer loans," Risk Management, Palgrave Macmillan, vol. 18(4), pages 264-287, December.
    20. Cao Son Tran & Dan Nicolau & Richi Nayak & Peter Verhoeven, 2021. "Modeling Credit Risk: A Category Theory Perspective," JRFM, MDPI, vol. 14(7), pages 1-21, July.
