
Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards

Author

Listed:
  • John Martin

    (School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia)

  • Sona Taheri

    (School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia)

  • Mali Abdollahian

    (School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia)

Abstract

Credit risk scorecard models are used by lending institutions to optimize credit approval decisions. In recent years, ensemble learning has often been deployed to reduce misclassification costs in credit risk scorecards. In this paper, we compared the risk estimation of 26 widely used machine learning algorithms using common statistical metrics. The best-performing algorithms were then used for model selection in ensemble learning. For the first time, we proposed financial criteria that assess the losses associated with both false positive and false negative predictions in order to identify the optimal ensemble. The German Credit Dataset (GCD) was augmented with simulated financial information based on a hypothetical mortgage portfolio of the kind observed in UK, European and Australian banks, so that the losses arising from misclassification could be assessed. The experimental results on the simulated GCD show that the best-performing individual algorithm was the Generalized Additive Model (GAM), with an accuracy of 0.87, a Gini coefficient of 0.88 and an Area Under the Receiver Operating Characteristic Curve of 0.94. The ensemble with the lowest misclassification cost was the combination of Random Forest (RF) and K-Nearest Neighbors (KNN), with total costs of USD 417 million (USD 230 million in default costs and USD 187 million in opportunity costs), compared with USD 487 million for the GAM (USD 287 million and USD 200 million, respectively). Applying the proposed financial criteria thus yielded a significant USD 70 million reduction in misclassification costs on a small sample. The lending institutions’ profit would therefore rise considerably as the number of credit applications submitted for approval increases.
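
The cost comparison in the abstract can be checked with a few lines of Python. The sketch below is illustrative only and is not the authors' implementation: the class `MisclassificationCost` and the functions `portfolio_cost` and `gini_from_auc` are hypothetical names, the per-loan loss and profit inputs are assumed to be supplied by the user, and the component figures are read as USD millions because that is the reading under which they sum to the stated totals. The Gini check uses the standard relation Gini = 2 × AUC − 1, which the reported values are consistent with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MisclassificationCost:
    """Two-part cost of wrong credit decisions (hypothetical helper)."""
    default_cost: float      # loss from approving applicants who later default
    opportunity_cost: float  # profit forgone by rejecting applicants who would have repaid

    @property
    def total(self) -> float:
        return self.default_cost + self.opportunity_cost


def portfolio_cost(defaulted, approved, loss_if_default, profit_if_repaid):
    """Sum the cost of each wrong decision over a portfolio.

    All four arguments are equal-length sequences, one entry per applicant.
    The loss and profit amounts are assumed inputs, not values from the paper.
    """
    default_cost = 0.0
    opportunity_cost = 0.0
    for d, a, loss, profit in zip(defaulted, approved, loss_if_default, profit_if_repaid):
        if a and d:              # approved, but the borrower defaulted
            default_cost += loss
        elif not a and not d:    # rejected, but the borrower would have repaid
            opportunity_cost += profit
    return MisclassificationCost(default_cost, opportunity_cost)


def gini_from_auc(auc: float) -> float:
    """Standard relation between the Gini coefficient and the ROC AUC."""
    return 2.0 * auc - 1.0


# Figures reported in the abstract, read as USD millions.
rf_knn = MisclassificationCost(default_cost=230, opportunity_cost=187)
gam = MisclassificationCost(default_cost=287, opportunity_cost=200)
saving = gam.total - rf_knn.total

print(rf_knn.total, gam.total, saving)  # 417 487 70
print(round(gini_from_auc(0.94), 2))    # 0.88, matching the reported Gini
```

Selecting a model by `total` rather than by accuracy or AUC is what lets the RF and KNN ensemble rank ahead of the GAM here, even though the GAM has the best statistical metrics among the individual algorithms.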

Suggested Citation

  • John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1, March.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:6:p:855-:d:1357214

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/6/855/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/6/855/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    2. Dangxing Chen & Luyao Zhang, 2023. "Monotonicity for AI ethics and society: An empirical study of the monotonic neural additive model in criminology, education, health care, and finance," Papers 2301.07060, arXiv.org.
    3. Sun, Weixin & Zhang, Xuantao & Li, Minghao & Wang, Yong, 2023. "Interpretable high-stakes decision support system for credit default forecasting," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    4. Al-Amin Abba Dabo & Amin Hosseinian-Far, 2023. "An Integrated Methodology for Enhancing Reverse Logistics Flows and Networks in Industry 5.0," Logistics, MDPI, vol. 7(4), pages 1-26, December.
    5. Chun-Xia Zhang & Jiang-She Zhang & Sang-Woon Kim, 2016. "PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection," Computational Statistics, Springer, vol. 31(4), pages 1237-1262, December.
    6. Kleiman, Rachel M. & Characklis, Gregory W. & Kern, Jordan D., 2022. "Managing weather- and market price-related financial risks in algal biofuel production," Renewable Energy, Elsevier, vol. 200(C), pages 111-124.
    7. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    8. Hayley Randall & Andreas Artemiou & Xingye Qiao, 2021. "Sufficient dimension reduction based on distance‐weighted discrimination," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(4), pages 1186-1211, December.
    9. Yang Liu & Fei Huang & Lili Ma & Qingguo Zeng & Jiale Shi, 2024. "Credit scoring prediction leveraging interpretable ensemble learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 286-308, March.
    10. John De Jesús González & Filiberto Enrique Valdés Medina & Maria Luisa Saavedra García, 2021. "Factores de éxito en el financiamiento para Pymes a través del Crowdfunding en México," Remef - Revista Mexicana de Economía y Finanzas Nueva Época REMEF (The Mexican Journal of Economics and Finance), Instituto Mexicano de Ejecutivos de Finanzas, IMEF, vol. 16(2), pages 1-23, Abril - J.
    11. Barrow, Devon K. & Crone, Sven F., 2016. "A comparison of AdaBoost algorithms for time series forecast combination," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1103-1119.
    12. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    13. Dangxing Chen, 2022. "Two-stage Modeling for Prediction with Confidence," Papers 2209.08848, arXiv.org.
    14. Kellner, Ralf & Nagl, Maximilian & Rösch, Daniel, 2022. "Opening the black box – Quantile neural networks for loss given default prediction," Journal of Banking & Finance, Elsevier, vol. 134(C).
    15. Dangxing Chen & Weicheng Ye, 2022. "Generalized Gloves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance," Papers 2209.10082, arXiv.org.
    16. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "$L_1$ splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    17. Dangxing Chen & Weicheng Ye, 2022. "Monotonic Neural Additive Models: Pursuing Regulated Machine Learning Models for Credit Scoring," Papers 2209.10070, arXiv.org.
    18. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.
    19. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
    20. Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
