
Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?

Authors

  • Ahmed Almustfa Hussin Adam Khatir

    (Department of Economics and Management, University of Trento, Via Inama 5, 38122 Trento, Italy)

  • Marco Bee

    (Department of Economics and Management, University of Trento, Via Inama 5, 38122 Trento, Italy)

Abstract

Forecasting the creditworthiness of customers is a central issue of banking activity. This task requires the analysis of large datasets with many variables, for which machine learning algorithms and feature selection techniques are crucial tools. Moreover, the percentages of “good” and “bad” customers are typically imbalanced, so over- and undersampling techniques should be employed. In the literature, most investigations tackle these three issues individually. Since there is little evidence about their joint performance, in this paper we try to fill this gap. We use five machine learning classifiers, each combined with different feature selection techniques and various data-balancing approaches. According to the empirical analysis of a retail credit bank dataset, the best combination is given by random forests, random forest recursive feature elimination and random oversampling.
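
As an illustration of the data-balancing step named in the abstract, the following is a minimal sketch of random oversampling: rows from the smaller class are duplicated at random until all classes have the same size. It is not the authors' code. The function name `random_oversample`, the toy data, and the fixed seed are assumptions made for this example, and it uses only the Python standard library.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data so no row is lost.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 8 "good" customers (label 0), 2 "bad" ones (label 1).
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
print(balanced_counts)
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keep the original class proportions.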

Suggested Citation

  • Ahmed Almustfa Hussin Adam Khatir & Marco Bee, 2022. "Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?," Risks, MDPI, vol. 10(9), pages 1-22, August.
  • Handle: RePEc:gam:jrisks:v:10:y:2022:i:9:p:169-:d:895806

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/10/9/169/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/10/9/169/
    Download Restriction: no

    References listed on IDEAS

    1. Reichert, Alan K & Cho, Chien-Ching & Wagner, George M, 1983. "An Examination of the Conceptual Issues Involved in Developing Credit-scoring Models," Journal of Business & Economic Statistics, American Statistical Association, vol. 1(2), pages 101-114, April.
    2. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    3. B Baesens & T Van Gestel & S Viaene & M Stepanova & J Suykens & J Vanthienen, 2003. "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(6), pages 627-635, June.
    4. K B Schebesch & R Stecking, 2005. "Support vector machines for classifying and describing credit applicants: detecting typical and critical regions," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(9), pages 1082-1088, September.
    5. Desai, Vijay S. & Crook, Jonathan N. & Overstreet, George A., 1996. "A comparison of neural networks and linear scoring models in the credit union environment," European Journal of Operational Research, Elsevier, vol. 95(1), pages 24-37, November.
    6. D. J. Hand & W. E. Henley, 1997. "Statistical Classification Methods in Consumer Credit Scoring: a Review," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 160(3), pages 523-541, September.
    7. Trivedi, Shrawan Kumar, 2020. "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, Elsevier, vol. 63(C).
    8. Martin Leo & Suneel Sharma & K. Maddulety, 2019. "Machine Learning in Banking Risk Management: A Literature Review," Risks, MDPI, vol. 7(1), pages 1-22, March.

    Citations



    Cited by:

    1. Flavio Bazzana & Marco Bee & Ahmed Almustfa Hussin Adam Khatir, 2024. "Machine learning techniques for default prediction: an application to small Italian companies," Risk Management, Palgrave Macmillan, vol. 26(1), pages 1-23, February.
    2. Abdussalam Aljadani & Bshair Alharthi & Mohammed A. Farsi & Hossam Magdy Balaha & Mahmoud Badawy & Mostafa A. Elhosseini, 2023. "Mathematical Modeling and Analysis of Credit Scoring Using the LIME Explainer: A Comprehensive Approach," Mathematics, MDPI, vol. 11(19), pages 1-28, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. G Verstraeten & D Van den Poel, 2005. "The impact of sample bias on consumer credit scoring performance and profitability," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(8), pages 981-992, August.
    2. Pérez-Martín, A. & Pérez-Torregrosa, A. & Vaca, M., 2018. "Big Data techniques to measure credit banking risk in home equity loans," Journal of Business Research, Elsevier, vol. 89(C), pages 448-454.
    3. Hong Wang & Qingsong Xu & Lifeng Zhou, 2015. "Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-20, February.
    4. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    5. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    6. Brad S. Trinkle & Amelia A. Baldwin, 2007. "Interpretable credit model development via artificial neural networks," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 15(3‐4), pages 123-147, July.
    7. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    8. L C Thomas, 2010. "Consumer finance: challenges for operational research," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(1), pages 41-52, January.
    9. Dinh, Thi Huyen Thanh & Kleimeier, Stefanie, 2007. "A credit scoring model for Vietnam's retail banking market," International Review of Financial Analysis, Elsevier, vol. 16(5), pages 471-495.
    10. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    11. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Nydia M. Reyes, 2013. "A Social Approach to Microfinance Credit Scoring," Working Papers CEB 13-013, ULB -- Universite Libre de Bruxelles.
    12. Robert Till & David Hand, 2003. "Behavioural models of credit card usage," Journal of Applied Statistics, Taylor & Francis Journals, vol. 30(10), pages 1201-1220.
    13. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    14. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    15. Rais Ahmad Itoo & A. Selvarasu & José António Filipe, 2015. "Loan Products and Credit Scoring by Commercial Banks (India)," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 5(1), pages 851-851.
    16. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    17. Crook, Jonathan N. & Edelman, David B. & Thomas, Lyn C., 2007. "Recent developments in consumer credit risk assessment," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1447-1465, December.
    18. Hazar Altinbas & Goktug Cenk Akkaya, 2017. "Improving the performance of statistical learning methods with a combined meta-heuristic for consumer credit risk assessment," Risk Management, Palgrave Macmillan, vol. 19(4), pages 255-280, November.
    19. Jun-Tae Han & Jae-Seok Choi & Myeon-Jung Kim & Jina Jeong, 2018. "Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt," Asian Economic Journal, East Asian Economic Association, vol. 32(1), pages 3-14, March.
    20. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
