
A study on credit scoring modeling with different feature selection and machine learning approaches

Author

Listed:
  • Trivedi, Shrawan Kumar

Abstract

A big hurdle for financial institutions is deciding which candidates should be given a line of credit, that is, identifying the right people without incurring credit risk. For such a crucial decision, past demographic and financial data on debtors are important for building an automated artificial-intelligence credit score prediction model based on a machine learning classifier. In addition, to build robust and accurate machine learning models, the important input predictors (the debtor's information) must be selected. The present computational work focuses on building a credit scoring prediction model. The publicly available German credit data set is used in this study. An improvement in credit scoring prediction is shown with the use of different feature selection techniques (such as Information Gain, Gain Ratio and Chi-Square) and machine learning classifiers (Bayesian, Naïve Bayes, Random Forest, Decision Tree (C5.0) and Support Vector Machine (SVM)). Further, a comparative analysis is performed between the different machine learning classifiers and between the different feature selection techniques. Several evaluation metrics are considered for analyzing the performance of the models (such as accuracy, F-measure, false positive rate, false negative rate and training time). After the analysis, the best combination of machine learning classifier and feature selection technique is identified. In this study, the combination of Random Forest (RF) and Chi-Square (CS) is found to be good, relative to the other combinations, with respect to accuracy, F-measure, and low false positive and false negative rates. However, the training time for this particular combination was found to be slightly higher. The result of C5.0 with Chi-Square was comparable with the best one. This study provides an opportunity for financial institutions to build an automated model for better credit scoring.
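To make the quantities named in the abstract concrete, the following is a minimal Python sketch of the three feature selection scores (Information Gain, Gain Ratio, Chi-Square) for a categorical predictor, together with the reported evaluation metrics except training time. It is not the author's implementation: the function names, the toy data, and the choice of "bad" as the positive class are assumptions made only for illustration, and the real German credit data is not used here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def evaluate(y_true, y_pred, positive="bad"):
    """Accuracy, F-measure, false positive rate and false negative rate.

    Assumption: `positive` is the class treated as the positive outcome.
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))

    def ratio(a, b):
        return a / b if b else 0.0

    precision, recall = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Hypothetical toy data: eight applicants and two categorical predictors.
labels = ["good", "good", "good", "good", "bad", "bad", "bad", "bad"]
informative = ["x", "x", "x", "x", "y", "y", "y", "y"]    # separates the classes
uninformative = ["x", "y", "x", "y", "x", "y", "x", "y"]  # independent of the class

for name, column in [("informative", informative), ("uninformative", uninformative)]:
    print(name, information_gain(column, labels), gain_ratio(column, labels),
          chi_square(column, labels))

# Hypothetical predictions from some classifier, scored against the true labels.
predictions = ["good", "good", "good", "bad", "bad", "bad", "bad", "good"]
print(evaluate(labels, predictions))
```

In a study like the one described, each score would be computed for every predictor, the highest-ranked predictors kept, and each classifier then trained and evaluated on the reduced set. Training time is left out of the sketch because it depends on the classifier and the hardware.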

Suggested Citation

  • Trivedi, Shrawan Kumar, 2020. "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, Elsevier, vol. 63(C).
  • Handle: RePEc:eee:teinso:v:63:y:2020:i:c:s0160791x17302324
    DOI: 10.1016/j.techsoc.2020.101413




    Cited by:

    1. Babaei, Golnoosh & Giudici, Paolo & Raffinetti, Emanuela, 2023. "Explainable FinTech lending," Journal of Economics and Business, Elsevier, vol. 125.
    2. Wei Li & Florentina Paraschiv & Georgios Sermpinis, 2021. "A Data-driven Explainable Case-based Reasoning Approach for Financial Risk Detection," Papers 2107.08808, arXiv.org.
    3. Wei Li & Florentina Paraschiv & Georgios Sermpinis, 2022. "A data-driven explainable case-based reasoning approach for financial risk detection," Quantitative Finance, Taylor & Francis Journals, vol. 22(12), pages 2257-2274, December.
    4. Sun, Yue & Chai, Nana & Dong, Yizhe & Shi, Baofeng, 2022. "Assessing and predicting small industrial enterprises’ credit ratings: A fuzzy decision-making approach," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1158-1172.
    5. Weidong Guo & Zach Zhizhong Zhou, 2022. "A comparative study of combining tree‐based feature selection methods and classifiers in personal loan default prediction," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1248-1313, September.
    6. Ahmed Almustfa Hussin Adam Khatir & Marco Bee, 2022. "Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?," Risks, MDPI, vol. 10(9), pages 1-22, August.
    7. Osama Wagdi & Yasmeen Tarek, 2022. "The Integration of Big Data and Artificial Neural Networks for Enhancing Credit Risk Scoring in Emerging Markets: Evidence from Egypt," International Journal of Economics and Finance, Canadian Center of Science and Education, vol. 14(2), pages 1-32, February.
    8. Daniel Ramos & Mahsa Khorram & Pedro Faria & Zita Vale, 2021. "Load Forecasting in an Office Building with Different Data Structure and Learning Parameters," Forecasting, MDPI, vol. 3(1), pages 1-14, March.
    9. Polyzos, Efstathios & Fotiadis, Anestis & Huan, Tzung-Cheng, 2023. "From Heroes to Scoundrels: Exploring the effects of online campaigns celebrating frontline workers on COVID-19 outcomes," Technology in Society, Elsevier, vol. 72(C).
    10. Dong-Her Shih & Ting-Wei Wu & Po-Yuan Shih & Nai-An Lu & Ming-Hung Shih, 2022. "A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform," Mathematics, MDPI, vol. 10(13), pages 1-13, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    2. Ostheimer, Julia & Chowdhury, Soumitra & Iqbal, Sarfraz, 2021. "An alliance of humans and machines for machine learning: Hybrid intelligent systems and their design principles," Technology in Society, Elsevier, vol. 66(C).
    3. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    4. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    5. Richard Chamboko & Jorge Miguel Bravo, 2020. "A Multi-State Approach to Modelling Intermediate Events and Multiple Mortgage Loan Outcomes," Risks, MDPI, vol. 8(2), pages 1-29, June.
    6. Kaposty, Florian & Kriebel, Johannes & Löderbusch, Matthias, 2020. "Predicting loss given default in leasing: A closer look at models and variable selection," International Journal of Forecasting, Elsevier, vol. 36(2), pages 248-266.
    7. Finlay, Steven, 2010. "Credit scoring for profitability objectives," European Journal of Operational Research, Elsevier, vol. 202(2), pages 528-537, April.
    8. Fang, Fang & Chen, Yuanyuan, 2019. "A new approach for credit scoring by directly maximizing the Kolmogorov–Smirnov statistic," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 180-194.
    9. L C Thomas, 2010. "Consumer finance: challenges for operational research," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(1), pages 41-52, January.
    10. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    11. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    12. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    13. Jonathan K. Budd & Peter G. Taylor, 2015. "Calculating optimal limits for transacting credit card customers," Papers 1506.05376, arXiv.org, revised Aug 2015.
    14. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    15. K Rajaratnam & P Beling & G Overstreet, 2010. "Scoring decisions in the context of economic uncertainty," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(3), pages 421-429, March.
    16. Richard Chamboko & Jorge M. Bravo, 2016. "On the modelling of prognosis from delinquency to normal performance on retail consumer loans," Risk Management, Palgrave Macmillan, vol. 18(4), pages 264-287, December.
    17. Adnan Dželihodžić & Dženana Đonko & Jasmin Kevrić, 2018. "Improved Credit Scoring Model Based on Bagging Neural Network," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 17(06), pages 1725-1741, November.
    18. Gao, Zheming & Fang, Shu-Cherng & Luo, Jian & Medhin, Negash, 2021. "A kernel-free double well potential support vector machine with applications," European Journal of Operational Research, Elsevier, vol. 290(1), pages 248-262.
    19. Dinh, Thi Huyen Thanh & Kleimeier, Stefanie, 2007. "A credit scoring model for Vietnam's retail banking market," International Review of Financial Analysis, Elsevier, vol. 16(5), pages 471-495.
    20. Teply, Petr & Polena, Michal, 2020. "Best classification algorithms in peer-to-peer lending," The North American Journal of Economics and Finance, Elsevier, vol. 51(C).
