Subagging for credit scoring models
Abstract
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, in which the primary goal is predicting credit risk rather than explaining it. In this context, several classification techniques, such as support vector machines, have been shown to perform well on credit scoring. While the search for better classifiers is an important research topic, the methodology chosen in real-world applications must also cope with the challenges posed by the real-world data collected in the industry. Such data are often highly unbalanced, part of the information can be missing, and common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers that exhibits all of these challenges. The main objective is to build and validate robust models able to handle missing information, class imbalance and non-i.i.d. data points. We define a missing-data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop that handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, Adaboost) and their subagged versions. Subagging improves the performance of the base classifiers, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
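The abstract describes subagging and the dependency-aware cross-validation only at a high level. The Python sketch below is a rough illustration, not the authors' implementation, and it rests on several assumptions. Each subsample is drawn without replacement and keeps every minority ("bad", label 1) case plus an equally sized random draw from the majority class. Decision stumps stand in for full decision trees. Base predictions are aggregated by averaging class-1 probabilities. Folds are built so that all requests from one (hypothetical) customer ID stay together. All function names are hypothetical, and the missing-data imputation step is not shown.

```python
import random


def fit_stump(X, y):
    """Fit a decision stump (one-split tree) by exhaustive search.

    Returns (feature, threshold, p_left, p_right), where p_* is the share of
    class-1 ("bad") cases on each side of the split x[feature] <= threshold.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            p_left, p_right = sum(left) / len(left), sum(right) / len(right)
            # Misclassifications if each side predicts its majority class.
            errors = (min(p_left, 1 - p_left) * len(left)
                      + min(p_right, 1 - p_right) * len(right))
            if best is None or errors < best[0]:
                best = (errors, j, t, p_left, p_right)
    if best is None:  # every feature is constant: predict the overall bad rate
        p = sum(y) / len(y)
        return (0, float("inf"), p, p)
    return best[1:]


def predict_stump(stump, row):
    j, t, p_left, p_right = stump
    return p_left if row[j] <= t else p_right


def subagging_fit(X, y, n_models=25, seed=0):
    """Subagging: fit n_models stumps, each on a subsample drawn WITHOUT
    replacement. Assumption: each subsample keeps every minority (label 1)
    case plus an equally sized random draw from the majority class."""
    rng = random.Random(seed)
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    models = []
    for _ in range(n_models):
        idx = minority + rng.sample(majority, min(len(minority), len(majority)))
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def subagging_predict_proba(models, row):
    # Aggregate by averaging the base classifiers' class-1 probabilities.
    return sum(predict_stump(m, row) for m in models) / len(models)


def group_folds(groups, k=5, seed=0):
    """Assign whole groups (e.g. a hypothetical customer ID) to k folds, so
    that dependent credit requests never straddle training and test sets."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    return [fold_of[g] for g in groups]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic toy data: ~10% bad customers, 1-3 requests per customer.
    X, y, customers = [], [], []
    for c in range(200):
        bad = 1 if rng.random() < 0.1 else 0
        for _ in range(rng.randint(1, 3)):
            X.append([rng.gauss(1.5 * bad, 1.0), rng.gauss(0.0, 1.0)])
            y.append(bad)
            customers.append(c)
    folds = group_folds(customers, k=5)
    scored = []  # (predicted probability, true label) for held-out requests
    for f in range(5):
        train = [i for i, fi in enumerate(folds) if fi != f]
        models = subagging_fit([X[i] for i in train], [y[i] for i in train])
        scored += [(subagging_predict_proba(models, X[i]), y[i])
                   for i, fi in enumerate(folds) if fi == f]
    for label in (0, 1):
        probs = [p for p, yi in scored if yi == label]
        print(f"class {label}: mean predicted P(bad) = {sum(probs) / len(probs):.3f}")
```

Balancing each subsample lets every base learner see the rare "bad" cases in equal proportion. Averaging over many different majority-class draws recovers much of the information that a single undersampled model would discard.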
Bibliographic info: Article provided by Elsevier in its journal European Journal of Operational Research.
Volume (Year): 201 (2010)
Issue (Month): 2 (March)
Contact details of provider:
Web page: http://www.elsevier.com/locate/eor
Keywords: Risk analysis; Credit scoring; Classification; Decision Support Systems
Cited by (citations extracted by the CitEc Project):
- Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
- Ju, Yong Han & Sohn, So Young, 2014. "Updating a credit-scoring model based on new attributes without realization of actual data," European Journal of Operational Research, Elsevier, vol. 234(1), pages 119-126.
- Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.
- Coussement, Kristof & Buckinx, Wouter, 2011. "A probability-mapping algorithm for calibrating the posterior probabilities: A direct marketing application," European Journal of Operational Research, Elsevier, vol. 214(3), pages 732-738, November.
- De Bock, K. W. & Van Den Poel, D., 2012. "Reconciling Performance and Interpretability in Customer Churn Prediction using Ensemble Learning based on Generalized Additive Models," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/805, Ghent University, Faculty of Economics and Business Administration.
- Lessmann, Stefan & Sung, Ming-Chien & Johnson, Johnnie E.V. & Ma, Tiejun, 2012. "A new methodology for generating and combining statistical forecasting models to enhance competitive event prediction," European Journal of Operational Research, Elsevier, vol. 218(1), pages 163-174.