IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v201y2010i2p490-499.html
   My bibliography  Save this article

Subagging for credit scoring models

Author

Listed:
  • Paleologo, Giuseppe
  • Elisseeff, André
  • Antonini, Gianluca

Abstract

The logistic regression framework has been for long time the most used statistical method when assessing customer credit risk. Recently, a more pragmatic approach has been adopted, where the first issue is credit risk prediction, instead of explanation. In this context, several classification techniques have been shown to perform well on credit scoring, such as support vector machines among others. While the investigation of better classifiers is an important research topic, the specific methodology chosen in real world applications has to deal with the challenges arising from the real world data collected in the industry. Such data are often highly unbalanced, part of the information can be missing and some common hypotheses, such as the i.i.d. one, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models, able to handle missing information, class unbalancedness and non-iid data points. We define a missing data imputation method and propose the use of an ensemble classification technique, subagging, particularly suitable for highly unbalanced data, such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop, which handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, Adaboost) and their subagged versions. The use of subagging improves the performance of the base classifier and we will show that subagging decision trees achieve better performance, still keeping the model simple and reasonably interpretable.

Suggested Citation

  • Paleologo, Giuseppe & Elisseeff, André & Antonini, Gianluca, 2010. "Subagging for credit scoring models," European Journal of Operational Research, Elsevier, vol. 201(2), pages 490-499, March.
  • Handle: RePEc:eee:ejores:v:201:y:2010:i:2:p:490-499
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377-2217(09)00153-2
    Download Restriction: Full text for ScienceDirect subscribers only

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Wiginton, John C., 1980. "A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 15(03), pages 757-770, September.
    2. Martens, David & Baesens, Bart & Van Gestel, Tony & Vanthienen, Jan, 2007. "Comprehensible credit scoring models using rule extraction from support vector machines," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1466-1476, December.
    3. Frydman, Halina & Altman, Edward I & Kao, Duen-Li, 1985. " Introducing Recursive Partitioning for Financial Classification: The Case of Financial Distress," Journal of Finance, American Finance Association, vol. 40(1), pages 269-291, March.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. K. W. De Bock & D. Van Den Poel, 2012. "Reconciling Performance and Interpretability in Customer Churn Prediction using Ensemble Learning based on Generalized Additive Models," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/805, Ghent University, Faculty of Economics and Business Administration.
    2. Ju, Yong Han & Sohn, So Young, 2014. "Updating a credit-scoring model based on new attributes without realization of actual data," European Journal of Operational Research, Elsevier, vol. 234(1), pages 119-126.
    3. Coussement, Kristof & Buckinx, Wouter, 2011. "A probability-mapping algorithm for calibrating the posterior probabilities: A direct marketing application," European Journal of Operational Research, Elsevier, vol. 214(3), pages 732-738, November.
    4. Fan, Zhi-Ping & Sun, Minghe, 2015. "Behavior-aware user response modeling in social media: Learning from diverse heterogeneous dataAuthor-Name: Chen, Zhen-Yu," European Journal of Operational Research, Elsevier, vol. 241(2), pages 422-434.
    5. Lessmann, Stefan & Sung, Ming-Chien & Johnson, Johnnie E.V. & Ma, Tiejun, 2012. "A new methodology for generating and combining statistical forecasting models to enhance competitive event prediction," European Journal of Operational Research, Elsevier, vol. 218(1), pages 163-174.
    6. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    7. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    8. repec:gam:jsusta:v:9:y:2017:i:10:p:1834-:d:114674 is not listed on IDEAS
    9. Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:201:y:2010:i:2:p:490-499. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: (Dana Niculescu). General contact details of provider: http://www.elsevier.com/locate/eor .

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.