
A-RDBOTE: an improved oversampling technique for imbalanced credit-scoring datasets

Author

Listed:
  • Sudhansu R. Lenka

    (C. V. Raman Global University; Trident Academy of Technology)

  • Sukant Kishoro Bisoy

    (C. V. Raman Global University)

  • Rojalina Priyadarshini

    (C. V. Raman Global University)

Abstract

Banks and other financial institutions assess the creditworthiness of their customers with credit-scoring models before granting loans. The performance of these models degrades significantly on class-imbalanced data, in which defaulters are underrepresented relative to non-defaulters, and handling this imbalance remains a major challenge. In this paper, we propose a novel adaptive representative and density-based oversampling technique (A-RDBOTE) for imbalanced credit-scoring datasets. First, the reverse k-nearest neighbor algorithm is applied to eliminate noisy samples from the training set. Next, a semi-unsupervised clustering method groups the minority instances into sub-clusters. Then, within each sub-cluster, the representativeness of an instance is determined from its degree of similarity to instances in the same sub-cluster (intra-cluster) and in other sub-clusters (inter-cluster). Subsequently, the instances with high representativeness values in each sub-cluster are selected as anchor instances. Finally, artificial minority instances are generated around each anchor instance within the same sub-cluster. The experimental results show that A-RDBOTE achieves significantly better results than eight oversampling methods in terms of F1-score, AUC, and G-mean.
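
The five steps named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no formulas, so the noise rule, the use of plain k-means in place of the semi-unsupervised clustering, the representativeness score, and the interpolation step are all assumptions chosen for illustration. Every function name and parameter (`rknn_filter`, `representativeness`, `oversample`, `top`, and so on) is hypothetical, and a binary problem with a non-empty minority class is assumed.

```python
import math
import random

def k_nearest(i, pts, k):
    """Indices of the k points closest to pts[i], excluding i itself."""
    others = [j for j in range(len(pts)) if j != i]
    return sorted(others, key=lambda j: math.dist(pts[i], pts[j]))[:k]

def rknn_filter(X, y, k):
    """Step 1 (assumed rule): keep a sample only if at least one sample of
    its own class lists it among that sample's k nearest neighbours."""
    neigh = [k_nearest(i, X, k) for i in range(len(X))]
    keep = [i for i in range(len(X))
            if any(i in neigh[j] and y[j] == y[i] for j in range(len(X)))]
    return [X[i] for i in keep], [y[i] for i in keep]

def kmeans(pts, n_clusters, iters=20):
    """Step 2 stand-in: plain k-means with farthest-point initial centres."""
    centres = [pts[0]]
    while len(centres) < n_clusters:
        centres.append(max(pts, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(pts)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centres[c]))
                  for p in pts]
        for c in range(n_clusters):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def similarity(a, b):
    """Assumed similarity: decreases from 1 towards 0 as distance grows."""
    return 1.0 / (1.0 + math.dist(a, b))

def representativeness(i, members, others):
    """Step 3 (assumed score): mean similarity to the instance's own
    sub-cluster minus mean similarity to the other sub-clusters."""
    p = members[i]
    intra = sum(similarity(p, q) for j, q in enumerate(members) if j != i)
    intra /= len(members) - 1
    inter = sum(similarity(p, q) for q in others) / len(others) if others else 0.0
    return intra - inter

def oversample(X, y, minority=1, k=3, n_clusters=2, top=0.5, seed=0):
    """Filter noise, then add synthetic minority points until both classes balance."""
    rng = random.Random(seed)
    X, y = rknn_filter(X, y, k)
    mino = [x for x, lab in zip(X, y) if lab == minority]
    need = len(y) - 2 * len(mino)  # majority count minus minority count
    if len(mino) < 2 or need <= 0:
        return X, y
    labels = kmeans(mino, n_clusters)
    clusters = [[p for p, lab in zip(mino, labels) if lab == c]
                for c in range(n_clusters)]
    clusters = [c for c in clusters if len(c) >= 2]  # interpolation needs a partner
    anchors = []  # step 4: (index, sub-cluster) pairs with the highest scores
    for ci, members in enumerate(clusters):
        others = [p for cj, m in enumerate(clusters) if cj != ci for p in m]
        ranked = sorted(range(len(members)),
                        key=lambda i: representativeness(i, members, others),
                        reverse=True)
        anchors += [(i, members) for i in ranked[:max(1, int(top * len(members)))]]
    synth = []
    while anchors and len(synth) < need:  # step 5: interpolate inside the sub-cluster
        i, members = anchors[len(synth) % len(anchors)]
        j = rng.choice([t for t in range(len(members)) if t != i])
        u = rng.random()
        synth.append(tuple(a + u * (b - a) for a, b in zip(members[i], members[j])))
    return X + synth, y + [minority] * len(synth)
```

Calling `oversample(X, y)` with `X` as a list of coordinate tuples and `y` as a list of 0/1 labels returns the filtered samples followed by the synthetic minority points. Because each synthetic point is interpolated between an anchor and another member of the same sub-cluster, new points stay inside the region that sub-cluster already occupies, which is one plausible reading of "generated around each anchor instance within the same sub-cluster".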

Suggested Citation

  • Sudhansu R. Lenka & Sukant Kishoro Bisoy & Rojalina Priyadarshini, 2023. "A-RDBOTE: an improved oversampling technique for imbalanced credit-scoring datasets," Risk Management, Palgrave Macmillan, vol. 25(4), pages 1-37, December.
  • Handle: RePEc:pal:risman:v:25:y:2023:i:4:d:10.1057_s41283-023-00128-y
    DOI: 10.1057/s41283-023-00128-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/s41283-023-00128-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    2. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    3. Przemysław Biecek & Marcin Chlebus & Janusz Gajda & Alicja Gosiewska & Anna Kozak & Dominik Ogonowski & Jakub Sztachelski & Piotr Wojewnik, 2021. "Enabling Machine Learning Algorithms for Credit Scoring -- Explainable Artificial Intelligence (XAI) methods for clear understanding complex predictive models," Papers 2104.06735, arXiv.org.
    4. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    5. Maria Rocha Sousa & João Gama & Elísio Brandão, 2013. "Introducing time-changing economics into credit scoring," FEP Working Papers 513, Universidade do Porto, Faculdade de Economia do Porto.
    6. Khandani, Amir E. & Kim, Adlar J. & Lo, Andrew W., 2010. "Consumer credit-risk models via machine-learning algorithms," Journal of Banking & Finance, Elsevier, vol. 34(11), pages 2767-2787, November.
    7. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    8. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    9. Jonathan K. Budd & Peter G. Taylor, 2015. "Calculating optimal limits for transacting credit card customers," Papers 1506.05376, arXiv.org, revised Aug 2015.
    10. Dinh, K. & Kleimeier, S., 2006. "Credit scoring for Vietnam's retail banking market : implementation and implications for transactional versus relationship lending," Research Memorandum 012, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    11. Ulf Römer & Oliver Musshoff, 2017. "Can agricultural credit scoring for microfinance institutions be implemented and improved by weather data?," Agricultural Finance Review, Emerald Group Publishing Limited, vol. 78(1), pages 83-97, December.
    12. Derhami, Shahab & Smith, Alice E., 2017. "An integer programming approach for fuzzy rule-based classification systems," European Journal of Operational Research, Elsevier, vol. 256(3), pages 924-934.
    13. Li, Yibei & Wang, Ximei & Djehiche, Boualem & Hu, Xiaoming, 2020. "Credit scoring by incorporating dynamic networked information," European Journal of Operational Research, Elsevier, vol. 286(3), pages 1103-1112.
    14. Loterman, Gert & Brown, Iain & Martens, David & Mues, Christophe & Baesens, Bart, 2012. "Benchmarking regression algorithms for loss given default modeling," International Journal of Forecasting, Elsevier, vol. 28(1), pages 161-170.
    15. Kraft, Holger & Kroisandt, Gerald & Müller, Marlene, 2002. "Assessing the discriminatory power of credit scores," SFB 373 Discussion Papers 2002,67, Humboldt University of Berlin, Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes.
    16. Chen Ying & Härdle Wolfgang K. & He Qiang & Majer Piotr, 2018. "Risk related brain regions detection and individual risk classification with 3D image FPCA," Statistics & Risk Modeling, De Gruyter, vol. 35(3-4), pages 89-110, July.
    17. Sun, Weixin & Zhang, Xuantao & Li, Minghao & Wang, Yong, 2023. "Interpretable high-stakes decision support system for credit default forecasting," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    18. Thomas Wainwright, 2011. "Elite Knowledges: Framing Risk and the Geographies of Credit," Environment and Planning A, vol. 43(3), pages 650-665, March.
    19. Roy Cerqueti & Francesca Pampurini & Annagiulia Pezzola & Anna Grazia Quaranta, 2022. "Dangerous liasons and hot customers for banks," Review of Quantitative Finance and Accounting, Springer, vol. 59(1), pages 65-89, July.
    20. R T Stewart, 2011. "A profit-based scoring system in consumer credit: making acquisition decisions for credit cards," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(9), pages 1719-1725, September.
