
Improvements in PD models. A case-study approach

Authors

  • Caplescu Raluca Dana (Bucharest University of Economic Studies, Bucharest, Romania)

  • Cojocea Manuela-Simona (University of Bucharest, Bucharest, Romania)

  • Pele Daniel Traian (Bucharest University of Economic Studies, Bucharest, Romania)

  • Strat Vasile Alecsandru (Bucharest University of Economic Studies, Bucharest, Romania)

Abstract

Models for estimating the probability of default (PD) are widely used throughout the lending process, starting as early as the application stage, where they play an important role in the loan approval decision. Adequate data quality is essential for model soundness and performance. Identifying outliers, analyzing their impact and choosing the right method to treat them is a necessary preprocessing stage, yet it is often overlooked in practice for a variety of reasons, an important one being insufficient data. Given the inherent imbalance of loan portfolios with regard to default status, eliminating outliers is seldom feasible. The current widely accepted approach is based on binning and weight of evidence. Usually two types of binning are tested, namely bucket and quantile. While the latter is robust to the presence of outliers, the former is not. Both approaches discretize the continuous variable they are applied to. This causes information loss, both in the variation carried by individual values and in the distances between observations on a given variable. In the present paper, we explore other methods for dealing with outliers and describe their advantages and disadvantages in the context of probability of default estimation for credit risk. We conclude that, aside from quantile binning, winsorizing is also effective, as is leaving outliers untreated when the dataset is very large. More importantly, several methods should be considered and tested for each variable in order to find the optimal balance between altering the data and reducing variance.
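
The contrast the abstract draws between bucket binning, quantile binning and winsorizing can be illustrated with a minimal sketch. This is not the authors' code or data. The income variable, the default-generating rule, the number of bins, the 1%/99% winsorizing limits, the count smoothing and the sign convention for weight of evidence (log of the non-default share over the default share) are all assumptions made for illustration, and "bucket" binning is read here as equal-width binning.

```python
import numpy as np

def equal_width_edges(x, n_bins):
    """Bucket binning (assumed equal-width): edges evenly spaced from min to max."""
    return np.linspace(x.min(), x.max(), n_bins + 1)

def quantile_edges(x, n_bins):
    """Quantile binning: edges placed at evenly spaced quantiles of x."""
    return np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

def assign_bins(x, edges):
    """Map each value to a bin index in 0..n_bins-1 using the interior edges."""
    return np.digitize(x, edges[1:-1], right=True)

def weight_of_evidence(bins, default, n_bins, smoothing=0.5):
    """WoE per bin as ln(share of non-defaults / share of defaults).

    The smoothing constant added to the counts is an assumption that keeps
    the logarithm finite when a bin holds no defaults or no non-defaults.
    """
    bad = np.bincount(bins, weights=default, minlength=n_bins)
    total = np.bincount(bins, minlength=n_bins)
    good = total - bad
    good_share = (good + smoothing) / (good.sum() + smoothing * n_bins)
    bad_share = (bad + smoothing) / (bad.sum() + smoothing * n_bins)
    return np.log(good_share / bad_share)

def winsorize(x, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] quantile range instead of dropping them."""
    lo, hi = np.quantile(x, [lower, upper])
    return np.clip(x, lo, hi)

# Synthetic, hypothetical portfolio: 1000 incomes plus one extreme outlier.
rng = np.random.default_rng(0)
income = np.append(rng.lognormal(mean=10.0, sigma=0.5, size=1000), 1e7)

# Hypothetical default rule: lower income means higher default probability.
p_default = 1.0 / (1.0 + np.exp(2.0 + (np.log(income) - 10.0)))
default = (rng.random(income.size) < p_default).astype(float)

n_bins = 5
bucket_bins = assign_bins(income, equal_width_edges(income, n_bins))
quantile_bins = assign_bins(income, quantile_edges(income, n_bins))

bucket_counts = np.bincount(bucket_bins, minlength=n_bins)
quantile_counts = np.bincount(quantile_bins, minlength=n_bins)

woe_quantile = weight_of_evidence(quantile_bins, default, n_bins)
income_winsorized = winsorize(income)

print("bucket bin counts:  ", bucket_counts)
print("quantile bin counts:", quantile_counts)
print("quantile-bin WoE:   ", np.round(woe_quantile, 3))
print("max before/after winsorizing:", income.max(), income_winsorized.max())
```

In this synthetic setup the single outlier stretches the equal-width edges so far that every regular observation falls into the first bucket, while the quantile bins stay nearly equal in size. Winsorizing keeps the variable continuous and only pulls the extreme tails in to the chosen quantiles.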

Suggested Citation

  • Caplescu Raluca Dana & Cojocea Manuela-Simona & Pele Daniel Traian & Strat Vasile Alecsandru, 2021. "Improvements in PD models. A case-study approach," Proceedings of the International Conference on Business Excellence, Sciendo, vol. 15(1), pages 13-32, December.
  • Handle: RePEc:vrs:poicbe:v:15:y:2021:i:1:p:13-32:n:40
    DOI: 10.2478/picbe-2021-0004

