
Enhancing credit risk prediction based on ensemble tree‐based feature transformation and logistic regression

Authors

  • Jiaming Liu
  • Jiajia Liu
  • Chong Wu
  • Shouyang Wang

Abstract

The assessment of credit risk for P2P lending platform applicants is critical to investors. Feature engineering is an essential technique for distilling classification knowledge during the data preprocessing stage of credit risk prediction. Although previous literature has used feature selection methods to identify key features, feature transformation is more useful for discovering intrinsic nonlinear characteristics in credit data. In this study, we propose a synthetic multiple tree‐based feature transformation method to generate features: multiple tree‐based feature transformation methods are employed and fused to obtain a new feature set. Specifically, we propose two types of feature transformation methods and validate their effect: the bagging‐based tree ensemble feature transformation method (Bagging‐TreeEnsembleFT) and the boosting‐based tree ensemble feature transformation method (Boosting‐TreeEnsembleFT). We verify the credit risk prediction performance of the proposed synthetic feature transformation methods on real P2P lending credit datasets. Empirical analysis demonstrates that, across datasets with different partitions and class distributions, tree‐based ensemble feature transformation methods with a boosting ensemble strategy achieve better prediction performance than those with a bagging ensemble strategy and than individual tree‐based feature transformation methods. Moreover, the proposed synthetic feature transformation method improves credit risk prediction performance in terms of accuracy, AUC, and F1‐score.
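
As a rough illustration of the general idea in the abstract (tree ensembles used as feature transformers, with the fused features fed to logistic regression), the following is a minimal sketch using scikit-learn. It is not the authors' implementation: the abstract does not specify how the transformation is built, so the leaf-index one-hot encoding, the choice of `RandomForestClassifier` and `GradientBoostingClassifier` as the bagging and boosting ensembles, the hyperparameters, the helper names `leaf_indices` and `transform`, and the synthetic dataset are all assumptions made for this example.

```python
from scipy.sparse import hstack
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic, imbalanced stand-in for a credit dataset (assumption, not the paper's data).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Two tree ensembles: one bagging-style, one boosting-style (assumed stand-ins).
bagging = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)
boosting = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
bagging.fit(X_train, y_train)
boosting.fit(X_train, y_train)


def leaf_indices(model, X):
    """Return an (n_samples, n_trees) array of the leaf each sample lands in."""
    leaves = model.apply(X)
    # GradientBoostingClassifier.apply returns (n_samples, n_trees, 1) for binary targets.
    return leaves.reshape(leaves.shape[0], -1)


# One-hot encode leaf membership; leaves unseen during fitting are ignored.
enc_bag = OneHotEncoder(handle_unknown="ignore").fit(leaf_indices(bagging, X_train))
enc_boost = OneHotEncoder(handle_unknown="ignore").fit(leaf_indices(boosting, X_train))


def transform(X):
    """Fuse the bagging-derived and boosting-derived leaf indicators column-wise."""
    return hstack([enc_bag.transform(leaf_indices(bagging, X)),
                   enc_boost.transform(leaf_indices(boosting, X))]).tocsr()


Z_train, Z_test = transform(X_train), transform(X_test)

# Logistic regression on the transformed (sparse, binary) feature set.
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(Z_test)[:, 1])
print(f"AUC on held-out data: {auc:.3f}")
```

Each sample is represented by one active indicator per tree, so the logistic regression effectively learns a weight for every leaf. For brevity the sketch fits the trees and the logistic regression on the same training split; fitting them on separate splits would likely reduce overfitting.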

Suggested Citation

  • Jiaming Liu & Jiajia Liu & Chong Wu & Shouyang Wang, 2024. "Enhancing credit risk prediction based on ensemble tree‐based feature transformation and logistic regression," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 429-455, March.
  • Handle: RePEc:wly:jforec:v:43:y:2024:i:2:p:429-455
    DOI: 10.1002/for.3040


