Printed from https://ideas.repec.org/a/eee/beexfi/v25y2020ics2214635019302072.html

Combining multiple probability predictions in the presence of class imbalance to discriminate between potential bad and good borrowers in the peer-to-peer lending market

Author

Listed:
  • Zanin, Luca

Abstract

Credit risk scoring predictions represent an effective guide for lenders to discriminate between potential good (who will repay the loan) and bad (who will default) borrowers in the online social lending market. A common characteristic of such a market is a lower percentage of defaulted borrowers than non-defaulted borrowers; thus, the sample is class imbalanced. Class imbalance may affect the accuracy of default predictions, as classifiers tend to be biased towards the majority class (good borrowers). We analyse the default prediction performance when combining class rebalancing methods with different regression and machine learning techniques. We also propose to combine multiple probability predictions to improve the predictive performance. The analysis is based on a book of loans (with a three-year term) funded in the 2010–2015 period through the online platform of Lending Club. The results show that some measures of predictive accuracy tend to improve when the scoring models are trained using a rebalanced, rather than an imbalanced sample, except when the extreme gradient boosting approach is applied. Finally, we find that combining multiple probability predictions via regularised logistic regression may help to improve the predictive accuracy.
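
As a rough illustration of the last point in the abstract, the sketch below combines the default probabilities from several base models with a penalised logistic regression. It is not the author's implementation: the abstract only says "regularised logistic regression", so the ridge (L2) penalty, the logit transform of the inputs, the gradient-descent fit, the synthetic data and all names (`fit_combiner`, `combine`, `lam`, `lr`, `n_iter`) are assumptions made for illustration.

```python
import math
import random


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_combiner(probs, y, lam=0.1, lr=0.1, n_iter=500):
    """Fit a ridge-penalised logistic regression on logit-transformed
    base-model probabilities (hypothetical helper, plain gradient descent).

    probs: list of rows, one probability per base model
    y:     list of 0/1 labels (1 = default)
    lam:   L2 penalty on the weights (the intercept is not penalised)
    """
    X = [[logit(p) for p in row] for row in probs]
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(k):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def combine(row, w, b):
    """Combined default probability for one borrower."""
    return sigmoid(b + sum(wj * logit(p) for wj, p in zip(w, row)))


if __name__ == "__main__":
    # Synthetic, class-imbalanced data: two noisy base models that both
    # observe the same latent default score (an assumption, not Lending Club data).
    random.seed(0)
    probs, y = [], []
    for _ in range(300):
        z = random.gauss(-2.0, 1.0)
        y.append(1 if random.random() < sigmoid(z) else 0)
        probs.append([sigmoid(z + random.gauss(0.0, 0.7)) for _ in range(2)])
    w, b = fit_combiner(probs, y)
    print("weights:", w, "intercept:", b)
    print("combined probability for [0.30, 0.45]:", combine([0.30, 0.45], w, b))
```

In practice the penalty strength would be chosen by cross-validation, and the combiner would be fitted on predictions from data not used to train the base models, so that the combined score is not tuned to in-sample fit.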

Suggested Citation

  • Zanin, Luca, 2020. "Combining multiple probability predictions in the presence of class imbalance to discriminate between potential bad and good borrowers in the peer-to-peer lending market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 25(C).
  • Handle: RePEc:eee:beexfi:v:25:y:2020:i:c:s2214635019302072
    DOI: 10.1016/j.jbef.2020.100272

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S2214635019302072
    Download Restriction: no


    References listed on IDEAS

    1. Simon N. Wood & Yannig Goude & Simon Shaw, 2015. "Generalized additive models for large data sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 64(1), pages 139-155, January.
    2. Stijn Claessens & Jon Frost & Grant Turner & Feng Zhu, 2018. "Fintech credit markets around the world: size, drivers and policy issues," BIS Quarterly Review, Bank for International Settlements, September.
    3. Satopää, Ville A. & Baron, Jonathan & Foster, Dean P. & Mellers, Barbara A. & Tetlock, Philip E. & Ungar, Lyle H., 2014. "Combining multiple probability predictions using a simple logit model," International Journal of Forecasting, Elsevier, vol. 30(2), pages 344-356.
    4. Ahelegbey, Daniel Felix & Giudici, Paolo & Hadji-Misheva, Branka, 2019. "Factorial Network Models To Improve P2P Credit Risk Management," MPRA Paper 92633, University Library of Munich, Germany.
    5. Yan Zhang & Peter Trubey, 2019. "Machine Learning and Sampling Scheme: An Empirical Study of Money Laundering Detection," Computational Economics, Springer;Society for Computational Economics, vol. 54(3), pages 1043-1063, October.
    6. Gonzalez, Laura & Loureiro, Yuliya Komarova, 2014. "When can a photo increase credit? The impact of lender and borrower profiles on online peer-to-peer loans," Journal of Behavioral and Experimental Finance, Elsevier, vol. 2(C), pages 44-58.
    7. Zanin, Luca, 2015. "Determinants of risk attitudes using sample surveys: The implications of a high rate of nonresponse," Journal of Behavioral and Experimental Finance, Elsevier, vol. 8(C), pages 44-53.
    8. Antoine Bouveret, 2018. "Cyber Risk for the Financial Sector: A Framework for Quantitative Assessment," IMF Working Papers 2018/143, International Monetary Fund.
    9. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Luz López-Palacios, 2015. "Determinants of Default in P2P Lending," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    10. Raffaella Calabrese & Silvia Angela Osmetti & Luca Zanin, 2019. "A joint scoring model for peer‐to‐peer and traditional lending: a bivariate model with copula dependence," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1163-1188, October.
    11. S Figini & P Giudici, 2011. "Statistical merging of rating models," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(6), pages 1067-1074, June.
    12. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    13. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    14. George A. Akerlof, 1970. "The Market for "Lemons": Quality Uncertainty and the Market Mechanism," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 84(3), pages 488-500.
    15. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    16. Prystav, Fabian, 2016. "Personal information in peer-to-peer loan applications: Is less more?," Journal of Behavioral and Experimental Finance, Elsevier, vol. 9(C), pages 6-19.
    17. Emmanuel O. Ogundimu, 2019. "Prediction of default probability by using statistical models for rare events," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1143-1162, October.
    18. Jiang, Cuixia & Xu, Qifa & Zhang, Weiming & Li, Mengting & Yang, Shanlin, 2018. "Does automatic bidding mechanism affect herding behavior? Evidence from online P2P lending in China," Journal of Behavioral and Experimental Finance, Elsevier, vol. 20(C), pages 39-44.
    19. S. le Cessie & J. C. van Houwelingen, 1992. "Ridge Estimators in Logistic Regression," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 41(1), pages 191-201, March.

    Citations



    Cited by:

    1. Weidong Guo & Zach Zhizhong Zhou, 2022. "A comparative study of combining tree‐based feature selection methods and classifiers in personal loan default prediction," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1248-1313, September.
    2. Hyunwoo Woo & So Young Sohn, 2022. "A credit scoring model based on the Myers–Briggs type indicator in online peer-to-peer lending," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-19, December.
    3. Revathi Bhuvaneswari & Antonio Segalini, 2020. "Determining Secondary Attributes for Credit Evaluation in P2P Lending," Papers 2006.13921, arXiv.org.
    4. Baidoo, Edwin & Natarajan, Ramachandran, 2021. "Profit-based credit models with lender’s attitude towards risk and loss," Journal of Behavioral and Experimental Finance, Elsevier, vol. 32(C).
    5. Ligang Zhou & Chao Ma, 2023. "A Comparison of Different Rules on Loans Evaluation in Peer-to-Peer Lending by Gradient Boosting Models Under Moving Windows with Two Timestamps," Computational Economics, Springer;Society for Computational Economics, vol. 62(4), pages 1481-1504, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    2. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    3. Li Shaoyu & Lu Qing & Fu Wenjiang & Romero Roberto & Cui Yuehua, 2009. "A Regularized Regression Approach for Dissecting Genetic Conflicts that Increase Disease Risk in Pregnancy," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-28, October.
    4. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    5. Koen W. de Bock, 2017. "The best of two worlds: Balancing model strength and comprehensibility in business failure prediction using spline-rule ensembles," Post-Print hal-01588059, HAL.
    6. Chen, Jian & Katchova, Ani L. & Zhou, Chenxi, 2021. "Agricultural loan delinquency prediction using machine learning methods," International Food and Agribusiness Management Review, International Food and Agribusiness Management Association, vol. 24(5), May.
    7. Nicola Branzoli & Ilaria Supino, 2020. "FinTech credit: a critical review of empirical research," Questioni di Economia e Finanza (Occasional Papers) 549, Bank of Italy, Economic Research and International Relations Area.
    8. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    9. Tutz, Gerhard & Binder, Harald, 2007. "Boosting ridge regression," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6044-6059, August.
    10. Wang, Yao & Drabek, Zdenek & Wang, Zhengwei, 2022. "The role of social and psychological related soft information in credit analysis: Evidence from a Fintech Company," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 96(C).
    11. Luca Insolia & Ana Kenney & Martina Calovi & Francesca Chiaromonte, 2021. "Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression," Stats, MDPI, vol. 4(3), pages 1-17, August.
    12. Petra P. Šimović & Claire Y. T. Chen & Edward W. Sun, 2023. "Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression," Computational Economics, Springer;Society for Computational Economics, vol. 61(1), pages 451-485, January.
    13. Petra Posedel Šimović & Davor Horvatic & Edward W. Sun, 2021. "Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression," Papers 2105.07671, arXiv.org, revised Jul 2021.
    14. Kamiar Rahnama Rad & Arian Maleki, 2020. "A scalable estimate of the out‐of‐sample prediction error via approximate leave‐one‐out cross‐validation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 965-996, September.
    15. Fitzpatrick, Trevor & Mues, Christophe, 2021. "How can lenders prosper? Comparing machine learning approaches to identify profitable peer-to-peer loan investments," European Journal of Operational Research, Elsevier, vol. 294(2), pages 711-722.
    16. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Luz López-Palacios, 2015. "Determinants of Default in P2P Lending," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    17. Satopää, Ville A. & Salikhov, Marat & Tetlock, Philip E. & Mellers, Barbara, 2023. "Decomposing the effects of crowd-wisdom aggregators: The bias–information–noise (BIN) model," International Journal of Forecasting, Elsevier, vol. 39(1), pages 470-485.
    18. Andreas Hoegen & Dennis M. Steininger & Daniel Veit, 2018. "How do investors decide? An interdisciplinary review of decision-making in crowdfunding," Electronic Markets, Springer;IIM University of St. Gallen, vol. 28(3), pages 339-365, August.
    19. Christian S Göbl & Latife Bozkurt & Andrea Tura & Giovanni Pacini & Alexandra Kautzky-Willer & Martina Mittlböck, 2015. "Application of Penalized Regression Techniques in Modelling Insulin Sensitivity by Correlated Metabolic Parameters," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-19, November.
    20. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.

    More about this item

    Keywords

    Class imbalance; Machine learning; Combining multiple probability predictions; Credit risk scoring prediction; Peer-to-peer lending;

    JEL classification:

    • D82 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Asymmetric and Private Information; Mechanism Design
    • G4 - Financial Economics - - Behavioral Finance
