Printed from https://ideas.repec.org/a/eee/ejores/v324y2025i2p616-628.html

Fighting sampling bias: A framework for training and evaluating credit scoring models

Author

Listed:
  • Kozodoi, Nikita
  • Lessmann, Stefan
  • Alamgir, Morteza
  • Moreira-Matias, Luis
  • Papakonstantinou, Konstantinos

Abstract

Scoring models support decision-making in financial institutions. Their estimation and evaluation rely on labeled data from previously accepted clients. Ignoring rejected applicants with unknown repayment behavior introduces sampling bias, as the available labeled data only partially represents the population of potential borrowers. This paper examines the impact of sampling bias and introduces new methods to mitigate its adverse effect. First, we develop a bias-aware self-labeling algorithm for scorecard training, which debiases the training data by adding selected rejects with an inferred label. Second, we propose a Bayesian framework to address sampling bias in scorecard evaluation. To provide reliable projections of future scorecard performance, we include rejected clients with random pseudo-labels in the test set and use Monte Carlo sampling to estimate the scorecard’s expected performance across label realizations. We conduct extensive experiments using both synthetic and observational data. The observational data includes an unbiased sample of applicants accepted without scoring, representing the true borrower population and facilitating a realistic assessment of reject inference techniques. The results show that our methods outperform established benchmarks in predictive accuracy and profitability. Additional sensitivity analysis clarifies the conditions under which they are most effective. Comparing the relative effectiveness of addressing sampling bias during scorecard training versus evaluation, we find the latter much more promising. For example, we estimate the expected return per dollar issued to increase by up to 2.07 and up to 5.76 percentage points when using bias-aware self-labeling and Bayesian evaluation, respectively.
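The Bayesian evaluation idea described above — rejected applicants enter the test set with random pseudo-labels, and performance is averaged over label realizations via Monte Carlo sampling — can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names, the fixed Bernoulli prior `p_bad`, and the choice of AUC as the metric are all assumptions made here for concreteness.

```python
import random

def auc(labels, scores):
    # Mann-Whitney formulation of AUC: probability that a randomly
    # chosen bad (label 1) is scored above a randomly chosen good
    # (label 0), counting ties as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mc_expected_auc(scores_acc, labels_acc, scores_rej, p_bad,
                    n_draws=200, seed=0):
    # Debiased test set: accepted clients keep their observed labels,
    # while each rejected client receives a random pseudo-label drawn
    # from Bernoulli(p_bad). The metric is averaged over n_draws
    # label realizations to estimate expected scorecard performance.
    rng = random.Random(seed)
    scores = list(scores_acc) + list(scores_rej)
    total = 0.0
    for _ in range(n_draws):
        pseudo = [1 if rng.random() < p_bad else 0 for _ in scores_rej]
        total += auc(list(labels_acc) + pseudo, scores)
    return total / n_draws
```

Here higher scores indicate higher default risk, and the reject bad rate `p_bad` is a single assumed constant; the paper's Bayesian framework is more general, placing a distribution over the unknown reject labels rather than fixing one rate.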

Suggested Citation

  • Kozodoi, Nikita & Lessmann, Stefan & Alamgir, Morteza & Moreira-Matias, Luis & Papakonstantinou, Konstantinos, 2025. "Fighting sampling bias: A framework for training and evaluating credit scoring models," European Journal of Operational Research, Elsevier, vol. 324(2), pages 616-628.
  • Handle: RePEc:eee:ejores:v:324:y:2025:i:2:p:616-628
    DOI: 10.1016/j.ejor.2025.01.040

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221725000839
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2025.01.040?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access through your library subscription

    As access to this document is restricted, you may want to search for a different version of it.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rogelio A. Mancisidor & Michael Kampffmeyer & Kjersti Aas & Robert Jenssen, 2019. "Deep Generative Models for Reject Inference in Credit Scoring," Papers 1904.11376, arXiv.org, revised Sep 2021.
    2. Ha-Thu Nguyen, 2016. "Reject inference in application scorecards: evidence from France," EconomiX Working Papers 2016-10, University of Paris Nanterre, EconomiX.
    3. Ha Thu Nguyen, 2016. "Reject inference in application scorecards: evidence from France," Working Papers hal-04141601, HAL.
    4. Tang, Haoxin & Liang, Decui, 2025. "Multi-view reject inference for semi-supervised credit scoring with consistency training and three-way decision," Omega, Elsevier, vol. 133(C).
    5. Monir El Annas & Badreddine Benyacoub & Mohamed Ouzineb, 2023. "Semi-supervised adapted HMMs for P2P credit scoring systems with reject inference," Computational Statistics, Springer, vol. 38(1), pages 149-169, March.
    6. Y Kim & S Y Sohn, 2007. "Technology scoring model considering rejected applicants and effect of reject inference," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 58(10), pages 1341-1347, October.
    7. Banasik, John & Crook, Jonathan, 2007. "Reject inference, augmentation, and sample selection," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1582-1594, December.
    8. Kozodoi, Nikita & Jacob, Johannes & Lessmann, Stefan, 2022. "Fairness in credit scoring: Assessment, implementation and profit implications," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1083-1094.
    9. De Bock, Koen W. & Coussement, Kristof & Caigny, Arno De & Słowiński, Roman & Baesens, Bart & Boute, Robert N. & Choi, Tsan-Ming & Delen, Dursun & Kraus, Mathias & Lessmann, Stefan & Maldonado, Sebast, 2024. "Explainable AI for Operational Research: A defining framework, methods, applications, and a research agenda," European Journal of Operational Research, Elsevier, vol. 317(2), pages 249-272.
    10. Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Słowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
    11. Gero Szepannek, 2022. "An Overview on the Landscape of R Packages for Open Source Scorecard Modelling," Risks, MDPI, vol. 10(3), pages 1-33, March.
    12. Mengnan Song & Jiasong Wang & Suisui Su, 2022. "Towards a Better Microcredit Decision," Papers 2209.07574, arXiv.org.
    13. Calabrese, Raffaella & Osmetti, Silvia Angela & Zanin, Luca, 2024. "Sample selection bias in non-traditional lending: A copula-based approach for imbalanced data," Socio-Economic Planning Sciences, Elsevier, vol. 95(C).
    14. Doumpos, Michalis & Zopounidis, Constantin & Gounopoulos, Dimitrios & Platanakis, Emmanouil & Zhang, Wenke, 2023. "Operational research and artificial intelligence methods in banking," European Journal of Operational Research, Elsevier, vol. 306(1), pages 1-16.
    15. Michael Bücker & Gero Szepannek & Alicja Gosiewska & Przemysław Biecek, 2020. "Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring," Papers 2009.13384, arXiv.org.
    16. J Banasik & J Crook, 2010. "Reject inference in survival analysis by augmentation," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(3), pages 473-485, March.
    17. Shi, Yong & Qu, Yi & Chen, Zhensong & Mi, Yunlong & Wang, Yunong, 2024. "Improved credit risk prediction based on an integrated graph representation learning approach with graph transformation," European Journal of Operational Research, Elsevier, vol. 315(2), pages 786-801.
    18. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    19. Crook, Jonathan N. & Edelman, David B. & Thomas, Lyn C., 2007. "Recent developments in consumer credit risk assessment," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1447-1465, December.
    20. Luisa Roa & Alejandro Correa-Bahnsen & Gabriel Suarez & Fernando Cortés-Tejada & María A. Luque & Cristián Bravo, 2020. "Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications," Papers 2005.14658, arXiv.org, revised Jan 2021.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.