Credit scoring model for fintech lending: An integration of large language models and FocalPoly loss

Author

Listed:

Xia, Yufei
Han, Zhiyin
Li, Yawen
He, Lingyun

Abstract

Fintech lending experiences high credit risk and needs an efficient credit scoring model, but it also faces limited data sources and severe class imbalance. We develop a novel two-stage credit scoring model (called LLM-FP-CatBoost) by solving the two issues simultaneously. Large language models (LLMs) initially extract narrative data as a supplementary credit dataset. A new FocalPoly loss is then incorporated with CatBoost to handle the class imbalance problem. Extensive comparisons demonstrate that the proposed LLM-FP-CatBoost significantly outperforms the benchmarks in most circumstances. When making pairwise comparisons between LLMs on the fintech lending dataset, we found that the Chinese-specific LLM, i.e., ERNIE 4.0, achieves the best overall performance, followed by GPT-4 and BERT-based models. The performance decomposition reveals that the superiority is mainly attributed to the new data source extracted by the LLMs. The SHAP algorithm further ensures the interpretability of LLM-FP-CatBoost. The superiority of the proposed LLM-FP-CatBoost model remains robust to hyperparameters of the loss function, specific LLMs, and other extraction methods of narrative data. Finally, we discuss some managerial implications concerning credit scoring in fintech lending.
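
The abstract names a FocalPoly loss but does not define it. As a rough illustration only, the sketch below assumes it resembles the focal loss with an added first-order polynomial (Poly-1) term, i.e. FL + epsilon * (1 - p_t)^(gamma + 1). The function name `focal_poly_loss`, the parameters `gamma` and `epsilon`, and their default values are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def focal_poly_loss(y_true, p_pred, gamma=2.0, epsilon=1.0, clip=1e-12):
    """Per-sample focal loss plus a Poly-1 term (assumed form, not the paper's).

    y_true : array of 0/1 labels (1 = default, the minority class)
    p_pred : array of predicted probabilities for class 1
    gamma  : focusing parameter; larger values down-weight easy samples
    epsilon: weight of the extra polynomial term (hypothetical default)
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so the logarithm stays finite.
    p = np.clip(np.asarray(p_pred, dtype=float), clip, 1.0 - clip)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(y == 1.0, p, 1.0 - p)
    # Focal term: cross-entropy scaled by (1 - p_t)^gamma.
    focal = -((1.0 - p_t) ** gamma) * np.log(p_t)
    # Poly-1 term: one extra polynomial coefficient on (1 - p_t)^(gamma + 1).
    poly1 = epsilon * (1.0 - p_t) ** (gamma + 1.0)
    return focal + poly1


if __name__ == "__main__":
    # Well-classified samples contribute little; misclassified ones dominate.
    labels = [1, 0, 1]
    probs = [0.9, 0.2, 0.3]
    print(focal_poly_loss(labels, probs))
```

Setting `gamma=0` and `epsilon=0` reduces this sketch to ordinary binary cross-entropy, which is a convenient sanity check. Plugging such a loss into a gradient boosting library such as CatBoost would additionally require its first and second derivatives with respect to the raw score, which are omitted here.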

Suggested Citation

Xia, Yufei & Han, Zhiyin & Li, Yawen & He, Lingyun, 2025. "Credit scoring model for fintech lending: An integration of large language models and FocalPoly loss," International Journal of Forecasting, Elsevier, vol. 41(3), pages 894-919.

Handle: RePEc:eee:intfor:v:41:y:2025:i:3:p:894-919
DOI: 10.1016/j.ijforecast.2024.07.005

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
Liu, Yi & Yang, Menglong & Wang, Yudong & Li, Yongshan & Xiong, Tiancheng & Li, Anzhe, 2022. "Applying machine learning algorithms to predict default probability in the online credit market: Evidence from China," International Review of Financial Analysis, Elsevier, vol. 79(C).
Dimitrios Nikolaidis & Michalis Doumpos, 2022. "Credit Scoring with Drift Adaptation Using Local Regions of Competence," SN Operations Research Forum, Springer, vol. 3(4), pages 1-28, December.
Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
Topuz, Kazim & Urban, Timothy L. & Yildirim, Mehmet B., 2024. "A Markovian score model for evaluating provider performance for continuity of care—An explainable analytics approach," European Journal of Operational Research, Elsevier, vol. 317(2), pages 341-351.
Baesens, Bart & Smedts, Kristien, 2025. "Boosting credit risk models," The British Accounting Review, Elsevier, vol. 57(4).
Tobias Berg & Andreas Fuster & Manju Puri, 2022. "FinTech Lending," Annual Review of Financial Economics, Annual Reviews, vol. 14(1), pages 187-207, November.
- Tobias Berg & Andreas Fuster & Manju Puri, 2021. "FinTech Lending," Swiss Finance Institute Research Paper Series 21-72, Swiss Finance Institute.
- Berg, Tobias & Puri, Manju, 2021. "FinTech Lending," CEPR Discussion Papers 16668, C.E.P.R. Discussion Papers.
- Tobias Berg & Andreas Fuster & Manju Puri, 2021. "FinTech Lending," NBER Working Papers 29421, National Bureau of Economic Research, Inc.
Zha, Yong & Wang, Yuting & Li, Quan & Yao, Wenying, 2022. "Credit offering strategy and dynamic pricing in the presence of consumer strategic behavior," European Journal of Operational Research, Elsevier, vol. 303(2), pages 753-766.
Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.
De Bock, Koen W. & Coussement, Kristof & Caigny, Arno De & Słowiński, Roman & Baesens, Bart & Boute, Robert N. & Choi, Tsan-Ming & Delen, Dursun & Kraus, Mathias & Lessmann, Stefan & Maldonado, Sebast, 2024. "Explainable AI for Operational Research: A defining framework, methods, applications, and a research agenda," European Journal of Operational Research, Elsevier, vol. 317(2), pages 249-272.
Zou, Yao & Xia, Meng & Lan, Xingyu, 2025. "Interpretable credit scoring based on an additive extreme gradient boosting," Chaos, Solitons & Fractals, Elsevier, vol. 194(C).
Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Slowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
Peter Eccles & Paul Grout & Paolo Siciliani & Anna Zalewska, 2021. "The impact of machine learning and big data on credit markets," Bank of England working papers 930, Bank of England.
Tu, Jiancheng & Wu, Zhibin, 2025. "Inherently interpretable machine learning for credit scoring: Optimal classification tree with hyperplane splits," European Journal of Operational Research, Elsevier, vol. 322(2), pages 647-664.
Emmanuel Flachaire & Gilles Hacheme & Sullivan Hué & Sébastien Laurent, 2022. "GAM(L)A: An econometric model for interpretable Machine Learning," Papers 2203.11691, arXiv.org.
Wang, Zhongyi & Tian, Yuhang & Li, Sihan & Xiao, Jin, 2025. "A secure cross-silo collaborative method for imbalanced credit scoring," European Journal of Operational Research, Elsevier, vol. 326(2), pages 357-373.
Sigrist, Fabio & Leuenberger, Nicola, 2023. "Machine learning for corporate default risk: Multi-period prediction, frailty correlation, loan portfolios, and tail probabilities," European Journal of Operational Research, Elsevier, vol. 305(3), pages 1390-1406.
Wu, Manhua & Tian, Xiujuan & Ma, Lin & Peng, Nianjiao, 2024. "Role transition, investment practice and risk Management of Peer-to-peer Lending Investors: Based on the perspective of investor learning," Pacific-Basin Finance Journal, Elsevier, vol. 88(C).
Shi, Yong & Qu, Yi & Chen, Zhensong & Mi, Yunlong & Wang, Yunong, 2024. "Improved credit risk prediction based on an integrated graph representation learning approach with graph transformation," European Journal of Operational Research, Elsevier, vol. 315(2), pages 786-801.
