Evaluating LLMs in Finance Requires Explicit Bias Consideration

Author

Listed:

Yaxuan Kong
Hoyoung Lee
Yoontae Hwang
Alejandro Lopez-Lira
Bradford Levy
Dhagash Mehta
Qingsong Wen
Chanyeol Choi
Yongjae Lee
Stefan Zohren

Abstract

Large Language Models (LLMs) are increasingly integrated into financial workflows, but evaluation practice has not kept up. Finance-specific biases can inflate performance, contaminate backtests, and make reported results useless for any deployment claim. We identify five recurring biases in financial LLM applications: look-ahead bias, survivorship bias, narrative bias, objective bias, and cost bias. These biases break financial tasks in distinct ways, and they often compound to create an illusion of validity. We reviewed 164 papers from 2023 to 2025 and found that no single bias is discussed in more than 28 percent of studies. This position paper argues that bias in financial LLM systems requires explicit attention and that structural validity should be enforced before any result is used to support a deployment claim. We propose a Structural Validity Framework and an evaluation checklist with minimal requirements for bias diagnosis and future system design. The material is available at https://github.com/Eleanorkong/Awesome-Financial-LLM-Bias-Mitigation.

Suggested Citation

Yaxuan Kong & Hoyoung Lee & Yoontae Hwang & Alejandro Lopez-Lira & Bradford Levy & Dhagash Mehta & Qingsong Wen & Chanyeol Choi & Yongjae Lee & Stefan Zohren, 2026. "Evaluating LLMs in Finance Requires Explicit Bias Consideration," Papers 2602.14233, arXiv.org.

Handle: RePEc:arx:papers:2602.14233

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2026-03-02 (Artificial Intelligence)
NEP-CMP-2026-03-02 (Computational Economics)
