Printed from https://ideas.repec.org/p/arx/papers/2505.07078.html

Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?

Author

Listed:
  • Weixian Waylon Li
  • Hyeonjun Kim
  • Mihai Cucuringu
  • Tiejun Ma

Abstract

Large Language Models (LLMs) have recently been leveraged for asset pricing tasks and stock trading applications, enabling AI agents to generate investment decisions from unstructured financial data. However, most evaluations of LLM timing-based investing strategies are conducted on narrow timeframes and limited stock universes, which overstates their effectiveness because of survivorship and data-snooping biases. We critically assess their generalizability and robustness by proposing FINSABER, a backtesting framework that evaluates timing-based strategies across longer periods and a larger universe of symbols. Systematic backtests over two decades and 100+ symbols reveal that previously reported LLM advantages deteriorate significantly under a broader cross-section and a longer evaluation horizon. Our market regime analysis further demonstrates that LLM strategies are overly conservative in bull markets, where they underperform passive benchmarks, and overly aggressive in bear markets, where they incur heavy losses. These findings highlight the need to develop LLM strategies that prioritise trend detection and regime-aware risk controls over mere scaling of framework complexity.

Suggested Citation

  • Weixian Waylon Li & Hyeonjun Kim & Mihai Cucuringu & Tiejun Ma, 2025. "Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?," Papers 2505.07078, arXiv.org, revised May 2025.
  • Handle: RePEc:arx:papers:2505.07078

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2505.07078
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Han Ding & Yinheng Li & Junhao Wang & Hang Chen, 2024. "Large Language Model Agent in Financial Trading: A Survey," Papers 2408.06361, arXiv.org.
    2. Paulo Armada Leite & Maria Ceu Cortez, 2009. "Conditioning information in mutual fund performance evaluation: Portuguese evidence," The European Journal of Finance, Taylor & Francis Journals, vol. 15(5-6), pages 585-605.
    3. Keith Cuthbertson & Dirk Nitzsche & Niall O'Sullivan, 2010. "Mutual Fund Performance: Measurement and Evidence," Financial Markets, Institutions & Instruments, John Wiley & Sons, vol. 19(2), pages 95-187, May.
    4. Ferson, Wayne E., 2013. "Investment Performance: A Review and Synthesis," Handbook of the Economics of Finance, in: G.M. Constantinides & M. Harris & R. M. Stulz (ed.), Handbook of the Economics of Finance, volume 2, chapter 0, pages 969-1010, Elsevier.
    5. Chen, Zhimin & Ibragimov, Rustam, 2019. "One country, two systems? The heavy-tailedness of Chinese A- and H- share markets," Emerging Markets Review, Elsevier, vol. 38(C), pages 115-141.
    6. Bariviera, Aurelio F. & Font-Ferrer, Alejandro & Sorrosal-Forradellas, M. Teresa & Rosso, Osvaldo A., 2019. "An information theory perspective on the informational efficiency of gold price," The North American Journal of Economics and Finance, Elsevier, vol. 50(C).
    7. Jeremy Eng-Tuck Cheah & Thong Dao & Haozhe Su, 2024. "Measuring cryptocurrency moment convergence using distance analysis," Annals of Operations Research, Springer, vol. 332(1), pages 533-577, January.
    8. Semei Coronado-Ramírez & Pedro Celso-Arellano & Omar Rojas, 2014. "Adaptive Market Efficiency of Agricultural Commodity Futures Contracts," Papers 1412.8017, arXiv.org, revised Mar 2015.
    9. Loriana Pelizzon & Roberto Casarin & Andrea Piva, 2008. "Italian Equity Funds: Efficiency and Performance Persistence," Working Papers 2008_12, Department of Economics, University of Venice "Ca' Foscari".
    10. Bartram, Söhnke M. & Grinblatt, Mark, 2018. "Agnostic fundamental analysis works," Journal of Financial Economics, Elsevier, vol. 128(1), pages 125-147.
    11. Ioana-Andreea Boboc & Mihai-Cristian Dinică, 2013. "An Algorithm for Testing the Efficient Market Hypothesis," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-11, October.
    12. Fabrice Hervé, 2003. "La persistance de la performance des fonds de pension individuels britanniques : une étude empirique sur des fonds investis en actions et des fonds obligataires," Revue Finance Contrôle Stratégie, revues.org, vol. 6(3), pages 41-77, September.
    13. Liping Wang & Jiawei Li & Lifan Zhao & Zhizhuo Kou & Xiaohan Wang & Xinyi Zhu & Hao Wang & Yanyan Shen & Lei Chen, 2023. "Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey," Papers 2308.04947, arXiv.org.
    14. Degenhardt, Thomas & Auer, Benjamin R., 2018. "The “Sell in May” effect: A review and new empirical evidence," The North American Journal of Economics and Finance, Elsevier, vol. 43(C), pages 169-205.
    15. Elton, Edwin J. & Gruber, Martin J., 2013. "Mutual Funds," Handbook of the Economics of Finance, in: G.M. Constantinides & M. Harris & R. M. Stulz (ed.), Handbook of the Economics of Finance, volume 2, chapter 0, pages 1011-1061, Elsevier.
    16. Mar Grande & Florentino Borondo & Juan Carlos Losada & Javier Borondo, 2024. "Anti-Persistent Values of the Hurst Exponent Anticipate Mean Reversion in Pairs Trading: The Cryptocurrencies Market as a Case Study," Mathematics, MDPI, vol. 12(18), pages 1-14, September.
    17. Gilles Daniel & Didier Sornette & Peter Wohrmann, 2008. "Look-Ahead Benchmark Bias in Portfolio Performance Evaluation," Papers 0810.1922, arXiv.org.
    18. Bertin, William J. & Prather, Laurie, 2009. "Management structure and the performance of funds of mutual funds," Journal of Business Research, Elsevier, vol. 62(12), pages 1364-1369, December.
    19. Keith Cuthbertson & Dirk Nitzsche & Niall O'Sullivan, 2004. "UK Mutual Fund Performance: Genuine Stock-Picking Ability or Luck," Money Macro and Finance (MMF) Research Group Conference 2004 55, Money Macro and Finance Research Group.
    20. Carmen-Pilar Martí-Ballester, 2012. "A Comparative Analysis of the Performance of Collective Investment Institutions," Review of Economics & Finance, Better Advances Press, Canada, vol. 2, pages 43-52, May.
