From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Authors

  • Taojie Zhu
  • Wentao Zhao
  • Rui Sun
  • Beidi Luan
  • Jiacheng Lu
  • Sinuo Wang
  • Jing Li
  • Daxin Jiang
  • Yonghong He
  • Zuo Bai

Abstract

Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an agent in a historical market, let it trade, and measure portfolio returns. This setup is vulnerable to two evaluation failures. First, long backtests often overlap with the knowledge cutoffs of frontier LLMs, allowing memorized tickers, dates, prices, and market narratives to substitute for investment reasoning. Second, raw returns are a noisy proxy for stock-selection ability, since positive performance may come from market beta, style exposure, or favorable regimes rather than genuine alpha. We introduce KTD-Fin (Knowing-To-Doing Financial Benchmark), an end-to-end stock-market trading benchmark that addresses both issues. KTD-Fin uses a data-side masking protocol to anonymize key identifiers and calendar information consistently across prompts and tools, separating historical market memory from investment decision-making. It also incorporates a Barra-style performance attribution framework that decomposes portfolio returns into market, style, and stock-selection alpha components. Across ten frontier LLM agents evaluated on the Chinese CSI300 over a 2024 to 2026 window, masking substantially changes agent rationales, pushing them towards anonymized factor-based reasoning. Attribution analysis further shows that LLM agents' cumulative returns under leakage-controlled evaluation are largely explained by passive market and style exposure, with limited evidence of persistent stock-selection alpha. These findings suggest that financial LLM benchmarks should evaluate not only whether an agent makes money, but also whether the source of returns reflects transferable investment skill. We release KTD-Fin as a reproducible template for leakage-controlled and attribution-aware evaluation of LLM trading agents.
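
The abstract does not spell out the attribution model, so the following is only a minimal sketch of what a Barra-style decomposition of a single period's return might look like. It assumes a linear factor model in which the portfolio's market beta, style exposures, and the factor returns are already known. The function name, field names, factor names, and all numbers are hypothetical and are not taken from KTD-Fin.

```python
from dataclasses import dataclass


@dataclass
class Attribution:
    market: float     # return explained by market exposure
    style: float      # return explained by style-factor exposures
    selection: float  # residual, read here as stock-selection alpha


def attribute_period(portfolio_return, market_return, beta,
                     style_exposures, style_returns):
    """Split one period's portfolio return into market, style, and residual parts.

    Assumes a linear factor model with known exposures (an illustration only,
    not the benchmark's actual procedure).
    """
    market = beta * market_return
    style = sum(style_exposures[name] * style_returns[name]
                for name in style_exposures)
    # Whatever the market and style terms do not explain is the residual.
    selection = portfolio_return - market - style
    return Attribution(market, style, selection)


# Made-up numbers for a single period.
result = attribute_period(
    portfolio_return=0.020,
    market_return=0.012,
    beta=1.0,
    style_exposures={"size": 0.5, "momentum": -0.2},
    style_returns={"size": 0.004, "momentum": 0.005},
)
print(result)
```

By construction the three components sum to the portfolio return, so a positive raw return with a near-zero `selection` term would indicate passive exposure rather than stock-picking skill, which is the distinction the abstract draws.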

Suggested Citation

  • Taojie Zhu & Wentao Zhao & Rui Sun & Beidi Luan & Jiacheng Lu & Sinuo Wang & Jing Li & Daxin Jiang & Yonghong He & Zuo Bai, 2026. "From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets," Papers 2605.28359, arXiv.org.
  • Handle: RePEc:arx:papers:2605.28359

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2605.28359
    File Function: Latest version
    Download Restriction: no
