Printed from https://ideas.repec.org/p/arx/papers/2507.22932.html

FinMarBa: A Market-Informed Dataset for Financial Sentiment Classification

Author

Listed:
  • Baptiste Lefort
  • Eric Benhamou
  • Beatrice Guez
  • Jean-Jacques Ohana
  • Ethan Setrouk
  • Alban Etienne

Abstract

This paper presents a novel hierarchical framework for portfolio optimization, integrating lightweight Large Language Models (LLMs) with Deep Reinforcement Learning (DRL) to combine sentiment signals from financial news with traditional market indicators. Our three-tier architecture employs base RL agents to process hybrid data, meta-agents to aggregate their decisions, and a super-agent to merge decisions based on market data and sentiment analysis. Evaluated on data from 2018 to 2024, after training on 2000-2017, the framework achieves a 26% annualized return and a Sharpe ratio of 1.2, outperforming equal-weighted and S&P 500 benchmarks. Key contributions include scalable cross-modal integration, a hierarchical RL structure for enhanced stability, and open-source reproducibility.
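The two headline figures in the abstract, annualized return and Sharpe ratio, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the 252-trading-day year, the zero default risk-free rate, the use of simple daily returns, and the function names are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumption: trading days per year


def annualized_return(daily_returns):
    """Geometric annualized return from a list of simple daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r  # compound each day's return
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean daily excess return divided by its
    sample standard deviation, scaled by sqrt(TRADING_DAYS)."""
    excess = [r - daily_risk_free for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    sample = [0.01, -0.01, 0.02, 0.0]
    print(f"annualized return: {annualized_return(sample):.4f}")
    print(f"Sharpe ratio:      {sharpe_ratio(sample):.4f}")
```

Under these conventions, a Sharpe ratio of 1.2 means the strategy's annualized mean excess return is 1.2 times its annualized volatility; other conventions (a non-zero risk-free rate, log returns) would give somewhat different numbers.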

Suggested Citation

  • Baptiste Lefort & Eric Benhamou & Beatrice Guez & Jean-Jacques Ohana & Ethan Setrouk & Alban Etienne, 2025. "FinMarBa: A Market-Informed Dataset for Financial Sentiment Classification," Papers 2507.22932, arXiv.org.
  • Handle: RePEc:arx:papers:2507.22932

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2507.22932
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Igor Mozetič & Miha Grčar & Jasmina Smailović, 2016. "Multilingual Twitter Sentiment Classification: The Role of Human Annotators," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-26, May.
    2. Baptiste Lefort & Eric Benhamou & Jean-Jacques Ohana & David Saltiel & Beatrice Guez & Thomas Jacquot, 2024. "Stress index strategy enhanced with financial news sentiment analysis for the equity markets," Papers 2404.00012, arXiv.org.
    3. Baptiste Lefort & Eric Benhamou & Jean-Jacques Ohana & David Saltiel & Beatrice Guez & D Challet, 2024. "Can ChatGPT Compute Trustworthy Sentiment Scores from Bloomberg Market Wraps? [ChatGPT peut-il calculer des scores de sentiment dignes de confiance à partir de Bloomberg Market Wraps ?]," Working Papers hal-04739906, HAL.
    4. Yi Yang & Yixuan Tang & Kar Yan Tam, 2023. "InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning," Papers 2309.13064, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Baptiste Lefort & Eric Benhamou & Jean-Jacques Ohana & David Saltiel & Beatrice Guez, 2024. "Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning," Papers 2409.11408, arXiv.org.
    2. Xuewen Han & Neng Wang & Shangkun Che & Hongyang Yang & Kunpeng Zhang & Sean Xin Xu, 2024. "Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research," Papers 2411.04788, arXiv.org.
    3. Liyuan Chen & Shuoling Liu & Jiangpeng Yan & Xiaoyu Wang & Henglin Liu & Chuang Li & Kecheng Jiao & Jixuan Ying & Yang Veronica Liu & Qiang Yang & Xiu Li, 2025. "Advancing Financial Engineering with Foundation Models: Progress, Applications, and Challenges," Papers 2507.18577, arXiv.org.
    4. Vuk Batanović & Miloš Cvetanović & Boško Nikolić, 2020. "A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-30, November.
    5. Chansiri, Karikarn & Wei, Xinyu & Chor, Ka Ho Brian, 2024. "Using natural language processing approaches to characterize professional experiences of child welfare workers," Children and Youth Services Review, Elsevier, vol. 166(C).
    6. Jiaxin Liu & Yixuan Tang & Yi Yang & Kar Yan Tam, 2025. "Evaluating and Aligning Human Economic Risk Preferences in LLMs," Papers 2503.06646, arXiv.org, revised Sep 2025.
    7. Durrant, Eleanor & Howson, Pete & Sallu, Susannah M. & Shirima, Deo D. & Lala, Margherita & Milheiras, Sergio G. & Lyimo, Francis & Nyiti, Petro P. & Mwanga, Lilian & Kioko, Esther & Pfeifer, Marion, 2025. "Understanding farmers' attitudes and aspirations for tree-cover restoration in the Kilombero Valley, Tanzania," Forest Policy and Economics, Elsevier, vol. 172(C).
    8. Chiu, I-Chan & Hung, Mao-Wei, 2025. "Finance-specific large language models: Advancing sentiment analysis and return prediction with LLaMA 2," Pacific-Basin Finance Journal, Elsevier, vol. 90(C).
    9. Lars Hornuf & David J. Streich & Niklas Töllich, 2025. "Making GenAI Smarter: Evidence from a Portfolio Allocation Experiment," CESifo Working Paper Series 11862, CESifo.
    10. Haohang Li & Yupeng Cao & Yangyang Yu & Shashidhar Reddy Javaji & Zhiyang Deng & Yueru He & Yuechen Jiang & Zining Zhu & Koduvayur Subbalakshmi & Guojun Xiong & Jimin Huang & Lingfei Qian & Xueqing Pe, 2024. "INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent," Papers 2412.18174, arXiv.org.
    11. Yuqi Nie & Yaxuan Kong & Xiaowen Dong & John M. Mulvey & H. Vincent Poor & Qingsong Wen & Stefan Zohren, 2024. "A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges," Papers 2406.11903, arXiv.org.
    12. Zhizhuo Kou & Holam Yu & Junyu Luo & Jingshu Peng & Xujia Li & Chengzhong Liu & Juntao Dai & Lei Chen & Sirui Han & Yike Guo, 2024. "Automate Strategy Finding with LLM in Quant Investment," Papers 2409.06289, arXiv.org, revised Nov 2025.
    13. Peter Gabrovšek & Darko Aleksovski & Igor Mozetič & Miha Grčar, 2017. "Twitter sentiment around the Earnings Announcement events," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-21, February.
    14. Shijie Han & Jingshu Zhang & Yiqing Shen & Kaiyuan Yan & Hongguang Li, 2025. "FinSphere, a Real-Time Stock Analysis Agent Powered by Instruction-Tuned LLMs and Domain Tools," Papers 2501.12399, arXiv.org, revised Jul 2025.
    15. Christian Fieberg & Lars Hornuf & Maximilian Meiler & David J. Streich, 2025. "Using Large Language Models for Financial Advice," CESifo Working Paper Series 11666, CESifo.
    16. David Kuo Chuen Lee & Chong Guan & Yinghui Yu & Qinxu Ding, 2024. "A Comprehensive Review of Generative AI in Finance," FinTech, MDPI, vol. 3(3), pages 1-19, September.
    17. Qianggang Ding & Haochen Shi & Jiadong Guo & Bang Liu, 2024. "TradExpert: Revolutionizing Trading with Mixture of Expert LLMs," Papers 2411.00782, arXiv.org, revised May 2025.
    18. Fernando Spadea & Oshani Seneviratne, 2025. "Aligning Language Models with Investor and Market Behavior for Financial Recommendations," Papers 2510.15993, arXiv.org.
    19. Zhiyu Cao & Zachary Feinstein, 2024. "Large Language Model in Financial Regulatory Interpretation," Papers 2405.06808, arXiv.org, revised Jul 2024.
    20. Herbert Dawid & Philipp Harting & Hankui Wang & Zhongli Wang & Jiachen Yi, 2025. "Agentic Workflows for Economic Research: Design and Implementation," Papers 2504.09736, arXiv.org.

