IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2601.22162.html

UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos

Author

Listed:
  • Zhi Yang
  • Lingfeng Zeng
  • Fangqi Lou
  • Qi Qi
  • Wei Zhang
  • Zhenyu Wu
  • Zhenxiong Yu
  • Jun Han
  • Zhiheng Jin
  • Lejie Zhang
  • Xiaoming Huang
  • Xiaolong Liang
  • Zheng Wei
  • Junbo Zou
  • Dongpo Cheng
  • Zhaowei Liu
  • Xin Guo
  • Rongjunchen Zhang
  • Liwen Zhang

Abstract

Multimodal large language models (MLLMs) play an increasingly significant role in empowering the financial domain; however, the challenges they face, such as high-density multimodal information and cross-modal multi-hop reasoning, go beyond the evaluation scope of existing multimodal benchmarks. To address this gap, we propose UniFinEval, the first unified multimodal benchmark designed for high-information-density financial environments, covering text, images, and videos. UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems: Financial Statement Auditing, Company Fundamental Reasoning, Industry Trend Insights, Financial Risk Sensing, and Asset Allocation Analysis. We manually construct a high-quality dataset of 3,767 question-answer pairs in both Chinese and English and systematically evaluate 10 mainstream MLLMs under Zero-Shot and Chain-of-Thought (CoT) settings. Results show that Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap compared to financial experts. Further error analysis reveals systematic deficiencies in current models. UniFinEval aims to provide a systematic assessment of MLLMs' capabilities in fine-grained, high-information-density financial environments, thereby enhancing the robustness of MLLM applications in real-world financial scenarios. Data and code are available at https://github.com/aifinlab/UniFinEval.
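The evaluation protocol described above (the same question posed under Zero-Shot and CoT prompting, scored against a gold answer) can be sketched as follows. This is a minimal illustration, not the released evaluation code: the prompt wording, the `predict` callable, and exact-match scoring are all assumptions, and the benchmark's actual harness at the linked repository may differ.

```python
# Hedged sketch of a Zero-Shot vs. CoT evaluation loop for a QA
# benchmark like UniFinEval. All names here are illustrative
# assumptions, not the repository's actual API.

def build_prompt(question: str, setting: str = "zero-shot") -> str:
    """Wrap a benchmark question for the chosen evaluation setting."""
    if setting == "cot":
        # CoT: ask the model to reason step by step before answering.
        return f"{question}\nLet's think step by step, then state the final answer."
    return question  # Zero-Shot: the question is passed through unchanged.

def exact_match(prediction: str, gold: str) -> bool:
    """Case- and whitespace-insensitive exact-match check."""
    return prediction.strip().lower() == gold.strip().lower()

def accuracy(examples, predict, setting: str = "zero-shot") -> float:
    """Accuracy of a model callable `predict` over (question, answer) pairs."""
    hits = sum(
        exact_match(predict(build_prompt(q, setting)), a) for q, a in examples
    )
    return hits / len(examples)
```

In practice `predict` would call an MLLM with the question plus its image or video context; here it is any `str -> str` callable, which keeps the scoring logic testable in isolation.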

Suggested Citation

  • Zhi Yang & Lingfeng Zeng & Fangqi Lou & Qi Qi & Wei Zhang & Zhenyu Wu & Zhenxiong Yu & Jun Han & Zhiheng Jin & Lejie Zhang & Xiaoming Huang & Xiaolong Liang & Zheng Wei & Junbo Zou & Dongpo Cheng & Zh, 2026. "UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos," Papers 2601.22162, arXiv.org.
  • Handle: RePEc:arx:papers:2601.22162

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.22162
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Liu, Yingnan & Bu, Ningbo & Li, Zhiqiang & Zhang, Yongmin & Zhao, Zhenyu, 2025. "AT-FinGPT: Financial risk prediction via an audio-text large language model," Finance Research Letters, Elsevier, vol. 77(C).
    2. Yijia Xiao & Edward Sun & Tong Chen & Fang Wu & Di Luo & Wei Wang, 2025. "Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning," Papers 2509.11420, arXiv.org.
    3. Dat Mai, 2024. "StockGPT: A GenAI Model for Stock Prediction and Trading," Papers 2404.05101, arXiv.org, revised Oct 2024.
    4. Saizhuo Wang & Hang Yuan & Lionel M. Ni & Jian Guo, 2024. "QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model," Papers 2402.03755, arXiv.org.
    5. Yanlong Wang & Jian Xu & Fei Ma & Hongkang Zhang & Hang Yu & Tiantian Gao & Yu Wang & Haochen You & Shao-Lun Huang & Danny Dongning Sun & Xiao-Ping Zhang, 2025. "FinZero: Launching Multi-modal Financial Time Series Forecast with Large Reasoning Model," Papers 2509.08742, arXiv.org.
    6. Yang Chen & Yueheng Jiang & Zhaozhao Ma & Yuchen Cao & Jacky Keung & Kun Kuang & Leilei Gan & Yiquan Wu & Fei Wu, 2025. "MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading," Papers 2509.05080, arXiv.org, revised Sep 2025.
    7. Jean Lee & Nicholas Stevens & Soyeon Caren Han & Minseok Song, 2024. "A Survey of Large Language Models in Finance (FinLLMs)," Papers 2402.02315, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Han Ding & Yinheng Li & Junhao Wang & Hang Chen & Doudou Guo & Yunbai Zhang, 2024. "Large Language Model Agent in Financial Trading: A Survey," Papers 2408.06361, arXiv.org, revised Mar 2026.
    2. Benjamin Coriat & Eric Benhamou, 2025. "HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization," Papers 2507.18560, arXiv.org.
    3. Chen, Liangyu & Yusuyin, Alkut & Zhang, Renyi & Zhang, Yongmin, 2025. "Boards' green background and corporate ESG," International Review of Financial Analysis, Elsevier, vol. 105(C).
    4. Li, Yang, 2025. "Can large language models (LLMs) replace human reading? Empirical evidence from sustainability reports," Finance Research Letters, Elsevier, vol. 85(PC).
    5. Tirulo, Aschalew & Yadav, Monika & Lolamo, Mathewos & Chauhan, Siddhartha & Siano, Pierluigi & Shafie-khah, Miadreza, 2026. "Beyond automation: Unveiling the potential of agentic intelligence," Renewable and Sustainable Energy Reviews, Elsevier, vol. 226(PA).
    6. Haofei Yu & Fenghai Li & Jiaxuan You, 2025. "LiveTradeBench: Seeking Real-World Alpha with Large Language Models," Papers 2511.03628, arXiv.org.
    7. Hamidou Tembine & Manzoor Ahmed Khan & Issa Bamia, 2024. "Mean-Field-Type Transformers," Mathematics, MDPI, vol. 12(22), pages 1-51, November.
    8. Mostapha Benhenda, 2026. "Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance," Papers 2601.13770, arXiv.org.
    9. Zheng Li, 2026. "Design and Empirical Study of a Large Language Model-Based Multi-Agent Investment System for Chinese Public REITs," Papers 2602.00082, arXiv.org.
    10. Kassiani Papasotiriou & Srijan Sood & Shayleen Reynolds & Tucker Balch, 2024. "AI in Investment Analysis: LLMs for Equity Stock Ratings," Papers 2411.00856, arXiv.org.
    11. Junhua Liu, 2024. "A Survey of Financial AI: Architectures, Advances and Open Challenges," Papers 2411.12747, arXiv.org.
    12. Aadi Singhi, 2025. "An Adaptive Multi Agent Bitcoin Trading System," Papers 2510.08068, arXiv.org, revised Nov 2025.
    13. Muhammed Golec & Maha AlabdulJalil, 2025. "Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy," Papers 2506.04290, arXiv.org, revised Jun 2025.
    14. Haoyi Zhang & Tianyi Zhu, 2025. "Neither Consent nor Property: A Policy Lab for Data Law," Papers 2510.26727, arXiv.org, revised Jan 2026.
    15. Jerick Shi & Burton Hollifield, 2025. "Market-Dependent Communication in Multi-Agent Alpha Generation," Papers 2511.13614, arXiv.org.
    16. Song, Yuping & Zhang, Yilun & Huang, Jiefei & Yang, Aijun, 2025. "Volatility and value-at-risk forecasting using BERT and transformer models incorporating investors' textual sentiments," Finance Research Letters, Elsevier, vol. 85(PD).
    17. Rui Sun & Yifan Sun & Sheng Xu & Li Zhao & Jing Li & Daxin Jiang & Cheng Hua & Zuo Bai, 2026. "Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification," Papers 2601.03948, arXiv.org, revised Jan 2026.
    18. Satyadhar Joshi, 2025. "Review of Gen AI Models for Financial Risk Management: Architectural Frameworks and Implementation Strategies," Post-Print hal-05101589, HAL.
    19. Zuoyou Jiang & Li Zhao & Rui Sun & Ruohan Sun & Zhongjian Li & Jing Li & Daxin Jiang & Zuo Bai & Cheng Hua, 2025. "Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning," Papers 2512.23515, arXiv.org.
    20. Weixian Waylon Li & Hyeonjun Kim & Mihai Cucuringu & Tiejun Ma, 2025. "Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?," Papers 2505.07078, arXiv.org, revised Feb 2026.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2601.22162. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.