Author
Listed:
- Zhi Yang
- Lingfeng Zeng
- Fangqi Lou
- Qi Qi
- Wei Zhang
- Zhenyu Wu
- Zhenxiong Yu
- Jun Han
- Zhiheng Jin
- Lejie Zhang
- Xiaoming Huang
- Xiaolong Liang
- Zheng Wei
- Junbo Zou
- Dongpo Cheng
- Zhaowei Liu
- Xin Guo
- Rongjunchen Zhang
- Liwen Zhang
Abstract
Multimodal large language models (MLLMs) are playing an increasingly significant role in empowering the financial domain; however, the challenges they face, such as multimodal, high-density information and cross-modal multi-hop reasoning, go beyond the evaluation scope of existing multimodal benchmarks. To address this gap, we propose UniFinEval, the first unified multimodal benchmark designed for high-information-density financial environments, covering text, images, and videos. UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems: Financial Statement Auditing, Company Fundamental Reasoning, Industry Trend Insights, Financial Risk Sensing, and Asset Allocation Analysis. We manually construct a high-quality dataset consisting of 3,767 question-answer pairs in both Chinese and English and systematically evaluate 10 mainstream MLLMs under Zero-Shot and CoT settings. Results show that Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap compared to financial experts. Further error analysis reveals systematic deficiencies in current models. UniFinEval aims to provide a systematic assessment of MLLMs' capabilities in fine-grained, high-information-density financial environments, thereby enhancing the robustness of MLLM applications in real-world financial scenarios. Data and code are available at https://github.com/aifinlab/UniFinEval.
Suggested Citation
Zhi Yang & Lingfeng Zeng & Fangqi Lou & Qi Qi & Wei Zhang & Zhenyu Wu & Zhenxiong Yu & Jun Han & Zhiheng Jin & Lejie Zhang & Xiaoming Huang & Xiaolong Liang & Zheng Wei & Junbo Zou & Dongpo Cheng & Zh, 2026.
"UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos,"
Papers
2601.22162, arXiv.org.
Handle:
RePEc:arx:papers:2601.22162