
FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs

Authors
  • Junzhe Jiang
  • Chang Yang
  • Aixin Cui
  • Sihan Jin
  • Ruiyu Wang
  • Bo Li
  • Xiao Huang
  • Dongning Sun
  • Xinrun Wang

Abstract

Financial tasks are pivotal to global economic stability; however, their execution faces challenges including labor-intensive processes, low error tolerance, data fragmentation, and tool limitations. Although large language models (LLMs) have succeeded in various natural language processing tasks and have shown potential in automating workflows through reasoning and contextual understanding, current benchmarks for evaluating LLMs in finance lack sufficient domain-specific data, use simplistic task designs, and rely on incomplete evaluation frameworks. To address these gaps, this article presents FinMaster, a comprehensive financial benchmark designed to systematically assess the capabilities of LLMs in financial literacy, accounting, auditing, and consulting. Specifically, FinMaster comprises three main modules: i) FinSim, which builds simulators that generate synthetic, privacy-compliant company financial data replicating market dynamics; ii) FinSuite, which provides tasks in core financial domains, spanning 183 tasks of various types and difficulty levels; and iii) FinEval, which provides a unified evaluation interface. Extensive experiments on state-of-the-art LLMs reveal critical capability gaps in financial reasoning, with accuracy dropping from over 90% on basic tasks to merely 40% on complex scenarios requiring multi-step reasoning. This degradation reflects the propagation of computational errors: accuracy on single-metric calculations, initially 58%, decreased to 37% in multi-metric scenarios. To the best of our knowledge, FinMaster is the first benchmark that covers full-pipeline financial workflows with challenging tasks. We hope that FinMaster can bridge the gap between research and industry practitioners, driving the adoption of LLMs in real-world financial practice to enhance efficiency and accuracy.

Suggested Citation

  • Junzhe Jiang & Chang Yang & Aixin Cui & Sihan Jin & Ruiyu Wang & Bo Li & Xiao Huang & Dongning Sun & Xinrun Wang, 2025. "FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs," Papers 2505.13533, arXiv.org.
  • Handle: RePEc:arx:papers:2505.13533

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2505.13533
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Xiao-Yang Liu & Guoxuan Wang & Hongyang Yang & Daochen Zha, 2023. "FinGPT: Democratizing Internet-scale Data for Financial Large Language Models," Papers 2307.10485, arXiv.org, revised Nov 2023.
    2. Miriam Bruhn & Dean Karlan & Antoinette Schoar, 2018. "The Impact of Consulting Services on Small and Medium Enterprises: Evidence from a Randomized Trial in Mexico," Journal of Political Economy, University of Chicago Press, vol. 126(2), pages 635-687.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. González-Uribe, Juanita & Reyes, Santiago, 2021. "Identifying and boosting “Gazelles”: Evidence from business accelerators," Journal of Financial Economics, Elsevier, vol. 139(1), pages 260-287.
    2. Margaret Miller & Julia Reichelstein & Christian Salas & Bilal Zia, 2015. "Can You Help Someone Become Financially Capable? A Meta-Analysis of the Literature," The World Bank Research Observer, World Bank, vol. 30(2), pages 220-246.
    3. World Bank, "undated". "Shifting Gears," World Bank Publications - Reports 36317, The World Bank Group.
    4. Antonia Grohmann & Lukas Menkhoff & Helke Seitz, 2022. "The Effect of Personalized Feedback on Small Enterprises’ Finances in Uganda," Economic Development and Cultural Change, University of Chicago Press, vol. 70(3), pages 1197-1227.
    5. Agnes Norris Keiller & John Van Reenen, 2024. "Disaster management," CEP Discussion Papers dp2007, Centre for Economic Performance, LSE.
    6. Sule Alan & Gozde Corekcioglu & Matthias Sutter, 2023. "Improving Workplace Climate in Large Corporations: A Clustered Randomized Intervention," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 138(1), pages 151-203.
    7. Tulio Cravo & Caio Piza, 2016. "The Impact of Business Support Services for Small and Medium Enterprises on Firm Performance in Low- and Middle-Income Countries: A Meta-Analysis," IDB Publications (Working Papers) 94938, Inter-American Development Bank.
    8. Natalie Chun & Soohyung Lee, 2015. "Bonus compensation and productivity: evidence from Indian manufacturing plant-level data," Journal of Productivity Analysis, Springer, vol. 43(1), pages 47-58, February.
    9. Daniela Scur & Raffaella Sadun & John Van Reenen & Renata Lemos & Nicholas Bloom, 2021. "The World Management Survey at 18: lessons and the way forward," Oxford Review of Economic Policy, Oxford University Press and Oxford Review of Economic Policy Limited, vol. 37(2), pages 231-258.
    10. Andreas Georgiadis & Christos N. Pitelis, 2016. "The Impact of Employees' and Managers' Training on the Performance of Small- and Medium-Sized Enterprises: Evidence from a Randomized Natural Experiment in the UK Service Sector," British Journal of Industrial Relations, London School of Economics, vol. 54(2), pages 409-421, June.
    11. Marshall Burke & Lauren Falcao Bergquist & Edward Miguel, 2019. "Sell Low and Buy High: Arbitrage and Local Price Effects in Kenyan Markets," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 134(2), pages 785-842.
    12. Islam, Asif Mohammed & Gatti, Roberta V., 2024. "Dysfunctional Family Management: Family-Managed Businesses and the Quality of Management Practices," Policy Research Working Paper Series 10684, The World Bank.
    13. Grimm, Michael & Paffhausen, Anna Luisa, 2015. "Do interventions targeted at micro-entrepreneurs and small and medium-sized firms create jobs? A systematic review of the evidence for low and middle income countries," Labour Economics, Elsevier, vol. 32(C), pages 67-85.
    14. Catia Batista & Sandra Sequeira & Pedro C. Vicente, 2022. "Closing the Gender Profit Gap?," Management Science, INFORMS, vol. 68(12), pages 8553-8567, December.
    15. Hao Dong & Daniel L. Millimet, 2020. "Propensity Score Weighting with Mismeasured Covariates: An Application to Two Financial Literacy Interventions," JRFM, MDPI, vol. 13(11), pages 1-24, November.
    16. Yichen Luo & Yebo Feng & Jiahua Xu & Paolo Tasca & Yang Liu, 2025. "LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management," Papers 2501.00826, arXiv.org, revised Jan 2025.
    17. David McKenzie, 2017. "Identifying and Spurring High-Growth Entrepreneurship: Experimental Evidence from a Business Plan Competition," American Economic Review, American Economic Association, vol. 107(8), pages 2278-2307, August.
    18. Michael Kremer & Jonathan Robinson & Olga Rostapshova, 2014. "Success in Entrepreneurship: Doing the Math," NBER Chapters, in: African Successes, Volume II: Human Capital, pages 281-303, National Bureau of Economic Research, Inc.
    19. Macchiavello, Rocco & Morjaria, Ameet, 2022. "Acquisitions, Management, and Efficiency in Rwanda's Coffee Industry," CEPR Discussion Papers 17434, C.E.P.R. Discussion Papers.
    20. Thanos Konstantinidis & Giorgos Iacovides & Mingxue Xu & Tony G. Constantinides & Danilo Mandic, 2024. "FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications," Papers 2403.12285, arXiv.org.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.