PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Author

Listed:

Yuxuan Zhao
Sijia Chen
Ningxin Su

Abstract

Large language models (LLMs) have shown strong performance across diverse financial tasks, yet portfolio management (PM), a critical financial decision-making task, remains poorly benchmarked. Existing benchmarks exhibit two main gaps: they ignore cross-asset correlation structures, thereby failing to distinguish genuinely diversified portfolios from concentrated ones, and fail to evaluate the complete PM decision pipeline in real-world scenarios. We introduce PortBench, a benchmark spanning six heterogeneous asset classes over ten years. PortBench consists of two complementary layers: a static QA dataset of 6,269 correlation-based questions across seven task templates, and a dynamic five-stage allocation pipeline that mirrors the full PM decision cycle. To evaluate these layers, we introduce two dedicated metrics: a dual-layer correlation score that measures whether proposed portfolios exploit inter-class hedging and avoid intra-class concentration, and CEPS, a metric that quantifies how reasoning errors compound across pipeline stages. We further assess strategy robustness and investor alignment under three historical stress regimes and risk profiles. Evaluating ten frontier LLMs, we find that despite strong performance on static financial QA, 90% of model-profile combinations fail to outperform a basic equal-weight allocation, and models that satisfy every procedural constraint still suffer catastrophic drawdowns under stress. Our source code is available at https://github.com/AgenticFinLab/portbench.

Suggested Citation

Yuxuan Zhao & Sijia Chen & Ningxin Su, 2026. "PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management," Papers 2605.27887, arXiv.org, revised Jun 2026.

Handle: RePEc:arx:papers:2605.27887

References listed on IDEAS

Zichen Chen & Jiaao Chen & Jianda Chen & Misha Sra, 2025. "Standard Benchmarks Fail -- Auditing LLM Agents in Finance Must Prioritize Risk," Papers 2502.15865, arXiv.org, revised Jun 2025.
Jeremy Berkowitz & James O'Brien, 2002. "How Accurate Are Value‐at‐Risk Models at Commercial Banks?," Journal of Finance, American Finance Association, vol. 57(3), pages 1093-1111, June.
R. Mantegna, 1999. "Hierarchical structure in financial markets," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 11(1), pages 193-197, September.
- Rosario N. Mantegna, 1998. "Hierarchical Structure in Financial Markets," Papers cond-mat/9802256, arXiv.org.

Citations

Cited by:

Junyi Yao & Zihao Zheng, 2026. "Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems," Papers 2606.08285, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Schroeder, Anna Louise & Fryzlewicz, Piotr, 2013. "Adaptive trend estimation in financial time series via multiscale change-point-induced basis recovery," LSE Research Online Documents on Economics 54934, London School of Economics and Political Science, LSE Library.
- Schröder, Anna Louise & Fryzlewicz, Piotr, 2013. "Adaptive trend estimation in financial time series via multiscale change-point-induced basis recovery," MPRA Paper 52379, University Library of Munich, Germany.
Champagne, Claudia, 2014. "The international syndicated loan market network: An “unholy trinity”?," Global Finance Journal, Elsevier, vol. 25(2), pages 148-168.
Trindade, Graça & Dias, José G. & Ambrósio, Jorge, 2017. "Extracting clusters from aggregate panel data: A market segmentation study," Applied Mathematics and Computation, Elsevier, vol. 296(C), pages 277-288.
Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
João A. Bastos & Jorge Caiado, 2014. "Clustering financial time series with variance ratio statistics," Quantitative Finance, Taylor & Francis Journals, vol. 14(12), pages 2121-2133, December.
- Joao A. Bastos & Jorge Caiado, 2009. "Clustering financial time series with variance ratio statistics," CEMAPRE Working Papers 0904, Centre for Applied Mathematics and Economics (CEMAPRE), School of Economics and Management (ISEG), Technical University of Lisbon.
Sebastiano Michele Zema & Giorgio Fagiolo & Tiziano Squartini & Diego Garlaschelli, 2025. "Mesoscopic structure of the stock market and portfolio optimization," Journal of Economic Interaction and Coordination, Springer;Society for Economic Science with Heterogeneous Interacting Agents, vol. 20(2), pages 307-333, April.
- Sebastiano Michele Zema & Giorgio Fagiolo & Tiziano Squartini & Diego Garlaschelli, 2021. "Mesoscopic Structure of the Stock Market and Portfolio Optimization," Papers 2112.06544, arXiv.org.
- Sebastiano Michele Zema & Giorgio Fagiolo & Tiziano Squartini & Diego Garlaschelli, 2021. "Mesoscopic Structure of the Stock Market and Portfolio Optimization," LEM Papers Series 2021/45, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
Guido Caldarelli & Matthieu Cristelli & Andrea Gabrielli & Luciano Pietronero & Antonio Scala & Andrea Tacchella, 2012. "A Network Analysis of Countries’ Export Flows: Firm Grounds for the Building Blocks of the Economy," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-11, October.
Michelle B Graczyk & Sílvio M Duarte Queirós, 2017. "Intraday seasonalities and nonstationarity of trading volume in financial markets: Collective features," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-23, July.
Torben G. Andersen & Tim Bollerslev & Peter Christoffersen & Francis X. Diebold, 2007. "Practical Volatility and Correlation Modeling for Financial Market Risk Management," NBER Chapters, in: The Risks of Financial Institutions, pages 513-544, National Bureau of Economic Research, Inc.
- Torben G. Andersen & Tim Bollerslev & Peter F. Christoffersen & Francis X. Diebold, 2005. "Practical Volatility and Correlation Modeling for Financial Market Risk Management," NBER Working Papers 11069, National Bureau of Economic Research, Inc.
- Torben G. Andersen & Tim Bollerslev & Peter F. Christoffersen & Francis X. Diebold, 2005. "Practical Volatility and Correlation Modeling for Financial Market Risk Management," PIER Working Paper Archive 05-007, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
- Andersen, Torben G. & Bollerslev, Tim & Christoffersen, Peter F. & Diebold, Francis X., 2005. "Practical volatility and correlation modeling for financial market risk management," CFS Working Paper Series 2005/02, Center for Financial Studies (CFS).
Alexander, Gordon J. & Baptista, Alexandre M. & Yan, Shu, 2012. "When more is less: Using multiple constraints to reduce tail risk," Journal of Banking & Finance, Elsevier, vol. 36(10), pages 2693-2716.
Shamshuritawati Sharif, 2012. "Correlation Network Analysis of International Postgraduate Students’ Satisfaction in Top Malaysian Universities: A Robust Approach," Modern Applied Science, Canadian Center of Science and Education, vol. 6(12), pages 1-91, December.
Trancoso, Tiago, 2014. "Emerging markets in the global economic network: Real(ly) decoupling?," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 395(C), pages 499-510.
Lan, Hairong & Wang, Liukai & Li, Mengting & Xiong, Yu & Li, Yuqing, 2025. "Network centrality, diversification, and portfolio returns: Economic insights from blockchain industry," Economic Modelling, Elsevier, vol. 152(C).
Ixandra Achitouv, 2025. "Dynamical analysis of financial stocks network: Improving forecasting using network properties," PLOS ONE, Public Library of Science, vol. 20(5), pages 1-23, May.
López Pérez, Mario & Mansilla Corona, Ricardo, 2022. "Ordinal synchronization and typical states in high-frequency digital markets," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 598(C).
Christophe Hurlin & Christophe Pérignon, 2012. "Margin Backtesting," Working Papers halshs-00746274, HAL.
Paulus, Michal & Kristoufek, Ladislav, 2015. "Worldwide clustering of the corruption perception," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 428(C), pages 351-358.
- Michal Paulus & Ladislav Kristoufek, 2015. "Worldwide clustering of the corruption perception," Papers 1502.00104, arXiv.org.
Miguel A. Ferreira, 2005. "Evaluating Interest Rate Covariance Models Within a Value-at-Risk Framework," Journal of Financial Econometrics, Oxford University Press, vol. 3(1), pages 126-168.
- Miguel A. Ferreira & Jose A. Lopez, 2004. "Evaluating Interest Rate Covariance Models within a Value-at-Risk Framework," Working Paper Series 2004-03, Federal Reserve Bank of San Francisco.
Chen, Binxia & Jiang, Yuanying & Zhou, Donghai, 2025. "Risk contagion network and characteristic measurement among international financial markets," Pacific-Basin Finance Journal, Elsevier, vol. 92(C).
Roy Cerqueti & Pierpaolo D’Urso & Livia Giovanni & Raffaele Mattera & Vincenzina Vitale, 2024. "Fuzzy clustering of time series based on weighted conditional higher moments," Computational Statistics, Springer, vol. 39(6), pages 3091-3114, September.
