The Exploratory Multi-Asset Mean-Variance Portfolio Selection using Reinforcement Learning

Author

Listed:

Yu Li
Yuhan Wu
Shuhua Zhang

Abstract

In this paper, we study continuous-time multi-asset mean-variance (MV) portfolio selection in a time-varying financial market using a reinforcement learning (RL) algorithm, specifically the soft actor-critic (SAC) algorithm. A family of Gaussian portfolio selections is derived, and a policy iteration process is constructed to learn the optimal exploratory portfolio selection. We prove the convergence of the policy iteration process theoretically and develop the SAC algorithm on that basis. To improve the algorithm's stability and learning accuracy in the multi-asset scenario, we divide the model parameters that influence the optimal portfolio selection into three parts and learn each part progressively. Numerical studies in simulated and real financial markets confirm the superior performance of the proposed SAC algorithm under various criteria.

Suggested Citation

Yu Li & Yuhan Wu & Shuhua Zhang, 2025. "The Exploratory Multi-Asset Mean-Variance Portfolio Selection using Reinforcement Learning," Papers 2505.07537, arXiv.org.

Handle: RePEc:arx:papers:2505.07537

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Yuanyuan Zhang & Xiang Li & Sini Guo, 2018. "Portfolio selection problems with Markowitz’s mean–variance framework: a review of literature," Fuzzy Optimization and Decision Making, Springer, vol. 17(2), pages 125-158, June.
Weilong Liu & Yanchu Liu, 2025. "Covariance Matrix Estimation for Positively Correlated Assets," Papers 2507.01545, arXiv.org.
Tu, Xueyong & Li, Bin, 2024. "Robust portfolio selection with smart return prediction," Economic Modelling, Elsevier, vol. 135(C).
Zhou, Zhongbao & Xiao, Helu & Yin, Jialing & Zeng, Ximei & Lin, Ling, 2016. "Pre-commitment vs. time-consistent strategies for the generalized multi-period portfolio optimization with stochastic cash flows," Insurance: Mathematics and Economics, Elsevier, vol. 68(C), pages 187-202.
Ruili Sun & Tiefeng Ma & Shuangzhe Liu & Milind Sathye, 2019. "Improved Covariance Matrix Estimation for Portfolio Risk Measurement: A Review," JRFM, MDPI, vol. 12(1), pages 1-34, March.
Lim Hao Shen Keith, 2024. "Covariance Matrix Analysis for Optimal Portfolio Selection," Papers 2407.08748, arXiv.org.
Johannes Bock, 2018. "An updated review of (sub-)optimal diversification models," Papers 1811.08255, arXiv.org.
De Gennaro Aquino, Luca & Sornette, Didier & Strub, Moris S., 2023. "Portfolio selection with exploration of new investment assets," European Journal of Operational Research, Elsevier, vol. 310(2), pages 773-792.
Shi, Fangquan & Shu, Lianjie & He, Fangyi & Huang, Wenpo, 2025. "Improving minimum-variance portfolio through shrinkage of large covariance matrices," Economic Modelling, Elsevier, vol. 144(C).
Luca De Gennaro Aquino & Sascha Desmettre & Yevhen Havrylenko & Mogens Steffensen, 2024. "Equilibrium control theory for Kihlstrom-Mirman preferences in continuous time," Papers 2407.16525, arXiv.org, revised Oct 2024.
Istvan Varga-Haszonits & Fabio Caccioli & Imre Kondor, 2016. "Replica approach to mean-variance portfolio optimization," Papers 1606.08679, arXiv.org.
Penaranda, Francisco, 2007. "Portfolio choice beyond the traditional approach," LSE Research Online Documents on Economics 24481, London School of Economics and Political Science, LSE Library.
Chen, Jia & Li, Degui & Linton, Oliver, 2019. "A new semiparametric estimation approach for large dynamic covariance matrices with multiple conditioning variables," Journal of Econometrics, Elsevier, vol. 212(1), pages 155-176.
- Jia Chen & Degui Li & Oliver Linton, 2018. "A New Semiparametric Estimation Approach for Large Dynamic Covariance Matrices with Multiple Conditioning Variables," Discussion Papers 18/14, Department of Economics, University of York.
- Chen, J. & Li, D. & Linton, O., 2018. "A New Semiparametric Estimation Approach for Large Dynamic Covariance Matrices with Multiple Conditioning Variables," Cambridge Working Papers in Economics 1876, Faculty of Economics, University of Cambridge.
Min Dai & Yuchao Dong & Yanwei Jia & Xun Yu Zhou, 2023. "Data-Driven Merton's Strategies via Policy Randomization," Papers 2312.11797, arXiv.org, revised Feb 2026.
Yan, Cheng & Zhang, Huazhu, 2017. "Mean-variance versus naïve diversification: The role of mispricing," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 48(C), pages 61-81.
Alex Shkolnik & Alec Kercheval & Hubeyb Gurdogan & Lisa R. Goldberg & Haim Bar, 2025. "Portfolio selection revisited," Annals of Operations Research, Springer, vol. 346(1), pages 137-155, March.
Long Zhao & Deepayan Chakrabarti & Kumar Muthuraman, 2019. "Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio," Operations Research, INFORMS, vol. 67(4), pages 965-983, July.
Qiao, W. & Bu, D. & Gibberd, A. & Liao, Y. & Wen, T. & Li, E., 2023. "When “time varying” volatility meets “transaction cost” in portfolio selection," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 220-237.
Raymond Kan & Xiaolu Wang & Guofu Zhou, 2022. "Optimal Portfolio Choice with Estimation Risk: No Risk-Free Asset Case," Management Science, INFORMS, vol. 68(3), pages 2047-2068, March.
Min Dai & Hanqing Jin & Steven Kou & Yuhong Xu, 2021. "A Dynamic Mean-Variance Analysis for Log Returns," Management Science, INFORMS, vol. 67(2), pages 1093-1108, February.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2025-06-30 (Big Data)
NEP-CMP-2025-06-30 (Computational Economics)
