Continuous‐time mean–variance portfolio selection: A reinforcement learning framework

Author

Haoran Wang
Xun Yu Zhou

Abstract

We approach the continuous‐time mean–variance portfolio selection with reinforcement learning (RL). The problem is to achieve the best trade‐off between exploration and exploitation, and is formulated as an entropy‐regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time‐decaying variance. We then prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm and its variant outperform both traditional and deep neural network based algorithms in our simulation and empirical studies.
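To make the phrase "Gaussian, with time-decaying variance" concrete, here is a minimal sketch of such a feedback policy. The abstract does not give the closed form, so the parametrization below is an assumption chosen for illustration: the mean is linear in the gap between current wealth and a target, and the variance is proportional to exp(rho^2 (horizon - t)), so exploration shrinks as the horizon approaches. All names (`policy_params`, `target`, `rho`, `sigma`, `lam`) and default values are hypothetical, and this is not the authors' algorithm.

```python
import math
import random


def policy_params(t, x, *, horizon=1.0, target=1.4, rho=0.3, sigma=0.2, lam=2.0):
    """Return (mean, variance) of an illustrative Gaussian feedback policy.

    Assumed form (not stated in the abstract):
      mean     = -(rho / sigma) * (x - target)
      variance = lam / (2 * sigma**2) * exp(rho**2 * (horizon - t))
    Here lam plays the role of an exploration weight (hypothetical name).
    """
    mean = -(rho / sigma) * (x - target)
    variance = lam / (2.0 * sigma ** 2) * math.exp(rho ** 2 * (horizon - t))
    return mean, variance


def sample_allocation(t, x, rng, **params):
    """Draw one allocation from the Gaussian policy at time t and wealth x."""
    mean, variance = policy_params(t, x, **params)
    return rng.gauss(mean, math.sqrt(variance))


def gaussian_entropy(variance):
    """Differential entropy of a one-dimensional Gaussian with this variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for t in (0.0, 0.5, 1.0):
        mean, var = policy_params(t, x=1.0)
        draw = sample_allocation(t, 1.0, rng)
        print(f"t={t:.1f} mean={mean:.3f} var={var:.3f} "
              f"entropy={gaussian_entropy(var):.3f} sample={draw:.3f}")
```

Under these assumptions the policy mean depends only on the wealth gap, while the variance (and hence the entropy) depends only on the time remaining. This mirrors the separation between exploitation and exploration that the abstract describes.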

Suggested Citation

Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.

Handle: RePEc:bla:mathfi:v:30:y:2020:i:4:p:1273-1308
DOI: 10.1111/mafi.12281

References listed on IDEAS

R. H. Strotz, 1955. "Myopia and Inconsistency in Dynamic Utility Maximization," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 23(3), pages 165-180.
Hutchinson, James M & Lo, Andrew W & Poggio, Tomaso, 1994. "A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks," Journal of Finance, American Finance Association, vol. 49(3), pages 851-889, July.
- James M. Hutchinson & Andrew W. Lo & Tomaso Poggio, 1994. "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks," NBER Working Papers 4718, National Bureau of Economic Research, Inc.
Duan Li & Wan‐Lung Ng, 2000. "Optimal Dynamic Portfolio Selection: Multiperiod Mean‐Variance Formulation," Mathematical Finance, Wiley Blackwell, vol. 10(3), pages 387-406, July.
Mannor, Shie & Tsitsiklis, John N., 2013. "Algorithmic aspects of mean–variance optimization in Markov decision processes," European Journal of Operational Research, Elsevier, vol. 231(3), pages 645-653.
Haoran Wang, 2019. "Large scale continuous-time mean-variance portfolio allocation via reinforcement learning," Papers 1907.11718, arXiv.org, revised Aug 2019.
David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.

Citations

Cited by:

Xiaofei Shi & Daran Xu & Zhanhao Zhang, 2021. "Deep Learning Algorithms for Hedging with Frictions," Papers 2111.01931, arXiv.org, revised Dec 2022.
Yuling Max Chen & Bin Li & David Saunders, 2025. "Exploratory Mean-Variance Portfolio Optimization with Regime-Switching Market Dynamics," Papers 2501.16659, arXiv.org.
Xuefeng Gao & Lingfei Li & Xun Yu Zhou, 2024. "Reinforcement Learning for Jump-Diffusions, with Financial Applications," Papers 2405.16449, arXiv.org, revised Jan 2025.
Xiaofei Shi & Daran Xu & Zhanhao Zhang, 2023. "Deep learning algorithms for hedging with frictions," Digital Finance, Springer, vol. 5(1), pages 113-147, March.
Min Dai & Yuchao Dong & Yanwei Jia & Xun Yu Zhou, 2023. "Data-Driven Merton's Strategies via Policy Randomization," Papers 2312.11797, arXiv.org, revised May 2025.
Chen Ziyi & Gu Jia-wen, 2025. "Exploratory Utility Maximization Problem with Tsallis Entropy," Papers 2502.01269, arXiv.org.
Xia Han & Ruodu Wang & Xun Yu Zhou, 2022. "Choquet regularization for reinforcement learning," Papers 2208.08497, arXiv.org.
Xiangyu Cui & Xun Li & Yun Shi & Si Zhao, 2023. "Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning," Papers 2312.15385, arXiv.org.
Jiang, Yifu & Olmo, Jose & Atwi, Majed, 2025. "High-dimensional multi-period portfolio allocation using deep reinforcement learning," International Review of Economics & Finance, Elsevier, vol. 98(C).
Wing Fung Chong & Haoen Cui & Yuxuan Li, 2021. "Pseudo-Model-Free Hedging for Variable Annuities via Deep Reinforcement Learning," Papers 2107.03340, arXiv.org, revised Oct 2022.
De Gennaro Aquino, Luca & Sornette, Didier & Strub, Moris S., 2023. "Portfolio selection with exploration of new investment assets," European Journal of Operational Research, Elsevier, vol. 310(2), pages 773-792.
Magni, Carlo Alberto & Marchioni, Andrea & Baschieri, Davide, 2023. "The Attribution Matrix and the joint use of Finite Change Sensitivity Index and Residual Income for value-based performance measurement," European Journal of Operational Research, Elsevier, vol. 306(2), pages 872-892.
Min Dai & Hanqing Jin & Xi Yang, 2024. "Data-driven Option Pricing," Papers 2401.11158, arXiv.org.
Carlo Alberto Magni & Andrea Marchioni, 2022. "Performance attribution, time-weighted rate of return, and clean finite change sensitivity index," Journal of Asset Management, Palgrave Macmillan, vol. 23(1), pages 62-72, February.
Shanyu Han & Yang Liu & Xiang Yu, 2025. "Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions," Papers 2505.04553, arXiv.org, revised May 2025.
Yu Li & Yuhan Wu & Shuhua Zhang, 2025. "The Exploratory Multi-Asset Mean-Variance Portfolio Selection using Reinforcement Learning," Papers 2505.07537, arXiv.org.
Sang Hu & Zihan Zhou, 2024. "Exploratory Dividend Optimization with Entropy Regularization," JRFM, MDPI, vol. 17(1), pages 1-23, January.
Alexandre Carbonneau & Frédéric Godin, 2021. "Deep equal risk pricing of financial derivatives with non-translation invariant risk measures," Papers 2107.11340, arXiv.org.
Dong-Mei Zhu & Jia-Wen Gu & Feng-Hui Yu & Tak-Kuen Siu & Wai-Ki Ching, 2021. "Optimal pairs trading with dynamic mean-variance objective," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 94(1), pages 145-168, August.
Jiang, Yifu & Olmo, Jose & Atwi, Majed, 2024. "Deep reinforcement learning for portfolio selection," Global Finance Journal, Elsevier, vol. 62(C).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Haoran Wang & Shi Yu, 2021. "Robo-Advising: Enhancing Investment with Inverse Optimization and Deep Reinforcement Learning," Papers 2105.09264, arXiv.org.
Haoran Wang & Xun Yu Zhou, 2019. "Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework," Papers 1904.11392, arXiv.org, revised May 2019.
Xiangyu Cui & Xun Li & Yun Shi & Si Zhao, 2023. "Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning," Papers 2312.15385, arXiv.org.
Xiang Meng, 2019. "Dynamic Mean-Variance Portfolio Optimisation," Papers 1907.03093, arXiv.org.
Zhou Fang, 2023. "Continuous-Time Path-Dependent Exploratory Mean-Variance Portfolio Construction," Papers 2303.02298, arXiv.org.
Xiangyu Cui & Xun Li & Duan Li & Yun Shi, 2014. "Time Consistent Behavior Portfolio Policy for Dynamic Mean-Variance Formulation," Papers 1408.6070, arXiv.org, revised Aug 2015.
Zhang, Caibin & Liang, Zhibin, 2022. "Optimal time-consistent reinsurance and investment strategies for a jump–diffusion financial market without cash," The North American Journal of Economics and Finance, Elsevier, vol. 59(C).
Li Xia, 2020. "Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2808-2827, December.
Y. Zhang & Z. Jin & J. Wei & G. Yin, 2022. "Mean-variance portfolio selection with dynamic attention behavior in a hidden Markov model," Papers 2205.08743, arXiv.org.
Amirhosein Mosavi & Yaser Faghan & Pedram Ghamisi & Puhong Duan & Sina Faizollahzadeh Ardabili & Ely Salwana & Shahab S. Band, 2020. "Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics," Mathematics, MDPI, vol. 8(10), pages 1-42, September.
Zhiping Chen & Liyuan Wang & Ping Chen & Haixiang Yao, 2019. "Continuous-Time Mean–Variance Optimization For Defined Contribution Pension Funds With Regime-Switching," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 22(06), pages 1-33, September.
Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
Ying Hu & Hanqing Jin & Xun Yu Zhou, 2020. "Consistent Investment of Sophisticated Rank-Dependent Utility Agents in Continuous Time," Working Papers hal-02624308, HAL.
Li, Yongwu & Li, Zhongfei, 2013. "Optimal time-consistent investment and reinsurance strategies for mean–variance insurers with state dependent risk aversion," Insurance: Mathematics and Economics, Elsevier, vol. 53(1), pages 86-97.
Huy Chau & Duy Nguyen & Thai Nguyen, 2024. "Continuous-time optimal investment with portfolio constraints: a reinforcement learning approach," Papers 2412.10692, arXiv.org.
Bodo Herzog & Sufyan Osamah, 2019. "Reverse Engineering of Option Pricing: An AI Application," IJFS, MDPI, vol. 7(4), pages 1-12, November.
Zilan Liu & Yijun Wang & Ya Huang & Jieming Zhou, 2022. "Optimal Time-Consistent Investment and Premium Control Strategies for Insurers with Constraint under the Heston Model," Mathematics, MDPI, vol. 10(7), pages 1-22, March.
Fießinger, Felix & Stadje, Mitja, 2025. "Time-consistent asset allocation for risk measures in a Lévy market," European Journal of Operational Research, Elsevier, vol. 321(2), pages 676-695.
Felix Fießinger & Mitja Stadje, 2023. "Time-Consistent Asset Allocation for Risk Measures in a Lévy Market," Papers 2305.09471, arXiv.org, revised Oct 2024.
Bingyan Han & Chi Seng Pun & Hoi Ying Wong, 2021. "Robust state-dependent mean–variance portfolio selection: a closed-loop approach," Finance and Stochastics, Springer, vol. 25(3), pages 529-561, July.
