Printed from https://ideas.repec.org/p/arx/papers/2604.22188.html

Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints

Author

Listed:
  • Thai Nguyen
  • Pertiny Nkuize

Abstract

We study the problem of optimal portfolio selection under stochastic volatility within a continuous-time reinforcement learning framework with portfolio constraints. Exploration is modeled through entropy-regularized relaxed controls, where the investor selects probability distributions over admissible portfolio allocations rather than deterministic strategies. Using dynamic programming arguments, we derive the associated entropy-regularized Hamilton-Jacobi-Bellman (HJB) equation, whose Hamiltonian involves optimization over probability measures supported on a compact control set. We show that the optimal exploratory policy takes the form of a truncated Gaussian distribution characterized by spatial derivatives of the solution of the resulting quasilinear parabolic partial differential equation. Under suitable structural conditions on the model coefficients, we prove the existence of classical solutions to this nonlinear HJB equation for the value function. We then establish a verification theorem and analyze the policy-improvement structure induced by the entropy-regularized Hamiltonian, showing how the resulting sequence of PDEs provides a continuous-time interpretation of actor-critic learning dynamics. Finally, our PDE analysis, which yields semi-closed-form expressions for the optimal value and the optimal policy, enables the design of an implementable reinforcement learning algorithm by recasting the optimization problem in a martingale framework.
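
The truncated Gaussian form mentioned in the abstract can be illustrated with a small sketch. It relies on a standard fact: maximizing the expected value of a function H(a) plus lam times the entropy over densities on a compact interval gives a Gibbs density proportional to exp(H(a) / lam), and when H is a concave quadratic in a this is a Gaussian restricted to the interval. The quadratic H(a) = h1 * a - 0.5 * h2 * a**2 and the names h1, h2, lam, lo, hi below are assumptions made for illustration only. In the paper the corresponding coefficients are characterized by derivatives of the value function, which this sketch does not compute, and the code is not the authors' algorithm.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf


def gibbs_gaussian_params(h1, h2, lam):
    """Mean and std of the Gaussian proportional to exp(H(a) / lam).

    H(a) = h1 * a - 0.5 * h2 * a**2 is a hypothetical concave quadratic
    (h2 > 0); lam > 0 is the entropy-regularization weight.
    """
    if h2 <= 0 or lam <= 0:
        raise ValueError("h2 and lam must be positive")
    return h1 / h2, math.sqrt(lam / h2)


def sample_truncated_gaussian(mean, std, lo, hi, rng):
    """Draw one allocation from N(mean, std^2) restricted to [lo, hi].

    Uses inverse-CDF sampling: map a uniform draw into the probability
    mass that the untruncated Gaussian assigns to [lo, hi].
    """
    p_lo = STD_NORMAL.cdf((lo - mean) / std)
    p_hi = STD_NORMAL.cdf((hi - mean) / std)
    p = p_lo + rng.random() * (p_hi - p_lo)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < p < 1
    a = mean + std * STD_NORMAL.inv_cdf(p)
    return min(max(a, lo), hi)  # guard against rounding at the boundary


# Illustrative values only: allocations constrained to [0, 1]
# (no short selling, no borrowing).
rng = random.Random(0)
mean, std = gibbs_gaussian_params(h1=0.5, h2=2.0, lam=0.5)
draws = [sample_truncated_gaussian(mean, std, 0.0, 1.0, rng) for _ in range(1000)]
```

In this sketch a larger lam widens the distribution (more exploration), while lam close to zero concentrates the draws near the unconstrained maximizer h1 / h2 whenever that point lies inside [lo, hi].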

Suggested Citation

  • Thai Nguyen & Pertiny Nkuize, 2026. "Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints," Papers 2604.22188, arXiv.org.
  • Handle: RePEc:arx:papers:2604.22188

    Download full text from publisher

    File URL: https://arxiv.org/pdf/2604.22188
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Bekiros, Stelios D., 2010. "Heterogeneous trading strategies with adaptive fuzzy Actor-Critic reinforcement learning: A behavioral approach," Journal of Economic Dynamics and Control, Elsevier, vol. 34(6), pages 1153-1170, June.
    2. Duffie, Darrell & Lions, Pierre-Louis, 1992. "PDE solutions of stochastic differential utility," Journal of Mathematical Economics, Elsevier, vol. 21(6), pages 577-606.
    3. Chiarella, Carl & He, Xue-Zhong & Wei, Lijian, 2015. "Learning, information processing and order submission in limit order markets," Journal of Economic Dynamics and Control, Elsevier, vol. 61(C), pages 245-268.
    4. Arifovic, Jasmina & He, Xue-zhong & Wei, Lijian, 2022. "Machine learning and speed in high-frequency trading," Journal of Economic Dynamics and Control, Elsevier, vol. 139(C).
    5. George Chacko & Luis M. Viceira, 2005. "Dynamic Consumption and Portfolio Choice with Stochastic Volatility in Incomplete Markets," The Review of Financial Studies, Society for Financial Studies, vol. 18(4), pages 1369-1402.
    6. Bandyopadhyay, Arka Prava & Maliar, Lilia, 2026. "Reinforcement learning for household finance: Designing policy via responsiveness," Journal of Economic Dynamics and Control, Elsevier, vol. 182(C).
    7. Holger Kraft, 2005. "Optimal portfolios and Heston's stochastic volatility model: an explicit solution for power utility," Quantitative Finance, Taylor & Francis Journals, vol. 5(3), pages 303-313.
    8. Chau, Huy & Nguyen, Duy & Nguyen, Thai, 2026. "Continuous-time optimal investment with portfolio constraints: A reinforcement learning approach," European Journal of Operational Research, Elsevier, vol. 328(3), pages 1068-1092.
    9. Pascal J. Maenhout, 2004. "Robust Portfolio Rules and Asset Pricing," The Review of Financial Studies, Society for Financial Studies, vol. 17(4), pages 951-983.
    10. Lars Peter Hansen & Thomas J Sargent, 2014. "Robust Control and Model Uncertainty," World Scientific Book Chapters, in: UNCERTAINTY WITHIN ECONOMIC MODELS, chapter 5, pages 145-154, World Scientific Publishing Co. Pte. Ltd.
    11. Duffie, Darrell & Epstein, Larry G, 1992. "Stochastic Differential Utility," Econometrica, Econometric Society, vol. 60(2), pages 353-394, March.
    12. Maenhout, Pascal J., 2006. "Robust portfolio rules and detection-error probabilities for a mean-reverting risk premium," Journal of Economic Theory, Elsevier, vol. 128(1), pages 136-163, May.
    13. Merton, Robert C., 1980. "On estimating the expected return on the market : An exploratory investigation," Journal of Financial Economics, Elsevier, vol. 8(4), pages 323-361, December.
    14. Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.
    15. Jun Liu, 2007. "Portfolio Selection in Stochastic Environments," The Review of Financial Studies, Society for Financial Studies, vol. 20(1), pages 1-39, January.
    16. John Y. Campbell & Luis M. Viceira, 1999. "Consumption and Portfolio Decisions when Expected Returns are Time Varying," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 114(2), pages 433-495.
    17. Eduardo Abi Jaber, 2024. "Simulation of square-root processes made simple: applications to the Heston model," Post-Print hal-04839193, HAL.
    18. Wu, Bo & Li, Lingfei, 2024. "Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market," Journal of Economic Dynamics and Control, Elsevier, vol. 158(C).
    19. Yanwei Jia & Xun Yu Zhou, 2021. "Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms," Papers 2111.11232, arXiv.org, revised Jul 2022.
    20. Duffie, Darrell & Epstein, Larry G, 1992. "Asset Pricing with Stochastic Differential Utility," The Review of Financial Studies, Society for Financial Studies, vol. 5(3), pages 411-436.
    21. N. El Karoui & S. Peng & M. C. Quenez, 1997. "Backward Stochastic Differential Equations in Finance," Mathematical Finance, Wiley Blackwell, vol. 7(1), pages 1-71, January.
    22. Yanwei Jia & Xun Yu Zhou, 2021. "Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach," Papers 2108.06655, arXiv.org, revised Feb 2022.
    23. Cuoco, Domenico, 1997. "Optimal Consumption and Equilibrium Prices with Portfolio Constraints and Stochastic Income," Journal of Economic Theory, Elsevier, vol. 72(1), pages 33-73, January.
    24. Merton, Robert C, 1969. "Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case," The Review of Economics and Statistics, MIT Press, vol. 51(3), pages 247-257, August.
    25. Kim, Tong Suk & Omberg, Edward, 1996. "Dynamic Nonmyopic Portfolio Behavior," The Review of Financial Studies, Society for Financial Studies, vol. 9(1), pages 141-161.
    26. Heston, Steven L, 1993. "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options," The Review of Financial Studies, Society for Financial Studies, vol. 6(2), pages 327-343.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wei, Pengyu & Yang, Charles & Zhuang, Yi, 2023. "Robust consumption and portfolio choice with derivatives trading," European Journal of Operational Research, Elsevier, vol. 304(2), pages 832-850.
    2. Min Dai & Yuchao Dong & Yanwei Jia & Xun Yu Zhou, 2023. "Data-Driven Merton's Strategies via Policy Randomization," Papers 2312.11797, arXiv.org, revised Feb 2026.
    3. Marcos Escobar-Anel & Ben Spies & Rudi Zagst, 2024. "Optimal consumption and investment in general affine GARCH models," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 46(3), pages 987-1026, September.
    4. Shigeta, Yuki, 2020. "Gain/loss asymmetric stochastic differential utility," Journal of Economic Dynamics and Control, Elsevier, vol. 118(C).
    5. Holger Kraft & Thomas Seiferling & Frank Thomas Seifried, 2017. "Optimal consumption and investment with Epstein–Zin recursive utility," Finance and Stochastics, Springer, vol. 21(1), pages 187-226, January.
    6. Yiwen Shen & Chenxu Li & Olivier Scaillet & Yueting Jiang, 2026. "Dynamic Portfolio Allocation Under Market Incompleteness and Wealth Effects," Operations Research, INFORMS, vol. 74(1), pages 93-117, January.
    7. Hao Xing, 2017. "Consumption–investment optimization with Epstein–Zin utility in incomplete markets," Finance and Stochastics, Springer, vol. 21(1), pages 227-262, January.
    8. Maenhout, Pascal J., 2006. "Robust portfolio rules and detection-error probabilities for a mean-reverting risk premium," Journal of Economic Theory, Elsevier, vol. 128(1), pages 136-163, May.
    9. Aït-Sahalia, Yacine & Matthys, Felix, 2019. "Robust consumption and portfolio policies when asset prices can jump," Journal of Economic Theory, Elsevier, vol. 179(C), pages 1-56.
    10. Yi, Bo & Li, Zhongfei & Viens, Frederi G. & Zeng, Yan, 2013. "Robust optimal control for an insurer with reinsurance and investment under Heston’s stochastic volatility model," Insurance: Mathematics and Economics, Elsevier, vol. 53(3), pages 601-614.
    11. Suresh M. Sundaresan, 2000. "Continuous‐Time Methods in Finance: A Review and an Assessment," Journal of Finance, American Finance Association, vol. 55(4), pages 1569-1622, August.
    12. Johannes Muhle-Karbe & Max Reppen & H. Mete Soner, 2016. "A Primer on Portfolio Choice with Small Transaction Costs," Papers 1612.01302, arXiv.org, revised May 2017.
    13. Escobar, Marcos & Ferrando, Sebastian & Rubtsov, Alexey, 2018. "Dynamic derivative strategies with stochastic interest rates and model uncertainty," Journal of Economic Dynamics and Control, Elsevier, vol. 86(C), pages 49-71.
    14. Schroder, Mark & Skiadas, Costis, 2005. "Lifetime consumption-portfolio choice under trading constraints, recursive preferences, and nontradeable income," Stochastic Processes and their Applications, Elsevier, vol. 115(1), pages 1-30, January.
    15. Anis Matoussi & Hao Xing, 2016. "Convex duality for stochastic differential utility," Papers 1601.03562, arXiv.org.
    16. Wu, Hui & Ma, Chaoqun & Yue, Shengjie, 2017. "Momentum in strategic asset allocation," International Review of Economics & Finance, Elsevier, vol. 47(C), pages 115-127.
    17. Moreira, Alan & Muir, Tyler, 2019. "Should Long-Term Investors Time Volatility?," Journal of Financial Economics, Elsevier, vol. 131(3), pages 507-527.
    18. Yanwei Jia, 2024. "Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty," Papers 2404.12598, arXiv.org, revised Mar 2026.
    19. Jessica A. Wachter, 2010. "Asset Allocation," Annual Review of Financial Economics, Annual Reviews, vol. 2(1), pages 175-206, December.
    20. Yilie Huang & Yanwei Jia & Xun Yu Zhou, 2024. "Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study," Papers 2412.16175, arXiv.org, revised Mar 2026.
