Printed from https://ideas.repec.org/p/arx/papers/2512.04697.html

Continuous-time reinforcement learning for optimal switching over multiple regimes

Author

Listed:
  • Yijie Huang
  • Mengge Li
  • Xiang Yu
  • Zhou Zhou

Abstract

This paper studies continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider an exploratory formulation under entropy regularization in which the agent randomizes both the timing of switches and the selection of regimes through the generator matrix of an associated continuous-time finite-state Markov chain. We establish the well-posedness of the associated system of Hamilton-Jacobi-Bellman (HJB) equations and provide a characterization of the optimal policy. Policy improvement and the convergence of the policy iteration are rigorously established by analyzing the system of equations. We also show that the value function of the exploratory formulation converges to the value function of the classical formulation as the temperature parameter vanishes. Finally, a reinforcement learning algorithm is devised and implemented by invoking policy evaluation based on the martingale characterization. Numerical examples using neural networks illustrate the effectiveness of the proposed RL algorithm.
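The abstract describes a policy that randomizes both when to switch and which regime to switch to through the generator matrix of a finite-state Markov chain. The sketch below illustrates only that mechanism, under stated assumptions: the off-diagonal rates q_ij are fixed, hypothetical numbers standing in for a learned policy, the payoff (a constant reward rate per regime minus a lump cost per switch, discounted) is an illustrative choice, and the helper names make_generator, simulate_regimes and discounted_payoff are invented for this example. It is not the authors' model or algorithm; it simply simulates the regime chain and estimates a discounted payoff by Monte Carlo.

```python
import math
import random
from typing import List, Tuple

Path = List[Tuple[float, int]]  # (time of entry, regime entered)


def make_generator(rates: List[List[float]]) -> List[List[float]]:
    """Build a generator matrix Q from nonnegative off-diagonal switching rates.

    Off-diagonal q_ij is the intensity of switching from regime i to regime j;
    the diagonal is set so every row sums to zero: q_ii = -sum_{j != i} q_ij.
    """
    n = len(rates)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                if rates[i][j] < 0:
                    raise ValueError("off-diagonal rates must be nonnegative")
                q[i][j] = float(rates[i][j])
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def simulate_regimes(q: List[List[float]], start: int, horizon: float,
                     rng: random.Random) -> Path:
    """Simulate the regime chain with generator q on [0, horizon).

    The holding time in regime i is exponential with rate -q_ii (random timing),
    and the next regime j != i is drawn with probability q_ij / (-q_ii)
    (random selection).
    """
    path: Path = [(0.0, start)]
    t, i = 0.0, start
    while True:
        total_rate = -q[i][i]
        if total_rate <= 0.0:  # no outgoing intensity: stay in regime i
            break
        t += rng.expovariate(total_rate)
        if t >= horizon:
            break
        others = [j for j in range(len(q)) if j != i]
        i = rng.choices(others, weights=[q[i][j] for j in others], k=1)[0]
        path.append((t, i))
    return path


def discounted_payoff(path: Path, horizon: float, reward_rate: List[float],
                      switch_cost: List[List[float]], discount: float) -> float:
    """Discounted running reward minus discounted switching costs along one path.

    Hypothetical payoff: constant reward rate reward_rate[i] while in regime i,
    and a lump cost switch_cost[i][j] paid at each switch from i to j.
    """
    total = 0.0
    for k, (t0, i) in enumerate(path):
        t1 = path[k + 1][0] if k + 1 < len(path) else horizon
        if discount > 0.0:  # integral of exp(-discount * s) over [t0, t1]
            weight = (math.exp(-discount * t0) - math.exp(-discount * t1)) / discount
        else:
            weight = t1 - t0
        total += reward_rate[i] * weight
        if k + 1 < len(path):
            total -= math.exp(-discount * t1) * switch_cost[i][path[k + 1][1]]
    return total


if __name__ == "__main__":
    # Hypothetical 3-regime example: fixed rates stand in for a learned policy.
    q = make_generator([[0.0, 1.0, 0.5], [0.2, 0.0, 0.3], [0.4, 0.4, 0.0]])
    rewards = [1.0, 2.0, 0.5]
    costs = [[0.0, 0.3, 0.3], [0.3, 0.0, 0.3], [0.3, 0.3, 0.0]]
    rng = random.Random(42)
    n_paths = 10_000
    estimate = sum(
        discounted_payoff(simulate_regimes(q, 0, 5.0, rng), 5.0, rewards, costs, 0.1)
        for _ in range(n_paths)
    ) / n_paths
    print(f"Monte Carlo estimate of the discounted payoff from regime 0: {estimate:.3f}")
```

Treating the policy as a generator matrix means that in the paper's setting the entropy regularization and the learning would act on the rates q_ij. That layer, the HJB system, and the martingale-based policy evaluation are deliberately left out of this sketch.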

Suggested Citation

  • Yijie Huang & Mengge Li & Xiang Yu & Zhou Zhou, 2025. "Continuous-time reinforcement learning for optimal switching over multiple regimes," Papers 2512.04697, arXiv.org, revised Dec 2025.
  • Handle: RePEc:arx:papers:2512.04697

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.04697
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Rene Carmona & Michael Ludkovski, 2008. "Pricing Asset Scheduling Flexibility using Optimal Switching," Applied Mathematical Finance, Taylor & Francis Journals, vol. 15(5-6), pages 405-447.
    2. El Asri, Brahim, 2013. "Stochastic optimal multi-modes switching with a viscosity solution approach," Stochastic Processes and their Applications, Elsevier, vol. 123(2), pages 579-602.
    3. Rene Carmona & Michael Ludkovski, 2010. "Valuation of energy storage: an optimal switching approach," Quantitative Finance, Taylor & Francis Journals, vol. 10(4), pages 359-374.
    4. Jodi Dianetti & Giorgio Ferrari & Renyuan Xu, 2024. "Exploratory Optimal Stopping: A Singular Control Formulation," Papers 2408.09335, arXiv.org, revised Mar 2026.
    5. Wu, Bo & Li, Lingfei, 2024. "Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market," Journal of Economic Dynamics and Control, Elsevier, vol. 158(C).
    6. Xiaoli Wei & Xiang Yu & Fengyi Yuan, 2024. "Unified continuous-time q-learning for mean-field game and mean-field control problems," Papers 2407.04521, arXiv.org, revised Mar 2025.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

    Cited by:

    1. Daniel Chee & Noufel Frikha & Libo Li, 2026. "A Monotone Limit Approach to Entropy-Regularized American Options," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-05520656, HAL.
    2. Daniel Chee & Noufel Frikha & Libo Li, 2026. "A Monotone Limit Approach to Entropy-Regularized American Options," Papers 2602.18062, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dabadghao, Shaunak S. & Chockalingam, Arun & Soltani, Taimaz & Fransoo, Jan, 2021. "Valuing Switching options with the moving-boundary method," Journal of Economic Dynamics and Control, Elsevier, vol. 127(C).
    2. Dabadghao, Shaunak S. & Chockalingam, Arun & Soltani, Taimaz & Fransoo, Jan C., 2021. "Valuing switching options with the moving-boundary method," Other publications TiSEM 45fe7e78-129f-4d41-ac2f-5, Tilburg University, School of Economics and Management.
    3. Cortazar, Gonzalo & Naranjo, Lorenzo & Sainz, Felipe, 2021. "Optimal decision policy for real options under general Markovian dynamics," European Journal of Operational Research, Elsevier, vol. 288(2), pages 634-647.
    4. Junyan Ye & Hoi Ying Wong & Kyunghyun Park, 2025. "Robust Exploratory Stopping under Ambiguity in Reinforcement Learning," Papers 2510.10260, arXiv.org, revised Apr 2026.
    5. Liangchen Li & Michael Ludkovski, 2018. "Stochastic Switching Games," Papers 1807.03893, arXiv.org.
    6. Somayeh Moazeni & Warren B. Powell & Boris Defourny & Belgacem Bouzaiene-Ayari, 2017. "Parallel Nonstationary Direct Policy Search for Risk-Averse Stochastic Optimization," INFORMS Journal on Computing, INFORMS, vol. 29(2), pages 332-349, May.
    7. Lin Zhao & Sweder van Wijnbergen, 2015. "Asset Pricing in Incomplete Markets: Valuing Gas Storage Capacity," Tinbergen Institute Discussion Papers 15-104/VI/DSF95, Tinbergen Institute.
    8. Anton A. Shardin & Michaela Szölgyenyi, 2016. "Optimal Control Of An Energy Storage Facility Under A Changing Economic Environment And Partial Information," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 19(04), pages 1-27, June.
    9. Gassiat, Paul & Kharroubi, Idris & Pham, Huyên, 2012. "Time discretization and quantization methods for optimal multiple switching problem," Stochastic Processes and their Applications, Elsevier, vol. 122(5), pages 2019-2052.
    10. repec:dau:papers:123456789/11439 is not listed on IDEAS
    11. Erhan Bayraktar & Qi Feng & Zhaoyu Zhang, 2022. "Deep Signature Algorithm for Multi-dimensional Path-Dependent Options," Papers 2211.11691, arXiv.org, revised Jan 2024.
    12. Roxana Dumitrescu & Redouane Silvente & Peter Tankov, 2024. "Price impact and long-term profitability of energy storage," Papers 2410.12495, arXiv.org.
    13. Daniel R. Jiang & Warren B. Powell, 2015. "An Approximate Dynamic Programming Algorithm for Monotone Value Functions," Operations Research, INFORMS, vol. 63(6), pages 1489-1511, December.
    14. Hanfeld, Marc & Schlüter, Stephan, 2016. "Operating a swing option on today's gas markets: How least squares Monte Carlo works and why it is beneficial," FAU Discussion Papers in Economics 10/2016, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    15. Nasini, Stefano & Nessah, Rabia & Wigniolle, Bertrand, 2026. "Learning paths to multi-sector equilibrium: Belief dynamics under uncertain returns to scale," Journal of Mathematical Economics, Elsevier, vol. 122(C).
    16. Nemat Safarov & Colin Atkinson, 2017. "Natural Gas-Fired Power Plants Valuation And Optimization Under Lévy Copulas And Regime Switching," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 20(01), pages 1-38, February.
    17. Yuling Max Chen & Bin Li & David Saunders, 2025. "Exploratory Mean-Variance Portfolio Optimization with Regime-Switching Market Dynamics," Papers 2501.16659, arXiv.org.
    18. Cartea, Álvaro & González-Pedraz, Carlos, 2012. "How much should we pay for interconnecting electricity markets? A real options approach," Energy Economics, Elsevier, vol. 34(1), pages 14-30.
    19. Bastian Felix, 2012. "Gas Storage Valuation: A Comparative Simulation Study," EWL Working Papers 1201, University of Duisburg-Essen, Chair for Management Science and Energy Economics, revised Apr 2014.
    20. Woo, C.K. & Chen, Y. & Olson, A. & Moore, J. & Schlag, N. & Ong, A. & Ho, T., 2017. "Electricity price behavior and carbon trading: New evidence from California," Applied Energy, Elsevier, vol. 204(C), pages 531-543.
    21. Mononen, Lasse, 2025. "On Preference for Simplicity and Probability Weighting," Center for Mathematical Economics Working Papers 748, Center for Mathematical Economics, Bielefeld University.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2512.04697. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators. General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.