IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2604.02035.html

Reinforcement Learning for Speculative Trading under Exploratory Framework

Author

Listed:
  • Yun Zhao
  • Alex S. L. Tse
  • Harry Zheng

Abstract

We study a speculative trading problem within the exploratory reinforcement learning (RL) framework of Wang et al. [2020]. The problem is formulated as a sequential optimal stopping problem over entry and exit times under a general utility function and price process. We first consider a relaxed version of the problem in which the stopping times are modeled by the jump times of Cox processes driven by bounded, non-randomized intensity controls. Under the exploratory formulation, the agent's randomized control is characterized via a probability measure over the jump intensities, and the objective function is regularized by Shannon's differential entropy. This yields a system of exploratory HJB equations, with Gibbs distributions in closed form as the optimal policy. Error estimates and convergence of the RL objective to the value function of the original problem are established. Finally, an RL algorithm is designed, and its implementation is showcased in a pairs-trading application.
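The key mechanism in the abstract — entropy regularization making the optimal randomized control a Gibbs distribution over jump intensities — can be illustrated with a minimal sketch. This is not the paper's algorithm; the quadratic "reward" over intensities, the grid, and the bound on the intensity are all illustrative assumptions. It only shows how a Gibbs policy is formed from values over a discretized, bounded intensity set, and how the entropy temperature trades off exploitation against exploration.

```python
import numpy as np

def gibbs_policy(q_values, temperature):
    """Entropy-regularized optimal policy over a discretized action set:
    pi(l) proportional to exp(q(l) / temperature)."""
    z = q_values / temperature
    z -= z.max()          # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()    # normalize to a probability distribution

def entropy(p):
    """Shannon entropy of a discrete distribution (ignoring zero mass)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical values over a grid of bounded jump intensities in [0, 2];
# the concave shape peaking at 1.3 is purely illustrative.
intensities = np.linspace(0.0, 2.0, 201)
q = -(intensities - 1.3) ** 2

pi_cold = gibbs_policy(q, temperature=0.05)  # near-greedy: mass concentrates at the maximizer
pi_hot = gibbs_policy(q, temperature=5.0)    # strong exploration: close to uniform
```

A smaller temperature recovers the non-exploratory (greedy) intensity choice, while a larger temperature spreads probability mass and increases the policy's entropy — the exploration effect the entropy regularizer buys.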

Suggested Citation

  • Yun Zhao & Alex S. L. Tse & Harry Zheng, 2026. "Reinforcement Learning for Speculative Trading under Exploratory Framework," Papers 2604.02035, arXiv.org.
  • Handle: RePEc:arx:papers:2604.02035

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2604.02035
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Alex S. L. Tse & Harry Zheng, 2023. "Speculative trading, prospect theory and transaction costs," Finance and Stochastics, Springer, vol. 27(1), pages 49-96, January.
    2. Yun Zhao & Harry Zheng, 2025. "Neural Network Convergence for Variational Inequalities," Papers 2509.26535, arXiv.org, revised Oct 2025.
    3. Sebastian Becker & Patrick Cheridito & Arnulf Jentzen & Timo Welti, 2019. "Solving high-dimensional optimal stopping problems using deep learning," Papers 1908.01602, arXiv.org, revised Aug 2021.
    4. Justin Sirignano & Konstantinos Spiliopoulos, 2017. "DGM: A deep learning algorithm for solving partial differential equations," Papers 1708.07469, arXiv.org, revised Sep 2018.
    5. David Silver & Julian Schrittwieser & Karen Simonyan & Ioannis Antonoglou & Aja Huang & Arthur Guez & Thomas Hubert & Lucas Baker & Matthew Lai & Adrian Bolton & Yutian Chen & Timothy Lillicrap & Fan , 2017. "Mastering the game of Go without human knowledge," Nature, Nature, vol. 550(7676), pages 354-359, October.
    6. Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.
    7. Henderson, Vicky & Hobson, David & Tse, Alex S.L., 2018. "Probability weighting, stop-loss and the disposition effect," Journal of Economic Theory, Elsevier, vol. 178(C), pages 360-397.
    8. Yanwei Jia & Xun Yu Zhou, 2021. "Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms," Papers 2111.11232, arXiv.org, revised Jul 2022.
    9. Daya Guo & Dejian Yang & Haowei Zhang & Junxiao Song & Peiyi Wang & Qihao Zhu & Runxin Xu & Ruoyu Zhang & Shirong Ma & Xiao Bi & Xiaokang Zhang & Xingkai Yu & Yu Wu & Z. F. Wu & Zhibin Gou & Zhihong S, 2025. "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," Nature, Nature, vol. 645(8081), pages 633-638, September.
    10. Henderson, Vicky & Hobson, David & Tse, Alex S.L., 2017. "Randomized strategies and prospect theory in a dynamic context," Journal of Economic Theory, Elsevier, vol. 168(C), pages 287-300.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Junyan Ye & Hoi Ying Wong & Kyunghyun Park, 2025. "Robust Exploratory Stopping under Ambiguity in Reinforcement Learning," Papers 2510.10260, arXiv.org, revised Apr 2026.
    2. A. Max Reppen & H. Mete Soner & Valentin Tissot-Daguette, 2022. "Deep Stochastic Optimization in Finance," Papers 2205.04604, arXiv.org.
    3. Jirong Zhuang & Deng Ding & Weiguo Lu & Xuan Wu & Gangnan Yuan, 2025. "A Gaussian Process Based Method with Deep Kernel Learning for Pricing High-Dimensional American Options," Computational Economics, Springer;Society for Computational Economics, vol. 66(5), pages 3687-3708, November.
    4. Sebastian Becker & Patrick Cheridito & Arnulf Jentzen, 2020. "Pricing and Hedging American-Style Options with Deep Learning," JRFM, MDPI, vol. 13(7), pages 1-12, July.
    5. Lukas Gonon, 2024. "Deep neural network expressivity for optimal stopping problems," Finance and Stochastics, Springer, vol. 28(3), pages 865-910, July.
    6. A. Max Reppen & H. Mete Soner & Valentin Tissot-Daguette, 2023. "Deep stochastic optimization in finance," Digital Finance, Springer, vol. 5(1), pages 91-111, March.
    7. Philipp Grohs & Arnulf Jentzen & Diyora Salimova, 2022. "Deep neural network approximations for solutions of PDEs based on Monte Carlo algorithms," Partial Differential Equations and Applications, Springer, vol. 3(4), pages 1-41, August.
    8. Huy Chau & Duy Nguyen & Thai Nguyen, 2024. "Continuous-time optimal investment with portfolio constraints: a reinforcement learning approach," Papers 2412.10692, arXiv.org.
    9. Kerimkulov, Bekzhan & Šiška, David & Szpruch, Łukasz & Zhang, Yufei, 2025. "Mirror descent for stochastic control problems with measure-valued controls," Stochastic Processes and their Applications, Elsevier, vol. 190(C).
    10. Yuchao Dong, 2022. "Randomized Optimal Stopping Problem in Continuous time and Reinforcement Learning Algorithm," Papers 2208.02409, arXiv.org, revised Sep 2023.
    11. He, Xuedong & Hu, Sang, 2024. "Never stop or never start? Optimal stopping under a mixture of CPT and EUT preferences," Journal of Economic Theory, Elsevier, vol. 222(C).
    12. Vicky Henderson & David Hobson & Matthew Zeng, 2023. "Cautious stochastic choice, optimal stopping and deliberate randomization," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 75(3), pages 887-922, April.
    13. Lukas Gonon, 2022. "Deep neural network expressivity for optimal stopping problems," Papers 2210.10443, arXiv.org.
    14. Beatriz Salvador & Cornelis W. Oosterlee & Remco van der Meer, 2020. "Financial Option Valuation by Unsupervised Learning with Artificial Neural Networks," Mathematics, MDPI, vol. 9(1), pages 1-20, December.
    15. Jasper Rou, 2025. "Time Deep Gradient Flow Method for pricing American options," Papers 2507.17606, arXiv.org.
    16. Xiangyu Cui & Xun Li & Yun Shi & Si Zhao, 2023. "Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning," Papers 2312.15385, arXiv.org.
    17. Serena Della Corte & Laurens Van Mieghem & Antonis Papapantoleon & Jonas Papazoglou-Hennig, 2023. "Machine learning for option pricing: an empirical investigation of network architectures," Papers 2307.07657, arXiv.org, revised Jan 2026.
    18. Christian Beck & Lukas Gonon & Arnulf Jentzen, 2024. "Overcoming the curse of dimensionality in the numerical approximation of high-dimensional semilinear elliptic partial differential equations," Partial Differential Equations and Applications, Springer, vol. 5(6), pages 1-47, December.
    19. Min Dai & Yu Sun & Zuo Quan Xu & Xun Yu Zhou, 2024. "Learning to Optimally Stop Diffusion Processes, with Financial Applications," Papers 2408.09242, arXiv.org, revised Aug 2025.
    20. A. Max Reppen & H. Mete Soner & Valentin Tissot-Daguette, 2022. "Neural Optimal Stopping Boundary," Papers 2205.04595, arXiv.org, revised May 2023.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2604.02035. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.