Robust Exploratory Stopping under Ambiguity in Reinforcement Learning

Authors

Junyan Ye
Hoi Ying Wong
Kyunghyun Park

Abstract

We propose and analyze a continuous-time robust reinforcement learning framework for optimal stopping under ambiguity. In this framework, an agent chooses a robust exploratory stopping time motivated by two objectives: robust decision-making under ambiguity and learning about the unknown environment. Here, ambiguity refers to considering multiple probability measures dominated by a reference measure, reflecting the agent's awareness that the reference measure representing her learned belief about the environment may be erroneous. Using the $g$-expectation framework, we reformulate the optimal stopping problem under ambiguity as a robust exploratory control problem with Bernoulli-distributed controls. We then characterize the optimal Bernoulli-distributed control via backward stochastic differential equations and, based on this, construct the robust exploratory stopping time that approximates the optimal stopping time under ambiguity. Finally, we establish a policy iteration theorem and implement it as a reinforcement learning algorithm. Numerical experiments demonstrate the convergence, robustness, and scalability of our reinforcement learning algorithm across different levels of ambiguity and exploration.
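The two ingredients named in the abstract, worst-case evaluation over a set of priors and randomized (Bernoulli) stop-or-continue decisions, can be illustrated with a minimal discrete-time sketch. This is not the authors' continuous-time $g$-expectation/BSDE method. It assumes a recombining binomial tree, an up-move probability known only to lie in an interval [p_lo, p_hi] as a simple stand-in for the set of priors, and Shannon-entropy regularization with weight lam as the exploration reward. The functions soft_stop and robust_soft_put and all parameter names are hypothetical, chosen for illustration only.

```python
import math


def soft_stop(payoff: float, continuation: float, lam: float) -> tuple[float, float]:
    """Entropy-regularized Bernoulli stop/continue choice (illustrative assumption).

    Maximizes p*payoff + (1-p)*continuation + lam*H(p) over p in [0, 1],
    where H(p) = -p*log(p) - (1-p)*log(1-p) is the Bernoulli entropy.
    Returns (regularized value, optimal stopping probability p*).
    """
    if lam <= 0.0:
        # lam = 0 recovers the classical (non-exploratory) hard stopping rule.
        return max(payoff, continuation), float(payoff >= continuation)
    x = (payoff - continuation) / lam
    # Numerically stable logistic function: p* = 1 / (1 + exp(-x)).
    if x >= 0:
        p = 1.0 / (1.0 + math.exp(-x))
    else:
        p = math.exp(x) / (1.0 + math.exp(x))
    # Numerically stable log-sum-exp: lam * log(exp(payoff/lam) + exp(continuation/lam)).
    m = max(payoff, continuation)
    value = m + lam * math.log(
        math.exp((payoff - m) / lam) + math.exp((continuation - m) / lam)
    )
    return value, p


def robust_soft_put(
    s0: float, strike: float, u: float, d: float, discount: float,
    p_lo: float, p_hi: float, steps: int, lam: float,
) -> tuple[float, float]:
    """Robust exploratory American put on a recombining binomial tree (hypothetical).

    Ambiguity: the up-move probability is only known to lie in [p_lo, p_hi],
    and continuation is evaluated under the worst-case (value-minimizing) prior.
    Returns (value at time 0, stopping probability at time 0).
    """

    def payoff(s: float) -> float:
        return max(strike - s, 0.0)

    # Terminal layer: the agent must stop at maturity; node j has j up-moves.
    values = [payoff(s0 * u**j * d ** (steps - j)) for j in range(steps + 1)]
    p0 = 1.0
    for n in range(steps - 1, -1, -1):
        new_values = []
        for j in range(n + 1):
            v_up, v_down = values[j + 1], values[j]
            # The expectation is linear in p, so the worst case sits at an endpoint.
            p_worst = p_lo if v_up >= v_down else p_hi
            cont = discount * (p_worst * v_up + (1.0 - p_worst) * v_down)
            v, p_stop = soft_stop(payoff(s0 * u**j * d ** (n - j)), cont, lam)
            new_values.append(v)
            if n == 0:
                p0 = p_stop
        values = new_values
    return values[0], p0
```

For example, `robust_soft_put(100.0, 100.0, 1.1, 1 / 1.1, 0.99, 0.45, 0.55, 50, 0.1)` values an at-the-money put when the up-probability is only known to within 0.45 to 0.55. In this sketch, widening the interval can only lower (or leave unchanged) the value, because the worst case is taken over a larger set. Letting lam shrink to 0 recovers the hard stop-or-continue rule, while a larger lam makes the stopping decision more random, which is the exploration effect the entropy term is meant to capture.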

Suggested Citation

Junyan Ye & Hoi Ying Wong & Kyunghyun Park, 2025. "Robust Exploratory Stopping under Ambiguity in Reinforcement Learning," Papers 2510.10260, arXiv.org, revised Apr 2026.

Handle: RePEc:arx:papers:2510.10260

References listed on IDEAS

Marcel Nutz & Jianfeng Zhang, 2012. "Optimal stopping under adverse nonlinear expectation and related games," Papers 1212.2140, arXiv.org, revised Sep 2015.
Frank Riedel, 2009. "Optimal Stopping With Multiple Priors," Econometrica, Econometric Society, vol. 77(3), pages 857-908, May.
Philip H. Dybvig, 1995. "Dusenberry's Ratcheting of Consumption: Optimal Dynamic Consumption and Investment Given Intolerance for any Decline in Standard of Living," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 62(2), pages 287-313.
Epstein, Larry G. & Schneider, Martin, 2003. "Recursive multiple-priors," Journal of Economic Theory, Elsevier, vol. 113(1), pages 1-31, November.
- Larry G. Epstein & Martin Schneider, 2001. "Recursive Multiple-Priors," RCER Working Papers 485, University of Rochester - Center for Economic Research (RCER).
Sebastian Becker & Patrick Cheridito & Arnulf Jentzen & Timo Welti, 2019. "Solving high-dimensional optimal stopping problems using deep learning," Papers 1908.01602, arXiv.org, revised Aug 2021.
Zengjing Chen & Larry Epstein, 2002. "Ambiguity, Risk, and Asset Returns in Continuous Time," Econometrica, Econometric Society, vol. 70(4), pages 1403-1443, July.
- Zengjing Chen & Larry G. Epstein, 2000. "Ambiguity, risk and asset returns in continuous time," RCER Working Papers 474, University of Rochester - Center for Economic Research (RCER).
Min Dai & Yuchao Dong & Yanwei Jia, 2023. "Learning equilibrium mean‐variance strategy," Mathematical Finance, Wiley Blackwell, vol. 33(4), pages 1166-1212, October.
David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
Justin Sirignano & Konstantinos Spiliopoulos, 2017. "DGM: A deep learning algorithm for solving partial differential equations," Papers 1708.07469, arXiv.org, revised Sep 2018.
Bayraktar, Erhan & Yao, Song, 2011. "Optimal stopping for non-linear expectations--Part II," Stochastic Processes and their Applications, Elsevier, vol. 121(2), pages 212-264, February.
- Bayraktar, Erhan & Yao, Song, 2011. "Optimal stopping for non-linear expectations--Part I," Stochastic Processes and their Applications, Elsevier, vol. 121(2), pages 185-211, February.
David Silver & Julian Schrittwieser & Karen Simonyan & Ioannis Antonoglou & Aja Huang & Arthur Guez & Thomas Hubert & Lucas Baker & Matthew Lai & Adrian Bolton & Yutian Chen & Timothy Lillicrap & Fan , 2017. "Mastering the game of Go without human knowledge," Nature, Nature, vol. 550(7676), pages 354-359, October.
Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.
Jodi Dianetti & Giorgio Ferrari & Renyuan Xu, 2024. "Exploratory Optimal Stopping: A Singular Control Formulation," Papers 2408.09335, arXiv.org, revised Mar 2026.
Wu, Bo & Li, Lingfei, 2024. "Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market," Journal of Economic Dynamics and Control, Elsevier, vol. 158(C).
Lepeltier, J.-P. & Xu, M., 2005. "Penalization method for reflected backward stochastic differential equations with one r.c.l.l. barrier," Statistics & Probability Letters, Elsevier, vol. 75(1), pages 58-66, November.
Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
Yanwei Jia & Xun Yu Zhou, 2021. "Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms," Papers 2111.11232, arXiv.org, revised Jul 2022.
Marinacci, Massimo, 1999. "Limit Laws for Non-additive Probabilities and Their Frequentist Interpretation," Journal of Economic Theory, Elsevier, vol. 84(2), pages 145-195, February.
N. El Karoui & S. Peng & M. C. Quenez, 1997. "Backward Stochastic Differential Equations in Finance," Mathematical Finance, Wiley Blackwell, vol. 7(1), pages 1-71, January.
Yanwei Jia & Xun Yu Zhou, 2021. "Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach," Papers 2108.06655, arXiv.org, revised Feb 2022.
Peter Klibanoff & Massimo Marinacci & Sujoy Mukerji, 2005. "A Smooth Model of Decision Making under Ambiguity," Econometrica, Econometric Society, vol. 73(6), pages 1849-1892, November.
- Sujoy Mukerji & Peter Klibanoff & Massimo Marinacci, 2002. "A Smooth Model of Decision Making Under Ambiguity," Economics Series Working Papers 113, University of Oxford, Department of Economics.
- Peter Klibanoff & Massimo Marinacci & Sujoy Mukerji, 2002. "A smooth model of decision making under ambiguity," ICER Working Papers - Applied Mathematics Series 11-2003, ICER - International Centre for Economic Research, revised Apr 2003.
Daya Guo & Dejian Yang & Haowei Zhang & Junxiao Song & Peiyi Wang & Qihao Zhu & Runxin Xu & Ruoyu Zhang & Shirong Ma & Xiao Bi & Xiaokang Zhang & Xingkai Yu & Yu Wu & Z. F. Wu & Zhibin Gou & Zhihong S, 2025. "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," Nature, Nature, vol. 645(8081), pages 633-638, September.
Noufel Frikha & Libo Li & Daniel Chee, 2025. "An Entropy Regularized BSDE Approach to Bermudan Options and Games," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-05265653, HAL.
Andres Max Reppen & Halil Mete Soner & Valentin Tissot‐Daguette, 2025. "Neural optimal stopping boundary," Mathematical Finance, Wiley Blackwell, vol. 35(2), pages 441-469, April.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Yun Zhao & Alex S. L. Tse & Harry Zheng, 2026. "Reinforcement Learning for Speculative Trading under Exploratory Framework," Papers 2604.02035, arXiv.org.
Yu‐Jui Huang & Xiang Yu, 2021. "Optimal stopping under model ambiguity: A time‐consistent equilibrium approach," Mathematical Finance, Wiley Blackwell, vol. 31(3), pages 979-1012, July.
Bender, Christian & Thuan, Nguyen Tran, 2026. "Continuous time reinforcement learning: A random measure approach," Stochastic Processes and their Applications, Elsevier, vol. 194(C).
Thai Nguyen & Pertiny Nkuize, 2026. "Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints," Papers 2604.22188, arXiv.org.
Chau, Huy & Nguyen, Duy & Nguyen, Thai, 2026. "Continuous-time optimal investment with portfolio constraints: A reinforcement learning approach," European Journal of Operational Research, Elsevier, vol. 328(3), pages 1068-1092.
Albrecht, E & Baum, Günter & Birsa, R & Bradamante, F & Bressan, A & Chapiro, A & Cicuttin, A & Ciliberti, P & Colavita, A & Costa, S & Crespo, M & Cristaudo, P & Dalla Torre, S & Diaz, V & Duic, V &, 2010. "Results from COMPASS RICH-1," Center for Mathematical Economics Working Papers 535, Center for Mathematical Economics, Bielefeld University.
Yuling Max Chen & Bin Li & David Saunders, 2025. "Exploratory Mean-Variance Portfolio Optimization with Regime-Switching Market Dynamics," Papers 2501.16659, arXiv.org.
Huy Chau & Duy Nguyen & Thai Nguyen, 2024. "Continuous-time optimal investment with portfolio constraints: a reinforcement learning approach," Papers 2412.10692, arXiv.org.
Yuchao Dong, 2022. "Randomized Optimal Stopping Problem in Continuous time and Reinforcement Learning Algorithm," Papers 2208.02409, arXiv.org, revised Sep 2023.
Luis H. R. Alvarez E. & Soren Christensen, 2019. "A Class of Solvable Multidimensional Stopping Problems in the Presence of Knightian Uncertainty," Papers 1907.04046, arXiv.org.
Luis H. R. Alvarez E. & Soren Christensen, 2019. "The Impact of Ambiguity on the Optimal Exercise Timing of Integral Option Contracts," Papers 1906.07533, arXiv.org.
Miao, Jianjun & Wang, Neng, 2011. "Risk, uncertainty, and option exercise," Journal of Economic Dynamics and Control, Elsevier, vol. 35(4), pages 442-461, April.
- Jianjun Miao & Neng Wang, 2004. "Risk, Uncertainty, and Option Exercise," Boston University - Department of Economics - The Institute for Economic Development Working Papers Series dp-136, Boston University - Department of Economics.
- Jianjun Miao & Neng Wang, 2010. "Risk, uncertainty,and option exercise," Boston University - Department of Economics - Working Papers Series WP2010-029, Boston University - Department of Economics.
- Jianjun Miao, 2004. "Risk, uncertainty and option exercise," Finance 0410013, University Library of Munich, Germany.
- Jianjun Miao & Neng Wang, 2007. "Risk, Uncertainty, and Option Exercise," Boston University - Department of Economics - Working Papers Series WP2007-016, Boston University - Department of Economics.
Vorbrink, Jörg, 2014. "Financial markets with volatility uncertainty," Journal of Mathematical Economics, Elsevier, vol. 53(C), pages 64-78.
Yuling Max Chen & Bin Li & David Saunders, 2025. "Exploratory Mean-Variance with Jumps: An Equilibrium Approach," Papers 2512.09224, arXiv.org.
Min Dai & Yu Sun & Zuo Quan Xu & Xun Yu Zhou, 2024. "Learning to Optimally Stop Diffusion Processes, with Financial Applications," Papers 2408.09242, arXiv.org, revised Aug 2025.
Kast, Robert & Lapied, André & Roubaud, David, 2014. "Modelling under ambiguity with dynamically consistent Choquet random walks and Choquet–Brownian motions," Economic Modelling, Elsevier, vol. 38(C), pages 495-503.
Daniele Pennesi, 2013. "Asset Prices in an Ambiguous Economy," Carlo Alberto Notebooks 315, Collegio Carlo Alberto.
Andrea Mazzon & Peter Tankov, 2024. "Optimal stopping and divestment timing under scenario ambiguity and learning," Papers 2408.09349, arXiv.org, revised Oct 2025.
Soren Christensen & Luis H. R. Alvarez E, 2019. "A Solvable Two-dimensional Optimal Stopping Problem in the Presence of Ambiguity," Papers 1905.05429, arXiv.org.
Chen Ziyi & Gu Jia-wen, 2025. "Exploratory Utility Maximization Problem with Tsallis Entropy," Papers 2502.01269, arXiv.org.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-CMP-2025-10-20 (Computational Economics)

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2510.10260. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by using the correction form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: https://arxiv.org/ .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.
