An Adaptive Sampling Algorithm for Solving Markov Decision Processes

My bibliography Save this article

An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Author

Listed:

Hyeong Soo Chang
(Department of Computer Science and Engineering, Sogang University, Seoul, Korea)
Michael C. Fu
(Robert H. Smith School of Business, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742)
Jiaqiao Hu
(Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742)
Steven I. Marcus
(Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742)

Registered:

Abstract

Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N )/ N , where N is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is O (( |A|N ) H ), independent of the size of the state space, where | A | is the size of the action space and H is the horizon length. The algorithm can be used to create an approximate receding horizon control to solve infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.

Suggested Citation

Hyeong Soo Chang & Michael C. Fu & Jiaqiao Hu & Steven I. Marcus, 2005. "An Adaptive Sampling Algorithm for Solving Markov Decision Processes," Operations Research, INFORMS, vol. 53(1), pages 126-139, February.

Handle: RePEc:inm:oropre:v:53:y:2005:i:1:p:126-139
DOI: 10.1287/opre.1040.0145

Download full text from publisher

References listed on IDEAS

Broadie, Mark & Glasserman, Paul, 1997. "Pricing American-style securities using simulation," Journal of Economic Dynamics and Control, Elsevier, vol. 21(8-9), pages 1323-1352, June.

Full references (including those not matched with items on IDEAS)

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Tschernutter, Daniel & Feuerriegel, Stefan, 2025. "Data-driven dynamic police patrolling: An efficient Monte Carlo tree search," European Journal of Operational Research, Elsevier, vol. 321(1), pages 177-191.
Mohammed Shahid Abdulla & Shalabh Bhatnagar, 2016. "Multi-armed bandits based on a variant of Simulated Annealing," Indian Journal of Pure and Applied Mathematics, Springer, vol. 47(2), pages 195-212, June.
Arnoud V. den Boer & Bert Zwart, 2015. "Dynamic Pricing and Learning with Finite Inventories," Operations Research, INFORMS, vol. 63(4), pages 965-978, August.
Rishabh Gupta & Qi Zhang, 2022. "Decomposition and Adaptive Sampling for Data-Driven Inverse Linear Optimization," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2720-2735, September.
William L. Cooper & Bharath Rangarajan, 2012. "Performance Guarantees for Empirical Markov Decision Processes with Applications to Multiperiod Inventory Models," Operations Research, INFORMS, vol. 60(5), pages 1267-1281, October.
Oleg Szehr, 2021. "Hedging of Financial Derivative Contracts via Monte Carlo Tree Search," Papers 2102.06274, arXiv.org, revised Apr 2021.
Daniel R. Jiang & Lina Al-Kanj & Warren B. Powell, 2020. "Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds," Operations Research, INFORMS, vol. 68(6), pages 1678-1697, November.
Woonghee Tim Huh & Paat Rusmevichientong, 2014. "Online Sequential Optimization with Biased Gradients: Theory and Applications to Censored Demand," INFORMS Journal on Computing, INFORMS, vol. 26(1), pages 150-159, February.
Michael C. Fu, 2019. "Simulation-Based Algorithms for Markov Decision Processes: Monte Carlo Tree Search from AlphaGo to AlphaZero," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 36(06), pages 1-25, December.
Fei, Xin & Branke, Jürgen & Gülpınar, Nalân, 2025. "Dynamic pharmaceutical product portfolio management with flexible resource profiles," European Journal of Operational Research, Elsevier, vol. 324(1), pages 308-323.
Ronald Ortner, 2013. "Adaptive aggregation for reinforcement learning in average reward Markov decision processes," Annals of Operations Research, Springer, vol. 208(1), pages 321-336, September.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

O. Samimi & Z. Mardani & S. Sharafpour & F. Mehrdoust, 2017. "LSM Algorithm for Pricing American Option Under Heston–Hull–White’s Stochastic Volatility Model," Computational Economics, Springer;Society for Computational Economics, vol. 50(2), pages 173-187, August.
Grosen, Anders & Lochte Jorgensen, Peter, 2000. "Fair valuation of life insurance liabilities: The impact of interest rate guarantees, surrender options, and bonus policies," Insurance: Mathematics and Economics, Elsevier, vol. 26(1), pages 37-57, February.
E. Nasakkala & J. Keppo, 2008. "Hydropower with Financial Information," Applied Mathematical Finance, Taylor & Francis Journals, vol. 15(5-6), pages 503-529.
Chi H. Truong, 2014. "A Two Factor Model for Water Prices and Its Implications for Evaluating Real Options and Other Water Price Derivatives," Canadian Journal of Agricultural Economics/Revue canadienne d'agroeconomie, Canadian Agricultural Economics Society/Societe canadienne d'agroeconomie, vol. 62(1), pages 23-45, March.
Augusto Castillo, 2004. "Firm and Corporate Bond Valuation: A Simulation Dynamic Programming Approach," Latin American Journal of Economics-formerly Cuadernos de Economía, Instituto de Economía. Pontificia Universidad Católica de Chile., vol. 41(124), pages 345-360.
Semih Yon & Cafer Erhan Bozdag, 2014. "Test of Log-Normal Process with Importance Sampling for Options Pricing," Proceedings of Economics and Finance Conferences 0401571, International Institute of Social and Economic Sciences.
repec:hum:wpaper:sfb649dp2006-051 is not listed on IDEAS
Anne Laure Bronstein & Gilles Pagès & Jacques Portès, 2013. "Multi-asset American Options and Parallel Quantization," Methodology and Computing in Applied Probability, Springer, vol. 15(3), pages 547-561, September.
Beveridge, Christopher & Joshi, Mark & Tang, Robert, 2013. "Practical policy iteration: Generic methods for obtaining rapid and tight bounds for Bermudan exotic derivatives using Monte Carlo simulation," Journal of Economic Dynamics and Control, Elsevier, vol. 37(7), pages 1342-1361.
Denis Belomestny & Grigori Milstein & Vladimir Spokoiny, 2009. "Regression methods in pricing American and Bermudan options using consumption processes," Quantitative Finance, Taylor & Francis Journals, vol. 9(3), pages 315-327.
- Belomestny, Denis & Milstein, Grigori N. & Spokoiny, Vladimir, 2006. "Regression methods in pricing American and Bermudan options using consumption processes," SFB 649 Discussion Papers 2006-051, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
David Heath & Eckhard Platen, 2002. "A variance reduction technique based on integral representations," Quantitative Finance, Taylor & Francis Journals, vol. 2(5), pages 362-369.
- David Heath & Eckhard Platen, 2002. "A Variance Reduction Technique Based on Integral Representations," Research Paper Series 75, Quantitative Finance Research Centre, University of Technology, Sydney.
Lim, Terence & Lo, Andrew W. & Merton, Robert C. & Scholes, Myron S., 2006. "The Derivatives Sourcebook," Foundations and Trends(R) in Finance, now publishers, vol. 1(5–6), pages 365-572, April.
N. Hilber & N. Reich & C. Schwab & C. Winter, 2009. "Numerical methods for Lévy processes," Finance and Stochastics, Springer, vol. 13(4), pages 471-500, September.
Andreas Thomann, 2021. "Multi-asset scenario building for trend-following trading strategies," Annals of Operations Research, Springer, vol. 299(1), pages 293-315, April.
Belomestny, Denis & Kolodko, Anastasia & Schoenmakers, John G. M., 2009. "Regression methods for stochastic control problems and their convergence analysis," SFB 649 Discussion Papers 2009-026, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
Jérôme Detemple, 2014. "Optimal Exercise for Derivative Securities," Annual Review of Financial Economics, Annual Reviews, vol. 6(1), pages 459-487, December.
Zhang, Hongyu & Guo, Xunxiang & Wang, Ke & Huang, Shoude, 2024. "The valuation of American options with the stochastic liquidity risk and jump risk," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 650(C).
Duy Nguyen, 2018. "A hybrid Markov chain-tree valuation framework for stochastic volatility jump diffusion models," International Journal of Financial Engineering (IJFE), World Scientific Publishing Co. Pte. Ltd., vol. 5(04), pages 1-30, December.
Cosma, Antonio & Galluccio, Stefano & Pederzoli, Paola & Scaillet, Olivier, 2020. "Early Exercise Decision in American Options with Dividends, Stochastic Volatility, and Jumps," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 55(1), pages 331-356, February.
- Antonio Cosma & Stefano Galluccio & Paola Pederzoli & Olivier Scaillet, 2016. "Early exercise decision in American options with dividends, stochastic volatility and jumps," Papers 1612.03031, arXiv.org.
- Antonio Cosma & Stefano Galluccio & Paola Pederzoli & O. Scaillet, 2016. "Early Exercise Decision in American Options with Dividends, Stochastic Volatility and Jumps," Swiss Finance Institute Research Paper Series 16-73, Swiss Finance Institute.
Lars Stentoft, 2004. "Convergence of the Least Squares Monte Carlo Approach to American Option Valuation," Management Science, INFORMS, vol. 50(9), pages 1193-1203, September.
Suresh M. Sundaresan, 2000. "Continuous‐Time Methods in Finance: A Review and an Assessment," Journal of Finance, American Finance Association, vol. 55(4), pages 1569-1622, August.

More about this item

Keywords

;

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:53:y:2005:i:1:p:126-139. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data