Bayesian Exploration for Approximate Dynamic Programming

Authors

  • Ilya O. Ryzhov

    (Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742; Institute for Systems Research, A. James Clark School of Engineering, University of Maryland, College Park, Maryland 20742)

  • Martijn R. K. Mes

    (Industrial Engineering and Business Information Systems, University of Twente, 7500 AE Enschede, Netherlands)

  • Warren B. Powell

    (Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540)

  • Gerald van den Berg

    (Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540)

Abstract

Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other domains. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and nonparametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, although more computationally intensive, it is highly competitive against other exploration strategies.
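The abstract's two ingredients, a Bayesian belief about the value function and an exploration rule driven by the value of information, can be illustrated with a minimal sketch. It assumes a lookup-table value function with an independent normal belief for each state, a known observation-noise variance, and the knowledge-gradient formula for independent normal beliefs as the value-of-information measure. It is not the paper's own policy, which also covers parametric and nonparametric approximations. All names (`NormalBelief`, `knowledge_gradient`, `choose_decision`, and the `weight` parameter) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class NormalBelief:
    """Hypothetical independent normal belief N(mean, var) about one state's value (var > 0)."""
    mean: float
    var: float

    def update(self, observation: float, noise_var: float) -> None:
        # Conjugate normal update with known noise variance:
        # precisions add, and the new mean is the precision-weighted average.
        prior_prec = 1.0 / self.var
        noise_prec = 1.0 / noise_var
        post_prec = prior_prec + noise_prec
        self.mean = (prior_prec * self.mean + noise_prec * observation) / post_prec
        self.var = 1.0 / post_prec


def _pdf(z: float) -> float:
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, variances, noise_var):
    """Expected improvement in the best estimated value from one more noisy
    observation of each alternative (independent normal beliefs assumed)."""
    if len(means) < 2:
        return [0.0] * len(means)
    scores = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        # Std. dev. of the change in this alternative's posterior mean.
        sigma_tilde = var / math.sqrt(var + noise_var)
        if sigma_tilde == 0.0:
            scores.append(0.0)
            continue
        best_other = max(m for j, m in enumerate(means) if j != i)
        z = -abs(mu - best_other) / sigma_tilde
        scores.append(sigma_tilde * (z * _cdf(z) + _pdf(z)))
    return scores


def choose_decision(contributions, beliefs, noise_var, weight):
    """Pick the decision maximizing contribution + estimated downstream value
    + weight * KG; `weight` is a hypothetical exploration parameter."""
    means = [c + b.mean for c, b in zip(contributions, beliefs)]
    kg = knowledge_gradient(means, [b.var for b in beliefs], noise_var)
    scores = [m + weight * g for m, g in zip(means, kg)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Setting the hypothetical `weight` to zero gives pure exploitation; larger values favor decisions that lead to states whose values are still uncertain. The correlated-belief version of the knowledge gradient is treated in Frazier, Powell & Dayanik (2009), listed among the references below.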

Suggested Citation

  • Ilya O. Ryzhov & Martijn R. K. Mes & Warren B. Powell & Gerald van den Berg, 2019. "Bayesian Exploration for Approximate Dynamic Programming," Operations Research, INFORMS, vol. 67(1), pages 198-214, January.
  • Handle: RePEc:inm:oropre:v:67:y:2019:i:1:p:198-214
    DOI: 10.1287/opre.2018.1772

    Download full text from publisher

    File URL: https://doi.org/10.1287/opre.2018.1772
    Download Restriction: no

    References listed on IDEAS

    1. Pérez Rivera, Arturo E. & Mes, Martijn R.K., 2017. "Anticipatory freight selection in intermodal long-haul round-trips," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 105(C), pages 176-194.
    2. Hugo P. Simão & Jeff Day & Abraham P. George & Ted Gifford & John Nienow & Warren B. Powell, 2009. "An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application," Transportation Science, INFORMS, vol. 43(2), pages 178-197, May.
    3. Peter Frazier & Warren Powell & Savas Dayanik, 2009. "The Knowledge-Gradient Policy for Correlated Normal Beliefs," INFORMS Journal on Computing, INFORMS, vol. 21(4), pages 599-613, November.
    4. David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
    5. Monica Brezzi & Tze Leung Lai, 2000. "Incomplete Learning from Endogenous Data in Dynamic Allocation," Econometrica, Econometric Society, vol. 68(6), pages 1511-1516, November.
    6. Alan S. Minkoff, 1993. "A Markov Decision Model and Decomposition Heuristic for Dynamic Vehicle Dispatching," Operations Research, INFORMS, vol. 41(1), pages 77-90, February.
    7. Juliana M. Nascimento & Warren B. Powell, 2009. "An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem," Mathematics of Operations Research, INFORMS, vol. 34(1), pages 210-237, February.
    8. Diana M. Negoescu & Peter I. Frazier & Warren B. Powell, 2011. "The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery," INFORMS Journal on Computing, INFORMS, vol. 23(3), pages 346-363, August.
    9. Nicola Secomandi, 2010. "Optimal Commodity Trading with a Capacitated Storage Asset," Management Science, INFORMS, vol. 56(3), pages 449-467, March.
    10. Huseyin Topaloglu & Warren B. Powell, 2006. "Dynamic-Programming Approximations for Stochastic Time-Staged Integer Multicommodity-Flow Problems," INFORMS Journal on Computing, INFORMS, vol. 18(1), pages 31-42, February.
    11. Juliana Nascimento & Warren Powell, 2010. "Dynamic Programming Models and Algorithms for the Mutual Fund Cash Balance Problem," Management Science, INFORMS, vol. 56(5), pages 801-815, May.
    12. He, Miao & Zhao, Lei & Powell, Warren B., 2012. "Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation," European Journal of Operational Research, Elsevier, vol. 222(2), pages 328-340.
    13. Ilya O. Ryzhov & Warren B. Powell & Peter I. Frazier, 2012. "The Knowledge Gradient Algorithm for a General Class of Online Learning Problems," Operations Research, INFORMS, vol. 60(1), pages 180-195, February.
    14. Hugo P. Simão & Abraham George & Warren B. Powell & Ted Gifford & John Nienow & Jeff Day, 2010. "Approximate Dynamic Programming Captures Fleet Operations for Schneider National," Interfaces, INFORMS, vol. 40(5), pages 342-352, October.
    15. Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.
    16. Guoming Lai & François Margot & Nicola Secomandi, 2010. "An Approximate Dynamic Programming Approach to Benchmark Practice-Based Heuristics for Natural Gas Storage Valuation," Operations Research, INFORMS, vol. 58(3), pages 564-582, June.
    17. Alessandro Arlotto & Noah Gans & J. Michael Steele, 2014. "Markov Decision Problems Where Means Bound Variances," Operations Research, INFORMS, vol. 62(4), pages 864-875, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Warren B. Powell, 2016. "Perspectives of approximate dynamic programming," Annals of Operations Research, Springer, vol. 241(1), pages 319-356, June.
    2. Powell, Warren B., 2019. "A unified framework for stochastic optimization," European Journal of Operational Research, Elsevier, vol. 275(3), pages 795-821.
    3. Secomandi, Nicola & Seppi, Duane J., 2014. "Real Options and Merchant Operations of Energy and Other Commodities," Foundations and Trends(R) in Technology, Information and Operations Management, now publishers, vol. 6(3-4), pages 161-331, July.
    4. Saif Benjaafar & Daniel Jiang & Xiang Li & Xiaobo Li, 2022. "Dynamic Inventory Repositioning in On-Demand Rental Networks," Management Science, INFORMS, vol. 68(11), pages 7861-7878, November.
    5. Dimitri J. Papageorgiou & Myun-Seok Cheon & George Nemhauser & Joel Sokol, 2015. "Approximate Dynamic Programming for a Class of Long-Horizon Maritime Inventory Routing Problems," Transportation Science, INFORMS, vol. 49(4), pages 870-885, November.
    6. Nadarajah, Selvaprabu & Secomandi, Nicola, 2023. "A review of the operations literature on real options in energy," European Journal of Operational Research, Elsevier, vol. 309(2), pages 469-487.
    7. Daniel R. Jiang & Warren B. Powell, 2015. "An Approximate Dynamic Programming Algorithm for Monotone Value Functions," Operations Research, INFORMS, vol. 63(6), pages 1489-1511, December.
    8. Daniel R. Jiang & Warren B. Powell, 2015. "Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage Using Approximate Dynamic Programming," INFORMS Journal on Computing, INFORMS, vol. 27(3), pages 525-543, August.
    9. Yan Li & Kristofer G. Reyes & Jorge Vazquez-Anderson & Yingfei Wang & Lydia M. Contreras & Warren B. Powell, 2018. "A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model," INFORMS Journal on Computing, INFORMS, vol. 30(4), pages 750-767, November.
    10. Bin Han & Ilya O. Ryzhov & Boris Defourny, 2016. "Optimal Learning in Linear Regression with Combinatorial Feature Selection," INFORMS Journal on Computing, INFORMS, vol. 28(4), pages 721-735, November.
    11. Ulmer, Marlin W. & Thomas, Barrett W., 2020. "Meso-parametric value function approximation for dynamic customer acceptances in delivery routing," European Journal of Operational Research, Elsevier, vol. 285(1), pages 183-195.
    12. Lucas Agussurja & Shih-Fen Cheng & Hoong Chuin Lau, 2019. "A State Aggregation Approach for Stochastic Multiperiod Last-Mile Ride-Sharing Problems," Transportation Science, INFORMS, vol. 53(1), pages 148-166, February.
    13. Sripad K. Devalkar & Ravi Anupindi & Amitabh Sinha, 2011. "Integrated Optimization of Procurement, Processing, and Trade of Commodities," Operations Research, INFORMS, vol. 59(6), pages 1369-1381, December.
    14. Warren B. Powell & Abraham George & Hugo Simão & Warren Scott & Alan Lamont & Jeffrey Stewart, 2012. "SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology, and Policy," INFORMS Journal on Computing, INFORMS, vol. 24(4), pages 665-682, November.
    15. Jiao Wang & Lima Zhao & Arnd Huchzermeier, 2021. "Operations‐Finance Interface in Risk Management: Research Evolution and Opportunities," Production and Operations Management, Production and Operations Management Society, vol. 30(2), pages 355-389, February.
    16. Alain Bensoussan & Benoit Chevalier-Roignant & Alejandro Rivera, 2022. "A model for wind farm management with option interactions," Post-Print hal-04325553, HAL.
    17. Lin Zhao & Sweder van Wijnbergen, 2015. "Asset Pricing in Incomplete Markets: Valuing Gas Storage Capacity," Tinbergen Institute Discussion Papers 15-104/VI/DSF95, Tinbergen Institute.
    18. Michael F. Gorman & John-Paul Clarke & Amir Hossein Gharehgozli & Michael Hewitt & René de Koster & Debjit Roy, 2014. "State of the Practice: A Review of the Application of OR/MS in Freight Transportation," Interfaces, INFORMS, vol. 44(6), pages 535-554, December.
    19. Mercedes Esteban-Bravo & Jose M. Vidal-Sanz & Gökhan Yildirim, 2014. "Valuing Customer Portfolios with Endogenous Mass and Direct Marketing Interventions Using a Stochastic Dynamic Programming Decomposition," Marketing Science, INFORMS, vol. 33(5), pages 621-640, September.
    20. Zolfagharinia, Hossein & Haughton, Michael, 2018. "The importance of considering non-linear layover and delay costs for local truckers," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 109(C), pages 331-355.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:67:y:2019:i:1:p:198-214. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher. General contact details of provider: https://edirc.repec.org/data/inforea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.