
Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

Author

Listed:
  • David B. Brown

    (Fuqua School of Business, Duke University, Durham, North Carolina 27708)

  • Martin B. Haugh

    (Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027)

Abstract

We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown et al. [Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4, Part 1):785–801]. This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs and consider applying information relaxations to reformulations of the DP. These reformulations use different state transition functions and correct for the change in state transition probabilities by multiplying by likelihood ratio factors. These reformulations can greatly simplify solutions of the information relaxations, both in leading to finite horizon subproblems and by reducing the number of states that need to be considered in these subproblems. We show that any reformulation leads to a lower bound on the optimal cost of the DP when used with an information relaxation and a penalty built from a broad class of approximate value functions. We refer to this class of approximate value functions as subsolutions, and this includes approximate value functions based on Lagrangian relaxations as well as those based on approximate linear programs. We show that the information relaxation approach, in theory, recovers a tight lower bound using any reformulation and is guaranteed to improve on the lower bounds from subsolutions. Finally, we apply information relaxations to an inventory control application with an autoregressive demand process, as well as dynamic service allocation in a multiclass queue. In our examples, we find that the information relaxation lower bounds are easy to calculate and are very close to the expected cost using simple heuristic policies, thereby showing that these heuristic policies are nearly optimal.
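
As a rough illustration of the bounding idea described in the abstract, the Python sketch below computes a perfect-information relaxation lower bound on a toy discounted inventory problem. It is not the authors' method: it simply truncates the horizon to T periods instead of using the paper's reformulations, and the inventory model, its cost parameters, and every function name are hypothetical choices made for this example. The penalty is built from a value function v as the discounted difference between the expected and the realized next-period value, which has zero mean under any nonanticipative policy.

```python
import itertools

# Hypothetical toy inventory problem (not taken from the paper), truncated to
# T periods so every demand path can be enumerated exactly.
T = 4
DELTA = 0.9                                   # discount factor
MAX_INV = 3                                   # storage capacity
DEMANDS = [(0, 0.3), (1, 0.4), (2, 0.3)]      # i.i.d. (demand, probability)
STATES = range(MAX_INV + 1)


def actions(x):
    """Feasible order quantities at inventory level x."""
    return range(MAX_INV - x + 1)


def next_state(x, a, d):
    """Inventory after ordering a and serving demand d (lost sales)."""
    return max(x + a - d, 0)


def cost(x, a):
    """Expected one-period cost: ordering + holding + lost-sales charge."""
    stock = x + a
    return 1.0 * a + sum(
        p * (0.5 * max(stock - d, 0) + 4.0 * max(d - stock, 0))
        for d, p in DEMANDS)


def expected_next(v_next, x, a):
    """E[v_next(x')] over the demand distribution, given (x, a)."""
    return sum(p * v_next[next_state(x, a, d)] for d, p in DEMANDS)


def solve_dp():
    """Backward induction: V[t][x] is the optimal cost-to-go at time t."""
    V = [[0.0] * (MAX_INV + 1) for _ in range(T + 1)]
    for t in reversed(range(T)):
        for x in STATES:
            V[t][x] = min(cost(x, a) + DELTA * expected_next(V[t + 1], x, a)
                          for a in actions(x))
    return V


def inner_problem(path, v, x0):
    """Deterministic DP for one known demand path, penalty built from v."""
    J = [0.0] * (MAX_INV + 1)
    for t in reversed(range(T)):
        J_new = []
        for x in STATES:
            best = float("inf")
            for a in actions(x):
                nxt = next_state(x, a, path[t])
                # Zero-mean penalty under any nonanticipative policy.
                penalty = DELTA * (expected_next(v[t + 1], x, a) - v[t + 1][nxt])
                best = min(best, cost(x, a) + penalty + DELTA * J[nxt])
            J_new.append(best)
        J = J_new
    return J[x0]


def relaxation_bound(v, x0):
    """Exact expectation of the inner problem over all demand paths."""
    total = 0.0
    for combo in itertools.product(DEMANDS, repeat=T):
        prob = 1.0
        for _, p in combo:
            prob *= p
        total += prob * inner_problem([d for d, _ in combo], v, x0)
    return total


V = solve_dp()
zero_v = [[0.0] * (MAX_INV + 1) for _ in range(T + 1)]
optimal = V[0][0]
bound_zero = relaxation_bound(zero_v, 0)      # no penalty
bound_ideal = relaxation_bound(V, 0)          # penalty from the true V
print(optimal, bound_zero, bound_ideal)
```

In this toy setting the bound with no penalty should not exceed the optimal cost, and the penalty built from the exact value function should recover the optimal cost up to rounding, mirroring the tightness claim in the abstract. Exhaustive enumeration of demand paths is only practical for tiny instances; larger problems would presumably estimate the expectation by simulation.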

Suggested Citation

  • David B. Brown & Martin B. Haugh, 2017. "Information Relaxation Bounds for Infinite Horizon Markov Decision Processes," Operations Research, INFORMS, vol. 65(5), pages 1355-1379, October.
  • Handle: RePEc:inm:oropre:v:65:y:2017:i:5:p:1355-1379
    DOI: 10.1287/opre.2017.1631

    Download full text from publisher

    File URL: https://doi.org/10.1287/opre.2017.1631
    Download Restriction: no


    References listed on IDEAS

    1. Leif Andersen & Mark Broadie, 2004. "Primal-Dual Simulation Algorithm for Pricing Multidimensional American Options," Management Science, INFORMS, vol. 50(9), pages 1222-1234, September.
    2. David B. Brown & James E. Smith, 2014. "Information Relaxations, Duality, and Convex Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 62(6), pages 1394-1415, December.
    3. Stephen C. Graves, 1999. "A Single-Item Inventory Model for a Nonstationary Demand Process," Manufacturing & Service Operations Management, INFORMS, vol. 1(1), pages 50-61.
    4. G. D. Johnson & H. E. Thompson, 1975. "Optimality of Myopic Inventory Policies for Certain Dependent Demand Processes," Management Science, INFORMS, vol. 21(11), pages 1303-1307, July.
    5. J. Michael Harrison, 1975. "Dynamic Scheduling of a Multiclass Queue: Discount Optimality," Operations Research, INFORMS, vol. 23(2), pages 270-282, April.
    6. D. P. de Farias & B. Van Roy, 2003. "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, INFORMS, vol. 51(6), pages 850-865, December.
    7. Xiangwen Lu & Jing-Sheng Song & Amelia Regan, 2006. "Inventory Planning with Forecast Updates: Approximate Solutions and Cost Error Bounds," Operations Research, INFORMS, vol. 54(6), pages 1079-1097, December.
    8. Daniel Adelman & Adam J. Mersereau, 2008. "Relaxations of Weakly Coupled Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 56(3), pages 712-727, June.
    9. David B. Brown & James E. Smith & Peng Sun, 2010. "Information Relaxations and Duality in Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 58(4-part-1), pages 785-801, August.
    10. Sripad K. Devalkar & Ravi Anupindi & Amitabh Sinha, 2011. "Integrated Optimization of Procurement, Processing, and Trade of Commodities," Operations Research, INFORMS, vol. 59(6), pages 1369-1381, December.
    11. Martin B. Haugh & Leonid Kogan, 2004. "Pricing American Options: A Duality Approach," Operations Research, INFORMS, vol. 52(2), pages 258-270, April.
    12. Shane G. Henderson & Peter W. Glynn, 2002. "Approximating Martingales for Variance Reduction in Markov Process Simulation," Mathematics of Operations Research, INFORMS, vol. 27(2), pages 253-271, May.
    13. Stephen C. Graves, 1999. "Addendum to "A Single-Item Inventory Model for a Nonstationary Demand Process"," Manufacturing & Service Operations Management, INFORMS, vol. 1(2), pages 174-174.
    14. Nan Chen & Paul Glasserman, 2007. "Additive and multiplicative duals for American option pricing," Finance and Stochastics, Springer, vol. 11(2), pages 153-179, April.
    15. Michael Jong Kim & Andrew E.B. Lim, 2016. "Robust Multiarmed Bandit Problems," Management Science, INFORMS, vol. 62(1), pages 264-285, January.
    16. L. C. G. Rogers, 2002. "Monte Carlo valuation of American options," Mathematical Finance, Wiley Blackwell, vol. 12(3), pages 271-286, July.
    17. Martin Haugh & Garud Iyengar & Chun Wang, 2016. "Tax-Aware Dynamic Asset Allocation," Operations Research, INFORMS, vol. 64(4), pages 849-866, August.
    18. Guoming Lai & François Margot & Nicola Secomandi, 2010. "An Approximate Dynamic Programming Approach to Benchmark Practice-Based Heuristics for Natural Gas Storage Valuation," Operations Research, INFORMS, vol. 58(3), pages 564-582, June.
    19. David B. Brown & James E. Smith, 2011. "Dynamic Portfolio Optimization with Transaction Costs: Heuristics and Dual Bounds," Management Science, INFORMS, vol. 57(10), pages 1752-1770, October.
    20. Vijay V. Desai & Vivek F. Farias & Ciamac C. Moallemi, 2012. "Pathwise Optimization for Optimal Stopping Problems," Management Science, INFORMS, vol. 58(12), pages 2292-2308, December.
    21. P. S. Ansell & K. D. Glazebrook & J. Niño-Mora & M. O'Keeffe, 2003. "Whittle's index policy for a multi-class queueing system with convex holding costs," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 57(1), pages 21-39, April.

    Citations



    Cited by:

    1. Thomas W.M. Vossen & Fan You & Dan Zhang, 2022. "Finite‐horizon approximate linear programs for capacity allocation over a rolling horizon," Production and Operations Management, Production and Operations Management Society, vol. 31(5), pages 2127-2142, May.
    2. ElHafsi, Mohsen & Fang, Jianxin & Hamouda, Essia, 2020. "A novel decomposition-based method for solving general-product structure assemble-to-order systems," European Journal of Operational Research, Elsevier, vol. 286(1), pages 233-249.
    3. Santiago R. Balseiro & David B. Brown, 2019. "Approximations to Stochastic Dynamic Programs via Information Relaxation Duality," Operations Research, INFORMS, vol. 67(2), pages 577-597, March.
    4. Christian Bender & Christian Gärtner & Nikolaus Schweizer, 2018. "Pathwise Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 43(3), pages 965-996, August.
    5. J. G. Dai & Pengyi Shi, 2019. "Inpatient Overflow: An Approximate Dynamic Programming Approach," Manufacturing & Service Operations Management, INFORMS, vol. 21(4), pages 894-911, October.
    6. Alessio Trivella & Danial Mohseni-Taheri & Selvaprabu Nadarajah, 2023. "Meeting Corporate Renewable Power Targets," Management Science, INFORMS, vol. 69(1), pages 491-512, January.
    7. Qihang Lin & Selvaprabu Nadarajah & Negar Soheili, 2020. "Revisiting Approximate Linear Programming: Constraint-Violation Learning with Applications to Inventory Control and Energy Storage," Management Science, INFORMS, vol. 66(4), pages 1544-1562, April.
    8. Mor Armony & Rami Atar & Harsha Honnappa, 2019. "Asymptotically Optimal Appointment Schedules," Mathematics of Operations Research, INFORMS, vol. 44(4), pages 1345-1380, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Santiago R. Balseiro & David B. Brown, 2019. "Approximations to Stochastic Dynamic Programs via Information Relaxation Duality," Operations Research, INFORMS, vol. 67(2), pages 577-597, March.
    2. David B. Brown & James E. Smith, 2014. "Information Relaxations, Duality, and Convex Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 62(6), pages 1394-1415, December.
    3. David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
    4. Vijay V. Desai & Vivek F. Farias & Ciamac C. Moallemi, 2012. "Pathwise Optimization for Optimal Stopping Problems," Management Science, INFORMS, vol. 58(12), pages 2292-2308, December.
    5. Alessio Trivella & Danial Mohseni-Taheri & Selvaprabu Nadarajah, 2023. "Meeting Corporate Renewable Power Targets," Management Science, INFORMS, vol. 69(1), pages 491-512, January.
    6. Dragos Florin Ciocan & Velibor V. Mišić, 2022. "Interpretable Optimal Stopping," Management Science, INFORMS, vol. 68(3), pages 1616-1638, March.
    7. Daniel R. Jiang & Lina Al-Kanj & Warren B. Powell, 2020. "Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds," Operations Research, INFORMS, vol. 68(6), pages 1678-1697, November.
    8. Secomandi, Nicola & Seppi, Duane J., 2014. "Real Options and Merchant Operations of Energy and Other Commodities," Foundations and Trends(R) in Technology, Information and Operations Management, now publishers, vol. 6(3-4), pages 161-331, July.
    9. Helin Zhu & Fan Ye & Enlu Zhou, 2013. "Fast Estimation of True Bounds on Bermudan Option Prices under Jump-diffusion Processes," Papers 1305.4321, arXiv.org.
    10. Mark Broadie & Weiwei Shen, 2016. "High-Dimensional Portfolio Optimization With Transaction Costs," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 19(04), pages 1-49, June.
    11. Helin Zhu & Fan Ye & Enlu Zhou, 2015. "Fast estimation of true bounds on Bermudan option prices under jump-diffusion processes," Quantitative Finance, Taylor & Francis Journals, vol. 15(11), pages 1885-1900, November.
    12. David B. Brown & James E. Smith & Peng Sun, 2010. "Information Relaxations and Duality in Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 58(4-part-1), pages 785-801, August.
    13. Christian Bender & Christian Gaertner & Nikolaus Schweizer, 2016. "Pathwise Iteration for Backward SDEs," Papers 1605.07500, arXiv.org, revised Jun 2016.
    14. Nadarajah, Selvaprabu & Margot, François & Secomandi, Nicola, 2017. "Comparison of least squares Monte Carlo methods with applications to energy real options," European Journal of Operational Research, Elsevier, vol. 256(1), pages 196-204.
    15. Christian Bender & Christian Gärtner & Nikolaus Schweizer, 2018. "Pathwise Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 43(3), pages 965-996, August.
    16. Sebastian Becker & Patrick Cheridito & Arnulf Jentzen & Timo Welti, 2019. "Solving high-dimensional optimal stopping problems using deep learning," Papers 1908.01602, arXiv.org, revised Aug 2021.
    17. Anna Maria Gambaro & Nicola Secomandi, 2021. "A Discussion of Non‐Gaussian Price Processes for Energy and Commodity Operations," Production and Operations Management, Production and Operations Management Society, vol. 30(1), pages 47-67, January.
    18. Cosma, Antonio & Galluccio, Stefano & Pederzoli, Paola & Scaillet, Olivier, 2020. "Early Exercise Decision in American Options with Dividends, Stochastic Volatility, and Jumps," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 55(1), pages 331-356, February.
    19. Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2023. "A mathematical programming-based solution method for the nonstationary inventory problem under correlated demand," European Journal of Operational Research, Elsevier, vol. 304(2), pages 515-524.
    20. Jalaj Bhandari & Daniel Russo & Raghav Singal, 2021. "A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation," Operations Research, INFORMS, vol. 69(3), pages 950-973, May.
