
Constrained Markov decision processes in Borel spaces: from discounted to average optimality

Author

Listed:
  • Armando F. Mendoza-Pérez

    (UNACH)

  • Héctor Jasso-Fuentes

    (CINVESTAV–IPN)

  • Omar A. De-la-Cruz Courtois

    (UNACH)

Abstract

In this paper we study discrete-time Markov decision processes in Borel spaces with a finite number of constraints and with unbounded rewards and costs. Our aim is to provide a simple method to compute constrained optimal control policies when the payoff functions and the constraints are of either infinite-horizon discounted type or average (a.k.a. ergodic) type. To deduce optimality results for the discounted case, we use the Lagrange multipliers method, which rewrites the original problem (with constraints) as a parametric family of discounted unconstrained problems. Based on the dynamic programming technique along with a simple use of elementary differential calculus, we obtain both suitable Lagrange multipliers and a family of control policies associated with these multipliers; this family is optimal for the original problem with constraints. We next apply the vanishing discount factor method in order to obtain, in a straightforward way, optimal control policies for the average problem with constraints. Finally, to illustrate our results, we provide a simple application to linear–quadratic systems (LQ-systems).
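
The Lagrangian idea in the abstract can be illustrated with a minimal sketch on a finite toy model. This is not the paper's method: the paper works in Borel spaces and finds the multipliers by elementary differential calculus, whereas the sketch below assumes a hypothetical two-state, two-action model (every number in `P`, `R`, `C` is invented for illustration), a single constraint, and a plain bisection on the multiplier `lam`. For each `lam` it solves the unconstrained discounted problem with payoff `r - lam * c` by value iteration, then checks whether the discounted cost of the greedy policy stays below the bound `theta`.

```python
# Toy finite model; all numbers are illustrative assumptions, not from the paper.
ALPHA = 0.9                      # discount factor
STATES = (0, 1)
ACTIONS = (0, 1)
# P[s][a][t] = probability of moving from state s to state t under action a
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.7, 0.3], 1: [0.1, 0.9]}}
R = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: 3.0}}   # reward r(s, a)
C = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}   # constraint cost c(s, a)


def evaluate(policy, payoff, alpha=ALPHA, sweeps=500):
    """Discounted value of a stationary deterministic policy (fixed-point iteration)."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [payoff[s][policy[s]]
             + alpha * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
             for s in STATES]
    return v


def solve_unconstrained(lam, alpha=ALPHA, sweeps=500):
    """Value iteration for the Lagrangian payoff r - lam * c; returns a greedy policy."""
    def q(s, a, v):
        return (R[s][a] - lam * C[s][a]
                + alpha * sum(p * v[t] for t, p in enumerate(P[s][a])))

    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [max(q(s, a, v) for a in ACTIONS) for s in STATES]
    return tuple(max(ACTIONS, key=lambda a: q(s, a, v)) for s in STATES)


def constrained_policy(theta, x0=0, lam_hi=100.0, iters=60):
    """Bisection for a small multiplier whose greedy policy has discounted cost <= theta at x0."""
    policy = solve_unconstrained(0.0)
    if evaluate(policy, C)[x0] <= theta:
        return 0.0, policy                      # constraint is not binding
    feasible = solve_unconstrained(lam_hi)
    if evaluate(feasible, C)[x0] > theta:
        raise ValueError("no feasible policy found at lam_hi")
    lam_lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        candidate = solve_unconstrained(mid)
        if evaluate(candidate, C)[x0] <= theta:
            lam_hi, feasible = mid, candidate   # keep the feasible side
        else:
            lam_lo = mid
    return lam_hi, feasible


if __name__ == "__main__":
    lam, policy = constrained_policy(theta=5.0)
    print("multiplier:", lam, "policy:", policy,
          "discounted cost at x0:", evaluate(policy, C)[0])
```

The sketch only searches over deterministic stationary policies, so the policy it returns is feasible but need not be optimal for the constrained problem; in finite models an optimal constrained policy may require randomization between two such policies.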

Suggested Citation

  • Armando F. Mendoza-Pérez & Héctor Jasso-Fuentes & Omar A. De-la-Cruz Courtois, 2016. "Constrained Markov decision processes in Borel spaces: from discounted to average optimality," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 84(3), pages 489-525, December.
  • Handle: RePEc:spr:mathme:v:84:y:2016:i:3:d:10.1007_s00186-016-0551-3
    DOI: 10.1007/s00186-016-0551-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00186-016-0551-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kitti, Mitri, 2018. "Sustainable social choice under risk," Mathematical Social Sciences, Elsevier, vol. 94(C), pages 19-31.
    2. Alessandro Bonatti, 2008. "Continuous-Time Screening Contracts," 2008 Meeting Papers 493, Society for Economic Dynamics.
    3. repec:ebl:ecbull:v:8:y:2003:i:5:p:1-10 is not listed on IDEAS
    4. Ronald Wendner, 2003. "Status, environmental externality, and optimal tax programs," Economics Bulletin, AccessEcon, vol. 8(5), pages 1-10.
    5. Squintani, Francesco & Valimaki, Juuso, 2002. "Imitation and Experimentation in Changing Contests," Journal of Economic Theory, Elsevier, vol. 104(2), pages 376-404, June.
    6. Chichilnisky, Graciela & Beltratti, Andrea & Heal, Geoffrey, 1994. "The environment and the long run: A comparison of different criteria," MPRA Paper 7907, University Library of Munich, Germany.
    7. Keller, Godfrey & Rady, Sven, 2020. "Undiscounted bandit games," Games and Economic Behavior, Elsevier, vol. 124(C), pages 43-61.
    8. Iho Antti & Kitti Mitri, 2011. "A Tail-Payoff Puzzle in Dynamic Pollution Control," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 11(1), pages 1-30, May.
    9. Nowak, Andrzej S., 2008. "Equilibrium in a dynamic game of capital accumulation with the overtaking criterion," Economics Letters, Elsevier, vol. 99(2), pages 233-237, May.
    10. Dutta, Prajit K. & Radner, Roy, 2009. "A strategic analysis of global warming: Theory and some numbers," Journal of Economic Behavior & Organization, Elsevier, vol. 71(2), pages 187-209, August.
    11. Graciela Chichilnisky, 1996. "An axiomatic approach to sustainable development," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 13(2), pages 231-257, April.
    12. Fleurbaey, Marc & Michel, Philippe, 2003. "Intertemporal equity and the extension of the Ramsey criterion," Journal of Mathematical Economics, Elsevier, vol. 39(7), pages 777-802, September.
    13. Kazuo Nishimura & John Stachurski, 2004. "Stochastic Optimal Growth when the Discount Rate Vanishes," Department of Economics - Working Papers Series 908, The University of Melbourne.
    14. Lars J. Olson & Santanu Roy, 2006. "Theory of Stochastic Optimal Economic Growth," Springer Books, in: Rose-Anne Dana & Cuong Le Van & Tapan Mitra & Kazuo Nishimura (ed.), Handbook on Optimal Growth 1, chapter 11, pages 297-335, Springer.
    15. Gerlagh, Reyer & Liski, Matti, 2008. "Strategic Resource Dependence," Economic Theory and Applications Working Papers 44222, Fondazione Eni Enrico Mattei (FEEM).
    16. Nishimura, Kazuo & Stachurski, John, 2007. "Stochastic optimal policies when the discount rate vanishes," Journal of Economic Dynamics and Control, Elsevier, vol. 31(4), pages 1416-1430, April.
    17. Ghiglino, Christian & Tvede, Mich, 2000. "Optimal Policy in OG Models," Journal of Economic Theory, Elsevier, vol. 90(1), pages 62-83, January.
    18. Gerlagh, Reyer & Liski, Matti, 2011. "Strategic resource dependence," Journal of Economic Theory, Elsevier, vol. 146(2), pages 699-727, March.
    19. Hakenes, Hendrik & Katolnik, Svetlana, 2017. "On the incentive effects of job rotation," European Economic Review, Elsevier, vol. 98(C), pages 424-441.
    20. Urmee Khan & Maxwell Stinchcombe, 2014. "Patient Preferences, Intergenerational Equity, and the Precautionary Principle," Working Papers 201427, University of California at Riverside, Department of Economics.
    21. Steinmetz, Alexander, 2010. "Competition, innovation, and the effect of knowledge accumulation," W.E.P. - Würzburg Economic Papers 81, University of Würzburg, Department of Economics.
