Printed from https://ideas.repec.org/a/spr/joptap/v153y2012i3d10.1007_s10957-012-9989-5.html

An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Authors

  • Shalabh Bhatnagar (Indian Institute of Science)
  • K. Lakshmanan (Indian Institute of Science)

Abstract

We develop an online actor–critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
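
The constrained problem and its Lagrangian relaxation described in the abstract can be sketched as follows. The notation is an assumption made here for illustration and does not appear in the abstract: \theta is the policy parameter, c and g_i are single-stage cost functions evaluated at state X_k and action Z_k, J(\theta) and G_i(\theta) are the corresponding long-run averages, \alpha_i are the constraint thresholds, and \lambda_i are the Lagrange multipliers. The paper's own formulation may differ in detail.

```latex
% Long-run average objective and constraint functions under policy parameter \theta
J(\theta) = \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}\!\left[ \sum_{k=0}^{n-1} c(X_k, Z_k) \right],
\qquad
G_i(\theta) = \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}\!\left[ \sum_{k=0}^{n-1} g_i(X_k, Z_k) \right]

% Constrained problem with N inequality constraints
\min_{\theta} \; J(\theta)
\quad \text{subject to} \quad
G_i(\theta) \le \alpha_i, \qquad i = 1, \dots, N

% Lagrangian relaxation with nonnegative multipliers
L(\theta, \lambda) = J(\theta) + \sum_{i=1}^{N} \lambda_i \bigl( G_i(\theta) - \alpha_i \bigr),
\qquad \lambda_i \ge 0
```

Under this reading, a Lagrangian scheme would lower L in \theta while raising it in \lambda, so a multiplier grows whenever its constraint is violated; the abstract's "locally optimal solution" would then correspond to a local saddle point of L.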

Suggested Citation

  • Shalabh Bhatnagar & K. Lakshmanan, 2012. "An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes," Journal of Optimization Theory and Applications, Springer, vol. 153(3), pages 688-708, June.
  • Handle: RePEc:spr:joptap:v:153:y:2012:i:3:d:10.1007_s10957-012-9989-5
    DOI: 10.1007/s10957-012-9989-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10957-012-9989-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations

    Cited by:

    1. Thomas Spooner & Rahul Savani, 2020. "A Natural Actor-Critic Algorithm with Downside Risk Constraints," Papers 2007.04203, arXiv.org.
    2. Yuqing Zheng & Guoshan Zhang, 2020. "Suboptimal Control for Nonlinear Systems with Disturbance via Integral Sliding Mode Control and Policy Iteration," Journal of Optimization Theory and Applications, Springer, vol. 185(2), pages 652-677, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wright, Austin L. & Sonin, Konstantin & Driscoll, Jesse & Wilson, Jarnickae, 2020. "Poverty and economic dislocation reduce compliance with COVID-19 shelter-in-place protocols," Journal of Economic Behavior & Organization, Elsevier, vol. 180(C), pages 544-554.
    2. Janvier D. Nkurunziza, 2005. "Reputation and Credit without Collateral in Africa`s Formal Banking," Economics Series Working Papers WPS/2005-02, University of Oxford, Department of Economics.
    3. Vadim Borokhov, 2014. "On the properties of nodal price response matrix in electricity markets," Papers 1404.3678, arXiv.org, revised Jan 2015.
    4. Gan, Li & Ju, Gaosheng & Zhu, Xi, 2015. "Nonparametric estimation of structural labor supply and exact welfare change under nonconvex piecewise-linear budget sets," Journal of Econometrics, Elsevier, vol. 188(2), pages 526-544.
    5. Peterson, Jeffrey M. & Boisvert, Richard N. & de Gorter, Harry, 1999. "Multifunctionality and Optimal Environmental Policies for Agriculture in an Open Economy," Working Papers 127701, Cornell University, Department of Applied Economics and Management.
    6. Aldasoro, Iñaki & Delli Gatti, Domenico & Faia, Ester, 2017. "Bank networks: Contagion, systemic risk and prudential policy," Journal of Economic Behavior & Organization, Elsevier, vol. 142(C), pages 164-188.
    7. Gatti, Nicolas & Cecil, Michael & Baylis, Kathy & Estes, Lyndon & Blekking, Jordan & Heckelei, Thomas & Vergopolan, Noemi & Evans, Tom, 2023. "Is closing the agricultural yield gap a “risky” endeavor?," Agricultural Systems, Elsevier, vol. 208(C).
    8. Chorvat, Terrence, 2006. "Taxing utility," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 35(1), pages 1-16, February.
    9. Delgado, Michael S. & Khanna, Neha, 2015. "Voluntary Pollution Abatement and Regulation," Agricultural and Resource Economics Review, Cambridge University Press, vol. 44(1), pages 1-20, April.
    10. Bhattacharya, D., 2018. "Income Effects and Rationalizability in Multinomial Choice Models," Cambridge Working Papers in Economics 1884, Faculty of Economics, University of Cambridge.
    11. List, Christian & Polak, Ben, 2010. "Introduction to judgment aggregation," Journal of Economic Theory, Elsevier, vol. 145(2), pages 441-466, March.
    12. Franke, Jörg & Leininger, Wolfgang & Wasser, Cédric, 2018. "Optimal favoritism in all-pay auctions and lottery contests," European Economic Review, Elsevier, vol. 104(C), pages 22-37.
    13. Che-Yuan Liang, 2017. "Optimal inequality behind the veil of ignorance," Theory and Decision, Springer, vol. 83(3), pages 431-455, October.
    14. Shino, Junnosuke, 2013. "A positive theory of fixed-rate funds-supplying operations in an accommodative financial environment," Journal of International Money and Finance, Elsevier, vol. 32(C), pages 595-610.
    15. Peysakhovich, Alexander & Plagborg-Møller, Mikkel, 2012. "A note on proper scoring rules and risk aversion," Economics Letters, Elsevier, vol. 117(1), pages 357-361.
    16. Badics, Tamás, 2011. "Az arbitrázs preferenciákkal történő karakterizációjáról [On the characterization of arbitrage in terms of preferences]," Közgazdasági Szemle (Economic Review - monthly of the Hungarian Academy of Sciences), Közgazdasági Szemle Alapítvány (Economic Review Foundation), vol. 0(9), pages 727-742.
    17. Peters, Glen, 2008. "Reassessing Carbon Leakage," Conference papers 331753, Purdue University, Center for Global Trade Analysis, Global Trade Analysis Project.
    18. Vizard, Polly, 2005. "The contributions of Professor Amartya Sen in the field of human rights," LSE Research Online Documents on Economics 6273, London School of Economics and Political Science, LSE Library.
    19. Bowen, T. Renee & Chen, Ying & Eraslan, Hülya & Zápal, Jan, 2017. "Efficiency of flexible budgetary institutions," Journal of Economic Theory, Elsevier, vol. 167(C), pages 148-176.
    20. Amir, Rabah & Bloch, Francis, 2009. "Comparative statics in a simple class of strategic market games," Games and Economic Behavior, Elsevier, vol. 65(1), pages 7-24, January.
