Printed from https://ideas.repec.org/a/spr/dyngam/v13y2023i1d10.1007_s13235-021-00420-0.html

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

Author

Listed:
  • Weichao Mao

    (University of Illinois Urbana-Champaign)

  • Tamer Başar

    (University of Illinois Urbana-Champaign)

Abstract

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of computing a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A / \epsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which may be of independent interest. One key feature of our algorithm is that it is decentralized: each agent has access only to its local information and is completely oblivious to the presence of others. In this way, our algorithm readily scales to an arbitrary number of agents without suffering an exponential dependence on their number.
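To make the abstract's description concrete, the sketch below shows the two ingredients each agent runs locally: an optimistic V-learning update with a visit-count-dependent step size and exploration bonus, and an exponential-weights (entropy-regularized OMD) policy update on an importance-weighted loss. This is illustrative only, not the authors' implementation: the environment, the bonus constant, the learning-rate schedule, and all names (`VLearningAgent`, `payoff`) are our own assumptions, and the paper's certified-policy output machinery is omitted.

```python
import numpy as np

class VLearningAgent:
    """One agent's local view: optimistic V-learning with an
    exponential-weights (entropy-regularized OMD) policy subroutine.
    Each agent sees only states, its own actions, and rewards."""

    def __init__(self, n_states, n_actions, horizon, seed=0):
        self.H, self.S, self.A = horizon, n_states, n_actions
        self.rng = np.random.default_rng(seed)
        self.V = np.full((horizon + 1, n_states), float(horizon))
        self.V[horizon] = 0.0                                 # terminal value
        self.N = np.zeros((horizon, n_states), dtype=int)     # visit counts
        self.logits = np.zeros((horizon, n_states, n_actions))

    def policy(self, h, s):
        z = self.logits[h, s] - self.logits[h, s].max()       # stable softmax
        p = np.exp(z)
        return p / p.sum()

    def act(self, h, s):
        return self.rng.choice(self.A, p=self.policy(h, s))

    def update(self, h, s, a, reward, s_next):
        self.N[h, s] += 1
        t = self.N[h, s]
        alpha = (self.H + 1) / (self.H + t)      # V-learning step size
        bonus = 0.1 * np.sqrt(self.H**3 / t)     # optimism; constant is illustrative
        target = reward + self.V[h + 1, s_next] + bonus
        self.V[h, s] = min(self.H, (1 - alpha) * self.V[h, s] + alpha * target)
        # OMD step with a shrinking (dynamic) learning rate on an
        # importance-weighted loss estimate for the taken action.
        pi = self.policy(h, s)
        eta = np.sqrt(np.log(self.A) / t)
        loss_hat = np.zeros(self.A)
        loss_hat[a] = (self.H - reward - self.V[h + 1, s_next]) / (
            self.H * (pi[a] + 1e-2))             # small stabilizer in denominator
        self.logits[h, s] -= eta * loss_hat

# Toy two-agent game on a single state; rewards depend on the joint action.
H, S = 2, 1
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])      # coordination payoffs (illustrative)
agents = [VLearningAgent(S, 2, H, seed=i) for i in range(2)]
for episode in range(2000):
    s = 0
    for h in range(H):
        a0, a1 = agents[0].act(h, s), agents[1].act(h, s)
        r = payoff[a0, a1]
        agents[0].update(h, s, a0, r, 0)         # each agent updates from its
        agents[1].update(h, s, a1, r, 0)         # own action and reward only
```

Note how the decentralization claim shows up in the code: neither agent's `update` ever sees the other's action or policy, so the per-agent cost is independent of the number of opponents.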

Suggested Citation

  • Weichao Mao & Tamer Başar, 2023. "Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games," Dynamic Games and Applications, Springer, vol. 13(1), pages 165-186, March.
  • Handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-021-00420-0
    DOI: 10.1007/s13235-021-00420-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13235-021-00420-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s13235-021-00420-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.


    Most related items

    These are the items that most often cite the same works as this one, and that are cited by the same works as this one.
    1. Ozdogan, Ayca & Saglam, Ismail, 2021. "Correlated equilibrium under costly disobedience," Mathematical Social Sciences, Elsevier, vol. 114(C), pages 98-104.
    2. Friedman, Daniel & Rabanal, Jean Paul & Rud, Olga A. & Zhao, Shuchen, 2022. "On the empirical relevance of correlated equilibrium," Journal of Economic Theory, Elsevier, vol. 205(C).
    3. Konstantinos Georgalos & Indrajit Ray & Sonali SenGupta, 2020. "Nash versus coarse correlation," Experimental Economics, Springer;Economic Science Association, vol. 23(4), pages 1178-1204, December.
    4. Ehud Lehrer & Eilon Solan, 2007. "Learning to play partially-specified equilibrium," Levine's Working Paper Archive 122247000000001436, David K. Levine.
    5. Fook Wai Kong & Polyxeni-Margarita Kleniati & Berç Rustem, 2012. "Computation of Correlated Equilibrium with Global-Optimal Expected Social Welfare," Journal of Optimization Theory and Applications, Springer, vol. 153(1), pages 237-261, April.
    6. Moulin, Herve & Ray, Indrajit & Sen Gupta, Sonali, 2014. "Improving Nash by coarse correlation," Journal of Economic Theory, Elsevier, vol. 150(C), pages 852-865.
    7. Kets, Willemien & Kager, Wouter & Sandroni, Alvaro, 2022. "The value of a coordination game," Journal of Economic Theory, Elsevier, vol. 201(C).
    8. Shuo Sun & Rundong Wang & Bo An, 2021. "Reinforcement Learning for Quantitative Trading," Papers 2109.13851, arXiv.org.
    9. Li, Wenqing & Ni, Shaoquan, 2022. "Train timetabling with the general learning environment and multi-agent deep reinforcement learning," Transportation Research Part B: Methodological, Elsevier, vol. 157(C), pages 230-251.
    10. Giraud, Gael & Rochon, Celine, 2002. "Consistent collusion-proofness and correlation in exchange economies," Journal of Mathematical Economics, Elsevier, vol. 38(4), pages 441-463, December.
    11. Boian Lazov, 2023. "A Deep Reinforcement Learning Trader without Offline Training," Papers 2303.00356, arXiv.org.
    12. Soham R. Phade & Venkat Anantharam, 2019. "On the Geometry of Nash and Correlated Equilibria with Cumulative Prospect Theoretic Preferences," Decision Analysis, INFORMS, vol. 16(2), pages 142-156, June.
    13. Rabah Amir & Sergei Belkov & Igor V. Evstigneev, 2017. "Correlated equilibrium in a nutshell," Theory and Decision, Springer, vol. 83(4), pages 457-468, December.
    14. Bernhard von Stengel & Françoise Forges, 2008. "Extensive-Form Correlated Equilibrium: Definition and Computational Complexity," Mathematics of Operations Research, INFORMS, vol. 33(4), pages 1002-1022, November.
    15. Kam-Chau Wong & Chongmin Kim, 2004. "Evolutionarily Stable Correlation," Econometric Society 2004 Far Eastern Meetings 495, Econometric Society.
    16. Stoltz, Gilles & Lugosi, Gabor, 2007. "Learning correlated equilibria in games with compact sets of strategies," Games and Economic Behavior, Elsevier, vol. 59(1), pages 187-208, April.
    17. Dokka, Trivikram & Bruno, Jorge & SenGupta, Sonali & Anwar, Sakib, 2022. "Pricing and Electric Vehicle Charging Equilibria," QBS Working Paper Series 2022/10, Queen's University Belfast, Queen's Business School.
    18. Christoph Graf & Viktor Zobernig & Johannes Schmidt & Claude Klöckl, 2024. "Computational Performance of Deep Reinforcement Learning to Find Nash Equilibria," Computational Economics, Springer;Society for Computational Economics, vol. 63(2), pages 529-576, February.
    19. Seyedakbar Mostafavi & Mehdi Dehghan, 2017. "A stochastic approximation resource allocation approach for HD live streaming," Telecommunication Systems: Modelling, Analysis, Design and Management, Springer, vol. 64(1), pages 87-101, January.
    20. Ayan Bhattacharya, 2019. "On Adaptive Heuristics that Converge to Correlated Equilibrium," Games, MDPI, vol. 10(1), pages 1-11, January.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-021-00420-0. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.