Printed from https://ideas.repec.org/a/spr/dyngam/v13y2023i1d10.1007_s13235-021-00420-0.html

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

Author

Listed:
  • Weichao Mao

    (University of Illinois Urbana-Champaign)

  • Tamer Başar

    (University of Illinois Urbana-Champaign)

Abstract

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of computing a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A / \epsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which may be of independent interest. One key feature of our algorithm is that it is decentralized: each agent has access only to its local information and is completely oblivious to the presence of others. In this way, our algorithm readily scales to an arbitrary number of agents without suffering an exponential dependence on their number.
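To make the abstract's description concrete, the sketch below shows the two ingredients each agent runs locally: an optimistic V-learning update with a visit-count-dependent step size and exploration bonus, and an exponential-weights (entropy-regularized OMD) policy update on an importance-weighted loss. This is illustrative only, not the authors' implementation: the environment, the bonus constant, the learning-rate schedule, and all names (`VLearningAgent`, `payoff`) are our own assumptions, and the paper's certified-policy output machinery is omitted.

```python
import numpy as np

class VLearningAgent:
    """One agent's local view: optimistic V-learning with an
    exponential-weights (entropy-regularized OMD) policy subroutine.
    Each agent sees only states, its own actions, and rewards."""

    def __init__(self, n_states, n_actions, horizon, seed=0):
        self.H, self.S, self.A = horizon, n_states, n_actions
        self.rng = np.random.default_rng(seed)
        self.V = np.full((horizon + 1, n_states), float(horizon))
        self.V[horizon] = 0.0                                 # terminal value
        self.N = np.zeros((horizon, n_states), dtype=int)     # visit counts
        self.logits = np.zeros((horizon, n_states, n_actions))

    def policy(self, h, s):
        z = self.logits[h, s] - self.logits[h, s].max()       # stable softmax
        p = np.exp(z)
        return p / p.sum()

    def act(self, h, s):
        return self.rng.choice(self.A, p=self.policy(h, s))

    def update(self, h, s, a, reward, s_next):
        self.N[h, s] += 1
        t = self.N[h, s]
        alpha = (self.H + 1) / (self.H + t)      # V-learning step size
        bonus = 0.1 * np.sqrt(self.H**3 / t)     # optimism; constant is illustrative
        target = reward + self.V[h + 1, s_next] + bonus
        self.V[h, s] = min(self.H, (1 - alpha) * self.V[h, s] + alpha * target)
        # OMD step with a shrinking (dynamic) learning rate on an
        # importance-weighted loss estimate for the taken action.
        pi = self.policy(h, s)
        eta = np.sqrt(np.log(self.A) / t)
        loss_hat = np.zeros(self.A)
        loss_hat[a] = (self.H - reward - self.V[h + 1, s_next]) / (
            self.H * (pi[a] + 1e-2))             # small stabilizer in denominator
        self.logits[h, s] -= eta * loss_hat

# Toy two-agent game on a single state; rewards depend on the joint action.
H, S = 2, 1
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])      # coordination payoffs (illustrative)
agents = [VLearningAgent(S, 2, H, seed=i) for i in range(2)]
for episode in range(2000):
    s = 0
    for h in range(H):
        a0, a1 = agents[0].act(h, s), agents[1].act(h, s)
        r = payoff[a0, a1]
        agents[0].update(h, s, a0, r, 0)         # each agent updates from its
        agents[1].update(h, s, a1, r, 0)         # own action and reward only
```

Note how the decentralization claim shows up in the code: neither agent's `update` ever sees the other's action or policy, so the per-agent cost is independent of the number of opponents.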

Suggested Citation

  • Weichao Mao & Tamer Başar, 2023. "Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games," Dynamic Games and Applications, Springer, vol. 13(1), pages 165-186, March.
  • Handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-021-00420-0
    DOI: 10.1007/s13235-021-00420-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13235-021-00420-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s13235-021-00420-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.


    Most related items

    These are the items that most often cite the same works as this one, and that are cited by the same works as this one.
    1. Ozdogan, Ayca & Saglam, Ismail, 2021. "Correlated equilibrium under costly disobedience," Mathematical Social Sciences, Elsevier, vol. 114(C), pages 98-104.
    2. Friedman, Daniel & Rabanal, Jean Paul & Rud, Olga A. & Zhao, Shuchen, 2022. "On the empirical relevance of correlated equilibrium," Journal of Economic Theory, Elsevier, vol. 205(C).
    3. Konstantinos Georgalos & Indrajit Ray & Sonali SenGupta, 2020. "Nash versus coarse correlation," Experimental Economics, Springer;Economic Science Association, vol. 23(4), pages 1178-1204, December.
    4. Ehud Lehrer & Eilon Solan, 2007. "Learning to play partially-specified equilibrium," Levine's Working Paper Archive 122247000000001436, David K. Levine.
    5. Fook Wai Kong & Polyxeni-Margarita Kleniati & Berç Rustem, 2012. "Computation of Correlated Equilibrium with Global-Optimal Expected Social Welfare," Journal of Optimization Theory and Applications, Springer, vol. 153(1), pages 237-261, April.
    6. Moulin, Herve & Ray, Indrajit & Sen Gupta, Sonali, 2014. "Improving Nash by coarse correlation," Journal of Economic Theory, Elsevier, vol. 150(C), pages 852-865.
    7. Kets, Willemien & Kager, Wouter & Sandroni, Alvaro, 2022. "The value of a coordination game," Journal of Economic Theory, Elsevier, vol. 201(C).
    8. Shuo Sun & Rundong Wang & Bo An, 2021. "Reinforcement Learning for Quantitative Trading," Papers 2109.13851, arXiv.org.
    9. Li, Wenqing & Ni, Shaoquan, 2022. "Train timetabling with the general learning environment and multi-agent deep reinforcement learning," Transportation Research Part B: Methodological, Elsevier, vol. 157(C), pages 230-251.
    10. Giraud, Gael & Rochon, Celine, 2002. "Consistent collusion-proofness and correlation in exchange economies," Journal of Mathematical Economics, Elsevier, vol. 38(4), pages 441-463, December.
    11. Boian Lazov, 2023. "A Deep Reinforcement Learning Trader without Offline Training," Papers 2303.00356, arXiv.org.
    12. Soham R. Phade & Venkat Anantharam, 2019. "On the Geometry of Nash and Correlated Equilibria with Cumulative Prospect Theoretic Preferences," Decision Analysis, INFORMS, vol. 16(2), pages 142-156, June.
    13. Rabah Amir & Sergei Belkov & Igor V. Evstigneev, 2017. "Correlated equilibrium in a nutshell," Theory and Decision, Springer, vol. 83(4), pages 457-468, December.
    14. Bernhard von Stengel & Françoise Forges, 2008. "Extensive-Form Correlated Equilibrium: Definition and Computational Complexity," Mathematics of Operations Research, INFORMS, vol. 33(4), pages 1002-1022, November.
    15. Kam-Chau Wong & Chongmin Kim, 2004. "Evolutionarily Stable Correlation," Econometric Society 2004 Far Eastern Meetings 495, Econometric Society.
    16. Stoltz, Gilles & Lugosi, Gabor, 2007. "Learning correlated equilibria in games with compact sets of strategies," Games and Economic Behavior, Elsevier, vol. 59(1), pages 187-208, April.
    17. Dokka, Trivikram & Bruno, Jorge & SenGupta, Sonali & Anwar, Sakib, 2022. "Pricing and Electric Vehicle Charging Equilibria," QBS Working Paper Series 2022/10, Queen's University Belfast, Queen's Business School.
    18. Christoph Graf & Viktor Zobernig & Johannes Schmidt & Claude Klöckl, 2024. "Computational Performance of Deep Reinforcement Learning to Find Nash Equilibria," Computational Economics, Springer;Society for Computational Economics, vol. 63(2), pages 529-576, February.
    19. Seyedakbar Mostafavi & Mehdi Dehghan, 2017. "A stochastic approximation resource allocation approach for HD live streaming," Telecommunication Systems: Modelling, Analysis, Design and Management, Springer, vol. 64(1), pages 87-101, January.
    20. Ayan Bhattacharya, 2019. "On Adaptive Heuristics that Converge to Correlated Equilibrium," Games, MDPI, vol. 10(1), pages 1-11, January.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-021-00420-0. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.