Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games



Authors

Jayakumar Subramanian (Adobe Inc.)
Amit Sinha (McGill University)
Aditya Mahajan (McGill University)

Abstract

Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games and is not applicable to general-sum Markov games. It is known that the best response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model-based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of sample complexity for model-based MARL algorithms in general-sum Markov games. We show two results. We first use Hoeffding inequality-based bounds to show that $\tilde{\mathcal{O}}((1-\gamma)^{-4} \alpha^{-2})$ samples per state–action pair are sufficient to obtain an $\alpha$-approximate Markov perfect equilibrium with high probability, where $\gamma$ is the discount factor and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms. We then use Bernstein inequality-based bounds to show that $\tilde{\mathcal{O}}((1-\gamma)^{-1} \alpha^{-2})$ samples are sufficient. To obtain these results, we study the robustness of Markov perfect equilibrium to model approximations. We show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game and provide explicit bounds on the approximation error. We illustrate the results via a numerical example.
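To make the gap between the two bounds concrete, the following is a minimal sketch that evaluates only the leading terms stated in the abstract, $(1-\gamma)^{-4}\alpha^{-2}$ and $(1-\gamma)^{-1}\alpha^{-2}$. It assumes all constants and logarithmic factors hidden by $\tilde{\mathcal{O}}(\cdot)$ are dropped, so the outputs are scaling indicators, not actual sample counts from the paper. The function names are hypothetical and chosen for illustration.

```python
def _check(gamma: float, alpha: float) -> None:
    """Validate the discount factor and the approximation level."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")


def hoeffding_scaling(gamma: float, alpha: float) -> float:
    """Leading term (1 - gamma)^-4 * alpha^-2 of the Hoeffding-based bound.

    Assumption: constants and log factors are omitted.
    """
    _check(gamma, alpha)
    return (1.0 - gamma) ** -4 * alpha ** -2


def bernstein_scaling(gamma: float, alpha: float) -> float:
    """Leading term (1 - gamma)^-1 * alpha^-2 of the Bernstein-based bound.

    Assumption: constants and log factors are omitted.
    """
    _check(gamma, alpha)
    return (1.0 - gamma) ** -1 * alpha ** -2


if __name__ == "__main__":
    alpha = 0.1
    for gamma in (0.9, 0.99):
        h = hoeffding_scaling(gamma, alpha)
        b = bernstein_scaling(gamma, alpha)
        # The ratio of the two leading terms is (1 - gamma)^-3.
        print(f"gamma={gamma}: Hoeffding ~ {h:.3g}, "
              f"Bernstein ~ {b:.3g}, ratio ~ {h / b:.3g}")
```

Under these assumptions the ratio of the two leading terms is $(1-\gamma)^{-3}$, independent of $\alpha$, so the advantage of the Bernstein-based bound grows quickly as the discount factor approaches 1.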

Suggested Citation

Jayakumar Subramanian & Amit Sinha & Aditya Mahajan, 2023. "Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games," Dynamic Games and Applications, Springer, vol. 13(1), pages 56-88, March.

Handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-023-00490-2
DOI: 10.1007/s13235-023-00490-2



