IDEAS home Printed from https://ideas.repec.org/a/spr/dyngam/v13y2023i1d10.1007_s13235-023-00490-2.html
   My bibliography  Save this article

Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games

Author

Listed:
  • Jayakumar Subramanian

    (Adobe Inc.)

  • Amit Sinha

    (McGill University)

  • Aditya Mahajan

    (McGill University)

Abstract

Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games but is not applicable to general-sum Markov games. It is known that the best response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model-based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of sample complexity for model-based MARL algorithms in general-sum Markov games. We show two results. We first use Hoeffding inequality-based bounds to show that $$\tilde{{\mathcal {O}}}( (1-\gamma )^{-4} \alpha ^{-2})$$ O ~ ( ( 1 - γ ) - 4 α - 2 ) samples per state–action pair are sufficient to obtain a $$\alpha $$ α -approximate Markov perfect equilibrium with high probability, where $$\gamma $$ γ is the discount factor, and the $$\tilde{{\mathcal {O}}}(\cdot )$$ O ~ ( · ) notation hides logarithmic terms. We then use Bernstein inequality-based bounds to show that $$\tilde{{\mathcal {O}}}( (1-\gamma )^{-1} \alpha ^{-2} )$$ O ~ ( ( 1 - γ ) - 1 α - 2 ) samples are sufficient. To obtain these results, we study the robustness of Markov perfect equilibrium to model approximations. We show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game and provide explicit bounds on the approximation error. We illustrate the results via a numerical example.

Suggested Citation

  • Jayakumar Subramanian & Amit Sinha & Aditya Mahajan, 2023. "Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games," Dynamic Games and Applications, Springer, vol. 13(1), pages 56-88, March.
  • Handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-023-00490-2
    DOI: 10.1007/s13235-023-00490-2
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13235-023-00490-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s13235-023-00490-2?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Herings, P. Jean-Jacques & Peeters, Ronald J. A. P., 2004. "Stationary equilibria in stochastic games: structure, selection, and computation," Journal of Economic Theory, Elsevier, vol. 118(1), pages 32-60, September.
    2. Patrick Bajari & C. Lanier Benkard & Jonathan Levin, 2007. "Estimating Dynamic Models of Imperfect Competition," Econometrica, Econometric Society, vol. 75(5), pages 1331-1370, September.
    3. Daron Acemoglu & James A. Robinson, 2001. "A Theory of Political Transitions," American Economic Review, American Economic Association, vol. 91(4), pages 938-963, September.
    4. Richard Ericson & Ariel Pakes, 1995. "Markov-Perfect Industry Dynamics: A Framework for Empirical Work," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 62(1), pages 53-82.
    5. Maskin, Eric & Tirole, Jean, 1988. "A Theory of Dynamic Oligopoly, I: Overview and Quantity Competition with Large Fixed Costs," Econometrica, Econometric Society, vol. 56(3), pages 549-569, May.
    6. Maskin, Eric & Tirole, Jean, 2001. "Markov Perfect Equilibrium: I. Observable Actions," Journal of Economic Theory, Elsevier, vol. 100(2), pages 191-219, October.
    7. Alfred Müller, 1997. "How Does the Value Function of a Markov Decision Process Depend on the Transition Probabilities?," Mathematics of Operations Research, INFORMS, vol. 22(4), pages 872-885, November.
    8. Victor Aguirregabiria & Pedro Mira, 2007. "Sequential Estimation of Dynamic Discrete Games," Econometrica, Econometric Society, vol. 75(1), pages 1-53, January.
    9. Martin Pesendorfer & Philipp Schmidt-Dengler, 2008. "Asymptotic Least Squares Estimators for Dynamic Games -super-1," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 75(3), pages 901-928.
    10. P. Herings & Ronald Peeters, 2010. "Homotopy methods to compute equilibria in game theory," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 42(1), pages 119-156, January.
    11. , & ,, 2010. "A theory of regular Markov perfect equilibria in dynamic stochastic games: genericity, stability, and purification," Theoretical Economics, Econometric Society, vol. 5(3), September.
    12. Maskin, Eric & Tirole, Jean, 1988. "A Theory of Dynamic Oligopoly, II: Price Competition, Kinked Demand Curves, and Edgeworth Cycles," Econometrica, Econometric Society, vol. 56(3), pages 571-599, May.
    13. Ariel Pakes & Michael Ostrovsky & Steven Berry, 2007. "Simple estimators for the parameters of discrete dynamic games (with entry/exit examples)," RAND Journal of Economics, RAND Corporation, vol. 38(2), pages 373-399, June.
    14. Mailath, George J. & Samuelson, Larry, 2006. "Repeated Games and Reputations: Long-Run Relationships," OUP Catalogue, Oxford University Press, number 9780195300796.
    15. K. Hinderer, 2005. "Lipschitz Continuity of Value Functions in Markovian Decision Processes," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 62(1), pages 3-22, September.
    16. Maskin, Eric & Tirole, Jean, 1988. "Corrigendum to 'A Theory of Dynamic Oligopoly, III, Cournot Competition' (vol. 31, no. 4)," European Economic Review, Elsevier, vol. 32(7), pages 1567-1568, September.
    17. A. J. Hoffman & R. M. Karp, 1966. "On Nonterminating Stochastic Games," Management Science, INFORMS, vol. 12(5), pages 359-370, January.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ulrich Doraszelski & Mark Satterthwaite, 2010. "Computable Markov‐perfect industry dynamics," RAND Journal of Economics, RAND Corporation, vol. 41(2), pages 215-243, June.
    2. , & ,, 2010. "A theory of regular Markov perfect equilibria in dynamic stochastic games: genericity, stability, and purification," Theoretical Economics, Econometric Society, vol. 5(3), September.
    3. Joao Macieira, 2010. "Oblivious Equilibrium in Dynamic Discrete Games," 2010 Meeting Papers 680, Society for Economic Dynamics.
    4. Victor Aguirregabiria & Margaret Slade, 2017. "Empirical models of firms and industries," Canadian Journal of Economics, Canadian Economics Association, vol. 50(5), pages 1445-1488, December.
    5. Aguirregabiria, Victor & Nevo, Aviv, 2010. "Recent developments in empirical IO: dynamic demand and dynamic games," MPRA Paper 27814, University Library of Munich, Germany.
    6. Carlos Daniel Santos, 2009. "Recovering the Sunk Costs of R&D: the Moulds Industry Case," CEP Discussion Papers dp0958, Centre for Economic Performance, LSE.
    7. Luo, Yao & Xiao, Ping & Xiao, Ruli, 2022. "Identification of dynamic games with unobserved heterogeneity and multiple equilibria," Journal of Econometrics, Elsevier, vol. 226(2), pages 343-367.
    8. Ron Borkovsky & Ulrich Doraszelski & Yaroslav Kryukov, 2012. "A dynamic quality ladder model with entry and exit: Exploring the equilibrium correspondence using the homotopy method," Quantitative Marketing and Economics (QME), Springer, vol. 10(2), pages 197-229, June.
    9. Paul S. Koh, 2022. "Estimating Dynamic Games with Unknown Information Structure," Papers 2205.03706, arXiv.org, revised May 2022.
    10. Peter Arcidiacono & Patrick Bayer & Jason R. Blevins & Paul B. Ellickson, 2016. "Estimation of Dynamic Discrete Choice Models in Continuous Time with an Application to Retail Competition," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 83(3), pages 889-931.
    11. Pakes, Ariel, 2017. "Empirical tools and competition analysis: Past progress and current problems," International Journal of Industrial Organization, Elsevier, vol. 53(C), pages 241-266.
    12. Tobias Salz & Emanuel Vespa, 2020. "Estimating dynamic games of oligopolistic competition: an experimental investigation," RAND Journal of Economics, RAND Corporation, vol. 51(2), pages 447-469, June.
    13. Otsu, Taisuke & Pesendorfer, Martin & Takahashi, Yuya, 2013. "Testing for Equilibrium Multiplicity in Dynamic Markov Games," Discussion Paper Series of SFB/TR 15 Governance and the Efficiency of Economic Systems 423, Free University of Berlin, Humboldt University of Berlin, University of Bonn, University of Mannheim, University of Munich.
    14. Pesendorfer, Martin & Takahashi, Yuya & Otsu, Taisuke, 2014. "Testing Equilibrium Multiplicity in Dynamic Games," CEPR Discussion Papers 10111, C.E.P.R. Discussion Papers.
    15. Taisuke Otsu & Martin Pesendorfer, 2021. "Equilibrium multiplicity in dynamic games: testing and estimation," STICERD - Econometrics Paper Series 618, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    16. Abbring, Jaap & Campbell, J.R. & Tilly, J. & Yang, N., 2018. "Very Simple Markov-Perfect Industry Dynamics (revision of 2017-021) : Empirics," Discussion Paper 2018-040, Tilburg University, Center for Economic Research.
    17. Jaap H. Abbring & Jeffrey R. Campbell & Jan Tilly & Nan Yang, 2018. "Very Simple Markov-Perfect Industry Dynamics: Empirics," Working Paper Series WP-2018-17, Federal Reserve Bank of Chicago.
    18. Linli Xu & Jorge M. Silva-Risso & Kenneth C. Wilbur, 2018. "Dynamic Quality Ladder Model Predictions in Nonrandom Holdout Samples," Management Science, INFORMS, vol. 64(7), pages 3187-3207, July.
    19. Johannes Van Biesebroeck & Aamir Hashmi, 2007. "Market Structure and Innovation: A Dynamic Analysis of the Global Automobile Industry," 2007 Meeting Papers 362, Society for Economic Dynamics.
    20. Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-023-00490-2. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.