IDEAS home Printed from https://ideas.repec.org/a/spr/cejnor/v14y2006i1p59-86.html

Interaction dynamics of two reinforcement learners

Author

Listed:
  • Walter Gutjahr

Abstract

The paper investigates a stochastic model in which two agents (persons, companies, institutions, states, software agents, or others) learn interactive behavior in a series of alternating moves. Each agent is assumed to perform “stimulus-response-consequence” learning, as studied in psychology. In the presented model, the response of one agent to the other agent's move is both the stimulus for the other agent's next move and part of the consequence of the other agent's previous move. After deriving general properties of the model, especially concerning convergence to limit cycles, we concentrate on the asymptotic case where the learning rate tends to zero (“slow learning”). In this case, the dynamics can be described by a system of deterministic differential equations. For reward structures derived from 2×2 bimatrix games, fixed points are determined, and for the special case of the prisoner's dilemma, the dynamics are analyzed in more detail under the assumption that both agents start with the same reaction probabilities and under the assumption that they start with different ones.

Copyright Springer-Verlag 2006
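The alternating-move setup described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model: the abstract does not state the update rule, so a Cross-style linear reinforcement update is assumed here, the payoff values are illustrative prisoner's dilemma numbers scaled to [0, 1], and the other agent's response is treated as the whole consequence of a move rather than part of it. All names (`PAYOFF`, `reinforce`, `simulate`) are hypothetical.

```python
import random

C, D = 0, 1  # cooperate, defect

# Assumed prisoner's dilemma payoffs scaled into [0, 1] (illustrative values).
# PAYOFF[(own, other)] is the reward to the agent that played `own`.
PAYOFF = {(C, C): 0.6, (C, D): 0.0, (D, C): 1.0, (D, D): 0.2}


def reinforce(p_coop, action, reward, rate):
    """Assumed Cross-style update: move probability toward the action just taken."""
    p_action = p_coop if action == C else 1.0 - p_coop
    p_action += rate * reward * (1.0 - p_action)
    return p_action if action == C else 1.0 - p_action


def simulate(steps, rate, p_init=0.5, seed=0):
    """Run alternating moves and return the learned reaction probabilities.

    probs[i][s] is the probability that agent i cooperates after the
    other agent has just played s (the stimulus).
    """
    rng = random.Random(seed)
    probs = [[p_init, p_init], [p_init, p_init]]
    prev_stimulus, prev_action = None, None  # move still awaiting its consequence
    last_move = C  # assumed opening stimulus for the first mover
    mover = 0
    for _ in range(steps):
        stimulus = last_move
        action = C if rng.random() < probs[mover][stimulus] else D
        # The current response is the consequence of the other agent's previous move.
        if prev_action is not None:
            other = 1 - mover
            reward = PAYOFF[(prev_action, action)]
            probs[other][prev_stimulus] = reinforce(
                probs[other][prev_stimulus], prev_action, reward, rate
            )
        prev_stimulus, prev_action = stimulus, action
        last_move = action
        mover = 1 - mover
    return probs


if __name__ == "__main__":
    # A small learning rate loosely mimics the "slow learning" regime.
    print(simulate(steps=20000, rate=0.01))
```

Lowering `rate` while raising `steps` in proportion gives a rough feel for the slow-learning limit, in which the abstract says the stochastic trajectories are described by deterministic differential equations.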

Suggested Citation

  • Walter Gutjahr, 2006. "Interaction dynamics of two reinforcement learners," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 14(1), pages 59-86, February.
  • Handle: RePEc:spr:cejnor:v:14:y:2006:i:1:p:59-86
    DOI: 10.1007/s10100-006-0160-y

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10100-006-0160-y
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Ron Smith & Martin Sola & Fabio Spagnolo, 2000. "The Prisoner's Dilemma and Regime-Switching in the Greek-Turkish Arms Race," Journal of Peace Research, Peace Research Institute Oslo, vol. 37(6), pages 737-750, November.
    2. Fudenberg, Drew & Levine, David, 1998. "Learning in games," European Economic Review, Elsevier, vol. 42(3-5), pages 631-639, May.
    3. Laslier, Jean-Francois & Topol, Richard & Walliser, Bernard, 2001. "A Behavioral Learning Process in Games," Games and Economic Behavior, Elsevier, vol. 37(2), pages 340-366, November.
    4. Ed Hopkins, 2002. "Two Competing Models of How People Learn in Games," Econometrica, Econometric Society, vol. 70(6), pages 2141-2166, November.
    5. Thomas Brenner, 1999. "Modelling Learning in Economics," Books, Edward Elgar Publishing, number 1815.
    6. Borgers, Tilman & Sarin, Rajiv, 1997. "Learning Through Reinforcement and Replicator Dynamics," Journal of Economic Theory, Elsevier, vol. 77(1), pages 1-14, November.
    7. Michel Benaïm & Jörgen W. Weibull, 2003. "Deterministic Approximation of Stochastic Evolution in Games," Econometrica, Econometric Society, vol. 71(3), pages 873-903, May.
    8. Roth, Alvin E. & Erev, Ido, 1995. "Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term," Games and Economic Behavior, Elsevier, vol. 8(1), pages 164-212.
    9. Richard J. Herrnstein & Drazen Prelec, 1991. "Melioration: A Theory of Distributed Choice," Journal of Economic Perspectives, American Economic Association, vol. 5(3), pages 137-156, Summer.
    10. Brian Skyrms & Robin Pemantle, 2004. "Learning to Network," Levine's Bibliography 122247000000000436, UCLA Department of Economics.
    11. M. Posch & A. Pichler & K. Sigmund, 1998. "The Efficiency of Adapting Aspiration Levels," Working Papers ir98103, International Institute for Applied Systems Analysis.
    12. Erev, Ido & Bereby-Meyer, Yoella & Roth, Alvin E., 1999. "The effect of adding a constant to all payoffs: experimental investigation, and implications for reinforcement learning models," Journal of Economic Behavior & Organization, Elsevier, vol. 39(1), pages 111-128, May.
    13. Greenwald, Amy & Friedman, Eric J. & Shenker, Scott, 2001. "Learning in Network Contexts: Experimental Results from Simulations," Games and Economic Behavior, Elsevier, vol. 35(1-2), pages 80-123, April.
    14. John G. Cross, 1973. "A Stochastic Learning Model of Economic Behavior," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 87(2), pages 239-266.
    15. Drew Fudenberg & David K. Levine, 1998. "The Theory of Learning in Games," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262061945, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ianni, Antonella, 2014. "Learning strict Nash equilibria through reinforcement," Journal of Mathematical Economics, Elsevier, vol. 50(C), pages 148-155.
    2. Mengel, Friederike, 2012. "Learning across games," Games and Economic Behavior, Elsevier, vol. 74(2), pages 601-619.
    3. Erik Mohlin & Robert Ostling & Joseph Tao-yi Wang, 2014. "Learning by Imitation in Games: Theory, Field, and Laboratory," Economics Series Working Papers 734, University of Oxford, Department of Economics.
    4. Ianni, A., 2002. "Reinforcement learning and the power law of practice: some analytical results," Discussion Paper Series In Economics And Econometrics 203, Economics Division, School of Social Sciences, University of Southampton.
    5. Mohlin, Erik & Östling, Robert & Wang, Joseph Tao-yi, 2020. "Learning by similarity-weighted imitation in winner-takes-all games," Games and Economic Behavior, Elsevier, vol. 120(C), pages 225-245.
    6. Oyarzun, Carlos & Sarin, Rajiv, 2013. "Learning and risk aversion," Journal of Economic Theory, Elsevier, vol. 148(1), pages 196-225.
    7. Jonathan Newton, 2018. "Evolutionary Game Theory: A Renaissance," Games, MDPI, vol. 9(2), pages 1-67, May.
    8. Duffy, John, 2006. "Agent-Based Models and Human Subject Experiments," Handbook of Computational Economics, in: Leigh Tesfatsion & Kenneth L. Judd (ed.), Handbook of Computational Economics, edition 1, volume 2, chapter 19, pages 949-1011, Elsevier.
    9. Izquierdo, Luis R. & Izquierdo, Segismundo S. & Gotts, Nicholas M. & Polhill, J. Gary, 2007. "Transient and asymptotic dynamics of reinforcement learning in games," Games and Economic Behavior, Elsevier, vol. 61(2), pages 259-276, November.
    10. Jean-François Laslier & Bernard Walliser, 2015. "Stubborn learning," Theory and Decision, Springer, vol. 79(1), pages 51-93, July.
    11. Franke, Reiner, 2003. "Reinforcement learning in the El Farol model," Journal of Economic Behavior & Organization, Elsevier, vol. 51(3), pages 367-388, July.
    12. Beggs, A.W., 2005. "On the convergence of reinforcement learning," Journal of Economic Theory, Elsevier, vol. 122(1), pages 1-36, May.
    13. Battalio, R. & Samuelson, L. & Huyck, J. van, 1998. "Risk dominance, payoff dominance and probabilistic choice learning," Working papers 2, Wisconsin Madison - Social Systems.
    14. Mertikopoulos, Panayotis & Sandholm, William H., 2018. "Riemannian game dynamics," Journal of Economic Theory, Elsevier, vol. 177(C), pages 315-364.
    15. Tilman Börgers & Antonio J. Morales & Rajiv Sarin, 2004. "Expedient and Monotone Learning Rules," Econometrica, Econometric Society, vol. 72(2), pages 383-405, March.
    16. Atanasios Mitropoulos, 2001. "Learning Under Little Information: An Experiment on Mutual Fate Control," Game Theory and Information 0110003, University Library of Munich, Germany.
    17. Ido Erev & Eyal Ert & Alvin E. Roth, 2010. "A Choice Prediction Competition for Market Entry Games: An Introduction," Games, MDPI, vol. 1(2), pages 1-20, May.
    18. Oyarzun, Carlos & Sarin, Rajiv, 2012. "Mean and variance responsive learning," Games and Economic Behavior, Elsevier, vol. 75(2), pages 855-866.
    19. Andreas Flache & Michael W. Macy, 2002. "Stochastic Collusion and the Power Law of Learning," Journal of Conflict Resolution, Peace Science Society (International), vol. 46(5), pages 629-653, October.
    20. Jaromír Kovářík & Friederike Mengel & José Gabriel Romero, 2018. "Learning in network games," Quantitative Economics, Econometric Society, vol. 9(1), pages 85-139, March.
      • Kovarik, Jaromir & Mengel, Friederike & Romero, José Gabriel, 2012. "Learning in Network Games," IKERLANAK http://www-fae1-eao1-ehu-, Universidad del País Vasco - Departamento de Fundamentos del Análisis Económico I.

