
Emergence of cooperation in two-agent repeated games with reinforcement learning

Author

Listed:
  • Ding, Zhen-Wei
  • Zheng, Guo-Zhong
  • Cai, Chao-Ran
  • Cai, Wei-Ran
  • Chen, Li
  • Zhang, Ji-Qiang
  • Wang, Xu-Ming

Abstract

Cooperation is the foundation of ecosystems and human society, and reinforcement learning provides crucial insight into the mechanism of its emergence. However, most previous work has focused on self-organization at the population level, while the fundamental dynamics at the individual level remain unclear. Here, we investigate the evolution of cooperation in a two-agent system, where each agent pursues optimal policies according to the classical Q-learning algorithm while playing the strict prisoner’s dilemma. We reveal that strong memory and far-sighted expectations yield the emergence of Coordinated Optimal Policies (COPs), where both agents act like “Win-Stay, Lose-Shift” (WSLS) to maintain a high level of cooperation. Otherwise, players become tolerant toward their co-player’s defection, and cooperation eventually loses stability as the “all-Defection” (All-D) policy prevails. This suggests that tolerance could be a precursor to a crisis of cooperation. Furthermore, our analysis shows that the Coordinated Optimal Modes (COMs) for different COPs gradually lose stability as memory weakens and expectation of the future decreases, so that agents fail to predict their co-player’s actions and defection dominates. As a result, we give constraints on the expectation of the future and on memory strength required to maintain cooperation. In contrast to previous work, the impact of exploration on cooperation is found not to be consistent, but to depend on the composition of COMs. By clarifying these fundamental issues in this two-player system, we hope that our work will be helpful for understanding the emergence and stability of cooperation in more complex, realistic scenarios.
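
To make the setup concrete, the following is a minimal sketch of the kind of system the abstract describes: two independent tabular Q-learning agents repeatedly playing a strict prisoner’s dilemma, each conditioning its choice on the previous round’s joint action. The payoff values, the mapping of “memory strength” and “expectation of the future” onto the learning rate and discount factor, and all parameter settings are illustrative assumptions, not the authors’ exact model.

```python
# Minimal sketch (not the authors' exact model): two tabular Q-learning agents
# repeatedly play a strict prisoner's dilemma, each observing the previous
# round's joint action as its state (memory-one). Here gamma loosely stands in
# for "expectation of the future" and alpha for how fast old estimates are
# overwritten; payoffs satisfy T > R > P > S with illustrative values.
import random

ACTIONS = ("C", "D")
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class QAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.05):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # One Q-value per (previous joint action, own next action).
        self.Q = {(s, a): 0.0 for s in PAYOFF for a in ACTIONS}

    def act(self, state):
        if random.random() < self.epsilon:      # epsilon-greedy exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + discounted best next value.
        best_next = max(self.Q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])

def run(rounds=50_000, seed=0):
    random.seed(seed)
    p1, p2 = QAgent(), QAgent()
    state = ("C", "C")                           # arbitrary initial history
    coop_moves = 0
    for _ in range(rounds):
        a1 = p1.act(state)
        a2 = p2.act((state[1], state[0]))        # each agent sees (own, other)
        r1, r2 = PAYOFF[(a1, a2)]
        p1.update(state, a1, r1, (a1, a2))
        p2.update((state[1], state[0]), a2, r2, (a2, a1))
        coop_moves += (a1 == "C") + (a2 == "C")
        state = (a1, a2)
    return coop_moves / (2 * rounds)             # fraction of cooperative moves

if __name__ == "__main__":
    print(f"cooperation rate: {run():.3f}")
```

In this toy setting, pushing gamma toward 1 and lowering alpha is the natural way to probe the far-sighted, long-memory regime that the abstract associates with WSLS-like coordinated policies, while raising epsilon probes the role of exploration.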

Suggested Citation

  • Ding, Zhen-Wei & Zheng, Guo-Zhong & Cai, Chao-Ran & Cai, Wei-Ran & Chen, Li & Zhang, Ji-Qiang & Wang, Xu-Ming, 2023. "Emergence of cooperation in two-agent repeated games with reinforcement learning," Chaos, Solitons & Fractals, Elsevier, vol. 175(P1).
  • Handle: RePEc:eee:chsofr:v:175:y:2023:i:p1:s0960077923009335
    DOI: 10.1016/j.chaos.2023.114032

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077923009335
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.chaos.2023.114032?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Kreps, David M. & Milgrom, Paul & Roberts, John & Wilson, Robert, 1982. "Rational cooperation in the finitely repeated prisoners' dilemma," Journal of Economic Theory, Elsevier, vol. 27(2), pages 245-252, August.
    2. Ashleigh S. Griffin & Stuart A. West & Angus Buckling, 2004. "Cooperation and competition in pathogenic bacteria," Nature, Nature, vol. 430(7003), pages 1024-1027, August.
    3. Gabriele Camera & Marco Casari, 2009. "Cooperation among Strangers under the Shadow of the Future," American Economic Review, American Economic Association, vol. 99(3), pages 979-1005, June.
    4. Yoella Bereby-Meyer & Alvin E. Roth, 2006. "The Speed of Learning in Noisy Games: Partial Reinforcement and the Sustainability of Cooperation," American Economic Review, American Economic Association, vol. 96(4), pages 1029-1042, September.
    5. Pedro Dal Bó & Guillaume R. Fréchette, 2019. "Strategy Choice in the Infinitely Repeated Prisoner's Dilemma," American Economic Review, American Economic Association, vol. 109(11), pages 3929-3952, November.
    6. Li, Dandan & Zhou, Kai & Sun, Mei & Han, Dun, 2023. "Investigating the effectiveness of individuals’ historical memory for the evolution of the prisoner’s dilemma game," Chaos, Solitons & Fractals, Elsevier, vol. 170(C).
    7. Hans-Theo Normann & Brian Wallace, 2012. "The impact of the termination rule on cooperation in a prisoner’s dilemma experiment," International Journal of Game Theory, Springer;Game Theory Society, vol. 41(3), pages 707-718, August.
    8. Usui, Yuki & Ueda, Masahiko, 2021. "Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 409(C).
    9. Jia, Danyang & Li, Tong & Zhao, Yang & Zhang, Xiaoqin & Wang, Zhen, 2022. "Empty nodes affect conditional cooperation under reinforcement learning," Applied Mathematics and Computation, Elsevier, vol. 413(C).
    10. Christian Hilbe & Krishnendu Chatterjee & Martin A. Nowak, 2018. "Publisher Correction: Partners and rivals in direct reciprocity," Nature Human Behaviour, Nature, vol. 2(7), pages 523-523, July.
    11. Andreoni, James A & Miller, John H, 1993. "Rational Cooperation in the Finitely Repeated Prisoner's Dilemma: Experimental Evidence," Economic Journal, Royal Economic Society, vol. 103(418), pages 570-585, May.
    12. Christian Hilbe & Krishnendu Chatterjee & Martin A. Nowak, 2018. "Partners and rivals in direct reciprocity," Nature Human Behaviour, Nature, vol. 2(7), pages 469-477, July.
    13. You, Tao & Yang, Haochun & Wang, Jian & Zhang, Peng & Chen, Jinchao & Zhang, Ying, 2023. "Cooperative behavior under the influence of multiple experienced guiders in Prisoner’s dilemma game," Applied Mathematics and Computation, Elsevier, vol. 458(C).
    14. Marie Devaine & Guillaume Hollard & Jean Daunizeau, 2014. "Theory of Mind: Did Evolution Fool Us?," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
    15. Deng, Xinyang & Zhang, Zhipeng & Deng, Yong & Liu, Qi & Chang, Shuhua, 2016. "Self-adaptive win-stay-lose-shift reference selection mechanism promotes cooperation on a square lattice," Applied Mathematics and Computation, Elsevier, vol. 284(C), pages 322-331.
    16. J. Keith Murnighan & Alvin E. Roth, 1983. "Expecting Continued Play in Prisoner's Dilemma Games," Journal of Conflict Resolution, Peace Science Society (International), vol. 27(2), pages 279-300, June.
    17. Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
    18. J. M. Meylahn & L. Janssen & Hassan Zargarzadeh, 2022. "Limiting Dynamics for Q-Learning with Memory One in Symmetric Two-Player, Two-Action Games," Complexity, Hindawi, vol. 2022, pages 1-20, November.
    19. Momchil S. Tomov & Eric Schulz & Samuel J. Gershman, 2021. "Multi-task reinforcement learning in humans," Nature Human Behaviour, Nature, vol. 5(6), pages 764-773, June.
    20. Hilbe, Christian & Traulsen, Arne & Sigmund, Karl, 2015. "Partners or rivals? Strategies for the iterated prisoner's dilemma," Games and Economic Behavior, Elsevier, vol. 92(C), pages 41-52.
    21. Zhu, Wenqiang & Pan, Qiuhui & Song, Sha & He, Mingfeng, 2023. "Effects of exposure-based reward and punishment on the evolution of cooperation in prisoner’s dilemma game," Chaos, Solitons & Fractals, Elsevier, vol. 172(C).
    22. Marc Harper & Vincent Knight & Martin Jones & Georgios Koutsovoulos & Nikoleta E Glynatsi & Owen Campbell, 2017. "Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-33, December.
    23. Wolfram Barfuss & Janusz Meylahn, 2022. "Intrinsic fluctuations of reinforcement learning promote cooperation," Papers 2209.01013, arXiv.org, revised Feb 2023.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maria Bigoni & Marco Casari & Andrzej Skrzypacz & Giancarlo Spagnolo, 2015. "Time Horizon and Cooperation in Continuous Time," Econometrica, Econometric Society, vol. 83, pages 587-616, March.
    2. Pedro Dal Bó & Guillaume R. Fréchette, 2011. "The Evolution of Cooperation in Infinitely Repeated Games: Experimental Evidence," American Economic Review, American Economic Association, vol. 101(1), pages 411-429, February.
    3. Kamei, Kenju, 2019. "Cooperation and Endogenous Repetition in an Infinitely Repeated Social Dilemma: Experimental Evidence," MPRA Paper 92097, University Library of Munich, Germany.
    4. Lugovskyy, Volodymyr & Puzzello, Daniela & Sorensen, Andrea & Walker, James & Williams, Arlington, 2017. "An experimental study of finitely and infinitely repeated linear public goods games," Games and Economic Behavior, Elsevier, vol. 102(C), pages 286-302.
    5. Todd Guilfoos & Andreas Pape, 2016. "Predicting human cooperation in the Prisoner’s Dilemma using case-based decision theory," Theory and Decision, Springer, vol. 80(1), pages 1-32, January.
    6. Anujit Chakraborty, 2022. "Motives Behind Cooperation in Finitely Repeated Prisoner's Dilemma," Working Papers 353, University of California, Davis, Department of Economics.
    7. Chakraborty, Anujit, 2023. "Motives behind cooperation in finitely repeated prisoner's dilemma," Games and Economic Behavior, Elsevier, vol. 141(C), pages 105-132.
    8. Caleb Cox & Matthew Jones & Kevin Pflum & Paul Healy, 2015. "Revealed reputations in the finitely repeated prisoners’ dilemma," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 58(3), pages 441-484, April.
    9. Kamei, Kenju, 2016. "Information Disclosure and Cooperation in a Finitely-repeated Dilemma: Experimental Evidence," MPRA Paper 75100, University Library of Munich, Germany.
    10. Goeschl, Timo & Jarke, Johannes, 2014. "Trust, but verify? When trustworthiness is observable only through (costly) monitoring," WiSo-HH Working Paper Series 20, University of Hamburg, Faculty of Business, Economics and Social Sciences, WISO Research Laboratory.
    11. Kartal, Melis & Müller, Wieland & Tremewan, James, 2021. "Building trust: The costs and benefits of gradualism," Games and Economic Behavior, Elsevier, vol. 130(C), pages 258-275.
    12. Pedro Dal Bó, 2007. "Tacit collusion under interest rate fluctuations," RAND Journal of Economics, RAND Corporation, vol. 38(2), pages 533-540, June.
    13. Kenju Kamei, 2019. "Cooperation and endogenous repetition in an infinitely repeated social dilemma," International Journal of Game Theory, Springer;Game Theory Society, vol. 48(3), pages 797-834, September.
    14. Miriam Al Lily, 2023. "Establishing human connections: experimental evidence from the helping game," International Journal of Game Theory, Springer;Game Theory Society, vol. 52(3), pages 805-832, September.
    15. Wang, Xianjia & Yang, Zhipeng & Liu, Yanli & Chen, Guici, 2023. "A reinforcement learning-based strategy updating model for the cooperative evolution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 618(C).
    16. Pedro Dal Bó, 2005. "Cooperation under the Shadow of the Future: Experimental Evidence from Infinitely Repeated Games," American Economic Review, American Economic Association, vol. 95(5), pages 1591-1604, December.
    17. Maria Kleshnina & Christian Hilbe & Štěpán Šimsa & Krishnendu Chatterjee & Martin A. Nowak, 2023. "The effect of environmental information on evolution of cooperation in stochastic games," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    18. Ernesto Reuben & Sigrid Suetens, 2012. "Revisiting strategic versus non-strategic cooperation," Experimental Economics, Springer;Economic Science Association, vol. 15(1), pages 24-43, March.
    19. Howard Kunreuther & Gabriel Silvasi & Eric T. Bradlow & Dylan Small, 2007. "Deterministic and Stochastic Prisoner's Dilemma Games: Experiments in Interdependent Security," NBER Technical Working Papers 0341, National Bureau of Economic Research, Inc.
    20. Molnar, Grant & Hammond, Caroline & Fu, Feng, 2023. "Reactive means in the iterated Prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 458(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:chsofr:v:175:y:2023:i:p1:s0960077923009335. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Thayer, Thomas R. (email available below). General contact details of provider: https://www.journals.elsevier.com/chaos-solitons-and-fractals.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.