
Emergence of cooperation in two-agent repeated games with reinforcement learning

Author

Listed:
  • Ding, Zhen-Wei
  • Zheng, Guo-Zhong
  • Cai, Chao-Ran
  • Cai, Wei-Ran
  • Chen, Li
  • Zhang, Ji-Qiang
  • Wang, Xu-Ming

Abstract

Cooperation is the foundation of ecosystems and human society, and reinforcement learning provides crucial insight into the mechanism of its emergence. However, most previous work has focused on self-organization at the population level, while the fundamental dynamics at the individual level remain unclear. Here, we investigate the evolution of cooperation in a two-agent system, where each agent pursues optimal policies according to the classical Q-learning algorithm while playing the strict prisoner’s dilemma. We reveal that strong memory and far-sighted expectations yield the emergence of Coordinated Optimal Policies (COPs), where both agents act like “Win-Stay, Lose-Shift” (WSLS) to maintain a high level of cooperation. Otherwise, players become tolerant toward their co-player’s defection, and cooperation eventually loses stability as the “all-Defection” (All-D) policy prevails. This suggests that tolerance could be a precursor to a crisis of cooperation. Furthermore, our analysis shows that the Coordinated Optimal Modes (COMs) for different COPs gradually lose stability as memory weakens and expectation of the future decreases, so that agents fail to predict their co-player’s actions and defection dominates. As a result, we give constraints on the expectation of the future and on memory strength required to maintain cooperation. In contrast to previous work, the impact of exploration on cooperation is found not to be consistent, but to depend on the composition of COMs. By clarifying these fundamental issues in this two-player system, we hope that our work will be helpful for understanding the emergence and stability of cooperation in more complex, realistic scenarios.
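
To make the setup concrete, the following is a minimal sketch of the kind of system the abstract describes: two independent tabular Q-learning agents repeatedly playing a strict prisoner’s dilemma, each conditioning its choice on the previous round’s joint action. The payoff values, the mapping of “memory strength” and “expectation of the future” onto the learning rate and discount factor, and all parameter settings are illustrative assumptions, not the authors’ exact model.

```python
# Minimal sketch (not the authors' exact model): two tabular Q-learning agents
# repeatedly play a strict prisoner's dilemma, each observing the previous
# round's joint action as its state (memory-one). Here gamma loosely stands in
# for "expectation of the future" and alpha for how fast old estimates are
# overwritten; payoffs satisfy T > R > P > S with illustrative values.
import random

ACTIONS = ("C", "D")
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class QAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.05):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # One Q-value per (previous joint action, own next action).
        self.Q = {(s, a): 0.0 for s in PAYOFF for a in ACTIONS}

    def act(self, state):
        if random.random() < self.epsilon:      # epsilon-greedy exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + discounted best next value.
        best_next = max(self.Q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])

def run(rounds=50_000, seed=0):
    random.seed(seed)
    p1, p2 = QAgent(), QAgent()
    state = ("C", "C")                           # arbitrary initial history
    coop_moves = 0
    for _ in range(rounds):
        a1 = p1.act(state)
        a2 = p2.act((state[1], state[0]))        # each agent sees (own, other)
        r1, r2 = PAYOFF[(a1, a2)]
        p1.update(state, a1, r1, (a1, a2))
        p2.update((state[1], state[0]), a2, r2, (a2, a1))
        coop_moves += (a1 == "C") + (a2 == "C")
        state = (a1, a2)
    return coop_moves / (2 * rounds)             # fraction of cooperative moves

if __name__ == "__main__":
    print(f"cooperation rate: {run():.3f}")
```

In this toy setting, pushing gamma toward 1 and lowering alpha is the natural way to probe the far-sighted, long-memory regime that the abstract associates with WSLS-like coordinated policies, while raising epsilon probes the role of exploration.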

Suggested Citation

  • Ding, Zhen-Wei & Zheng, Guo-Zhong & Cai, Chao-Ran & Cai, Wei-Ran & Chen, Li & Zhang, Ji-Qiang & Wang, Xu-Ming, 2023. "Emergence of cooperation in two-agent repeated games with reinforcement learning," Chaos, Solitons & Fractals, Elsevier, vol. 175(P1).
  • Handle: RePEc:eee:chsofr:v:175:y:2023:i:p1:s0960077923009335
    DOI: 10.1016/j.chaos.2023.114032

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077923009335
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.chaos.2023.114032?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Kreps, David M. & Milgrom, Paul & Roberts, John & Wilson, Robert, 1982. "Rational cooperation in the finitely repeated prisoners' dilemma," Journal of Economic Theory, Elsevier, vol. 27(2), pages 245-252, August.
    2. Ashleigh S. Griffin & Stuart A. West & Angus Buckling, 2004. "Cooperation and competition in pathogenic bacteria," Nature, Nature, vol. 430(7003), pages 1024-1027, August.
    3. Gabriele Camera & Marco Casari, 2009. "Cooperation among Strangers under the Shadow of the Future," American Economic Review, American Economic Association, vol. 99(3), pages 979-1005, June.
    4. Yoella Bereby-Meyer & Alvin E. Roth, 2006. "The Speed of Learning in Noisy Games: Partial Reinforcement and the Sustainability of Cooperation," American Economic Review, American Economic Association, vol. 96(4), pages 1029-1042, September.
    5. Pedro Dal Bó & Guillaume R. Fréchette, 2019. "Strategy Choice in the Infinitely Repeated Prisoner's Dilemma," American Economic Review, American Economic Association, vol. 109(11), pages 3929-3952, November.
    6. Li, Dandan & Zhou, Kai & Sun, Mei & Han, Dun, 2023. "Investigating the effectiveness of individuals’ historical memory for the evolution of the prisoner’s dilemma game," Chaos, Solitons & Fractals, Elsevier, vol. 170(C).
    7. Hans-Theo Normann & Brian Wallace, 2012. "The impact of the termination rule on cooperation in a prisoner’s dilemma experiment," International Journal of Game Theory, Springer;Game Theory Society, vol. 41(3), pages 707-718, August.
    8. Usui, Yuki & Ueda, Masahiko, 2021. "Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 409(C).
    9. Jia, Danyang & Li, Tong & Zhao, Yang & Zhang, Xiaoqin & Wang, Zhen, 2022. "Empty nodes affect conditional cooperation under reinforcement learning," Applied Mathematics and Computation, Elsevier, vol. 413(C).
    10. Christian Hilbe & Krishnendu Chatterjee & Martin A. Nowak, 2018. "Publisher Correction: Partners and rivals in direct reciprocity," Nature Human Behaviour, Nature, vol. 2(7), pages 523-523, July.
    11. Andreoni, James A & Miller, John H, 1993. "Rational Cooperation in the Finitely Repeated Prisoner's Dilemma: Experimental Evidence," Economic Journal, Royal Economic Society, vol. 103(418), pages 570-585, May.
    12. Christian Hilbe & Krishnendu Chatterjee & Martin A. Nowak, 2018. "Partners and rivals in direct reciprocity," Nature Human Behaviour, Nature, vol. 2(7), pages 469-477, July.
    13. You, Tao & Yang, Haochun & Wang, Jian & Zhang, Peng & Chen, Jinchao & Zhang, Ying, 2023. "Cooperative behavior under the influence of multiple experienced guiders in Prisoner’s dilemma game," Applied Mathematics and Computation, Elsevier, vol. 458(C).
    14. Marie Devaine & Guillaume Hollard & Jean Daunizeau, 2014. "Theory of Mind: Did Evolution Fool Us?," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
    15. Deng, Xinyang & Zhang, Zhipeng & Deng, Yong & Liu, Qi & Chang, Shuhua, 2016. "Self-adaptive win-stay-lose-shift reference selection mechanism promotes cooperation on a square lattice," Applied Mathematics and Computation, Elsevier, vol. 284(C), pages 322-331.
    16. J. Keith Murnighan & Alvin E. Roth, 1983. "Expecting Continued Play in Prisoner's Dilemma Games," Journal of Conflict Resolution, Peace Science Society (International), vol. 27(2), pages 279-300, June.
    17. Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
    18. J. M. Meylahn & L. Janssen & Hassan Zargarzadeh, 2022. "Limiting Dynamics for Q-Learning with Memory One in Symmetric Two-Player, Two-Action Games," Complexity, Hindawi, vol. 2022, pages 1-20, November.
    19. Momchil S. Tomov & Eric Schulz & Samuel J. Gershman, 2021. "Multi-task reinforcement learning in humans," Nature Human Behaviour, Nature, vol. 5(6), pages 764-773, June.
    20. Hilbe, Christian & Traulsen, Arne & Sigmund, Karl, 2015. "Partners or rivals? Strategies for the iterated prisoner's dilemma," Games and Economic Behavior, Elsevier, vol. 92(C), pages 41-52.
    21. Zhu, Wenqiang & Pan, Qiuhui & Song, Sha & He, Mingfeng, 2023. "Effects of exposure-based reward and punishment on the evolution of cooperation in prisoner’s dilemma game," Chaos, Solitons & Fractals, Elsevier, vol. 172(C).
    22. Marc Harper & Vincent Knight & Martin Jones & Georgios Koutsovoulos & Nikoleta E Glynatsi & Owen Campbell, 2017. "Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-33, December.
    23. Wolfram Barfuss & Janusz Meylahn, 2022. "Intrinsic fluctuations of reinforcement learning promote cooperation," Papers 2209.01013, arXiv.org, revised Feb 2023.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maria Bigoni & Marco Casari & Andrzej Skrzypacz & Giancarlo Spagnolo, 2015. "Time Horizon and Cooperation in Continuous Time," Econometrica, Econometric Society, vol. 83, pages 587-616, March.
    2. Pedro Dal Bó & Guillaume R. Fréchette, 2011. "The Evolution of Cooperation in Infinitely Repeated Games: Experimental Evidence," American Economic Review, American Economic Association, vol. 101(1), pages 411-429, February.
    3. Kamei, Kenju, 2019. "Cooperation and Endogenous Repetition in an Infinitely Repeated Social Dilemma: Experimental Evidence," MPRA Paper 92097, University Library of Munich, Germany.
    4. Lugovskyy, Volodymyr & Puzzello, Daniela & Sorensen, Andrea & Walker, James & Williams, Arlington, 2017. "An experimental study of finitely and infinitely repeated linear public goods games," Games and Economic Behavior, Elsevier, vol. 102(C), pages 286-302.
    5. Todd Guilfoos & Andreas Pape, 2016. "Predicting human cooperation in the Prisoner’s Dilemma using case-based decision theory," Theory and Decision, Springer, vol. 80(1), pages 1-32, January.
    6. Anujit Chakraborty, 2022. "Motives Behind Cooperation in Finitely Repeated Prisoner's Dilemma," Working Papers 353, University of California, Davis, Department of Economics.
    7. Chakraborty, Anujit, 2023. "Motives behind cooperation in finitely repeated prisoner's dilemma," Games and Economic Behavior, Elsevier, vol. 141(C), pages 105-132.
    8. Caleb Cox & Matthew Jones & Kevin Pflum & Paul Healy, 2015. "Revealed reputations in the finitely repeated prisoners’ dilemma," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 58(3), pages 441-484, April.
    9. Kamei, Kenju, 2016. "Information Disclosure and Cooperation in a Finitely-repeated Dilemma: Experimental Evidence," MPRA Paper 75100, University Library of Munich, Germany.
    10. Goeschl, Timo & Jarke, Johannes, 2014. "Trust, but verify? When trustworthiness is observable only through (costly) monitoring," WiSo-HH Working Paper Series 20, University of Hamburg, Faculty of Business, Economics and Social Sciences, WISO Research Laboratory.
    11. Kartal, Melis & Müller, Wieland & Tremewan, James, 2021. "Building trust: The costs and benefits of gradualism," Games and Economic Behavior, Elsevier, vol. 130(C), pages 258-275.
    12. Pedro Dal Bó, 2007. "Tacit collusion under interest rate fluctuations," RAND Journal of Economics, RAND Corporation, vol. 38(2), pages 533-540, June.
    13. Kenju Kamei, 2019. "Cooperation and endogenous repetition in an infinitely repeated social dilemma," International Journal of Game Theory, Springer;Game Theory Society, vol. 48(3), pages 797-834, September.
    14. Miriam Al Lily, 2023. "Establishing human connections: experimental evidence from the helping game," International Journal of Game Theory, Springer;Game Theory Society, vol. 52(3), pages 805-832, September.
    15. Wang, Xianjia & Yang, Zhipeng & Liu, Yanli & Chen, Guici, 2023. "A reinforcement learning-based strategy updating model for the cooperative evolution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 618(C).
    16. Pedro Dal Bó, 2005. "Cooperation under the Shadow of the Future: Experimental Evidence from Infinitely Repeated Games," American Economic Review, American Economic Association, vol. 95(5), pages 1591-1604, December.
    17. Maria Kleshnina & Christian Hilbe & Štěpán Šimsa & Krishnendu Chatterjee & Martin A. Nowak, 2023. "The effect of environmental information on evolution of cooperation in stochastic games," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    18. Ernesto Reuben & Sigrid Suetens, 2012. "Revisiting strategic versus non-strategic cooperation," Experimental Economics, Springer;Economic Science Association, vol. 15(1), pages 24-43, March.
    19. Howard Kunreuther & Gabriel Silvasi & Eric T. Bradlow & Dylan Small, 2007. "Deterministic and Stochastic Prisoner's Dilemma Games: Experiments in Interdependent Security," NBER Technical Working Papers 0341, National Bureau of Economic Research, Inc.
    20. Molnar, Grant & Hammond, Caroline & Fu, Feng, 2023. "Reactive means in the iterated Prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 458(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:chsofr:v:175:y:2023:i:p1:s0960077923009335. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Thayer, Thomas R. (email available below). General contact details of provider: https://www.journals.elsevier.com/chaos-solitons-and-fractals.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.