Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma

Author

Listed:

Marc Harper
Vincent Knight
Martin Jones
Georgios Koutsovoulos
Nikoleta E Glynatsi
Owen Campbell

Abstract

We present tournament results and several powerful strategies for the Iterated Prisoner’s Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

Suggested Citation

Marc Harper & Vincent Knight & Martin Jones & Georgios Koutsovoulos & Nikoleta E Glynatsi & Owen Campbell, 2017. "Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-33, December.

Handle: RePEc:plo:pone00:0188046
DOI: 10.1371/journal.pone.0188046

References listed on IDEAS

Banks, Jeffrey S. & Sundaram, Rangarajan K., 1990. "Repeated games, finite automata, and complexity," Games and Economic Behavior, Elsevier, vol. 2(2), pages 97-117, June.
- Banks, J.S. & Sundaram, R.K., 1989. "Repeated Games, Finite Automata, And Complexity," RCER Working Papers 183, University of Rochester - Center for Economic Research (RCER).
Christoph Adami & Arend Hintze, 2013. "Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything," Nature Communications, Nature, vol. 4(1), pages 1-8, October.
Jonathan Bendor & Roderick M. Kramer & Suzanne Stout, 1991. "When in Doubt..," Journal of Conflict Resolution, Peace Science Society (International), vol. 35(4), pages 691-719, December.
Jiawei Li, 2007. "How to Design a Strategy to Win an IPD Tournament," World Scientific Book Chapters, in: The Iterated Prisoners' Dilemma 20 Years On, chapter 4, pages 89-104, World Scientific Publishing Co. Pte. Ltd..
Christian Hilbe & Martin A Nowak & Arne Traulsen, 2013. "Adaptive Dynamics of Extortion and Compliance," PLOS ONE, Public Library of Science, vol. 8(11), pages 1-9, November.
Nachbar, John H., 1992. "Evolution in the finitely repeated prisoner's dilemma," Journal of Economic Behavior & Organization, Elsevier, vol. 19(3), pages 307-326, December.

Citations

Cited by:

Molnar, Grant & Hammond, Caroline & Fu, Feng, 2023. "Reactive means in the iterated Prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 458(C).
Ueda, Masahiko, 2023. "Memory-two strategies forming symmetric mutual reinforcement learning equilibrium in repeated prisoners’ dilemma game," Applied Mathematics and Computation, Elsevier, vol. 444(C).
Vincent Knight & Marc Harper & Nikoleta E Glynatsi & Owen Campbell, 2018. "Evolution reinforces cooperation with the emergence of self-recognition mechanisms: An empirical study of strategies in the Moran process for the iterated prisoner’s dilemma," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-33, October.
Usui, Yuki & Ueda, Masahiko, 2021. "Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 409(C).
Ding, Zhen-Wei & Zheng, Guo-Zhong & Cai, Chao-Ran & Cai, Wei-Ran & Chen, Li & Zhang, Ji-Qiang & Wang, Xu-Ming, 2023. "Emergence of cooperation in two-agent repeated games with reinforcement learning," Chaos, Solitons & Fractals, Elsevier, vol. 175(P1).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Vincent Knight & Marc Harper & Nikoleta E Glynatsi & Owen Campbell, 2018. "Evolution reinforces cooperation with the emergence of self-recognition mechanisms: An empirical study of strategies in the Moran process for the iterated prisoner’s dilemma," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-33, October.
Masahiko Ueda & Toshiyuki Tanaka, 2020. "Linear algebraic structure of zero-determinant strategies in repeated games," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-13, April.
Christopher Lee & Marc Harper & Dashiell Fryer, 2015. "The Art of War: Beyond Memory-one Strategies in Population Games," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-16, March.
Samuelson, Larry, 1996. "Bounded rationality and game theory," The Quarterly Review of Economics and Finance, Elsevier, vol. 36(Supplemen), pages 17-35.
Amnon Rapoport & Darryl A Seale & Andrew M Colman, 2015. "Is Tit-for-Tat the Answer? On the Conclusions Drawn from Axelrod's Tournaments," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-11, July.
Yali Dong & Cong Li & Yi Tao & Boyu Zhang, 2015. "Evolution of Conformity in Social Dilemmas," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-12, September.
Christos Ioannou, 2014. "Coevolution of finite automata with errors," Journal of Evolutionary Economics, Springer, vol. 24(3), pages 541-571, July.
Taha, Mohammad A. & Ghoneim, Ayman, 2021. "Zero-determinant strategies in infinitely repeated three-player prisoner's dilemma game," Chaos, Solitons & Fractals, Elsevier, vol. 152(C).
Westhoff, Frank H. & Yarbrough, Beth V. & Yarbrough, Robert M., 1996. "Complexity, organization, and Stuart Kauffman's The Origins of Order," Journal of Economic Behavior & Organization, Elsevier, vol. 29(1), pages 1-25, January.
Yohsuke Murase & Seung Ki Baek, 2021. "Friendly-rivalry solution to the iterated n-person public-goods game," PLOS Computational Biology, Public Library of Science, vol. 17(1), pages 1-17, January.
Evans, Alecia & Sesmero, Juan, 2022. "Cooperation in Social Dilemmas with Correlated Noisy Payoffs: Theory and Experimental Evidence," 2021 Annual Meeting, August 1-3, Austin, Texas 322804, Agricultural and Applied Economics Association.
Olivier Compte & Andrew Postlewaite, 2007. "Effecting Cooperation," PIER Working Paper Archive 09-019, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 29 May 2009.
- Olivier Compte & Andrew Postlewaite, 2010. "Plausible Cooperation,Third Version," PIER Working Paper Archive 13-008, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 01 Dec 2012.
- Andrew Postlewaite & Olivier Compte, 2009. "Plausible Cooperation, Second Version," PIER Working Paper Archive 10-039, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 16 Dec 2010.
van Damme, E.E.C., 1995. "Game theory : The next stage," Other publications TiSEM 7779b0f9-bef5-45c7-ae6b-7, Tilburg University, School of Economics and Management.
- van Damme, E.E.C., 1999. "Game theory : The next stage," Other publications TiSEM 9b1f2bbf-2e19-42e7-894a-4, Tilburg University, School of Economics and Management.
- van Damme, E.E.C., 1995. "Game theory : The next stage," Discussion Paper 1995-73, Tilburg University, Center for Economic Research.
Ho, Teck-Hua, 1996. "Finite automata play repeated prisoner's dilemma with information processing costs," Journal of Economic Dynamics and Control, Elsevier, vol. 20(1-3), pages 173-207.
Hubie Chen, 2013. "Bounded rationality, strategy simplification, and equilibrium," International Journal of Game Theory, Springer;Game Theory Society, vol. 42(3), pages 593-611, August.
Fisman, Raymond & Khanna, Tarun, 1999. "Is trust a historical residue? Information flows and trust levels," Journal of Economic Behavior & Organization, Elsevier, vol. 38(1), pages 79-92, January.
Sarah C. Rice, 2012. "Reputation and Uncertainty in Online Markets: An Experimental Study," Information Systems Research, INFORMS, vol. 23(2), pages 436-452, June.
Bart S. Vanneste & Douglas H. Frank, 2014. "Forgiveness in Vertical Relationships: Incentive and Termination Effects," Organization Science, INFORMS, vol. 25(6), pages 1807-1822, December.
Spiegler, Ran, 2004. "Simplicity of beliefs and delay tactics in a concession game," Games and Economic Behavior, Elsevier, vol. 47(1), pages 200-220, April.
- Ran Spiegler, 2003. "Simplicity of Beliefs and Delay Tactics in a Concession Game," Levine's Working Paper Archive 506439000000000208, David K. Levine.
Thomas Chadefaux & Dirk Helbing, 2012. "The Rationality of Prejudices," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-6, February.
