Reinforcement learning in a prisoner's dilemma

Author

Dolgopolov, Arthur

Abstract

I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed-form relationship between the learning rate and game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
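To make the setting concrete, the following is a minimal sketch of two stateless Q-learners playing a repeated prisoner's dilemma. It is not the paper's model or its closed-form result. The payoff values (T, R, P, S), the exponential exploration decay `exp(-beta * t)`, the zero initial Q-values, the tie-break toward cooperation, and all function names are assumptions chosen only for illustration.

```python
import math
import random

# Illustrative payoffs (assumed, not taken from the paper): T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
COOPERATE, DEFECT = 0, 1


def payoff(own, other):
    """Payoff to the player choosing `own` when the opponent chooses `other`."""
    if own == COOPERATE:
        return R if other == COOPERATE else S
    return T if other == COOPERATE else P


def choose(q, epsilon, rng):
    """Epsilon-greedy choice over a two-entry Q-vector (ties go to COOPERATE)."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return COOPERATE if q[COOPERATE] >= q[DEFECT] else DEFECT


def simulate(alpha=0.1, beta=1e-3, rounds=20000, seed=0):
    """Run two stateless Q-learners against each other; return both Q-vectors."""
    rng = random.Random(seed)
    q1 = [0.0, 0.0]
    q2 = [0.0, 0.0]
    for t in range(rounds):
        # Assumed schedule: experimentation vanishes as t grows.
        epsilon = math.exp(-beta * t)
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        # Stateless update: only the chosen action's estimate moves,
        # by a fraction alpha (the learning rate) toward the realized payoff.
        q1[a1] += alpha * (payoff(a1, a2) - q1[a1])
        q2[a2] += alpha * (payoff(a2, a1) - q2[a2])
    return q1, q2


def greedy(q):
    """Label of the action a learner would pick without experimenting."""
    return "C" if q[COOPERATE] >= q[DEFECT] else "D"


if __name__ == "__main__":
    q1, q2 = simulate()
    print(greedy(q1), greedy(q2), q1, q2)
```

Varying `alpha` and the payoff constants across seeds is one way to explore informally how the learning rate and payoffs interact; the simulation only illustrates the setup and makes no claim about which outcome the paper predicts.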

Suggested Citation

Dolgopolov, Arthur, 2024. "Reinforcement learning in a prisoner's dilemma," Games and Economic Behavior, Elsevier, vol. 144(C), pages 84-103.

Handle: RePEc:eee:gamebe:v:144:y:2024:i:c:p:84-103
DOI: 10.1016/j.geb.2024.01.004

Citations

Cited by:

Lv, Shaojie & Li, Jiaying & Zhao, Changheng, 2025. "Reinforcement learning in spatial public goods games with environmental feedbacks," Chaos, Solitons & Fractals, Elsevier, vol. 195(C).
Zexin Ye, 2025. "Algorithmic Collusion under Observed Demand Shocks," Papers 2502.15084, arXiv.org, revised May 2025.
Zhang Xu & Wei Zhao, 2024. "On Mechanism Underlying Algorithmic Collusion," Papers 2409.01147, arXiv.org.
Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Jonathan Newton, 2018. "Evolutionary Game Theory: A Renaissance," Games, MDPI, vol. 9(2), pages 1-67, May.
Bilancini, Ennio & Boncinelli, Leonardo & Newton, Jonathan, 2020. "Evolution and Rawlsian social choice in matching," Games and Economic Behavior, Elsevier, vol. 123(C), pages 68-80.
Zhang Xu & Wei Zhao, 2024. "On Mechanism Underlying Algorithmic Collusion," Papers 2409.01147, arXiv.org.
Nax, Heinrich H., 2015. "Equity dynamics in bargaining without information exchange," LSE Research Online Documents on Economics 65426, London School of Economics and Political Science, LSE Library.
Heinrich Nax, 2015. "Equity dynamics in bargaining without information exchange," Journal of Evolutionary Economics, Springer, vol. 25(5), pages 1011-1026, November.
Lucila Porto, 2022. "Q-Learning algorithms in a Hotelling model," Asociación Argentina de Economía Política: Working Papers 4587, Asociación Argentina de Economía Política.
Emilio Calvano & Giacomo Calzolari & Vincenzo Denicolò & Sergio Pastorello, 2019. "Algorithmic Pricing What Implications for Competition Policy?," Review of Industrial Organization, Springer;The Industrial Organization Society, vol. 55(1), pages 155-171, August.
Epivent, Andréa & Lambin, Xavier, 2024. "On algorithmic collusion and reward–punishment schemes," Economics Letters, Elsevier, vol. 237(C).
John Asker & Chaim Fershtman & Ariel Pakes, 2024. "The impact of artificial intelligence design on pricing," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 33(2), pages 276-304, March.
Sawa, Ryoji, 2021. "A stochastic stability analysis with observation errors in normal form games," Games and Economic Behavior, Elsevier, vol. 129(C), pages 570-589.
Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.
Mäs, Michael & Nax, Heinrich H., 2016. "A behavioral study of “noise” in coordination games," LSE Research Online Documents on Economics 65422, London School of Economics and Political Science, LSE Library.
Eugenio Vicario, 2021. "Imitation and Local Interactions: Long Run Equilibrium Selection," Games, MDPI, vol. 12(2), pages 1-19, April.
Sawa, Ryoji, 2019. "Stochastic stability under logit choice in coalitional bargaining problems," Games and Economic Behavior, Elsevier, vol. 113(C), pages 633-650.
Sawa, Ryoji & Wu, Jiabin, 2018. "Reference-dependent preferences, super-dominance and stochastic stability," Journal of Mathematical Economics, Elsevier, vol. 78(C), pages 96-104.
Mäs, Michael & Nax, Heinrich H., 2016. "A behavioral study of “noise” in coordination games," Journal of Economic Theory, Elsevier, vol. 162(C), pages 195-208.
Heinrich Nax & Bary Pradelski, 2015. "Evolutionary dynamics and equitable core selection in assignment games," International Journal of Game Theory, Springer;Game Theory Society, vol. 44(4), pages 903-932, November.
Nax, Heinrich H. & Pradelski, Bary S. R., 2015. "Evolutionary dynamics and equitable core selection in assignment games," LSE Research Online Documents on Economics 65428, London School of Economics and Political Science, LSE Library.
Maria Montero & Alex Possajennikov, 2021. "An Adaptive Model of Demand Adjustment in Weighted Majority Games," Games, MDPI, vol. 13(1), pages 1-17, December.
- Maria Montero & Alex Possajennikov, 2021. "An Adaptive Model of Demand Adjustment in Weighted Majority Games," Discussion Papers 2021-06, The Centre for Decision Research and Experimental Economics, School of Economics, University of Nottingham.
Jean-François Laslier & Bernard Walliser, 2015. "Stubborn learning," Theory and Decision, Springer, vol. 79(1), pages 51-93, July.
- Jean-François Laslier & Bernard Walliser, 2011. "Stubborn Learning," Working Papers hal-00609501, HAL.
- Jean-François Laslier & Bernard Walliser, 2011. "Stubborn Learning," PSE Working Papers hal-00609501, HAL.
- Jean-François Laslier & Bernard Walliser, 2015. "Stubborn learning," Post-Print halshs-01310229, HAL.
- Jean-François Laslier & Bernard Walliser, 2015. "Stubborn learning," PSE-Ecole d'économie de Paris (Postprint) halshs-01310229, HAL.

More about this item

JEL classification:

C72 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - Noncooperative Games
C73 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - Stochastic and Dynamic Games; Evolutionary Games
D43 - Microeconomics - - Market Structure, Pricing, and Design - - - Oligopoly and Other Forms of Market Imperfection
D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search; Learning; Information and Knowledge; Communication; Belief; Unawareness
L41 - Industrial Organization - - Antitrust Issues and Policies - - - Monopolization; Horizontal Anticompetitive Practices
