Q-learning with biased policy rules

Author

Olivier Compte
(Paris School of Economics)

Abstract

In dynamic environments, Q-learning is an automaton that (i) provides estimates (Q-values) of the continuation values associated with each available action; and (ii) follows the naive policy of almost always choosing the action with highest Q-value. We consider a family of automata that are based on Q-values but whose policy may systematically favor some actions over others, for example through a bias that favors cooperation. In the spirit of Compte and Postlewaite [2018], we look for equilibrium biases within this family of Q-based automata. We examine classic games under various monitoring technologies and find that equilibrium biases may strongly foster collusion.
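
As a rough illustration of the family of automata described in the abstract, the Python sketch below implements a single-state Q-learner whose policy adds a fixed bias to the Q-values before picking an action. The additive form of the bias, the epsilon-greedy exploration, the single-state simplification, the prisoner's dilemma payoffs, and all names (`BiasedQLearner`, `cooperation_rate`, `bias`, `alpha`, `gamma`, `epsilon`) are assumptions made for illustration; the abstract does not specify the paper's exact construction.

```python
import random

# Hypothetical prisoner's dilemma payoffs; action 0 = cooperate, 1 = defect.
PAYOFF = {
    (0, 0): (3.0, 3.0),
    (0, 1): (0.0, 4.0),
    (1, 0): (4.0, 0.0),
    (1, 1): (1.0, 1.0),
}


class BiasedQLearner:
    """Single-state Q-learner with an additive policy bias (an assumption)."""

    def __init__(self, n_actions, bias, alpha=0.1, gamma=0.9, epsilon=0.05, seed=0):
        self.q = [0.0] * n_actions   # Q-value estimate for each action
        self.bias = list(bias)       # per-action bias used only by the policy
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def greedy_action(self):
        # The policy ranks actions by Q-value plus bias, not by Q-value alone.
        scores = [q + b for q, b in zip(self.q, self.bias)]
        return max(range(len(scores)), key=scores.__getitem__)

    def act(self):
        # "Almost always" follow the (biased) greedy choice; explore otherwise.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return self.greedy_action()

    def update(self, action, reward):
        # Standard Q-learning update; the bias does not enter the estimate.
        target = reward + self.gamma * max(self.q)
        self.q[action] += self.alpha * (target - self.q[action])


def cooperation_rate(rounds=1000, bias=0.5, seed=0):
    """Let two identically biased learners play and return the share of cooperative moves."""
    a = BiasedQLearner(2, [bias, 0.0], seed=seed)
    b = BiasedQLearner(2, [bias, 0.0], seed=seed + 1)
    coop = 0
    for _ in range(rounds):
        x, y = a.act(), b.act()
        rx, ry = PAYOFF[(x, y)]
        a.update(x, rx)
        b.update(y, ry)
        coop += (x == 0) + (y == 0)
    return coop / (2 * rounds)
```

Setting `bias=0.0` recovers the naive policy from point (ii) of the abstract, so comparing `cooperation_rate(bias=0.0)` with `cooperation_rate(bias=0.5)` gives a quick, informal sense of how a cooperation-favoring bias can change play in this toy setting. It does not reproduce the paper's equilibrium analysis.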

Suggested Citation

Olivier Compte, 2023. "Q-learning with biased policy rules," Papers 2304.12647, arXiv.org, revised Oct 2023.

Handle: RePEc:arx:papers:2304.12647

References listed on IDEAS

Ignacio Esponda & Demian Pouzo, 2016. "Berk–Nash Equilibrium: A Framework for Modeling Agents With Misspecified Models," Econometrica, Econometric Society, vol. 84, pages 1093-1130, May.
Drew Fudenberg & David K. Levine, 2008. "Reputation And Equilibrium Selection In Games With A Patient Player," World Scientific Book Chapters, in: Drew Fudenberg & David K Levine (ed.), A Long-Run Collaboration On Long-Run Games, chapter 7, pages 123-142, World Scientific Publishing Co. Pte. Ltd.
- Fudenberg, Drew & Levine, David K, 1989. "Reputation and Equilibrium Selection in Games with a Patient Player," Econometrica, Econometric Society, vol. 57(4), pages 759-778, July.
- Drew Fudenberg & David Levine, 1987. "Reputation and Equilibrium Selection in Games With a Patient Player," Working papers 461, Massachusetts Institute of Technology (MIT), Department of Economics.
- Drew Fudenberg & David K. Levine, 1995. "Reputation and Equilibrium Selection in Games with a Patient Player," Levine's Working Paper Archive 103, David K. Levine.
- D. Fudenberg & David K. Levine, 1989. "Reputation and Equilibrium Selection in Games with a Patient Player," Levine's Working Paper Archive 508, David K. Levine.
Jehiel, Philippe, 2005. "Analogy-based expectation equilibrium," Journal of Economic Theory, Elsevier, vol. 123(2), pages 81-104, August.
- Philippe Jehiel, 2001. "Analogy-Based Expectation Equilibrium," Economics Working Papers 0003, Institute for Advanced Study, School of Social Science.
- Philippe Jehiel, 2005. "Analogy-Based Expectation Equilibrium," Levine's Bibliography 784828000000000106, UCLA Department of Economics.
- Philippe Jehiel, 2005. "Analogy-based Expectation Equilibrium," Post-Print halshs-00754070, HAL.
Drew Fudenberg & Eric Maskin, 2008. "The Folk Theorem In Repeated Games With Discounting Or With Incomplete Information," World Scientific Book Chapters, in: Drew Fudenberg & David K Levine (ed.), A Long-Run Collaboration On Long-Run Games, chapter 11, pages 209-230, World Scientific Publishing Co. Pte. Ltd.
- Fudenberg, Drew & Maskin, Eric, 1986. "The Folk Theorem in Repeated Games with Discounting or with Incomplete Information," Econometrica, Econometric Society, vol. 54(3), pages 533-554, May.
Sekiguchi, Tadashi, 1997. "Efficiency in Repeated Prisoner's Dilemma with Private Monitoring," Journal of Economic Theory, Elsevier, vol. 76(2), pages 345-361, October.
Compte, Olivier, 2002. "On Sustaining Cooperation without Public Observations," Journal of Economic Theory, Elsevier, vol. 102(1), pages 106-150, January.
Emilio Calvano & Giacomo Calzolari & Vincenzo Denicolò & Sergio Pastorello, 2020. "Artificial Intelligence, Algorithmic Pricing, and Collusion," American Economic Review, American Economic Association, vol. 110(10), pages 3267-3297, October.
- Calzolari, Giacomo & Calvano, Emilio & Denicolo, Vincenzo & Pastorello, Sergio, 2018. "Artificial intelligence, algorithmic pricing and collusion," CEPR Discussion Papers 13405, C.E.P.R. Discussion Papers.
John Asker & Chaim Fershtman & Ariel Pakes, 2022. "Artificial Intelligence, Algorithm Design, and Pricing," AEA Papers and Proceedings, American Economic Association, vol. 112, pages 452-456, May.
Herbert A. Simon, 1955. "A Behavioral Model of Rational Choice," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 69(1), pages 99-118.
Martino Banchio & Andrzej Skrzypacz, 2022. "Artificial Intelligence and Auction Design," NBER Chapters, in: Economics of Artificial Intelligence, National Bureau of Economic Research, Inc.
- Martino Banchio & Andrzej Skrzypacz, 2022. "Artificial Intelligence and Auction Design," Papers 2202.05947, arXiv.org.
Zach Y. Brown & Alexander MacKay, 2023. "Competition in Pricing Algorithms," American Economic Journal: Microeconomics, American Economic Association, vol. 15(2), pages 109-156, May.
- Zach Y. Brown & Alexander MacKay, 2021. "Competition in Pricing Algorithms," NBER Working Papers 28860, National Bureau of Economic Research, Inc.
Karandikar, Rajeeva & Mookherjee, Dilip & Ray, Debraj & Vega-Redondo, Fernando, 1998. "Evolving Aspirations and Cooperation," Journal of Economic Theory, Elsevier, vol. 80(2), pages 292-331, June.
- Debraj Ray & Dilip Mookherjee & Fernando Vega Redondo & Rajeeva L. Karandikar, 1996. "Evolving aspirations and cooperation," Working Papers. Serie AD 1996-06, Instituto Valenciano de Investigaciones Económicas, S.A. (Ivie).
Osborne, Martin J & Rubinstein, Ariel, 1998. "Games with Procedurally Rational Players," American Economic Review, American Economic Association, vol. 88(4), pages 834-847, September.
- Martin J. Osborne & Ariel Rubinstein, 1997. "Games with Procedurally Rational Players," Department of Economics Working Papers 1997-02, McMaster University.
- Osborne, M-J & Rubinstein, A, 1997. "Games with Procedurally Rational Players," Papers 4-97, Tel Aviv.
Piccione, Michele, 2002. "The Repeated Prisoner's Dilemma with Imperfect Private Monitoring," Journal of Economic Theory, Elsevier, vol. 102(1), pages 70-83, January.
Martino Banchio & Giacomo Mantegazza, 2022. "Artificial Intelligence and Spontaneous Collusion," Papers 2202.05946, arXiv.org, revised Sep 2023.
Takuo Sugaya, 2022. "Folk Theorem in Repeated Games with Private Monitoring [Collusion in Dynamic Bertrand Oligopoly with Correlated Private Signals and Communication]," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 89(4), pages 2201-2256.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Jehiel, Philippe & Samuelson, Larry, 2023. "The analogical foundations of cooperation," Journal of Economic Theory, Elsevier, vol. 208(C).
- Philippe Jehiel & Larry Samuelson, 2022. "The Analogical Foundations of Cooperation," PSE Working Papers halshs-03754101, HAL.
- Philippe Jehiel & Larry Samuelson, 2022. "The Analogical Foundations of Cooperation," Working Papers halshs-03754101, HAL.
- Philippe Jehiel & Larry Samuelson, 2023. "The analogical foundations of cooperation," Post-Print halshs-04331552, HAL.
- Philippe Jehiel & Larry Samuelson, 2023. "The analogical foundations of cooperation," PSE-Ecole d'économie de Paris (Postprint) halshs-04331552, HAL.
Inkoo Cho & Noah Williams, 2024. "Collusive Outcomes Without Collusion," Papers 2403.07177, arXiv.org.
Philippe Jehiel, 2022. "Analogy-Based Expectation Equilibrium and Related Concepts: Theory, Applications, and Beyond," Working Papers halshs-03735680, HAL.
- Philippe Jehiel, 2022. "Analogy-Based Expectation Equilibrium and Related Concepts: Theory, Applications, and Beyond," PSE Working Papers halshs-03735680, HAL.
Kandori, Michihiro, 2002. "Introduction to Repeated Games with Private Monitoring," Journal of Economic Theory, Elsevier, vol. 102(1), pages 1-15, January.
- Michihiro Kandori, 2001. "Introduction to Repeated Games with Private Monitoring," CIRJE F-Series CIRJE-F-114, CIRJE, Faculty of Economics, University of Tokyo.
Martino Banchio & Andrzej Skrzypacz, 2022. "Artificial Intelligence and Auction Design," Papers 2202.05947, arXiv.org.
Miyagawa, Eiichi & Miyahara, Yasuyuki & Sekiguchi, Tadashi, 2008. "The folk theorem for repeated games with observation costs," Journal of Economic Theory, Elsevier, vol. 139(1), pages 192-221, March.
- Eiichi Miyagawa & Yasuyuki Miyahara & Tadashi Sekiguchi, 2004. "The Folk Theorem for Repeated Games with Observation Costs," KIER Working Papers 597, Kyoto University, Institute of Economic Research.
- Yasuyuki Miyahara & Tadashi Sekiguchi & Eiichi Miyagawa, 2007. "The Folk Theorem for Repeated Games with Observation Costs," 2007 Meeting Papers 751, Society for Economic Dynamics.
Ely, Jeffrey C. & Valimaki, Juuso, 2002. "A Robust Folk Theorem for the Prisoner's Dilemma," Journal of Economic Theory, Elsevier, vol. 102(1), pages 84-105, January.
- Jeffrey C. Ely & Juuso Valimaki, 1999. "A Robust Folk Theorem for the Prisoner's Dilemma," Discussion Papers 1264, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
- Jeffrey Ely, 2000. "A Robust Folk Theorem for the Prisoners' Dilemma," Econometric Society World Congress 2000 Contributed Papers 0210, Econometric Society.
Olivier Compte & Andrew Postlewaite, 2007. "Effecting Cooperation," PIER Working Paper Archive 09-019, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 29 May 2009.
- Olivier Compte & Andrew Postlewaite, 2010. "Plausible Cooperation, Third Version," PIER Working Paper Archive 13-008, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 01 Dec 2012.
- Andrew Postlewaite & Olivier Compte, 2009. "Plausible Cooperation, Second Version," PIER Working Paper Archive 10-039, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 16 Dec 2010.
Jehiel, Philippe, 2005. "Analogy-based expectation equilibrium," Journal of Economic Theory, Elsevier, vol. 123(2), pages 81-104, August.
- Philippe Jehiel, 2001. "Analogy-Based Expectation Equilibrium," Economics Working Papers 0003, Institute for Advanced Study, School of Social Science.
- Philippe Jehiel, 2005. "Analogy-Based Expectation Equilibrium," Levine's Bibliography 784828000000000106, UCLA Department of Economics.
- Philippe Jehiel, 2005. "Analogy-based Expectation Equilibrium," Post-Print halshs-00754070, HAL.
Gossner, Olivier & Hörner, Johannes, 2010. "When is the lowest equilibrium payoff in a repeated game equal to the minmax payoff?," Journal of Economic Theory, Elsevier, vol. 145(1), pages 63-84, January.
- Olivier Gossner & Johannes Hörner, 2010. "When is the lowest equilibrium payoff in a repeated game equal to the minmax payoff?," Post-Print halshs-00754488, HAL.
Stanley Reiter, 1999. "Coordination of Economic Activity: An Example," Discussion Papers 1263, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
Hörner, Johannes & Lovo, Stefano & Tomala, Tristan, 2011. "Belief-free equilibria in games with incomplete information: Characterization and existence," Journal of Economic Theory, Elsevier, vol. 146(5), pages 1770-1795, September.
- Stefano Lovo & Tristan Tomala & Johannes Hörner, 2008. "Belief-free equilibria in games with incomplete information: characterization and existence," Working Papers hal-00489877, HAL.
- Stefano Lovo & Johannes Hörner & Tristan Tomala, 2011. "Belief-free equilibria in games with incomplete information: characterization and existence," Post-Print hal-00630299, HAL.
- Johannes Horner & Stefano Lovo & Tristan Tomala, 2009. "Belief-free Equilibria in Games with Incomplete Information: Characterization and Existence," Cowles Foundation Discussion Papers 1739, Cowles Foundation for Research in Economics, Yale University.
- Lovo, Stefano & Tomala, Tristan & Hörner, Johannes, 2009. "Belief-free equilibria in games with incomplete information: characterization and existence," HEC Research Papers Series 921, HEC Paris.
Compte, Olivier & Postlewaite, Andrew, 2015. "Plausible cooperation," Games and Economic Behavior, Elsevier, vol. 91(C), pages 45-59.
- Olivier Compte & Andrew Postlewaite, 2015. "Plausible cooperation," Post-Print halshs-01204780, HAL.
- Olivier Compte & Andrew Postlewaite, 2015. "Plausible cooperation," PSE - Labex "OSE-Ouvrir la Science Economique" halshs-01204780, HAL.
- Olivier Compte & Andrew Postlewaite, 2015. "Plausible cooperation," PSE-Ecole d'économie de Paris (Postprint) halshs-01204780, HAL.
Martino Banchio & Giacomo Mantegazza, 2022. "Artificial Intelligence and Spontaneous Collusion," Papers 2202.05946, arXiv.org, revised Sep 2023.
Yamamoto, Yuichi, 2009. "A limit characterization of belief-free equilibrium payoffs in repeated games," Journal of Economic Theory, Elsevier, vol. 144(2), pages 802-824, March.
Marco Battaglini & Stephen Coate, 2008. "A Dynamic Theory of Public Spending, Taxation, and Debt," American Economic Review, American Economic Association, vol. 98(1), pages 201-236, March.
- Marco Battaglini & Stephen Coate, 2006. "A Dynamic Theory of Public Spending, Taxation and Debt," NBER Working Papers 12100, National Bureau of Economic Research, Inc.
- Stephen Coate & Marco Battaglini, 2007. "A Dynamic Theory of Public Spending, Taxation and Debt," 2007 Meeting Papers 573, Society for Economic Dynamics.
- Marco Battaglini & Stephen Coate, 2007. "A Dynamic Theory of Public Spending, Taxation and Debt," Discussion Papers 1441, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
- Marco Battaglini & Stephen Coate, 2006. "A Dynamic Theory of Public Spending, Taxation and Debt," NajEcon Working Paper Reviews 321307000000000026, www.najecon.org.
- Marco Battaglini & Steve Coate, 2006. "A Dynamic Theory of Public Spending, Taxation and Debt," Levine's Bibliography 122247000000001094, UCLA Department of Economics.
- Battaglini, Marco & Coate, Stephen, 2007. "A Dynamic Theory of Public Spending, Taxation and Debt," Working Papers 07-04, Cornell University, Center for Analytic Economics.
Spiegler, Ran, 2021. "Modeling players with random “data access”," Journal of Economic Theory, Elsevier, vol. 198(C).
Mira Frick & Ryota Iijima & Yuhta Ishii, 2018. "Dispersed Behavior and Perceptions in Assortative Societies," Cowles Foundation Discussion Papers 2128, Cowles Foundation for Research in Economics, Yale University.
- Mira Frick & Ryota Iijima & Yuhta Ishii, 2018. "Dispersed Behavior and Perceptions in Assortative Societies," Cowles Foundation Discussion Papers 2128R2, Cowles Foundation for Research in Economics, Yale University, revised Oct 2021.
- Mira Frick & Ryota Iijima & Yuhta Ishii, 2018. "Dispersed Behavior and Perceptions in Assortative Societies," Cowles Foundation Discussion Papers 2128R, Cowles Foundation for Research in Economics, Yale University, revised Mar 2019.
- Frick, Mira & Iijima, Ryota & Ishii, Yuhta, 2021. "Dispersed Behavior and Perceptions in Assortative Societies," CEPR Discussion Papers 16789, C.E.P.R. Discussion Papers.
Mira Frick & Ryota Iijima & Yuhta Ishii, 2018. "Dispersed Behavior and Perceptions in Assortative Societies," Cowles Foundation Discussion Papers 2128R3, Cowles Foundation for Research in Economics, Yale University, revised Jun 2022.
Nuh Aygün Dalkıran, 2016. "Order of limits in reputations," Theory and Decision, Springer, vol. 81(3), pages 393-411, September.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-DES-2023-05-29 (Economic Design)
NEP-GTH-2023-05-29 (Game Theory)
NEP-MIC-2023-05-29 (Microeconomics)
