Printed from https://ideas.repec.org/p/arx/papers/2304.12647.html

Q-learning with biased policy rules

Author

Listed:
  • Olivier Compte

    (Paris School of Economics)

Abstract

In dynamic environments, Q-learning is an automaton that (i) provides estimates (Q-values) of the continuation values associated with each available action; and (ii) follows the naive policy of almost always choosing the action with the highest Q-value. We consider a family of automata that are based on Q-values but whose policy may systematically favor some actions over others, for example through a bias that favors cooperation. In the spirit of Compte and Postlewaite [2018], we look for equilibrium biases within this family of Q-based automata. We examine classic games under various monitoring technologies and find that equilibrium biases may strongly foster collusion.
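
The two ingredients named in the abstract, a Q-value estimate per action and a policy that may favor some actions, can be illustrated with a minimal sketch. It is not the paper's specification: the single-state setting, the additive form of the bias, the epsilon-exploration rule, and every name (`biased_choice`, `update_q`, `bias`, `alpha`, `gamma`, `epsilon`) are assumptions made here for illustration only.

```python
import random


def biased_choice(q_values, bias, epsilon, rng):
    """Choose an action from Q-values plus a per-action bias (assumed additive).

    With probability epsilon a uniformly random action is explored; otherwise
    the action with the highest biased Q-value is chosen.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a] + bias.get(a, 0.0))


def update_q(q_values, action, reward, alpha, gamma):
    """Textbook single-state Q-learning update for the action just played."""
    # Target: realized reward plus discounted value of the best action.
    target = reward + gamma * max(q_values.values())
    q_values[action] += alpha * (target - q_values[action])
    return q_values


if __name__ == "__main__":
    rng = random.Random(0)
    q = {"C": 1.0, "D": 1.2}
    # Naive policy (no bias) picks the action with the highest Q-value.
    print(biased_choice(q, {}, 0.0, rng))
    # A bias toward "C" (hypothetical cooperation bias) flips the choice.
    print(biased_choice(q, {"C": 0.5}, 0.0, rng))
```

With an empty `bias` the rule reduces to the naive policy described in (ii); a positive entry for one action tilts the choice toward it without changing how the Q-values themselves are estimated.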

Suggested Citation

  • Olivier Compte, 2023. "Q-learning with biased policy rules," Papers 2304.12647, arXiv.org, revised Oct 2023.
  • Handle: RePEc:arx:papers:2304.12647

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2304.12647
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Ignacio Esponda & Demian Pouzo, 2016. "Berk–Nash Equilibrium: A Framework for Modeling Agents With Misspecified Models," Econometrica, Econometric Society, vol. 84, pages 1093-1130, May.
    2. Drew Fudenberg & David K. Levine, 2008. "Reputation And Equilibrium Selection In Games With A Patient Player," World Scientific Book Chapters, in: Drew Fudenberg & David K Levine (ed.), A Long-Run Collaboration On Long-Run Games, chapter 7, pages 123-142, World Scientific Publishing Co. Pte. Ltd.
    3. Jehiel, Philippe, 2005. "Analogy-based expectation equilibrium," Journal of Economic Theory, Elsevier, vol. 123(2), pages 81-104, August.
    4. Drew Fudenberg & Eric Maskin, 2008. "The Folk Theorem In Repeated Games With Discounting Or With Incomplete Information," World Scientific Book Chapters, in: Drew Fudenberg & David K Levine (ed.), A Long-Run Collaboration On Long-Run Games, chapter 11, pages 209-230, World Scientific Publishing Co. Pte. Ltd.
    5. Sekiguchi, Tadashi, 1997. "Efficiency in Repeated Prisoner's Dilemma with Private Monitoring," Journal of Economic Theory, Elsevier, vol. 76(2), pages 345-361, October.
    6. Compte, Olivier, 2002. "On Sustaining Cooperation without Public Observations," Journal of Economic Theory, Elsevier, vol. 102(1), pages 106-150, January.
    7. Emilio Calvano & Giacomo Calzolari & Vincenzo Denicolò & Sergio Pastorello, 2020. "Artificial Intelligence, Algorithmic Pricing, and Collusion," American Economic Review, American Economic Association, vol. 110(10), pages 3267-3297, October.
    8. John Asker & Chaim Fershtman & Ariel Pakes, 2022. "Artificial Intelligence, Algorithm Design, and Pricing," AEA Papers and Proceedings, American Economic Association, vol. 112, pages 452-456, May.
    9. Herbert A. Simon, 1955. "A Behavioral Model of Rational Choice," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 69(1), pages 99-118.
    10. Martino Banchio & Andrzej Skrzypacz, 2022. "Artificial Intelligence and Auction Design," NBER Chapters, in: Economics of Artificial Intelligence, National Bureau of Economic Research, Inc.
    11. Zach Y. Brown & Alexander MacKay, 2023. "Competition in Pricing Algorithms," American Economic Journal: Microeconomics, American Economic Association, vol. 15(2), pages 109-156, May.
    12. Karandikar, Rajeeva & Mookherjee, Dilip & Ray, Debraj & Vega-Redondo, Fernando, 1998. "Evolving Aspirations and Cooperation," Journal of Economic Theory, Elsevier, vol. 80(2), pages 292-331, June.
    13. Osborne, Martin J & Rubinstein, Ariel, 1998. "Games with Procedurally Rational Players," American Economic Review, American Economic Association, vol. 88(4), pages 834-847, September.
    14. Piccione, Michele, 2002. "The Repeated Prisoner's Dilemma with Imperfect Private Monitoring," Journal of Economic Theory, Elsevier, vol. 102(1), pages 70-83, January.
    15. Martino Banchio & Giacomo Mantegazza, 2022. "Artificial Intelligence and Spontaneous Collusion," Papers 2202.05946, arXiv.org, revised Sep 2023.
    16. Takuo Sugaya, 2022. "Folk Theorem in Repeated Games with Private Monitoring," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 89(4), pages 2201-2256.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jehiel, Philippe & Samuelson, Larry, 2023. "The analogical foundations of cooperation," Journal of Economic Theory, Elsevier, vol. 208(C).
    2. Inkoo Cho & Noah Williams, 2024. "Collusive Outcomes Without Collusion," Papers 2403.07177, arXiv.org.
    3. Philippe Jehiel, 2022. "Analogy-Based Expectation Equilibrium and Related Concepts: Theory, Applications, and Beyond," Working Papers halshs-03735680, HAL.
    4. Carmona, Guilherme & Laohakunakorn, Krittanai, 2023. "The folk theorem for the prisoner's dilemma with endogenous private monitoring," Journal of Economic Theory, Elsevier, vol. 213(C).
    5. Kandori, Michihiro, 2002. "Introduction to Repeated Games with Private Monitoring," Journal of Economic Theory, Elsevier, vol. 102(1), pages 1-15, January.
    6. Martino Banchio & Andrzej Skrzypacz, 2022. "Artificial Intelligence and Auction Design," Papers 2202.05947, arXiv.org.
    7. Miyagawa, Eiichi & Miyahara, Yasuyuki & Sekiguchi, Tadashi, 2008. "The folk theorem for repeated games with observation costs," Journal of Economic Theory, Elsevier, vol. 139(1), pages 192-221, March.
    8. Ely, Jeffrey C. & Valimaki, Juuso, 2002. "A Robust Folk Theorem for the Prisoner's Dilemma," Journal of Economic Theory, Elsevier, vol. 102(1), pages 84-105, January.
    9. Olivier Compte & Andrew Postlewaite, 2007. "Effecting Cooperation," PIER Working Paper Archive 09-019, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 29 May 2009.
    10. Jehiel, Philippe, 2005. "Analogy-based expectation equilibrium," Journal of Economic Theory, Elsevier, vol. 123(2), pages 81-104, August.
    11. Gossner, Olivier & Hörner, Johannes, 2010. "When is the lowest equilibrium payoff in a repeated game equal to the minmax payoff?," Journal of Economic Theory, Elsevier, vol. 145(1), pages 63-84, January.
    12. Stanley Reiter, 1999. "Coordination of Economic Activity: An Example," Discussion Papers 1263, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
    13. Hörner, Johannes & Lovo, Stefano & Tomala, Tristan, 2011. "Belief-free equilibria in games with incomplete information: Characterization and existence," Journal of Economic Theory, Elsevier, vol. 146(5), pages 1770-1795, September.
    14. Martino Banchio & Giacomo Mantegazza, 2022. "Artificial Intelligence and Spontaneous Collusion," Papers 2202.05946, arXiv.org, revised Sep 2023.
    15. Yamamoto, Yuichi, 2009. "A limit characterization of belief-free equilibrium payoffs in repeated games," Journal of Economic Theory, Elsevier, vol. 144(2), pages 802-824, March.
    16. Marco Battaglini & Stephen Coate, 2008. "A Dynamic Theory of Public Spending, Taxation, and Debt," American Economic Review, American Economic Association, vol. 98(1), pages 201-236, March.
    17. Spiegler, Ran, 2021. "Modeling players with random “data access”," Journal of Economic Theory, Elsevier, vol. 198(C).
    18. Mira Frick & Ryota Iijima & Yuhta Ishii, 2018. "Dispersed Behavior and Perceptions in Assortative Societies," Cowles Foundation Discussion Papers 2128, Cowles Foundation for Research in Economics, Yale University.
    19. Mira Frick & Ryota Iijima & Yuhta Ishii, 2018. "Dispersed Behavior and Perceptions in Assortative Societies," Cowles Foundation Discussion Papers 2128R3, Cowles Foundation for Research in Economics, Yale University, revised Jun 2022.
    20. Nuh Aygün Dalkıran, 2016. "Order of limits in reputations," Theory and Decision, Springer, vol. 81(3), pages 393-411, September.
