IDEAS home Printed from https://ideas.repec.org/p/arx/papers/1908.01135.html
   My bibliography  Save this paper

Multiplayer Bandit Learning, from Competition to Cooperation

Author

Listed:
  • Simina Br^anzei
  • Yuval Peres

Abstract

The stochastic multi-armed bandit model captures the tradeoff between exploration and exploitation. We study the effects of competition and cooperation on this tradeoff. Suppose there are $k$ arms and two players, Alice and Bob. In every round, each player pulls an arm, receives the resulting reward, and observes the choice of the other player but not their reward. Alice's utility is $\Gamma_A + \lambda \Gamma_B$ (and similarly for Bob), where $\Gamma_A$ is Alice's total reward and $\lambda \in [-1, 1]$ is a cooperation parameter. At $\lambda = -1$ the players are competing in a zero-sum game, at $\lambda = 1$, they are fully cooperating, and at $\lambda = 0$, they are neutral: each player's utility is their own reward. The model is related to the economics literature on strategic experimentation, where usually players observe each other's rewards. With discount factor $\beta$, the Gittins index reduces the one-player problem to the comparison between a risky arm, with a prior $\mu$, and a predictable arm, with success probability $p$. The value of $p$ where the player is indifferent between the arms is the Gittins index $g = g(\mu,\beta) > m$, where $m$ is the mean of the risky arm. We show that competing players explore less than a single player: there is $p^* \in (m, g)$ so that for all $p > p^*$, the players stay at the predictable arm. However, the players are not myopic: they still explore for some $p > m$. On the other hand, cooperating players explore more than a single player. We also show that neutral players learn from each other, receiving strictly higher total rewards than they would playing alone, for all $ p\in (p^*, g)$, where $p^*$ is the threshold from the competing case. Finally, we show that competing and neutral players eventually settle on the same arm in every Nash equilibrium, while this can fail for cooperating players.

Suggested Citation

  • Simina Br^anzei & Yuval Peres, 2019. "Multiplayer Bandit Learning, from Competition to Cooperation," Papers 1908.01135, arXiv.org, revised Jan 2024.
  • Handle: RePEc:arx:papers:1908.01135
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1908.01135
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Aoyagi, Masaki, 1998. "Mutual Observability and the Convergence of Actions in a Multi-Person Two-Armed Bandit Model," Journal of Economic Theory, Elsevier, vol. 82(2), pages 405-424, October.
    2. Nicolas Klein & Sven Rady, 2011. "Negatively Correlated Bandits," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 78(2), pages 693-732.
    3. Godfrey Keller & Sven Rady & Martin Cripps, 2005. "Strategic Experimentation with Exponential Bandits," Econometrica, Econometric Society, vol. 73(1), pages 39-68, January.
    4. Drew Fudenberg & David Levine, 2008. "Subgame–Perfect Equilibria of Finite– and Infinite–Horizon Games," World Scientific Book Chapters, in: Drew Fudenberg & David K Levine (ed.), A Long-Run Collaboration On Long-Run Games, chapter 1, pages 3-20, World Scientific Publishing Co. Pte. Ltd..
    5. Dinah Rosenberg & Eilon Solan & Nicolas Vieille, 2007. "Social Learning in One-Arm Bandit Problems," Econometrica, Econometric Society, vol. 75(6), pages 1591-1611, November.
    6. David Besanko & Jianjun Wu, 2013. "The Impact of Market Structure and Learning on the Tradeoff between R&D Competition and Cooperation," Journal of Industrial Economics, Wiley Blackwell, vol. 61(1), pages 166-201, March.
    7. Kremer, Ilan & Mansour, Yishay & Perry, Motty, 2013. "Implementing the "Wisdom of the Crowd"," Economic Research Papers 270435, University of Warwick - Department of Economics.
    8. Patrick Bolton & Christopher Harris, 1999. "Strategic Experimentation," Econometrica, Econometric Society, vol. 67(2), pages 349-374, March.
    9. Martin A. Nowak & Corina E. Tarnita & Edward O. Wilson, 2010. "The evolution of eusociality," Nature, Nature, vol. 466(7310), pages 1057-1062, August.
    10. Rothschild, Michael, 1974. "A two-armed bandit theory of market pricing," Journal of Economic Theory, Elsevier, vol. 9(2), pages 185-202, October.
    11. Heidhues, Paul & Rady, Sven & Strack, Philipp, 2015. "Strategic experimentation with private payoffs," Journal of Economic Theory, Elsevier, vol. 159(PA), pages 531-551.
    12. Ilan Kremer & Yishay Mansour & Motty Perry, 2014. "Implementing the "Wisdom of the Crowd"," Journal of Political Economy, University of Chicago Press, vol. 122(5), pages 988-1012.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Heidhues, Paul & Rady, Sven & Strack, Philipp, 2015. "Strategic experimentation with private payoffs," Journal of Economic Theory, Elsevier, vol. 159(PA), pages 531-551.
    2. Kaustav Das & Nicolas Klein & Katharina Schmid, 2020. "Strategic experimentation with asymmetric players," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 69(4), pages 1147-1175, June.
    3. Rosenberg, Dinah & Salomon, Antoine & Vieille, Nicolas, 2013. "On games of strategic experimentation," Games and Economic Behavior, Elsevier, vol. 82(C), pages 31-51.
    4. Camargo, Braz, 2014. "Learning in society," Games and Economic Behavior, Elsevier, vol. 87(C), pages 381-396.
    5. Klein, Nicolas, 2013. "Strategic learning in teams," Games and Economic Behavior, Elsevier, vol. 82(C), pages 636-657.
    6. Gustavo Manso & Farzad Pourbabaee, 2022. "The Impact of Connectivity on the Production and Diffusion of Knowledge," Papers 2202.00729, arXiv.org.
    7. Johannes Hoelzemann & Nicolas Klein, 2021. "Bandits in the lab," Quantitative Economics, Econometric Society, vol. 12(3), pages 1021-1051, July.
    8. Keller, Godfrey & Novák, Vladimír & Willems, Tim, 2019. "A note on optimal experimentation under risk aversion," Journal of Economic Theory, Elsevier, vol. 179(C), pages 476-487.
    9. Kaustav Das, 2014. "Strategic Experimentation with Competition and Private Arrival of Information," Discussion Papers 1404, University of Exeter, Department of Economics.
    10. , & ,, 2010. "Strategic experimentation with Poisson bandits," Theoretical Economics, Econometric Society, vol. 5(2), May.
    11. Besanko, David & Tong, Jian & Wu, Jianjun, 2016. "Subsidizing research programs with "if" and "when" uncertainty in the face of severe informational constraints," Discussion Paper Series In Economics And Econometrics 1605, Economics Division, School of Social Sciences, University of Southampton.
    12. Deimen, Inga & Wirtz, Julia, 2022. "Control, cost, and confidence: Perseverance and procrastination in the face of failure," Games and Economic Behavior, Elsevier, vol. 134(C), pages 52-74.
    13. Caroline D. Thomas, 2021. "Strategic Experimentation with Congestion," American Economic Journal: Microeconomics, American Economic Association, vol. 13(1), pages 1-82, February.
    14. Xie, Yinxi & Xie, Yang, 2017. "Machiavellian experimentation," Journal of Comparative Economics, Elsevier, vol. 45(4), pages 685-711.
    15. Fudenberg, Drew & He, Kevin, 2021. "Player-compatible learning and player-compatible equilibrium," Journal of Economic Theory, Elsevier, vol. 194(C).
    16. Wagner, Peter A. & Klein, Nicolas, 2022. "Strategic investment and learning with private information," Journal of Economic Theory, Elsevier, vol. 204(C).
    17. Boyarchenko, Svetlana, 2021. "Inefficiency of sponsored research," Journal of Mathematical Economics, Elsevier, vol. 95(C).
    18. Chen, Chia-Hui & Ishida, Junichiro, 2018. "Hierarchical experimentation," Journal of Economic Theory, Elsevier, vol. 177(C), pages 365-404.
    19. Nicolas Klein & Sven Rady, 2011. "Negatively Correlated Bandits," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 78(2), pages 693-732.
    20. Svetlana Boyarchenko, 2020. "Super- and submodularity of stopping games with random observations," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 70(4), pages 983-1022, November.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:1908.01135. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.