Printed from https://ideas.repec.org/p/arx/papers/2509.15090.html

Emergent Alignment via Competition

Author

Listed:
  • Natalie Collina
  • Surbhi Goel
  • Aaron Roth
  • Emily Ryu
  • Mirah Shi

Abstract

Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which are individually well-aligned. Our key insight is that when the user's utility lies approximately within the convex hull of the agents' utilities, a condition that becomes easier to satisfy as model diversity increases, strategic competition can yield outcomes comparable to interacting with a perfectly aligned model. We model this as a multi-leader Stackelberg game, extending Bayesian persuasion to multi-round conversations between differently informed parties, and prove three results: (1) when perfect alignment would allow the user to learn her Bayes-optimal action, she can also do so in all equilibria under the convex hull condition; (2) under weaker assumptions requiring only approximate utility learning, a non-strategic user employing quantal response achieves near-optimal utility in all equilibria; and (3) when the user selects the best single AI after an evaluation period, equilibrium guarantees remain near-optimal without further distributional assumptions. We complement the theory with two sets of experiments.
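
As a rough illustration of two notions named in the abstract, the sketch below shows a logit-style quantal response rule and a check of how closely a given convex combination of agents' utilities matches the user's utility. This is a minimal sketch under stated assumptions, not the paper's construction: the function names, the `rationality` parameter, the logit form (borrowed from the standard quantal response literature), and the per-outcome maximum gap used as the notion of "approximately" are all illustrative choices.

```python
import math


def quantal_response(utilities, rationality=1.0):
    """Logit quantal response: P(action) proportional to exp(rationality * utility).

    Assumption: the logit form and the `rationality` parameter are illustrative.
    """
    # Subtract the maximum for numerical stability; probabilities are unchanged.
    peak = max(utilities)
    weights = [math.exp(rationality * (u - peak)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def hull_gap(user_utility, agent_utilities, mixture):
    """Largest per-outcome gap between the user's utility vector and one
    given convex combination of the agents' utility vectors.

    A gap of 0 means this particular mixture reproduces the user's utility.
    This only evaluates a supplied mixture; it does not search for the best one.
    """
    if any(w < 0 for w in mixture) or abs(sum(mixture) - 1.0) > 1e-9:
        raise ValueError("mixture must be non-negative and sum to 1")
    combined = [
        sum(w * agent[i] for w, agent in zip(mixture, agent_utilities))
        for i in range(len(user_utility))
    ]
    return max(abs(c - u) for c, u in zip(combined, user_utility))


# Hypothetical example: two misaligned agents whose equal mixture matches the user.
user = [1.0, 0.0, 0.5]
agents = [[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
gap = hull_gap(user, agents, [0.5, 0.5])
choice_probs = quantal_response(user, rationality=5.0)
```

Here neither agent's utility matches the user's, yet an equal mixture of the two does. A larger `rationality` value concentrates the choice probabilities on the highest-utility action, while a value of 0 gives a uniform choice. Finding the best mixture weights, rather than checking a given one, would require solving a small optimization problem such as a linear program.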

Suggested Citation

  • Natalie Collina & Surbhi Goel & Aaron Roth & Emily Ryu & Mirah Shi, 2025. "Emergent Alignment via Competition," Papers 2509.15090, arXiv.org.
  • Handle: RePEc:arx:papers:2509.15090

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.15090
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Au, Pak Hung & Kawai, Keiichi, 2020. "Competitive information disclosure by multiple senders," Games and Economic Behavior, Elsevier, vol. 119(C), pages 56-78.
    2. Ronen Gradwohl & Niklas Hahn & Martin Hoefer & Rann Smorodinsky, 2022. "Reaping the Informational Surplus in Bayesian Persuasion," American Economic Journal: Microeconomics, American Economic Association, vol. 14(4), pages 296-317, November.
    3. Li, Fei & Norman, Peter, 2018. "On Bayesian persuasion with multiple senders," Economics Letters, Elsevier, vol. 170(C), pages 66-70.
    4. McKelvey, Richard D. & Palfrey, Thomas R., 1995. "Quantal Response Equilibria for Normal Form Games," Games and Economic Behavior, Elsevier, vol. 10(1), pages 6-38, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wu, Wenhao & Ye, Bohan, 2023. "Competition in persuasion: An experiment," Games and Economic Behavior, Elsevier, vol. 138(C), pages 72-89.
    2. Alp Atakan & Mehmet Ekmekci & Ludovic Renou, 2021. "Cross-verification and Persuasive Cheap Talk," Papers 2102.13562, arXiv.org, revised Apr 2021.
    3. Frédéric Koessler & Marie Laclau & Tristan Tomala, 2022. "Interactive Information Design," Mathematics of Operations Research, INFORMS, vol. 47(1), pages 153-175, February.
    4. Teddy Mekonnen & Bobak Pakzad-Hurson, 2024. "Competition, Persuasion, and Search," Papers 2411.11183, arXiv.org, revised Sep 2025.
    5. Atakan, Alp & Ekmekci, Mehmet & Renou, Ludovic, 2024. "Cross-verification and persuasive cheap talk," Journal of Economic Theory, Elsevier, vol. 222(C).
    6. Raphael Boleslavsky & Silvana Krasteva, 2025. "Limits of Disclosure in Search Markets," Papers 2506.06319, arXiv.org, revised Jun 2025.
    7. Niloufar Mirzavand Boroujeni & Krishnamurthy Iyer & William L. Cooper, 2025. "Decentralized Signaling Mechanisms," Papers 2504.14163, arXiv.org.
    8. Kemal Kivanc Akoz & Arseniy Samsonov, 2023. "Bargaining over information structures," Discussion Papers 2301, Budapest University of Technology and Economics, Quantitative Social and Management Sciences.
    9. Ronen Gradwohl & Niklas Hahn & Martin Hoefer & Rann Smorodinsky, 2020. "Reaping the Informational Surplus in Bayesian Persuasion," Papers 2006.02048, arXiv.org.
    10. Shih-Tang Su & Vijay G. Subramanian, 2022. "Order of Commitments in Bayesian Persuasion with Partial-informed Senders," Papers 2202.06479, arXiv.org.
    11. Ju Hu & Xi Weng, 2021. "Robust persuasion of a privately informed receiver," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 72(3), pages 909-953, October.
    12. Quan Li & Kang Rong, 2024. "Full disclosure in competitive Bayesian persuasion," International Journal of Game Theory, Springer;Game Theory Society, vol. 53(2), pages 525-545, June.
    13. Bosch-Domènech, Antoni & Vriend, Nicolaas J., 2013. "On the role of non-equilibrium focal points as coordination devices," Journal of Economic Behavior & Organization, Elsevier, vol. 94(C), pages 52-67.
    14. Kraemer, Carlo & Noth, Markus & Weber, Martin, 2006. "Information aggregation with costly information and random ordering: Experimental evidence," Journal of Economic Behavior & Organization, Elsevier, vol. 59(3), pages 423-432, March.
    15. Goeree, Jacob K. & Holt, Charles A. & Palfrey, Thomas R., 2002. "Quantal Response Equilibrium and Overbidding in Private-Value Auctions," Journal of Economic Theory, Elsevier, vol. 104(1), pages 247-272, May.
    16. Emmanuel Dechenaux & Dan Kovenock & Roman Sheremeta, 2015. "A survey of experimental research on contests, all-pay auctions and tournaments," Experimental Economics, Springer;Economic Science Association, vol. 18(4), pages 609-669, December.
    17. Steven N. Durlauf & Yannis M. Ioannides, 2010. "Social Interactions," Annual Review of Economics, Annual Reviews, vol. 2(1), pages 451-478, September.
    18. Marco Cipriani & Antonio Guarino, 2009. "Herd Behavior in Financial Markets: An Experiment with Financial Market Professionals," Journal of the European Economic Association, MIT Press, vol. 7(1), pages 206-233, March.
    19. Dutta, Rohan & Levine, David Knudsen & Modica, Salvatore, 2018. "Collusion constrained equilibrium," Theoretical Economics, Econometric Society, vol. 13(1), January.
    20. Ghidoni, Riccardo & Suetens, Sigrid, 2019. "Empirical Evidence on Repeated Sequential Games," Other publications TiSEM ff3a441f-e196-4e45-ba59-c, Tilburg University, School of Economics and Management.
