Emergent Alignment via Competition

Author

Listed:

Natalie Collina
Surbhi Goel
Aaron Roth
Emily Ryu
Mirah Shi

Abstract

Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which are individually well-aligned. Our key insight is that when the user's utility lies approximately within the convex hull of the agents' utilities, a condition that becomes easier to satisfy as model diversity increases, strategic competition can yield outcomes comparable to interacting with a perfectly aligned model. We model this as a multi-leader Stackelberg game, extending Bayesian persuasion to multi-round conversations between differently informed parties, and prove three results: (1) when perfect alignment would allow the user to learn her Bayes-optimal action, she can also do so in all equilibria under the convex hull condition; (2) under weaker assumptions requiring only approximate utility learning, a non-strategic user employing quantal response achieves near-optimal utility in all equilibria; and (3) when the user selects the best single AI after an evaluation period, equilibrium guarantees remain near-optimal without further distributional assumptions. We complement the theory with two sets of experiments.
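The two technical ingredients named in the abstract, the convex hull condition and quantal response, can be illustrated with a small sketch. It is not the paper's construction. It assumes, purely for illustration, that each utility function is a vector of payoffs over a finite set of outcomes, that "approximately within the convex hull" is measured by Euclidean distance, and that quantal response takes the logit form of McKelvey and Palfrey. The names `quantal_response`, `hull_distance`, `rationality`, and `steps` are hypothetical.

```python
import math

def quantal_response(utilities, rationality=1.0):
    """Logit quantal response (assumed form): P(a) proportional to exp(rationality * u(a))."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(rationality * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def hull_distance(user, agents, steps=2000):
    """Approximate distance from `user` to the convex hull of the `agents` vectors.

    Uses Frank-Wolfe with exact line search over mixture weights on the simplex.
    Returns (distance, weights); a small distance means the user's utility is
    approximately a convex combination of the agents' utilities.
    """
    n, dim = len(agents), len(user)
    w = [1.0 / n] * n  # start from the uniform mixture

    def mixture(weights):
        return [sum(weights[i] * agents[i][k] for i in range(n)) for k in range(dim)]

    for _ in range(steps):
        mix = mixture(w)
        resid = [mix[k] - user[k] for k in range(dim)]
        # The gradient with respect to weight i is proportional to <resid, agents[i]>.
        scores = [sum(resid[k] * agents[i][k] for k in range(dim)) for i in range(n)]
        j = min(range(n), key=scores.__getitem__)
        direction = [agents[j][k] - mix[k] for k in range(dim)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = -sum(resid[k] * direction[k] for k in range(dim)) / denom
        gamma = min(1.0, max(0.0, gamma))
        if gamma == 0.0:
            break
        w = [(1.0 - gamma) * wi for wi in w]
        w[j] += gamma

    mix = mixture(w)
    dist = math.sqrt(sum((mix[k] - user[k]) ** 2 for k in range(dim)))
    return dist, w

if __name__ == "__main__":
    # Three hypothetical agents, each valuing a single outcome.
    agents = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(hull_distance([0.5, 0.5, 0.0], agents))
    print(quantal_response([1.0, 0.5, 0.0], rationality=5.0))
```

In this toy setup, adding more diverse agent vectors can only enlarge the hull, which matches the abstract's remark that the condition becomes easier to satisfy as model diversity increases.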

Suggested Citation

Natalie Collina & Surbhi Goel & Aaron Roth & Emily Ryu & Mirah Shi, 2025. "Emergent Alignment via Competition," Papers 2509.15090, arXiv.org.

Handle: RePEc:arx:papers:2509.15090

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Wu, Wenhao & Ye, Bohan, 2023. "Competition in persuasion: An experiment," Games and Economic Behavior, Elsevier, vol. 138(C), pages 72-89.
Alp Atakan & Mehmet Ekmekci & Ludovic Renou, 2021. "Cross-verification and Persuasive Cheap Talk," Papers 2102.13562, arXiv.org, revised Apr 2021.
- Renou, Ludovic & Atakan, Alp & Ekmekci, Mehmet, 2021. "Cross-verification and Persuasive Cheap Talk," CEPR Discussion Papers 16801, C.E.P.R. Discussion Papers.
Frédéric Koessler & Marie Laclau & Tristan Tomala, 2022. "Interactive Information Design," Mathematics of Operations Research, INFORMS, vol. 47(1), pages 153-175, February.
- Tomala, Tristan & Koessler, Frederic & Laclau, Marie, 2018. "Interactive Information Design," HEC Research Papers Series 1260, HEC Paris, revised 02 May 2018.
- Frédéric Koessler & Marie Laclau & Tristan Tomala, 2021. "Interactive Information Design," PSE-Ecole d'économie de Paris (Postprint) halshs-01791918, HAL.
- Frédéric Koessler & Marie Laclau & Tristan Tomala, 2021. "Interactive Information Design," Post-Print halshs-01791918, HAL.
- Frédéric Koessler & Marie Laclau & Tristan Tomala, 2018. "Interactive Information Design," Working Papers hal-01933896, HAL.
Teddy Mekonnen & Bobak Pakzad-Hurson, 2024. "Competition, Persuasion, and Search," Papers 2411.11183, arXiv.org, revised Sep 2025.
Atakan, Alp & Ekmekci, Mehmet & Renou, Ludovic, 2024. "Cross-verification and persuasive cheap talk," Journal of Economic Theory, Elsevier, vol. 222(C).
Raphael Boleslavsky & Silvana Krasteva, 2025. "Limits of Disclosure in Search Markets," Papers 2506.06319, arXiv.org, revised Jun 2025.
Niloufar Mirzavand Boroujeni & Krishnamurthy Iyer & William L. Cooper, 2025. "Decentralized Signaling Mechanisms," Papers 2504.14163, arXiv.org.
Kemal Kivanc Akoz & Arseniy Samsonov, 2023. "Bargaining over information structures," Discussion Papers 2301, Budapest University of Technology and Economics, Quantitative Social and Management Sciences.
Ronen Gradwohl & Niklas Hahn & Martin Hoefer & Rann Smorodinsky, 2020. "Reaping the Informational Surplus in Bayesian Persuasion," Papers 2006.02048, arXiv.org.
Shih-Tang Su & Vijay G. Subramanian, 2022. "Order of Commitments in Bayesian Persuasion with Partial-informed Senders," Papers 2202.06479, arXiv.org.
Ju Hu & Xi Weng, 2021. "Robust persuasion of a privately informed receiver," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 72(3), pages 909-953, October.
Quan Li & Kang Rong, 2024. "Full disclosure in competitive Bayesian persuasion," International Journal of Game Theory, Springer;Game Theory Society, vol. 53(2), pages 525-545, June.
Bosch-Domènech, Antoni & Vriend, Nicolaas J., 2013. "On the role of non-equilibrium focal points as coordination devices," Journal of Economic Behavior & Organization, Elsevier, vol. 94(C), pages 52-67.
- Antoni Bosch-Domènech & Nicolaas J. Vriend, 2008. "On the role of non-equilibrium focal points as coordination devices," Economics Working Papers 1064, Department of Economics and Business, Universitat Pompeu Fabra.
- Antoni Bosch-Domènech & Nicolaas J. Vriend, 2008. "On the Role of Non-equilibrium Focal Points as Coordination Devices," Working Papers 621, Queen Mary University of London, School of Economics and Finance.
Kraemer, Carlo & Noth, Markus & Weber, Martin, 2006. "Information aggregation with costly information and random ordering: Experimental evidence," Journal of Economic Behavior & Organization, Elsevier, vol. 59(3), pages 423-432, March.
- Kraemer, Carlo & Nöth, Markus & Weber, Martin, 2000. "Information Aggregation with Costly Information and Random Ordering: Experimental Evidence," Sonderforschungsbereich 504 Publications 00-35, Sonderforschungsbereich 504, Universität Mannheim;Sonderforschungsbereich 504, University of Mannheim.
- Kraemer, Carlo & Nöth, Markus & Weber, Martin, 2000. "Information aggregation with costly information and random ordering : experimental evidence," Papers 00-35, Sonderforschungsbreich 504.
Goeree, Jacob K. & Holt, Charles A. & Palfrey, Thomas R., 2002. "Quantal Response Equilibrium and Overbidding in Private-Value Auctions," Journal of Economic Theory, Elsevier, vol. 104(1), pages 247-272, May.
- Palfrey, Thomas R. & Goeree, Jacob & Holt, Charles, 2000. "Quantal Response Equilibrium and Overbidding in Private-value Auctions," Working Papers 1073, California Institute of Technology, Division of the Humanities and Social Sciences.
- Jacob K. Goeree & Charles A. Holt & Thomas R. Palfrey, 2000. "Quantal Response Equilibrium and Overbidding in Private-Value Auctions," Virginia Economics Online Papers 345, University of Virginia, Department of Economics.
- Thomas Palfrey, 2002. "Quantal Response Equilibrium and Overbidding in Private Value Auctions," Theory workshop papers 357966000000000089, UCLA Department of Economics.
Emmanuel Dechenaux & Dan Kovenock & Roman Sheremeta, 2015. "A survey of experimental research on contests, all-pay auctions and tournaments," Experimental Economics, Springer;Economic Science Association, vol. 18(4), pages 609-669, December.
- Emmanuel Dechenaux & Dan Kovenock & Roman Sheremeta, 2012. "A Survey of Experimental Research on Contests, All-Pay Auctions and Tournaments," Working Papers 12-22, Chapman University, Economic Science Institute.
- Dechenaux, Emmanuel & Kovenock, Dan & Sheremeta, Roman, 2014. "A Survey of Experimental Research on Contests, All-Pay Auctions and Tournaments," MPRA Paper 59714, University Library of Munich, Germany.
- Dechenaux, Emmanuel & Kovenock, Dan & Sheremeta, Roman M., 2012. "A survey of experimental research on contests, all-pay auctions and tournaments," Discussion Papers, Research Professorship & Project "The Future of Fiscal Federalism" SP II 2012-109, WZB Berlin Social Science Center.
Steven N. Durlauf & Yannis M. Ioannides, 2010. "Social Interactions," Annual Review of Economics, Annual Reviews, vol. 2(1), pages 451-478, September.
- Steven N. Durlauf & Yannis M. Ioannides, 2009. "Social Interactions," Discussion Papers Series, Department of Economics, Tufts University 0739, Department of Economics, Tufts University.
Marco Cipriani & Antonio Guarino, 2009. "Herd Behavior in Financial Markets: An Experiment with Financial Market Professionals," Journal of the European Economic Association, MIT Press, vol. 7(1), pages 206-233, March.
- Marco Cipriani & Antonio Guarino, 2008. "Herd Behavior in Financial Markets: An Experiment with Financial Market Professionals," IMF Working Papers 2008/141, International Monetary Fund.
- Marco Cipriani & Antonio Guarino, 2008. "Herd Behavior in Financial Markets: An Experiment with Financial Market Professionals," Working Papers 2009-16, The George Washington University, Institute for International Economic Policy.
- Antonio Guarino & Marco Cipriani, 2008. "Herd Behavior in Financial Markets: An Experiment with Financial Market Professionals," WEF Working Papers 0047, ESRC World Economy and Finance Research Programme, Birkbeck, University of London.
Dutta, Rohan & Levine, David Knudsen & Modica, Salvatore, 2018. "Collusion constrained equilibrium," Theoretical Economics, Econometric Society, vol. 13(1), January.
- Rohan Dutta & David K Levine & Salvatore Modica, 2016. "Collusion Constrained Equilibrium," Levine's Working Paper Archive 786969000000001288, David K. Levine.
Ghidoni, Riccardo & Suetens, Sigrid, 2019. "Empirical Evidence on Repeated Sequential Games," Other publications TiSEM ff3a441f-e196-4e45-ba59-c, Tilburg University, School of Economics and Management.
- Suetens, Sigrid & Ghidoni, Riccardo, 2019. "Empirical evidence on repeated sequential games," CEPR Discussion Papers 13809, C.E.P.R. Discussion Papers.
- Ghidoni, Riccardo & Suetens, Sigrid, 2019. "Empirical Evidence on Repeated Sequential Games," Discussion Paper 2019-016, Tilburg University, Center for Economic Research.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-GTH-2025-10-20 (Game Theory)
NEP-MIC-2025-10-20 (Microeconomics)
NEP-UPT-2025-10-20 (Utility Models and Prospect Theory)
