Policy search with rare significant events: Choosing the right partner to cooperate with

Authors

Paul Ecoffet
Nicolas Fontbonne
Jean-Baptiste André
Nicolas Bredeche

Abstract

This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
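
The contrast drawn in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' environment or either of their search methods: the names `run_episode`, `summarize`, `threshold` and `p_event` are hypothetical, offers are assumed uniform on [0, 1), and the policy is a fixed acceptance threshold. It shows that, in this toy model, making willing partners rarer shrinks the fraction of time steps that carry any reward, while the episode return stays the same.

```python
import random


def run_episode(threshold, p_event, rng, max_steps=100_000):
    """Meet one partner per step until an offer is accepted.

    Returns (reward, steps). Hypothetical toy model: with probability
    p_event the partner is willing and makes an offer drawn uniformly
    from [0, 1); the agent accepts offers >= threshold, which ends the
    episode with that single positive reward.
    """
    for step in range(1, max_steps + 1):
        if rng.random() >= p_event:
            continue  # uninterested partner: no significant event
        offer = rng.random()
        if offer >= threshold:
            return offer, step
    return 0.0, max_steps  # no partner accepted within the step budget


def summarize(threshold, p_event, episodes=2000, seed=0):
    """Return (mean episode return, fraction of steps carrying reward)."""
    rng = random.Random(seed)
    total_reward, total_steps, rewarded = 0.0, 0, 0
    for _ in range(episodes):
        reward, steps = run_episode(threshold, p_event, rng)
        total_reward += reward
        total_steps += steps
        rewarded += reward > 0  # at most one rewarded step per episode
    return total_reward / episodes, rewarded / total_steps


if __name__ == "__main__":
    for p_event in (1.0, 0.1, 0.01):
        mean_return, rewarded_fraction = summarize(0.5, p_event)
        print(f"p_event={p_event}: mean return {mean_return:.3f}, "
              f"rewarded steps {rewarded_fraction:.4f}")
```

A method that ranks candidate policies by episode return alone sees the same quantity at every value of `p_event` here, whereas a method that learns from individual time steps has proportionally fewer rewarded steps to work with. The sketch says nothing about how either method performs on the continuous state and action problem studied in the paper.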

Suggested Citation

Paul Ecoffet & Nicolas Fontbonne & Jean-Baptiste André & Nicolas Bredeche, 2022. "Policy search with rare significant events: Choosing the right partner to cooperate with," PLOS ONE, Public Library of Science, vol. 17(4), pages 1-18, April.

Handle: RePEc:plo:pone00:0266841
DOI: 10.1371/journal.pone.0266841

References listed on IDEAS

John M. McNamara & Zoltan Barta & Lutz Fromhage & Alasdair I. Houston, 2008. "The coevolution of choosiness and cooperation," Nature, Nature, vol. 451(7175), pages 189-192, January.
Jorgen W. Weibull, 1997. "Evolutionary Game Theory," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262731215, December.
Drew Fudenberg & David K. Levine, 1998. "The Theory of Learning in Games," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262061945, December.
- Drew Fudenberg & David K. Levine, 1996. "The Theory of Learning in Games," Levine's Working Paper Archive 624, David K. Levine.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Sandholm,W.H., 2003. "Excess payoff dynamics, potential dynamics, and stable games," Working papers 5, Wisconsin Madison - Social Systems.
- Bill Sandholm, 2003. "Excess Payoff Dynamics, Potential Dynamics, and Stable Games," Theory workshop papers 505798000000000042, UCLA Department of Economics.
Michel Benaïm & Jörgen W. Weibull, 2003. "Deterministic Approximation of Stochastic Evolution in Games," Econometrica, Econometric Society, vol. 71(3), pages 873-903, May.
- Benaim, Michel & Weibull, Jörgen W., 2000. "Deterministic Approximation of Stochastic Evolution in Games," Working Paper Series 534, Research Institute of Industrial Economics, revised 30 Oct 2001.
Nobuyuki Hanaki, 2007. "Individual and Social Learning," Computational Economics, Springer;Society for Computational Economics, vol. 29(3), pages 421-421, May.
- Nobuyuki Hanaki, 2005. "Individual and Social Learning," Computational Economics, Springer;Society for Computational Economics, vol. 26(3), pages 31-50, November.
Alexander Aurell & Gustav Karreskog, 2020. "Stochastic Stability of a Recency Weighted Sampling Dynamic," Papers 2009.12910, arXiv.org, revised Jun 2021.
Kyle Hyndman & Antoine Terracol & Jonathan Vaksmann, 2009. "Learning and sophistication in coordination games," Experimental Economics, Springer;Economic Science Association, vol. 12(4), pages 450-472, December.
- Kyle Hyndman & Antoine Terracol & Jonathan Vaksmann, 2009. "Learning and Sophistication in Coordination Games," Post-Print hal-00607232, HAL.
- Kyle Hyndman & Antoine Terracol & Jonathan Vaksmann, 2009. "Learning and Sophistication in Coordination Games," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-00607232, HAL.
- Kyle Hyndman & Antoine Terracol & Jonathan Vaksmann, 2009. "Learning and Sophistication in Coordination Games," PSE-Ecole d'économie de Paris (Postprint) hal-00607232, HAL.
Francesco Squintani, 1999. "Moral Hazard," Discussion Papers 1269, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
Alger, Ingela, 2022. "Evolutionarily stable preferences," TSE Working Papers 22-1355, Toulouse School of Economics (TSE), revised Dec 2022.
- Ingela Alger, 2023. "Evolutionarily stable preferences," Working Papers hal-03929518, HAL.
- Ingela Alger, 2022. "Evolutionarily stable preferences," Working Papers hal-03770354, HAL.
- Alger, Ingela, 2022. "Evolutionarily stable preferences," IAST Working Papers 22-144, Institute for Advanced Study in Toulouse (IAST), revised Dec 2022.
- Ingela Alger, 2023. "Evolutionarily stable preferences," Post-Print hal-04042260, HAL.
Antonio Cabrales & Roberto Serrano, 2007. "Implementation in Adaptive Better-Response Dynamics," Working Papers wp2007_0708, CEMFI.
- Roberto Serrano & Antonio Cabrales, 2007. "Implementation in Adaptive Better-Response Dynamics," Working Papers 2007-10, Brown University, Department of Economics.
- Cabrales, Antonio & Serrano, Roberto, 2007. "Implementation in adaptive better-response dynamics," UC3M Working papers. Economics we075731, Universidad Carlos III de Madrid. Departamento de Economía.
- Antonio Cabrales & Roberto Serrano, 2007. "Implementation in adaptive better-response dynamics," Working Papers 2007-16, Instituto Madrileño de Estudios Avanzados (IMDEA) Ciencias Sociales.
John P. Conley & Myrna Wooders, 2005. "Memetics & Voting: How Nature May Make us Public Spirited," Vanderbilt University Department of Economics Working Papers 0514, Vanderbilt University Department of Economics.
Mengel, Friederike, 2012. "Learning across games," Games and Economic Behavior, Elsevier, vol. 74(2), pages 601-619.
- Friederike Mengel, 2007. "Learning Across Games," Working Papers. Serie AD 2007-05, Instituto Valenciano de Investigaciones Económicas, S.A. (Ivie).
Hofbauer,J. & Sandholm,W.H., 2001. "Evolution and learning in games with randomly disturbed payoffs," Working papers 5, Wisconsin Madison - Social Systems.
- Josef Hofbauer & William H. Sandholm, 2001. "Evolution and Learning in Games with Randomly Disturbed Payoffs," Vienna Economics Papers vie0205, University of Vienna, Department of Economics.
Ingela Alger & Laurent Lehmann, 2023. "Evolution of Semi-Kantian Preferences in Two-Player Assortative Interactions with Complete and Incomplete Information and Plasticity," Dynamic Games and Applications, Springer, vol. 13(4), pages 1288-1319, December.
- Alger, Ingela & Lehmann, Laurent, 2023. "Evolution of semi-Kantian preferences in two-player assortative interactions with complete and incomplete information and plasticity," TSE Working Papers 23-1405, Toulouse School of Economics (TSE), revised May 2023.
- Laurent Lehmann & Ingela Alger, 2023. "Evolution of semi-Kantian preferences in two-player assortative interactions with complete and incomplete information and plasticity," Working Papers hal-04141955, HAL.
- Alger, Ingela & Lehmann, Laurent, 2023. "Evolution of semi-Kantian preferences in two-player assortative interactions with complete and incomplete information and plasticity," IAST Working Papers 23-148, Institute for Advanced Study in Toulouse (IAST), revised May 2023.
- Ingela Alger & Laurent Lehmann, 2023. "Evolution of semi-kantian preferences in two-player assortative interactions with complete and incomplete information and plasticity," Post-Print hal-04378838, HAL.
Jean Rabanal & Daniel Friedman, 2014. "Incomplete Information, Dynamic Stability and the Evolution of Preferences: Two Examples," Dynamic Games and Applications, Springer, vol. 4(4), pages 448-467, December.
Dufwenberg, Martin & Gneezy, Uri, 2000. "Price competition and market concentration: an experimental study," International Journal of Industrial Organization, Elsevier, vol. 18(1), pages 7-22, January.
- Dufwenberg, M. & Gneezy, U., 1998. "Price competition and market concentration : An experimental study," Other publications TiSEM deaedded-143d-4998-8a6e-7, Tilburg University, School of Economics and Management.
- Dufwenberg, Martin & Gneezy, Uri, 1998. "Price Competition and Market Concentration: An Experimental Study," Working Paper Series 1998:8, Uppsala University, Department of Economics.
- Dufwenberg, M. & Gneezy, U., 1998. "Price competition and market concentration : An experimental study," Discussion Paper 1998-27, Tilburg University, Center for Economic Research.
- Dufwenberg, M. & Gneezy, U., 1998. "Price Competition and Market Concentration: An Experimental Study," Papers 1998-08, Uppsala - Working Paper Series.
- Dufwenberg, Martin & Gneezy, Uri, 1999. "Price Competition and Market Concentration: An experimental Study," Research Papers in Economics 1999:4, Stockholm University, Department of Economics.
Marc Harper & Dashiell Fryer, 2015. "Lyapunov Functions for Time-Scale Dynamics on Riemannian Geometries of the Simplex," Dynamic Games and Applications, Springer, vol. 5(3), pages 318-333, September.
Fanelli, Domenico, 2010. "The Role of Socially Concerned Consumers in the Coexistence of Ethical and Standard Firms," MPRA Paper 20117, University Library of Munich, Germany.
Hart, Sergiu, 2002. "Evolutionary dynamics and backward induction," Games and Economic Behavior, Elsevier, vol. 41(2), pages 227-264, November.
- Sergiu Hart, 1999. "Evolutionary Dynamics and Backward Induction," Game Theory and Information 9905002, University Library of Munich, Germany, revised 23 Mar 2000.
DeMichelis, Stefano & Dhillon, Amrita, "undated". "Learning in Elections and Voter Turnout Equilibria," Economic Research Papers 269378, University of Warwick - Department of Economics.
- DeMichelis, Stefano & Dhillon, Amrita, 2001. "Learning in elections and voter turnout equilibria," The Warwick Economics Research Paper Series (TWERPS) 608, University of Warwick, Department of Economics.
Berger, Ulrich & Hofbauer, Josef, 2006. "Irrational behavior in the Brown-von Neumann-Nash dynamics," Games and Economic Behavior, Elsevier, vol. 56(1), pages 1-6, July.
- Ulrich Berger & Josef Hofbauer, 2004. "Irrational behavior in the Brown-von Neumann-Nash dynamics," Game Theory and Information 0409002, University Library of Munich, Germany, revised 09 Sep 2004.
