
Policy search with rare significant events: Choosing the right partner to cooperate with

Authors
  • Paul Ecoffet
  • Nicolas Fontbonne
  • Jean-Baptiste André
  • Nicolas Bredeche

Abstract

This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
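
The link between rare events and scarce gradient information can be illustrated with a minimal sketch. It is not the authors' experimental setup: the toy task, the Gaussian policy, the target offer of 1.0, the tolerance of 0.5, and the names `episode_gradient` and `informative_fraction` are all assumptions made for illustration. The sketch uses a REINFORCE-style estimator, in which the per-episode gradient is the reward multiplied by the gradient of the log-probability of the action, so any episode without a reward contributes exactly zero.

```python
import random


def episode_gradient(mean, std, p_event, rng):
    """One-episode REINFORCE estimate for the mean of a Gaussian policy.

    Hypothetical toy task: with probability p_event the partner is willing
    to cooperate, and the reward is 1 only if the agent's offer (the action)
    lies within 0.5 of an assumed target of 1.0. Otherwise the reward is 0.
    """
    action = rng.gauss(mean, std)
    willing = rng.random() < p_event
    reward = 1.0 if (willing and abs(action - 1.0) < 0.5) else 0.0
    # d/d(mean) of log N(action | mean, std) is (action - mean) / std**2.
    return reward * (action - mean) / std ** 2


def informative_fraction(p_event, episodes=10_000, seed=0):
    """Fraction of episodes whose gradient estimate is non-zero."""
    rng = random.Random(seed)
    grads = [episode_gradient(0.0, 1.0, p_event, rng) for _ in range(episodes)]
    return sum(g != 0.0 for g in grads) / episodes


if __name__ == "__main__":
    for p in (1.0, 0.1, 0.01):
        print(p, informative_fraction(p))
```

In this sketch the share of episodes that carry any gradient signal shrinks roughly in proportion to `p_event`, which is one way to read the abstract's claim about policy gradient methods. The sketch says nothing about evolution strategies; the invariance result for direct policy search is the paper's own and is not reproduced here.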

Suggested Citation

  • Paul Ecoffet & Nicolas Fontbonne & Jean-Baptiste André & Nicolas Bredeche, 2022. "Policy search with rare significant events: Choosing the right partner to cooperate with," PLOS ONE, Public Library of Science, vol. 17(4), pages 1-18, April.
  • Handle: RePEc:plo:pone00:0266841
    DOI: 10.1371/journal.pone.0266841

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0266841
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0266841&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0266841?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tom Johnston & Michael Savery & Alex Scott & Bassel Tarbush, 2023. "Game Connectivity and Adaptive Dynamics," Papers 2309.10609, arXiv.org, revised Oct 2024.
    2. Waters, George A., 2009. "Chaos in the cobweb model with a new learning dynamic," Journal of Economic Dynamics and Control, Elsevier, vol. 33(6), pages 1201-1216, June.
    3. Sandholm, W.H., 2003. "Excess payoff dynamics, potential dynamics, and stable games," Working papers 5, Wisconsin Madison - Social Systems.
    4. Michael Foley & Rory Smead & Patrick Forber & Christoph Riedl, 2021. "Avoiding the bullies: The resilience of cooperation among unequals," PLOS Computational Biology, Public Library of Science, vol. 17(4), pages 1-19, April.
    5. Michel Benaïm & Jörgen W. Weibull, 2003. "Deterministic Approximation of Stochastic Evolution in Games," Econometrica, Econometric Society, vol. 71(3), pages 873-903, May.
    6. Nobuyuki Hanaki, 2007. "Individual and Social Learning," Computational Economics, Springer;Society for Computational Economics, vol. 29(3), pages 421-421, May.
    7. Xie, Fang & Horan, Richard D., 2008. "Disease and Behavioral Dynamics for Brucellosis in Elk and Cattle in the Greater Yellowstone Area," 2008 Annual Meeting, July 27-29, 2008, Orlando, Florida 6404, American Agricultural Economics Association (New Name 2008: Agricultural and Applied Economics Association).
    8. Alexander Aurell & Gustav Karreskog, 2020. "Stochastic Stability of a Recency Weighted Sampling Dynamic," Papers 2009.12910, arXiv.org, revised Jun 2021.
    9. Kyle Hyndman & Antoine Terracol & Jonathan Vaksmann, 2009. "Learning and sophistication in coordination games," Experimental Economics, Springer;Economic Science Association, vol. 12(4), pages 450-472, December.
    10. Berger, Ulrich & Hofbauer, Josef, 2006. "Irrational behavior in the Brown-von Neumann-Nash dynamics," Games and Economic Behavior, Elsevier, vol. 56(1), pages 1-6, July.
    11. Francesco Squintani, 1999. "Moral Hazard," Discussion Papers 1269, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
    12. Griffin, Christopher & Mummah, Riley & deForest, Russ, 2021. "A finite population destroys a traveling wave in spatial replicator dynamics," Chaos, Solitons & Fractals, Elsevier, vol. 146(C).
    13. Alger, Ingela, 2022. "Evolutionarily stable preferences," TSE Working Papers 22-1355, Toulouse School of Economics (TSE), revised Dec 2022.
    14. repec:wvu:wpaper:10-18 is not listed on IDEAS
    15. Antonio Cabrales & Roberto Serrano, 2007. "Implementation in Adaptive Better-Response Dynamics," Working Papers wp2007_0708, CEMFI.
    16. Veller, Carl & Hayward, Laura K., 2016. "Finite-population evolution with rare mutations in asymmetric games," Journal of Economic Theory, Elsevier, vol. 162(C), pages 93-113.
    17. John P. Conley & Myrna Wooders, 2005. "Memetics & Voting: How Nature May Make us Public Spirited," Vanderbilt University Department of Economics Working Papers 0514, Vanderbilt University Department of Economics.
    18. Mengel, Friederike, 2012. "Learning across games," Games and Economic Behavior, Elsevier, vol. 74(2), pages 601-619.
    19. Hofbauer, J. & Sandholm, W.H., 2001. "Evolution and learning in games with randomly disturbed payoffs," Working papers 5, Wisconsin Madison - Social Systems.
    20. Satoshi Kawanishi, 2000. "Relative Performance Evaluations in a Model of Financial Intermediation," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 3(4), pages 801-830, October.
    21. Ingela Alger & Laurent Lehmann, 2023. "Evolution of Semi-Kantian Preferences in Two-Player Assortative Interactions with Complete and Incomplete Information and Plasticity," Dynamic Games and Applications, Springer, vol. 13(4), pages 1288-1319, December.
