
A reinforcement learning-based strategy updating model for the cooperative evolution

Author

Listed:
  • Wang, Xianjia
  • Yang, Zhipeng
  • Liu, Yanli
  • Chen, Guici

Abstract

The emergence of cooperation between competing agents has commonly been studied through evolutionary games, but such cooperation often requires a mechanism or a third party to be activated and sustained. To investigate how such a mechanism affects the evolution of cooperation, this paper proposes a reinforcement learning-based strategy updating model. The model consists of two symmetrical sets of convolutional neural networks. The agents’ strategy-updating rules are defined as follows: agents first learn and predict the environment and the behaviors of neighboring agents, then estimate their future payoffs based on this information, and finally determine their strategies based on these estimated payoffs. By investigating the behavioral characteristics and stable network states of highly intelligent agents with memory, learning, and prediction abilities in the evolution of the prisoner’s dilemma game, the results demonstrate that game initiators who adopt the mixed optimal payoff approach can increase the number of cooperators and facilitate “global cooperation” and “repaying kindness with kindness”. Although the temptation factor has little effect on the population, increasing the discount factor can expand the scale of the cooperative cluster and even achieve dynamic stability. Additionally, a smaller minibatch size benefits the evolution of cooperation when the experience replay pool is small, whereas a larger minibatch size becomes more conducive to cooperation as the capacity of the experience replay pool increases. This research provides a novel perspective from reinforcement learning for understanding the evolution of cooperation.
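The abstract describes a deep-Q-learning-style update: each agent observes its neighborhood, a convolutional network estimates the discounted future payoff of cooperating versus defecting, and minibatches drawn from an experience replay pool train the network while a second, symmetric network provides the payoff estimates. The Python sketch below illustrates that kind of update loop under stated assumptions; the names (QNet, choose_strategy, learn_step), the hyperparameters PATCH, GAMMA, and BATCH, and the payoff values are illustrative and are not taken from the paper.

# A minimal sketch (not the authors' code) of a DQN-style strategy-updating agent
# for a spatial prisoner's dilemma. Network shape, payoff values, and hyperparameters
# are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

PATCH = 5        # assumed local observation window (5x5 neighborhood of strategies)
GAMMA = 0.9      # discount factor on estimated future payoffs
BATCH = 32       # minibatch size sampled from the experience replay pool
T, R, P, S = 1.3, 1.0, 0.1, 0.0  # illustrative prisoner's dilemma payoffs (T > R > P > S)

class QNet(nn.Module):
    """Small CNN mapping a neighborhood strategy patch to Q-values for (cooperate, defect)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8 * PATCH * PATCH, 2),
        )

    def forward(self, x):
        return self.net(x)

policy_net = QNet()          # online network used to choose strategies
target_net = QNet()          # symmetric copy used to estimate future payoffs
target_net.load_state_dict(policy_net.state_dict())
optimizer = optim.Adam(policy_net.parameters(), lr=1e-3)
replay = deque(maxlen=2000)  # experience replay pool

def choose_strategy(obs, eps=0.1):
    """Epsilon-greedy strategy choice: 0 = cooperate, 1 = defect."""
    if random.random() < eps:
        return random.randrange(2)
    with torch.no_grad():
        return int(policy_net(obs.unsqueeze(0)).argmax())

def learn_step():
    """Sample a minibatch from the replay pool and update the online network."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    obs = torch.stack([b[0] for b in batch])
    act = torch.tensor([b[1] for b in batch])
    rew = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    nxt = torch.stack([b[3] for b in batch])
    q = policy_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(nxt).max(1).values   # estimated future payoff
    loss = nn.functional.mse_loss(q, rew + GAMMA * q_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In this sketch each transition stored in the replay pool is a tuple (obs, action, payoff, next_obs), with obs a 1 x PATCH x PATCH tensor of neighboring strategies; periodically copying policy_net into target_net keeps the two networks as the symmetric pair the abstract mentions.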

Suggested Citation

  • Wang, Xianjia & Yang, Zhipeng & Liu, Yanli & Chen, Guici, 2023. "A reinforcement learning-based strategy updating model for the cooperative evolution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 618(C).
  • Handle: RePEc:eee:phsmap:v:618:y:2023:i:c:s0378437123002546
    DOI: 10.1016/j.physa.2023.128699

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437123002546
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2023.128699?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Wang, Xianjia & Lv, Shaojie, 2019. "The roles of particle swarm intelligence in the prisoner’s dilemma based on continuous and mixed strategy systems on scale-free networks," Applied Mathematics and Computation, Elsevier, vol. 355(C), pages 213-220.
    2. Oriol Vinyals & Igor Babuschkin & Wojciech M. Czarnecki & Michaël Mathieu & Andrew Dudzik & Junyoung Chung & David H. Choi & Richard Powell & Timo Ewalds & Petko Georgiev & Junhyuk Oh & Dan Horgan et al., 2019. "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, Nature, vol. 575(7782), pages 350-354, November.
    3. Wang, Xianjia & Lv, Shaojie & Quan, Ji, 2017. "The evolution of cooperation in the Prisoner’s Dilemma and the Snowdrift game based on Particle Swarm Optimization," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 482(C), pages 286-295.
    4. Pan, Jianchen & Zhang, Lan & Han, Wenchen & Huang, Changwei, 2023. "Heterogeneous investment promotes cooperation in spatial public goods game on hypergraphs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    5. Hisashi Ohtsuki & Christoph Hauert & Erez Lieberman & Martin A. Nowak, 2006. "A simple rule for the evolution of cooperation on graphs and social networks," Nature, Nature, vol. 441(7092), pages 502-505, May.
    6. Kelsey R. McDonald & William F. Broderick & Scott A. Huettel & John M. Pearson, 2019. "Bayesian nonparametric models characterize instantaneous strategies in a competitive dynamic game," Nature Communications, Nature, vol. 10(1), pages 1-12, December.
    7. Abhijit Gosavi, 2009. "Reinforcement Learning: A Tutorial Survey and Recent Advances," INFORMS Journal on Computing, INFORMS, vol. 21(2), pages 178-192, May.
    8. Martin A. Nowak & Akira Sasaki & Christine Taylor & Drew Fudenberg, 2004. "Emergence of cooperation and evolutionary stability in finite populations," Nature, Nature, vol. 428(6983), pages 646-650, April.
    9. Ren, Guangming & Wang, Xingyuan, 2014. "Robustness of cooperation in memory-based prisoner’s dilemma game on a square lattice," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 408(C), pages 40-46.
    10. Zhu, Peican & Wang, Xiaoyu & Jia, Danyang & Guo, Yangming & Li, Shudong & Chu, Chen, 2020. "Investigating the co-evolution of node reputation and edge-strategy in prisoner's dilemma game," Applied Mathematics and Computation, Elsevier, vol. 386(C).
    11. Lee, Hyun-Rok & Lee, Taesik, 2021. "Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response," European Journal of Operational Research, Elsevier, vol. 291(1), pages 296-308.
    12. Gao, Liyan & Pan, Qiuhui & He, Mingfeng, 2022. "Advanced defensive cooperators promote cooperation in the prisoner’s dilemma game," Chaos, Solitons & Fractals, Elsevier, vol. 155(C).
    13. Izquierdo, Luis R. & Izquierdo, Segismundo S. & Gotts, Nicholas M. & Polhill, J. Gary, 2007. "Transient and asymptotic dynamics of reinforcement learning in games," Games and Economic Behavior, Elsevier, vol. 61(2), pages 259-276, November.
    14. Zhen Wang & Marko Jusup & Lei Shi & Joung-Hun Lee & Yoh Iwasa & Stefano Boccaletti, 2018. "Exploiting a cognitive bias promotes cooperation in social dilemma experiments," Nature Communications, Nature, vol. 9(1), pages 1-7, December.
    15. Usui, Yuki & Ueda, Masahiko, 2021. "Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner’s dilemma," Applied Mathematics and Computation, Elsevier, vol. 409(C).
    16. Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen et al., 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
    17. Jia, Danyang & Li, Tong & Zhao, Yang & Zhang, Xiaoqin & Wang, Zhen, 2022. "Empty nodes affect conditional cooperation under reinforcement learning," Applied Mathematics and Computation, Elsevier, vol. 413(C).
    18. Han, Jia-Xu & Wang, Rui-Wu, 2023. "Complex interactions promote the frequency of cooperation in snowdrift game," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    19. Lv, Shaojie & Song, Feifei, 2022. "Particle swarm intelligence and the evolution of cooperation in the spatial public goods game with punishment," Applied Mathematics and Computation, Elsevier, vol. 412(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ding, Zhen-Wei & Zheng, Guo-Zhong & Cai, Chao-Ran & Cai, Wei-Ran & Chen, Li & Zhang, Ji-Qiang & Wang, Xu-Ming, 2023. "Emergence of cooperation in two-agent repeated games with reinforcement learning," Chaos, Solitons & Fractals, Elsevier, vol. 175(P1).
    2. Cheng, Jiangjiang & Mei, Wenjun & Su, Wei & Chen, Ge, 2023. "Evolutionary games on networks: Phase transition, quasi-equilibrium, and mathematical principles," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 611(C).
    3. Feng, Meiling & Li, Xuezhu & Zhao, Dawei & Xia, Chengyi, 2023. "Evolutionary dynamics with the second-order reputation in the networked N-player trust game," Chaos, Solitons & Fractals, Elsevier, vol. 175(P2).
    4. Wang, Xianjia & Chen, Wenman, 2020. "Evolutionary dynamics in spatial threshold public goods game with the asymmetric return rate mechanism," Chaos, Solitons & Fractals, Elsevier, vol. 136(C).
    5. Song, Shenpeng & Feng, Yuhao & Xu, Wenzhe & Li, Hui-Jia & Wang, Zhen, 2022. "Evolutionary prisoner’s dilemma game on signed networks based on structural balance theory," Chaos, Solitons & Fractals, Elsevier, vol. 164(C).
    6. Jia, Danyang & Li, Tong & Zhao, Yang & Zhang, Xiaoqin & Wang, Zhen, 2022. "Empty nodes affect conditional cooperation under reinforcement learning," Applied Mathematics and Computation, Elsevier, vol. 413(C).
    7. Xie, Yunya & Bai, Yu & Zhang, Yankun & Peng, Zhengyin, 2024. "Trust-induced cooperation under the complex interaction of networks and emotions," Chaos, Solitons & Fractals, Elsevier, vol. 182(C).
    8. Peng Liu & Haoxiang Xia, 2015. "Structure and evolution of co-authorship network in an interdisciplinary research field," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(1), pages 101-134, April.
    9. Zhao, Zhengwu & Zhang, Chunyan, 2023. "The mechanisms of labor division from the perspective of task urgency and game theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 630(C).
    10. Lessard, Sabin & Lahaie, Philippe, 2009. "Fixation probability with multiple alleles and projected average allelic effect on selection," Theoretical Population Biology, Elsevier, vol. 75(4), pages 266-277.
    11. Huang, Shaoxu & Liu, Xuesong & Hu, Yuhan & Fu, Xiao, 2023. "The influence of aggressive behavior on cooperation evolution in social dilemma," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 630(C).
    12. Wakano, Joe Yuichiro & Ohtsuki, Hisashi & Kobayashi, Yutaka, 2013. "A mathematical description of the inclusive fitness theory," Theoretical Population Biology, Elsevier, vol. 84(C), pages 46-55.
    13. Hao, Weijuan & Hu, Yuhan, 2024. "The implications of deep cooperation strategy for the evolution of cooperation in social dilemmas," Applied Mathematics and Computation, Elsevier, vol. 470(C).
    14. Dimitris Iliopoulos & Arend Hintze & Christoph Adami, 2010. "Critical Dynamics in the Evolution of Stochastic Strategies for the Iterated Prisoner's Dilemma," PLOS Computational Biology, Public Library of Science, vol. 6(10), pages 1-8, October.
    15. McAvoy, Alex & Fraiman, Nicolas & Hauert, Christoph & Wakeley, John & Nowak, Martin A., 2018. "Public goods games in populations with fluctuating size," Theoretical Population Biology, Elsevier, vol. 121(C), pages 72-84.
    16. Du, Chunpeng & Guo, Keyu & Lu, Yikang & Jin, Haoyu & Shi, Lei, 2023. "Aspiration driven exit-option resolves social dilemmas in the network," Applied Mathematics and Computation, Elsevier, vol. 438(C).
    17. Shuo Sun & Rundong Wang & Bo An, 2021. "Reinforcement Learning for Quantitative Trading," Papers 2109.13851, arXiv.org.
    18. Sarkar, Bijan, 2021. "The cooperation–defection evolution on social networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 584(C).
    19. Bai, Pengzhou & Qiang, Bingzhuang & Zou, Kuan & Huang, Changwei, 2024. "Preferential selection based on adaptive attractiveness induce by reinforcement learning promotes cooperation," Chaos, Solitons & Fractals, Elsevier, vol. 180(C).
    20. Li, Wenqing & Ni, Shaoquan, 2022. "Train timetabling with the general learning environment and multi-agent deep reinforcement learning," Transportation Research Part B: Methodological, Elsevier, vol. 157(C), pages 230-251.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:phsmap:v:618:y:2023:i:c:s0378437123002546. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.journals.elsevier.com/physica-a-statistical-mechanics-and-its-applications/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.