
A Collaborative Multi-Agent Reinforcement Learning Approach for Non-Stationary Environments with Unknown Change Points

Authors

Listed:
  • Suyu Wang

    (School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
    Institute of Intelligent Mining and Robotics, Beijing 100083, China)

  • Quan Yue

    (School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)

  • Zhenlei Xu

    (School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)

  • Peihong Qiao

    (School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)

  • Zhentao Lyu

    (School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)

  • Feng Gao

    (Beijing Huatie Information Technology Co., Ltd., Beijing 100081, China
    Signal & Communication Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)

Abstract

Reinforcement learning has achieved significant success in sequential decision-making problems but adapts poorly to non-stationary environments with unknown dynamics, a challenge that is particularly pronounced in multi-agent settings. This study aims to enhance the adaptive capability of multi-agent systems in such volatile environments. We propose a novel cooperative Multi-Agent Reinforcement Learning (MARL) algorithm based on MADDPG, termed MACPH, which incorporates three mechanisms: a Composite Experience Replay Buffer (CERB) that balances recent and important historical experiences through a dual-buffer structure and mixed sampling; Adaptive Parameter Space Noise (APSN), which perturbs the actor network parameters and dynamically adjusts the perturbation intensity to achieve coherent, state-dependent exploration; and a Huber loss, which mitigates the impact of outliers in the temporal-difference (TD) errors and stabilizes training. Experiments were conducted in standard and in non-stationary navigation and communication task scenarios. Ablation studies confirmed the positive contribution of each component and their synergistic effects. In non-stationary scenarios featuring abrupt environmental changes, MACPH outperforms baseline algorithms such as DDPG, MADDPG, and MATD3 in reward, adaptation speed, learning stability, and robustness. The proposed MACPH algorithm offers an effective solution for multi-agent reinforcement learning in complex non-stationary environments.
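The abstract names the three mechanisms without implementation detail. The sketch below shows how such components are commonly realized in a PyTorch MADDPG-style setup; it is a minimal illustration, not the authors' code. All class, function, and parameter names (CompositeReplayBuffer, recent_ratio, adapt_sigma, and so on) are assumptions introduced here, and the sigma-adaptation rule follows the standard parameter-space-noise recipe from the literature rather than the paper's exact APSN formula.

```python
# Hedged sketch of CERB-, APSN-, and Huber-style components.
# Names and ratios are illustrative assumptions, not the MACPH implementation.
import copy
import random
from collections import deque

import torch
import torch.nn.functional as F


class CompositeReplayBuffer:
    """Dual-buffer replay (the CERB idea): a small FIFO of recent transitions
    plus a large FIFO of historical ones, mixed at a fixed sampling ratio."""

    def __init__(self, recent_size=10_000, history_size=100_000, recent_ratio=0.5):
        self.recent = deque(maxlen=recent_size)
        self.history = deque(maxlen=history_size)
        self.recent_ratio = recent_ratio

    def add(self, transition):
        self.recent.append(transition)
        self.history.append(transition)

    def sample(self, batch_size):
        # Draw part of the batch from recent experience, the rest from history.
        n_recent = min(int(batch_size * self.recent_ratio), len(self.recent))
        n_hist = min(batch_size - n_recent, len(self.history))
        return random.sample(self.recent, n_recent) + random.sample(self.history, n_hist)


def perturb_actor(actor, sigma):
    """Parameter-space noise (the APSN idea): copy the actor and add Gaussian
    noise to every weight, so exploration stays coherent within an episode."""
    perturbed = copy.deepcopy(actor)
    with torch.no_grad():
        for p in perturbed.parameters():
            p.add_(torch.randn_like(p) * sigma)
    return perturbed


def adapt_sigma(sigma, action_distance, target_distance, alpha=1.01):
    """Adapt the noise scale so the induced change in actions stays near a
    target distance (standard parameter-space-noise adaptation rule)."""
    return sigma / alpha if action_distance > target_distance else sigma * alpha


def critic_loss(q_values, td_targets, delta=1.0):
    """Huber (smooth-L1) loss on the TD error: quadratic near zero, linear for
    large errors, which damps the influence of outlier TD errors."""
    return F.smooth_l1_loss(q_values, td_targets, beta=delta)
```

In this kind of setup, the critic would be trained on batches drawn from the composite buffer, actions would be selected by the perturbed actor, and the noise scale would be re-adapted after each update by comparing perturbed and unperturbed actions on a recent batch.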

Suggested Citation

  • Suyu Wang & Quan Yue & Zhenlei Xu & Peihong Qiao & Zhentao Lyu & Feng Gao, 2025. "A Collaborative Multi-Agent Reinforcement Learning Approach for Non-Stationary Environments with Unknown Change Points," Mathematics, MDPI, vol. 13(11), pages 1-25, May.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:11:p:1738-:d:1663584

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/11/1738/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/11/1738/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Jiawen & Zhou, Tao, 2025. "Fully autonomous load frequency control for integrated energy system with massive energy prosumers using multi-agent deep meta reinforcement learning," Renewable and Sustainable Energy Reviews, Elsevier, vol. 213(C).
