Author
Listed:
- Suyu Wang
(School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
Institute of Intelligent Mining and Robotics, Beijing 100083, China)
- Quan Yue
(School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)
- Zhenlei Xu
(School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)
- Peihong Qiao
(School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)
- Zhentao Lyu
(School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)
- Feng Gao
(Beijing Huatie Information Technology Co., Ltd., Beijing 100081, China
Signal & Communication Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)
Abstract
Reinforcement learning has achieved significant success in sequential decision-making problems but exhibits poor adaptability in non-stationary environments with unknown dynamics, a challenge particularly pronounced in multi-agent scenarios. This study aims to enhance the adaptive capability of multi-agent systems in such volatile environments. We propose a novel cooperative Multi-Agent Reinforcement Learning (MARL) algorithm based on MADDPG, termed MACPH, which innovatively incorporates three mechanisms: a Composite Experience Replay Buffer (CERB) mechanism that balances recent and important historical experiences through a dual-buffer structure and mixed sampling; an Adaptive Parameter Space Noise (APSN) mechanism that perturbs actor network parameters and dynamically adjusts the perturbation intensity to achieve coherent and state-dependent exploration; and a Huber loss function mechanism to mitigate the impact of outliers in Temporal Difference errors and enhance training stability. The study was conducted in standard and non-stationary navigation and communication task scenarios. Ablation studies confirmed the positive contributions of each component and their synergistic effects. In non-stationary scenarios featuring abrupt environmental changes, experiments demonstrate that MACPH outperforms baseline algorithms such as DDPG, MADDPG, and MATD3 in terms of reward performance, adaptation speed, learning stability, and robustness. The proposed MACPH algorithm offers an effective solution for multi-agent reinforcement learning applications in complex non-stationary environments.
Suggested Citation
Suyu Wang & Quan Yue & Zhenlei Xu & Peihong Qiao & Zhentao Lyu & Feng Gao, 2025.
"A Collaborative Multi-Agent Reinforcement Learning Approach for Non-Stationary Environments with Unknown Change Points,"
Mathematics, MDPI, vol. 13(11), pages 1-25, May.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:11:p:1738-:d:1663584
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:13:y:2025:i:11:p:1738-:d:1663584. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.