Author
Listed:
- Renzhi Lu
(Huazhong University of Science and Technology, School of Artificial Intelligence and Automation, Engineering Research Center of Autonomous Intelligent Unmanned Systems, Chinese Ministry of Education, State Key Laboratory of Digital Manufacturing Equipment and Technology, Institute of Artificial Intelligence)
- Zonghe Shao
(Huazhong University of Science and Technology, School of Artificial Intelligence and Automation, Engineering Research Center of Autonomous Intelligent Unmanned Systems, Chinese Ministry of Education, State Key Laboratory of Digital Manufacturing Equipment and Technology, Institute of Artificial Intelligence)
- Yuemin Ding
(University of Navarra, Tecnun School of Engineering; Ikerbasque Foundation)
- Ruijuan Chen
(Wuhan Textile University, Research Center for Applied Mathematics and Interdisciplinary Science, School of Mathematics and Statistics)
- Dongrui Wu
(Huazhong University of Science and Technology, Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation)
- Housheng Su
(Huazhong University of Science and Technology, Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation)
- Tao Yang
(Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries)
- Fumin Zhang
(Hong Kong University of Science and Technology, Department of Electronic and Computer Engineering)
- Jun Wang
(City University of Hong Kong, Department of Computer Science and Department of Data Science)
- Yang Shi
(University of Victoria, Department of Mechanical Engineering)
- Zhong-Ping Jiang
(New York University, Department of Electrical and Computer Engineering)
- Han Ding
(Huazhong University of Science and Technology, State Key Laboratory of Digital Manufacturing Equipment and Technology, School of Mechanical Science and Engineering)
- Hai-Tao Zhang
(Huazhong University of Science and Technology, School of Artificial Intelligence and Automation, Engineering Research Center of Autonomous Intelligent Unmanned Systems, Chinese Ministry of Education, State Key Laboratory of Digital Manufacturing Equipment and Technology, Institute of Artificial Intelligence)
Abstract
Reward maximization is a fundamental principle in the survival and evolution of biological organisms. In cognitive science and in embodied agents alike, reward-driven behavior is widely regarded as governing complex cognitive abilities such as perception, imitation, and learning. Among the frameworks aimed at establishing such abilities, reinforcement learning (RL), which leverages reward maximization to drive intelligent decision-making in embodied agents, has proven particularly promising. However, the inherent complexity and uncertainty of real-world tasks make designing effective reward functions for embodied RL agents difficult. Conventional methods typically rely on manually engineered or externally tuned reward signals, which demand substantial domain expertise, considerable human effort, and long convergence times, and may even lead to mission failure. This work introduces a bilevel optimization framework that discovers optimal reward functions for embodied RL agents through regret minimization. The approach accelerates policy optimization and enhances adaptability across diverse tasks. These findings can support the broader adoption of embodied RL agents in the behavioral and computational sciences and in neuroscience, paving the way toward artificial general intelligence.
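To make the bilevel structure concrete, the following is a minimal, illustrative sketch and not the authors' implementation: it assumes a hypothetical three-armed bandit with known true task returns, an inner level that trains a softmax policy by policy gradient on a learned proxy reward, and an outer level that searches over the proxy-reward parameters to minimize regret (the gap between the optimal true return and the return of the policy the proxy reward induces). All names and parameters here are invented for illustration.

```python
"""Toy sketch of bilevel reward discovery via regret minimization.

Assumptions (not from the paper): a 3-armed bandit with known true
returns, REINFORCE for the inner level, random search for the outer level.
"""
import numpy as np

rng = np.random.default_rng(0)
TRUE_RETURNS = np.array([0.2, 0.5, 1.0])   # hypothetical task returns per action
BEST_RETURN = TRUE_RETURNS.max()           # return of the optimal policy

def train_policy(reward_params, steps=200, lr=0.5):
    """Inner level: REINFORCE on the proxy reward, returns action probabilities."""
    logits = np.zeros(3)
    for _ in range(steps):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(3, p=probs)
        r = reward_params[a]               # proxy reward, not the true return
        grad = -probs
        grad[a] += 1.0                     # grad of log pi(a) w.r.t. logits
        logits += lr * r * grad
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def regret(reward_params):
    """Outer objective: optimal true return minus the induced policy's true return."""
    probs = train_policy(reward_params)
    return BEST_RETURN - float(probs @ TRUE_RETURNS)

# Outer level: simple random search over reward parameters to minimize regret.
best_params = rng.normal(size=3)
best_regret = regret(best_params)
for _ in range(50):
    cand = best_params + 0.3 * rng.normal(size=3)
    r = regret(cand)
    if r < best_regret:
        best_params, best_regret = cand, r

print("discovered reward params:", np.round(best_params, 2))
print("regret of induced policy:", round(best_regret, 3))
```

In this sketch the outer search converges on proxy rewards that rank the best arm highest, so the inner policy it induces attains near-zero regret; the paper's framework addresses the same nesting at the scale of embodied RL tasks rather than a bandit.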
Suggested Citation
Renzhi Lu & Zonghe Shao & Yuemin Ding & Ruijuan Chen & Dongrui Wu & Housheng Su & Tao Yang & Fumin Zhang & Jun Wang & Yang Shi & Zhong-Ping Jiang & Han Ding & Hai-Tao Zhang, 2025.
"Discovery of the reward function for embodied reinforcement learning agents,"
Nature Communications, Nature, vol. 16(1), pages 1-15, December.
Handle:
RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-66009-y
DOI: 10.1038/s41467-025-66009-y
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-66009-y. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.