Authors
- Wang, Wenyuan
- Liu, Huakun
- Peng, Yun
- Cao, Zhen
- Yu, Pengxi
- Lu, Zanxin
Abstract
In container terminal yards, operational efficiency is significantly hindered by the mismatch between stochastic workload variations and static configurations of Yard Cranes (YCs) across multiple blocks. The real-time YC scheduling problem (YCSP-r) aims to develop adaptive, instantaneous scheduling policies that respond dynamically to workload fluctuations. In this paper, we propose a multi-agent reinforcement learning (RL) method to address the YCSP-r. Specifically, the YCSP-r is formulated as a Markov Decision Process (MDP) within an asynchronous timestep framework. Because the redeployment cost of YCs is non-negligible in real-time operations, the MDP is designed to balance redeployment costs against overall operational efficiency. A general simulator for the YC scheduling system is developed to execute action decisions and provide performance feedback. Proximal Policy Optimization (PPO) is employed to train the scheduling policy. A multi-agent shared-policy framework and a global–local mixed state structure are tailored to mitigate the challenges posed by high-dimensional state and action spaces, thereby improving both convergence and training stability. To evaluate solution quality, a mixed integer programming model for the YCSP-r is developed and solved by a commercial solver as a benchmark for comparison. The proposed approach is further compared with other advanced RL and heuristic methods. Experimental results demonstrate that the proposed PPO-based approach provides high-quality solutions in real time, typically within seconds, meeting the practical demands of container terminal operations. Notably, compared to a static YC deployment strategy, our scheduling strategy achieves a 12.29% reduction in operational costs. We believe that our study provides valuable insights for port managers in developing practical and reliable YC scheduling solutions.
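The abstract names PPO as the training algorithm but gives no loss function or hyperparameters. As a rough illustration only, the sketch below computes the standard clipped surrogate objective from the original PPO formulation for a small batch of samples. The function name, argument names, and the clipping value of 0.2 are assumptions made for this example and are not taken from the paper.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    All names and the default clip_eps are illustrative assumptions,
    not the authors' implementation.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # One sample whose action became twice as likely, with positive advantage.
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

Clipping caps the gain from making a favorable action much more likely, which limits how far a single update can move the policy. This is the mechanism usually credited for PPO's training stability.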
Suggested Citation
Wang, Wenyuan & Liu, Huakun & Peng, Yun & Cao, Zhen & Yu, Pengxi & Lu, Zanxin, 2026.
"Yard crane real-time scheduling among multi-block at terminal: A reinforcement learning based proximal policy optimization approach,"
Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 207(C).
Handle: RePEc:eee:transe:v:207:y:2026:i:c:s136655452500657x
DOI: 10.1016/j.tre.2025.104635