Abstract
As the final and most labor-intensive segment of the logistics chain, last-mile delivery grapples with inherent challenges: dynamic traffic conditions, fluctuating order volumes, and the conflicting demands of timeliness, cost control, and resource efficiency. Conventional dispatch approaches, such as heuristic algorithms and static optimization models, exhibit limited adaptability to real-time fluctuations, often resulting in suboptimal resource utilization and elevated operational costs. To address these gaps, this study proposes a reinforcement learning (RL) framework integrated with multi-dimensional reward shaping (RS) to enhance dynamic last-mile delivery dispatch efficiency. First, we formalize the dispatch problem as a Markov Decision Process (MDP) that explicitly incorporates real-time factors (e.g., traffic congestion, order urgency, and vehicle status) into the state space. Second, we design a domain-specific RS function that introduces intermediate rewards (e.g., on-time arrival bonuses, empty-running penalties) to mitigate the sparsity of traditional terminal rewards and accelerate RL agent convergence. Experiments were conducted on a real-world dataset from a logistics enterprise in Chengdu (June-August 2024), comparing the proposed RS-PPO framework against two baselines: the classic Savings Algorithm (SA) and standard PPO without reward shaping (PPO-noRS). Results demonstrate that RS-PPO improves the on-time delivery rate (OTR) by 18.2% (vs. SA) and 9.5% (vs. PPO-noRS), reduces the average delivery cost (ADC) by 12.7% (vs. SA) and 7.3% (vs. PPO-noRS), and shortens convergence time by 40.3% (vs. PPO-noRS). Additionally, RS-PPO boosts vehicle utilization rate (VUR) by 29.8% (vs. SA) and 13.4% (vs. PPO-noRS). This framework provides a practical, data-driven solution for logistics enterprises seeking to balance service quality, cost efficiency, and sustainability, aligning with global last-mile optimization trends.
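To make the reward-shaping idea concrete, the sketch below shows what a shaped per-step reward of the kind described in the abstract (on-time arrival bonuses, empty-running penalties, sparse terminal cost) might look like. It is a minimal illustration only: the state features, weights, and function names are hypothetical assumptions and are not taken from the paper's actual RS-PPO design.

```python
from dataclasses import dataclass


@dataclass
class DispatchState:
    """Illustrative per-vehicle state features (names are hypothetical)."""
    traffic_congestion: float      # 0 (free flow) .. 1 (gridlock) on the next leg
    order_urgency: float           # normalized time remaining until the deadline
    vehicle_load_ratio: float      # fraction of vehicle capacity currently in use
    distance_to_next_stop_km: float


def shaped_reward(state: DispatchState,
                  delivered_on_time: bool,
                  delivered_late: bool,
                  travelled_empty_km: float,
                  step_cost: float) -> float:
    """Negative operating cost plus intermediate shaping terms.

    The weights below are placeholders chosen for illustration; in practice
    they would be tuned against the OTR/ADC/VUR objectives the abstract
    reports on.
    """
    reward = -step_cost                                   # base: per-step operating cost

    # Intermediate shaping: on-time arrival bonus, lateness penalty.
    if delivered_on_time:
        reward += 1.0
    if delivered_late:
        reward -= 1.0 * (1.0 - state.order_urgency)       # later deliveries penalized more

    # Empty-running penalty discourages repositioning without any load.
    if state.vehicle_load_ratio == 0.0:
        reward -= 0.1 * travelled_empty_km

    # Mild penalty for dispatching a vehicle into heavy congestion.
    reward -= 0.05 * state.traffic_congestion

    return reward
```

In a PPO training loop, such a function would replace the sparse end-of-episode reward, giving the agent denser feedback at each dispatch decision; this is the mechanism the abstract credits for the faster convergence of RS-PPO over PPO-noRS.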