Printed from https://ideas.repec.org/a/eee/transe/v203y2025ics1366554525004089.html

A multi-agent deep reinforcement learning approach for multi-echelon inventory optimization and its application to the beer game

Author

Listed:
  • Hu, Junkai
  • Xia, Li
  • Huang, Teng
  • Wu, Haoran

Abstract

Cost-effective inventory management in supply chain networks remains challenging because uncertainty can be distributed across multiple echelons of inventory. This paper investigates a decentralized multi-echelon inventory optimization problem with stochastic demands, focusing on the serial four-echelon supply chain of the beer game. In this problem, participants at each stage independently execute their replenishment policies based on local information while collectively minimizing the total cost of the entire supply chain. We formulate the problem as a decentralized partially observable Markov decision process and develop a novel multi-agent deep reinforcement learning (MADRL) algorithm to solve it, without relying on stylized problem assumptions. Additionally, because MADRL usually trains agents in simulated environments, and the resulting policy may perform poorly in target environments due to environmental discrepancies, we propose a domain randomization technique integrated into MADRL, called DR-MADRL, to enhance the robustness of learned policies. Numerical experiments show that, when simulated and target environments align, MADRL’s policies come close to the optimum and exhibit structural properties similar to the analytically optimal base-stock policy. The superiority of MADRL is further demonstrated as it surpasses existing approximate and heuristic methods on instances without analytically optimal policies. Meanwhile, the agents exhibit distinctive behavioral characteristics that could aid in identifying or developing new replenishment policies. For scenarios with environmental discrepancies, DR-MADRL’s policy demonstrates substantial resilience to supply chain variations. These results show the effectiveness of MADRL and DR-MADRL in addressing the multi-echelon inventory optimization problem.
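The base-stock structure that the learned policies approximate can be illustrated with a small simulation. The sketch below is illustrative only: the parameter names, cost values, and the assumption of ample supply at the most upstream stage are our own simplifications, not the paper's exact beer-game setup. Each stage of a serial chain orders up to a fixed base-stock level using only its local inventory position.

```python
def simulate_serial_chain(base_stock_levels, demands, lead_time=2,
                          holding_cost=1.0, backlog_cost=2.0):
    """Simulate a serial chain where every stage follows a base-stock policy:
    each period it orders up to its target inventory position.
    Stage 0 is the retailer; the last stage is the factory."""
    n = len(base_stock_levels)
    inventory = list(base_stock_levels)              # on-hand stock per stage
    pipeline = [[0] * lead_time for _ in range(n)]   # orders in transit
    backlog = [0] * n                                # unmet demand per stage
    total_cost = 0.0
    for d in demands:
        # Shipments placed lead_time periods ago arrive at each stage.
        for i in range(n):
            inventory[i] += pipeline[i].pop(0)
            pipeline[i].append(0)
        demand = d
        for i in range(n):
            # Serve new demand plus any backlog, as far as stock allows.
            demand += backlog[i]
            shipped = min(inventory[i], demand)
            inventory[i] -= shipped
            backlog[i] = demand - shipped
            # Base-stock rule: order up to the target inventory position
            # (on-hand + in-transit - backlog), using local information only.
            position = inventory[i] + sum(pipeline[i]) - backlog[i]
            order = max(0, base_stock_levels[i] - position)
            pipeline[i][-1] += order   # upstream supply assumed ample
            demand = order             # this order is the upstream stage's demand
        total_cost += sum(holding_cost * inv + backlog_cost * b
                          for inv, b in zip(inventory, backlog))
    return total_cost
```

In the spirit of DR-MADRL, one could randomize parameters such as `lead_time`, `holding_cost`, and `backlog_cost` across training episodes, so that a policy learned in the simulator degrades gracefully when the target environment's parameters differ.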

Suggested Citation

  • Hu, Junkai & Xia, Li & Huang, Teng & Wu, Haoran, 2025. "A multi-agent deep reinforcement learning approach for multi-echelon inventory optimization and its application to the beer game," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 203(C).
  • Handle: RePEc:eee:transe:v:203:y:2025:i:c:s1366554525004089
    DOI: 10.1016/j.tre.2025.104367

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1366554525004089
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.tre.2025.104367?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Kevin Geevers & Lotte Hezewijk & Martijn R. K. Mes, 2024. "Multi-echelon inventory optimization using deep reinforcement learning," Central European Journal of Operations Research, Springer, vol. 32(3), pages 653-683, September.
    2. Lee, Junhyeok & Shin, Youngchul & Moon, Ilkyeong, 2024. "A hybrid deep reinforcement learning approach for a proactive transshipment of fresh food in the online–offline channel system," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 187(C).
    3. Andrew J. Clark & Herbert Scarf, 2004. "Optimal Policies for a Multi-Echelon Inventory Problem," Management Science, INFORMS, vol. 50(12-supplement), pages 1782-1790, December.
    4. Stranieri, Francesco & Fadda, Edoardo & Stella, Fabio, 2024. "Combining deep reinforcement learning and multi-stage stochastic programming to address the supply chain inventory management problem," International Journal of Production Economics, Elsevier, vol. 268(C).
    5. Fangruo Chen & Yu-Sheng Zheng, 1994. "Lower Bounds for Multi-Echelon Stochastic Inventory Systems," Management Science, INFORMS, vol. 40(11), pages 1426-1443, November.
    6. Afshin Oroojlooyjadid & MohammadReza Nazari & Lawrence V. Snyder & Martin Takáč, 2022. "A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization," Manufacturing & Service Operations Management, INFORMS, vol. 24(1), pages 285-304, January.
    7. de Kok, Ton & Grob, Christopher & Laumanns, Marco & Minner, Stefan & Rambau, Jörg & Schade, Konrad, 2018. "A typology and literature review on stochastic multi-echelon inventory models," European Journal of Operational Research, Elsevier, vol. 269(3), pages 955-983.
    8. Kenneth F. Simpson, 1958. "In-Process Inventories," Operations Research, INFORMS, vol. 6(6), pages 863-873, December.
    9. Stephen C. Graves, 1985. "A Multi-Echelon Inventory Model for a Repairable Item with One-for-One Replenishment," Management Science, INFORMS, vol. 31(10), pages 1247-1256, October.
    10. Hau L. Lee & V. Padmanabhan & Seungjin Whang, 1997. "Information Distortion in a Supply Chain: The Bullwhip Effect," Management Science, INFORMS, vol. 43(4), pages 546-558, April.
    11. Fangruo Chen & Yu-Sheng Zheng, 1998. "Near-Optimal Echelon-Stock (R, nQ) Policies in Multistage Serial Systems," Operations Research, INFORMS, vol. 46(4), pages 592-602, August.
    12. Dony S. Kurian & V. Madhusudanan Pillai & J. Gautham & Akash Raut, 2023. "Data-driven imitation learning-based approach for order size determination in supply chains," European Journal of Industrial Engineering, Inderscience Enterprises Ltd, vol. 17(3), pages 379-407.
    13. Yang Deng & Andy H. F. Chow & Yimo Yan & Zicheng Su & Zhili Zhou & Yong-Hong Kuo, 2025. "Hierarchical production control and distribution planning under retail uncertainty with reinforcement learning," International Journal of Production Research, Taylor & Francis Journals, vol. 63(12), pages 4504-4522, June.
    14. Guillermo Gallego & Paul Zipkin, 1999. "Stock Positioning and Performance Estimation in Serial Production-Transportation Systems," Manufacturing & Service Operations Management, INFORMS, vol. 1(1), pages 77-88.
    15. Yan, Yimo & Chow, Andy H.F. & Ho, Chin Pang & Kuo, Yong-Hong & Wu, Qihao & Ying, Chengshuo, 2022. "Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 162(C).
    16. Lee, Hyun-Rok & Lee, Taesik, 2021. "Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response," European Journal of Operational Research, Elsevier, vol. 291(1), pages 296-308.
    17. John D. Sterman, 1989. "Modeling Managerial Behavior: Misperceptions of Feedback in a Dynamic Decision Making Experiment," Management Science, INFORMS, vol. 35(3), pages 321-339, March.
    18. Boute, Robert N. & Gijsbrechts, Joren & van Jaarsveld, Willem & Vanvuchelen, Nathalie, 2022. "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, Elsevier, vol. 298(2), pages 401-412.
    19. Joren Gijsbrechts & Robert N. Boute & Jan A. Van Mieghem & Dennis J. Zhang, 2022. "Can Deep Reinforcement Learning Improve Inventory Management? Performance on Lost Sales, Dual-Sourcing, and Multi-Echelon Problems," Manufacturing & Service Operations Management, INFORMS, vol. 24(3), pages 1349-1368, May.
    20. Paul Glasserman & Sridhar Tayur, 1995. "Sensitivity Analysis for Base-Stock Levels in Multiechelon Production-Inventory Systems," Management Science, INFORMS, vol. 41(2), pages 263-281, February.
    21. Rachel Croson & Karen Donohue, 2006. "Behavioral Causes of the Bullwhip Effect and the Observed Value of Inventory Information," Management Science, INFORMS, vol. 52(3), pages 323-336, March.
    22. Kaj Rosling, 1989. "Optimal Inventory Policies for Assembly Systems Under Random Demands," Operations Research, INFORMS, vol. 37(4), pages 565-579, August.
    23. Daniel S. Bernstein & Robert Givan & Neil Immerman & Shlomo Zilberstein, 2002. "The Complexity of Decentralized Control of Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 27(4), pages 819-840, November.
    24. Francesco Stranieri & Fabio Stella & Chaaben Kouki, 2024. "Performance of deep reinforcement learning algorithms in two-echelon inventory control systems," International Journal of Production Research, Taylor & Francis Journals, vol. 62(17), pages 6211-6226, September.
    25. Manupati, Vijaya Kumar & Schoenherr, Tobias & Subramanian, Nachiappan & Ramkumar, M. & Soni, Bhanushree & Panigrahi, Suraj, 2021. "A multi-echelon dynamic cold chain for managing vaccine distribution," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 156(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fleuren, Tijn, 2025. "Stochastic approaches for production-inventory planning : Applications to high-tech supply chains," Other publications TiSEM 1fe1bbe5-fd90-4077-8606-d, Tilburg University, School of Economics and Management.
    2. de Kok, Ton & Grob, Christopher & Laumanns, Marco & Minner, Stefan & Rambau, Jörg & Schade, Konrad, 2018. "A typology and literature review on stochastic multi-echelon inventory models," European Journal of Operational Research, Elsevier, vol. 269(3), pages 955-983.
    3. Bergsma, Ritsaart & de Ruijt, Corné & Bhulai, Sandjai, 2025. "A systematic review of machine learning approaches in inventory control optimization," Operations Research Perspectives, Elsevier, vol. 15(C).
    4. Barnes-Schuster, Dawn & Bassok, Yehuda & Anupindi, Ravi, 2006. "Optimizing delivery lead time/inventory placement in a two-stage production/distribution system," European Journal of Operational Research, Elsevier, vol. 174(3), pages 1664-1684, November.
    5. Vanteddu, Gangaraju & Chinnam, Ratna Babu & Gushikin, Oleg, 2011. "Supply chain focus dependent supplier selection problem," International Journal of Production Economics, Elsevier, vol. 129(1), pages 204-216, January.
    6. Lingxiu Dong & Hau L. Lee, 2003. "Optimal Policies and Approximations for a Serial Multiechelon Inventory System with Time-Correlated Demand," Operations Research, INFORMS, vol. 51(6), pages 969-980, December.
    7. Tan Wang & L. Jeff Hong, 2023. "Large-Scale Inventory Optimization: A Recurrent Neural Networks–Inspired Simulation Approach," INFORMS Journal on Computing, INFORMS, vol. 35(1), pages 196-215, January.
    8. Noel Watson & Yu-Sheng Zheng, 2005. "Decentralized Serial Supply Chains Subject to Order Delays and Information Distortion: Exploiting Real-Time Sales Data," Manufacturing & Service Operations Management, INFORMS, vol. 7(2), pages 152-168, May.
    9. Li, Xiuhui & Wang, Qinan, 2007. "Coordination mechanisms of supply chain systems," European Journal of Operational Research, Elsevier, vol. 179(1), pages 1-16, May.
    10. Cui, Geng & Imura, Naoto & Nishinari, Katsuhiro & Ezaki, Takahiro, 2025. "On order smoothing interpolating the order-up-to and constant order policies," Omega, Elsevier, vol. 136(C).
    11. Fleuren, Tijn & Merzifonluoglu, Yasemin & Sotirov, Renata & Hendriks, Maarten, 2025. "Production–inventory planning in high-tech low-volume manufacturing supply chains," International Journal of Production Economics, Elsevier, vol. 288(C).
    12. van Houtum, G. J. & Inderfurth, K. & Zijm, W. H. M., 1996. "Materials coordination in stochastic multi-echelon systems," European Journal of Operational Research, Elsevier, vol. 95(1), pages 1-23, November.
    13. Guillermo Gallego & Özalp Özer & Paul Zipkin, 2007. "Bounds, Heuristics, and Approximations for Distribution Systems," Operations Research, INFORMS, vol. 55(3), pages 503-517, June.
    14. Guillermo Gallego & Paul Zipkin, 1999. "Stock Positioning and Performance Estimation in Serial Production-Transportation Systems," Manufacturing & Service Operations Management, INFORMS, vol. 1(1), pages 77-88.
    15. Rodney P. Parker & Roman Kapuscinski, 2004. "Optimal Policies for a Capacitated Two-Echelon Inventory System," Operations Research, INFORMS, vol. 52(5), pages 739-755, October.
    16. Fangruo Chen, 2000. "Optimal Policies for Multi-Echelon Inventory Problems with Batch Ordering," Operations Research, INFORMS, vol. 48(3), pages 376-389, June.
    17. David Simchi-Levi & Yao Zhao, 2005. "Safety Stock Positioning in Supply Chains with Stochastic Lead Times," Manufacturing & Service Operations Management, INFORMS, vol. 7(4), pages 295-318, December.
    18. Ming Hu & Yi Yang, 2014. "Modified Echelon ( r, Q ) Policies with Guaranteed Performance Bounds for Stochastic Serial Inventory Systems," Operations Research, INFORMS, vol. 62(4), pages 812-828, August.
    19. Alp Muharremoglu & John N. Tsitsiklis, 2008. "A Single-Unit Decomposition Approach to Multiechelon Inventory Systems," Operations Research, INFORMS, vol. 56(5), pages 1089-1103, October.
    20. Fangruo Chen & Rungson Samroengraja, 2000. "A Staggered Ordering Policy for One-Warehouse, Multiretailer Systems," Operations Research, INFORMS, vol. 48(2), pages 281-293, April.

    More about this item



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.