A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization

Author

Afshin Oroojlooyjadid
(Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015)
MohammadReza Nazari
(Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015)
Lawrence V. Snyder
(Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015)
Martin Takáč
(Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015)

Abstract

Problem definition: The beer game is widely used in supply chain management classes to demonstrate the bullwhip effect and the importance of supply chain coordination. The game is a decentralized, multiagent, cooperative problem that can be modeled as a serial supply chain network in which agents choose order quantities while cooperatively attempting to minimize the network's total cost, although each agent only observes local information.

Academic/practical relevance: Under some conditions, a base-stock replenishment policy is optimal. However, in a decentralized supply chain in which some agents act irrationally, there is no known optimal policy for an agent wishing to act optimally.

Methodology: We propose a deep reinforcement learning (RL) algorithm to play the beer game. Our algorithm makes no assumptions about costs or other settings. As with any deep RL algorithm, training is computationally intensive, but once trained, the algorithm executes in real time. We propose a transfer-learning approach so that training performed for one agent can be adapted quickly for other agents and settings.

Results: When playing with teammates who follow a base-stock policy, our algorithm obtains near-optimal order quantities. More important, it performs significantly better than a base-stock policy when other agents use a more realistic model of human ordering behavior. We observe similar results using a real-world data set. Sensitivity analysis shows that a trained model is robust to changes in the cost coefficients. Finally, applying transfer learning reduces the training time by one order of magnitude.

Managerial implications: This paper shows how artificial intelligence can be applied to inventory optimization. Our approach can be extended to other supply chain optimization problems, especially those in which supply chain partners act in irrational or unpredictable ways. Our RL agent has been integrated into a new online beer game, which has been played more than 17,000 times by more than 4,000 people.
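To make the abstract's terms concrete, the following is a minimal Python sketch of three ingredients it mentions: the base-stock (order-up-to) rule used as the benchmark, a per-period local cost, and the generic one-step Q-learning target that a deep Q-network is typically trained to fit. It is not the authors' implementation. The names `AgentState`, `base_stock_order`, `period_cost`, and `td_target`, the linear holding/stockout cost form, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AgentState:
    """Local information for one agent (hypothetical layout)."""
    inventory_level: int  # on-hand stock minus backorders; negative means backorders
    on_order: int         # units ordered from upstream but not yet received


def base_stock_order(state: AgentState, base_stock_level: int) -> int:
    """Order just enough to raise the inventory position to the base-stock level."""
    inventory_position = state.inventory_level + state.on_order
    return max(0, base_stock_level - inventory_position)


def period_cost(inventory_level: int, holding_cost: float, stockout_cost: float) -> float:
    """Assumed linear cost: holding on positive stock, stockout on backorders."""
    return (holding_cost * max(0, inventory_level)
            + stockout_cost * max(0, -inventory_level))


def td_target(reward: float, next_q_values: Sequence[float], discount: float) -> float:
    """Generic one-step Q-learning target: r + discount * max over next actions."""
    return reward + discount * max(next_q_values)


if __name__ == "__main__":
    state = AgentState(inventory_level=4, on_order=3)
    order = base_stock_order(state, base_stock_level=10)
    cost = period_cost(state.inventory_level, holding_cost=0.5, stockout_cost=1.0)
    # Costs are minimized, so the reward passed to the learner is the negative cost.
    target = td_target(reward=-cost, next_q_values=[-1.0, -4.0], discount=0.9)
    print(order, cost, target)
```

In this sketch the learner receives the negative of the local cost as its reward, which is one common way to pose a cost-minimization problem for Q-learning; how the paper shapes rewards across agents is not stated in the abstract.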

Suggested Citation

Afshin Oroojlooyjadid & MohammadReza Nazari & Lawrence V. Snyder & Martin Takáč, 2022. "A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization," Manufacturing & Service Operations Management, INFORMS, vol. 24(1), pages 285-304, January.

Handle: RePEc:inm:ormsom:v:24:y:2022:i:1:p:285-304
DOI: 10.1287/msom.2020.0939

Citations

Cited by:

Yang, Yimin & Yi, Chaoqun & Li, Hailing & Dong, Xuesong & Yang, Lulu & Wang, Zilong, 2025. "An analysis on the role of artificial intelligence in green supply chains," Technological Forecasting and Social Change, Elsevier, vol. 217(C).
Sara Cheraghi & Abdorrahman Haeri & Seyed Farid Ghannadpour, 2025. "A dynamic and intelligent decision-making framework for a platelet inventory-distribution network," Operational Research, Springer, vol. 25(3), pages 1-61, September.
Bergsma, Ritsaart & de Ruijt, Corné & Bhulai, Sandjai, 2025. "A systematic review of machine learning approaches in inventory control optimization," Operations Research Perspectives, Elsevier, vol. 15(C).
Cui, Geng & Imura, Naoto & Nishinari, Katsuhiro & Ezaki, Takahiro, 2025. "On order smoothing interpolating the order-up-to and constant order policies," Omega, Elsevier, vol. 136(C).
Wang, Zihao & Wang, Wenlong & Liu, Tianjun & Chang, Jasmine & Shi, Jim, 2025. "IoT-driven dynamic replenishment of fresh produce in the presence of seasonal variations: A deep reinforcement learning approach using reward shaping," Omega, Elsevier, vol. 134(C).
Hu, Junkai & Xia, Li & Huang, Teng & Wu, Haoran, 2025. "A multi-agent deep reinforcement learning approach for multi-echelon inventory optimization and its application to the beer game," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 203(C).
Xueqin Hu & Hua Song & Yuan Mi & Xiaoye Yang, 2025. "Unveiling the path to sustainable carbon reduction: a comparative analysis of bank-led vs. firm-led carbon finance strategies," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 12(1), pages 1-18, December.
Ramin Ahadi & Wolfgang Ketter & John Collins & Nicolò Daina, 2023. "Cooperative Learning for Smart Charging of Shared Autonomous Vehicle Fleets," Transportation Science, INFORMS, vol. 57(3), pages 613-630, May.
Yi Chen & Jing Dong & Zhaoran Wang & Chuheng Zhang, 2026. "A Primal-Dual Approach to Constrained Markov Decision Processes with Applications to Queue Scheduling and Inventory Management," Management Science, INFORMS, vol. 72(2), pages 955-988, February.
Ralfs, Jana & Pham, Dai T. & Kiesmüller, Gudrun P., 2025. "Optimal outbound shipment policy for an inventory system with advance demand information," European Journal of Operational Research, Elsevier, vol. 324(1), pages 92-103.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Hu, Junkai & Xia, Li & Huang, Teng & Wu, Haoran, 2025. "A multi-agent deep reinforcement learning approach for multi-echelon inventory optimization and its application to the beer game," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 203(C).
Noel Watson & Yu-Sheng Zheng, 2005. "Decentralized Serial Supply Chains Subject to Order Delays and Information Distortion: Exploiting Real-Time Sales Data," Manufacturing & Service Operations Management, INFORMS, vol. 7(2), pages 152-168, May.
Jingkai Huang & Kevin Shang & Yi Yang & Weihua Zhou & Yuan Li, 2025. "Taylor Approximation of Inventory Policies for One-Warehouse, Multi-Retailer Systems with Demand Feature Information," Management Science, INFORMS, vol. 71(1), pages 879-897, January.
Wang, Xun & Disney, Stephen M., 2016. "The bullwhip effect: Progress, trends and directions," European Journal of Operational Research, Elsevier, vol. 250(3), pages 691-701.
Yang, Y. & Lin, J. & Liu, G. & Zhou, L., 2021. "The behavioural causes of bullwhip effect in supply chains: A systematic literature review," International Journal of Production Economics, Elsevier, vol. 236(C).
Schmidt, Felix G. & Pibernik, Richard, 2025. "Data-driven inventory control for large product portfolios: A practical application of prescriptive analytics," European Journal of Operational Research, Elsevier, vol. 322(1), pages 254-269.
Fangruo Chen, 1999. "Decentralized Supply Chains Subject to Information Delays," Management Science, INFORMS, vol. 45(8), pages 1076-1090, August.
Guillermo Gallego & Özalp Özer & Paul Zipkin, 2007. "Bounds, Heuristics, and Approximations for Distribution Systems," Operations Research, INFORMS, vol. 55(3), pages 503-517, June.
Deniz Preil & Michael Krapp, 2022. "Artificial intelligence-based inventory management: a Monte Carlo tree search approach," Annals of Operations Research, Springer, vol. 308(1), pages 415-439, January.
Zhu, Tianyuan & Balakrishnan, Jaydeep & da Silveira, Giovani J.C., 2020. "Bullwhip effect in the oil and gas supply chain: A multiple-case study," International Journal of Production Economics, Elsevier, vol. 224(C).
de Kok, Ton & Grob, Christopher & Laumanns, Marco & Minner, Stefan & Rambau, Jörg & Schade, Konrad, 2018. "A typology and literature review on stochastic multi-echelon inventory models," European Journal of Operational Research, Elsevier, vol. 269(3), pages 955-983.
Zhao, Yingshuai & Zhao, Xiaobo, 2015. "On human decision behavior in multi-echelon inventory management," International Journal of Production Economics, Elsevier, vol. 161(C), pages 116-128.
Qian, Zishun & Yan, Tingting & Li, Jianbin, 2025. "Process innovation similarity with suppliers and bullwhip effect," International Journal of Production Economics, Elsevier, vol. 289(C).
Mansur M. Arief, 2026. "Deepbullwhip: An Open-Source Simulation and Benchmarking for Multi-Echelon Bullwhip Analyses," Papers 2604.13478, arXiv.org.
Lingxiu Dong & Hau L. Lee, 2003. "Optimal Policies and Approximations for a Serial Multiechelon Inventory System with Time-Correlated Demand," Operations Research, INFORMS, vol. 51(6), pages 969-980, December.
Cannella, Salvatore & Framinan, Jose M. & Bruccoleri, Manfredi & Barbosa-Póvoa, Ana Paula & Relvas, Susana, 2015. "The effect of Inventory Record Inaccuracy in Information Exchange Supply Chains," European Journal of Operational Research, Elsevier, vol. 243(1), pages 120-129.
K. Devika & A. Jafarian & A. Hassanzadeh & R. Khodaverdi, 2016. "Optimizing of bullwhip effect and net stock amplification in three-echelon supply chains using evolutionary multi-objective metaheuristics," Annals of Operations Research, Springer, vol. 242(2), pages 457-487, July.
Fangruo Chen & Rungson Samroengraja, 2000. "A Staggered Ordering Policy for One-Warehouse, Multiretailer Systems," Operations Research, INFORMS, vol. 48(2), pages 281-293, April.
Ma, Yungao & Wang, Nengmin & He, Zhengwen & Lu, Jizhou & Liang, Huigang, 2015. "Analysis of the bullwhip effect in two parallel supply chains with interacting price-sensitive demands," European Journal of Operational Research, Elsevier, vol. 243(3), pages 815-825.
Singh, Yogendra & Disney, Stephen M., 2026. "On the transition from make-to-stock to make-to-order," European Journal of Operational Research, Elsevier, vol. 330(2), pages 444-457.