Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand



Author

Listed:

Dehaybe, Henri
Catanzaro, Daniele
Chevalier, Philippe

Registered:

Philippe Chevalier

Abstract

We consider here a single-item lot-sizing problem with fixed costs, lead time, and both backorders and lost sales, and we show that, after appropriate training in randomly generated environments, Deep Reinforcement Learning (DRL) agents can interpolate, in real time, near-optimal dynamic policies for rolling-horizon instances when given a previously unseen demand forecast, without the need to periodically re-solve the problem. Extensive computational experiments show that the policies provided by these agents are competitive with those provided by heuristics based on dynamic programming and, in some circumstances, outperform them by a gap of several percentage points. These results confirm the importance of DRL in the context of inventory control problems and support its use in solving practical instances featuring realistic assumptions.
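To make the problem setting in the abstract more concrete, the sketch below simulates one possible single-item lot-sizing environment with a fixed ordering cost, a lead time, and both backorders and lost sales. It is an illustration under stated assumptions, not the authors' model: the names `LotSizingEnv`, `observation`, `s_S_order` and `rollout`, all cost values, and the `backorder_fraction` mechanism (a fixed share of each shortfall is backordered and the rest is lost) are hypothetical choices made here. The `s_S_order` rule is only a placeholder baseline; it is not the dynamic-programming heuristic the paper compares against, and a trained DRL agent would take its place by choosing orders from the observed state and demand forecast.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LotSizingEnv:
    """Hypothetical single-item lot-sizing environment.

    All cost values, the backorder_fraction mechanism and the parameter
    names are illustrative assumptions, not the paper's specification.
    """
    fixed_cost: float = 100.0         # K: charged whenever an order is placed
    holding_cost: float = 1.0         # h: per unit on hand at period end
    backorder_cost: float = 5.0       # b: per unit backordered at period end
    lost_sale_cost: float = 10.0      # p: per unit of demand lost
    backorder_fraction: float = 0.5   # assumed share of a shortfall that is backordered
    lead_time: int = 2                # periods between ordering and receipt
    inventory: float = 0.0            # net inventory (negative = backorders)
    pipeline: deque = field(init=False)

    def __post_init__(self) -> None:
        # Outstanding orders, oldest first; one slot per period of lead time.
        self.pipeline = deque([0.0] * self.lead_time)

    def observation(self, forecast: list[float]) -> list[float]:
        # State an agent might see: net inventory, pipeline, demand forecast.
        return [self.inventory, *self.pipeline, *forecast]

    def step(self, order_qty: float, demand: float) -> float:
        """Place an order, receive the due shipment, meet demand; return cost."""
        self.pipeline.append(order_qty)
        arriving = self.pipeline.popleft()  # order placed lead_time periods ago
        net = self.inventory + arriving     # arrivals clear old backorders first
        on_hand = max(net, 0.0)
        backlog = max(-net, 0.0)
        shortfall = max(demand - on_hand, 0.0)
        lost = (1.0 - self.backorder_fraction) * shortfall
        backlog += self.backorder_fraction * shortfall
        on_hand = max(on_hand - demand, 0.0)
        self.inventory = on_hand - backlog
        ordering = self.fixed_cost if order_qty > 0 else 0.0
        return (ordering + self.holding_cost * on_hand
                + self.backorder_cost * backlog + self.lost_sale_cost * lost)


def s_S_order(env: LotSizingEnv, s: float, S: float) -> float:
    """Placeholder (s, S) rule on the inventory position (not the paper's heuristic)."""
    position = env.inventory + sum(env.pipeline)
    return S - position if position <= s else 0.0


def rollout(env: LotSizingEnv, demands: list[float], s: float, S: float) -> float:
    # A trained DRL agent would replace s_S_order here, choosing each order
    # from env.observation(forecast) instead of a fixed (s, S) rule.
    return sum(env.step(s_S_order(env, s, S), d) for d in demands)
```

For example, `rollout(LotSizingEnv(), [5, 5, 5], s=5, S=20)` orders 20 units in the first period, absorbs two periods of shortfall while that order is in transit, and returns a total cost of 197.5 under the assumed parameters. Under a non-stationary demand forecast, a fixed (s, S) pair would have to be recomputed as the horizon rolls forward, which is the periodic re-solving the abstract says the DRL agents avoid.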

Suggested Citation

Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2024. "Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand," European Journal of Operational Research, Elsevier, vol. 314(2), pages 433-445.

Handle: RePEc:eee:ejores:v:314:y:2024:i:2:p:433-445
DOI: 10.1016/j.ejor.2023.10.007


Other versions of this item:

Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2023. "Deep Reinforcement Learning for Inventory Optimization with Non-Stationary Uncertain Demand," LIDAM Reprints CORE 3270, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).

References listed on IDEAS

De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
James H. Bookbinder & Jin-Yan Tan, 1988. "Strategies for the Probabilistic Lot-Sizing Problem with Service-Level Constraints," Management Science, INFORMS, vol. 34(9), pages 1096-1108, September.
Srinivas Bollapragada & Thomas E. Morton, 1999. "A Simple Heuristic for Computing Nonstationary (s, S) Policies," Operations Research, INFORMS, vol. 47(4), pages 576-584, August.
Amirhosein Norouzi & Reha Uzsoy, 2014. "Modeling the evolution of dependency between demands, with application to inventory planning," IISE Transactions, Taylor & Francis Journals, vol. 46(1), pages 55-66.
Steven Nahmias, 1979. "Simple Approximations for a Variety of Dynamic Leadtime Lost-Sales Inventory Models," Operations Research, INFORMS, vol. 27(5), pages 904-924, October.
Stephen C. Graves, 1999. "Addendum to "A Single-Item Inventory Model for a Nonstationary Demand Process"," Manufacturing & Service Operations Management, INFORMS, vol. 1(2), pages 174-174.
Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
Evan L. Porteus, 1971. "On the Optimality of Generalized (s, S) Policies," Management Science, INFORMS, vol. 17(7), pages 411-426, March.
Donald L. Iglehart, 1963. "Optimality of (s, S) Policies in the Infinite Horizon Dynamic Inventory Problem," Management Science, INFORMS, vol. 9(2), pages 259-267, January.
Andrew J. Clark & Herbert Scarf, 2004. "Optimal Policies for a Multi-Echelon Inventory Problem," Management Science, INFORMS, vol. 50(12_supple), pages 1782-1790, December.
- Andrew J. Clark & Herbert Scarf, 1960. "Optimal Policies for a Multi-Echelon Inventory Problem," Management Science, INFORMS, vol. 6(4), pages 475-490, July.
Boute, Robert N. & Gijsbrechts, Joren & van Jaarsveld, Willem & Vanvuchelen, Nathalie, 2022. "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, Elsevier, vol. 298(2), pages 401-412.
Lingxiu Dong & Hau L. Lee, 2003. "Optimal Policies and Approximations for a Serial Multiechelon Inventory System with Time-Correlated Demand," Operations Research, INFORMS, vol. 51(6), pages 969-980, December.
Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).
Stephen C. Graves, 1999. "A Single-Item Inventory Model for a Nonstationary Demand Process," Manufacturing & Service Operations Management, INFORMS, vol. 1(1), pages 50-61.
Tetsuo Iida & Paul H. Zipkin, 2006. "Approximate Solutions of a Dynamic Forecast-Inventory Model," Manufacturing & Service Operations Management, INFORMS, vol. 8(4), pages 407-425, October.
Hill, Roger M. & Johansen, Soren Glud, 2006. "Optimal and near-optimal policies for lost sales inventory models with at most one replenishment order outstanding," European Journal of Operational Research, Elsevier, vol. 169(1), pages 111-132, February.
Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2015. "Piecewise linear approximations for the static–dynamic uncertainty strategy in stochastic lot-sizing," Omega, Elsevier, vol. 50(C), pages 126-140.


Citations

Citations are extracted by the CitEc Project.

Cited by:

Cui, Geng & Imura, Naoto & Nishinari, Katsuhiro & Ezaki, Takahiro, 2025. "On order smoothing interpolating the order-up-to and constant order policies," Omega, Elsevier, vol. 136(C).
Sarkar, Puja & Khanapuri, Vivekanand B. & Tiwari, Manoj Kumar, 2025. "Integration of prediction and optimization for smart stock portfolio selection," European Journal of Operational Research, Elsevier, vol. 321(1), pages 243-256.
Temizöz, Tarkan & Imdahl, Christina & Dijkman, Remco & Lamghari-Idrissi, Douniel & van Jaarsveld, Willem, 2025. "Deep Controlled Learning for Inventory Control," European Journal of Operational Research, Elsevier, vol. 324(1), pages 104-117.
van Hezewijk, Lotte & Dellaert, Nico P. & van Jaarsveld, Willem L., 2025. "Scalable deep reinforcement learning in the non-stationary capacitated lot sizing problem," International Journal of Production Economics, Elsevier, vol. 284(C).
Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.
Bo Zhang & Wen Jun Tan & Wentong Cai & Allan N. Zhang, 2024. "Leveraging Multi-Agent Reinforcement Learning for Digital Transformation in Supply Chain Inventory Optimization," Sustainability, MDPI, vol. 16(22), pages 1-17, November.
Akkerman, Fabian & Prak, Dennis & Mes, Martijn, 2025. "Dynamic reordering and inspection for the multi-item Inventory Record Inaccuracy problem," European Journal of Operational Research, Elsevier, vol. 321(2), pages 428-444.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2023. "A mathematical programming-based solution method for the nonstationary inventory problem under correlated demand," European Journal of Operational Research, Elsevier, vol. 304(2), pages 515-524.
Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
Chen, Zhen & Rossi, Roberto, 2021. "A dynamic ordering policy for a stochastic inventory problem with cash constraints," Omega, Elsevier, vol. 102(C).
Ma, Xiyuan & Rossi, Roberto & Archibald, Thomas Welsh, 2022. "Approximations for non-stationary stochastic lot-sizing under (s,Q)-type policy," European Journal of Operational Research, Elsevier, vol. 298(2), pages 573-584.
Ren, Ke & Bidkhori, Hoda & Shen, Zuo-Jun Max, 2024. "Data-driven inventory policy: Learning from sequentially observed non-stationary data," Omega, Elsevier, vol. 123(C).
Lotte van Hezewijk & Nico P. Dellaert & Willem L. van Jaarsveld, 2025. "On non-negative auto-correlated integer demand processes," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 101(2), pages 135-161, April.
Visentin, Andrea & Prestwich, Steven & Rossi, Roberto & Tarim, S. Armagan, 2021. "Computing optimal (R,s,S) policy parameters by a hybrid of branch-and-bound and stochastic dynamic programming," European Journal of Operational Research, Elsevier, vol. 294(1), pages 91-99.
Amar Sapra & Van-Anh Truong & Rachel Q. Zhang, 2010. "How Much Demand Should Be Fulfilled?," Operations Research, INFORMS, vol. 58(3), pages 719-733, June.
Stephen C. Graves & Sean P. Willems, 2008. "Strategic Inventory Placement in Supply Chains: Nonstationary Demand," Manufacturing & Service Operations Management, INFORMS, vol. 10(2), pages 278-287, March.
Gah-Yi Ban, 2020. "Confidence Intervals for Data-Driven Inventory Policies with Demand Censoring," Operations Research, INFORMS, vol. 68(2), pages 309-326, March.
Emilio Carrizosa & Alba V. Olivares-Nadal & Pepa Ramírez-Cobo, 2020. "Embedding the production policy in location-allocation decisions," 4OR, Springer, vol. 18(3), pages 357-380, September.
John J. Neale & Sean P. Willems, 2009. "Managing Inventory in Supply Chains with Nonstationary Demand," Interfaces, INFORMS, vol. 39(5), pages 388-399, October.
Z Hua & J Yang & F Huang & X Xu, 2009. "A static-dynamic strategy for spare part inventory systems with nonstationary stochastic demand," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(9), pages 1254-1263, September.
Kilic, Onur A. & Tarim, S. Armagan, 2024. "A simple heuristic for computing non-stationary inventory policies based on function approximation," European Journal of Operational Research, Elsevier, vol. 316(3), pages 899-905.
Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).
Li Chen & Hau L. Lee, 2009. "Information Sharing and Order Variability Control Under a Generalized Demand Model," Management Science, INFORMS, vol. 55(5), pages 781-797, May.
Hosoda, Takamichi & Disney, Stephen M., 2009. "Impact of market demand mis-specification on a two-level supply chain," International Journal of Production Economics, Elsevier, vol. 121(2), pages 739-751, October.
Amiri-Aref, Mehdi & Klibi, Walid & Babai, M. Zied, 2018. "The multi-sourcing location inventory problem with stochastic demand," European Journal of Operational Research, Elsevier, vol. 266(1), pages 72-87.
Zhaotong Lian & Liming Liu & Stuart X. Zhu, 2010. "Rolling‐horizon replenishment: Policies and performance analysis," Naval Research Logistics (NRL), John Wiley & Sons, vol. 57(6), pages 489-502, September.
Matthew J. Sobel & Volodymyr Babich, 2012. "Optimality of Myopic Policies for Dynamic Lot-Sizing Problems in Serial Production Lines with Random Yields and Autoregressive Demand," Operations Research, INFORMS, vol. 60(6), pages 1520-1536, December.

