
Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand

Author

Listed:
  • Dehaybe, Henri
  • Catanzaro, Daniele
  • Chevalier, Philippe

Abstract

We consider a single-item lot-sizing problem with fixed costs, lead time, and both backorders and lost sales. We show that, after appropriate training in randomly generated environments, Deep Reinforcement Learning (DRL) agents can interpolate near-optimal dynamic policies in real time on rolling-horizon instances, given a previously unseen demand forecast and without the need to periodically re-solve the problem. Extensive computational experiments show that the policies provided by these agents compete with, and in some circumstances even outperform by several percentage points of gap, those provided by heuristics based on dynamic programming. These results confirm the importance of DRL in the context of inventory control problems and support its use in solving practical instances featuring realistic assumptions.
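
To make the problem setting concrete, the following Python sketch evaluates the cost of a static (s, S) ordering rule on a single-item system with a fixed ordering cost, a lead time, and a mix of backorders and lost sales. It is not the authors' model or their DRL agent: the function name `simulate_s_S`, every cost parameter and default value, the Gaussian demand noise around the forecast, and the fixed fraction of unmet demand that is backordered are assumptions made purely for illustration.

```python
import random
from collections import deque


def simulate_s_S(forecast, s, S, lead_time=1, fixed_cost=50.0, unit_cost=0.0,
                 holding_cost=1.0, backorder_cost=5.0, lost_sale_cost=10.0,
                 backorder_fraction=0.5, noise_cv=0.0, seed=0):
    """Total cost of a fixed (s, S) policy along one demand-forecast path.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    net_inventory = 0.0                  # negative values represent backorders
    pipeline = deque([0.0] * lead_time)  # orders placed but not yet received
    total_cost = 0.0

    for mean_demand in forecast:
        # Order up to S whenever the inventory position is at or below s.
        position = net_inventory + sum(pipeline)
        order = S - position if position <= s else 0.0
        if order > 0:
            total_cost += fixed_cost + unit_cost * order
        pipeline.append(order)
        # Receive the order that was placed lead_time periods ago.
        net_inventory += pipeline.popleft()

        # Assumed demand model: forecast mean plus Gaussian noise, floored at 0.
        demand = max(0.0, rng.gauss(mean_demand, noise_cv * mean_demand))

        # Serve from stock; split any shortage into backorders and lost sales.
        served = min(max(net_inventory, 0.0), demand)
        shortage = demand - served
        backordered = backorder_fraction * shortage
        lost = shortage - backordered
        net_inventory -= served + backordered

        total_cost += holding_cost * max(net_inventory, 0.0)
        total_cost += backorder_cost * max(-net_inventory, 0.0)
        total_cost += lost_sale_cost * lost

    return total_cost


# Hypothetical four-period forecast with noise switched off (noise_cv=0.0).
print(simulate_s_S([10, 10, 10, 10], s=5, S=30))
```

A simulator of this kind is one plausible form for the randomly generated environments in which a learned policy could be trained and compared against heuristic baselines; a non-stationary forecast is simply a `forecast` list whose entries vary from period to period.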

Suggested Citation

  • Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2024. "Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand," European Journal of Operational Research, Elsevier, vol. 314(2), pages 433-445.
  • Handle: RePEc:eee:ejores:v:314:y:2024:i:2:p:433-445
    DOI: 10.1016/j.ejor.2023.10.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221723007646
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2023. "A mathematical programming-based solution method for the nonstationary inventory problem under correlated demand," European Journal of Operational Research, Elsevier, vol. 304(2), pages 515-524.
    2. Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
    3. Chen, Zhen & Rossi, Roberto, 2021. "A dynamic ordering policy for a stochastic inventory problem with cash constraints," Omega, Elsevier, vol. 102(C).
    4. Ma, Xiyuan & Rossi, Roberto & Archibald, Thomas Welsh, 2022. "Approximations for non-stationary stochastic lot-sizing under (s,Q)-type policy," European Journal of Operational Research, Elsevier, vol. 298(2), pages 573-584.
    5. Ren, Ke & Bidkhori, Hoda & Shen, Zuo-Jun Max, 2024. "Data-driven inventory policy: Learning from sequentially observed non-stationary data," Omega, Elsevier, vol. 123(C).
    6. Visentin, Andrea & Prestwich, Steven & Rossi, Roberto & Tarim, S. Armagan, 2021. "Computing optimal (R,s,S) policy parameters by a hybrid of branch-and-bound and stochastic dynamic programming," European Journal of Operational Research, Elsevier, vol. 294(1), pages 91-99.
    7. Amar Sapra & Van-Anh Truong & Rachel Q. Zhang, 2010. "How Much Demand Should Be Fulfilled?," Operations Research, INFORMS, vol. 58(3), pages 719-733, June.
    8. Stephen C. Graves & Sean P. Willems, 2008. "Strategic Inventory Placement in Supply Chains: Nonstationary Demand," Manufacturing & Service Operations Management, INFORMS, vol. 10(2), pages 278-287, March.
    9. Gah-Yi Ban, 2020. "Confidence Intervals for Data-Driven Inventory Policies with Demand Censoring," Operations Research, INFORMS, vol. 68(2), pages 309-326, March.
    10. Emilio Carrizosa & Alba V. Olivares-Nadal & Pepa Ramírez-Cobo, 2020. "Embedding the production policy in location-allocation decisions," 4OR, Springer, vol. 18(3), pages 357-380, September.
    11. John J. Neale & Sean P. Willems, 2009. "Managing Inventory in Supply Chains with Nonstationary Demand," Interfaces, INFORMS, vol. 39(5), pages 388-399, October.
    12. Z Hua & J Yang & F Huang & X Xu, 2009. "A static-dynamic strategy for spare part inventory systems with nonstationary stochastic demand," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(9), pages 1254-1263, September.
    13. Kilic, Onur A. & Tarim, S. Armagan, 2024. "A simple heuristic for computing non-stationary inventory policies based on function approximation," European Journal of Operational Research, Elsevier, vol. 316(3), pages 899-905.
    14. Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).
    15. Li Chen & Hau L. Lee, 2009. "Information Sharing and Order Variability Control Under a Generalized Demand Model," Management Science, INFORMS, vol. 55(5), pages 781-797, May.
    16. Hosoda, Takamichi & Disney, Stephen M., 2009. "Impact of market demand mis-specification on a two-level supply chain," International Journal of Production Economics, Elsevier, vol. 121(2), pages 739-751, October.
    17. Amiri-Aref, Mehdi & Klibi, Walid & Babai, M. Zied, 2018. "The multi-sourcing location inventory problem with stochastic demand," European Journal of Operational Research, Elsevier, vol. 266(1), pages 72-87.
    18. Zhaotong Lian & Liming Liu & Stuart X. Zhu, 2010. "Rolling‐horizon replenishment: Policies and performance analysis," Naval Research Logistics (NRL), John Wiley & Sons, vol. 57(6), pages 489-502, September.
    19. Matthew J. Sobel & Volodymyr Babich, 2012. "Optimality of Myopic Policies for Dynamic Lot-Sizing Problems in Serial Production Lines with Random Yields and Autoregressive Demand," Operations Research, INFORMS, vol. 60(6), pages 1520-1536, December.
    20. Iida, Tetsuo, 2015. "Benefits of leadtime information and of its combination with demand forecast information," International Journal of Production Economics, Elsevier, vol. 163(C), pages 146-156.
