
Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand

Authors

  • Dehaybe, Henri
  • Catanzaro, Daniele
  • Chevalier, Philippe

Abstract

We consider a single-item lot-sizing problem with fixed costs, lead time, and both backorders and lost sales, and we show that, after appropriate training in randomly generated environments, Deep Reinforcement Learning (DRL) agents can interpolate, in real time, near-optimal dynamic policies for rolling-horizon instances when given a previously unseen demand forecast, without the need to periodically re-solve the problem. Extensive computational experiments show that the policies provided by these agents compete with, and in some circumstances outperform by a gap of several percentage points, those provided by heuristics based on dynamic programming. These results confirm the importance of DRL in the context of inventory control problems and support its use in solving practical instances featuring realistic assumptions.
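To make the problem setting concrete, here is a minimal Python sketch of one period of a single-item lot-sizing environment with a fixed ordering cost, a deterministic lead time, and a mix of backorders and lost sales, i.e. the kind of simulator a DRL agent interacts with during training. It is not the authors' model: the class name `LotSizingEnv`, the cost parameters (K, h, b, p), the backorder fraction beta used to split unmet demand between backorders and lost sales, and the event order (receive the due shipment, then serve demand) are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LotSizingEnv:
    """Illustrative single-item lot-sizing environment (hypothetical, not the paper's).

    Net inventory may go negative; a negative value is the current backlog.
    """
    lead_time: int = 2               # periods between placing and receiving an order
    fixed_cost: float = 50.0         # K: charged in any period with a positive order
    holding_cost: float = 1.0        # h: per unit of on-hand stock at period end
    backorder_cost: float = 5.0      # b: per backlogged unit at period end
    lost_sale_cost: float = 10.0     # p: per unit of demand that is lost
    backorder_fraction: float = 0.5  # beta (assumed): share of unmet demand that is backlogged
    inventory: float = 0.0           # initial net inventory
    pipeline: deque = field(default_factory=deque, init=False)

    def __post_init__(self) -> None:
        # Orders in transit, oldest first; nothing is outstanding at the start.
        self.pipeline = deque([0.0] * self.lead_time)

    def step(self, order_qty: float, demand: float) -> float:
        """Place an order, receive the due shipment, serve demand, return the period cost."""
        self.pipeline.append(order_qty)
        self.inventory += self.pipeline.popleft()  # order placed lead_time periods ago arrives
        served = min(demand, max(self.inventory, 0.0))
        unmet = demand - served
        backlogged = self.backorder_fraction * unmet
        lost = unmet - backlogged
        self.inventory -= served + backlogged
        cost = self.fixed_cost if order_qty > 0 else 0.0
        cost += self.holding_cost * max(self.inventory, 0.0)
        cost += self.backorder_cost * max(-self.inventory, 0.0)
        cost += self.lost_sale_cost * lost
        return cost


env = LotSizingEnv(lead_time=1, inventory=5.0)
print(env.step(order_qty=10.0, demand=3.0))  # 52.0: fixed cost 50 + holding 2
print(env.step(order_qty=0.0, demand=4.0))   # 8.0: order arrives, 8 units held
print(env.step(order_qty=0.0, demand=12.0))  # 30.0: 2 backlogged (10) + 2 lost (20)
```

In this sketch a learned policy would choose `order_qty` each period from an observation of the net inventory, the pipeline, and the demand forecast; a rolling-horizon run simply keeps calling `step` as new forecasts arrive, with no re-solving step in between.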

Suggested Citation

  • Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2024. "Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand," European Journal of Operational Research, Elsevier, vol. 314(2), pages 433-445.
  • Handle: RePEc:eee:ejores:v:314:y:2024:i:2:p:433-445
    DOI: 10.1016/j.ejor.2023.10.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221723007646
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2023.10.007?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
    2. Evan L. Porteus, 1971. "On the Optimality of Generalized (s, S) Policies," Management Science, INFORMS, vol. 17(7), pages 411-426, March.
    3. Donald L. Iglehart, 1963. "Optimality of (s, S) Policies in the Infinite Horizon Dynamic Inventory Problem," Management Science, INFORMS, vol. 9(2), pages 259-267, January.
    4. Andrew J. Clark & Herbert Scarf, 2004. "Optimal Policies for a Multi-Echelon Inventory Problem," Management Science, INFORMS, vol. 50(12 Supplement), pages 1782-1790, December.
    5. James H. Bookbinder & Jin-Yan Tan, 1988. "Strategies for the Probabilistic Lot-Sizing Problem with Service-Level Constraints," Management Science, INFORMS, vol. 34(9), pages 1096-1108, September.
    6. Srinivas Bollapragada & Thomas E. Morton, 1999. "A Simple Heuristic for Computing Nonstationary (s, S) Policies," Operations Research, INFORMS, vol. 47(4), pages 576-584, August.
    7. Amirhosein Norouzi & Reha Uzsoy, 2014. "Modeling the evolution of dependency between demands, with application to inventory planning," IISE Transactions, Taylor & Francis Journals, vol. 46(1), pages 55-66.
    8. Lingxiu Dong & Hau L. Lee, 2003. "Optimal Policies and Approximations for a Serial Multiechelon Inventory System with Time-Correlated Demand," Operations Research, INFORMS, vol. 51(6), pages 969-980, December.
    9. Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).
    10. Steven Nahmias, 1979. "Simple Approximations for a Variety of Dynamic Leadtime Lost-Sales Inventory Models," Operations Research, INFORMS, vol. 27(5), pages 904-924, October.
    11. Stephen C. Graves, 1999. "A Single-Item Inventory Model for a Nonstationary Demand Process," Manufacturing & Service Operations Management, INFORMS, vol. 1(1), pages 50-61.
    12. Stephen C. Graves, 1999. "Addendum to 'A Single-Item Inventory Model for a Nonstationary Demand Process'," Manufacturing & Service Operations Management, INFORMS, vol. 1(2), pages 174-174.
    13. Tetsuo Iida & Paul H. Zipkin, 2006. "Approximate Solutions of a Dynamic Forecast-Inventory Model," Manufacturing & Service Operations Management, INFORMS, vol. 8(4), pages 407-425, October.
    14. Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
    15. Hill, Roger M. & Johansen, Soren Glud, 2006. "Optimal and near-optimal policies for lost sales inventory models with at most one replenishment order outstanding," European Journal of Operational Research, Elsevier, vol. 169(1), pages 111-132, February.
    16. Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2015. "Piecewise linear approximations for the static–dynamic uncertainty strategy in stochastic lot-sizing," Omega, Elsevier, vol. 50(C), pages 126-140.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.
    2. Bo Zhang & Wen Jun Tan & Wentong Cai & Allan N. Zhang, 2024. "Leveraging Multi-Agent Reinforcement Learning for Digital Transformation in Supply Chain Inventory Optimization," Sustainability, MDPI, vol. 16(22), pages 1-17, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2023. "A mathematical programming-based solution method for the nonstationary inventory problem under correlated demand," European Journal of Operational Research, Elsevier, vol. 304(2), pages 515-524.
    2. Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
    3. Chen, Zhen & Rossi, Roberto, 2021. "A dynamic ordering policy for a stochastic inventory problem with cash constraints," Omega, Elsevier, vol. 102(C).
    4. Ma, Xiyuan & Rossi, Roberto & Archibald, Thomas Welsh, 2022. "Approximations for non-stationary stochastic lot-sizing under (s,Q)-type policy," European Journal of Operational Research, Elsevier, vol. 298(2), pages 573-584.
    5. Ren, Ke & Bidkhori, Hoda & Shen, Zuo-Jun Max, 2024. "Data-driven inventory policy: Learning from sequentially observed non-stationary data," Omega, Elsevier, vol. 123(C).
    6. Visentin, Andrea & Prestwich, Steven & Rossi, Roberto & Tarim, S. Armagan, 2021. "Computing optimal (R,s,S) policy parameters by a hybrid of branch-and-bound and stochastic dynamic programming," European Journal of Operational Research, Elsevier, vol. 294(1), pages 91-99.
    7. Wang, Zhaodong & Wang, Xin & Ouyang, Yanfeng, 2015. "Bounded growth of the bullwhip effect under a class of nonlinear ordering policies," European Journal of Operational Research, Elsevier, vol. 247(1), pages 72-82.
    8. Alexandre Forel & Martin Grunow, 2023. "Dynamic stochastic lot sizing with forecast evolution in rolling‐horizon planning," Production and Operations Management, Production and Operations Management Society, vol. 32(2), pages 449-468, February.
    9. Van-Anh Truong, 2014. "Approximation Algorithm for the Stochastic Multiperiod Inventory Problem via a Look-Ahead Optimization Approach," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1039-1056, November.
    10. Gérard P. Cachon & Marshall Fisher, 2000. "Supply Chain Inventory Management and the Value of Shared Information," Management Science, INFORMS, vol. 46(8), pages 1032-1048, August.
    11. Amar Sapra & Van-Anh Truong & Rachel Q. Zhang, 2010. "How Much Demand Should Be Fulfilled?," Operations Research, INFORMS, vol. 58(3), pages 719-733, June.
    12. Tarim, S. Armagan & Smith, Barbara M., 2008. "Constraint programming for computing non-stationary (R, S) inventory policies," European Journal of Operational Research, Elsevier, vol. 189(3), pages 1004-1021, September.
    13. Stephen C. Graves & Sean P. Willems, 2008. "Strategic Inventory Placement in Supply Chains: Nonstationary Demand," Manufacturing & Service Operations Management, INFORMS, vol. 10(2), pages 278-287, March.
    14. Gah-Yi Ban, 2020. "Confidence Intervals for Data-Driven Inventory Policies with Demand Censoring," Operations Research, INFORMS, vol. 68(2), pages 309-326, March.
    15. Emilio Carrizosa & Alba V. Olivares-Nadal & Pepa Ramírez-Cobo, 2020. "Embedding the production policy in location-allocation decisions," 4OR, Springer, vol. 18(3), pages 357-380, September.
    16. Rachel Croson & Karen Donohue, 2006. "Behavioral Causes of the Bullwhip Effect and the Observed Value of Inventory Information," Management Science, INFORMS, vol. 52(3), pages 323-336, March.
    17. John J. Neale & Sean P. Willems, 2009. "Managing Inventory in Supply Chains with Nonstationary Demand," Interfaces, INFORMS, vol. 39(5), pages 388-399, October.
    18. Z Hua & J Yang & F Huang & X Xu, 2009. "A static-dynamic strategy for spare part inventory systems with nonstationary stochastic demand," Journal of the Operational Research Society, Palgrave Macmillan; The OR Society, vol. 60(9), pages 1254-1263, September.
    19. Kilic, Onur A. & Tarim, S. Armagan, 2024. "A simple heuristic for computing non-stationary inventory policies based on function approximation," European Journal of Operational Research, Elsevier, vol. 316(3), pages 899-905.
    20. Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.