Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

Authors

  • De Moor, Bram J.
  • Gijsbrechts, Joren
  • Boute, Robert N.

Abstract

Deep reinforcement learning (DRL) has proven to be an effective, general-purpose technology for developing ‘good’ replenishment policies in inventory management. We show how transfer learning from existing, well-performing heuristics may stabilize the training process and improve the performance of DRL in inventory control. While the idea is general, we specifically apply potential-based reward shaping to a deep Q-network algorithm to manage the inventory of perishable goods, a problem that, cursed by dimensionality, has proven notoriously complex. Our approach may not only improve inventory cost performance and reduce computational effort; the increased training stability may also help build trust in the policies obtained by black-box DRL algorithms.
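
Potential-based reward shaping, as named in the abstract, adds a term F(s, s') = γΦ(s') - Φ(s) to every reward the agent receives. The Python sketch below is a generic illustration under stated assumptions, not the authors' implementation: the state is assumed to be a list of on-hand units per remaining shelf life plus pipeline orders, and the potential Φ, which here rewards an inventory position close to a base-stock level taken from some heuristic, is hypothetical, as are the names `ShapingConfig`, `potential` and `shaped_reward`.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class ShapingConfig:
    gamma: float = 0.99             # discount factor used by the DQN agent
    base_stock_level: float = 10.0  # hypothetical target taken from a heuristic policy
    scale: float = 1.0              # hypothetical weight of the shaping signal


def potential(state: Sequence[float], cfg: ShapingConfig) -> float:
    """Hypothetical potential Phi(s): the closer the inventory position
    (on-hand stock per remaining shelf life plus pipeline orders)
    is to the heuristic base-stock level, the higher the potential."""
    inventory_position = sum(state)
    return -cfg.scale * abs(inventory_position - cfg.base_stock_level)


def shaped_reward(
    reward: float,
    state: Sequence[float],
    next_state: Sequence[float],
    cfg: ShapingConfig,
    done: bool = False,
) -> float:
    """Return r + gamma * Phi(s') - Phi(s); Phi of a terminal state is taken as 0."""
    phi_next = 0.0 if done else potential(next_state, cfg)
    return reward + cfg.gamma * phi_next - potential(state, cfg)


if __name__ == "__main__":
    cfg = ShapingConfig()
    # Period cost of 5 (reward -5); the order moves the position from 6 to 10 units.
    print(shaped_reward(-5.0, [3, 2, 1], [5, 3, 2], cfg))  # -1.0
```

Because the shaping terms telescope, the discounted return of any trajectory changes only by γ^T Φ(s_T) - Φ(s_0), which is why shaping of this form leaves the optimal policy unchanged while still steering early exploration toward the heuristic's behaviour.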

Suggested Citation

  • De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
  • Handle: RePEc:eee:ejores:v:301:y:2022:i:2:p:535-545
    DOI: 10.1016/j.ejor.2021.10.045

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221721008948
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. William P. Pierskalla & Chris D. Roach, 1972. "Optimal Issuing Policies for Perishable Inventory," Management Science, INFORMS, vol. 18(11), pages 603-614, July.
    2. Morris A. Cohen, 1976. "Analysis of Single Critical Number Ordering Policies for Perishable Inventories," Operations Research, INFORMS, vol. 24(4), pages 726-741, August.
    3. Duan, Qinglin & Liao, T. Warren, 2013. "A new age-based replenishment policy for supply chain inventory optimization of highly perishable products," International Journal of Production Economics, Elsevier, vol. 145(2), pages 658-671.
    4. Haijema, René, 2013. "A new class of stock-level dependent ordering policies for perishables with a short maximum shelf life," International Journal of Production Economics, Elsevier, vol. 143(2), pages 434-439.
    5. Williams, Craig L. & Patuwo, B. Eddy, 1999. "A perishable inventory model with positive order lead times," European Journal of Operational Research, Elsevier, vol. 116(2), pages 352-373, July.
    6. Haijema, René & Minner, Stefan, 2019. "Improved ordering of perishables: The value of stock-age information," International Journal of Production Economics, Elsevier, vol. 209(C), pages 316-324.
    7. Brant E. Fries, 1975. "Optimal Ordering Policy for a Perishable Commodity with Fixed Lifetime," Operations Research, INFORMS, vol. 23(1), pages 46-61, February.
    8. Richard Bellman, 1957. "On a Dynamic Programming Approach to the Caterer Problem--I," Management Science, INFORMS, vol. 3(3), pages 270-278, April.
    9. Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
    10. Sam Devlin & Daniel Kudenko & Marek Grześ, 2011. "An Empirical Study Of Potential-Based Reward Shaping And Advice In Complex, Multi-Agent Systems," Advances in Complex Systems (ACS), World Scientific Publishing Co. Pte. Ltd., vol. 14(02), pages 251-278.
    11. Steven Nahmias, 2011. "Perishable Inventory Systems," International Series in Operations Research and Management Science, Springer, edition 1, number 978-1-4419-7999-5, December.
    12. Dan Chazan & Shmuel Gal, 1977. "A Markovian Model for a Perishable Product Inventory," Management Science, INFORMS, vol. 23(5), pages 512-521, January.
    13. Steven Nahmias, 1975. "Optimal Ordering Policies for Perishable Inventory—II," Operations Research, INFORMS, vol. 23(4), pages 735-749, August.
    14. Morris A. Cohen & Dov Pekelman, 1978. "LIFO Inventory Systems," Management Science, INFORMS, vol. 24(11), pages 1150-1162, July.
    15. Julian Schrittwieser & Ioannis Antonoglou & Thomas Hubert & Karen Simonyan & Laurent Sifre & Simon Schmitt & Arthur Guez & Edward Lockhart & Demis Hassabis & Thore Graepel & Timothy Lillicrap & David , 2020. "Mastering Atari, Go, chess and shogi by planning with a learned model," Nature, Nature, vol. 588(7839), pages 604-609, December.
    16. David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
    17. David Silver & Julian Schrittwieser & Karen Simonyan & Ioannis Antonoglou & Aja Huang & Arthur Guez & Thomas Hubert & Lucas Baker & Matthew Lai & Adrian Bolton & Yutian Chen & Timothy Lillicrap & Fan , 2017. "Mastering the game of Go without human knowledge," Nature, Nature, vol. 550(7676), pages 354-359, October.

    Citations

    Cited by:

    1. Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Slowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
    2. Park, Hyungjun & Choi, Dong Gu & Min, Daiki, 2023. "Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy structure," International Journal of Production Economics, Elsevier, vol. 266(C).
    3. Yen, Benjamin P.-C. & Luo, Yu, 2023. "Navigational guidance – A deep learning approach," European Journal of Operational Research, Elsevier, vol. 310(3), pages 1179-1191.
    4. Erkip, Nesim Kohen, 2023. "Can accessing much data reshape the theory? Inventory theory under the challenge of data-driven systems," European Journal of Operational Research, Elsevier, vol. 308(3), pages 949-959.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Liming Liu & Zhaotong Lian, 1999. "(s, S) Continuous Review Models for Products with Fixed Lifetimes," Operations Research, INFORMS, vol. 47(1), pages 150-158, February.
    2. Xiuli Chao & Xiting Gong & Cong Shi & Chaolin Yang & Huanan Zhang & Sean X. Zhou, 2018. "Approximation Algorithms for Capacitated Perishable Inventory Systems with Positive Lead Times," Management Science, INFORMS, vol. 64(11), pages 5038-5061, November.
    3. Haijema, René & Minner, Stefan, 2019. "Improved ordering of perishables: The value of stock-age information," International Journal of Production Economics, Elsevier, vol. 209(C), pages 316-324.
    4. Duan, Qinglin & Liao, T. Warren, 2014. "Optimization of blood supply chain with shortened shelf lives and ABO compatibility," International Journal of Production Economics, Elsevier, vol. 153(C), pages 113-129.
    5. Gorria, Carlos & Lezaun, Mikel & López, F. Javier, 2022. "Performance measures of nonstationary inventory models for perishable products under the EWA policy," European Journal of Operational Research, Elsevier, vol. 303(3), pages 1137-1150.
    6. Jake Clarkson & Michael A. Voelkel & Anna‐Lena Sachs & Ulrich W. Thonemann, 2023. "The periodic review model with independent age‐dependent lifetimes," Production and Operations Management, Production and Operations Management Society, vol. 32(3), pages 813-828, March.
    7. Minner, Stefan & Transchel, Sandra, 2017. "Order variability in perishable product supply chains," European Journal of Operational Research, Elsevier, vol. 260(1), pages 93-107.
    8. Kouki, Chaaben & Jouini, Oualid, 2015. "On the effect of lifetime variability on the performance of inventory systems," International Journal of Production Economics, Elsevier, vol. 167(C), pages 23-34.
    9. Qing Li & Peiwen Yu & Xiaoli Wu, 2016. "Managing Perishable Inventories in Retailing: Replenishment, Clearance Sales, and Segregation," Operations Research, INFORMS, vol. 64(6), pages 1270-1284, December.
    10. Hansen, Ole & Transchel, Sandra & Friedrich, Hanno, 2023. "Replenishment strategies for lost sales inventory systems of perishables under demand and lead time uncertainty," European Journal of Operational Research, Elsevier, vol. 308(2), pages 661-675.
    11. Janssen, Larissa & Diabat, Ali & Sauer, Jürgen & Herrmann, Frank, 2018. "A stochastic micro-periodic age-based inventory replenishment policy for perishable goods," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 118(C), pages 445-465.
    12. Jinzhi Bu & Xiting Gong & Xiuli Chao, 2023. "Asymptotic Optimality of Base-Stock Policies for Perishable Inventory Systems," Management Science, INFORMS, vol. 69(2), pages 846-864, February.
    13. Hailun Zhang & Jiheng Zhang & Rachel Q. Zhang, 2020. "Simple Policies with Provable Bounds for Managing Perishable Inventory," Production and Operations Management, Production and Operations Management Society, vol. 29(11), pages 2637-2650, November.
    14. Li, Wenqing & Ni, Shaoquan, 2022. "Train timetabling with the general learning environment and multi-agent deep reinforcement learning," Transportation Research Part B: Methodological, Elsevier, vol. 157(C), pages 230-251.
    15. Lowalekar, Harshal & Ravi, R. Raghavendra, 2017. "Revolutionizing blood bank inventory management using the TOC thinking process: An Indian case study," International Journal of Production Economics, Elsevier, vol. 186(C), pages 89-122.
    16. Ketzenberg, Michael & Gaukler, Gary & Salin, Victoria, 2018. "Expiration dates and order quantities for perishables," European Journal of Operational Research, Elsevier, vol. 266(2), pages 569-584.
    17. Haijema, René & Minner, Stefan, 2016. "Stock-level dependent ordering of perishables: A comparison of hybrid base-stock and constant order policies," International Journal of Production Economics, Elsevier, vol. 181(PA), pages 215-225.
    18. Osorio, Andres F. & Brailsford, Sally C. & Smith, Honora K., 2018. "Whole blood or apheresis donations? A multi-objective stochastic optimization approach," European Journal of Operational Research, Elsevier, vol. 266(1), pages 193-204.
    19. Shouchang Chen & Yanzhi Li & Yi Yang & Weihua Zhou, 2021. "Managing Perishable Inventory Systems with Age‐differentiated Demand," Production and Operations Management, Production and Operations Management Society, vol. 30(10), pages 3784-3799, October.
    20. Christopher R. Madan, 2020. "Considerations for Comparing Video Game AI Agents with Humans," Challenges, MDPI, vol. 11(2), pages 1-12, August.
