
Deep Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems

Author

Listed:
  • Lativa Sid Ahmed Abdellahi

    (Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Zeinebou Zoubeir

    (Department of Mathematics and Industrial Engineering, Institute of Industrial Engineering, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Yahya Mohamed

    (Analysis and Modeling for Environment and Health (UMR-AMES), Department of Quantitative Techniques, Faculty of Economics and Management, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Ahmedou Haouba

    (Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Sidi Hmetty

    (Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania)

Abstract

This study presents a reinforcement learning–based approach to optimize replenishment policies under uncertainty, with the objective of minimizing total costs, including inventory holding, shortage, and ordering costs. The focus is on single-level assembly systems, where both component delivery lead times and finished-product demand are subject to randomness. The problem is formulated as a Markov decision process (MDP), in which an agent determines optimal order quantities for each component by accounting for stochastic lead times and demand variability. The Deep Q-Network (DQN) algorithm is adapted and employed to learn optimal replenishment policies over a fixed planning horizon. To enhance learning performance, we develop a tailored simulation environment that captures multi-component interactions, random lead times, and variable demand, along with a modular and realistic cost structure. The environment enables dynamic state transitions, lead-time sampling, and flexible order-reception modeling, providing a high-fidelity training ground for the agent. To further improve convergence and policy quality, we incorporate local search mechanisms and multiple action-space discretizations per component. Simulation results show that the proposed method converges to stable ordering policies after approximately 100 episodes. The agent achieves an average service level of 96.93%, and stockout events are almost entirely eliminated relative to early training phases. The system maintains component inventories within operationally feasible ranges, and the cost components (holding, shortage, and ordering) are consistently minimized across 500 training episodes. These findings highlight the potential of deep reinforcement learning as a data-driven, adaptive approach to inventory management in complex and uncertain supply chains.
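The abstract describes the simulation environment only at a high level. Purely for orientation, the sketch below shows one way such a single-level assembly environment could be structured: stochastic lead times, random finished-product demand, and a holding/shortage/ordering cost structure. All names and parameters here (the AssemblyReplenishmentEnv class, the Poisson lead-time and demand distributions, the cost coefficients, the order-level discretization) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class AssemblyReplenishmentEnv:
    """Minimal sketch of a single-level assembly replenishment environment.

    All parameters (cost coefficients, Poisson lead times and demand,
    order-level discretization) are illustrative assumptions, not the
    paper's calibrated values.
    """

    def __init__(self, n_components=3, horizon=30, seed=0,
                 holding_cost=1.0, shortage_cost=10.0, ordering_cost=5.0,
                 order_levels=(0, 10, 20, 40)):
        self.rng = np.random.default_rng(seed)
        self.n = n_components
        self.horizon = horizon
        self.h, self.p, self.k = holding_cost, shortage_cost, ordering_cost
        self.order_levels = order_levels  # discretized order quantities per component
        self.reset()

    def reset(self):
        self.t = 0
        self.inventory = np.full(self.n, 20.0)  # on-hand stock per component
        self.pipeline = []  # outstanding orders: (arrival_period, component, qty)
        return self._state()

    def _state(self):
        # State: on-hand inventory, in-transit quantity per component, and period.
        in_transit = np.zeros(self.n)
        for _, c, q in self.pipeline:
            in_transit[c] += q
        return np.concatenate([self.inventory, in_transit, [self.t]])

    def step(self, action):
        """action[c] indexes order_levels for component c; returns (state, reward, done)."""
        cost = 0.0
        # Place orders; each order draws a random lead time of at least one period.
        for c, a in enumerate(action):
            qty = self.order_levels[a]
            if qty > 0:
                lead = 1 + self.rng.poisson(2)
                self.pipeline.append((self.t + lead, c, qty))
                cost += self.k  # fixed ordering cost per order placed
        # Receive deliveries that are due by the current period.
        due = [o for o in self.pipeline if o[0] <= self.t]
        self.pipeline = [o for o in self.pipeline if o[0] > self.t]
        for _, c, q in due:
            self.inventory[c] += q
        # Random demand; assembling one product consumes one unit of each component.
        demand = int(self.rng.poisson(8))
        assembled = min(demand, int(self.inventory.min()))
        self.inventory -= assembled
        shortage = demand - assembled
        cost += self.p * shortage + self.h * self.inventory.sum()
        self.t += 1
        return self._state(), -cost, self.t >= self.horizon
```

A DQN agent would consume this interface by mapping the returned state vector to one discrete order level per component and training on the negative-cost reward, e.g., with an epsilon-greedy policy and an experience-replay buffer as in standard DQN.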

Suggested Citation

  • Lativa Sid Ahmed Abdellahi & Zeinebou Zoubeir & Yahya Mohamed & Ahmedou Haouba & Sidi Hmetty, 2025. "Deep Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems," Mathematics, MDPI, vol. 13(14), pages 1-29, July.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:14:p:2229-:d:1698190

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/14/2229/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/14/2229/
    Download Restriction: no


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:13:y:2025:i:14:p:2229-:d:1698190. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

We have no bibliographic references for this item. You can help add them by using this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.