
Deep Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems

Author

Listed:
  • Lativa Sid Ahmed Abdellahi

    (Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Zeinebou Zoubeir

    (Department of Mathematics and Industrial Engineering, Institute of Industrial Engineering, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Yahya Mohamed

    (Analysis and Modeling for Environment and Health (UMR-AMES), Department of Quantitative Techniques, Faculty of Economics and Management, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Ahmedou Haouba

    (Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania)

  • Sidi Hmetty

    (Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania)

Abstract

This study presents a reinforcement learning–based approach to optimize replenishment policies under uncertainty, with the objective of minimizing total costs, including inventory holding, shortage, and ordering costs. The focus is on single-level assembly systems, where both component delivery lead times and finished-product demand are subject to randomness. The problem is formulated as a Markov decision process (MDP), in which an agent determines optimal order quantities for each component by accounting for stochastic lead times and demand variability. The Deep Q-Network (DQN) algorithm is adapted and employed to learn optimal replenishment policies over a fixed planning horizon. To enhance learning performance, we develop a tailored simulation environment that captures multi-component interactions, random lead times, and variable demand, along with a modular and realistic cost structure. The environment enables dynamic state transitions, lead-time sampling, and flexible order-reception modeling, providing a high-fidelity training ground for the agent. To further improve convergence and policy quality, we incorporate local search mechanisms and multiple action-space discretizations per component. Simulation results show that the proposed method converges to stable ordering policies after approximately 100 episodes. The agent achieves an average service level of 96.93%, and stockout events are almost entirely eliminated relative to early training phases. The system maintains component inventories within operationally feasible ranges, and the cost components (holding, shortage, and ordering) are consistently minimized across 500 training episodes. These findings highlight the potential of deep reinforcement learning as a data-driven, adaptive approach to inventory management in complex and uncertain supply chains.
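The abstract describes the simulation environment only at a high level. Purely for orientation, the sketch below shows one way such a single-level assembly environment could be structured: stochastic lead times, random finished-product demand, and a holding/shortage/ordering cost structure. All names and parameters here (the AssemblyReplenishmentEnv class, the Poisson lead-time and demand distributions, the cost coefficients, the order-level discretization) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class AssemblyReplenishmentEnv:
    """Minimal sketch of a single-level assembly replenishment environment.

    All parameters (cost coefficients, Poisson lead times and demand,
    order-level discretization) are illustrative assumptions, not the
    paper's calibrated values.
    """

    def __init__(self, n_components=3, horizon=30, seed=0,
                 holding_cost=1.0, shortage_cost=10.0, ordering_cost=5.0,
                 order_levels=(0, 10, 20, 40)):
        self.rng = np.random.default_rng(seed)
        self.n = n_components
        self.horizon = horizon
        self.h, self.p, self.k = holding_cost, shortage_cost, ordering_cost
        self.order_levels = order_levels  # discretized order quantities per component
        self.reset()

    def reset(self):
        self.t = 0
        self.inventory = np.full(self.n, 20.0)  # on-hand stock per component
        self.pipeline = []  # outstanding orders: (arrival_period, component, qty)
        return self._state()

    def _state(self):
        # State: on-hand inventory, in-transit quantity per component, and period.
        in_transit = np.zeros(self.n)
        for _, c, q in self.pipeline:
            in_transit[c] += q
        return np.concatenate([self.inventory, in_transit, [self.t]])

    def step(self, action):
        """action[c] indexes order_levels for component c; returns (state, reward, done)."""
        cost = 0.0
        # Place orders; each order draws a random lead time of at least one period.
        for c, a in enumerate(action):
            qty = self.order_levels[a]
            if qty > 0:
                lead = 1 + self.rng.poisson(2)
                self.pipeline.append((self.t + lead, c, qty))
                cost += self.k  # fixed ordering cost per order placed
        # Receive deliveries that are due by the current period.
        due = [o for o in self.pipeline if o[0] <= self.t]
        self.pipeline = [o for o in self.pipeline if o[0] > self.t]
        for _, c, q in due:
            self.inventory[c] += q
        # Random demand; assembling one product consumes one unit of each component.
        demand = int(self.rng.poisson(8))
        assembled = min(demand, int(self.inventory.min()))
        self.inventory -= assembled
        shortage = demand - assembled
        cost += self.p * shortage + self.h * self.inventory.sum()
        self.t += 1
        return self._state(), -cost, self.t >= self.horizon
```

A DQN agent would consume this interface by mapping the returned state vector to one discrete order level per component and training on the negative-cost reward, e.g., with an epsilon-greedy policy and an experience-replay buffer as in standard DQN.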

Suggested Citation

  • Lativa Sid Ahmed Abdellahi & Zeinebou Zoubeir & Yahya Mohamed & Ahmedou Haouba & Sidi Hmetty, 2025. "Deep Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems," Mathematics, MDPI, vol. 13(14), pages 1-29, July.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:14:p:2229-:d:1698190

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/14/2229/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/14/2229/
    Download Restriction: no


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:13:y:2025:i:14:p:2229-:d:1698190. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

We have no bibliographic references for this item. You can help add them by using this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.