
Reinforcement learning for healthcare operations management: methodological framework, recent developments, and future research directions

Author

Listed:
  • Qihao Wu

    (The University of Hong Kong)

  • Jiangxue Han

    (The University of Hong Kong)

  • Yimo Yan

    (The University of Hong Kong)

  • Yong-Hong Kuo

    (The University of Hong Kong)

  • Zuo-Jun Max Shen

    (The University of Hong Kong; University of California, Berkeley)

Abstract

With the advancement in computing power and data science techniques, reinforcement learning (RL) has emerged as a powerful tool for decision-making problems in complex systems. In recent years, the research on RL for healthcare operations has grown rapidly. Especially during the COVID-19 pandemic, RL has played a critical role in optimizing decisions with greater degrees of uncertainty. RL for healthcare applications has been an exciting topic across multiple disciplines, including operations research, operations management, healthcare systems engineering, and data science. This review paper first provides a tutorial on the overall framework of RL, including its key components, training models, and approximators. Then, we present the recent advances of RL in the domain of healthcare operations management (HOM) and analyze the current trends. Our paper concludes by presenting existing challenges and future directions for RL in HOM.

Suggested Citation

  • Qihao Wu & Jiangxue Han & Yimo Yan & Yong-Hong Kuo & Zuo-Jun Max Shen, 2025. "Reinforcement learning for healthcare operations management: methodological framework, recent developments, and future research directions," Health Care Management Science, Springer, vol. 28(2), pages 298-333, June.
  • Handle: RePEc:kap:hcarem:v:28:y:2025:i:2:d:10.1007_s10729-025-09699-6
    DOI: 10.1007/s10729-025-09699-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10729-025-09699-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Anton Braverman & Itai Gurvich & Junfei Huang, 2020. "On the Taylor Expansion of Value Functions," Operations Research, INFORMS, vol. 68(2), pages 631-654, March.
    2. Dong Li & Li Ding & Stephen Connor, 2020. "When to Switch? Index Policies for Resource Scheduling in Emergency Response," Production and Operations Management, Production and Operations Management Society, vol. 29(2), pages 241-262, February.
    3. J. G. Dai & Pengyi Shi, 2019. "Inpatient Overflow: An Approximate Dynamic Programming Approach," Manufacturing & Service Operations Management, INFORMS, vol. 21(4), pages 894-911, October.
    4. Vanvuchelen, Nathalie & De Boeck, Kim & Boute, Robert N., 2024. "Cluster-based lateral transshipments for the Zambian health supply chain," European Journal of Operational Research, Elsevier, vol. 313(1), pages 373-386.
    5. Amir Ali Nasrollahzadeh & Amin Khademi & Maria E. Mayorga, 2018. "Real-Time Ambulance Dispatching and Relocation," Manufacturing & Service Operations Management, INFORMS, vol. 20(3), pages 467-480, July.
    6. Zeng, Jia-Ying & Lu, Ping & Wei, Ying & Chen, Xin & Lin, Kai-Biao, 2023. "Deep reinforcement learning based medical supplies dispatching model for major infectious diseases: Case study of COVID-19," Operations Research Perspectives, Elsevier, vol. 11(C).
    7. Peng Zhou & Xing-Lou Yang & Xian-Guang Wang & Ben Hu & Lei Zhang & Wei Zhang & Hao-Rui Si & Yan Zhu & Bei Li & Chao-Lin Huang & Hui-Dong Chen & Jing Chen & Yun Luo & Hua Guo & Ren-Di Jiang & Mei-Qin L, 2020. "Addendum: A pneumonia outbreak associated with a new coronavirus of probable bat origin," Nature, Nature, vol. 588(7836), pages 6-6, December.
    8. Zhishuai Liu & Jesse Clifton & Eric B. Laber & John Drake & Ethan X. Fang, 2023. "Deep Spatial Q-Learning for Infectious Disease Control," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 28(4), pages 749-773, December.
    9. Abhijit Gosavi, 2009. "Reinforcement Learning: A Tutorial Survey and Recent Advances," INFORMS Journal on Computing, INFORMS, vol. 21(2), pages 178-192, May.
    10. Elliot Lee & Mariel Lavieri & Michael Volk & Yongcai Xu, 2015. "Applying reinforcement learning techniques to detect hepatocellular carcinoma under limited screening capacity," Health Care Management Science, Springer, vol. 18(3), pages 363-375, September.
    11. Schütz, Hans-Jörg & Kolisch, Rainer, 2012. "Approximate dynamic programming for capacity allocation in the service industry," European Journal of Operational Research, Elsevier, vol. 218(1), pages 239-250.
    12. Adam Diamant, 2021. "Dynamic multistage scheduling for patient-centered care plans," Health Care Management Science, Springer, vol. 24(4), pages 827-844, December.
    13. Adam Diamant & Joseph Milner & Fayez Quereshy, 2018. "Dynamic Patient Scheduling for Multi-Appointment Health Care Programs," Production and Operations Management, Production and Operations Management Society, vol. 27(1), pages 58-79, January.
    14. Gloria Hyunjung Kwak & Lowell Ling & Pan Hui, 2021. "Deep reinforcement learning approaches for global public health strategies for COVID-19 pandemic," PLOS ONE, Public Library of Science, vol. 16(5), pages 1-15, May.
    15. Julien Grand-Clément & Carri W. Chan & Vineet Goyal & Gabriel Escobar, 2023. "Robustness of Proactive Intensive Care Unit Transfer Policies," Operations Research, INFORMS, vol. 71(5), pages 1653-1688, September.
    16. Ridvan Gedik & Shengfan Zhang & Chase Rainwater, 2017. "Strategic level proton therapy patient admission planning: a Markov decision process modeling approach," Health Care Management Science, Springer, vol. 20(2), pages 286-302, June.
    17. Gary Kochenberger & Jin-Kao Hao & Fred Glover & Mark Lewis & Zhipeng Lü & Haibo Wang & Yang Wang, 2014. "The unconstrained binary quadratic programming problem: a survey," Journal of Combinatorial Optimization, Springer, vol. 28(1), pages 58-81, July.
    18. Huyang Xu & Yuanchen Fang & Chun-An Chou & Nasser Fard & Li Luo, 2023. "A reinforcement learning-based optimal control approach for managing an elective surgery backlog after pandemic disruption," Health Care Management Science, Springer, vol. 26(3), pages 430-446, September.
    19. Jonathan Patrick & Martin L. Puterman & Maurice Queyranne, 2008. "Dynamic Multipriority Patient Scheduling for a Diagnostic Resource," Operations Research, INFORMS, vol. 56(6), pages 1507-1525, December.
    20. Diwas Singh KC & Stefan Scholtes & Christian Terwiesch, 2020. "Empirical Research in Healthcare Operations: Past Research, Present Understanding, and Future Opportunities," Manufacturing & Service Operations Management, INFORMS, vol. 22(1), pages 73-83, January.
    21. Quang Dang Nguyen & Mikhail Prokopenko, 2022. "A general framework for optimising cost-effectiveness of pandemic response under partial intervention measures," Papers 2205.08996, arXiv.org, revised Nov 2022.
    22. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    23. Peng Zhou & Xing-Lou Yang & Xian-Guang Wang & Ben Hu & Lei Zhang & Wei Zhang & Hao-Rui Si & Yan Zhu & Bei Li & Chao-Lin Huang & Hui-Dong Chen & Jing Chen & Yun Luo & Hua Guo & Ren-Di Jiang & Mei-Qin L, 2020. "A pneumonia outbreak associated with a new coronavirus of probable bat origin," Nature, Nature, vol. 579(7798), pages 270-273, March.
    24. Astaraky, Davood & Patrick, Jonathan, 2015. "A simulation based approximate dynamic programming approach to multi-class, multi-resource surgical scheduling," European Journal of Operational Research, Elsevier, vol. 245(1), pages 309-319.
    25. Benyun Shi & Guangliang Liu & Hongjun Qiu & Yu-Wang Chen & Shaoliang Peng, 2019. "Voluntary Vaccination through Perceiving Epidemic Severity in Social Networks," Complexity, Hindawi, vol. 2019, pages 1-13, February.
    26. Hamsa Bastani & Kimon Drakopoulos & Vishal Gupta & Ioannis Vlachogiannis & Christos Hadjichristodoulou & Pagona Lagiou & Gkikas Magiorkinis & Dimitrios Paraskevis & Sotirios Tsiodras, 2021. "Efficient and targeted COVID-19 border testing via reinforcement learning," Nature, Nature, vol. 599(7883), pages 108-113, November.
    27. Rey, David & Hammad, Ahmed W. & Saberi, Meead, 2023. "Vaccine allocation policy optimization and budget sharing mechanism using reinforcement learning," Omega, Elsevier, vol. 115(C).
    28. Guodong Yu & Aijun Liu & Huiping Sun, 2021. "Risk-averse flexible policy on ambulance allocation in humanitarian operations under uncertainty," International Journal of Production Research, Taylor & Francis Journals, vol. 59(9), pages 2588-2610, May.
    29. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    30. Jim G. Dai & Pengyi Shi, 2021. "Recent Modeling and Analytical Advances in Hospital Inpatient Flow Management," Production and Operations Management, Production and Operations Management Society, vol. 30(6), pages 1838-1862, June.
    31. Matt Baucum & Anahita Khojandi & Rama Vasudevan & Ritesh Ramdhani, 2023. "Optimizing Patient-Specific Medication Regimen Policies Using Wearable Sensors in Parkinson’s Disease," Management Science, INFORMS, vol. 69(10), pages 5964-5982, October.
    32. Daganzo, Carlos F., 1995. "Requiem for second-order fluid approximations of traffic flow," Transportation Research Part B: Methodological, Elsevier, vol. 29(4), pages 277-286, August.
    33. Lee, Hyun-Rok & Lee, Taesik, 2021. "Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response," European Journal of Operational Research, Elsevier, vol. 291(1), pages 296-308.
    34. Thul, Lawrence & Powell, Warren, 2023. "Stochastic optimization for vaccine and testing kit allocation for the COVID-19 pandemic," European Journal of Operational Research, Elsevier, vol. 304(1), pages 325-338.
    35. David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
    36. Michael N. Katehakis & Arthur F. Veinott, 1987. "The Multi-Armed Bandit Problem: Decomposition and Computation," Mathematics of Operations Research, INFORMS, vol. 12(2), pages 262-268, May.
    37. Mohammad Zhalechian & Esmaeil Keyvanshokooh & Cong Shi & Mark P. Van Oyen, 2022. "Online Resource Allocation with Personalized Learning," Operations Research, INFORMS, vol. 70(4), pages 2138-2161, July.
    38. Rong Zhang & Jianhao Lv & Jinsong Bao & Yu Zheng, 2023. "A digital twin-driven flexible scheduling method in a human–machine collaborative workshop based on hierarchical reinforcement learning," Flexible Services and Manufacturing Journal, Springer, vol. 35(4), pages 1116-1138, December.
    39. L. Jeff Hong & Zhiyuan Huang & Henry Lam, 2021. "Learning-Based Robust Optimization: Procedures and Statistical Guarantees," Management Science, INFORMS, vol. 67(6), pages 3447-3467, June.
    40. Agrawal, Deepak & Pang, Guodong & Kumara, Soundar, 2023. "Preference based scheduling in a healthcare provider network," European Journal of Operational Research, Elsevier, vol. 307(3), pages 1318-1335.
    41. Mohammad Zhalechian & Esmaeil Keyvanshokooh & Cong Shi & Mark P. Van Oyen, 2023. "Data-Driven Hospital Admission Control: A Learning Approach," Operations Research, INFORMS, vol. 71(6), pages 2111-2129, November.
    42. Yuan Shi & Saied Mahdian & Jose Blanchet & Peter Glynn & Andrew Y. Shin & David Scheinker, 2023. "Surgical scheduling via optimization and machine learning with long-tailed data," Health Care Management Science, Springer, vol. 26(4), pages 692-718, December.
    43. Matthew S. Maxwell & Mateo Restrepo & Shane G. Henderson & Huseyin Topaloglu, 2010. "Approximate Dynamic Programming for Ambulance Redeployment," INFORMS Journal on Computing, INFORMS, vol. 22(2), pages 266-281, May.
    44. Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
    45. Pengyi Shi & Mabel C. Chou & J. G. Dai & Ding Ding & Joe Sim, 2016. "Models and Insights for Hospital Inpatient Operations: Time-Dependent ED Boarding Time," Management Science, INFORMS, vol. 62(1), pages 1-28, January.
    46. Elliot Lee & Mariel S. Lavieri & Michael Volk, 2019. "Optimal Screening for Hepatocellular Carcinoma: A Restless Bandit Model," Manufacturing & Service Operations Management, INFORMS, vol. 21(1), pages 198-212, January.
    47. Schmid, Verena, 2012. "Solving the dynamic ambulance relocation and dispatching problem using approximate dynamic programming," European Journal of Operational Research, Elsevier, vol. 219(3), pages 611-621.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antoine Sauré & Jonathan Patrick & Martin L. Puterman, 2015. "Simulation-Based Approximate Policy Iteration with Generalized Logistic Functions," INFORMS Journal on Computing, INFORMS, vol. 27(3), pages 579-595, August.
    2. Peter J. H. Hulshof & Martijn R. K. Mes & Richard J. Boucherie & Erwin W. Hans, 2016. "Patient admission planning using Approximate Dynamic Programming," Flexible Services and Manufacturing Journal, Springer, vol. 28(1), pages 30-61, June.
    3. Ulusan, Aybike & Ergun, Özlem, 2021. "Approximate dynamic programming for network recovery problems with stochastic demand," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 151(C).
    4. Gang Du & Xinyue Li & Hui Hu & Xiaoling Ouyang, 2018. "Optimizing Daily Service Scheduling for Medical Diagnostic Equipment Considering Patient Satisfaction and Hospital Revenue," Sustainability, MDPI, vol. 10(9), pages 1-23, September.
    5. Wang, Ziwei & Chen, Hongmin & Luo, Jun & Wang, Chunming & Xu, Xinyi & Zhou, Ying, 2024. "Sharing service in healthcare systems: A recent survey," Omega, Elsevier, vol. 129(C).
    6. Huy Chau & Duy Nguyen & Thai Nguyen, 2024. "Continuous-time optimal investment with portfolio constraints: a reinforcement learning approach," Papers 2412.10692, arXiv.org.
    7. Wang, Xianjia & Yang, Zhipeng & Liu, Yanli & Chen, Guici, 2023. "A reinforcement learning-based strategy updating model for the cooperative evolution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 618(C).
    8. Zheng Wang & Jiuh‐Biing Sheu & Chung‐Piaw Teo & Guiqin Xue, 2022. "Robot Scheduling for Mobile‐Rack Warehouses: Human–Robot Coordinated Order Picking Systems," Production and Operations Management, Production and Operations Management Society, vol. 31(1), pages 98-116, January.
    9. Jim G. Dai & Pengyi Shi, 2021. "Recent Modeling and Analytical Advances in Hospital Inpatient Flow Management," Production and Operations Management, Production and Operations Management Society, vol. 30(6), pages 1838-1862, June.
    10. Andre A. Cire & Adam Diamant, 2022. "Dynamic scheduling of home care patients to medical providers," Production and Operations Management, Production and Operations Management Society, vol. 31(11), pages 4038-4056, November.
    11. Ma, Xin & Zhao, Xue & Guo, Pengfei, 2022. "Cope with the COVID-19 pandemic: Dynamic bed allocation and patient subsidization in a public healthcare system," International Journal of Production Economics, Elsevier, vol. 243(C).
    12. Phillip R. Jenkins & Matthew J. Robbins & Brian J. Lunday, 2021. "Approximate Dynamic Programming for Military Medical Evacuation Dispatching Policies," INFORMS Journal on Computing, INFORMS, vol. 33(1), pages 2-26, January.
    13. Zhang, Qin & Liu, Yu & Xiang, Yisha & Xiahou, Tangfan, 2024. "Reinforcement learning in reliability and maintenance optimization: A tutorial," Reliability Engineering and System Safety, Elsevier, vol. 251(C).
    14. Liping Zhou & Na Geng & Zhibin Jiang & Shan Jiang, 2022. "Integrated Multiresource Capacity Planning and Multitype Patient Scheduling," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 129-149, January.
    15. Jenkins, Phillip R. & Robbins, Matthew J. & Lunday, Brian J., 2021. "Approximate dynamic programming for the military aeromedical evacuation dispatching, preemption-rerouting, and redeployment problem," European Journal of Operational Research, Elsevier, vol. 290(1), pages 132-143.
    16. Jinsheng Chen & Jing Dong & Pengyi Shi, 2020. "A survey on skill-based routing with applications to service operations management," Queueing Systems: Theory and Applications, Springer, vol. 96(1), pages 53-82, October.
    17. Hou, Lei & Elsworth, Derek & Zhang, Fengshou & Wang, Zhiyuan & Zhang, Jianbo, 2023. "Evaluation of proppant injection based on a data-driven approach integrating numerical and ensemble learning models," Energy, Elsevier, vol. 264(C).
    18. Jun-Yu Si & Yuan-Mei Chen & Ye-Hui Sun & Meng-Xue Gu & Mei-Ling Huang & Lu-Lu Shi & Xiao Yu & Xiao Yang & Qing Xiong & Cheng-Bao Ma & Peng Liu & Zheng-Li Shi & Huan Yan, 2024. "Sarbecovirus RBD indels and specific residues dictating multi-species ACE2 adaptiveness," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    19. Ma, Zhikai & Huo, Qian & Wang, Wei & Zhang, Tao, 2023. "Voltage-temperature aware thermal runaway alarming framework for electric vehicles via deep learning with attention mechanism in time-frequency domain," Energy, Elsevier, vol. 278(C).
    20. Benjamin Heinbach & Peter Burggräf & Johannes Wagner, 2024. "gym-flp: A Python Package for Training Reinforcement Learning Algorithms on Facility Layout Problems," SN Operations Research Forum, Springer, vol. 5(1), pages 1-26, March.
