Modified monotone policy iteration for interpretable policies in Markov decision processes and the impact of state ordering rules

My bibliography Save this article

Modified monotone policy iteration for interpretable policies in Markov decision processes and the impact of state ordering rules

Author

Listed:

Sun Ju Lee
(Georgia Institute of Technology)
Xingyu Gong
(Georgia Institute of Technology)
Gian-Gabriel Garcia
(Georgia Institute of Technology)

Registered:

Abstract

Optimizing interpretable policies for Markov decision processes (MDPs) can be computationally intractable for large-scale MDPs, e.g., for monotone policies, the optimal interpretable policy depends on the initial state distribution, precluding standard dynamic programming techniques. Previous work has proposed monotone policy iteration (MPI) to produce a feasible solution for warm starting a mixed integer linear program that finds an optimal monotone policy. However, this prior work did not investigate the convergence and optimality of this algorithm, nor did they investigate the impact of state ordering rules, i.e., the order in which policy improvement steps are performed in MPI. In this study, we analytically characterize the convergence and optimality of the MPI algorithm, introduce a modified MPI (MMPI) algorithm, and show that our algorithm improves upon the MPI algorithm. To test MMPI numerically, we conduct experiments in two settings: (1) perturbations of a machine maintenance problem wherein the optimal policy is guaranteed to be monotone or near-monotone and (2) randomly generated MDPs. We propose and investigate 19 state ordering rules for MMPI based on each state’s value function, initial probability, and stationary distribution. Computational results reveal a trade-off between computational time and optimality gap; in the structured machine maintenance setting, the fastest state ordering rules still yield high quality policies while the trade-off is more pronounced in the random MDP setting. Across both settings, the random state ordering rule performs the best in terms of optimality gap (less than approximately 5% on average) at the expense of computational time.

Suggested Citation

Sun Ju Lee & Xingyu Gong & Gian-Gabriel Garcia, 2025. "Modified monotone policy iteration for interpretable policies in Markov decision processes and the impact of state ordering rules," Annals of Operations Research, Springer, vol. 347(2), pages 783-841, April.

Handle: RePEc:spr:annopr:v:347:y:2025:i:2:d:10.1007_s10479-024-06158-3
DOI: 10.1007/s10479-024-06158-3

Download full text from publisher

As the access to this document is restricted, you may want to

for a different version of it.

References listed on IDEAS

Kim, Minsun & Ghate, Archis & Phillips, Mark H., 2012. "A stochastic control formalism for dynamic biologically conformal radiation therapy," European Journal of Operational Research, Elsevier, vol. 219(3), pages 541-556.
Wang, Hongzhou, 2002. "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, Elsevier, vol. 139(3), pages 469-489, June.
S. Christian Albright, 1979. "Structural Results for Partially Observable Markov Decision Processes," Operations Research, INFORMS, vol. 27(5), pages 1041-1053, October.
Warren B. Powell, 2016. "Perspectives of approximate dynamic programming," Annals of Operations Research, Springer, vol. 241(1), pages 319-356, June.
Oguzhan Alagoz & Lisa M. Maillart & Andrew J. Schaefer & Mark S. Roberts, 2004. "The Optimal Timing of Living-Donor Liver Transplantation," Management Science, INFORMS, vol. 50(10), pages 1420-1430, October.
Dragos Florin Ciocan & Velibor V. Mišić, 2022. "Interpretable Optimal Stopping," Management Science, INFORMS, vol. 68(3), pages 1616-1638, March.
Liu, Bin & Wu, Shaomin & Xie, Min & Kuo, Way, 2017. "A condition-based maintenance policy for degrading systems with age- and state-dependent operating cost," European Journal of Operational Research, Elsevier, vol. 263(3), pages 879-887.
William S. Lovejoy, 1987. "Some Monotonicity Results for Partially Observed Markov Decision Processes," Operations Research, INFORMS, vol. 35(5), pages 736-743, October.
Alaa H. Elwany & Nagi Z. Gebraeel & Lisa M. Maillart, 2011. "Structured Replacement Policies for Components with Complex Degradation Processes and Dedicated Sensors," Operations Research, INFORMS, vol. 59(3), pages 684-695, June.
S. Christian Albright & Wayne Winston, 1979. "Markov Models of Advertising and Pricing Decisions," Operations Research, INFORMS, vol. 27(4), pages 668-681, August.
Richard C. Grinold, 1973. "Technical Note—Elimination of Suboptimal Actions in Markov Decision Problems," Operations Research, INFORMS, vol. 21(3), pages 848-851, June.
Sandun C. Perera & Suresh P. Sethi, 2023. "A survey of stochastic inventory models with fixed costs: Optimality of (s, S) and (s, S)‐type policies—Discrete‐time case," Production and Operations Management, Production and Operations Management Society, vol. 32(1), pages 131-153, January.
White, D. J., 1981. "Isotone optimal policies for structured Markov decision processes," European Journal of Operational Research, Elsevier, vol. 7(4), pages 396-402, August.
Jay K. Satia & Roy E. Lave, 1973. "Markovian Decision Processes with Uncertain Transition Probabilities," Operations Research, INFORMS, vol. 21(3), pages 728-740, June.
Rebekah S. McKenna & Matthew J. Robbins & Brian J. Lunday & Ian M. McCormack, 2020. "Approximate dynamic programming for the military inventory routing problem," Annals of Operations Research, Springer, vol. 288(1), pages 391-416, May.
Hao Zhang & Weihua Zhang, 2023. "Analytical Solution to a Partially Observable Machine Maintenance Problem with Obvious Failures," Management Science, INFORMS, vol. 69(7), pages 3993-4015, July.
Huizhen Yu & Dimitri Bertsekas, 2013. "Q-learning and policy iteration algorithms for stochastic shortest path problems," Annals of Operations Research, Springer, vol. 208(1), pages 95-132, September.
J. MacQueen, 1967. "Letter to the Editor—A Test for Suboptimal Actions in Markovian Decision Problems," Operations Research, INFORMS, vol. 15(3), pages 559-561, June.
Steven M. Shechter & Matthew D. Bailey & Andrew J. Schaefer & Mark S. Roberts, 2008. "The Optimal Time to Initiate HIV Therapy Under Ordered Health States," Operations Research, INFORMS, vol. 56(1), pages 20-33, February.
Yuhai Hu & Boris Defourny, 2022. "Optimal price-threshold control for battery operation with aging phenomenon: a quasiconvex optimization approach," Annals of Operations Research, Springer, vol. 317(2), pages 623-650, October.
de Jonge, Bram & Scarf, Philip A., 2020. "A review on maintenance optimization," European Journal of Operational Research, Elsevier, vol. 285(3), pages 805-824.
Martin L. Puterman & Shelby L. Brumelle, 1979. "On the Convergence of Policy Iteration in Stationary Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 4(1), pages 60-69, February.

Full references (including those not matched with items on IDEAS)

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Deep, Akash & Zhou, Shiyu & Veeramani, Dharmaraj & Chen, Yong, 2023. "Partially observable Markov decision process-based optimal maintenance planning with time-dependent observations," European Journal of Operational Research, Elsevier, vol. 311(2), pages 533-544.
Sarang Deo & Seyed Iravani & Tingting Jiang & Karen Smilowitz & Stephen Samuelson, 2013. "Improving Health Outcomes Through Better Capacity Allocation in a Community-Based Chronic Care Model," Operations Research, INFORMS, vol. 61(6), pages 1277-1294, December.
Collin Drent & Melvin Drent & Joachim Arts, 2024. "Condition-Based Production for Stochastically Deteriorating Systems: Optimal Policies and Learning," Manufacturing & Service Operations Management, INFORMS, vol. 26(3), pages 1137-1156, May.
- Drent, Collin & Drent, Melvin & Arts, Joachim, 2024. "Condition-based production for stochastically deteriorating systems: Optimal policies and learning," Other publications TiSEM 69737ccb-2893-497d-b597-e, Tilburg University, School of Economics and Management.
M. Reza Skandari & Steven M. Shechter, 2021. "Patient-Type Bayes-Adaptive Treatment Plans," Operations Research, INFORMS, vol. 69(2), pages 574-598, March.
Liu, Bin & Pandey, Mahesh D. & Wang, Xiaolin & Zhao, Xiujie, 2021. "A finite-horizon condition-based maintenance policy for a two-unit system with dependent degradation processes," European Journal of Operational Research, Elsevier, vol. 295(2), pages 705-717.
Cai, Yue & de Jonge, Bram & Teunter, Ruud H., 2025. "Data-driven condition-based maintenance optimization given limited data," European Journal of Operational Research, Elsevier, vol. 324(1), pages 324-334.
de Jonge, Bram & Scarf, Philip A., 2020. "A review on maintenance optimization," European Journal of Operational Research, Elsevier, vol. 285(3), pages 805-824.
Cai, Yue & Teunter, Ruud H. & de Jonge, Bram, 2023. "A data-driven approach for condition-based maintenance optimization," European Journal of Operational Research, Elsevier, vol. 311(2), pages 730-738.
Arts, Joachim & Boute, Robert N. & Loeys, Stijn & van Staden, Heletjé E., 2025. "Fifty years of maintenance optimization: Reflections and perspectives," European Journal of Operational Research, Elsevier, vol. 322(3), pages 725-739.
Pinciroli, Luca & Baraldi, Piero & Zio, Enrico, 2023. "Maintenance optimization in industry 4.0," Reliability Engineering and System Safety, Elsevier, vol. 234(C).
Jonathan E. Helm & Mariel S. Lavieri & Mark P. Van Oyen & Joshua D. Stein & David C. Musch, 2015. "Dynamic Forecasting and Control Algorithms of Glaucoma Progression for Clinician Decision Support," Operations Research, INFORMS, vol. 63(5), pages 979-999, October.
Jingyu Zhang & Brian T. Denton & Hari Balasubramanian & Nilay D. Shah & Brant A. Inman, 2012. "Optimization of Prostate Biopsy Referral Decisions," Manufacturing & Service Operations Management, INFORMS, vol. 14(4), pages 529-547, October.
Kotas, Jakob & Ghate, Archis, 2018. "Bayesian learning of dose–response parameters from a cohort under response-guided dosing," European Journal of Operational Research, Elsevier, vol. 265(1), pages 328-343.
Qiuzhuang Sun & Piao Chen & Xin Wang & Zhi‐Sheng Ye, 2023. "Robust condition‐based production and maintenance planning for degradation management," Production and Operations Management, Production and Operations Management Society, vol. 32(12), pages 3951-3967, December.
Mason, J.E. & Denton, B.T. & Shah, N.D. & Smith, S.A., 2014. "Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients," European Journal of Operational Research, Elsevier, vol. 233(3), pages 727-738.
Zhao, Xian & He, Zongda & Wu, Yaguang & Qiu, Qingan, 2022. "Joint optimization of condition-based performance control and maintenance policies for mission-critical systems," Reliability Engineering and System Safety, Elsevier, vol. 226(C).
Liu, Gehui & Chen, Shaokuan & Ho, Tinkin & Ran, Xinchen & Mao, Baohua & Lan, Zhen, 2022. "Optimum opportunistic maintenance schedule over variable horizons considering multi-stage degradation and dynamic strategy," Reliability Engineering and System Safety, Elsevier, vol. 225(C).
Chiel van Oosterom & Lisa M. Maillart & Jeffrey P. Kharoufeh, 2017. "Optimal maintenance policies for a safety‐critical system and its deteriorating sensor," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(5), pages 399-417, August.
Oguzhan Alagoz & Jagpreet Chhatwal & Elizabeth S. Burnside, 2013. "Optimal Policies for Reducing Unnecessary Follow-Up Mammography Exams in Breast Cancer Diagnosis," Decision Analysis, INFORMS, vol. 10(3), pages 200-224, September.
Jyrki Savolainen & Michele Urbani, 2021. "Maintenance optimization for a multi-unit system with digital twin simulation," Journal of Intelligent Manufacturing, Springer, vol. 32(7), pages 1953-1973, October.

More about this item

Keywords

; ; ; ; ;

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:annopr:v:347:y:2025:i:2:d:10.1007_s10479-024-06158-3. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

Modified monotone policy iteration for interpretable policies in Markov decision processes and the impact of state ordering rules

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data