IDEAS home Printed from https://ideas.repec.org/a/inm/oropre/v26y1978i2p282-304.html

The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs

Author

Listed:
  • Edward J. Sondik

    (Stanford University, Stanford, California)

Abstract

This paper treats the discounted cost, optimal control problem for Markov processes with incomplete state information. The optimization approach for these partially observable Markov processes is a generalization of the well-known policy iteration technique for finding optimal stationary policies for completely observable Markov processes. The state space for the problem is the space of state occupancy probability distributions (the unit simplex). The development of the algorithm introduces several new ideas, including the class of finitely transient policies, which are shown to possess piecewise linear cost functions. The paper develops easily implemented approximations to stationary policies based on these finitely transient policies and shows that the concave hull of an approximation can be included in the well-known Howard policy improvement algorithm with subsequent convergence. The paper closes with a detailed example illustrating the application of the algorithm to the two-state partially observable Markov process.
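The two central objects named in the abstract, the belief state (a point in the unit simplex) and a piecewise linear cost function over it, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names, the matrices `P` and `R`, and the vectors in `alphas` are hypothetical values chosen only for illustration, and the cost function is assumed to be the lower envelope of a finite set of linear functions, which is one common piecewise linear, concave form.

```python
from typing import List, Sequence


def belief_update(belief: Sequence[float],
                  P: Sequence[Sequence[float]],
                  R: Sequence[Sequence[float]],
                  obs: int) -> List[float]:
    """Bayes update of a state occupancy distribution.

    belief[i] : current probability of being in state i
    P[i][j]   : transition probability i -> j under the chosen action (assumed)
    R[j][o]   : probability of observing o when the new state is j (assumed)
    """
    n = len(belief)
    # Predict the next-state distribution, then weight by the observation likelihood.
    unnormalized = [
        sum(belief[i] * P[i][j] for i in range(n)) * R[j][obs]
        for j in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


def piecewise_linear_cost(belief: Sequence[float],
                          alphas: Sequence[Sequence[float]]) -> float:
    """Evaluate a piecewise linear, concave cost as the minimum of linear functions."""
    return min(sum(b * a for b, a in zip(belief, alpha)) for alpha in alphas)


# Hypothetical two-state example.
P = [[0.9, 0.1], [0.2, 0.8]]       # transition matrix for one action
R = [[0.7, 0.3], [0.4, 0.6]]       # observation probabilities per state
alphas = [[1.0, 4.0], [3.0, 2.0]]  # two linear cost pieces

posterior = belief_update([0.5, 0.5], P, R, obs=0)
cost = piecewise_linear_cost(posterior, alphas)
```

Because the belief stays in the unit simplex after every update, a two-state problem reduces to a single number in [0, 1], which is why a two-state example is convenient for illustrating cost functions of this kind.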

Suggested Citation

  • Edward J. Sondik, 1978. "The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs," Operations Research, INFORMS, vol. 26(2), pages 282-304, April.
  • Handle: RePEc:inm:oropre:v:26:y:1978:i:2:p:282-304
    DOI: 10.1287/opre.26.2.282

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.26.2.282
    Download Restriction: no


    Citations



    Cited by:

    1. Abhijit Gosavi, 2009. "Reinforcement Learning: A Tutorial Survey and Recent Advances," INFORMS Journal on Computing, INFORMS, vol. 21(2), pages 178-192, May.
    2. White, Chelsea C. & Cheong, Taesu, 2012. "In-transit perishable product inspection," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 48(1), pages 310-330.
    3. Ricardo Montoya & Oded Netzer & Kamel Jedidi, 2010. "Dynamic Allocation of Pharmaceutical Detailing and Sampling for Long-Term Profitability," Marketing Science, INFORMS, vol. 29(5), pages 909-924, 09-10.
    4. Yanling Chang & Alan Erera & Chelsea White, 2015. "Value of information for a leader–follower partially observed Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 129-153, December.
    5. Powell, Warren B., 2019. "A unified framework for stochastic optimization," European Journal of Operational Research, Elsevier, vol. 275(3), pages 795-821.
    6. Satya S. Malladi & Alan L. Erera & Chelsea C. White, 2023. "Inventory control with modulated demand and a partially observed modulation process," Annals of Operations Research, Springer, vol. 321(1), pages 343-369, February.
    7. Deep, Akash & Zhou, Shiyu & Veeramani, Dharmaraj & Chen, Yong, 2023. "Partially observable Markov decision process-based optimal maintenance planning with time-dependent observations," European Journal of Operational Research, Elsevier, vol. 311(2), pages 533-544.
    8. Gong, Linguo & Tang, Kwei, 1997. "Monitoring machine operations using on-line sensors," European Journal of Operational Research, Elsevier, vol. 96(3), pages 479-492, February.
    9. Eugene A. Feinberg & Pavlo O. Kasyanov & Michael Z. Zgurovsky, 2016. "Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities," Mathematics of Operations Research, INFORMS, vol. 41(2), pages 656-681, May.
    10. Satya S. Malladi & Alan L. Erera & Chelsea C. White, 2021. "Managing mobile production-inventory systems influenced by a modulation process," Annals of Operations Research, Springer, vol. 304(1), pages 299-330, September.
    11. Compare, Michele & Baraldi, Piero & Marelli, Paolo & Zio, Enrico, 2020. "Partially observable Markov decision processes for optimal operations of gas transmission networks," Reliability Engineering and System Safety, Elsevier, vol. 199(C).
    12. Daming Lin & Viliam Makis, 2006. "On‐line parameter estimation for a partially observable system subject to random failure," Naval Research Logistics (NRL), John Wiley & Sons, vol. 53(5), pages 477-483, August.
    13. Hao Zhang, 2010. "Partially Observable Markov Decision Processes: A Geometric Technique and Analysis," Operations Research, INFORMS, vol. 58(1), pages 214-228, February.
    14. Armando Z. Milioni & Stanley R. Pliska, 1988. "Optimal inspection under semi‐markovian deterioration: Basic results," Naval Research Logistics (NRL), John Wiley & Sons, vol. 35(5), pages 373-392, October.
    15. Zhang, Mimi, 2020. "A heuristic policy for maintaining multiple multi-state systems," Reliability Engineering and System Safety, Elsevier, vol. 203(C).
    16. Saghafian, Soroush, 2018. "Ambiguous partially observable Markov decision processes: Structural results and applications," Journal of Economic Theory, Elsevier, vol. 178(C), pages 1-35.
    17. Williams, Byron K., 2009. "Markov decision processes in natural resources management: Observability and uncertainty," Ecological Modelling, Elsevier, vol. 220(6), pages 830-840.
    18. Williams, Byron K., 2011. "Resolving structural uncertainty in natural resources management using POMDP approaches," Ecological Modelling, Elsevier, vol. 222(5), pages 1092-1102.
    19. Seites-Rundlett, William & Bashar, Mohammad Z. & Torres-Machi, Cristina & Corotis, Ross B., 2022. "Combined evidence model to enhance pavement condition prediction from highly uncertain sensor data," Reliability Engineering and System Safety, Elsevier, vol. 217(C).
    20. Kıvanç, İpek & Özgür-Ünlüakın, Demet & Bilgiç, Taner, 2022. "Maintenance policy analysis of the regenerative air heater system using factored POMDPs," Reliability Engineering and System Safety, Elsevier, vol. 219(C).
    21. V. Makis & X. Jiang, 2003. "Optimal Replacement Under Partial Observations," Mathematics of Operations Research, INFORMS, vol. 28(2), pages 382-394, May.
    22. Hao Zhang, 2022. "Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making," Management Science, INFORMS, vol. 68(8), pages 5924-5957, August.
    23. Hao Zhang & Weihua Zhang, 2023. "Analytical Solution to a Partially Observable Machine Maintenance Problem with Obvious Failures," Management Science, INFORMS, vol. 69(7), pages 3993-4015, July.
    24. Memarzadeh, Milad & Pozzi, Matteo & Kolter, J. Zico, 2016. "Hierarchical modeling of systems with similar components: A framework for adaptive monitoring and control," Reliability Engineering and System Safety, Elsevier, vol. 153(C), pages 159-169.
    25. Yanling Chang & Alan Erera & Chelsea White, 2015. "A leader–follower partially observed, multiobjective Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 103-128, December.
