Risk-Sensitive Markov Decision Processes
This paper considers the maximization of certain equivalent reward generated by a Markov decision process with constant risk sensitivity. First, value iteration is used to optimize possibly time-varying processes of finite duration. Then a policy iteration procedure is developed to find the stationary policy with highest certain equivalent gain for the infinite duration case. A simple example demonstrates both procedures.
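The finite-duration value iteration mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the paper's own formulation: it assumes that constant risk sensitivity is modeled by an exponential utility with risk-aversion coefficient gamma, so that the certain equivalent of a lottery over values v is -(1/gamma) ln E[exp(-gamma v)]. The function names, the array layout P[k][i][j] and R[k][i][j] (action k, from state i, to state j), and the example numbers are all hypothetical.

```python
import math


def certain_equivalent(probs, values, gamma):
    """Certain equivalent of a lottery under an (assumed) exponential utility.

    gamma > 0 is risk-averse, gamma < 0 is risk-seeking, and gamma == 0
    falls back to the ordinary expected value.
    """
    if gamma == 0:
        return sum(p * v for p, v in zip(probs, values))
    expected_exp = sum(p * math.exp(-gamma * v) for p, v in zip(probs, values))
    return -math.log(expected_exp) / gamma


def risk_sensitive_value_iteration(P, R, gamma, horizon):
    """Backward recursion over the number of stages remaining.

    P[k][i][j]: probability of moving from state i to j under action k.
    R[k][i][j]: reward earned on that transition.
    Returns (v, policy), where v[i] is the certain equivalent of starting
    in state i with `horizon` stages left, and policy[n][i] is the action
    chosen in state i with n + 1 stages left.
    """
    n_actions = len(P)
    n_states = len(P[0])
    v = [0.0] * n_states  # terminal values assumed to be zero
    policy = []
    for _ in range(horizon):
        new_v, decisions = [], []
        for i in range(n_states):
            best_action, best_value = None, -math.inf
            for k in range(n_actions):
                # Each outcome is the immediate reward plus the certain
                # equivalent of continuing from the successor state.
                outcomes = [R[k][i][j] + v[j] for j in range(n_states)]
                ce = certain_equivalent(P[k][i], outcomes, gamma)
                if ce > best_value:
                    best_action, best_value = k, ce
            new_v.append(best_value)
            decisions.append(best_action)
        v = new_v
        policy.append(decisions)
    return v, policy


# Hypothetical example: action 0 pays 4 for certain, action 1 pays 0 or 10
# with equal probability (numbers invented for illustration).
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
R = [
    [[4.0, 4.0], [4.0, 4.0]],
    [[0.0, 10.0], [0.0, 10.0]],
]
_, averse = risk_sensitive_value_iteration(P, R, gamma=0.1, horizon=1)
_, neutral = risk_sensitive_value_iteration(P, R, gamma=0.0, horizon=1)
print(averse[0], neutral[0])  # prints: [0, 0] [1, 1]
```

In this made-up example the risk-averse decision maker prefers the certain reward of 4, while the risk-neutral one prefers the gamble with expected reward 5. Because the decisions are stored per stage, the resulting policy may vary with the number of stages remaining, which matches the "possibly time-varying" wording of the abstract.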
Volume (Year): 18 (1972)
Issue (Month): 7 (March)
Handle: RePEc:inm:ormnsc:v:18:y:1972:i:7:p:356-369