On Dynamic Programming with Unbounded Rewards
Using the technique employed by the author in an earlier paper, the existence of an optimal stationary policy that can be obtained from the usual functional equation is again established in the presence of a bound (not necessarily polynomial) on the one-period reward of a semi-Markov decision process. This is done for both the discounted and the average cost case. In addition to allowing an uncountable state space, the law of motion of the system is rather general in that we permit any state to be reached in a single transition. There is, however, a bound on a weighted moment of the next state reached. Finally, we indicate the applicability of these results.
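For orientation, the "usual functional equation" in the discounted case typically has the shape sketched below. This is a generic form in assumed notation, not the paper's own statement: the state space $S$, action sets $A(s)$, one-period reward $r$, joint transition law $Q$ over sojourn time and next state, discount rate $\alpha > 0$, weight function $w$, and constants $M$ and $b$ are all introduced here for illustration. The two inequalities only illustrate what a bound on the one-period reward and a bound on a weighted moment of the next state can look like; the paper's actual conditions may differ.

```latex
% Generic discounted optimality equation for a semi-Markov decision process.
% All notation here is assumed for illustration, not taken from the paper.
v(s) = \sup_{a \in A(s)} \Big\{ r(s,a)
  + \int_{0}^{\infty}\!\!\int_{S} e^{-\alpha t}\, v(s')\, Q(dt, ds' \mid s, a) \Big\},
  \qquad s \in S.

% Illustrative growth conditions with a weight function w \ge 1 (assumption):
|r(s,a)| \le M\, w(s),
\qquad
\int_{S} w(s')\, Q\big([0,\infty), ds' \mid s, a\big) \le w(s) + b.
```

In this reading, a stationary policy that selects in each state an action attaining the supremum is the kind of policy whose optimality the abstract refers to.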
Volume (Year): 21 (1975)
Issue (Month): 11 (July)
Contact details of provider: Postal: 7240 Parkway Drive, Suite 300, Hanover, MD 21076 USA
Web page: http://www.informs.org/
Handle: RePEc:inm:ormnsc:v:21:y:1975:i:11:p:1225-1233