
Perturbation Theory and Undiscounted Markov Renewal Programming

Author

  • Paul J. Schweitzer (Institute for Defense Analyses, Arlington, Virginia)

Abstract

A recently developed perturbation formalism for finite Markov chains is used here to analyze the policy iteration algorithm for undiscounted, single-chain Markov renewal programming. The relative values are shown to be essentially partial derivatives of the gain rate with respect to the transition probabilities, and they rank the states by indicating desirable changes in the probabilistic structure. This both implies the optimality of nonrandomized policies and suggests a gradient technique for optimizing the gain rate with respect to a parameter. The policy iteration algorithm is shown to be a steepest-ascent technique in policy space: the successor to a given policy is chosen in the direction that maximizes the directional derivative of the gain rate. The appearance of the original policy's gain and relative values during policy improvement is explained by the fact that these quantities essentially determine the gradient of the gain rate.
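
To make the abstract's claims concrete, here is an illustrative reconstruction under stated assumptions, not text or code from the paper. In the special case of unit holding times (an ordinary Markov decision chain rather than the paper's semi-Markov renewal setting), the standard perturbation identity behind the first claim is

    \left.\frac{dg}{d\varepsilon}\right|_{\varepsilon=0} = \pi Q h,
    \qquad P(\varepsilon) = P + \varepsilon Q,\quad Q\mathbf{1} = 0,

where \pi is the stationary distribution of P and h is the vector of relative values solving g\mathbf{1} + (I - P)h = r. Shifting probability mass in row i from state k to state j therefore changes the gain at rate \pi_i(h_j - h_k), which is the sense in which the relative values rank the states by indicating desirable changes in the probabilistic structure.

The Python sketch below implements Howard-style policy iteration for this average-reward case. All names (policy_iteration_avg, the P and r inputs) are hypothetical, and unit holding times are assumed; the Markov renewal case would additionally weight the gain by the expected sojourn time in each state.

    import numpy as np

    def policy_iteration_avg(P, r, max_iter=100):
        """Howard-style policy iteration for an undiscounted (average-reward)
        MDP with a single recurrent chain and unit holding times.

        P: list of (n x n) transition matrices, one per action.
        r: list of length-n one-step reward vectors, one per action.
        Returns the optimal gain g, relative values h, and policy.
        """
        n = P[0].shape[0]
        policy = np.zeros(n, dtype=int)
        for _ in range(max_iter):
            # Value determination: solve g*1 + (I - P_pi) h = r_pi with h[0] = 0.
            P_pi = np.array([P[policy[i]][i] for i in range(n)])
            r_pi = np.array([r[policy[i]][i] for i in range(n)])
            A = np.eye(n) - P_pi
            A[:, 0] = 1.0  # column 0 now carries the unknown gain g, since h[0] = 0
            x = np.linalg.solve(A, r_pi)
            g, h = x[0], x.copy()
            h[0] = 0.0
            # Policy improvement: in each state, maximize the test quantity r(a) + P(a) h.
            Q = np.array([r[a] + P[a] @ h for a in range(len(P))])
            new_policy = Q.argmax(axis=0)
            if np.array_equal(new_policy, policy):
                break  # policy is stable, hence gain-optimal
            policy = new_policy
        return g, h, policy

    # Made-up two-state, two-action example:
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.5, 0.5], [0.6, 0.4]])]
    r = [np.array([1.0, 0.0]), np.array([0.8, 0.3])]
    g, h, best = policy_iteration_avg(P, r)

The improvement step is where the steepest-ascent interpretation enters: among the candidate rows that could replace row i of the current transition matrix, the algorithm selects the one with the largest test quantity, i.e., the direction in which the directional derivative of the gain rate is greatest.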

Suggested Citation

  • Paul J. Schweitzer, 1969. "Perturbation Theory and Undiscounted Markov Renewal Programming," Operations Research, INFORMS, vol. 17(4), pages 716-727, August.
  • Handle: RePEc:inm:oropre:v:17:y:1969:i:4:p:716-727
    DOI: 10.1287/opre.17.4.716

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.17.4.716
    Download Restriction: no


    Citations

    Cited by:

    1. Dijk, N.M. van, 1991. "An improved error bound theorem for approximate Markov chains," Serie Research Memoranda 0084, VU University Amsterdam, Faculty of Economics, Business Administration and Econometrics.
    2. Dijk, N.M. van, 1991. "On error bound analysis for transient continuous-time Markov reward structures," Serie Research Memoranda 0005, VU University Amsterdam, Faculty of Economics, Business Administration and Econometrics.


