
Approximation Benefits of Policy Gradient Methods with Aggregated States

Author

Listed:
  • Daniel Russo

    (Graduate School of Business, Columbia University, New York, New York 10027)

Abstract

Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or the value function approximation is held constant over partitions. This paper shows that a policy gradient method converges to a policy whose regret per period is bounded by ε, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as ε/(1 − γ), where γ is the discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust.
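The sketch below is a hedged illustration of the comparison described in the abstract, not the paper's construction. It builds a tiny two-state discounted MDP (the transition probabilities, rewards, discount factor, initial distribution, step size, and iteration counts are all assumptions chosen here for illustration), aggregates both states into a single partition, and runs (i) softmax policy gradient whose parameters are shared within each partition and (ii) approximate policy iteration that acts greedily with respect to a partition-averaged Q estimate.

```python
# Illustrative sketch only: a two-state, two-action discounted MDP in which both
# states fall in one partition (inherent misspecification). We compare softmax
# policy gradient over the aggregated parameterization with approximate policy
# iteration that is greedy w.r.t. a state-aggregated Q estimate.
import numpy as np

gamma = 0.95                      # discount factor (assumed)
n_states, n_actions = 2, 2

# Transition kernel P[s, a, s'] and reward r[s, a]; numbers chosen only for illustration.
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [1.0, 0.0]; P[0, 1] = [0.0, 1.0]
P[1, 0] = [1.0, 0.0]; P[1, 1] = [0.0, 1.0]
r = np.array([[1.0, 0.0],
              [0.0, 1.1]])

agg = np.array([0, 0])            # both states share one partition
rho = np.ones(n_states) / n_states  # initial-state distribution (assumed uniform)

def policy_value(pi):
    """Exact V^pi from the linear system (I - gamma * P_pi) V = r_pi."""
    P_pi = np.einsum('sa,sap->sp', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, r)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def q_values(pi):
    """Exact Q^pi(s, a) = r(s, a) + gamma * sum_{s'} P(s, a, s') V^pi(s')."""
    return r + gamma * P @ policy_value(pi)

def softmax(z):
    z = np.exp(z - z.max())
    return z / z.sum()

# (i) Policy gradient with aggregated states: pi(a|s) = softmax(theta[agg[s]]).
theta = np.zeros((agg.max() + 1, n_actions))
for _ in range(5000):
    pi = np.array([softmax(theta[agg[s]]) for s in range(n_states)])
    Q = q_values(pi)
    P_pi = np.einsum('sa,sap->sp', pi, P)
    # Discounted occupancy measure d = (1 - gamma) * rho^T (I - gamma * P_pi)^{-1}.
    d = (1 - gamma) * rho @ np.linalg.inv(np.eye(n_states) - gamma * P_pi)
    grad = np.zeros_like(theta)
    for s in range(n_states):
        adv = Q[s] - pi[s] @ Q[s]                       # advantage vector at state s
        grad[agg[s]] += d[s] * pi[s] * adv / (1 - gamma)  # shared-parameter gradient
    theta += 0.1 * grad
pi_pg = np.array([softmax(theta[agg[s]]) for s in range(n_states)])

# (ii) Approximate policy iteration with the same aggregation: average Q over each
# partition, then act greedily on the averaged estimate.
pi_api = np.ones((n_states, n_actions)) / n_actions
for _ in range(50):
    Q = q_values(pi_api)
    Q_hat = np.array([Q[agg == agg[s]].mean(axis=0) for s in range(n_states)])
    pi_api = np.eye(n_actions)[Q_hat.argmax(axis=1)]

print("policy gradient value:        ", policy_value(pi_pg))
print("approx. policy iteration value:", policy_value(pi_api))
```

Running the script prints the exact values of the two resulting policies. The toy instance only shows the mechanics of the two methods under a shared aggregated representation; it is not chosen to exhibit the worst-case gap, which the paper bounds at ε per period for policy gradient versus ε/(1 − γ) for approximate policy and value iteration.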

Suggested Citation

  • Daniel Russo, 2023. "Approximation Benefits of Policy Gradient Methods with Aggregated States," Management Science, INFORMS, vol. 69(11), pages 6898-6911, November.
  • Handle: RePEc:inm:ormnsc:v:69:y:2023:i:11:p:6898-6911
    DOI: 10.1287/mnsc.2023.4788

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2023.4788
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2023.4788?utm_source=ideas
LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access through your library subscription.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormnsc:v:69:y:2023:i:11:p:6898-6911. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.