Printed from https://ideas.repec.org/a/inm/oropre/v71y2023i4p1040-1054.html

Dynamic Programming Principles for Mean-Field Controls with Learning

Authors
  • Haotian Gu

    (Department of Mathematics, University of California, Berkeley, California 94720)

  • Xin Guo

    (Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720)

  • Xiaoli Wei

    (Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, China)

  • Renyuan Xu

    (Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90001)

Abstract

The dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and, more recently, mean-field controls (MFCs). However, in the learning framework of MFCs, the DPP has not been rigorously established, despite its critical importance for algorithm design. In this paper, we first present a simple example in MFCs with learning where the DPP fails with a misspecified Q function and then propose the correct form of the Q function in an appropriate space for MFCs with learning. This particular form of the Q function differs from the classical one and is called the IQ function. In the special case when the transition probability and the reward are independent of the mean-field information, it integrates the classical Q function for single-agent RL over the state-action distribution. In other words, MFCs with learning can be viewed as lifting classical RL by replacing the state-action space with its probability distribution space. This identification of the IQ function enables us to establish the DPP precisely in the learning framework of MFCs. Finally, we illustrate through numerical experiments the time consistency of this IQ function.
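
The special case mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a finite state and action space, transitions and rewards that do not depend on the mean-field term, and that "the classical Q function" is the optimal single-agent Q obtained by value iteration. The names `classical_q`, `integrated_q`, `mu` (state distribution), and `h` (state-conditional action distribution) are hypothetical and chosen for illustration only.

```python
import numpy as np


def classical_q(P, r, gamma, iters=500):
    """Single-agent Q-value iteration (assumed here to be 'the classical Q').

    P[s, a, s2] is the transition probability, r[s, a] the reward.
    """
    q = np.zeros_like(r, dtype=float)
    for _ in range(iters):
        v = q.max(axis=1)        # greedy state value, shape (S,)
        q = r + gamma * (P @ v)  # Bellman optimality update, shape (S, A)
    return q


def integrated_q(q, mu, h):
    """Integrate q over the state-action distribution nu(s, a) = mu[s] * h[s, a].

    mu[s] is a distribution over states; each row h[s, :] is a distribution
    over actions (a hypothetical encoding of a local policy).
    """
    nu = mu[:, None] * h         # joint state-action distribution
    return float((nu * q).sum())


if __name__ == "__main__":
    # Toy two-state, two-action problem (made-up numbers for illustration).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    r = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    q = classical_q(P, r, gamma=0.9)
    mu = np.array([0.5, 0.5])
    h = np.array([[1.0, 0.0],
                  [0.3, 0.7]])
    print(integrated_q(q, mu, h))
```

In this sketch the lifted object takes a distribution pair `(mu, h)` as its argument instead of a single state-action pair, which mirrors the abstract's description of replacing the state-action space with its probability distribution space. When the transition or reward does depend on the mean-field information, this simple integration no longer applies.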

Suggested Citation

  • Haotian Gu & Xin Guo & Xiaoli Wei & Renyuan Xu, 2023. "Dynamic Programming Principles for Mean-Field Controls with Learning," Operations Research, INFORMS, vol. 71(4), pages 1040-1054, July.
  • Handle: RePEc:inm:oropre:v:71:y:2023:i:4:p:1040-1054
    DOI: 10.1287/opre.2022.2395

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2022.2395
    Download Restriction: no

