Thompson Sampling with Information Relaxation Penalties

Authors
  • Seungki Min

    (KAIST, Daejeon 34141, Republic of Korea)

  • Costis Maglaras

    (Columbia University, New York, New York 10027)

  • Ciamac C. Moallemi

    (Columbia University, New York, New York 10027)

Abstract

We consider a finite-horizon multiarmed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework. With this framework, we define an intuitive family of control policies that includes Thompson sampling (TS) and the Bayesian optimal policy as endpoints. Analogous to TS, which at each decision epoch pulls an arm that is best with respect to the randomly sampled parameters, our algorithms sample entire future reward realizations and take the corresponding best action. However, this is done in the presence of “penalties” that seek to compensate for the availability of future information. We develop several novel policies and performance bounds for MAB problems; between the two endpoints, these offer improving performance at the cost of increasing computational complexity. Our policies can be viewed as natural generalizations of TS that simultaneously incorporate knowledge of the time horizon and explicitly consider the exploration-exploitation trade-off. We prove associated structural results on performance bounds and suboptimality gaps. Numerical experiments suggest that this new class of policies performs well, particularly in settings where the finite time horizon introduces significant exploration-exploitation tension into the problem. Finally, inspired by the finite-horizon Gittins index, we propose an index policy that builds on our framework and outperforms the state-of-the-art algorithms in our numerical experiments.
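
For readers unfamiliar with the TS endpoint mentioned in the abstract, the following is a minimal sketch of standard Thompson sampling on a Beta-Bernoulli bandit. It is not the penalized policies proposed in the article; the function name `thompson_sampling`, the uniform Beta(1, 1) priors, and the Bernoulli reward model are assumptions chosen only for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Illustrative Beta-Bernoulli Thompson sampling (hypothetical helper).

    true_means: success probability of each arm (unknown to the policy).
    horizon:    number of decision epochs.
    Returns the pull count per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Assumed Beta(1, 1) prior on every arm's mean reward.
    alpha = [1.0] * k
    beta = [1.0] * k
    pulls = [0] * k
    total_reward = 0

    for _ in range(horizon):
        # Sample one parameter per arm from the current posterior.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Pull the arm that is best with respect to the sampled parameters.
        arm = max(range(k), key=lambda i: sampled[i])
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.8], horizon=500, seed=1)
    print("pulls per arm:", pulls, "total reward:", total)
```

Note that this sketch never uses `horizon` when choosing an arm; the abstract describes the proposed policies as generalizations of TS that do incorporate knowledge of the time horizon.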

Suggested Citation

  • Seungki Min & Costis Maglaras & Ciamac C. Moallemi, 2025. "Thompson Sampling with Information Relaxation Penalties," Management Science, INFORMS, vol. 71(3), pages 1988-2010, March.
  • Handle: RePEc:inm:ormnsc:v:71:y:2025:i:3:p:1988-2010
    DOI: 10.1287/mnsc.2020.01396

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2020.01396
    Download Restriction: no

