
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Authors

  • Aurélien Garivier

    (Institut de Mathématiques de Toulouse (IMT): Université Paul Sabatier—The French National Research Center (CNRS), 31062 Toulouse, France)

  • Pierre Ménard

    (Institut de Mathématiques de Toulouse (IMT): Université Paul Sabatier—The French National Research Center (CNRS), 31062 Toulouse, France)

  • Gilles Stoltz

    (HEC Paris Management Research Group (GREGHEC): HEC Paris—CNRS, 78351 Jouy-en-Josas, France)

Abstract

We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.

Suggested Citation

  • Aurélien Garivier & Pierre Ménard & Gilles Stoltz, 2019. "Explore First, Exploit Next: The True Shape of Regret in Bandit Problems," Mathematics of Operations Research, INFORMS, vol. 44(2), pages 377-399, May.
  • Handle: RePEc:inm:ormoor:v:44:y:2019:i:2:p:377-399
    DOI: 10.1287/moor.2017.0928

    Download full text from publisher

    File URL: https://doi.org/10.1287/moor.2017.0928
    Download Restriction: no




    Cited by:

    1. Masahiro Kato & Kaito Ariu, 2021. "The Role of Contextual Information in Best Arm Identification," Papers 2106.14077, arXiv.org, revised Feb 2024.
    2. Apostolos Burnetas, 2022. "Learning and data-driven optimization in queues with strategic customers," Queueing Systems: Theory and Applications, Springer, vol. 100(3), pages 517-519, April.
