IDEAS home Printed from https://ideas.repec.org/a/inm/ormnsc/v62y2016i1p264-285.html
   My bibliography  Save this article

Robust Multiarmed Bandit Problems

Author

Listed:
  • Michael Jong Kim

    (Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada)

  • Andrew E.B. Lim

    (Department of Decision Sciences and Department of Finance, NUS Business School, National University of Singapore, Singapore 119245)

Abstract

The multiarmed bandit problem is a popular framework for studying the exploration versus exploitation trade-off. Recent applications include dynamic assortment design, Internet advertising, dynamic pricing, and the control of queues. The standard mathematical formulation for a bandit problem makes the strong assumption that the decision maker has a full characterization of the joint distribution of the rewards, and that “arms” under this distribution are independent. These assumptions are not satisfied in many applications, and the out-of-sample performance of policies that optimize a misspecified model can be poor. Motivated by these concerns, we formulate a robust bandit problem in which a decision maker accounts for distrust in the nominal model by solving a worst-case problem against an adversary (“nature”) who has the ability to alter the underlying reward distribution and does so to minimize the decision maker’s expected total profit. Structural properties of the optimal worst-case policy are characterized by using the robust Bellman (dynamic programming) equation, and arms are shown to be no longer independent under nature’s worst-case response. One implication of this is that index policies are not optimal for the robust problem, and we propose, as an alternative, a robust version of the Gittins index. Performance bounds for the robust Gittins index are derived by using structural properties of the value function together with ideas from stochastic dynamic programming duality. We also investigate the performance of the robust Gittins index policy when applied to a Bayesian webpage design problem. In the presence of model misspecification, numerical experiments show that the robust Gittins index policy not only outperforms the classical Gittins index policy, but also substantially reduces the variability in the out-of-sample performance. This paper was accepted by Dimitris Bertsimas, optimization.

Suggested Citation

  • Michael Jong Kim & Andrew E.B. Lim, 2016. "Robust Multiarmed Bandit Problems," Management Science, INFORMS, vol. 62(1), pages 264-285, January.
  • Handle: RePEc:inm:ormnsc:v:62:y:2016:i:1:p:264-285
    DOI: 10.1287/mnsc.2015.2153
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2015.2153
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2015.2153?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Hansen, Lars Peter & Sargent, Thomas J., 2007. "Recursive robust estimation and control without commitment," Journal of Economic Theory, Elsevier, vol. 136(1), pages 1-27, September.
    2. Carri W. Chan & Vivek F. Farias, 2009. "Stochastic Depletion Problems: Effective Myopic Policies for a Class of Dynamic Optimization Problems," Mathematics of Operations Research, INFORMS, vol. 34(2), pages 333-350, May.
    3. Larry G. Epstein & Martin Schneider, 2007. "Learning Under Ambiguity," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 74(4), pages 1275-1303.
    4. Andrea Grove & Gary A. Berg (ed.), 2014. "Social Business," Springer Books, Springer, edition 127, number 978-3-642-45275-8, January.
    5. Wolfram Wiesemann & Daniel Kuhn & Berç Rustem, 2013. "Robust Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 38(1), pages 153-183, February.
    6. ,, 2000. "Problems And Solutions," Econometric Theory, Cambridge University Press, vol. 16(2), pages 287-299, April.
    7. David B. Brown & James E. Smith & Peng Sun, 2010. "Information Relaxations and Duality in Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 58(4-part-1), pages 785-801, August.
    8. Garud N. Iyengar, 2005. "Robust Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 30(2), pages 257-280, May.
    9. A. Ben-Tal & A. Nemirovski, 1998. "Robust Convex Optimization," Mathematics of Operations Research, INFORMS, vol. 23(4), pages 769-805, November.
    10. Dimitris Bertsimas & Melvyn Sim, 2004. "The Price of Robustness," Operations Research, INFORMS, vol. 52(1), pages 35-53, February.
    11. Andrew E. B. Lim & J. George Shanthikumar, 2007. "Relative Entropy, Exponential Utility, and Robust Dynamic Pricing," Operations Research, INFORMS, vol. 55(2), pages 198-214, April.
    12. David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
    13. Epstein, Larry G. & Schneider, Martin, 2003. "Recursive multiple-priors," Journal of Economic Theory, Elsevier, vol. 113(1), pages 1-31, November.
    14. Arnab Nilim & Laurent El Ghaoui, 2005. "Robust Control of Markov Decision Processes with Uncertain Transition Matrices," Operations Research, INFORMS, vol. 53(5), pages 780-798, October.
    15. Felipe Caro & Jérémie Gallien, 2007. "Dynamic Assortment with Demand Learning for Seasonal Consumer Goods," Management Science, INFORMS, vol. 53(2), pages 276-292, February.
    16. Hansen, Lars Peter & Sargent, Thomas J., 2005. "Robust estimation and control under commitment," Journal of Economic Theory, Elsevier, vol. 124(2), pages 258-301, October.
    17. Jonathan Li & Roy Kwon, 2013. "Portfolio selection under model uncertainty: a penalized moment-based optimization approach," Journal of Global Optimization, Springer, vol. 56(1), pages 131-164, May.
    18. Andrew E. B. Lim & J. George Shanthikumar & Gah-Yi Vahn, 2012. "Robust Portfolio Choice with Learning in the Framework of Regret: Single-Period Case," Management Science, INFORMS, vol. 58(9), pages 1732-1746, September.
    19. Paat Rusmevichientong & Zuo-Jun Max Shen & David B. Shmoys, 2010. "Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint," Operations Research, INFORMS, vol. 58(6), pages 1666-1680, December.
    20. Dimitris Bertsimas & José Niño-Mora, 1996. "Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems," Mathematics of Operations Research, INFORMS, vol. 21(2), pages 257-306, May.
    21. Ilya O. Ryzhov & Warren B. Powell & Peter I. Frazier, 2012. "The Knowledge Gradient Algorithm for a General Class of Online Learning Problems," Operations Research, INFORMS, vol. 60(1), pages 180-195, February.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Nikolsko-Rzhevskyy, Alex & Papell, David H. & Prodan, Ruxandra, 2017. "The Yellen rules," Journal of Macroeconomics, Elsevier, vol. 54(PA), pages 59-71.
    2. Xikui Wang & You Liang & Lysa Porth, 2019. "A Bayesian two‐armed bandit model," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 35(3), pages 624-636, May.
    3. Michael Jong Kim, 2020. "Variance Regularization in Sequential Bayesian Optimization," Mathematics of Operations Research, INFORMS, vol. 45(3), pages 966-992, August.
    4. Xu, Jianyu & Chen, Lujie & Tang, Ou, 2021. "An online algorithm for the risk-aware restless bandit," European Journal of Operational Research, Elsevier, vol. 290(2), pages 622-639.
    5. Keskin, Burcu B. & Griffin, Emily C. & Prell, Jonathan O. & Dilkina, Bistra & Ferber, Aaron & MacDonald, John & Hilend, Rowan & Griffis, Stanley & Gore, Meredith L., 2023. "Quantitative Investigation of Wildlife Trafficking Supply Chains: A Review," Omega, Elsevier, vol. 115(C).
    6. David B. Brown & Martin B. Haugh, 2017. "Information Relaxation Bounds for Infinite Horizon Markov Decision Processes," Operations Research, INFORMS, vol. 65(5), pages 1355-1379, October.
    7. Eisenberg, Julia & Krühner, Paul, 2018. "The impact of negative interest rates on optimal capital injections," Insurance: Mathematics and Economics, Elsevier, vol. 82(C), pages 1-10.
    8. Malekipirbazari, Milad & Çavuş, Özlem, 2024. "Index policy for multiarmed bandit problem with dynamic risk measures," European Journal of Operational Research, Elsevier, vol. 312(2), pages 627-640.
    9. Flores-Szwagrzak, Karol, 2022. "Learning by Convex Combination," Working Papers 16-2022, Copenhagen Business School, Department of Economics.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Jong Kim, 2016. "Robust Control of Partially Observable Failing Systems," Operations Research, INFORMS, vol. 64(4), pages 999-1014, August.
    2. Shie Mannor & Ofir Mebel & Huan Xu, 2016. "Robust MDPs with k -Rectangular Uncertainty," Mathematics of Operations Research, INFORMS, vol. 41(4), pages 1484-1509, November.
    3. Bren, Austin & Saghafian, Soroush, 2018. "Data-Driven Percentile Optimization for Multi-Class Queueing Systems with Model Ambiguity: Theory and Application," Working Paper Series rwp18-008, Harvard University, John F. Kennedy School of Government.
    4. Rasouli, Mohammad & Saghafian, Soroush, 2018. "Robust Partially Observable Markov Decision Processes," Working Paper Series rwp18-027, Harvard University, John F. Kennedy School of Government.
    5. Saghafian, Soroush, 2018. "Ambiguous partially observable Markov decision processes: Structural results and applications," Journal of Economic Theory, Elsevier, vol. 178(C), pages 1-35.
    6. Andrew J. Keith & Darryl K. Ahner, 2021. "A survey of decision making and optimization under uncertainty," Annals of Operations Research, Springer, vol. 300(2), pages 319-353, May.
    7. Bakker, Hannah & Dunke, Fabian & Nickel, Stefan, 2020. "A structuring review on multi-stage optimization under uncertainty: Aligning concepts from theory and practice," Omega, Elsevier, vol. 96(C).
    8. Maximilian Blesch & Philipp Eisenhauer, 2021. "Robust decision-making under risk and ambiguity," Papers 2104.12573, arXiv.org, revised Oct 2021.
    9. Hansen, Lars Peter & Mayer, Ricardo & Sargent, Thomas, 2010. "Robust hidden Markov LQG problems," Journal of Economic Dynamics and Control, Elsevier, vol. 34(10), pages 1951-1966, October.
    10. Jose Blanchet & Karthyek Murthy, 2019. "Quantifying Distributional Model Risk via Optimal Transport," Mathematics of Operations Research, INFORMS, vol. 44(2), pages 565-600, May.
    11. Nicole Bauerle & Alexander Glauner, 2020. "Distributionally Robust Markov Decision Processes and their Connection to Risk Measures," Papers 2007.13103, arXiv.org.
    12. Xin, Linwei & Goldberg, David A., 2021. "Time (in)consistency of multistage distributionally robust inventory models with moment constraints," European Journal of Operational Research, Elsevier, vol. 289(3), pages 1127-1141.
    13. Wei, Cansheng & Li, Yongjian & Cai, Xiaoqiang, 2011. "Robust optimal policies of production and inventory with uncertain returns and demand," International Journal of Production Economics, Elsevier, vol. 134(2), pages 357-367, December.
    14. Alexander Shapiro, 2016. "Rectangular Sets of Probability Measures," Operations Research, INFORMS, vol. 64(2), pages 528-541, April.
    15. Andrew E. B. Lim & J. George Shanthikumar, 2007. "Relative Entropy, Exponential Utility, and Robust Dynamic Pricing," Operations Research, INFORMS, vol. 55(2), pages 198-214, April.
    16. Alper Atamtürk & Muhong Zhang, 2007. "Two-Stage Robust Network Flow and Design Under Demand Uncertainty," Operations Research, INFORMS, vol. 55(4), pages 662-673, August.
    17. Fernandes, Betina & Street, Alexandre & Valladão, Davi & Fernandes, Cristiano, 2016. "An adaptive robust portfolio optimization model with loss constraints based on data-driven polyhedral uncertainty sets," European Journal of Operational Research, Elsevier, vol. 255(3), pages 961-970.
    18. Erim Kardeş & Fernando Ordóñez & Randolph W. Hall, 2011. "Discounted Robust Stochastic Games and an Application to Queueing Control," Operations Research, INFORMS, vol. 59(2), pages 365-382, April.
    19. Wenqing Chen & Melvyn Sim & Jie Sun & Chung-Piaw Teo, 2010. "From CVaR to Uncertainty Set: Implications in Joint Chance-Constrained Optimization," Operations Research, INFORMS, vol. 58(2), pages 470-485, April.
    20. Stefan Mišković, 2017. "A VNS-LP algorithm for the robust dynamic maximal covering location problem," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 39(4), pages 1011-1033, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormnsc:v:62:y:2016:i:1:p:264-285. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.