IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2312.16489.html

Best-of-Both-Worlds Linear Contextual Bandits

Author

Listed:
  • Masahiro Kato
  • Shinji Ito

Abstract

This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under an adversarial corruption. At each round, a decision-maker observes an independent and identically distributed context and then selects an arm based on the context and past observations. After selecting an arm, the decision-maker incurs a loss corresponding to the selected arm. The decision-maker aims to minimize the cumulative loss over the trial. The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint. We assume linear models for the relationship between the loss and the context. Then, we propose a strategy that extends the RealLinExp3 by Neu & Olkhovskaya (2020) and the Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}} + \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\ \ \sqrt{T}(\log(T))^2\right\}\right)$, where $T \in\mathbb{N}$ is the number of rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and suboptimal arms for any context, and $C\in[0, T] $ is an adversarial corruption parameter. This regret upper bound implies $O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ in a stochastic environment and by $O\left( \sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both stochastic and adversarial regimes.

Suggested Citation

  • Masahiro Kato & Shinji Ito, 2023. "Best-of-Both-Worlds Linear Contextual Bandits," Papers 2312.16489, arXiv.org.
  • Handle: RePEc:arx:papers:2312.16489
    as

    Download full text from publisher

    File URL: https://arxiv.org/pdf/2312.16489
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Hamsa Bastani & Mohsen Bayati, 2020. "Online Decision Making with High-Dimensional Covariates," Operations Research, INFORMS, vol. 68(1), pages 276-294, January.
    2. Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rong Jin & David Simchi-Levi & Li Wang & Xinshang Wang & Sen Yang, 2021. "Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses," Management Science, INFORMS, vol. 67(8), pages 4756-4771, August.
    2. Yining Wang & Boxiao Chen & David Simchi-Levi, 2021. "Multimodal Dynamic Pricing," Management Science, INFORMS, vol. 67(10), pages 6136-6152, October.
    3. Nian Si & Fan Zhang & Zhengyuan Zhou & Jose Blanchet, 2023. "Distributionally Robust Batch Contextual Bandits," Management Science, INFORMS, vol. 69(10), pages 5772-5793, October.
    4. Divya Singhvi & Somya Singhvi, 2025. "Online Learning with Sample Selection Bias," Operations Research, INFORMS, vol. 73(5), pages 2458-2476, September.
    5. Ying Zhong & L. Jeff Hong & Guangwu Liu, 2021. "Earning and Learning with Varying Cost," Production and Operations Management, Production and Operations Management Society, vol. 30(8), pages 2379-2394, August.
    6. Xue Wang & Mike Mingcheng Wei & Tao Yao, 2025. "Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data," Management Science, INFORMS, vol. 71(8), pages 6647-6665, August.
    7. Mohammad Zhalechian & Esmaeil Keyvanshokooh & Cong Shi & Mark P. Van Oyen, 2023. "Data-Driven Hospital Admission Control: A Learning Approach," Operations Research, INFORMS, vol. 71(6), pages 2111-2129, November.
    8. Esmaeil Keyvanshokooh & Mohammad Zhalechian & Cong Shi & Mark P. Van Oyen & Pooyan Kazemian, 2025. "Contextual Learning with Online Convex Optimization: Theory and Application to Medical Decision-Making," Management Science, INFORMS, vol. 71(12), pages 10442-10464, December.
    9. Sentao Miao & Xiuli Chao, 2022. "Online Personalized Assortment Optimization with High-Dimensional Customer Contextual Data," Manufacturing & Service Operations Management, INFORMS, vol. 24(5), pages 2741-2760, September.
    10. David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
    11. Hao Zhang, 2022. "Dynamic Learning and Decision Making via Basis Weight Vectors," Operations Research, INFORMS, vol. 70(3), pages 1835-1853, May.
    12. Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
    13. Mark Egan & Tomas Philipson, 2016. "Health Care Adherence and Personalized Medicine," Working Papers 2016-H01, Becker Friedman Institute for Research In Economics.
    14. Leon Yang Chu & Qi Feng & J. George Shanthikumar & Zuo-Jun Max Shen & Jian Wu, 2025. "Solving the Price-Setting Newsvendor Problem with Parametric Operational Data Analytics (ODA)," Management Science, INFORMS, vol. 71(8), pages 6627-6646, August.
    15. Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2024. "Policy Learning with Adaptively Collected Data," Management Science, INFORMS, vol. 70(8), pages 5270-5297, August.
    16. Kan Xu & Hamsa Bastani, 2025. "Multitask Learning and Bandits via Robust Statistics," Management Science, INFORMS, vol. 71(9), pages 7752-7773, September.
    17. Agrawal, Priyank & Tulabandhula, Theja & Avadhanula, Vashist, 2023. "A tractable online learning algorithm for the multinomial logit contextual bandit," European Journal of Operational Research, Elsevier, vol. 310(2), pages 737-750.
    18. Xin Zhou & Yang Li & Zemin Zheng & Jie Wu & Jiarui Zhang, 2025. "Reproducible Feature Selection for High-Dimensional Measurement Error Models," INFORMS Journal on Computing, INFORMS, vol. 37(5), pages 1350-1368, September.
    19. Haihui Shen & L. Jeff Hong & Xiaowei Zhang, 2021. "Ranking and Selection with Covariates for Personalized Decision Making," INFORMS Journal on Computing, INFORMS, vol. 33(4), pages 1500-1519, October.
    20. Yonatan Gur & Ahmadreza Momeni & Stefan Wager, 2022. "Smoothness-Adaptive Contextual Bandits," Operations Research, INFORMS, vol. 70(6), pages 3198-3216, November.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2312.16489. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: https://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.