IDEAS home Printed from https://ideas.repec.org/a/inm/oropre/v66y2018i1p230-252.html

Learning to Optimize via Information-Directed Sampling

Author

Listed:
  • Daniel Russo

    (Graduate School of Business, Columbia University, New York, New York 10027)

  • Benjamin Van Roy

    (Stanford University, Stanford, California 94305)

Abstract

We propose information-directed sampling —a new approach to online optimization problems in which a decision maker must balance between exploration and exploitation while learning from partial feedback. Each action is sampled in a manner that minimizes the ratio between squared expected single-period regret and a measure of information gain: the mutual information between the optimal action and the next observation. We establish an expected regret bound for information-directed sampling that applies across a very general class of models and scales with the entropy of the optimal action distribution. We illustrate through simple analytic examples how information-directed sampling accounts for kinds of information that alternative approaches do not adequately address and that this can lead to dramatic performance gains. For the widely studied Bernoulli, Gaussian, and linear bandit problems, we demonstrate state-of-the-art simulation performance. The electronic companion is available at https://doi.org/10.1287/opre.2017.1663 .

Suggested Citation

  • Daniel Russo & Benjamin Van Roy, 2018. "Learning to Optimize via Information-Directed Sampling," Operations Research, INFORMS, vol. 66(1), pages 230-252, January.
  • Handle: RePEc:inm:oropre:v:66:y:2018:i:1:p:230-252
    DOI: 10.1287/opre.2017.1663
    as

    Download full text from publisher

    File URL: https://doi.org/10.1287/opre.2017.1663
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.2017.1663?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Peter I. Frazier & Warren B. Powell, 2010. "Paradoxes in Learning and the Marginal Value of Information," Decision Analysis, INFORMS, vol. 7(4), pages 378-403, December.
    2. Sébastien Bubeck & Rémi Munos & Gilles Stoltz & Csaba Szepesvari, 2011. "X-Armed Bandits," Post-Print hal-00450235, HAL.
    3. Steven L. Scott, 2010. "A modern Bayesian look at the multi‐armed bandit," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 26(6), pages 639-658, November.
    4. Josef Broder & Paat Rusmevichientong, 2012. "Dynamic Pricing Under a General Parametric Choice Model," Operations Research, INFORMS, vol. 60(4), pages 965-980, August.
    5. Paat Rusmevichientong & Zuo-Jun Max Shen & David B. Shmoys, 2010. "Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint," Operations Research, INFORMS, vol. 58(6), pages 1666-1680, December.
    6. Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
    7. Jean-Yves Audibert & Sébastien Bubeck & Gábor Lugosi, 2014. "Regret in Online Combinatorial Optimization," Mathematics of Operations Research, INFORMS, vol. 39(1), pages 31-45, February.
    8. Ilya O. Ryzhov & Warren B. Powell & Peter I. Frazier, 2012. "The Knowledge Gradient Algorithm for a General Class of Online Learning Problems," Operations Research, INFORMS, vol. 60(1), pages 180-195, February.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Bart Van Parys & Negin Golrezaei, 2024. "Optimal Learning for Structured Bandits," Management Science, INFORMS, vol. 70(6), pages 3951-3998, June.
    2. Hanzhao Wang & Xiaocheng Li & Kalyan Talluri, 2022. "Learning to Sell a Focal-ancillary Combination," Papers 2207.11545, arXiv.org.
    3. Chao Qin & Daniel Russo, 2024. "Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification," Papers 2402.10592, arXiv.org, revised Jul 2024.
    4. Saeid Delshad & Amin Khademi, 2022. "Adaptive Design of Personalized Dose-Finding Clinical Trials," Service Science, INFORMS, vol. 14(4), pages 273-291, December.
    5. David B. Brown & Jingwei Zhang, 2022. "Dynamic Programs with Shared Resources and Signals: Dynamic Fluid Policies and Asymptotic Optimality," Operations Research, INFORMS, vol. 70(5), pages 3015-3033, September.
    6. Divya Singhvi & Somya Singhvi, 2025. "Online Learning with Sample Selection Bias," Operations Research, INFORMS, vol. 73(5), pages 2458-2476, September.
    7. Ying Jin & Zhuoran Yang & Zhaoran Wang, 2025. "Is Pessimism Provably Efficient for Offline Reinforcement Learning?," Mathematics of Operations Research, INFORMS, vol. 50(4), pages 2738-2793, November.
    8. Seungki Min & Costis Maglaras & Ciamac C. Moallemi, 2025. "Thompson Sampling with Information Relaxation Penalties," Management Science, INFORMS, vol. 71(3), pages 1988-2010, March.
    9. Boxiao Chen & Cong Shi, 2025. "Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning," Operations Research, INFORMS, vol. 73(4), pages 1723-1743, July.
    10. Vivek F. Farias & Eli Gutin, 2022. "Optimistic Gittins Indices," Operations Research, INFORMS, vol. 70(6), pages 3432-3456, November.
    11. Hamsa Bastani & Mohsen Bayati & Khashayar Khosravi, 2021. "Mostly Exploration-Free Algorithms for Contextual Bandits," Management Science, INFORMS, vol. 67(3), pages 1329-1349, March.
    12. Alok Baveja & Amit Chavan & Andrei Nikiforov & Aravind Srinivasan & Pan Xu, 2025. "Technical Note—Improved Sample-Complexity Bounds in Stochastic Optimization," Operations Research, INFORMS, vol. 73(2), pages 986-994, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yuqing Zhang & Neil Walton, 2019. "Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches," Papers 1907.05381, arXiv.org.
    2. Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.
    3. Daniel Russo & Benjamin Van Roy, 2022. "Satisficing in Time-Sensitive Bandit Learning," Mathematics of Operations Research, INFORMS, vol. 47(4), pages 2815-2839, November.
    4. David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
    5. Hao Zhang, 2022. "Dynamic Learning and Decision Making via Basis Weight Vectors," Operations Research, INFORMS, vol. 70(3), pages 1835-1853, May.
    6. Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
    7. Rong Jin & David Simchi-Levi & Li Wang & Xinshang Wang & Sen Yang, 2021. "Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses," Management Science, INFORMS, vol. 67(8), pages 4756-4771, August.
    8. Yuhang Wu & Zeyu Zheng & Guangyu Zhang & Zuohua Zhang & Chu Wang, 2025. "Nonstationary A/B Tests: Optimal Variance Reduction, Bias Correction, and Valid Inference," Management Science, INFORMS, vol. 71(6), pages 4707-4727, June.
    9. Kohei Kawaguchi, 2021. "When Will Workers Follow an Algorithm? A Field Experiment with a Retail Business," Management Science, INFORMS, vol. 67(3), pages 1670-1695, March.
    10. Li Chen & Adam J.Mersereau & Zhe (Frank) Wang, 2017. "Optimal Merchandise Testing with Limited Inventory," Operations Research, INFORMS, vol. 65(4), pages 968-991, August.
    11. Xi Chen & Yining Wang, 2023. "Robust Dynamic Pricing with Demand Learning in the Presence of Outlier Customers," Operations Research, INFORMS, vol. 71(4), pages 1362-1386, July.
    12. Agrawal, Priyank & Tulabandhula, Theja & Avadhanula, Vashist, 2023. "A tractable online learning algorithm for the multinomial logit contextual bandit," European Journal of Operational Research, Elsevier, vol. 310(2), pages 737-750.
    13. Yining Wang & Boxiao Chen & David Simchi-Levi, 2021. "Multimodal Dynamic Pricing," Management Science, INFORMS, vol. 67(10), pages 6136-6152, October.
    14. David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
    15. Arnoud V. den Boer & Boxiao Chen & Yining Wang, 2024. "Pricing and Positioning of Horizontally Differentiated Products with Incomplete Demand Information," Operations Research, INFORMS, vol. 72(6), pages 2446-2466, November.
    16. Shipra Agrawal & Vashist Avadhanula & Vineet Goyal & Assaf Zeevi, 2019. "MNL-Bandit: A Dynamic Learning Approach to Assortment Selection," Operations Research, INFORMS, vol. 67(5), pages 1453-1485, September.
    17. N. Bora Keskin & Assaf Zeevi, 2017. "Chasing Demand: Learning and Earning in a Changing Environment," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 277-307, May.
    18. Ying Zhong & L. Jeff Hong & Guangwu Liu, 2021. "Earning and Learning with Varying Cost," Production and Operations Management, Production and Operations Management Society, vol. 30(8), pages 2379-2394, August.
    19. Xi Chen & David Simchi-Levi & Yining Wang, 2026. "Utility Fairness in Contextual Dynamic Pricing with Demand Learning," Management Science, INFORMS, vol. 72(3), pages 2619-2633, March.
    20. Wang Chi Cheung & David Simchi-Levi & He Wang, 2017. "Technical Note—Dynamic Pricing and Demand Learning with Limited Price Experimentation," Operations Research, INFORMS, vol. 65(6), pages 1722-1731, December.

    More about this item

    Keywords

    ;
    ;
    ;
    ;

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:66:y:2018:i:1:p:230-252. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.