
A modern Bayesian look at the multi‐armed bandit

Author

Listed:
  • Steven L. Scott

Abstract

A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
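
The allocation rule described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes Bernoulli payoffs with independent Beta(1, 1) priors, and the function names (`choose_arm`, `optimal_arm_probabilities`, `run_bandit`) and the simulated success rates are hypothetical. Drawing one value per arm from its posterior and playing the arm with the largest draw selects each arm with its posterior probability of being optimal.

```python
import random


def posterior_draw(successes, failures, rng):
    """Draw one success rate per arm from its Beta posterior (Beta(1, 1) prior assumed)."""
    return [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]


def choose_arm(successes, failures, rng):
    """Play the arm with the largest posterior draw.

    Each arm is chosen with the posterior probability that it is optimal.
    """
    draw = posterior_draw(successes, failures, rng)
    return draw.index(max(draw))


def optimal_arm_probabilities(successes, failures, rng, draws=2000):
    """Monte Carlo estimate of the posterior probability that each arm is optimal."""
    wins = [0] * len(successes)
    for _ in range(draws):
        draw = posterior_draw(successes, failures, rng)
        wins[draw.index(max(draw))] += 1
    return [w / draws for w in wins]


def run_bandit(true_rates, rounds, seed=0):
    """Simulate the bandit against hypothetical true success rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = choose_arm(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.10, 0.50], rounds=500, seed=1)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
    print("P(optimal):", optimal_arm_probabilities(s, f, random.Random(2)))
```

For other payoff distributions, the same idea would apply with `posterior_draw` replaced by a sampler for the relevant posterior; the Beta-Bernoulli case is used here only because its posterior is available in closed form.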

Suggested Citation

  • Steven L. Scott, 2010. "A modern Bayesian look at the multi‐armed bandit," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 26(6), pages 639-658, November.
  • Handle: RePEc:wly:apsmbi:v:26:y:2010:i:6:p:639-658
    DOI: 10.1002/asmb.874

    Download full text from publisher

    File URL: https://doi.org/10.1002/asmb.874
    Download Restriction: no


    Citations

    Cited by:

    1. T. Law & J. Shawe-Taylor, 2017. "Practical Bayesian support vector regression for financial time series prediction and market condition change detection," Quantitative Finance, Taylor & Francis Journals, vol. 17(9), pages 1403-1416, September.
    2. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    3. Hana Choi & Carl F. Mela & Santiago R. Balseiro & Adam Leary, 2020. "Online Display Advertising Markets: A Literature Review and Future Directions," Information Systems Research, INFORMS, vol. 31(2), pages 556-575, June.
    4. Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
    5. Chao Qin & Daniel Russo, 2024. "Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification," Papers 2402.10592, arXiv.org.
    6. Dean Eckles & Maurits Kaptein, 2019. "Bootstrap Thompson Sampling and Sequential Decision Problems in the Behavioral Sciences," SAGE Open, vol. 9(2), pages 21582440198, June.
    7. Sareh Nabi & Houssam Nassif & Joseph Hong & Hamed Mamani & Guido Imbens, 2022. "Bayesian Meta-Prior Learning Using Empirical Bayes," Management Science, INFORMS, vol. 68(3), pages 1737-1755, March.
    8. Daniel Russo & Benjamin Van Roy, 2018. "Learning to Optimize via Information-Directed Sampling," Operations Research, INFORMS, vol. 66(1), pages 230-252, January.
    9. Kohei Kawaguchi, 2021. "When Will Workers Follow an Algorithm? A Field Experiment with a Retail Business," Management Science, INFORMS, vol. 67(3), pages 1670-1695, March.
    10. Mingyu Joo & Michael L. Thompson & Greg M. Allenby, 2019. "Optimal Product Design by Sequential Experiments in High Dimensions," Management Science, INFORMS, vol. 65(7), pages 3235-3254, July.
    11. Stefano Caria & Grant Gordon & Maximilian Kasy & Simon Quinn & Soha Shami & Alexander Teytelboym, 2020. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," CESifo Working Paper Series 8535, CESifo.
    12. Ben Vinod & Richard Ratliff & Vikram Jayaram, 2018. "An approach to offer management: maximizing sales with fare products and ancillaries," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 17(2), pages 91-101, April.
    13. Duflo, Esther & Banerjee, Abhijit & Keniston, Daniel, 2019. "The Efficient Deployment of Police Resources: Theory and New Evidence from a Randomized Drunk Driving Crackdown in India," CEPR Discussion Papers 13981, C.E.P.R. Discussion Papers.
    14. Gui Liberali & Alina Ferecatu, 2022. "Morphing for Consumer Dynamics: Bandits Meet Hidden Markov Models," Marketing Science, INFORMS, vol. 41(4), pages 769-794, July.
    15. Alina Ferecatu & Arnaud De Bruyn, 2022. "Understanding Managers’ Trade-Offs Between Exploration and Exploitation," Marketing Science, INFORMS, vol. 41(1), pages 139-165, January.
    16. Maria Dimakopoulou & Zhimei Ren & Zhengyuan Zhou, 2021. "Online Multi-Armed Bandits with Adaptive Inference," Papers 2102.13202, arXiv.org, revised Jun 2021.
    17. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    18. Manini Madireddy & Ramasubramanian Sundararajan & Goda Doreswamy & Meisam Hejazi Nia & Amod Mital, 2017. "Constructing bundled offers for airline customers," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 16(6), pages 532-552, December.
    19. Mauersberger, Felix, 2021. "Monetary policy rules in a non-rational world: A macroeconomic experiment," Journal of Economic Theory, Elsevier, vol. 197(C).
    20. Elea McDonnell Feit & Ron Berman, 2019. "Test & Roll: Profit-Maximizing A/B Tests," Marketing Science, INFORMS, vol. 38(6), pages 1038-1058, November.
    21. Po-Yi Liu & Chi-Hua Wang & Henghsiu Tsai, 2022. "Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing," Papers 2208.09372, arXiv.org, revised Sep 2022.
    22. Yixin Tang & Yicong Lin & Navdeep S. Sahni, 2023. "Business Policy Experiments using Fractional Factorial Designs: Consumer Retention on DoorDash," Papers 2311.14698, arXiv.org, revised Nov 2023.
