Simple Bayesian Algorithms for Best-Arm Identification

My bibliography Save this article

Simple Bayesian Algorithms for Best-Arm Identification

Author

Listed:

Daniel Russo
(Columbia University, New York, New York 10027)

Registered:

Abstract

This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality with the goal of confidently identifying the best design after a small number of measurements. This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort and formalizes a sense in which these seemingly naive rules are the best possible. One proposal is top-two probability sampling, which computes the two designs with the highest posterior probability of being optimal and then randomizes to select among these two. One is a variant of top-two sampling that considers not only the probability that a design is optimal, but the expected amount by which its quality exceeds that of other designs. The final algorithm is a modified version of Thompson sampling that is tailored for identifying the best design. We prove that these simple algorithms satisfy a sharp optimality property. In a frequentist setting where the true quality of the designs is fixed, one hopes that the posterior definitively identifies the optimal design, in the sense that that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. We show that under the proposed algorithms, this convergence occurs at an exponential rate, and the corresponding exponent is the best possible among all allocation rules. It should be highlighted that the proposed algorithms depend on a single tuning parameter, which determines the probability used when randomizing among the top-two designs. Attaining the optimal rate of posterior convergence requires either that this parameter is set optimally or is tuned adaptively toward the optimal value. The paper goes further, characterizing the exponent attained on any problem instance and for any value of the tunable parameter. This exponent is interpreted as being optimal among a constrained class of allocation rules. Finally, considerable robustness to this parameter is established through numerical experiments and theoretical results. When this parameter is set to 1/2, the exponent attained is within a factor of 2 of best possible across all problem instances.

Suggested Citation

Daniel Russo, 2020. "Simple Bayesian Algorithms for Best-Arm Identification," Operations Research, INFORMS, vol. 68(6), pages 1625-1647, November.

Handle: RePEc:inm:oropre:v:68:y:2020:i:6:p:1625-1647
DOI: 10.1287/opre.2019.1911

Download full text from publisher

References listed on IDEAS

Eric C. Ni & Dragos F. Ciocan & Shane G. Henderson & Susan R. Hunter, 2017. "Efficient Ranking and Selection in Parallel Computing Environments," Operations Research, INFORMS, vol. 65(3), pages 821-836, June.
Chun-Hung Chen & Stephen E. Chick & Loo Hay Lee & Nugroho A. Pujowidianto, 2015. "Ranking and Selection: Efficient Simulation Budget Allocation," International Series in Operations Research & Management Science, in: Michael C Fu (ed.), Handbook of Simulation Optimization, edition 127, chapter 0, pages 45-80, Springer.
Stephen E. Chick & Noah Gans, 2009. "Economic Analysis of Simulation Selection Problems," Management Science, INFORMS, vol. 55(3), pages 421-437, March.
Weiwei Fan & L. Jeff Hong & Barry L. Nelson, 2016. "Indifference-Zone-Free Selection of the Best," Operations Research, INFORMS, vol. 64(6), pages 1499-1514, December.
Stephen E. Chick & Peter Frazier, 2012. "Sequential Sampling with Economics of Selection Procedures," Management Science, INFORMS, vol. 58(3), pages 550-569, March.
Ilya O. Ryzhov, 2016. "On the Convergence Rates of Expected Improvement Methods," Operations Research, INFORMS, vol. 64(6), pages 1515-1528, December.
Sébastien Bubeck & Rémi Munos & Gilles Stoltz, 2010. "Pure Exploration for Multi-Armed Bandit Problems," Working Papers hal-00257454, HAL.
Susan R. Hunter & Raghu Pasupathy, 2013. "Optimal Sampling Laws for Stochastically Constrained Simulation Optimization on Finite Sets," INFORMS Journal on Computing, INFORMS, vol. 25(3), pages 527-542, August.
Peter I. Frazier, 2014. "A Fully Sequential Elimination Procedure for Indifference-Zone Ranking and Selection with Tight Bounds on Probability of Correct Selection," Operations Research, INFORMS, vol. 62(4), pages 926-942, August.
Ilya O. Ryzhov & Warren B. Powell & Peter I. Frazier, 2012. "The Knowledge Gradient Algorithm for a General Class of Online Learning Problems," Operations Research, INFORMS, vol. 60(1), pages 180-195, February.
Stephen E. Chick & Jürgen Branke & Christian Schmidt, 2010. "Sequential Sampling to Myopically Maximize the Expected Value of Information," INFORMS Journal on Computing, INFORMS, vol. 22(1), pages 71-80, February.
Stephen E. Chick & Koichiro Inoue, 2001. "New Two-Stage and Sequential Procedures for Selecting the Best Simulated System," Operations Research, INFORMS, vol. 49(5), pages 732-743, October.
L. Jeff Hong & Barry L. Nelson & Jie Xu, 2015. "Discrete Optimization via Simulation," International Series in Operations Research & Management Science, in: Michael C Fu (ed.), Handbook of Simulation Optimization, edition 127, chapter 0, pages 9-44, Springer.
Chun-Hung Chen & Donghai He & Michael Fu & Loo Hay Lee, 2008. "Efficient Simulation Budget Allocation for Selecting an Optimal Subset," INFORMS Journal on Computing, INFORMS, vol. 20(4), pages 579-595, November.

Full references (including those not matched with items on IDEAS)

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Ye Chen & Ilya O. Ryzhov, 2023. "Balancing Optimal Large Deviations in Sequential Selection," Management Science, INFORMS, vol. 69(6), pages 3457-3473, June.
Nicolò Cesa‐Bianchi & Roberto Colomboni & Maximilian Kasy, 2025. "Adaptive Maximization of Social Welfare," Econometrica, Econometric Society, vol. 93(3), pages 1073-1104, May.
- Nicolo Cesa-Bianchi & Roberto Colomboni & Maximilian Kasy, 2023. "Adaptive maximization of social welfare," Papers 2310.09597, arXiv.org, revised Jul 2024.
- Cesa-Bianchi, Nicol`o & Colomboni, Roberto & Kasy, Maximilian, 2024. "Adaptive Maximization of Social Welfare," OSF Preprints u9xjg, Center for Open Science.
- Cesa-Bianchi, Nicolò & Colomboni, Roberto & Kasy, Maximilian, 2024. "Adaptive Maximization of Social Welfare," IZA Discussion Papers 17186, Institute of Labor Economics (IZA).
- Nicolò Cesa-Bianchi & Roberto Colomboni & Maximilian Kasy, 2024. "Adaptive Maximization of Social Welfare," CESifo Working Paper Series 11259, CESifo.
- Cesa-Bianchi, Nicol`o & Colomboni, Roberto & Kasy, Maximilian, 2024. "Adaptive Maximization of Social Welfare," SocArXiv bgcjk, Center for Open Science.
Cheng, Zhenxia & Luo, Jun & Wu, Ruijing, 2023. "On the finite-sample statistical validity of adaptive fully sequential procedures," European Journal of Operational Research, Elsevier, vol. 307(1), pages 266-278.
Victor F. Araman & René A. Caldentey, 2022. "Diffusion Approximations for a Class of Sequential Experimentation Problems," Management Science, INFORMS, vol. 68(8), pages 5958-5979, August.
Yinchu Zhu & Ilya O. Ryzhov, 2022. "Optimal data-driven hiring with equity for underrepresented groups," Papers 2206.09300, arXiv.org.
Guido Imbens & Chao Qin & Stefan Wager, 2025. "Admissibility of Completely Randomized Trials: A Large-Deviation Approach," Papers 2506.05329, arXiv.org.
repec:osf:osfxxx:u9xjg_v1 is not listed on IDEAS
David J. Eckman & Shane G. Henderson, 2022. "Posterior-Based Stopping Rules for Bayesian Ranking-and-Selection Procedures," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1711-1728, May.
Masahiro Kato, 2023. "Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget," Papers 2310.19788, arXiv.org, revised Mar 2024.
repec:osf:socarx:bgcjk_v1 is not listed on IDEAS
Esposito Acosta,Bruno Nicola & Sautmann,Anja, 2022. "Adaptive Experiments for Policy Choice : Phone Calls for Home Reading in Kenya," Policy Research Working Paper Series 10098, The World Bank.
Masahiro Kato, 2023. "Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances," Papers 2312.12741, arXiv.org, revised Mar 2024.
Stephen E. Chick & Noah Gans & Özge Yapar, 2022. "Bayesian Sequential Learning for Clinical Trials of Multiple Correlated Medical Interventions," Management Science, INFORMS, vol. 68(7), pages 4919-4938, July.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Haihui Shen & L. Jeff Hong & Xiaowei Zhang, 2021. "Ranking and Selection with Covariates for Personalized Decision Making," INFORMS Journal on Computing, INFORMS, vol. 33(4), pages 1500-1519, October.
Zhongshun Shi & Yijie Peng & Leyuan Shi & Chun-Hung Chen & Michael C. Fu, 2022. "Dynamic Sampling Allocation Under Finite Simulation Budget for Feasibility Determination," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 557-568, January.
Weiwei Fan & L. Jeff Hong & Xiaowei Zhang, 2020. "Distributionally Robust Selection of the Best," Management Science, INFORMS, vol. 66(1), pages 190-208, January.
Juergen Branke & Wen Zhang, 2019. "Identifying efficient solutions via simulation: myopic multi-objective budget allocation for the bi-objective case," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 41(3), pages 831-865, September.
Gongbo Zhang & Yijie Peng & Jianghua Zhang & Enlu Zhou, 2023. "Asymptotically Optimal Sampling Policy for Selecting Top- m Alternatives," INFORMS Journal on Computing, INFORMS, vol. 35(6), pages 1261-1285, November.
L. Jeff Hong & Guangxin Jiang & Ying Zhong, 2022. "Solving Large-Scale Fixed-Budget Ranking and Selection Problems," INFORMS Journal on Computing, INFORMS, vol. 34(6), pages 2930-2949, November.
Zhong, Ying & Du, Jianzhong & Li, Deng-Feng & Hu, Zhaolin, 2025. "Reference alternatives based knockout-tournament procedure for ranking and selection," European Journal of Operational Research, Elsevier, vol. 320(3), pages 628-641.
Demet Batur & F. Fred Choobineh, 2021. "Selecting the Best Alternative Based on Its Quantile," INFORMS Journal on Computing, INFORMS, vol. 33(2), pages 657-671, May.
Ilya O. Ryzhov, 2016. "On the Convergence Rates of Expected Improvement Methods," Operations Research, INFORMS, vol. 64(6), pages 1515-1528, December.
David J. Eckman & Shane G. Henderson, 2022. "Posterior-Based Stopping Rules for Bayesian Ranking-and-Selection Procedures," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1711-1728, May.
Jing Xie & Peter I. Frazier, 2013. "Sequential Bayes-Optimal Policies for Multiple Comparisons with a Known Standard," Operations Research, INFORMS, vol. 61(5), pages 1174-1189, October.
Zhongshun Shi & Siyang Gao & Hui Xiao & Weiwei Chen, 2019. "A worst‐case formulation for constrained ranking and selection with input uncertainty," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(8), pages 648-662, December.
Powell, Warren B., 2019. "A unified framework for stochastic optimization," European Journal of Operational Research, Elsevier, vol. 275(3), pages 795-821.
Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
Ye Chen & Ilya O. Ryzhov, 2023. "Balancing Optimal Large Deviations in Sequential Selection," Management Science, INFORMS, vol. 69(6), pages 3457-3473, June.
Chao Qin & Daniel Russo, 2024. "Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification," Papers 2402.10592, arXiv.org, revised Jul 2024.
Weiwei Fan & L. Jeff Hong & Barry L. Nelson, 2016. "Indifference-Zone-Free Selection of the Best," Operations Research, INFORMS, vol. 64(6), pages 1499-1514, December.
Cheng Li & Siyang Gao & Jianzhong Du, 2023. "Convergence Analysis of Stochastic Kriging-Assisted Simulation with Random Covariates," INFORMS Journal on Computing, INFORMS, vol. 35(2), pages 386-402, March.
Saeid Delshad & Amin Khademi, 2020. "Information theory for ranking and selection," Naval Research Logistics (NRL), John Wiley & Sons, vol. 67(4), pages 239-253, June.
Rojas Gonzalez, Sebastian & Branke, Juergen & Van Nieuwenhuyse, Inneke, 2025. "Bi-objective ranking and selection using stochastic kriging," European Journal of Operational Research, Elsevier, vol. 322(2), pages 599-614.

More about this item

Keywords

multiarmed bandit; ranking and selection; Bayesian;
All these keywords.

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:68:y:2020:i:6:p:1625-1647. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

Simple Bayesian Algorithms for Best-Arm Identification

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data