Index policy for multiarmed bandit problem with dynamic risk measures

Author

Listed:

Malekipirbazari, Milad
Çavuş, Özlem

Abstract

The multiarmed bandit (MAB) problem is a classic problem in which a finite amount of resources must be allocated among competing choices, with the aim of identifying a policy that maximizes the expected total reward. MAB has a wide range of applications, including clinical trials, portfolio design, parameter tuning, internet advertising, auction mechanisms, adaptive routing in networks, and project management. The classical MAB makes the strong assumption that the decision maker is risk-neutral and indifferent to the variability of the outcome. However, in many real-life applications this assumption does not hold, and decision makers are risk-averse. To address this, we study risk-averse control of the multiarmed bandit problem using dynamic coherent risk measures, seeking a policy with the best risk-adjusted total discounted return. For this setting, we present a theoretical analysis based on Whittle’s retirement problem and propose a priority-index policy that reduces to the Gittins index as the level of risk aversion converges to zero. We generalize the restart formulation of the Gittins index to compute these risk-averse allocation indices effectively. Numerical results show that this heuristic performs very well for two well-known coherent risk measures: first-order mean-semideviation and mean-AVaR. Our experiments suggest that an optimal index-based policy is not guaranteed to exist for the risk-averse problem. Nonetheless, our risk-averse allocation indices yield optimal or near-optimal policies, which in some instances are easier to interpret than the exact optimal policy.
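To make the restart idea concrete, the following is a minimal Python sketch, not the paper's algorithm. It implements the classical restart formulation of the Gittins index for a single finite-state arm (value iteration in which every state may either continue or restart from the target state) and lets the one-step expectation be swapped for a mean-semideviation or mean-AVaR evaluation of the next-state value. That swap, the sign convention (risk measures applied to rewards, so risk aversion lowers the value), the normalization of the index by (1 - beta), and all function and variable names are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

# A one-step evaluation maps (transition probabilities, next-state values)
# to a single number. All names in this sketch are illustrative assumptions.
RiskMap = Callable[[Sequence[float], Sequence[float]], float]


def expectation(probs: Sequence[float], values: Sequence[float]) -> float:
    """Risk-neutral evaluation: the plain expected value."""
    return sum(p * v for p, v in zip(probs, values))


def mean_semideviation(kappa: float) -> RiskMap:
    """Mean minus kappa times the expected shortfall below the mean, 0 <= kappa <= 1."""
    def evaluate(probs: Sequence[float], values: Sequence[float]) -> float:
        mean = expectation(probs, values)
        shortfall = sum(p * max(mean - v, 0.0) for p, v in zip(probs, values))
        return mean - kappa * shortfall
    return evaluate


def mean_avar(lam: float, alpha: float) -> RiskMap:
    """(1 - lam) * mean + lam * (average of the worst alpha-fraction of values)."""
    def evaluate(probs: Sequence[float], values: Sequence[float]) -> float:
        mean = expectation(probs, values)
        # Lower-tail average via max over t of t - E[(t - V)_+] / alpha;
        # for a discrete distribution the maximum is attained at an atom.
        tail = max(
            t - sum(p * max(t - v, 0.0) for p, v in zip(probs, values)) / alpha
            for t in values
        )
        return (1.0 - lam) * mean + lam * tail
    return evaluate


def restart_index(
    P: List[List[float]],
    r: List[float],
    beta: float,
    state: int,
    risk: RiskMap = expectation,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> float:
    """Index of `state` from the restart problem, normalized by (1 - beta).

    With risk=expectation this is the classical restart formulation;
    any other `risk` is an assumed, illustrative generalization.
    """
    V = [0.0] * len(r)
    for _ in range(max_iter):
        # Value of jumping back to `state` and collecting its reward.
        restart = r[state] + beta * risk(P[state], V)
        # In every state, choose the better of continuing and restarting.
        new_V = [max(r[j] + beta * risk(P[j], V), restart) for j in range(len(r))]
        gap = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    return (1.0 - beta) * V[state]


# Hypothetical 3-state arm: state 0 pays nothing and moves to a good
# absorbing state (reward 1) or a bad one (reward 0) with equal probability.
P = [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r = [0.0, 1.0, 0.0]
beta = 0.9

neutral = restart_index(P, r, beta, state=0)
averse = restart_index(P, r, beta, state=0, risk=mean_semideviation(1.0))
print(neutral, averse)
```

In this toy arm, the risk-averse index of state 0 comes out lower than the risk-neutral one, and setting `kappa` or `lam` to zero recovers the risk-neutral value, which mirrors the abstract's statement that the proposed index reduces to the Gittins index as risk aversion vanishes.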

Suggested Citation

Malekipirbazari, Milad & Çavuş, Özlem, 2024. "Index policy for multiarmed bandit problem with dynamic risk measures," European Journal of Operational Research, Elsevier, vol. 312(2), pages 627-640.

Handle: RePEc:eee:ejores:v:312:y:2024:i:2:p:627-640
DOI: 10.1016/j.ejor.2023.08.004

Citations

Cited by:

Abada, Ibrahim & Belkhouja, Mustapha & Ehrenmann, Andreas, 2025. "On the valuation of legacy power production in liberalized markets via option-pricing," European Journal of Operational Research, Elsevier, vol. 322(3), pages 1005-1024.
Malekipirbazari, Milad, 2025. "Optimizing sequential decision-making under risk: Strategic allocation with switching penalties," European Journal of Operational Research, Elsevier, vol. 321(1), pages 160-176.
Teymourian, Ehsan & Yang, Jian, 2025. "Simple fixes that accommodate switching costs in multi-armed bandits," European Journal of Operational Research, Elsevier, vol. 320(3), pages 616-627.
Hu, Hongda & Charpentier, Arthur & Ghossoub, Mario & Schied, Alexander, 2025. "The multi-armed bandit problem under the mean-variance setting," European Journal of Operational Research, Elsevier, vol. 324(1), pages 168-182.
