Markov decision processes with risk-sensitive criteria: an overview

Authors

  • Nicole Bäuerle (Karlsruhe Institute of Technology (KIT))
  • Anna Jaśkiewicz (Wrocław University of Science and Technology)

Abstract

The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term ‘risk-sensitive’ refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk. This comprises the well-known entropic risk measure and Conditional Value-at-Risk. We restrict our considerations to stationary problems with an infinite time horizon. Conditions are given under which optimal policies exist and solution procedures are explained. We present both the theory when the Optimized Certainty Equivalent is applied recursively as well as the case where it is applied to the cumulated reward. Discounted as well as non-discounted models are reviewed.
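
To make the Optimized Certainty Equivalent concrete, the following minimal Python sketch evaluates it for a finite reward distribution. It assumes the form sup over η of η + E[u(X − η)] commonly attributed to Ben-Tal and Teboulle, with a concave utility u normalised so that u(0) = 0 and slope 1 is a supergradient at 0; the overview itself may use different notation or a cost convention. The function names (`oce`, `entropic_utility`, `cvar_utility`), the ternary search, and the sample distribution are illustrative assumptions and are not taken from the paper.

```python
import math

def oce(values, probs, u, iters=200):
    """Approximate sup_eta { eta + E[u(X - eta)] } for a finite distribution.

    Assumes u is concave with u(0) = 0 and slope 1 as a supergradient at 0,
    so the objective is concave in eta and a maximiser lies between the
    smallest and largest outcome; ternary search then applies.
    """
    def objective(eta):
        return eta + sum(p * u(x - eta) for x, p in zip(values, probs))

    lo, hi = min(values), max(values)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # a maximiser lies to the right of m1
        else:
            hi = m2  # a maximiser lies to the left of m2
    return objective(0.5 * (lo + hi))

def entropic_utility(gamma):
    """u(t) = (1 - exp(-gamma * t)) / gamma with gamma > 0 (illustrative)."""
    return lambda t: (1.0 - math.exp(-gamma * t)) / gamma

def cvar_utility(alpha):
    """u(t) = -max(-t, 0) / alpha with 0 < alpha < 1 (illustrative)."""
    return lambda t: -max(-t, 0.0) / alpha

if __name__ == "__main__":
    # Hypothetical reward distribution: four equally likely outcomes.
    values = [-1.0, 0.0, 2.0, 4.0]
    probs = [0.25, 0.25, 0.25, 0.25]
    gamma = 0.5
    # Closed form of the entropic case: -(1/gamma) * log E[exp(-gamma * X)].
    closed_form = -math.log(
        sum(p * math.exp(-gamma * x) for x, p in zip(values, probs))
    ) / gamma
    print(oce(values, probs, entropic_utility(gamma)), closed_form)
    print(oce(values, probs, cvar_utility(0.5)))
```

Under these assumptions, the exponential utility reproduces the entropic risk measure, and the piecewise-linear utility gives the Conditional Value-at-Risk at level α, here the average of the worst half of the rewards (about −0.5 for the sample distribution). With u(t) = t the same routine returns the plain expectation, which is the risk-neutral special case.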

Suggested Citation

  • Nicole Bäuerle & Anna Jaśkiewicz, 2024. "Markov decision processes with risk-sensitive criteria: an overview," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 99(1), pages 141-178, April.
  • Handle: RePEc:spr:mathme:v:99:y:2024:i:1:d:10.1007_s00186-024-00857-0
    DOI: 10.1007/s00186-024-00857-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00186-024-00857-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicole Bäuerle & Ulrich Rieder, 2014. "More Risk-Sensitive Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 39(1), pages 105-120, February.
    2. Bäuerle, Nicole & Glauner, Alexander, 2022. "Markov decision processes with recursive risk measures," European Journal of Operational Research, Elsevier, vol. 296(3), pages 953-966.
    3. Dirk Becherer & Wilfried Kuissi-Kamdem & Olivier Menoukeu-Pamen, 2023. "Optimal consumption with labor income and borrowing constraints for recursive preferences," Working Papers hal-04017143, HAL.
    4. Li, Hanwu & Riedel, Frank & Yang, Shuzhen, 2024. "Optimal consumption for recursive preferences with local substitution — the case of certainty," Journal of Mathematical Economics, Elsevier, vol. 110(C).
    5. Anis Matoussi & Hao Xing, 2016. "Convex duality for stochastic differential utility," Papers 1601.03562, arXiv.org.
    6. Kraft, Holger & Seifried, Frank Thomas, 2014. "Stochastic differential utility as the continuous-time limit of recursive utility," Journal of Economic Theory, Elsevier, vol. 151(C), pages 528-550.
    7. Wang, Chong & Wang, Neng & Yang, Jinqiang, 2016. "Optimal consumption and savings with stochastic income and recursive utility," Journal of Economic Theory, Elsevier, vol. 165(C), pages 292-331.
    8. Stanca Lorenzo, 2023. "Recursive preferences, correlation aversion, and the temporal resolution of uncertainty," Working papers 080, Department of Economics, Social Studies, Applied Mathematics and Statistics (Dipartimento di Scienze Economico-Sociali e Matematico-Statistiche), University of Torino.
    9. Shigeta, Yuki, 2022. "Quasi-hyperbolic discounting under recursive utility and consumption–investment decisions," Journal of Economic Theory, Elsevier, vol. 204(C).
    10. Luca De Gennaro Aquino & Sascha Desmettre & Yevhen Havrylenko & Mogens Steffensen, 2024. "Equilibrium control theory for Kihlstrom-Mirman preferences in continuous time," Papers 2407.16525, arXiv.org, revised Oct 2024.
    11. Fahrenwaldt, Matthias Albrecht & Jensen, Ninna Reitzel & Steffensen, Mogens, 2020. "Nonrecursive separation of risk and time preferences," Journal of Mathematical Economics, Elsevier, vol. 90(C), pages 95-108.
    12. Aït-Sahalia, Yacine & Matthys, Felix, 2019. "Robust consumption and portfolio policies when asset prices can jump," Journal of Economic Theory, Elsevier, vol. 179(C), pages 1-56.
    13. Joshua Aurand & Yu‐Jui Huang, 2023. "Epstein‐Zin utility maximization on a random horizon," Mathematical Finance, Wiley Blackwell, vol. 33(4), pages 1370-1411, October.
    14. Turnovsky, Stephen J. & Smith, William T., 2006. "Equilibrium consumption and precautionary savings in a stochastically growing economy," Journal of Economic Dynamics and Control, Elsevier, vol. 30(2), pages 243-278, February.
    15. Dumas, Bernard & Uppal, Raman & Wang, Tan, 2000. "Efficient Intertemporal Allocations with Recursive Utility," Journal of Economic Theory, Elsevier, vol. 93(2), pages 240-259, August.
    16. Bäuerle, Nicole & Jaśkiewicz, Anna, 2017. "Optimal dividend payout model with risk sensitive preferences," Insurance: Mathematics and Economics, Elsevier, vol. 73(C), pages 82-93.
    17. Jaroslav Borovička & John Stachurski, 2020. "Necessary and Sufficient Conditions for Existence and Uniqueness of Recursive Utilities," Journal of Finance, American Finance Association, vol. 75(3), pages 1457-1493, June.
    18. Holger Kraft & Thomas Seiferling & Frank Thomas Seifried, 2017. "Optimal consumption and investment with Epstein–Zin recursive utility," Finance and Stochastics, Springer, vol. 21(1), pages 187-226, January.
    19. Schroder, Mark & Skiadas, Costis, 1999. "Optimal Consumption and Portfolio Selection with Stochastic Differential Utility," Journal of Economic Theory, Elsevier, vol. 89(1), pages 68-126, November.
    20. Aase, Knut K., 2014. "Recursive utility and jump-diffusions," Discussion Papers 2014/9, Norwegian School of Economics, Department of Business and Management Science.
