Printed from https://ideas.repec.org/a/spr/mathme/v99y2024i1d10.1007_s00186-024-00857-0.html

Markov decision processes with risk-sensitive criteria: an overview

Authors

  • Nicole Bäuerle (Karlsruhe Institute of Technology (KIT))
  • Anna Jaśkiewicz (Wrocław University of Science and Technology)

Abstract

The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term "risk-sensitive" refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk. This comprises the well-known entropic risk measure and Conditional Value-at-Risk. We restrict our considerations to stationary problems with an infinite time horizon. Conditions are given under which optimal policies exist, and solution procedures are explained. We present the theory both for the case where the Optimized Certainty Equivalent is applied recursively and for the case where it is applied to the cumulated reward. Discounted as well as non-discounted models are reviewed.
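As an illustration of how a single functional can cover both the entropic risk measure and Conditional Value-at-Risk, the following is a minimal sketch and not the authors' method. It assumes the reward-based form of the Optimized Certainty Equivalent from Ben-Tal and Teboulle (2007), OCE_u(X) = sup over eta of { eta + E[u(X - eta)] }, evaluated on a small equally weighted sample. The function names (`oce`, `entropic_utility`, `cvar_utility`), the sample values, and the ternary-search solver are all hypothetical choices made for this example.

```python
import math

def oce(sample, u, iters=200):
    """Empirical OCE: maximize eta + mean(u(x - eta)) over eta.

    Assumes u is concave, so the objective is concave in eta and a
    ternary search converges. The search interval [min, max] of the
    sample is an assumption that holds for the utilities used below.
    """
    def objective(eta):
        return eta + sum(u(x - eta) for x in sample) / len(sample)

    lo, hi = min(sample), max(sample)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # a maximizer lies to the right of m1
        else:
            hi = m2  # a maximizer lies to the left of m2
    return objective((lo + hi) / 2.0)

def entropic_utility(gamma):
    """u(t) = (1 - exp(-gamma * t)) / gamma, gamma > 0 (risk averse)."""
    return lambda t: (1.0 - math.exp(-gamma * t)) / gamma

def cvar_utility(alpha):
    """u(t) = min(t, 0) / alpha, 0 < alpha < 1: penalizes shortfalls only."""
    return lambda t: min(t, 0.0) / alpha

# Hypothetical sample of rewards, each with probability 1/4.
rewards = [-4.0, 1.0, 2.0, 5.0]
gamma, alpha = 0.5, 0.5

# Entropic case: the OCE should match -(1/gamma) * log E[exp(-gamma * X)].
entropic_closed_form = -math.log(
    sum(math.exp(-gamma * x) for x in rewards) / len(rewards)
) / gamma

# CVaR case with alpha = 2/4: the OCE should match the mean of the
# two worst rewards in the sample.
worst_half_mean = sum(sorted(rewards)[:2]) / 2.0

print(oce(rewards, entropic_utility(gamma)), entropic_closed_form)
print(oce(rewards, cvar_utility(alpha)), worst_half_mean)
print(oce(rewards, lambda t: t), sum(rewards) / len(rewards))
```

With the identity utility u(t) = t the objective no longer depends on eta and the sketch returns the plain expectation, which is the risk-neutral case. Sign conventions differ across the literature (rewards versus costs), so the formulas above should be read as one possible convention rather than the one used in the paper.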

Suggested Citation

  • Nicole Bäuerle & Anna Jaśkiewicz, 2024. "Markov decision processes with risk-sensitive criteria: an overview," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 99(1), pages 141-178, April.
  • Handle: RePEc:spr:mathme:v:99:y:2024:i:1:d:10.1007_s00186-024-00857-0
    DOI: 10.1007/s00186-024-00857-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00186-024-00857-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Homem-de-Mello, Tito & Pagnoncelli, Bernardo K., 2016. "Risk aversion in multistage stochastic programming: A modeling and algorithmic perspective," European Journal of Operational Research, Elsevier, vol. 249(1), pages 188-199.
    2. Shapiro, Alexander & Tekaya, Wajdi & da Costa, Joari Paulo & Soares, Murilo Pereira, 2013. "Risk neutral and risk averse Stochastic Dual Dynamic Programming method," European Journal of Operational Research, Elsevier, vol. 224(2), pages 375-391.
    3. Philippe Weil, 1990. "Nonexpected Utility in Macroeconomics," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 105(1), pages 29-42.
    4. Mokrane Bouakiz & Matthew J. Sobel, 1992. "Inventory Control with an Exponential Utility Criterion," Operations Research, INFORMS, vol. 40(3), pages 603-608, June.
    5. Nicole Bäuerle & Ulrich Rieder, 2014. "More Risk-Sensitive Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 39(1), pages 105-120, February.
    6. Larry G. Epstein & Stanley E. Zin, 2013. "Substitution, risk aversion and the temporal behavior of consumption and asset returns: A theoretical framework," World Scientific Book Chapters, in: Leonard C MacLean & William T Ziemba (ed.), HANDBOOK OF THE FUNDAMENTALS OF FINANCIAL DECISION MAKING Part I, chapter 12, pages 207-239, World Scientific Publishing Co. Pte. Ltd..
    7. Dan A. Iancu & Marek Petrik & Dharmashankar Subramanian, 2015. "Tight Approximations of Dynamic Risk Measures," Mathematics of Operations Research, INFORMS, vol. 40(3), pages 655-682, March.
    8. Rolando Cavazos-Cadena, 2010. "Optimality equations and inequalities in a class of risk-sensitive average cost Markov decision chains," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 71(1), pages 47-84, February.
    9. Marcin Pitera & Łukasz Stettner, 2023. "Discrete‐time risk sensitive portfolio optimization with proportional transaction costs," Mathematical Finance, Wiley Blackwell, vol. 33(4), pages 1287-1313, October.
    10. Shapiro, Alexander, 2012. "Minimax and risk averse multistage stochastic programming," European Journal of Operational Research, Elsevier, vol. 219(3), pages 719-726.
    11. Zachary Feinstein & Birgit Rudloff, 2017. "A recursive algorithm for multivariate risk measures and a set-valued Bellman’s principle," Journal of Global Optimization, Springer, vol. 68(1), pages 47-69, May.
    12. Bloise, Gaetano & Vailakis, Yiannis, 2018. "Convex dynamic programming with (bounded) recursive utility," Journal of Economic Theory, Elsevier, vol. 173(C), pages 118-141.
    13. Uriel G. Rothblum, 1984. "Multiplicative Markov Decision Chains," Mathematics of Operations Research, INFORMS, vol. 9(1), pages 6-24, February.
    14. Duffie, Darrell & Epstein, Larry G, 1992. "Stochastic Differential Utility," Econometrica, Econometric Society, vol. 60(2), pages 353-394, March.
    15. Rudloff, Birgit & Street, Alexandre & Valladão, Davi M., 2014. "Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences," European Journal of Operational Research, Elsevier, vol. 234(3), pages 743-750.
    16. Philippe Weil, 1993. "Precautionary Savings and the Permanent Income Hypothesis," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 60(2), pages 367-383.
    17. Rolando Cavazos-Cadena & Daniel Hernández-Hernández, 2011. "Discounted Approximations for Risk-Sensitive Average Criteria in Markov Decision Chains with Finite State Space," Mathematics of Operations Research, INFORMS, vol. 36(1), pages 133-146, February.
    18. Li Xia, 2020. "Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2808-2827, December.
    19. Andy Philpott & Vitor de Matos & Erlon Finardi, 2013. "On Solving Multistage Stochastic Programs with Coherent Risk Measures," Operations Research, INFORMS, vol. 61(4), pages 957-970, August.
    20. Anderson, Evan W., 2005. "The dynamics of risk-sensitive allocations," Journal of Economic Theory, Elsevier, vol. 125(2), pages 93-150, December.
    21. Goswami, Anindya & Rana, Nimit & Siu, Tak Kuen, 2022. "Regime switching optimal growth model with risk sensitive preferences," Journal of Mathematical Economics, Elsevier, vol. 101(C).
    22. Nicole Bäuerle & Jonathan Ott, 2011. "Markov Decision Processes with Average-Value-at-Risk criteria," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 74(3), pages 361-379, December.
    23. Ozaki, Hiroyuki & Streufert, Peter A., 1996. "Dynamic programming for non-additive stochastic objectives," Journal of Mathematical Economics, Elsevier, vol. 25(4), pages 391-442.
    24. Arnab Basu & Tirthankar Bhattacharyya & Vivek S. Borkar, 2008. "A Learning Algorithm for Risk-Sensitive Cost," Mathematics of Operations Research, INFORMS, vol. 33(4), pages 880-898, November.
    25. Marinacci, Massimo & Montrucchio, Luigi, 2010. "Unique solutions for stochastic recursive utilities," Journal of Economic Theory, Elsevier, vol. 145(5), pages 1776-1804, September.
    26. Bushaj, Sabah & Büyüktahtakın, İ. Esra & Haight, Robert G., 2022. "Risk-averse multi-stage stochastic optimization for surveillance and operations planning of a forest insect infestation," European Journal of Operational Research, Elsevier, vol. 299(3), pages 1094-1110.
    27. David M. Kreps, 1977. "Decision Problems with Expected Utility Critera, I: Upper and Lower Convergent Utility," Mathematics of Operations Research, INFORMS, vol. 2(1), pages 45-53, February.
    28. Shapiro, Alexander, 2021. "Tutorial on risk neutral, distributionally robust and risk averse multistage stochastic programming," European Journal of Operational Research, Elsevier, vol. 288(1), pages 1-13.
    29. Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
    30. Jochen Gönsch & Michael Hassler & Rouven Schur, 2018. "Optimizing conditional value-at-risk in dynamic pricing," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 40(3), pages 711-750, July.
    31. Nicole Bäuerle & Ulrich Rieder, 2017. "Partially Observable Risk-Sensitive Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 42(4), pages 1180-1196, November.
    32. C. Barz & K. Waldmann, 2007. "Risk-sensitive capacity control in revenue management," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 65(3), pages 565-579, June.
    33. Ben Hambly & Renyuan Xu & Huining Yang, 2023. "Recent advances in reinforcement learning in finance," Mathematical Finance, Wiley Blackwell, vol. 33(3), pages 437-503, July.
    34. Rainer Schlosser, 2016. "Stochastic dynamic multi-product pricing with dynamic advertising and adoption effects," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 15(2), pages 153-169, April.
    35. V. S. Borkar & S. P. Meyn, 2002. "Risk-Sensitive Optimal Control for Markov Decision Processes with Monotone Cost," Mathematics of Operations Research, INFORMS, vol. 27(1), pages 192-209, February.
    36. Eric V. Denardo & Haechurl Park & Uriel G. Rothblum, 2007. "Risk-Sensitive and Risk-Neutral Multiarmed Bandits," Mathematics of Operations Research, INFORMS, vol. 32(2), pages 374-394, May.
    37. Stratton C. Jaquette, 1976. "A Utility Criterion for Markov Decision Processes," Management Science, INFORMS, vol. 23(1), pages 43-49, September.
    38. Rolando Cavazos-Cadena & Daniel Hernández-Hernández, 2016. "A Characterization of the Optimal Certainty Equivalent of the Average Cost via the Arrow-Pratt Sensitivity Function," Mathematics of Operations Research, INFORMS, vol. 41(1), pages 224-235, February.
    39. Staino, Alessandro & Russo, Emilio, 2020. "Nested Conditional Value-at-Risk portfolio selection: A model with temporal dependence driven by market-index volatility," European Journal of Operational Research, Elsevier, vol. 280(2), pages 741-753.
    40. Duffie, Darrell & Lions, Pierre-Louis, 1992. "PDE solutions of stochastic differential utility," Journal of Mathematical Economics, Elsevier, vol. 21(6), pages 577-606.
    41. Rolando Cavazos-Cadena, 2018. "Characterization of the Optimal Risk-Sensitive Average Cost in Denumerable Markov Decision Chains," Mathematics of Operations Research, INFORMS, vol. 43(3), pages 1025-1050, August.
    42. Holger Kraft & Frank Seifried & Mogens Steffensen, 2013. "Consumption-portfolio optimization with recursive utility in incomplete markets," Finance and Stochastics, Springer, vol. 17(1), pages 161-196, January.
    43. Wozabal, David & Rameseder, Gunther, 2020. "Optimal bidding of a virtual power plant on the Spanish day-ahead and intraday market for electricity," European Journal of Operational Research, Elsevier, vol. 280(2), pages 639-655.
    44. Aharon Ben‐Tal & Marc Teboulle, 2007. "An Old‐New Concept Of Convex Risk Measures: The Optimized Certainty Equivalent," Mathematical Finance, Wiley Blackwell, vol. 17(3), pages 449-476, July.
    45. Antoine Bommier & François Le Grand, 2019. "Risk Aversion and Precautionary Savings in Dynamic Settings," Management Science, INFORMS, vol. 65(3), pages 1386-1397, March.
    46. Tomasz Bielecki & Daniel Hernández-Hernández & Stanley R. Pliska, 1999. "Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 50(2), pages 167-188, October.
    47. David M. Kreps, 1977. "Decision Problems with Expected Utility Criteria, II: Stationarity," Mathematics of Operations Research, INFORMS, vol. 2(3), pages 266-274, August.
    48. V. S. Borkar, 2002. "Q-Learning for Risk-Sensitive Control," Mathematics of Operations Research, INFORMS, vol. 27(2), pages 294-311, May.
    49. Schur, Rouven & Gönsch, Jochen & Hassler, Michael, 2019. "Time-consistent, risk-averse dynamic pricing," European Journal of Operational Research, Elsevier, vol. 277(2), pages 587-603.
    50. Yulei Luo & Eric R. Young, 2010. "Risk-Sensitive Consumption and Savings under Rational Inattention," American Economic Journal: Macroeconomics, American Economic Association, vol. 2(4), pages 281-325, October.
    51. Lukasz Stettner, 1999. "Risk sensitive portfolio optimization," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 50(3), pages 463-474, December.
    52. Ronald A. Howard & James E. Matheson, 1972. "Risk-Sensitive Markov Decision Processes," Management Science, INFORMS, vol. 18(7), pages 356-369, March.
    53. Ari Arapostathis & Vivek S. Borkar & K. Suresh Kumar, 2016. "Risk-Sensitive Control and an Abstract Collatz–Wielandt Formula," Journal of Theoretical Probability, Springer, vol. 29(4), pages 1458-1484, December.
    54. Rolando Cavazos-Cadena & Raúl Montes-de-Oca, 2003. "The Value Iteration Algorithm in Risk-Sensitive Average Markov Decision Chains with Finite State Space," Mathematics of Operations Research, INFORMS, vol. 28(4), pages 752-776, November.
    55. Weini Zhang & Hamed Rahimian & Güzin Bayraksan, 2016. "Decomposition Algorithms for Risk-Averse Multistage Stochastic Programs with Application to Water Allocation under Uncertainty," INFORMS Journal on Computing, INFORMS, vol. 28(3), pages 385-404, August.
    56. Kreps, David M & Porteus, Evan L, 1978. "Temporal Resolution of Uncertainty and Dynamic Choice Theory," Econometrica, Econometric Society, vol. 46(1), pages 185-200, January.
    57. Duffie, Darrell & Epstein, Larry G, 1992. "Asset Pricing with Stochastic Differential Utility," The Review of Financial Studies, Society for Financial Studies, vol. 5(3), pages 411-436.
    58. Antoine Bommier & François Le Grand, 2019. "Risk Aversion and Precautionary Savings in Dynamic Settings," Post-Print hal-02312171, HAL.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bäuerle, Nicole & Glauner, Alexander, 2022. "Markov decision processes with recursive risk measures," European Journal of Operational Research, Elsevier, vol. 296(3), pages 953-966.
    2. Nicole Bäuerle & Ulrich Rieder, 2014. "More Risk-Sensitive Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 39(1), pages 105-120, February.
    3. Anis Matoussi & Hao Xing, 2016. "Convex duality for stochastic differential utility," Papers 1601.03562, arXiv.org.
    4. Li, Hanwu & Riedel, Frank & Yang, Shuzhen, 2024. "Optimal consumption for recursive preferences with local substitution — the case of certainty," Journal of Mathematical Economics, Elsevier, vol. 110(C).
    5. Shigeta, Yuki, 2022. "Quasi-hyperbolic discounting under recursive utility and consumption–investment decisions," Journal of Economic Theory, Elsevier, vol. 204(C).
    6. Dirk Becherer & Wilfried Kuissi-Kamdem & Olivier Menoukeu-Pamen, 2023. "Optimal consumption with labor income and borrowing constraints for recursive preferences," Working Papers hal-04017143, HAL.
    7. Kraft, Holger & Seifried, Frank Thomas, 2014. "Stochastic differential utility as the continuous-time limit of recursive utility," Journal of Economic Theory, Elsevier, vol. 151(C), pages 528-550.
    8. Wang, Chong & Wang, Neng & Yang, Jinqiang, 2016. "Optimal consumption and savings with stochastic income and recursive utility," Journal of Economic Theory, Elsevier, vol. 165(C), pages 292-331.
    9. Stanca Lorenzo, 2023. "Recursive preferences, correlation aversion, and the temporal resolution of uncertainty," Working papers 080, Department of Economics, Social Studies, Applied Mathematics and Statistics (Dipartimento di Scienze Economico-Sociali e Matematico-Statistiche), University of Torino.
    10. Luca De Gennaro Aquino & Sascha Desmettre & Yevhen Havrylenko & Mogens Steffensen, 2024. "Equilibrium control theory for Kihlstrom-Mirman preferences in continuous time," Papers 2407.16525, arXiv.org, revised Oct 2024.
    11. Fahrenwaldt, Matthias Albrecht & Jensen, Ninna Reitzel & Steffensen, Mogens, 2020. "Nonrecursive separation of risk and time preferences," Journal of Mathematical Economics, Elsevier, vol. 90(C), pages 95-108.
    12. Joshua Aurand & Yu‐Jui Huang, 2023. "Epstein‐Zin utility maximization on a random horizon," Mathematical Finance, Wiley Blackwell, vol. 33(4), pages 1370-1411, October.
    13. Aase, Knut K., 2014. "Recursive utility and jump-diffusions," Discussion Papers 2014/9, Norwegian School of Economics, Department of Business and Management Science.
    14. Jaroslav Borovička & John Stachurski, 2020. "Necessary and Sufficient Conditions for Existence and Uniqueness of Recursive Utilities," Journal of Finance, American Finance Association, vol. 75(3), pages 1457-1493, June.
    15. Thomas Douenne, 2020. "Disaster Risks, Disaster Strikes, and Economic Growth: the Role of Preferences," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 38, pages 251-272, October.
    16. Dumas, Bernard & Uppal, Raman & Wang, Tan, 2000. "Efficient Intertemporal Allocations with Recursive Utility," Journal of Economic Theory, Elsevier, vol. 93(2), pages 240-259, August.
    17. Garcia, Rene & Renault, Eric & Semenov, Andrei, 2006. "Disentangling risk aversion and intertemporal substitution through a reference level," Finance Research Letters, Elsevier, vol. 3(3), pages 181-193, September.
    18. Turnovsky, Stephen J. & Smith, William T., 2006. "Equilibrium consumption and precautionary savings in a stochastically growing economy," Journal of Economic Dynamics and Control, Elsevier, vol. 30(2), pages 243-278, February.
    19. Roche, Hervé, 2011. "Asset prices in an exchange economy when agents have heterogeneous homothetic recursive preferences and no risk free bond is available," Journal of Economic Dynamics and Control, Elsevier, vol. 35(1), pages 80-96, January.
    20. Smith, William & Son, Young Seob, 2005. "Can the desire to conserve our natural resources be self-defeating?," Journal of Environmental Economics and Management, Elsevier, vol. 49(1), pages 52-67, January.
