IDEAS home Printed from https://ideas.repec.org/a/bla/popmgt/v29y2020i12p2808-2827.html

Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

Author

Listed:
  • Li Xia

Abstract

This study investigates the optimization of an infinite‐stage, discrete‐time Markov decision process (MDP) under a long‐run average metric that combines the mean and the variance of rewards. Such a performance metric is important because the mean indicates average returns and the variance indicates risk or fairness. However, because the variance metric couples the rewards at all stages, traditional dynamic programming is inapplicable: the principle of time consistency fails. We study this problem from a new perspective, sensitivity‐based optimization theory. We derive a performance difference formula that quantifies the difference in the mean‐variance combined metric of an MDP under any two policies. The difference formula can be used to generate new policies with strictly improved mean‐variance performance. We derive a necessary condition for an optimal policy and establish the optimality of deterministic policies. We further develop an iterative algorithm in the form of policy iteration, which is proved to converge to local optima in both the mixed and randomized policy spaces. In particular, when the mean reward is constant across policies, the algorithm is guaranteed to converge to the global optimum. Finally, we apply our approach to the fluctuation reduction of wind power in an energy storage system, which demonstrates the potential applicability of our optimization method.
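
To make the combined metric concrete, the following is a minimal sketch in Python, not the algorithm from the paper. It assumes the metric has the form "long‐run mean minus BETA times the steady‐state variance of one‐step rewards", and it evaluates every deterministic policy of a tiny MDP by brute force instead of using the paper's policy iteration. The transition table P, the reward table R, the weight BETA, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP; every number here is illustrative.
# P[s][a] is the next-state distribution, R[s][a] is the one-step reward.
P = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.5, 0.5], [0.1, 0.9]],
]
R = [
    [1.0, 2.0],
    [4.0, 0.0],
]
BETA = 0.5  # assumed weight on the variance term


def stationary(policy, iters=5000):
    """Stationary distribution of the chain induced by a deterministic policy.

    Uses plain power iteration, which converges here because every
    transition probability in P is strictly positive.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][policy[s]][t] for s in range(n)) for t in range(n)]
    return pi


def mean_variance(policy):
    """Long-run mean and steady-state variance of the one-step reward."""
    pi = stationary(policy)
    states = range(len(P))
    mean = sum(pi[s] * R[s][policy[s]] for s in states)
    var = sum(pi[s] * (R[s][policy[s]] - mean) ** 2 for s in states)
    return mean, var


def combined(policy):
    """Assumed combined metric: mean minus BETA times variance."""
    mean, var = mean_variance(policy)
    return mean - BETA * var


# Enumerate all deterministic policies (one action per state) and pick the best.
policies = list(product(range(2), repeat=2))
best = max(policies, key=combined)

if __name__ == "__main__":
    for p in policies:
        m, v = mean_variance(p)
        print(p, round(m, 4), round(v, 4), round(combined(p), 4))
    print("best deterministic policy:", best)
```

Enumeration is feasible only for toy problems, since the number of deterministic policies grows exponentially with the number of states. Note also that the variance term depends on the policy's own mean, so the per‐state "reward" changes with the policy being evaluated; this is the stage coupling that, according to the abstract, rules out standard dynamic programming.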

Suggested Citation

  • Li Xia, 2020. "Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2808-2827, December.
  • Handle: RePEc:bla:popmgt:v:29:y:2020:i:12:p:2808-2827
    DOI: 10.1111/poms.13252

    Download full text from publisher

    File URL: https://doi.org/10.1111/poms.13252
    Download Restriction: no


    References listed on IDEAS

    1. Li, Y.Z. & Wu, Q.H. & Li, M.S. & Zhan, J.P., 2014. "Mean-variance model for power system economic dispatch with wind power integrated," Energy, Elsevier, vol. 72(C), pages 510-520.
    2. Guo, Xianping & Ye, Liuer & Yin, George, 2012. "A mean–variance optimization problem for discounted Markov decision processes," European Journal of Operational Research, Elsevier, vol. 220(2), pages 423-429.
    3. Andrzej Ruszczyński & Alexander Shapiro, 2006. "Conditional Risk Mappings," Mathematics of Operations Research, INFORMS, vol. 31(3), pages 544-561, August.
    4. Panos Parpas & Berç Rustem, 2007. "Computational Assessment of Nested Benders and Augmented Lagrangian Decomposition for Mean-Variance Multistage Stochastic Problems," INFORMS Journal on Computing, INFORMS, vol. 19(2), pages 239-247, May.
    5. Erick Delage & Shie Mannor, 2010. "Percentile Optimization for Markov Decision Processes with Parameter Uncertainty," Operations Research, INFORMS, vol. 58(1), pages 203-213, February.
    6. Panos Kouvelis & Zhan Pang & Qing Ding, 2018. "Integrated Commodity Inventory Management and Financial Hedging: A Dynamic Mean‐Variance Analysis," Production and Operations Management, Production and Operations Management Society, vol. 27(6), pages 1052-1073, June.
    7. Mannor, Shie & Tsitsiklis, John N., 2013. "Algorithmic aspects of mean–variance optimization in Markov decision processes," European Journal of Operational Research, Elsevier, vol. 231(3), pages 645-653.
    8. Philippe Artzner & Freddy Delbaen & Jean‐Marc Eber & David Heath, 1999. "Coherent Measures of Risk," Mathematical Finance, Wiley Blackwell, vol. 9(3), pages 203-228, July.
    9. Kun-Jen Chung, 1994. "Mean-Variance Tradeoffs in an Undiscounted MDP: The Unichain Case," Operations Research, INFORMS, vol. 42(1), pages 184-188, February.
    10. Matthew J. Sobel, 1994. "Mean-Variance Tradeoffs in an Undiscounted MDP," Operations Research, INFORMS, vol. 42(1), pages 175-183, February.
    11. David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
    12. Juzhi Zhang & Suresh P. Sethi & Tsan‐Ming Choi & T. C. E. Cheng, 2020. "Supply Chains Involving a Mean‐Variance‐Skewness‐Kurtosis Newsvendor: Analysis and Coordination," Production and Operations Management, Production and Operations Management Society, vol. 29(6), pages 1397-1430, June.
    13. Bäuerle, Nicole & Jaśkiewicz, Anna, 2015. "Risk-sensitive dividend problems," European Journal of Operational Research, Elsevier, vol. 242(1), pages 161-171.
    14. Arnab Nilim & Laurent El Ghaoui, 2005. "Robust Control of Markov Decision Processes with Uncertain Transition Matrices," Operations Research, INFORMS, vol. 53(5), pages 780-798, October.
    15. Chun‐Hung Chiu & Tsan‐Ming Choi & Xin Dai & Bin Shen & Jin‐Hui Zheng, 2018. "Optimal Advertising Budget Allocation in Luxury Fashion Markets with Social Influences: A Mean‐Variance Analysis," Production and Operations Management, Production and Operations Management Society, vol. 27(8), pages 1611-1629, August.

    Citations



    Cited by:

    1. Jing-Yu Ma & Quan-Lin Li, 2022. "Optimal dynamic mining policy of blockchain selfish mining through sensitivity-based optimization," Journal of Combinatorial Optimization, Springer, vol. 44(5), pages 3663-3700, December.
    2. Ma, Shuai & Ma, Xiaoteng & Xia, Li, 2023. "A unified algorithm framework for mean-variance optimization in discounted Markov decision processes," European Journal of Operational Research, Elsevier, vol. 311(3), pages 1057-1067.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ma, Shuai & Ma, Xiaoteng & Xia, Li, 2023. "A unified algorithm framework for mean-variance optimization in discounted Markov decision processes," European Journal of Operational Research, Elsevier, vol. 311(3), pages 1057-1067.
    2. Alessandro Arlotto & Noah Gans & J. Michael Steele, 2014. "Markov Decision Problems Where Means Bound Variances," Operations Research, INFORMS, vol. 62(4), pages 864-875, August.
    3. Anthony Coache & Sebastian Jaimungal, 2021. "Reinforcement Learning with Dynamic Convex Risk Measures," Papers 2112.13414, arXiv.org, revised Nov 2022.
    4. Zachary Feinstein & Birgit Rudloff, 2018. "Scalar multivariate risk measures with a single eligible asset," Papers 1807.10694, arXiv.org, revised Feb 2021.
    5. Alois Pichler & Ruben Schlotter, 2020. "Quantification of Risk in Classical Models of Finance," Papers 2004.04397, arXiv.org, revised Feb 2021.
    6. Saeed Marzban & Erick Delage & Jonathan Yumeng Li, 2020. "Equal Risk Pricing and Hedging of Financial Derivatives with Convex Risk Measures," Papers 2002.02876, arXiv.org, revised Sep 2020.
    7. Zeynep Turgay & Fikri Karaesmen & Egemen Lerzan Örmeci, 2018. "Structural properties of a class of robust inventory and queueing control problems," Naval Research Logistics (NRL), John Wiley & Sons, vol. 65(8), pages 699-716, December.
    8. Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.
    9. Dan A. Iancu & Marek Petrik & Dharmashankar Subramanian, 2015. "Tight Approximations of Dynamic Risk Measures," Mathematics of Operations Research, INFORMS, vol. 40(3), pages 655-682, March.
    10. Zachary Feinstein & Birgit Rudloff, 2012. "Multiportfolio time consistency for set-valued convex and coherent risk measures," Papers 1212.5563, arXiv.org, revised Oct 2014.
    11. Haoran Wang & Shi Yu, 2021. "Robo-Advising: Enhancing Investment with Inverse Optimization and Deep Reinforcement Learning," Papers 2105.09264, arXiv.org.
    12. Schur, Rouven & Gönsch, Jochen & Hassler, Michael, 2019. "Time-consistent, risk-averse dynamic pricing," European Journal of Operational Research, Elsevier, vol. 277(2), pages 587-603.
    13. Shie Mannor & Ofir Mebel & Huan Xu, 2016. "Robust MDPs with k-Rectangular Uncertainty," Mathematics of Operations Research, INFORMS, vol. 41(4), pages 1484-1509, November.
    14. Shapiro, Alexander, 2021. "Tutorial on risk neutral, distributionally robust and risk averse multistage stochastic programming," European Journal of Operational Research, Elsevier, vol. 288(1), pages 1-13.
    15. Anthony Coache & Sebastian Jaimungal & Álvaro Cartea, 2022. "Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning," Papers 2206.14666, arXiv.org, revised May 2023.
    16. Saghafian, Soroush, 2018. "Ambiguous partially observable Markov decision processes: Structural results and applications," Journal of Economic Theory, Elsevier, vol. 178(C), pages 1-35.
    17. Sun, Xuting & Chung, Sai-Ho & Choi, Tsan-Ming & Sheu, Jiuh-Biing & Ma, Hoi Lam, 2020. "Combating lead-time uncertainty in global supply chain's shipment-assignment: Is it wise to be risk-averse?," Transportation Research Part B: Methodological, Elsevier, vol. 138(C), pages 406-434.
    18. David L. Kaufman & Andrew J. Schaefer, 2013. "Robust Modified Policy Iteration," INFORMS Journal on Computing, INFORMS, vol. 25(3), pages 396-410, August.
    19. Bren, Austin & Saghafian, Soroush, 2018. "Data-Driven Percentile Optimization for Multi-Class Queueing Systems with Model Ambiguity: Theory and Application," Working Paper Series rwp18-008, Harvard University, John F. Kennedy School of Government.
    20. Zachary Feinstein & Birgit Rudloff, 2018. "Time consistency for scalar multivariate risk measures," Papers 1810.04978, arXiv.org, revised Nov 2021.
