
Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions

Authors
  • Shanyu Han
  • Yang Liu
  • Xiang Yu

Abstract

We propose a reinforcement learning (RL) framework under a broad class of risk objectives, characterized by convex scoring functions. This class covers many common risk measures, such as variance, Expected Shortfall, entropic Value-at-Risk, and mean-risk utility. To resolve the time-inconsistency issue, we consider an augmented state space and an auxiliary variable and recast the problem as a two-state optimization problem. We propose a customized Actor-Critic algorithm and establish some theoretical approximation guarantees. A key theoretical contribution is that our results do not require the Markov decision process to be continuous. Additionally, we propose an auxiliary variable sampling method inspired by the alternating minimization algorithm, which is convergent under certain conditions. We validate our approach in simulation experiments with a financial application in statistical arbitrage trading, demonstrating the effectiveness of the algorithm.
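The abstract does not state its scoring functions explicitly, so the following is only a minimal sketch of the general idea under two standard textbook representations, not the authors' algorithm: variance as the minimum over an auxiliary variable v of E[(X - v)^2], and Expected Shortfall at level alpha via the Rockafellar-Uryasev form, the minimum over v of v + E[(X - v)^+] / (1 - alpha). Both objectives are convex in v, which is what makes an inner minimization over the auxiliary variable tractable. All function names and the ternary-search solver are hypothetical choices made for illustration.

```python
import random


def variance_score(v, x):
    # Squared-error score: its expectation is minimized at v = E[X],
    # and the minimum value is Var(X).
    return (x - v) ** 2


def es_score(v, x, alpha=0.9):
    # Rockafellar-Uryasev score for a loss x: the minimum over v of its
    # expectation is the Expected Shortfall of X at level alpha.
    return v + max(x - v, 0.0) / (1.0 - alpha)


def mean_score(samples, score, v):
    # Empirical expectation of the score at auxiliary value v.
    return sum(score(v, x) for x in samples) / len(samples)


def minimize_over_v(samples, score, lo, hi, iters=100):
    # Ternary search on [lo, hi]; valid because the empirical objective
    # is convex in v for both scores above (an assumption of this sketch).
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mean_score(samples, score, m1) <= mean_score(samples, score, m2):
            hi = m2
        else:
            lo = m1
    v = 0.5 * (lo + hi)
    return v, mean_score(samples, score, v)


# Illustrative use on simulated losses (synthetic data, fixed seed).
random.seed(0)
losses = [random.gauss(0.0, 1.0) for _ in range(1000)]
v_var, variance = minimize_over_v(losses, variance_score, min(losses), max(losses))
v_es, shortfall = minimize_over_v(losses, es_score, min(losses), max(losses))
print("variance minimizer and value:", v_var, variance)
print("ES(0.9) minimizer and value:", v_es, shortfall)
```

In this sketch the minimizing v is the sample mean for the variance score and an empirical 0.9-quantile for the Expected Shortfall score. In an RL setting the same v would be held fixed while a policy is optimized and then updated in turn, which is the kind of alternation the abstract alludes to; how the paper actually samples or updates the auxiliary variable is not specified here.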

Suggested Citation

  • Shanyu Han & Yang Liu & Xiang Yu, 2025. "Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions," Papers 2505.04553, arXiv.org, revised May 2025.
  • Handle: RePEc:arx:papers:2505.04553

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2505.04553
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. R. Rockafellar & Stan Uryasev & Michael Zabarankin, 2006. "Generalized deviations in risk analysis," Finance and Stochastics, Springer, vol. 10(1), pages 51-74, January.
    2. Gneiting, Tilmann, 2011. "Making and Evaluating Point Forecasts," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 746-762.
    3. Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
    4. Bäuerle, Nicole & Glauner, Alexander, 2022. "Markov decision processes with recursive risk measures," European Journal of Operational Research, Elsevier, vol. 296(3), pages 953-966.
    5. Rafael M Frongillo & Ian A Kash, 2021. "Elicitation complexity of statistical properties [A characterization of scoring rules for linear properties]," Biometrika, Biometrika Trust, vol. 108(4), pages 857-879.
    6. Yanwei Jia, 2024. "Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty," Papers 2404.12598, arXiv.org, revised Mar 2026.
    7. Alexander J. McNeil & Rüdiger Frey & Paul Embrechts, 2015. "Quantitative Risk Management: Concepts, Techniques and Tools Revised edition," Economics Books, Princeton University Press, edition 2, number 10496, December.
    8. Johanna F. Ziegel, 2016. "Coherence And Elicitability," Mathematical Finance, Wiley Blackwell, vol. 26(4), pages 901-918, October.
    9. Fissler, Tobias & Pesenti, Silvana M., 2023. "Sensitivity measures based on scoring functions," European Journal of Operational Research, Elsevier, vol. 307(3), pages 1408-1423.
    10. Cheridito, Patrick & Stadje, Mitja, 2009. "Time-inconsistency of VaR and time-consistent alternatives," Finance Research Letters, Elsevier, vol. 6(1), pages 40-46, March.
    11. Fabio Bellini & Valeria Bignozzi, 2015. "On elicitable risk measures," Quantitative Finance, Taylor & Francis Journals, vol. 15(5), pages 725-733, May.
    12. Yuyu Chen & Peng Liu & Yang Liu & Ruodu Wang, 2022. "Ordering and inequalities for mixtures on risk aggregation," Mathematical Finance, Wiley Blackwell, vol. 32(1), pages 421-451, January.
    13. Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.
    14. Nicole Bäuerle & Jonathan Ott, 2011. "Markov Decision Processes with Average-Value-at-Risk criteria," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 74(3), pages 361-379, December.
    15. Righi, Marcelo Brutti & Müller, Fernanda Maria & Moresco, Marlon Ruoso, 2025. "A risk measurement approach from risk-averse stochastic optimization of score functions," Insurance: Mathematics and Economics, Elsevier, vol. 120(C), pages 42-50.
    16. Aharon Ben‐Tal & Marc Teboulle, 2007. "An Old‐New Concept Of Convex Risk Measures: The Optimized Certainty Equivalent," Mathematical Finance, Wiley Blackwell, vol. 17(3), pages 449-476, July.
    17. Tobias Fissler & Fangda Liu & Ruodu Wang & Linxiao Wei, 2024. "Elicitability and identifiability of tail risk measures," Papers 2404.14136, arXiv.org, revised Oct 2025.
    18. Ben Hambly & Renyuan Xu & Huining Yang, 2023. "Recent advances in reinforcement learning in finance," Mathematical Finance, Wiley Blackwell, vol. 33(3), pages 437-503, July.
    19. Tolulope Fadina & Yang Liu & Ruodu Wang, 2024. "A framework for measures of risk under uncertainty," Finance and Stochastics, Springer, vol. 28(2), pages 363-390, April.

    Citations

    Cited by:

    1. Federico Cacciamani & Roberto Daluiso & Marco Pinciroli & Michele Trapletti & Edoardo Vittori, 2026. "Time-Inhomogeneous Volatility Aversion for Financial Applications of Reinforcement Learning," Papers 2602.12030, arXiv.org.
    2. Shanyu Han & Yangbo He & Yang Liu, 2025. "Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning," Papers 2512.24580, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ruodu Wang & Yunran Wei, 2020. "Risk functionals with convex level sets," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1337-1367, October.
    2. Tadese, Mekonnen & Drapeau, Samuel, 2020. "Relative bound and asymptotic comparison of expectile with respect to expected shortfall," Insurance: Mathematics and Economics, Elsevier, vol. 93(C), pages 387-399.
    3. Tobias Fissler & Fangda Liu & Ruodu Wang & Linxiao Wei, 2024. "Elicitability and identifiability of tail risk measures," Papers 2404.14136, arXiv.org, revised Oct 2025.
    4. Righi, Marcelo Brutti & Müller, Fernanda Maria & Moresco, Marlon Ruoso, 2025. "A risk measurement approach from risk-averse stochastic optimization of score functions," Insurance: Mathematics and Economics, Elsevier, vol. 120(C), pages 42-50.
    5. Yinhuan Li & Chenxin Lyu & Ruodu Wang, 2026. "Adaptive Window Selection for Financial Risk Forecasting," Papers 2603.01157, arXiv.org.
    6. Paul Embrechts & Tiantian Mao & Qiuqi Wang & Ruodu Wang, 2021. "Bayes risk, elicitability, and the Expected Shortfall," Mathematical Finance, Wiley Blackwell, vol. 31(4), pages 1190-1217, October.
    7. Fabio Bellini & Muqiao Huang & Qiuqi Wang & Ruodu Wang, 2025. "Lambda Expected Shortfall," Papers 2512.23139, arXiv.org, revised Jan 2026.
    8. Nicole Bäuerle & Anna Jaśkiewicz, 2024. "Markov decision processes with risk-sensitive criteria: an overview," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 99(1), pages 141-178, April.
    9. Marie Kratz & Yen H Lok & Alexander J Mcneil, 2016. "Multinomial var backtests: A simple implicit approach to backtesting expected shortfall," Working Papers hal-01424279, HAL.
    10. Xia Han & Liyuan Lin & Ruodu Wang, 2023. "Diversification quotients based on VaR and ES," Papers 2301.03517, arXiv.org, revised May 2023.
    11. Han, Xia & Lin, Liyuan & Wang, Ruodu, 2023. "Diversification quotients based on VaR and ES," Insurance: Mathematics and Economics, Elsevier, vol. 113(C), pages 185-197.
    12. Mohammed Berkhouch & Fernanda Maria Müller & Ghizlane Lakhnati & Marcelo Brutti Righi, 2022. "Deviation-Based Model Risk Measures," Computational Economics, Springer;Society for Computational Economics, vol. 59(2), pages 527-547, February.
    13. Miao, Kathleen E. & Pesenti, Silvana M., 2025. "Robust elicitable functionals," European Journal of Operational Research, Elsevier, vol. 326(2), pages 311-325.
    14. Xia Han & Ruodu Wang & Qinyu Wu, 2025. "Higher-order Gini indices: An axiomatic approach," Papers 2508.10663, arXiv.org, revised Sep 2025.
    15. Yang Liu & Yunran Wei & Xintao Ye, 2026. "Weighted Generalized Risk Measure and Risk Quadrangle: Characterization, Optimization and Application," Papers 2603.10327, arXiv.org, revised Mar 2026.
    16. Valeria Bignozzi & Matteo Burzoni & Cosimo Munari, 2020. "Risk Measures Based on Benchmark Loss Distributions," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 87(2), pages 437-475, June.
    17. Mucahit Aygun & Fabio Bellini & Roger J. A. Laeven, 2023. "Elicitability of Return Risk Measures," Papers 2302.13070, arXiv.org, revised Mar 2023.
    18. Xia Han & Liyuan Lin & Hao Wang & Ruodu Wang, 2024. "Diversification quotient based on expectiles," Papers 2411.14646, arXiv.org, revised Nov 2024.
    19. Mucahit Aygun & Fabio Bellini & Roger J. A. Laeven, 2025. "Generalized Orlicz premia," Papers 2507.09181, arXiv.org.
    20. Gyöngyi Bugár, 2019. "A Breakthrough Idea in Risk Measure Validation – Is the Way Paved for an Effective Expected Shortfall Backtest?," Financial and Economic Review, Magyar Nemzeti Bank (Central Bank of Hungary), vol. 18(4), pages 130-145.
