Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions

Authors

Shanyu Han
Yang Liu
Xiang Yu

Abstract

We propose a reinforcement learning (RL) framework under a broad class of risk objectives, characterized by convex scoring functions. This class covers many common risk measures, such as variance, Expected Shortfall, entropic Value-at-Risk, and mean-risk utility. To resolve the time-inconsistency issue, we consider an augmented state space and an auxiliary variable and recast the problem as a two-state optimization problem. We propose a customized Actor-Critic algorithm and establish some theoretical approximation guarantees. A key theoretical contribution is that our results do not require the Markov decision process to be continuous. Additionally, we propose an auxiliary variable sampling method inspired by the alternating minimization algorithm, which is convergent under certain conditions. We validate our approach in simulation experiments with a financial application in statistical arbitrage trading, demonstrating the effectiveness of the algorithm.
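To make the role of the convex scoring function and the auxiliary variable concrete, here is a minimal sketch for one of the listed risk measures, Expected Shortfall. It relies on the standard Rockafellar-Uryasev representation, ES_alpha(L) = min_y { y + E[(L - y)^+] / (1 - alpha) }, which is a well-known result and is not taken from this listing. The paper's own scoring functions, augmented state, and Actor-Critic updates are not reproduced here; the function names `ru_score` and `expected_shortfall_via_score` and the simulated Gaussian losses are assumptions made purely for illustration.

```python
import numpy as np


def ru_score(y, losses, alpha):
    """Rockafellar-Uryasev score y + E[(L - y)^+] / (1 - alpha); convex in y."""
    return y + np.mean(np.maximum(losses - y, 0.0)) / (1.0 - alpha)


def expected_shortfall_via_score(losses, alpha):
    """Minimize the score over the auxiliary variable y (hypothetical helper).

    The empirical score is piecewise linear in y with kinks at the sample
    points, so a minimizer can be found among the sample values themselves.
    """
    candidates = np.sort(losses)
    scores = np.array([ru_score(y, losses, alpha) for y in candidates])
    best = int(np.argmin(scores))
    return candidates[best], scores[best]


# Illustrative data only: simulated standard normal losses.
rng = np.random.default_rng(0)
losses = rng.normal(size=2000)
alpha = 0.95

y_star, es = expected_shortfall_via_score(losses, alpha)

# Direct estimate for comparison: average of the worst 5% of losses.
tail_mean = np.sort(losses)[-100:].mean()
print(f"auxiliary minimizer y* = {y_star:.4f}")
print(f"ES via score minimization = {es:.4f}, tail average = {tail_mean:.4f}")
```

In this sketch the minimizing y plays the part of the auxiliary variable: fixing y turns the risk objective into an ordinary expectation, while minimizing over y recovers the risk measure. An alternating scheme would switch between improving the policy for a fixed y and updating y for a fixed policy, which is the general idea the abstract attributes to its sampling method.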

Suggested Citation

Shanyu Han & Yang Liu & Xiang Yu, 2025. "Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions," Papers 2505.04553, arXiv.org, revised May 2025.

Handle: RePEc:arx:papers:2505.04553

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Ruodu Wang & Yunran Wei, 2020. "Risk functionals with convex level sets," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1337-1367, October.
Nicole Bäuerle & Anna Jaśkiewicz, 2024. "Markov decision processes with risk-sensitive criteria: an overview," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 99(1), pages 141-178, April.
Tadese, Mekonnen & Drapeau, Samuel, 2020. "Relative bound and asymptotic comparison of expectile with respect to expected shortfall," Insurance: Mathematics and Economics, Elsevier, vol. 93(C), pages 387-399.
Xia Han & Ruodu Wang & Qinyu Wu, 2025. "Higher-order Gini indices: An axiomatic approach," Papers 2508.10663, arXiv.org, revised Sep 2025.
Tobias Fissler & Fangda Liu & Ruodu Wang & Linxiao Wei, 2024. "Elicitability and identifiability of tail risk measures," Papers 2404.14136, arXiv.org, revised Oct 2025.
Righi, Marcelo Brutti & Müller, Fernanda Maria & Moresco, Marlon Ruoso, 2025. "A risk measurement approach from risk-averse stochastic optimization of score functions," Insurance: Mathematics and Economics, Elsevier, vol. 120(C), pages 42-50.
- Marcelo Brutti Righi & Fernanda Maria Muller & Marlon Ruoso Moresco, 2022. "A risk measurement approach from risk-averse stochastic optimization of score functions," Papers 2208.14809, arXiv.org, revised May 2023.
Paul Embrechts & Tiantian Mao & Qiuqi Wang & Ruodu Wang, 2021. "Bayes risk, elicitability, and the Expected Shortfall," Mathematical Finance, Wiley Blackwell, vol. 31(4), pages 1190-1217, October.
Samuel Drapeau & Mekonnen Tadese, 2019. "Relative Bound and Asymptotic Comparison of Expectile with Respect to Expected Shortfall," Papers 1906.09729, arXiv.org, revised Jun 2020.
Tobias Fissler & Yannick Hoga, 2024. "How to Compare Copula Forecasts?," Papers 2410.04165, arXiv.org.
Bellini, Fabio & Klar, Bernhard & Müller, Alfred & Rosazza Gianin, Emanuela, 2014. "Generalized quantiles as risk measures," Insurance: Mathematics and Economics, Elsevier, vol. 54(C), pages 41-48.
Xia Han & Liyuan Lin & Ruodu Wang, 2022. "Diversification quotients: Quantifying diversification via risk measures," Papers 2206.13679, arXiv.org, revised Jul 2024.
Zhanyi Jiao & Qiuqi Wang & Yimiao Zhao, 2025. "Standard and comparative e-backtests for general risk measures," Papers 2511.05840, arXiv.org.
Marie Kratz & Yen H Lok & Alexander J Mcneil, 2016. "Multinomial var backtests: A simple implicit approach to backtesting expected shortfall," Working Papers hal-01424279, HAL.
Jiang, Yifu & Olmo, Jose & Atwi, Majed, 2025. "High-dimensional multi-period portfolio allocation using deep reinforcement learning," International Review of Economics & Finance, Elsevier, vol. 98(C).
Xia Han & Liyuan Lin & Ruodu Wang, 2023. "Diversification quotients based on VaR and ES," Papers 2301.03517, arXiv.org, revised May 2023.
Han, Xia & Lin, Liyuan & Wang, Ruodu, 2023. "Diversification quotients based on VaR and ES," Insurance: Mathematics and Economics, Elsevier, vol. 113(C), pages 185-197.
Edgars Jakobsons & Steven Vanduffel, 2015. "Dependence Uncertainty Bounds for the Expectile of a Portfolio," Risks, MDPI, vol. 3(4), pages 1-25, December.
Mohammed Berkhouch & Fernanda Maria Müller & Ghizlane Lakhnati & Marcelo Brutti Righi, 2022. "Deviation-Based Model Risk Measures," Computational Economics, Springer;Society for Computational Economics, vol. 59(2), pages 527-547, February.
Tobias Fissler & Jana Hlavinová & Birgit Rudloff, 2021. "Elicitability and identifiability of set-valued measures of systemic risk," Finance and Stochastics, Springer, vol. 25(1), pages 133-165, January.
Miao, Kathleen E. & Pesenti, Silvana M., 2025. "Robust elicitable functionals," European Journal of Operational Research, Elsevier, vol. 326(2), pages 311-325.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-CMP-2025-06-16 (Computational Economics)
NEP-RMG-2025-06-16 (Risk Management)
NEP-UPT-2025-06-16 (Utility Models and Prospect Theory)
