Robust Risk-Aware Reinforcement Learning

Author

Listed:

Sebastian Jaimungal
Silvana Pesenti
Ye Sheng Wang
Hariom Tatsat

Abstract

We present a reinforcement learning (RL) approach for robust optimisation of risk-aware performance criteria. To allow agents to express a wide variety of risk-reward profiles, we assess the value of a policy using rank-dependent expected utility (RDEU). RDEU allows agents to seek gains while simultaneously protecting themselves against downside risk. To robustify optimal policies against model uncertainty, we assess a policy not by its distribution but by the worst possible distribution that lies within a Wasserstein ball around it. Thus, our problem formulation may be viewed as an actor/agent choosing a policy (the outer problem) and an adversary then acting to worsen the performance of that strategy (the inner problem). We develop explicit policy gradient formulae for the inner and outer problems, and demonstrate the approach's efficacy on three prototypical financial problems: robust portfolio allocation, optimising a benchmark, and statistical arbitrage.
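To make the two ingredients named in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It estimates an RDEU value from samples and measures the one-dimensional 2-Wasserstein distance between two equal-sized samples. The function names `rdeu` and `wasserstein2_1d`, the Quiggin-style weighting convention (distortion applied to the empirical CDF of outcomes sorted from worst to best), and the choice of utility and distortion are all assumptions made for illustration; the paper may use a different sign or ordering convention.

```python
import numpy as np


def rdeu(samples, utility, distortion):
    """Empirical RDEU under an assumed convention (not taken from the paper).

    Outcomes are sorted from worst to best and the i-th outcome receives the
    weight g(i/n) - g((i-1)/n), where g is the distortion function with
    g(0) = 0 and g(1) = 1. With g(p) = p this reduces to the mean utility.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    grid = np.arange(n + 1) / n           # 0, 1/n, ..., 1
    weights = np.diff(distortion(grid))   # distorted probability increments
    return float(np.sum(weights * utility(x)))


def wasserstein2_1d(a, b):
    """2-Wasserstein distance between two equal-sized 1-D empirical samples.

    In one dimension the optimal coupling pairs order statistics, so the
    distance is the root mean squared difference of the sorted samples.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.size != b.size:
        raise ValueError("samples must have the same size")
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Illustrative (hypothetical) terminal P&L samples of some fixed policy.
rng = np.random.default_rng(0)
pnl = rng.normal(loc=0.05, scale=0.2, size=1000)


def identity(v):
    # Linear utility, chosen only to keep the example simple.
    return v


# A concave distortion puts more weight on the worst outcomes (pessimistic).
nominal_value = rdeu(pnl, identity, np.sqrt)

# One feasible adversarial candidate: shift every outcome down by eps.
# It lies exactly at distance eps from the nominal sample, so it sits on the
# boundary of the Wasserstein ball; it is not claimed to be the worst case.
eps = 0.01
shifted = pnl - eps
print("distance to nominal:", wasserstein2_1d(pnl, shifted))
print("nominal RDEU:", nominal_value)
print("RDEU under shifted distribution:", rdeu(shifted, identity, np.sqrt))
```

Because the shifted sample is only one point in the ball, its RDEU is an upper bound on the worst-case (inner problem) value; the paper's adversary instead searches the ball using policy gradients.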

Suggested Citation

Sebastian Jaimungal & Silvana Pesenti & Ye Sheng Wang & Hariom Tatsat, 2021. "Robust Risk-Aware Reinforcement Learning," Papers 2108.10403, arXiv.org, revised Dec 2021.

Handle: RePEc:arx:papers:2108.10403

References listed on IDEAS

Paul Milgrom & Ilya Segal, 2002. "Envelope Theorems for Arbitrary Choice Sets," Econometrica, Econometric Society, vol. 70(2), pages 583-601, March.
Yaari, Menahem E, 1987. "The Dual Theory of Choice under Risk," Econometrica, Econometric Society, vol. 55(1), pages 95-115, January.
Georg Pflug & David Wozabal, 2007. "Ambiguity in portfolio selection," Quantitative Finance, Taylor & Francis Journals, vol. 7(4), pages 435-442.

Citations

Cited by:

Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
Christa Cuchiero & Guido Gazzani & Irene Klein, 2022. "Risk measures under model uncertainty: a Bayesian viewpoint," Papers 2204.07115, arXiv.org.
Sebastian Jaimungal, 2022. "Reinforcement learning and stochastic optimisation," Finance and Stochastics, Springer, vol. 26(1), pages 103-129, January.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Chatterjee, Kalyan & Vijay Krishna, R., 2011. "A nonsmooth approach to nonexpected utility theory under risk," Mathematical Social Sciences, Elsevier, vol. 62(3), pages 166-175.
Silvana Pesenti & Sebastian Jaimungal, 2020. "Portfolio Optimisation within a Wasserstein Ball," Papers 2012.04500, arXiv.org, revised Jun 2022.
Xia Han & Ruodu Wang & Xun Yu Zhou, 2022. "Choquet regularization for reinforcement learning," Papers 2208.08497, arXiv.org.
Viet Anh Nguyen & Soroosh Shafiee & Damir Filipović & Daniel Kuhn, 2021. "Mean-Covariance Robust Risk Measurement," Papers 2112.09959, arXiv.org, revised Nov 2023.
Kaido, Hiroaki, 2017. "Asymptotically Efficient Estimation Of Weighted Average Derivatives With An Interval Censored Variable," Econometric Theory, Cambridge University Press, vol. 33(5), pages 1218-1241, October.
- Hiroaki Kaido, 2013. "Asymptotically Efficient Estimation of Weighted Average Derivatives with an Interval Censored Variable," Boston University - Department of Economics - Working Papers Series 2013-022, Boston University - Department of Economics.
- Hiroaki Kaido, 2014. "Asymptotically efficient estimation of weighted average derivatives with an interval censored variable," CeMMAP working papers 03/14, Institute for Fiscal Studies.
- Hiroaki Kaido, 2014. "Asymptotically efficient estimation of weighted average derivatives with an interval censored variable," CeMMAP working papers CWP03/14, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Król, Michał, 2012. "Product differentiation decisions under ambiguous consumer demand and pessimistic expectations," International Journal of Industrial Organization, Elsevier, vol. 30(6), pages 593-604.
- Michal Król, 2011. "Product differentiation decisions under ambiguous consumer demand and pessimistic expectations," Economics Discussion Paper Series 1103, Economics, The University of Manchester.
Loebbing, Jonas, 2018. "An Elementary Theory of Endogenous Technical Change and Wage Inequality," VfS Annual Conference 2018 (Freiburg, Breisgau): Digital Economy 181603, Verein für Socialpolitik / German Economic Association.
Zhi Chen & Melvyn Sim & Huan Xu, 2019. "Distributionally Robust Optimization with Infinitely Constrained Ambiguity Sets," Operations Research, INFORMS, vol. 67(5), pages 1328-1344, September.
Stefanie Stantcheva, 2020. "Dynamic Taxation," Annual Review of Economics, Annual Reviews, vol. 12(1), pages 801-831, August.
- Stantcheva, Stefanie, 2020. "Dynamic Taxation," CEPR Discussion Papers 14347, C.E.P.R. Discussion Papers.
- Stefanie Stantcheva, 2020. "Dynamic Taxation," NBER Working Papers 26704, National Bureau of Economic Research, Inc.
Drouhin, Nicolas, 2015. "A rank-dependent utility model of uncertain lifetime," Journal of Economic Dynamics and Control, Elsevier, vol. 53(C), pages 208-224.
- Nicolas Drouhin, 2015. "A rank-dependent utility model of uncertain lifetime," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01238589, HAL.
- Nicolas Drouhin, 2015. "A rank-dependent utility model of uncertain lifetime," Post-Print halshs-01238589, HAL.
Alain Chateauneuf & Patrick Moyes, 2005. "Lorenz non-consistent welfare and inequality measurement," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 2(2), pages 61-87, January.
- Alain Chateauneuf & Patrick Moyes, 2004. "Lorenz non-consistent welfare and inequality measurement," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 2(2), pages 61-87, August.
- Alain Chateauneuf & Patrick Moyes, 2004. "Lorenz non-consistent welfare and inequality measurement," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-00156441, HAL.
- Alain Chateauneuf & Patrick Moyes, 2004. "Lorenz non-consistent welfare and inequality measurement," Post-Print hal-00156441, HAL.
- Alain Chateauneuf & Patrick Moyes, 2004. "Lorenz non-consistent welfare and inequality measurement," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-00160177, HAL.
- Alain Chateauneuf & Patrick Moyes, 2004. "Lorenz Non-Consistent Welfare and Inequality Measurement," IDEP Working Papers 0406, Institut d'economie publique (IDEP), Marseille, France, revised May 2004.
- Alain Chateauneuf & Patrick Moyes, 2004. "Lorenz non-consistent welfare and inequality measurement," Post-Print hal-00160177, HAL.
Koessler, Frédéric & Skreta, Vasiliki, 2016. "Informed seller with taste heterogeneity," Journal of Economic Theory, Elsevier, vol. 165(C), pages 456-471.
- Frédéric Koessler & Vassiliki Skreta, 2016. "Informed seller with taste heterogeneity," PSE - Labex "OSE-Ouvrir la Science Economique" halshs-01379293, HAL.
- Frédéric Koessler & Vassiliki Skreta, 2016. "Informed seller with taste heterogeneity," PSE-Ecole d'économie de Paris (Postprint) halshs-01379293, HAL.
- Frédéric Koessler & Vassiliki Skreta, 2016. "Informed seller with taste heterogeneity," Post-Print halshs-01379293, HAL.
Epstein, Larry G. & Zin, Stanley E., 2001. "The independence axiom and asset returns," Journal of Empirical Finance, Elsevier, vol. 8(5), pages 537-572, December.
- Larry G. Epstein & Stanley E. Zin, 1991. "The Independence Axiom and Asset Returns," NBER Technical Working Papers 0109, National Bureau of Economic Research, Inc.
Samuel Daudin, 2022. "Optimal Control of Diffusion Processes with Terminal Constraint in Law," Journal of Optimization Theory and Applications, Springer, vol. 195(1), pages 1-41, October.
Itzhak Gilboa & Andrew Postlewaite & Larry Samuelson & David Schmeidler, 2019. "What are axiomatizations good for?," Theory and Decision, Springer, vol. 86(3), pages 339-359, May.
- Itzhak Gilboa & Andrew Postlewaite & Larry Samuelson & David Schmeidler, 2018. "What Are Axiomatizations Good For?," Working Papers hal-01933876, HAL.
- Itzhak Gilboa & Andrew Postlewaite & Larry Samuelson & David Schmeidler, 2018. "What Are Axiomatizations Good For?," PIER Working Paper Archive 18-026, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 22 Oct 2018.
- Gilboa, Itzhak & Postlewaite, Andrew & Samuelson, Larry & Schmeidler, David, 2018. "What are Axiomatizations Good for?," HEC Research Papers Series 1318, HEC Paris.
Filiz-Ozbay, Emel & Guryan, Jonathan & Hyndman, Kyle & Kearney, Melissa & Ozbay, Erkut Y., 2015. "Do lottery payments induce savings behavior? Evidence from the lab," Journal of Public Economics, Elsevier, vol. 126(C), pages 1-24.
- Emel Filiz-Ozbay & Jonathan Guryan & Kyle Hyndman & Melissa Schettini Kearney & Erkut Y. Ozbay, 2013. "Do Lottery Payments Induce Savings Behavior: Evidence from the Lab," NBER Working Papers 19130, National Bureau of Economic Research, Inc.
Cerreia-Vioglio, Simone & Maccheroni, Fabio & Marinacci, Massimo & Montrucchio, Luigi, 2012. "Probabilistic sophistication, second order stochastic dominance and uncertainty aversion," Journal of Mathematical Economics, Elsevier, vol. 48(5), pages 271-283.
- Simone Cerreia-Vioglio & Fabio Maccheroni & Massimo Marinacci & Luigi Montrucchio, 2010. "Probabilistic Sophistication, Second Order Stochastic Dominance, and Uncertainty Aversion," Carlo Alberto Notebooks 174, Collegio Carlo Alberto.
Eduardo Perez & Delphine Prady, 2012. "Complicating to Persuade?," Working Papers hal-03583827, HAL.
- Eduardo Perez & Delphine Prady, 2012. "Complicating to Persuade?," SciencePo Working papers hal-03583827, HAL.
- Eduardo Perez-Richet & Delphine Prady, 2012. "Complicating to Persuade?," Working Papers hal-00675135, HAL.
- Eduardo Perez & Delphine Prady, 2012. "Complicating to Persuade?," Sciences Po publications info:hdl:2441/5mao0mthj59, Sciences Po.
Giraud, Raphaël, 2014. "Second order beliefs models of choice under imprecise risk: non-additive second order beliefs vs. nonlinear second order utility," Theoretical Economics, Econometric Society, vol. 9(3), September.
- Raphaël Giraud, 2014. "Second order beliefs models of choice under imprecise risk: non-additive second order beliefs vs. nonlinear second order utility," Post-Print hal-02878112, HAL.
Stephen P. Jenkins & Philippe Van Kerm, 2016. "Assessing Individual Income Growth," Economica, London School of Economics and Political Science, vol. 83(332), pages 679-703, October.
- Jenkins, Stephen P. & van Kerm, Philippe, 2016. "Assessing individual income growth," LSE Research Online Documents on Economics 66995, London School of Economics and Political Science, LSE Library.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-CMP-2021-08-30 (Computational Economics)
NEP-ISF-2021-08-30 (Islamic Finance)
NEP-RMG-2021-08-30 (Risk Management)
NEP-UPT-2021-08-30 (Utility Models and Prospect Theory)
