
Choquet regularization for reinforcement learning

Author

Listed:
  • Xia Han
  • Ruodu Wang
  • Xun Yu Zhou

Abstract

We propose Choquet regularizers to measure and manage the level of exploration in reinforcement learning (RL). We reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)), replacing the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton–Jacobi–Bellman equation of the problem and solve it explicitly in the linear–quadratic (LQ) case by statically maximizing a mean–variance-constrained Choquet regularizer. In the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers and, conversely, identify the Choquet regularizers that generate a number of widely used exploratory samplers, such as ε-greedy, exponential, uniform and Gaussian.
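
To make the idea of a Choquet regularizer concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the definition, so the sketch assumes the form Phi_h(Pi) = integral over the real line of h(Pi([x, infinity))) dx, with h concave on [0, 1] and h(0) = h(1) = 0. The function names `choquet_regularizer` and `h_gini`, the choice h(p) = p - p^2 and the evenly spaced stand-in for a uniform sample are all illustrative assumptions.

```python
import numpy as np


def choquet_regularizer(samples, h):
    """Empirical value of the integral of h(P(X >= x)) dx for a 1-D sample.

    Assumed definition (not stated in the abstract): h is concave on [0, 1]
    with h(0) = h(1) = 0, so only the gaps between order statistics contribute.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    if n < 2:
        return 0.0
    # On the interval between the i-th and (i+1)-th smallest points,
    # the empirical survival probability P(X >= x) equals (n - i) / n.
    survival = (n - np.arange(1, n)) / n
    gaps = np.diff(x)
    return float(np.sum(h(survival) * gaps))


def h_gini(p):
    # Illustrative concave distortion with h(0) = h(1) = 0 (an assumption).
    return p - p ** 2


n = 1000
# Evenly spaced quantile midpoints: a deterministic stand-in for Uniform(0, 1).
uniform_grid = (np.arange(1, n + 1) - 0.5) / n

phi_uniform = choquet_regularizer(uniform_grid, h_gini)   # close to 1/6
phi_point = choquet_regularizer(np.full(n, 2.0), h_gini)  # point mass: no exploration
phi_scaled = choquet_regularizer(5.0 + 3.0 * uniform_grid, h_gini)

print(phi_uniform, phi_point, phi_scaled)
```

Under this assumed definition the regularizer vanishes for a deterministic action, is unchanged by shifting the distribution and scales linearly with its spread, which is the sense in which it measures the level of exploration.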

Suggested Citation

  • Xia Han & Ruodu Wang & Xun Yu Zhou, 2022. "Choquet regularization for reinforcement learning," Papers 2208.08497, arXiv.org.
  • Handle: RePEc:arx:papers:2208.08497

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2208.08497
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. R. Rockafellar & Stan Uryasev & Michael Zabarankin, 2006. "Generalized deviations in risk analysis," Finance and Stochastics, Springer, vol. 10(1), pages 51-74, January.
    2. De Waegenaere, Anja & Wakker, Peter P., 2001. "Nonmonotonic Choquet integrals," Journal of Mathematical Economics, Elsevier, vol. 36(1), pages 45-60, September.
    3. Georgios Psarrakos & Jorge Navarro, 2013. "Generalized cumulative residual entropy and record values," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 76(5), pages 623-640, July.
    4. Sunoj, S.M. & Sankaran, P.G., 2012. "Quantile based entropy function," Statistics & Probability Letters, Elsevier, vol. 82(6), pages 1049-1053.
    5. Acerbi, Carlo, 2002. "Spectral measures of risk: A coherent representation of subjective risk aversion," Journal of Banking & Finance, Elsevier, vol. 26(7), pages 1505-1518, July.
    6. Yaari, Menahem E, 1987. "The Dual Theory of Choice under Risk," Econometrica, Econometric Society, vol. 55(1), pages 95-115, January.
    7. Furman, Edward & Wang, Ruodu & Zitikis, Ričardas, 2017. "Gini-type measures of risk and variability: Gini shortfall, capital allocations, and heavy-tailed risks," Journal of Banking & Finance, Elsevier, vol. 83(C), pages 70-84.
    8. Tversky, Amos & Kahneman, Daniel, 1992. "Advances in Prospect Theory: Cumulative Representation of Uncertainty," Journal of Risk and Uncertainty, Springer, vol. 5(4), pages 297-323, October.
    9. Haoran Wang & Xun Yu Zhou, 2020. "Continuous‐time mean–variance portfolio selection: A reinforcement learning framework," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1273-1308, October.
    10. Wang, Qiuqi & Wang, Ruodu & Wei, Yunran, 2020. "Distortion Riskmetrics On General Spaces," ASTIN Bulletin, Cambridge University Press, vol. 50(3), pages 827-851, September.
    11. Schmeidler, David, 1989. "Subjective Probability and Expected Utility without Additivity," Econometrica, Econometric Society, vol. 57(3), pages 571-587, May.
    12. Ruodu Wang & Yunran Wei & Gordon E. Willmot, 2020. "Characterization, Robustness, and Aggregation of Signed Choquet Integrals," Mathematics of Operations Research, INFORMS, vol. 45(3), pages 993-1015, August.
    13. Philippe Artzner & Freddy Delbaen & Jean‐Marc Eber & David Heath, 1999. "Coherent Measures of Risk," Mathematical Finance, Wiley Blackwell, vol. 9(3), pages 203-228, July.
    14. Bellini, Fabio & Fadina, Tolulope & Wang, Ruodu & Wei, Yunran, 2022. "Parametric measures of variability induced by risk measures," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 270-284.
    15. Rothschild, Michael & Stiglitz, Joseph E., 1970. "Increasing risk: I. A definition," Journal of Economic Theory, Elsevier, vol. 2(3), pages 225-243, September.
    16. Fabio Bellini & Tolulope Fadina & Ruodu Wang & Yunran Wei, 2020. "Parametric measures of variability induced by risk measures," Papers 2012.05219, arXiv.org, revised Apr 2022.
    17. Ruodu Wang & Ričardas Zitikis, 2021. "An Axiomatic Foundation for the Expected Shortfall," Management Science, INFORMS, vol. 67(3), pages 1413-1429, March.
    18. Quiggin, John, 1982. "A theory of anticipated utility," Journal of Economic Behavior & Organization, Elsevier, vol. 3(4), pages 323-343, December.
    19. Liu, Fangda & Cai, Jun & Lemieux, Christiane & Wang, Ruodu, 2020. "Convex risk functionals: Representation and applications," Insurance: Mathematics and Economics, Elsevier, vol. 90(C), pages 66-79.
    20. Georg Pflug & David Wozabal, 2007. "Ambiguity in portfolio selection," Quantitative Finance, Taylor & Francis Journals, vol. 7(4), pages 435-442.
    21. Gilboa, Itzhak & Schmeidler, David, 1989. "Maxmin expected utility with non-unique prior," Journal of Mathematical Economics, Elsevier, vol. 18(2), pages 141-153, April.
    22. Bogdan Grechuk & Anton Molyboha & Michael Zabarankin, 2009. "Maximum Entropy Principle with General Deviation Measures," Mathematics of Operations Research, INFORMS, vol. 34(2), pages 445-467, May.
    23. Yanwei Jia & Xun Yu Zhou, 2022. "q-Learning in Continuous Time," Papers 2207.00713, arXiv.org, revised Apr 2023.
    24. Wang, Shaun S. & Young, Virginia R. & Panjer, Harry H., 1997. "Axiomatic characterization of insurance prices," Insurance: Mathematics and Economics, Elsevier, vol. 21(2), pages 173-183, November.
    25. Hans Föllmer & Alexander Schied, 2002. "Convex measures of risk and trading constraints," Finance and Stochastics, Springer, vol. 6(4), pages 429-447.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ruodu Wang & Yunran Wei & Gordon E. Willmot, 2020. "Characterization, Robustness, and Aggregation of Signed Choquet Integrals," Mathematics of Operations Research, INFORMS, vol. 45(3), pages 993-1015, August.
    2. Jean-Gabriel Lauzier & Liyuan Lin & Ruodu Wang, 2023. "Risk sharing, measuring variability, and distortion riskmetrics," Papers 2302.04034, arXiv.org.
    3. Silvana Pesenti & Qiuqi Wang & Ruodu Wang, 2020. "Optimizing distortion riskmetrics with distributional uncertainty," Papers 2011.04889, arXiv.org, revised Feb 2022.
    4. Albrecht, Peter & Huggenberger, Markus, 2017. "The fundamental theorem of mutual insurance," Insurance: Mathematics and Economics, Elsevier, vol. 75(C), pages 180-188.
    5. Xia Han & Ruodu Wang & Qinyu Wu, 2023. "Monotonic mean-deviation risk measures," Papers 2312.01034, arXiv.org.
    6. Xia Han & Bin Wang & Ruodu Wang & Qinyu Wu, 2021. "Risk Concentration and the Mean-Expected Shortfall Criterion," Papers 2108.05066, arXiv.org, revised Apr 2022.
    7. Bellini, Fabio & Fadina, Tolulope & Wang, Ruodu & Wei, Yunran, 2022. "Parametric measures of variability induced by risk measures," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 270-284.
    8. Furman, Edward & Wang, Ruodu & Zitikis, Ričardas, 2017. "Gini-type measures of risk and variability: Gini shortfall, capital allocations, and heavy-tailed risks," Journal of Banking & Finance, Elsevier, vol. 83(C), pages 70-84.
    9. Ruodu Wang & Ričardas Zitikis, 2021. "An Axiomatic Foundation for the Expected Shortfall," Management Science, INFORMS, vol. 67(3), pages 1413-1429, March.
    10. Haiyan Liu & Bin Wang & Ruodu Wang & Sheng Chao Zhuang, 2023. "Distorted optimal transport," Papers 2308.11238, arXiv.org.
    11. Wang, Qiuqi & Wang, Ruodu & Zitikis, Ričardas, 2022. "Risk measures induced by efficient insurance contracts," Insurance: Mathematics and Economics, Elsevier, vol. 103(C), pages 56-65.
    12. Samuel Solgon Santos & Marcelo Brutti Righi & Eduardo de Oliveira Horta, 2022. "The limitations of comonotonic additive risk measures: a literature review," Papers 2212.13864, arXiv.org, revised Jan 2024.
    13. Yi Shen & Zachary Van Oosten & Ruodu Wang, 2024. "Partially Law-Invariant Risk Measures," Papers 2401.17265, arXiv.org.
    14. Xia Han & Liyuan Lin & Ruodu Wang, 2022. "Diversification quotients: Quantifying diversification via risk measures," Papers 2206.13679, arXiv.org, revised Mar 2024.
    15. Cillo, Alessandra & Delquié, Philippe, 2014. "Mean-risk analysis with enhanced behavioral content," European Journal of Operational Research, Elsevier, vol. 239(3), pages 764-775.
    16. Fabio Bellini & Tolulope Fadina & Ruodu Wang & Yunran Wei, 2020. "Parametric measures of variability induced by risk measures," Papers 2012.05219, arXiv.org, revised Apr 2022.
    17. Wei Wang & Huifu Xu, 2023. "Preference robust state-dependent distortion risk measure on act space and its application in optimal decision making," Computational Management Science, Springer, vol. 20(1), pages 1-51, December.
    18. Andreas H Hamel, 2018. "Monetary Measures of Risk," Papers 1812.04354, arXiv.org.
    19. Liu, Fangda & Cai, Jun & Lemieux, Christiane & Wang, Ruodu, 2020. "Convex risk functionals: Representation and applications," Insurance: Mathematics and Economics, Elsevier, vol. 90(C), pages 66-79.
    20. Denuit Michel & Dhaene Jan & Goovaerts Marc & Kaas Rob & Laeven Roger, 2006. "Risk measurement with equivalent utility principles," Statistics & Risk Modeling, De Gruyter, vol. 24(1), pages 1-25, July.
