IDEAS home Printed from https://ideas.repec.org/p/hal/journl/hal-01396680.html

Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces

Author

Listed:
  • Jérôme Renault

    (TSE-R - Toulouse School of Economics - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - INRA - Institut National de la Recherche Agronomique - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique)

  • Xavier Venel

    (PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École des Ponts ParisTech - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique)

Abstract

We study long-term Markov Decision Processes (MDPs) and Gambling Houses, with applications to partial observation MDPs with finitely many states and to zero-sum repeated games with an informed controller. We consider a decision-maker who maximizes the weighted sum $\sum_{t \geq 1} \theta_t r_t$, where $r_t$ is the expected reward of the $t$-th stage. We prove the existence of a very strong notion of long-term value called general uniform value, representing the fact that the decision-maker can play well independently of the evaluations $(\theta_t)_{t \geq 1}$ over stages, provided the total variation (or impatience) $\sum_{t \geq 1} |\theta_{t+1} - \theta_t|$ is small enough. This result generalizes previous results of Rosenberg, Solan and Vieille [35] and Renault [31] that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling or the Aumann-Maschler cav u formula for repeated games with incomplete information.

Apart from the introduction of appropriate invariant measures, the main innovation in our proofs is the introduction of a new metric $d_*$ such that partial observation MDPs and repeated games with an informed controller may be associated to auxiliary problems that are non-expansive with respect to $d_*$. Given two Borel probabilities $u$ and $v$ over a compact subset $X$ of a normed vector space, we define $d_*(u, v) = \sup_{f \in D_1} |u(f) - v(f)|$, where $D_1$ is the set of functions satisfying: $\forall x, y \in X$, $\forall a, b \geq 0$, $a f(x) - b f(y) \leq \|ax - by\|$. The case where $X$ is a simplex endowed with the $L^1$-norm is of special interest: $d_*$ is the largest distance over the probabilities with finite support over $X$ which makes every disintegration non-expansive. Moreover, we obtain a Kantorovich-Rubinstein type duality formula for $d_*(u, v)$ involving couples of measures $(\alpha, \beta)$ over $X \times X$ such that the first marginal of $\alpha$ is $u$ and the second marginal of $\beta$ is $v$.

MSC Classification: Primary: 90C40; Secondary: 60J20, 91A15.
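
As an illustration of the evaluations and the impatience quantity mentioned in the abstract, the following minimal Python sketch computes the total variation of two standard weight sequences: the arithmetic mean over the first n stages and the discounted evaluation with rate λ. The function names, and the truncation of the infinite sequences to a finite horizon, are assumptions made for this example and do not come from the paper. Under these assumptions the impatience is 1/n for the arithmetic mean and approximately λ for the discounted evaluation, so both become small for patient evaluations.

```python
from typing import List


def cesaro_weights(n: int, horizon: int) -> List[float]:
    """Arithmetic-mean evaluation: theta_t = 1/n for t <= n, 0 afterwards.

    The sequence is truncated to `horizon` stages (an assumption of this sketch).
    """
    return [1.0 / n if t <= n else 0.0 for t in range(1, horizon + 1)]


def discounted_weights(lam: float, horizon: int) -> List[float]:
    """Discounted evaluation: theta_t = lam * (1 - lam)**(t - 1), truncated."""
    return [lam * (1.0 - lam) ** (t - 1) for t in range(1, horizon + 1)]


def impatience(theta: List[float]) -> float:
    """Total variation sum_t |theta_{t+1} - theta_t| over the truncated sequence."""
    return sum(abs(theta[t + 1] - theta[t]) for t in range(len(theta) - 1))


def evaluate(theta: List[float], rewards: List[float]) -> float:
    """Weighted sum sum_t theta_t * r_t of expected stage rewards."""
    return sum(w * r for w, r in zip(theta, rewards))


if __name__ == "__main__":
    mean_10 = cesaro_weights(n=10, horizon=50)
    disc = discounted_weights(lam=0.1, horizon=2000)
    print("impatience, arithmetic mean n=10:", impatience(mean_10))
    print("impatience, discounted lam=0.1:  ", impatience(disc))
    # A constant reward stream of 1 is evaluated at (about) 1 by both sequences.
    print("value of constant reward 1:", evaluate(mean_10, [1.0] * 50))
```

The truncation horizon only needs to be large enough for the discarded tail of the discounted weights to be negligible.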

Suggested Citation

  • Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," Post-Print hal-01396680, HAL.
  • Handle: RePEc:hal:journl:hal-01396680
    DOI: 10.1287/moor.2016.0814

    References listed on IDEAS

    1. Jean-François Mertens, 1987. "Repeated games. Proceedings of the International Congress of Mathematicians," LIDAM Reprints CORE 788, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    2. Jérôme Renault, 2006. "The Value of Markov Chain Games with Lack of Information on One Side," Mathematics of Operations Research, INFORMS, vol. 31(3), pages 490-512, August.
    3. Nicolas Vieille & Dinah Rosenberg & Eilon Solan, 2002. "Stochastic games with a single controller and incomplete information," HEC Research Papers Series 754, HEC Paris.
    4. Dinah Rosenberg & Eilon Solan & Nicolas Vieille, 2000. "Blackwell Optimality in Markov Decision Processes with Partial Observation," Discussion Papers 1292, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
    5. Ehud Lehrer & Sylvain Sorin, 1992. "A Uniform Tauberian Theorem in Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 17(2), pages 303-307, May.
    6. John C. Harsanyi, 1967. "Games with Incomplete Information Played by 'Bayesian' Players, I-III Part I. The Basic Model," Management Science, INFORMS, vol. 14(3), pages 159-182, November.
    7. Abraham Neyman, 2008. "Existence of optimal strategies in Markov games with incomplete information," International Journal of Game Theory, Springer;Game Theory Society, vol. 37(4), pages 581-596, December.
    8. Robert J. Aumann, 1995. "Repeated Games with Incomplete Information," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262011476, December.
    9. A. Hordijk & L. C. M. Kallenberg, 1979. "Linear Programming and Markov Decision Chains," Management Science, INFORMS, vol. 25(4), pages 352-362, April.
    10. Jean-François Mertens & Shmuel Zamir, 1985. "Formulation of Bayesian analysis for games with incomplete information," LIDAM Reprints CORE 608, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    11. Truman Bewley & Elon Kohlberg, 1976. "The Asymptotic Theory of Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 1(3), pages 197-208, August.
    12. Jérôme Renault, 2012. "The Value of Repeated Games with an Informed Controller," Mathematics of Operations Research, INFORMS, vol. 37(1), pages 154-179, February.

    Citations

    Cited by:

    1. Koessler, Frederic & Laclau, Marie & Renault, Jérôme & Tomala, Tristan, 2022. "Long information design," Theoretical Economics, Econometric Society, vol. 17(2), May.
    2. Fabien Gensbittel & Marcin Peski & Jérôme Renault, 2019. "The Large Space Of Information Structures," Working Papers hal-02075905, HAL.
    3. Li, Jin & Quincampoix, Marc & Renault, Jérôme & Buckdahn, Rainer, 2019. "Representation formulas for limit values of long run stochastic optimal controls," TSE Working Papers 19-1007, Toulouse School of Economics (TSE).
    4. Frédéric Koessler & Marie Laclau & Jerôme Renault & Tristan Tomala, 2022. "Long information design," Post-Print hal-03700394, HAL.
    5. Frédéric Koessler & Marie Laclau & Jerôme Renault & Tristan Tomala, 2022. "Long information design," PSE-Ecole d'économie de Paris (Postprint) hal-03700394, HAL.
    6. Rida Laraki & Jérôme Renault, 2020. "Acyclic Gambling Games," Mathematics of Operations Research, INFORMS, vol. 45(4), pages 1237-1257, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Laraki, Rida & Sorin, Sylvain, 2015. "Advances in Zero-Sum Dynamic Games," Handbook of Game Theory with Economic Applications, Elsevier.
    2. Sylvain Sorin, 2011. "Zero-Sum Repeated Games: Recent Advances and New Links with Differential Games," Dynamic Games and Applications, Springer, vol. 1(1), pages 172-207, March.
    3. Bruno Ziliotto, 2016. "A Tauberian Theorem for Nonexpansive Operators and Applications to Zero-Sum Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 41(4), pages 1522-1534, November.
    4. Xiaoxi Li & Xavier Venel, 2016. "Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture 'Maxmin = lim v_n = lim v_λ'," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-01302553, HAL.
    5. Xiaoxi Li & Xavier Venel, 2016. "Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture 'Maxmin = lim v_n = lim v_λ'," PSE-Ecole d'économie de Paris (Postprint) hal-01302553, HAL.
    6. Ashkenazi-Golan, Galit & Rainer, Catherine & Solan, Eilon, 2020. "Solving two-state Markov games with incomplete information on one side," Games and Economic Behavior, Elsevier, vol. 122(C), pages 83-104.
    7. Xiaoxi Li & Xavier Venel, 2016. "Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture 'Maxmin = lim v_n = lim v_λ'," Post-Print hal-01302553, HAL.
    8. Abraham Neyman & Sylvain Sorin, 2010. "Repeated games with public uncertain duration process," International Journal of Game Theory, Springer;Game Theory Society, vol. 39(1), pages 29-52, March.
    9. Hugo Gimbert & Jérôme Renault & Sylvain Sorin & Xavier Venel & Wieslaw Zielonka, 2016. "On the values of repeated games with signals," PSE-Ecole d'économie de Paris (Postprint) hal-01006951, HAL.
    10. Dhruva Kartik & Ashutosh Nayyar, 2021. "Upper and Lower Values in Zero-Sum Stochastic Games with Asymmetric Information," Dynamic Games and Applications, Springer, vol. 11(2), pages 363-388, June.
    11. Xavier Venel, 2015. "Commutative Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 40(2), pages 403-428, February.
    12. Fabien Gensbittel & Jérôme Renault, 2015. "The Value of Markov Chain Games with Incomplete Information on Both Sides," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 820-841, October.
    13. Guillaume Vigeral, 2013. "A Zero-Sum Stochastic Game with Compact Action Sets and no Asymptotic Value," Dynamic Games and Applications, Springer, vol. 3(2), pages 172-186, June.
    14. Jérôme Renault, 2012. "The Value of Repeated Games with an Informed Controller," Mathematics of Operations Research, INFORMS, vol. 37(1), pages 154-179, February.
    15. Pierre Cardaliaguet & Catherine Rainer & Dinah Rosenberg & Nicolas Vieille, 2016. "Markov Games with Frequent Actions and Incomplete Information—The Limit Case," Mathematics of Operations Research, INFORMS, vol. 41(1), pages 49-71, February.
    16. Jérôme Bolte & Stéphane Gaubert & Guillaume Vigeral, 2015. "Definable Zero-Sum Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 40(1), pages 171-191, February.
    17. Laraki, Rida & Renault, Jérôme, 2017. "Acyclic Gambling Games," TSE Working Papers 17-768, Toulouse School of Economics (TSE).
    18. Mandel, Antoine & Venel, Xavier, 2020. "Dynamic competition over social networks," European Journal of Operational Research, Elsevier, vol. 280(2), pages 597-608.
    19. Guilhem Lecouteux, 2018. "Bayesian game theorists and non-Bayesian players," The European Journal of the History of Economic Thought, Taylor & Francis Journals, vol. 25(6), pages 1420-1454, November.
    20. Johannes Hörner & Satoru Takahashi & Nicolas Vieille, 2015. "Truthful Equilibria in Dynamic Bayesian Games," Econometrica, Econometric Society, vol. 83(5), pages 1795-1848, September.
