Printed from https://ideas.repec.org/p/arx/papers/2108.06655.html

Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

Author

Listed:
  • Yanwei Jia
  • Xun Yu Zhou

Abstract

We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean-square sense. This method interprets the classical gradient Monte-Carlo algorithm. The second method is based on a system of equations called the "martingale orthogonality conditions" with test functions. Solving these equations in different ways recovers various classical TD algorithms, such as TD($\lambda$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero, and we provide the convergence rate. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications.
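
The two methods named in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a standard Brownian motion as the state, zero running reward, terminal payoff h(x) = x^2, horizon T, and a one-parameter family J_theta(t, x) = x^2 + theta * (T - t), for which the true value function corresponds to theta = 1. All function names are hypothetical. The first estimator minimizes a time-discretized martingale loss (a regression of the terminal payoff on J_theta along each path); the second solves a discretized martingale orthogonality condition with the test function chosen as the derivative of J_theta with respect to theta.

```python
import random


def simulate_paths(n_paths, n_steps, horizon, seed=0):
    """Simulate standard Brownian paths on a uniform grid, starting at 0."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    paths = []
    for _ in range(n_paths):
        x = [0.0]
        for _ in range(n_steps):
            x.append(x[-1] + rng.gauss(0.0, dt ** 0.5))
        paths.append(x)
    return paths


def martingale_loss_estimate(paths, horizon):
    """Closed-form minimizer of sum_k (h(X_T) - J_theta(t_k, X_k))^2 * dt.

    Assumed toy setup: h(x) = x^2 and J_theta(t, x) = x^2 + theta * (T - t).
    """
    n_steps = len(paths[0]) - 1
    dt = horizon / n_steps
    num = den = 0.0
    for x in paths:
        payoff = x[-1] ** 2
        for k in range(n_steps):
            tau = horizon - k * dt  # time to maturity at grid point k
            num += (payoff - x[k] ** 2) * tau
            den += tau * tau
    return num / den


def orthogonality_estimate(paths, horizon):
    """Solve sum_k xi_k * (J_theta(t_{k+1}, X_{k+1}) - J_theta(t_k, X_k)) = 0.

    The test function is xi_k = dJ_theta/dtheta = T - t_k (an assumed choice).
    The increment of J_theta over one step is X_{k+1}^2 - X_k^2 - theta * dt.
    """
    n_steps = len(paths[0]) - 1
    dt = horizon / n_steps
    num = den = 0.0
    for x in paths:
        for k in range(n_steps):
            tau = horizon - k * dt
            num += tau * (x[k + 1] ** 2 - x[k] ** 2)
            den += tau * dt
    return num / den


if __name__ == "__main__":
    sample = simulate_paths(n_paths=4000, n_steps=50, horizon=1.0, seed=1)
    print("martingale loss estimate:", martingale_loss_estimate(sample, 1.0))
    print("orthogonality estimate:  ", orthogonality_estimate(sample, 1.0))
```

Both estimators should land near theta = 1 in this setup, up to Monte Carlo error. The first uses the whole remaining path (a Monte-Carlo style target), while the second uses only one-step increments (a TD style update), which mirrors the distinction the abstract draws between the two methods.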

Suggested Citation

  • Yanwei Jia & Xun Yu Zhou, 2021. "Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach," Papers 2108.06655, arXiv.org, revised Feb 2022.
  • Handle: RePEc:arx:papers:2108.06655

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2108.06655
    File Function: Latest version
    Download Restriction: no


    Citations

    Cited by:

    1. Zhou Fang, 2023. "Continuous-Time Path-Dependent Exploratory Mean-Variance Portfolio Construction," Papers 2303.02298, arXiv.org.
    2. Yanwei Jia & Xun Yu Zhou, 2021. "Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms," Papers 2111.11232, arXiv.org, revised Jul 2022.
    3. Xiangyu Cui & Xun Li & Yun Shi & Si Zhao, 2023. "Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning," Papers 2312.15385, arXiv.org.
    4. Ben Hambly & Renyuan Xu & Huining Yang, 2023. "Recent advances in reinforcement learning in finance," Mathematical Finance, Wiley Blackwell, vol. 33(3), pages 437-503, July.
    5. Min Dai & Hanqing Jin & Xi Yang, 2024. "Data-driven Option Pricing," Papers 2401.11158, arXiv.org.
    6. Zhou Fang & Haiqing Xu, 2023. "Market Making of Options via Reinforcement Learning," Papers 2307.01814, arXiv.org.
    7. Zhou Fang & Haiqing Xu, 2023. "Over-the-Counter Market Making via Reinforcement Learning," Papers 2307.01816, arXiv.org.
    8. Yanwei Jia & Xun Yu Zhou, 2022. "q-Learning in Continuous Time," Papers 2207.00713, arXiv.org, revised Apr 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bontemps, Christian & Meddahi, Nour, 2005. "Testing normality: a GMM approach," Journal of Econometrics, Elsevier, vol. 124(1), pages 149-186, January.
    2. Christensen, Bent Jesper & Posch, Olaf & van der Wel, Michel, 2016. "Estimating dynamic equilibrium models using mixed frequency macro and financial data," Journal of Econometrics, Elsevier, vol. 194(1), pages 116-137.
    3. Bansal, Ravi & Kiku, Dana & Yaron, Amir, 2016. "Risks for the long run: Estimation with time aggregation," Journal of Monetary Economics, Elsevier, vol. 82(C), pages 52-69.
    4. Chang, Jinyuan & Chen, Song Xi & Chen, Xiaohong, 2015. "High dimensional generalized empirical likelihood for moment restrictions with dependent data," Journal of Econometrics, Elsevier, vol. 185(1), pages 283-304.
    5. Dante Amengual & Marine Carrasco & Enrique Sentana, 2017. "Testing Distributional Assumptions Using a Continuum of Moments," Working Papers wp2018_1709, CEMFI.
    6. Eleftheria Kafousaki & Stavros Degiannakis, 2023. "Forecasting VIX: the illusion of forecast evaluation criteria," Economics and Business Letters, Oviedo University Press, vol. 12(3), pages 231-240.
    7. Lee, Hwang Hee & Hyun, Jung-Soon, 2019. "The asymmetric effect of equity volatility on credit default swap spreads," Journal of Banking & Finance, Elsevier, vol. 98(C), pages 125-136.
    8. Whitney K. Newey & Frank Windmeijer, 2005. "GMM with many weak moment conditions," CeMMAP working papers CWP18/05, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Jens J. Krüger, 2014. "A multivariate evaluation of German output growth and inflation forecasts," Economics Bulletin, AccessEcon, vol. 34(3), pages 1410-1418.
    10. Joachim Inkmann, 2000. "Finite Sample Properties of One-Step, Two-Step and Bootstrap Empirical Likelihood Approaches to Efficient GMM Estimation," Econometric Society World Congress 2000 Contributed Papers 0332, Econometric Society.
    11. Otsu, Taisuke, 2010. "On Bahadur efficiency of empirical likelihood," Journal of Econometrics, Elsevier, vol. 157(2), pages 248-256, August.
    12. Diego Amaya & Jean-François Bégin & Geneviève Gauthier, 2022. "The Informational Content of High-Frequency Option Prices," Management Science, INFORMS, vol. 68(3), pages 2166-2201, March.
    13. Lozano, Martín & Rubio, Gonzalo, 2011. "Evaluating alternative methods for testing asset pricing models with historical data," Journal of Empirical Finance, Elsevier, vol. 18(1), pages 136-146, January.
    14. Ozcan Ceylan, 2015. "Limited information-processing capacity and asymmetric stock correlations," Quantitative Finance, Taylor & Francis Journals, vol. 15(6), pages 1031-1039, June.
    15. repec:hal:wpspec:info:hdl:2441/3vl5fe4i569nbr005tctlc8ll5 is not listed on IDEAS
    16. Marjan Petreski, 2010. "An Overhaul of a Doctrine: Has Inflation Targeting Opened a New Era in Developing-country Peggers?," FIW Working Paper series 057, FIW.
    17. Frank Kleibergen, 2004. "Expansions of GMM statistics that indicate their properties under weak and/or many instruments and the bootstrap," Econometric Society 2004 North American Summer Meetings 408, Econometric Society.
    18. Masakatsu Okubo, 2011. "The Intertemporal Elasticity of Substitution: An Analysis Based on Japanese Data," Economica, London School of Economics and Political Science, vol. 78(310), pages 367-390, April.
    19. Xu Cheng & Winston Wei Dou & Zhipeng Liao, 2022. "Macro‐Finance Decoupling: Robust Evaluations of Macro Asset Pricing Models," Econometrica, Econometric Society, vol. 90(2), pages 685-713, March.
    20. Parente, Paulo M.D.C. & Smith, Richard J., 2011. "Gel Methods For Nonsmooth Moment Indicators," Econometric Theory, Cambridge University Press, vol. 27(1), pages 74-113, February.
    21. Shane M. Sherlund, 2004. "Quasi Empirical Likelihood Estimation of Moment Condition Models," Econometric Society 2004 North American Summer Meetings 507, Econometric Society.
