
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

Authors

  • Ben Hambly
  • Renyuan Xu
  • Huining Yang

Abstract

We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the setting of known and unknown parameters. We are able to produce a global linear convergence guarantee for this approach in the setting of finite time horizon and stochastic state dynamics under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle problems with constraints. We illustrate the performance of the algorithm with two examples. The first example is the optimal liquidation of a holding in an asset. We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the global optimal solution for a larger class of stochastic systems containing the LQR framework and that it is more robust with respect to model mis-specification when compared to a model-based approach. The second example is an LQR system in a higher dimensional setting with synthetic data.
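
As a rough illustration of the known-parameter setting described in the abstract, the following minimal sketch runs plain gradient descent on time-varying feedback gains for a finite-horizon LQR with additive noise and compares the result with the Riccati solution. It is not the authors' code. The dynamics x_{t+1} = A x_t + B u_t + w_t, the policy form u_t = -K_t x_t, the function names, the matrices, the step size and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def riccati_gains(A, B, Q, R, QT, T):
    """Model-based benchmark: optimal gains from the backward Riccati recursion."""
    P = QT
    gains = [None] * T
    for t in reversed(range(T)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        Acl = A - B @ K
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
        gains[t] = K
    return gains

def cost_and_grads(K, A, B, Q, R, QT, Sigma0, W):
    """Exact expected cost and its gradient in each K_t (parameters assumed known)."""
    T = len(K)
    # Forward pass: second moments Sigma_t = E[x_t x_t^T] under u_t = -K_t x_t.
    Sig = [Sigma0]
    for t in range(T):
        Acl = A - B @ K[t]
        Sig.append(Acl @ Sig[t] @ Acl.T + W)
    # Backward pass: cost-to-go matrices P_t for the current (not optimal) gains.
    P = [None] * (T + 1)
    P[T] = QT
    for t in reversed(range(T)):
        Acl = A - B @ K[t]
        P[t] = Q + K[t].T @ R @ K[t] + Acl.T @ P[t + 1] @ Acl
    cost = np.trace(P[0] @ Sigma0) + sum(np.trace(P[t + 1] @ W) for t in range(T))
    grads = [
        2.0 * ((R + B.T @ P[t + 1] @ B) @ K[t] - B.T @ P[t + 1] @ A) @ Sig[t]
        for t in range(T)
    ]
    return cost, grads

def policy_gradient(A, B, Q, R, QT, Sigma0, W, T, step=0.01, iters=3000):
    """Gradient descent on the gains, starting from K_t = 0 (illustrative settings)."""
    K = [np.zeros((B.shape[1], A.shape[0])) for _ in range(T)]
    costs = []
    for _ in range(iters):
        cost, grads = cost_and_grads(K, A, B, Q, R, QT, Sigma0, W)
        costs.append(cost)
        K = [k - step * g for k, g in zip(K, grads)]
    return K, costs

# Hypothetical 2-dimensional problem, chosen only to make the sketch concrete.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.eye(2)
Q, R, QT = np.eye(2), np.eye(2), np.eye(2)
Sigma0, W = np.eye(2), 0.5 * np.eye(2)
T = 5

K_pg, costs = policy_gradient(A, B, Q, R, QT, Sigma0, W, T)
K_star = riccati_gains(A, B, Q, R, QT, T)
max_gap = max(np.max(np.abs(kp - ks)) for kp, ks in zip(K_pg, K_star))
print("largest gap between learned and Riccati gains:", max_gap)
```

In the unknown-parameter setting, the exact gradient used here would have to be replaced by an estimate built from sampled trajectories; that step is not shown in this sketch.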

Suggested Citation

  • Ben Hambly & Renyuan Xu & Huining Yang, 2020. "Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon," Papers 2011.10300, arXiv.org, revised Jun 2021.
  • Handle: RePEc:arx:papers:2011.10300

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2011.10300
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Aurélien Alfonsi & Antje Fruth & Alexander Schied, 2007. "Optimal execution strategies in limit order books with general shape functions," Papers 0708.1756, arXiv.org, revised Feb 2010.
    2. Rama Cont & Arseniy Kukanov & Sasha Stoikov, 2014. "The Price Impact of Order Book Events," Journal of Financial Econometrics, Oxford University Press, vol. 12(1), pages 47-88.
    3. Arthur Charpentier & Romuald Elie & Carl Remlinger, 2020. "Reinforcement Learning in Economics and Finance," Papers 2003.10014, arXiv.org.
    4. Aurelien Alfonsi & Antje Fruth & Alexander Schied, 2010. "Optimal execution strategies in limit order books with general shape functions," Quantitative Finance, Taylor & Francis Journals, vol. 10(2), pages 143-157.
    5. Wenhang Bao & Xiao-yang Liu, 2019. "Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis," Papers 1906.11046, arXiv.org.
    6. Jim Gatheral & Alexander Schied, 2011. "Optimal Trade Execution Under Geometric Brownian Motion In The Almgren And Chriss Framework," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 14(03), pages 353-368.
    7. Dieter Hendricks & Diane Wilcox, 2014. "A reinforcement learning extension to the Almgren-Chriss model for optimal trade execution," Papers 1403.2229, arXiv.org.
    8. Robert Almgren, 2003. "Optimal execution with nonlinear impact functions and trading-enhanced risk," Applied Mathematical Finance, Taylor & Francis Journals, vol. 10(1), pages 1-18.

    Citations

    Cited by:

    1. Anthony Coache & Sebastian Jaimungal, 2021. "Reinforcement Learning with Dynamic Convex Risk Measures," Papers 2112.13414, arXiv.org, revised Nov 2022.
    2. Houssem Jerbi & Obaid Alshammari & Sondess Ben Aoun & Mourad Kchaou & Theodore E. Simos & Spyridon D. Mourtas & Vasilios N. Katsikis, 2023. "Hermitian Solutions of the Quaternion Algebraic Riccati Equations through Zeroing Neural Networks with Application to Quadrotor Control," Mathematics, MDPI, vol. 12(1), pages 1-19, December.
    3. Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
    4. Ben Hambly & Renyuan Xu & Huining Yang, 2023. "Recent advances in reinforcement learning in finance," Mathematical Finance, Wiley Blackwell, vol. 33(3), pages 437-503, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Olivier Guéant, 2016. "The Financial Mathematics of Market Liquidity: From Optimal Execution to Market Making," Post-Print hal-01393136, HAL.
    2. Kashyap, Ravi, 2020. "David vs Goliath (You against the Markets), A dynamic programming approach to separate the impact and timing of trading costs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    3. Xiaoyue Li & John M. Mulvey, 2023. "Optimal Portfolio Execution in a Regime-switching Market with Non-linear Impact Costs: Combining Dynamic Program and Neural Network," Papers 2306.08809, arXiv.org.
    4. Olivier Guéant & Charles-Albert Lehalle, 2015. "General Intensity Shapes In Optimal Liquidation," Mathematical Finance, Wiley Blackwell, vol. 25(3), pages 457-495, July.
    5. S. C. P. Yam & W. Zhou, 2017. "Optimal Liquidation of Child Limit Orders," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 517-545, May.
    6. Cattivelli, Luca & Pirino, Davide, 2019. "A SHARP model of bid–ask spread forecasts," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1211-1225.
    7. Arne Lokka & Junwei Xu, 2020. "Optimal liquidation trajectories for the Almgren-Chriss model with Levy processes," Papers 2002.03376, arXiv.org, revised Sep 2020.
    8. Qinghua Li, 2014. "Facilitation and Internalization Optimal Strategy in a Multilateral Trading Context," Papers 1404.7320, arXiv.org, revised Jan 2015.
    9. Lokka, A. & Xu, Junwei, 2020. "Optimal liquidation trajectories for the Almgren-Chriss model," LSE Research Online Documents on Economics 106977, London School of Economics and Political Science, LSE Library.
    10. Daniel Hernández-Hernández & Harold A. Moreno-Franco & José Luis Pérez, 2017. "Periodic strategies in optimal execution with multiplicative price impact," Papers 1705.00284, arXiv.org, revised May 2018.
    11. Fengpei Li & Vitalii Ihnatiuk & Ryan Kinnear & Anderson Schneider & Yuriy Nevmyvaka, 2022. "Do price trajectory data increase the efficiency of market impact estimation?," Papers 2205.13423, arXiv.org, revised Mar 2023.
    12. Christopher Lorenz & Alexander Schied, 2013. "Drift dependence of optimal trade execution strategies under transient price impact," Finance and Stochastics, Springer, vol. 17(4), pages 743-770, October.
    13. Aurélien Alfonsi & Alexander Schied, 2010. "Optimal trade execution and absence of price manipulations in limit order book models," Post-Print hal-00397652, HAL.
    14. Charles-Albert Lehalle & Eyal Neuman, 2019. "Incorporating signals into optimal trading," Finance and Stochastics, Springer, vol. 23(2), pages 275-311, April.
    15. Miles Kumaresan & Nataša Krejić, 2015. "Optimal trading of algorithmic orders in a liquidity fragmented market place," Annals of Operations Research, Springer, vol. 229(1), pages 521-540, June.
    16. Schnaubelt, Matthias, 2022. "Deep reinforcement learning for the optimal placement of cryptocurrency limit orders," European Journal of Operational Research, Elsevier, vol. 296(3), pages 993-1006.
    17. Seungki Min & Costis Maglaras & Ciamac C. Moallemi, 2018. "Cross-Sectional Variation of Intraday Liquidity, Cross-Impact, and their Effect on Portfolio Execution," Papers 1811.05524, arXiv.org.
    18. Olivier Guéant, 2013. "Permanent market impact can be nonlinear," Papers 1305.0413, arXiv.org, revised Mar 2014.
    19. Jan Kallsen & Johannes Muhle-Karbe, 2014. "High-Resilience Limits of Block-Shaped Order Books," Papers 1409.7269, arXiv.org.
    20. Obizhaeva, Anna A. & Wang, Jiang, 2013. "Optimal trading strategy and supply/demand dynamics," Journal of Financial Markets, Elsevier, vol. 16(1), pages 1-32.

