Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

Author

Listed:

Ben Hambly
Renyuan Xu
Huining Yang

Abstract

We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the setting of known and unknown parameters. We are able to produce a global linear convergence guarantee for this approach in the setting of finite time horizon and stochastic state dynamics under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle problems with constraints. We illustrate the performance of the algorithm with two examples. The first example is the optimal liquidation of a holding in an asset. We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the global optimal solution for a larger class of stochastic systems containing the LQR framework and that it is more robust with respect to model mis-specification when compared to a model-based approach. The second example is an LQR system in a higher dimensional setting with synthetic data.
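As a rough illustration of the known-parameter setting described in the abstract, the sketch below runs plain gradient descent on time-varying feedback gains K_t (control u_t = -K_t x_t) for a small finite-horizon LQR with additive noise, and compares the result with the gains from the backward Riccati recursion. It is a minimal sketch and not the authors' code: the system matrices, horizon, step size and iteration count are hypothetical values chosen for illustration, and the function names are invented here. The unknown-parameter (sample-based) and projected variants mentioned in the abstract are not covered.

```python
import numpy as np


def riccati_gains(A, B, Q, R, Q_T, T):
    """Optimal time-varying gains via the backward Riccati recursion."""
    P = Q_T
    gains = [None] * T
    for t in reversed(range(T)):
        gains[t] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        A_cl = A - B @ gains[t]
        P = Q + gains[t].T @ R @ gains[t] + A_cl.T @ P @ A_cl
    return gains


def cost_and_gradient(K, A, B, Q, R, Q_T, W, Sigma0):
    """Expected cost of the policy u_t = -K_t x_t and its exact gradient."""
    T = len(K)
    # Forward pass: second moments Sigma_t = E[x_t x_t^T].
    Sigma = [Sigma0]
    for t in range(T):
        A_cl = A - B @ K[t]
        Sigma.append(A_cl @ Sigma[t] @ A_cl.T + W)
    # Backward pass: cost-to-go matrices P_t and per-step gradients.
    P = Q_T
    cost = 0.0
    grads = [None] * T
    for t in reversed(range(T)):
        cost += np.trace(P @ W)  # noise contribution; P is P_{t+1} here
        E = (R + B.T @ P @ B) @ K[t] - B.T @ P @ A
        grads[t] = 2.0 * E @ Sigma[t]
        A_cl = A - B @ K[t]
        P = Q + K[t].T @ R @ K[t] + A_cl.T @ P @ A_cl
    cost += np.trace(P @ Sigma0)  # contribution of the initial state
    return cost, grads


# Hypothetical problem data (assumptions made for this illustration only).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
Q, R, Q_T = np.eye(2), np.array([[1.0]]), np.eye(2)
W, Sigma0 = 0.1 * np.eye(2), np.eye(2)
T, step, n_iter = 5, 0.05, 5000
params = (A, B, Q, R, Q_T, W, Sigma0)

# Gradient descent on the gains, starting from the zero policy.
K = [np.zeros((1, 2)) for _ in range(T)]
cost_init, _ = cost_and_gradient(K, *params)
for _ in range(n_iter):
    _, grads = cost_and_gradient(K, *params)
    K = [K_t - step * g for K_t, g in zip(K, grads)]
cost_pg, _ = cost_and_gradient(K, *params)

# Reference solution from dynamic programming.
K_opt = riccati_gains(A, B, Q, R, Q_T, T)
cost_opt, _ = cost_and_gradient(K_opt, *params)

print("initial cost:", cost_init)
print("policy gradient cost:", cost_pg)
print("Riccati cost:", cost_opt)
```

The gradient used here, 2 E_t Sigma_t with E_t = (R + B' P_{t+1} B) K_t - B' P_{t+1} A, follows from differentiating the cost-to-go with respect to K_t; the state second moments Sigma_t include the noise covariance W, which is what distinguishes this from the noiseless case.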

Suggested Citation

Ben Hambly & Renyuan Xu & Huining Yang, 2020. "Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon," Papers 2011.10300, arXiv.org, revised Jun 2021.

Handle: RePEc:arx:papers:2011.10300

References listed on IDEAS

Rama Cont & Arseniy Kukanov & Sasha Stoikov, 2014. "The Price Impact of Order Book Events," Journal of Financial Econometrics, Oxford University Press, vol. 12(1), pages 47-88.
Wenhang Bao & Xiao-yang Liu, 2019. "Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis," Papers 1906.11046, arXiv.org.
Aurélien Alfonsi & Antje Fruth & Alexander Schied, 2007. "Optimal execution strategies in limit order books with general shape functions," Papers 0708.1756, arXiv.org, revised Feb 2010.
Arthur Charpentier & Romuald Elie & Carl Remlinger, 2020. "Reinforcement Learning in Economics and Finance," Papers 2003.10014, arXiv.org.
Aurélien Alfonsi & Antje Fruth & Alexander Schied, 2010. "Optimal execution strategies in limit order books with general shape functions," Quantitative Finance, Taylor & Francis Journals, vol. 10(2), pages 143-157.
Jim Gatheral & Alexander Schied, 2011. "Optimal Trade Execution Under Geometric Brownian Motion In The Almgren And Chriss Framework," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 14(03), pages 353-368.
Dieter Hendricks & Diane Wilcox, 2014. "A reinforcement learning extension to the Almgren-Chriss model for optimal trade execution," Papers 1403.2229, arXiv.org.
Robert Almgren, 2003. "Optimal execution with nonlinear impact functions and trading-enhanced risk," Applied Mathematical Finance, Taylor & Francis Journals, vol. 10(1), pages 1-18.

Citations

Cited by:

Anthony Coache & Sebastian Jaimungal, 2021. "Reinforcement Learning with Dynamic Convex Risk Measures," Papers 2112.13414, arXiv.org, revised Nov 2022.
Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
Ben Hambly & Renyuan Xu & Huining Yang, 2023. "Recent advances in reinforcement learning in finance," Mathematical Finance, Wiley Blackwell, vol. 33(3), pages 437-503, July.
Dianetti, Jodi & Ferrari, Giorgio & Xu, Renyuan, 2025. "Exploratory Optimal Stopping: A Singular Control Formulation," Center for Mathematical Economics Working Papers 740, Center for Mathematical Economics, Bielefeld University.
Jialun Cao & David Šiška & Lukasz Szpruch & Tanut Treetanthiploet, 2024. "Logarithmic regret in the ergodic Avellaneda-Stoikov market making model," Papers 2409.02025, arXiv.org, revised Jul 2025.
Jodi Dianetti & Giorgio Ferrari & Renyuan Xu, 2024. "Exploratory Optimal Stopping: A Singular Control Formulation," Papers 2408.09335, arXiv.org, revised Oct 2024.
Houssem Jerbi & Obaid Alshammari & Sondess Ben Aoun & Mourad Kchaou & Theodore E. Simos & Spyridon D. Mourtas & Vasilios N. Katsikis, 2023. "Hermitian Solutions of the Quaternion Algebraic Riccati Equations through Zeroing Neural Networks with Application to Quadrotor Control," Mathematics, MDPI, vol. 12(1), pages 1-19, December.
Sebastien Lleo & Wolfgang Runggaldier, 2025. "Exploratory Randomization for Discrete-Time Linear Exponential Quadratic Gaussian (LEQG) Problem," Papers 2501.06275, arXiv.org, revised Sep 2025.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Olivier Guéant, 2016. "The Financial Mathematics of Market Liquidity: From Optimal Execution to Market Making," Post-Print hal-01393136, HAL.
Kashyap, Ravi, 2020. "David vs Goliath (You against the Markets), A dynamic programming approach to separate the impact and timing of trading costs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
Xiaoyue Li & John M. Mulvey, 2023. "Optimal Portfolio Execution in a Regime-switching Market with Non-linear Impact Costs: Combining Dynamic Program and Neural Network," Papers 2306.08809, arXiv.org.
Olivier Guéant & Charles-Albert Lehalle, 2015. "General Intensity Shapes In Optimal Liquidation," Mathematical Finance, Wiley Blackwell, vol. 25(3), pages 457-495, July.
- Olivier Guéant & Charles-Albert Lehalle, 2012. "General Intensity Shapes in Optimal Liquidation," Papers 1204.0148, arXiv.org, revised Jun 2013.
S. C. P. Yam & W. Zhou, 2017. "Optimal Liquidation of Child Limit Orders," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 517-545, May.
Philippe Bergault & Olivier Guéant & Hamza Bodor, 2025. "To Hedge or Not to Hedge: Optimal Strategies for Stochastic Trade Flow Management," Papers 2503.02496, arXiv.org.
Cattivelli, Luca & Pirino, Davide, 2019. "A SHARP model of bid–ask spread forecasts," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1211-1225.
Arne Lokka & Junwei Xu, 2020. "Optimal liquidation trajectories for the Almgren-Chriss model with Levy processes," Papers 2002.03376, arXiv.org, revised Sep 2020.
Qinghua Li, 2014. "Facilitation and Internalization Optimal Strategy in a Multilateral Trading Context," Papers 1404.7320, arXiv.org, revised Jan 2015.
Lokka, A. & Xu, Junwei, 2020. "Optimal liquidation trajectories for the Almgren-Chriss model," LSE Research Online Documents on Economics 106977, London School of Economics and Political Science, LSE Library.
Daniel Hernández-Hernández & Harold A. Moreno-Franco & José Luis Pérez, 2017. "Periodic strategies in optimal execution with multiplicative price impact," Papers 1705.00284, arXiv.org, revised May 2018.
Fengpei Li & Vitalii Ihnatiuk & Ryan Kinnear & Anderson Schneider & Yuriy Nevmyvaka, 2022. "Do price trajectory data increase the efficiency of market impact estimation?," Papers 2205.13423, arXiv.org, revised Mar 2023.
Christopher Lorenz & Alexander Schied, 2013. "Drift dependence of optimal trade execution strategies under transient price impact," Finance and Stochastics, Springer, vol. 17(4), pages 743-770, October.
Aurélien Alfonsi & Alexander Schied, 2010. "Optimal trade execution and absence of price manipulations in limit order book models," Post-Print hal-00397652, HAL.
Charles-Albert Lehalle & Eyal Neuman, 2019. "Incorporating signals into optimal trading," Finance and Stochastics, Springer, vol. 23(2), pages 275-311, April.
- Charles-Albert Lehalle & Eyal Neuman, 2017. "Incorporating Signals into Optimal Trading," Papers 1704.00847, arXiv.org, revised Jun 2018.
Miles Kumaresan & Nataša Krejić, 2015. "Optimal trading of algorithmic orders in a liquidity fragmented market place," Annals of Operations Research, Springer, vol. 229(1), pages 521-540, June.
Natascha Hey & Eyal Neuman & Sturmius Tuschmann, 2025. "Nonparametric Estimation of Self- and Cross-Impact," Papers 2510.06879, arXiv.org.
Schnaubelt, Matthias, 2022. "Deep reinforcement learning for the optimal placement of cryptocurrency limit orders," European Journal of Operational Research, Elsevier, vol. 296(3), pages 993-1006.
Seungki Min & Costis Maglaras & Ciamac C. Moallemi, 2018. "Cross-Sectional Variation of Intraday Liquidity, Cross-Impact, and their Effect on Portfolio Execution," Papers 1811.05524, arXiv.org.
Ulrich Horst & Evgueni Kivman, 2024. "Optimal trade execution under small market impact and portfolio liquidation with semimartingale strategies," Finance and Stochastics, Springer, vol. 28(3), pages 759-812, July.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-CMP-2020-12-07 (Computational Economics)
NEP-ORE-2020-12-07 (Operations Research)
