Deep reinforcement learning for optimal trading with partial information

Authors
  • Andrea Macrì
  • Sebastian Jaimungal
  • Fabrizio Lillo

Abstract

Reinforcement Learning (RL) applied to financial problems is a lively area of research. To the best of our knowledge, however, the use of RL for optimal trading strategies that exploit latent information in the market has not been widely studied. In this paper we study an optimal trading problem in which a trading signal follows an Ornstein-Uhlenbeck process with regime-switching dynamics. We combine RL with Recurrent Neural Networks (RNN) to extract as much underlying information as possible from a trading signal with latent parameters. The latent parameters driving mean reversion, speed, and volatility are filtered from observations of the signal, and trading strategies are derived via RL. To address this problem, we propose three Deep Deterministic Policy Gradient (DDPG)-based algorithms that integrate Gated Recurrent Unit (GRU) networks to capture temporal dependencies in the signal. The first, a one-step approach (hid-DDPG), directly encodes hidden states from the GRU into the RL trader. The second and third are two-step methods: one (prob-DDPG) makes use of posterior regime probability estimates, while the other (reg-DDPG) relies on forecasts of the next signal value. Through extensive simulations with increasingly complex Markovian regime dynamics for the trading signal's parameters, as well as an empirical application to equity pair trading, we find that prob-DDPG achieves superior cumulative rewards and exhibits more interpretable strategies. By contrast, reg-DDPG provides limited benefits, while hid-DDPG offers intermediate performance with less interpretable strategies. Our results show that the quality and structure of the information supplied to the agent are crucial: embedding probabilistic insights into latent regimes substantially improves both the profitability and the robustness of reinforcement learning-based trading strategies.
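
To make the setting concrete, the following is a minimal sketch, not the authors' implementation. It simulates an Ornstein-Uhlenbeck signal whose long-run mean switches between regimes according to a Markov chain, then computes posterior regime probabilities of the kind that prob-DDPG takes as input. Two simplifications are assumptions of this sketch: only the mean level switches (the abstract also lets speed and volatility switch), and an exact Bayesian forward filter with known parameters stands in for the GRU-based estimator described in the abstract. All function names and parameter values are hypothetical.

```python
import math
import random


def ou_step_params(dt, kappa, sigma):
    """Exact one-step decay factor and noise std of an OU process."""
    a = math.exp(-kappa * dt)
    sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * kappa))
    return a, sd


def simulate_rs_ou(n_steps, dt, thetas, kappa, sigma, trans, rng):
    """Simulate an OU signal whose mean level theta follows a Markov chain."""
    a, sd = ou_step_params(dt, kappa, sigma)
    regime = rng.randrange(len(thetas))
    regimes, signal = [regime], [thetas[regime]]
    for _ in range(1, n_steps):
        # Draw the next regime from the row of the transition matrix.
        regime = rng.choices(range(len(thetas)), weights=trans[regime])[0]
        mean = thetas[regime] + (signal[-1] - thetas[regime]) * a
        signal.append(rng.gauss(mean, sd))
        regimes.append(regime)
    return signal, regimes


def filter_regimes(signal, dt, thetas, kappa, sigma, trans):
    """Forward filter: P(regime_t | signal_0..t), assuming known parameters."""
    a, sd = ou_step_params(dt, kappa, sigma)
    n = len(thetas)
    probs = [[1.0 / n] * n]  # uniform prior at t = 0
    for prev, cur in zip(signal, signal[1:]):
        # Predict: push yesterday's posterior through the transition matrix.
        prior = [sum(probs[-1][i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        # Update: Gaussian log-likelihood of the new observation per regime
        # (the shared normalising constant cancels and is omitted).
        log_lik = [-0.5 * ((cur - (th + (prev - th) * a)) / sd) ** 2
                   for th in thetas]
        top = max(log_lik)  # subtract the max for numerical stability
        w = [p * math.exp(l - top) for p, l in zip(prior, log_lik)]
        total = sum(w)
        probs.append([x / total for x in w])
    return probs


# Hypothetical two-regime example; none of these values come from the paper.
rng = random.Random(0)
thetas = [-1.0, 1.0]
trans = [[0.95, 0.05], [0.05, 0.95]]
signal, regimes = simulate_rs_ou(2000, 0.1, thetas, 5.0, 0.2, trans, rng)
probs = filter_regimes(signal, 0.1, thetas, 5.0, 0.2, trans)
hits = sum(max(range(len(thetas)), key=p.__getitem__) == r
           for p, r in zip(probs, regimes))
accuracy = hits / len(regimes)
print(f"filtered regime accuracy: {accuracy:.3f}")
```

In a two-step scheme of the kind the abstract describes, a vector such as `probs[t]` would be appended to the trading agent's state. The exact filter is usable here only because the simulation parameters are known; with unknown parameters a learned estimator would be needed instead.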

Suggested Citation

  • Andrea Macrì & Sebastian Jaimungal & Fabrizio Lillo, 2025. "Deep reinforcement learning for optimal trading with partial information," Papers 2511.00190, arXiv.org.
  • Handle: RePEc:arx:papers:2511.00190

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2511.00190
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Yuanrong & Aste, Tomaso, 2023. "Dynamic portfolio optimization with inverse covariance clustering," LSE Research Online Documents on Economics 117701, London School of Economics and Political Science, LSE Library.
    2. Milan Kumar Das & Anindya Goswami, 2019. "Testing of binary regime switching models using squeeze duration analysis," International Journal of Financial Engineering (IJFE), World Scientific Publishing Co. Pte. Ltd., vol. 6(01), pages 1-20, March.
    3. Hauzenberger, Niko & Huber, Florian & Klieber, Karin & Marcellino, Massimiliano, 2025. "Bayesian neural networks for macroeconomic analysis," Journal of Econometrics, Elsevier, vol. 249(PC).
    4. Carstensen, Kai & Heinrich, Markus & Reif, Magnus & Wolters, Maik H., 2020. "Predicting ordinary and severe recessions with a three-state Markov-switching dynamic factor model," International Journal of Forecasting, Elsevier, vol. 36(3), pages 829-850.
    5. Chkili, Walid & Nguyen, Duc Khuong, 2014. "Exchange rate movements and stock market returns in a regime-switching environment: Evidence for BRICS countries," Research in International Business and Finance, Elsevier, vol. 31(C), pages 46-56.
    6. Manuela Goretti, 2005. "The Brazilian currency turmoil of 2002: a nonlinear analysis," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 10(4), pages 289-306.
    7. Valentina Aprigliano & Danilo Liberati, 2021. "Using Credit Variables to Date Business Cycle and to Estimate the Probabilities of Recession in Real Time," Manchester School, University of Manchester, vol. 89(S1), pages 76-96, September.
    8. DAVID E. ALLEN & MICHAEL McALEER & ROBERT J. POWELL & ABHAY K. SINGH, 2018. "Non-Parametric Multiple Change Point Analysis Of The Global Financial Crisis," Annals of Financial Economics (AFE), World Scientific Publishing Co. Pte. Ltd., vol. 13(02), pages 1-23, June.
    9. Mariam Camarero & Juan Sapena & Cecilio Tamarit, 2020. "Modelling Time-Varying Parameters in Panel Data State-Space Frameworks: An Application to the Feldstein–Horioka Puzzle," Computational Economics, Springer;Society for Computational Economics, vol. 56(1), pages 87-114, June.
    10. Xi, Xiaojing & Mamon, Rogemar, 2011. "Parameter estimation of an asset price model driven by a weak hidden Markov chain," Economic Modelling, Elsevier, vol. 28(1-2), pages 36-46, January.
    11. Anne Morrison Piehl & Suzanne J. Cooper & Anthony A. Braga & David M. Kennedy, 2003. "Testing for Structural Breaks in the Evaluation of Programs," The Review of Economics and Statistics, MIT Press, vol. 85(3), pages 550-558, August.
    12. Sarah Arndt & Zeno Enders, 2023. "The Transmission of Supply Shocks in Different Inflation Regimes," CESifo Working Paper Series 10839, CESifo.
    13. Hendry, David F. & Clements, Michael P., 2003. "Economic forecasting: some lessons from recent research," Economic Modelling, Elsevier, vol. 20(2), pages 301-329, March.
    14. Perron, Pierre & Wada, Tatsuma, 2016. "Measuring business cycles with structural breaks and outliers: Applications to international data," Research in Economics, Elsevier, vol. 70(2), pages 281-303.
    15. Claudio Morana, 2014. "Factor Vector Autoregressive Estimation of Heteroskedastic Persistent and Non Persistent Processes Subject to Structural Breaks," Working Papers 273, University of Milano-Bicocca, Department of Economics, revised May 2014.
    16. Carol Alexander & Anca Dimitriu, 2003. "Equity Indexing: Cointegration and Stock Price Dispersion: A Regime Switching Approach to Market Efficiency," ICMA Centre Discussion Papers in Finance icma-dp2003-02, Henley Business School, University of Reading.
    17. Nemati, Mehdi & Saghaian, Sayed H., 2016. "Dynamics of Price Adjustment in Qualitatively Differentiated Markets in the U.S.: The Case of Organic and Conventional Apples," 2016 Annual Meeting, February 6-9, 2016, San Antonio, Texas 229950, Southern Agricultural Economics Association.
    18. Raggi, Davide & Bordignon, Silvano, 2012. "Long memory and nonlinearities in realized volatility: A Markov switching approach," Computational Statistics & Data Analysis, Elsevier, vol. 56(11), pages 3730-3742.
    19. Flavin, Thomas J. & Panopoulou, Ekaterini & Unalmis, Deren, 2008. "On the stability of domestic financial market linkages in the presence of time-varying volatility," Emerging Markets Review, Elsevier, vol. 9(4), pages 280-301, December.
    20. Marco Gallegati, 2019. "A system for dating long wave phases in economic development," Journal of Evolutionary Economics, Springer, vol. 29(3), pages 803-822, July.
