
JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading

Authors

  • Valentin Mohl
  • Sascha Frey
  • Reuben Leyland
  • Kang Li
  • George Nigmatulin
  • Mihai Cucuringu
  • Stefan Zohren
  • Jakob Foerster
  • Anisoara Calinescu

Abstract

Agent-based modelling (ABM) approaches for high-frequency financial markets are difficult to calibrate and validate, partly due to the large parameter space created by defining fixed agent policies. Multi-agent reinforcement learning (MARL) enables more realistic agent behaviour and reduces the number of free parameters, but the heavy computational cost has so far limited research efforts. To address this, we introduce JaxMARL-HFT (JAX-based Multi-Agent Reinforcement Learning for High-Frequency Trading), the first GPU-accelerated open-source multi-agent reinforcement learning environment for high-frequency trading (HFT) on market-by-order (MBO) data. Extending the JaxMARL framework and building on the JAX-LOB implementation, JaxMARL-HFT is designed to handle a heterogeneous set of agents, enabling diverse observation/action spaces and reward functions. It is designed flexibly, so it can also be used for single-agent RL, or extended to act as an ABM with fixed-policy agents. Leveraging JAX enables up to a 240x reduction in end-to-end training time, compared with state-of-the-art reference implementations on the same hardware. This significant speed-up makes it feasible to exploit the large, granular datasets available in high-frequency trading, and to perform the extensive hyperparameter sweeps required for robust and efficient MARL research in trading. We demonstrate the use of JaxMARL-HFT with independent Proximal Policy Optimization (IPPO) for a two-player environment, with an order execution and a market making agent, using one year of LOB data (400 million orders), and show that these agents learn to outperform standard benchmarks. The code for the JaxMARL-HFT framework is available on GitHub.

Suggested Citation

  • Valentin Mohl & Sascha Frey & Reuben Leyland & Kang Li & George Nigmatulin & Mihai Cucuringu & Stefan Zohren & Jakob Foerster & Anisoara Calinescu, 2025. "JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading," Papers 2511.02136, arXiv.org.
  • Handle: RePEc:arx:papers:2511.02136

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2511.02136
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Brian Ning & Franco Ho Ting Lin & Sebastian Jaimungal, 2021. "Double Deep Q-Learning for Optimal Execution," Applied Mathematical Finance, Taylor & Francis Journals, vol. 28(4), pages 361-380, July.
    2. Johann Lussange & Ivan Lazarevich & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2021. "Modelling Stock Markets by Multi-agent Reinforcement Learning," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 113-147, January.
    3. Marco Avellaneda & Sasha Stoikov, 2008. "High-frequency trading in a limit order book," Quantitative Finance, Taylor & Francis Journals, vol. 8(3), pages 217-224.
    4. Gode, Dhananjay K & Sunder, Shyam, 1993. "Allocative Efficiency of Markets with Zero-Intelligence Traders: Market as a Partial Substitute for Individual Rationality," Journal of Political Economy, University of Chicago Press, vol. 101(1), pages 119-137, February.
    5. Rama Cont & Marvin S. Mueller, 2019. "A stochastic partial differential equation model for limit order book dynamics," Papers 1904.03058, arXiv.org, revised May 2021.
    6. Ciamac C. Moallemi & Muye Wang, 2022. "A reinforcement learning approach to optimal execution," Quantitative Finance, Taylor & Francis Journals, vol. 22(6), pages 1051-1069, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Leo Ardon & Nelson Vadori & Thomas Spooner & Mengda Xu & Jared Vann & Sumitra Ganesh, 2021. "Towards a fully RL-based Market Simulator," Papers 2110.06829, arXiv.org, revised Nov 2021.
    2. Xianfeng Jiao & Zizhong Li & Chang Xu & Yang Liu & Weiqing Liu & Jiang Bian, 2023. "Microstructure-Empowered Stock Factor Extraction and Utilization," Papers 2308.08135, arXiv.org.
    3. Aaron Wray & Matthew Meades & Dave Cliff, 2020. "Automated Creation of a High-Performing Algorithmic Trader via Deep Learning on Level-2 Limit Order Book Data," Papers 2012.00821, arXiv.org.
    4. Zhiyuan Yao & Zheng Li & Matthew Thomas & Ionut Florescu, 2024. "Reinforcement Learning in Agent-Based Market Simulation: Unveiling Realistic Stylized Facts and Behavior," Papers 2403.19781, arXiv.org.
    5. Bastien Baldacci & Philippe Bergault, 2021. "Optimal incentives in a limit order book: a SPDE control approach," Papers 2112.00375, arXiv.org, revised Oct 2022.
    6. Alexander Lykov & Stepan Muzychka & Kirill Vaninsky, 2016. "Investor's Sentiment In Multi-Agent Model Of The Continuous Double Auction," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 19(06), pages 1-29, September.
    7. Nelson Vadori & Leo Ardon & Sumitra Ganesh & Thomas Spooner & Selim Amrouni & Jared Vann & Mengda Xu & Zeyu Zheng & Tucker Balch & Manuela Veloso, 2022. "Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations," Papers 2210.07184, arXiv.org, revised Aug 2023.
    8. Carè, Rosella & Cumming, Douglas, 2024. "Technology and automation in financial trading: A bibliometric review," Research in International Business and Finance, Elsevier, vol. 71(C).
    9. Marcello Monga, 2024. "Automated Market Making and Decentralized Finance," Papers 2407.16885, arXiv.org.
    10. Hamza Bodor & Laurent Carlier, 2024. "A Novel Approach to Queue-Reactive Models: The Importance of Order Sizes," Papers 2405.18594, arXiv.org.
    11. Pastushkov, A., 2025. "Evolutionary and agent-based computational finance: The new paradigms for asset pricing," Journal of the New Economic Association, New Economic Association, vol. 66(1), pages 196-222.
    12. Thomas Spooner & John Fearnley & Rahul Savani & Andreas Koukorinis, 2018. "Market Making via Reinforcement Learning," Papers 1804.04216, arXiv.org.
    13. Campi, Luciano & Zabaljauregui, Diego, 2020. "Optimal market making under partial information with general intensities," LSE Research Online Documents on Economics 104612, London School of Economics and Political Science, LSE Library.
    14. Berg, Joyce E. & Rietz, Thomas A., 2019. "Longshots, overconfidence and efficiency on the Iowa Electronic Market," International Journal of Forecasting, Elsevier, vol. 35(1), pages 271-287.
    15. Roza Galeeva & Ehud Ronn, 2022. "Oil futures volatility smiles in 2020: Why the Bachelier smile is flatter," Review of Derivatives Research, Springer, vol. 25(2), pages 173-187, July.
    16. Daniele Giachini & Shabnam Mousavi & Matteo Ottaviani, 2025. "From zero-intelligence to Bayesian learning: the effect of rationality on market efficiency," Journal of Economic Interaction and Coordination, Springer;Society for Economic Science with Heterogeneous Interacting Agents, vol. 20(3), pages 659-676, July.
    17. Daniel Sutter & Daniel J. Smith, 2017. "Coordination in disaster: Nonprice learning and the allocation of resources after natural disasters," The Review of Austrian Economics, Springer;Society for the Development of Austrian Economics, vol. 30(4), pages 469-492, December.
    18. Simon, Herbert A., 2000. "Barriers and bounds to Rationality," Structural Change and Economic Dynamics, Elsevier, vol. 11(1-2), pages 243-253, July.
    19. Lovric, M. & Kaymak, U. & Spronk, J., 2008. "A Conceptual Model of Investor Behavior," ERIM Report Series Research in Management ERS-2008-030-F&A, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    20. Makarewicz, Tomasz, 2021. "Traders, forecasters and financial instability: A model of individual learning of anchor-and-adjustment heuristics," Journal of Economic Behavior & Organization, Elsevier, vol. 190(C), pages 626-673.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.