Printed from https://ideas.repec.org/p/arx/papers/2508.02247.html

ByteGen: A Tokenizer-Free Generative Model for Orderbook Events in Byte Space

Authors

  • Yang Li
  • Zhi Chen

Abstract

Generative modeling of high-frequency limit order book (LOB) dynamics is a critical yet unsolved challenge in quantitative finance, essential for robust market simulation and strategy backtesting. Existing approaches are often constrained by simplifying stochastic assumptions or, in the case of modern deep learning models like Transformers, rely on tokenization schemes whose discretization and binning compromise the high-precision numerical nature of financial data. To address these limitations, we introduce ByteGen, a novel generative model that operates directly on the raw byte streams of LOB events. Our approach treats the problem as an autoregressive next-byte prediction task, for which we design a compact and efficient 32-byte packed binary format to represent market messages without information loss. The core novelty of our work is the complete elimination of feature engineering and tokenization, enabling the model to learn market dynamics from its most fundamental representation. We achieve this by adapting the H-Net architecture, a hybrid Mamba-Transformer model that uses a dynamic chunking mechanism to discover the inherent structure of market messages without predefined rules. Our primary contributions are: 1) the first end-to-end, byte-level framework for LOB modeling; 2) an efficient packed data representation; and 3) a comprehensive evaluation on high-frequency data. Trained on over 34 million events from CME Bitcoin futures, ByteGen successfully reproduces key stylized facts of financial markets, generating realistic price distributions, heavy-tailed returns, and bursty event timing. Our findings demonstrate that learning directly from byte space is a promising and highly flexible paradigm for modeling complex financial systems, achieving competitive performance on standard market quality metrics without the biases of tokenization.
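
To make the byte-level framing concrete, the following is a minimal sketch of how an order book event could be packed into a fixed 32-byte record and turned into next-byte prediction pairs. The abstract does not specify the field layout, so the fields, their order, their widths, and the event and side codes below are all assumptions chosen only so that the record adds up to 32 bytes; they are not the authors' format. The names `LobEvent`, `pack_event`, `unpack_event`, and `next_byte_pairs` are hypothetical.

```python
import struct
from typing import NamedTuple

# Hypothetical layout (NOT the paper's actual format): little-endian, no padding.
# Q timestamp_ns | Q order_id | q price_ticks | I size | B event_type | B side | H flags
EVENT = struct.Struct("<QQqIBBH")
assert EVENT.size == 32  # 8 + 8 + 8 + 4 + 1 + 1 + 2


class LobEvent(NamedTuple):
    timestamp_ns: int
    order_id: int
    price_ticks: int  # price as an integer number of ticks, so no float rounding
    size: int
    event_type: int   # assumed coding: 0 = add, 1 = cancel, 2 = trade
    side: int         # assumed coding: 0 = bid, 1 = ask
    flags: int        # spare bytes that round the record up to 32


def pack_event(ev: LobEvent) -> bytes:
    """Serialize one event into exactly 32 bytes."""
    return EVENT.pack(*ev)


def unpack_event(buf: bytes) -> LobEvent:
    """Recover the event from its 32-byte record."""
    return LobEvent(*EVENT.unpack(buf))


def next_byte_pairs(stream: bytes) -> tuple[list[int], list[int]]:
    """Shift the stream by one byte: the byte at position i is the target for position i - 1."""
    data = list(stream)
    return data[:-1], data[1:]


events = [
    LobEvent(1_700_000_000_000_000_000, 1, 23_000, 5, 0, 0, 0),
    LobEvent(1_700_000_000_000_250_000, 2, 23_001, 3, 0, 1, 0),
]
stream = b"".join(pack_event(e) for e in events)
inputs, targets = next_byte_pairs(stream)
```

Because every field is stored as an integer of fixed width, unpacking a record returns the original values exactly, which is the sense in which such a format avoids the binning that a tokenizer would introduce. An autoregressive model would then be trained to predict each entry of `targets` from the preceding bytes in `inputs`.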

Suggested Citation

  • Yang Li & Zhi Chen, 2025. "ByteGen: A Tokenizer-Free Generative Model for Orderbook Events in Byte Space," Papers 2508.02247, arXiv.org, revised Aug 2025.
  • Handle: RePEc:arx:papers:2508.02247

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2508.02247
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Rama Cont & Sasha Stoikov & Rishi Talreja, 2010. "A Stochastic Model for Order Book Dynamics," Operations Research, INFORMS, vol. 58(3), pages 549-563, June.
    2. Justin Sirignano & Rama Cont, 2019. "Universal features of price formation in financial markets: perspectives from deep learning," Quantitative Finance, Taylor & Francis Journals, vol. 19(9), pages 1449-1459, September.
    3. Rama Cont, 2007. "Volatility Clustering in Financial Markets: Empirical Facts and Agent-Based Models," Springer Books, in: Gilles Teyssière & Alan P. Kirman (ed.), Long Memory in Economics, pages 289-309, Springer.
    4. Peer Nagy & Sascha Frey & Silvia Sapora & Kang Li & Anisoara Calinescu & Stefan Zohren & Jakob Foerster, 2023. "Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network," Papers 2309.00638, arXiv.org.
    5. Aaron Wheeler & Jeffrey D. Varner, 2024. "MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series," Papers 2411.16585, arXiv.org.
    6. Weibing Huang & Charles-Albert Lehalle & Mathieu Rosenbaum, 2015. "Simulating and Analyzing Order Book Data: The Queue-Reactive Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 107-122, March.
    7. Yang Li & Zhi Chen, 2025. "FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts," Papers 2506.05755, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hamza Bodor & Laurent Carlier, 2025. "Deep Learning Meets Queue-Reactive: A Framework for Realistic Limit Order Book Simulation," Papers 2501.08822, arXiv.org.
    2. Leonardo Berti & Bardh Prenkaj & Paola Velardi, 2025. "TRADES: Generating Realistic Market Simulations with Diffusion Models," Papers 2502.07071, arXiv.org, revised Nov 2025.
    3. Peng Wu & Marcello Rambaldi & Jean-François Muzy & Emmanuel Bacry, 2019. "Queue-reactive Hawkes models for the order flow," Papers 1901.08938, arXiv.org.
    4. Marcello Rambaldi & Emmanuel Bacry & Jean-François Muzy, 2018. "Disentangling and quantifying market participant volatility contributions," Papers 1807.07036, arXiv.org.
    5. Michael Giegrich & Roel Oomen & Christoph Reisinger, 2024. "Limit Order Book Simulation and Trade Evaluation with $K$-Nearest-Neighbor Resampling," Papers 2409.06514, arXiv.org.
    6. Zijian Shi & John Cartlidge, 2024. "Neural stochastic agent‐based limit order book simulation with neural point process and diffusion probabilistic model," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(2), June.
    7. Philippe Bergault & Enzo Cognéville, 2024. "Simulating and analyzing a sparse order book: an application to intraday electricity markets," Papers 2410.06839, arXiv.org.
    8. Johann Lussange & Stefano Vrizzi & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2023. "Stock Price Formation: Precepts from a Multi-Agent Reinforcement Learning Model," Computational Economics, Springer;Society for Computational Economics, vol. 61(4), pages 1523-1544, April.
    9. Paul Jusselin & Mathieu Rosenbaum, 2020. "No‐arbitrage implies power‐law market impact and rough volatility," Mathematical Finance, Wiley Blackwell, vol. 30(4), pages 1309-1336, October.
    10. Johann Lussange & Ivan Lazarevich & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2021. "Modelling Stock Markets by Multi-agent Reinforcement Learning," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 113-147, January.
    11. Xiaofei Lu & Frédéric Abergel, 2017. "Limit order book modelling with high dimensional Hawkes processes," Working Papers hal-01512430, HAL.
    12. Weibing Huang & Sergio Pulido & Mathieu Rosenbaum & Pamela Saliba & Emmanouil Sfendourakis, 2019. "From Glosten-Milgrom to the whole limit order book and applications to financial regulation," Papers 1902.10743, arXiv.org, revised Mar 2025.
    13. repec:hal:wpaper:hal-03968767 is not listed on IDEAS
    14. Julius Bonart & Martin D. Gould, 2017. "Latency and liquidity provision in a limit order book," Quantitative Finance, Taylor & Francis Journals, vol. 17(10), pages 1601-1616, October.
    15. Clinet, Simon & Yoshida, Nakahiro, 2017. "Statistical inference for ergodic point processes and application to Limit Order Book," Stochastic Processes and their Applications, Elsevier, vol. 127(6), pages 1800-1839.
    16. Peng Wu & Marcello Rambaldi & Jean-François Muzy & Emmanuel Bacry, 2023. "A single queue-reactive Hawkes model for the order flow," Post-Print hal-02409073, HAL.
    17. Federico Gonzalez & Mark Schervish, 2017. "Instantaneous order impact and high-frequency strategy optimization in limit order books," Papers 1707.01167, arXiv.org, revised Oct 2017.
    18. Bonart, Julius & Lillo, Fabrizio, 2018. "A continuous and efficient fundamental price on the discrete order book grid," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 698-713.
    19. Ye-Sheen Lim & Denise Gorse, 2021. "Intra-Day Price Simulation with Generative Adversarial Modelling of the Order Flow," Papers 2109.13905, arXiv.org.
    20. Peng Wu & Marcello Rambaldi & Jean-François Muzy & Emmanuel Bacry, 2021. "Queue-reactive Hawkes models for the order flow," Working Papers hal-02409073, HAL.
    21. Julius Bonart & Martin Gould, 2015. "Latency and liquidity provision in a limit order book," Papers 1511.04116, arXiv.org, revised Jun 2016.

