Printed from https://ideas.repec.org/p/arx/papers/2602.00133.html

PredictionMarketBench: A SWE-bench-Style Framework for Backtesting Trading Agents on Prediction Markets

Authors
  • Avi Arora
  • Ritesh Malpani

Abstract

Prediction markets offer a natural testbed for trading agents: contracts have binary payoffs, prices can be interpreted as probabilities, and realized performance depends critically on market microstructure, fees, and settlement risk. We introduce PredictionMarketBench, a SWE-bench-style benchmark for evaluating algorithmic and LLM-based trading agents on prediction markets via deterministic, event-driven replay of historical limit-order-book and trade data. PredictionMarketBench standardizes (i) episode construction from raw exchange streams (orderbooks, trades, lifecycle, settlement), (ii) an execution-realistic simulator with maker/taker semantics and fee modeling, and (iii) a tool-based agent interface that supports both classical strategies and tool-calling LLM agents with reproducible trajectories. We release four Kalshi-based episodes spanning cryptocurrency, weather, and sports. Baseline results show that naive trading agents can underperform due to transaction costs and settlement losses, while fee-aware algorithmic strategies remain competitive in volatile episodes.
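The abstract's two mechanical claims, that episodes are replayed deterministically from exchange events and that naive agents can lose money to fees and settlement, can be illustrated with a small sketch. Everything below is an assumption for illustration, not the PredictionMarketBench API: the `Event` record, the `replay` loop, the agent signature, and the fee model. The fee is assumed to be Kalshi-style, rate × C × P × (1 − P) dollars rounded up to the cent, with an assumed rate of 7% (700 basis points). Prices are integer cents so every replay is exactly reproducible.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical replay record; field names are illustrative, not the benchmark's schema."""
    ts: int                          # exchange timestamp in ms
    seq: int                         # tie-breaker so equal timestamps replay in a fixed order
    kind: str                        # "book" or "settle"
    best_bid: Optional[int] = None   # best YES bid in cents (book events only)
    best_ask: Optional[int] = None   # best YES ask in cents (book events only)
    outcome: Optional[bool] = None   # True if the contract resolved YES (settle events only)


def taker_fee_cents(contracts: int, price_cents: int, fee_bp: int = 700) -> int:
    """Assumed fee model: (fee_bp / 10000) * C * p * (1 - p) dollars, rounded up to the cent.

    With the price P in cents, the fee in cents is fee_bp * C * P * (100 - P) / 1_000_000;
    integer ceiling division keeps the result exact.
    """
    num = fee_bp * contracts * price_cents * (100 - price_cents)
    return -(-num // 1_000_000)


# Hypothetical agent signature: (book event, current position) -> signed order size.
Agent = Callable[[Event, int], int]


def replay(events: Iterable[Event], agent: Agent, fee_bp: int = 700) -> Tuple[int, int]:
    """Replay events in (ts, seq) order; return (net P&L in cents, total fees in cents)."""
    cash = position = fees = 0
    for ev in sorted(events, key=lambda e: (e.ts, e.seq)):
        if ev.kind == "book":
            qty = agent(ev, position)
            if qty > 0 and ev.best_ask is not None:
                # Marketable buy: lift the ask and pay the (assumed) taker fee.
                fee = taker_fee_cents(qty, ev.best_ask, fee_bp)
                cash -= qty * ev.best_ask + fee
                position += qty
                fees += fee
            elif qty < 0 and ev.best_bid is not None:
                # Marketable sell: hit the bid, capped at the contracts currently held.
                n = min(-qty, position)
                fee = taker_fee_cents(n, ev.best_bid, fee_bp)
                cash += n * ev.best_bid - fee
                position -= n
                fees += fee
        elif ev.kind == "settle":
            # Binary payoff: each YES contract pays 100 cents on YES, 0 otherwise.
            cash += position * (100 if ev.outcome else 0)
            position = 0
    return cash, fees


def buy_once(ev: Event, position: int) -> int:
    """Naive agent: buy 10 YES contracts at the first opportunity, then hold to settlement."""
    return 10 if position == 0 else 0


events = [
    Event(ts=2, seq=0, kind="book", best_bid=51, best_ask=55),
    Event(ts=1, seq=0, kind="book", best_bid=48, best_ask=52),  # replayed first despite list order
    Event(ts=3, seq=0, kind="settle", outcome=False),
]
print(replay(events, buy_once))  # (-538, 18)
```

Sorting on (ts, seq) rather than on arrival order is what makes the run deterministic in this sketch. Fills are taker-only against the top of book, so resting (maker) orders, queue position, and depth beyond the best level are deliberately left out. In the example, buying 10 contracts at 52¢ costs 520¢ plus an 18¢ fee; the position returns +462¢ if the contract settles YES and −538¢ if it settles NO, so even at a fair 50% probability the expected P&L is −38¢.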

Suggested Citation

  • Avi Arora & Ritesh Malpani, 2026. "PredictionMarketBench: A SWE-bench-Style Framework for Backtesting Trading Agents on Prediction Markets," Papers 2602.00133, arXiv.org.
  • Handle: RePEc:arx:papers:2602.00133

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2602.00133
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Berg, Joyce E. & Nelson, Forrest D. & Rietz, Thomas A., 2008. "Prediction market accuracy in the long run," International Journal of Forecasting, Elsevier, vol. 24(2), pages 285-300.
    2. Tongkui Yu & Shu-Heng Chen, 2011. "Agent-Based Modeling of the Prediction Markets," ASSRU Discussion Papers 1119, ASSRU - Algorithmic Social Science Research Unit.
    3. Xiao-Yang Liu & Hongyang Yang & Jiechao Gao & Christina Dan Wang, 2021. "FinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance," Papers 2111.09395, arXiv.org.
    4. Xue-Zhong He & Shen Lin, 2019. "Reinforcement Learning in Limit Order Markets," Research Paper Series 403, Quantitative Finance Research Centre, University of Technology, Sydney.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Berg, Joyce E. & Rietz, Thomas A., 2019. "Longshots, overconfidence and efficiency on the Iowa Electronic Market," International Journal of Forecasting, Elsevier, vol. 35(1), pages 271-287.
    2. Siemroth, Christoph, 2014. "Why prediction markets work : The role of information acquisition and endogenous weighting," Working Papers 14-02, University of Mannheim, Department of Economics.
    3. Denter, Philipp & Sisak, Dana, 2015. "Do polls create momentum in political competition?," Journal of Public Economics, Elsevier, vol. 130(C), pages 1-14.
    4. Galanis, Spyros & Kotronis, Stelios, 2021. "Updating Awareness and Information Aggregation," The B.E. Journal of Theoretical Economics, De Gruyter, vol. 21(2), pages 613-635, June.
    5. Bergemann, Dirk & Ottaviani, Marco, 2021. "Information Markets and Nonmarkets," CEPR Discussion Papers 16459, C.E.P.R. Discussion Papers.
    6. Dian Yu & Jianjun Gao & Weiping Wu & Zizhuo Wang, 2022. "Price Interpretability of Prediction Markets: A Convergence Analysis," Papers 2205.08913, arXiv.org, revised Nov 2023.
    7. Khan, Urmee & Lieli, Robert P., 2018. "Information flow between prediction markets, polls and media: Evidence from the 2008 presidential primaries," International Journal of Forecasting, Elsevier, vol. 34(4), pages 696-710.
    8. Spyros Galanis & Christos A Ioannou & Stelios Kotronis, 2024. "Information Aggregation Under Ambiguity: Theory and Experimental Evidence," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 91(6), pages 3423-3467.
    9. Victor Tiberius & Christoph Rasche, 2011. "Prognosemärkte [Prediction markets]," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 21(4), pages 467-472, April.
    10. James Reade, 2014. "Information And Predictability: Bookmakers, Prediction Markets And Tipsters As Forecasters," Journal of Prediction Markets, University of Buckingham Press, vol. 8(1), pages 43-76.
    11. repec:grz:wpsses:2019-01 is not listed on IDEAS
    12. Aliakbari, Elmira & McKitrick, Ross, 2018. "Information aggregation in a prediction market for climate outcomes," Energy Economics, Elsevier, vol. 74(C), pages 97-106.
    13. Coulomb, Renaud & Sangnier, Marc, 2014. "The impact of political majorities on firm value: Do electoral promises or friendship connections matter?," Journal of Public Economics, Elsevier, vol. 115(C), pages 158-170.
    14. Dilger, Alexander, 2016. "Bedingte Aktiengeschäfte [Conditional share transactions]," Discussion Papers of the Institute for Organisational Economics 08/2016, University of Münster, Institute for Organisational Economics.
    15. Schadner, Wolfgang, 2022. "U.S. Politics from a multifractal perspective," Chaos, Solitons & Fractals, Elsevier, vol. 155(C).
    16. Christoph Diermann & Arnd Huchzermeier, 2017. "Case Article—Canyon Bicycles: Judgmental Demand Forecasting in Direct Sales," INFORMS Transactions on Education, INFORMS, vol. 17(2), pages 58-62, January.
    17. Kwok Ping Tsang & Zichao Yang, 2026. "Political Shocks and Price Discovery in Prediction Markets: Evidence from the 2024 U.S. Presidential Election," Papers 2603.03152, arXiv.org, revised Mar 2026.
    18. Hedtrich, F. & Loy, J.-P. & Müller, R.A.E., . "Prognosen auf Agrarmärkten: Prediction Markets – eine innovative Prognosemethode auch für die Landwirtschaft? [Forecasts in agricultural markets: Prediction markets, an innovative forecasting method for agriculture too?]," Proceedings “Schriften der Gesellschaft für Wirtschafts- und Sozialwissenschaften des Landbaues e.V.”, German Association of Agricultural Economists (GEWISOLA), vol. 45.
    19. Xiao-Yang Liu & Jingyang Rui & Jiechao Gao & Liuqing Yang & Hongyang Yang & Zhaoran Wang & Christina Dan Wang & Jian Guo, 2021. "FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance," Papers 2112.06753, arXiv.org, revised Mar 2022.
    20. Andrea Albertazzi & Friederike Mengel & Ronald Peeters, 2021. "Benchmarking information aggregation in experimental markets," Economic Inquiry, Western Economic Association International, vol. 59(4), pages 1500-1516, October.
    21. Graefe, Andreas & Armstrong, J. Scott & Jones, Randall J. & Cuzán, Alfred G., 2014. "Combining forecasts: An application to elections," International Journal of Forecasting, Elsevier, vol. 30(1), pages 43-54.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.