Printed from https://ideas.repec.org/a/inm/orstsc/v11y2026i1p93-117.html

How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations

Authors

  • Ryan T. Allen

    (Management Department, Marriott School of Business, Brigham Young University, Provo, Utah 84602)

  • Rory M. McDonald

    (Strategy, Ethics, and Entrepreneurship, Darden School of Business, University of Virginia, Charlottesville, Virginia 22903)

Abstract

Benchmarks have helped fuel rapid progress in large language models (LLMs) across a variety of domains including math, science, dialogue, and coding. Yet no existing benchmark adequately captures the defining elements of strategic decision making: uncertainty, complexity, irreversible multiperiod moves, and delayed or noisy feedback. This gap limits our ability to assess and guide LLMs’ capabilities in strategy. We propose that established strategy teaching simulations provide an ideal benchmarking approach because (1) they approximate the essential features of real-world strategy, and (2) they do so in a controlled, replicable environment suitable for evaluation. To demonstrate this, we assess the performance of 21 proprietary and 13 open-source LLMs on the Back Bay Battery (BBB) simulation, a widely used exercise in strategy and innovation courses. The simulation requires balancing short-term profitability against long-term competitive positioning while integrating complex information about customer preferences and technological change. We built an interface enabling LLMs to interact with the simulation as though encountering it for the first time, masking identifiers to reduce contamination from prior training data. Our results show clear progress in composite BBB performance: Later models generally outperform earlier versions, and reasoning-focused models from late 2024–early 2025 (e.g., o4-mini, Claude Sonnet 4, Gemini 2.0 Flash) exceed even the average scores of historical MBA student cohorts. However, the performance of frontier models from mid-to-late 2025 (e.g., GPT-5, Claude Opus 4.5, Gemini 3) has declined: they underperform both earlier LLMs and MBA students. This decline is partially explained by a systematic bias toward exploiting the core business at the expense of investing in future growth. Overall, these findings highlight impressive advances in LLMs’ strategic abilities since their inception. 
At the same time, we document current frontier models’ surprising weakness in managing strategic uncertainty. This paper pioneers and provides guidance for using simulation-based benchmarking as a productive framework for strategy researchers to track progress, identify blind spots, and shape the trajectory of strategy-specific LLM capabilities.

Suggested Citation

  • Ryan T. Allen & Rory M. McDonald, 2026. "How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations," Strategy Science, INFORMS, vol. 11(1), pages 93-117, March.
  • Handle: RePEc:inm:orstsc:v:11:y:2026:i:1:p:93-117
    DOI: 10.1287/stsc.2025.0444

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/stsc.2025.0444
    Download Restriction: no

    File URL: https://libkey.io/10.1287/stsc.2025.0444?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    More about this item


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orstsc:v:11:y:2026:i:1:p:93-117. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.