Printed from https://ideas.repec.org/p/fip/fedgfe/2025-44.html

Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models

Author

  • Leland D. Crane
  • Akhil Karra
  • Paul E. Soto
Abstract

We evaluate the ability of large language models (LLMs) to estimate historical macroeconomic variables and data release dates. We find that LLMs have precise knowledge of some recent statistics, but performance degrades as we go farther back in history. We highlight two particularly important kinds of recall errors: mixing together first-print data with subsequent revisions (i.e., smoothing across vintages) and mixing data for past and future reference periods (i.e., smoothing within vintages). We also find that LLMs can often recall individual data release dates accurately, but aggregating across series shows that on any given day the LLM is likely to believe it has data in hand that has not yet been released. Our results indicate that while LLMs have impressively accurate recall, their errors point to some limitations when they are used for historical analysis or to mimic real-time forecasters.

Suggested Citation

  • Leland D. Crane & Akhil Karra & Paul E. Soto, 2025. "Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models," Finance and Economics Discussion Series 2025-044, Board of Governors of the Federal Reserve System (U.S.).
  • Handle: RePEc:fip:fedgfe:2025-44
    DOI: 10.17016/FEDS.2025.044

    Download full text from publisher

    File URL: https://www.federalreserve.gov/econres/feds/files/2025044pap.pdf
    Download Restriction: no

    File URL: https://libkey.io/10.17016/FEDS.2025.044?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    1. Benjamin S. Manning & Kehang Zhu & John J. Horton, 2024. "Automated Social Science: Language Models as Scientist and Subjects," Papers 2404.11794, arXiv.org, revised Apr 2024.
    2. Miguel Faria-e-Castro & Fernando Leibovici, 2024. "Artificial Intelligence and Inflation Forecasts," Review, Federal Reserve Bank of St. Louis, vol. 106(12), pages 1-14, November.
    3. Anton Korinek, 2023. "Generative AI for Economic Research: Use Cases and Implications for Economists," Journal of Economic Literature, American Economic Association, vol. 61(4), pages 1281-1317, December.
    4. Paul Glasserman & Caden Lin, 2023. "Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis," Papers 2309.17322, arXiv.org.
    5. Alejandro Lopez-Lira & Yuehua Tang & Mingyin Zhu, 2025. "The Memorization Problem: Can We Trust LLMs' Economic Forecasts?," Papers 2504.14765, arXiv.org, revised Dec 2025.
    6. Benjamin S. Manning & Kehang Zhu & John J. Horton, 2024. "Automated Social Science: Language Models as Scientist and Subjects," NBER Working Papers 32381, National Bureau of Economic Research, Inc.
    7. Van Pham & Scott Cunningham, 2024. "Can Base ChatGPT be Used for Forecasting without Additional Optimization?," Papers 2404.07396, arXiv.org, revised Jul 2024.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. M.Jahangir Alam & Shane Boyle & Huiyu Li & Tatevik Sekhposyan, 2026. "ChatMacro: Evaluating Inflation Forecasts of Generative AI," Working Paper Series 2026-04, Federal Reserve Bank of San Francisco.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Didisheim, Antoine & Fraschini, Martina & Somoza, Luciano, 2025. "AI’s predictable memory in financial analysis," Economics Letters, Elsevier, vol. 256(C).
    2. Nikoleta Anesti & Edward Hill & Andreas Joseph, 2025. "Inflation Attitudes of Large Language Models," Papers 2512.14306, arXiv.org.
    3. Alejandro Lopez-Lira & Yuehua Tang & Mingyin Zhu, 2025. "The Memorization Problem: Can We Trust LLMs' Economic Forecasts?," Papers 2504.14765, arXiv.org, revised Dec 2025.
    4. Alexander Eliseev & Sergei Seleznev, 2026. "Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting?," Papers 2601.07992, arXiv.org, revised Mar 2026.
    5. Gonzalo Ballestero & Hadi Hosseini & Samarth Khanna & Ran I. Shorrer, 2026. "Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games," Papers 2604.09502, arXiv.org, revised Apr 2026.
    6. Giuseppe Matera, 2025. "Corporate Earnings Calls and Analyst Beliefs," Papers 2511.15214, arXiv.org, revised Nov 2025.
    7. Matthew O. Jackson & Qiaozhu Me & Stephanie W. Wang & Yutong Xie & Walter Yuan & Seth Benzell & Erik Brynjolfsson & Colin F. Camerer & James Evans & Brian Jabarian & Jon Kleinberg & Juanjuan Meng & Se, 2025. "AI Behavioral Science," Papers 2509.13323, arXiv.org.
    8. Mostapha Benhenda, 2026. "Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance," Papers 2601.13770, arXiv.org.
    9. Zhenyu Gao & Wenxi Jiang & Yutong Yan, 2026. "Debiasing LLMs by Fine-tuning," Papers 2604.02921, arXiv.org, revised May 2026.
    10. So Kuroki & Yingtao Tian & Kou Misaki & Takashi Ikegami & Takuya Akiba & Yujin Tang, 2025. "Reimagining Agent-based Modeling with Large Language Model Agents via Shachi," Papers 2509.21862, arXiv.org, revised Oct 2025.
    11. Sugat Chaturvedi & Rochana Chaturvedi, 2025. "Who Gets the Callback? Generative AI and Gender Bias," Papers 2504.21400, arXiv.org.
    12. Dong, Mengming Michael & Stratopoulos, Theophanis C. & Wang, Victor Xiaoqi, 2024. "A scoping review of ChatGPT research in accounting and finance," International Journal of Accounting Information Systems, Elsevier, vol. 55(C).
    13. Alexander Erlei, 2025. "From Digital Distrust to Codified Honesty: Experimental Evidence on Generative AI in Credence Goods Markets," Papers 2509.06069, arXiv.org.
    14. Sophia Kazinnik & Tara M. Sinclair, 2025. "FOMC In Silico: A Multi-Agent System for Monetary Policy Decision Modeling," Working Papers 2025-005, The George Washington University, The Center for Economic Research.
    15. Wayne Gao & Sukjin Han & Annie Liang, 2026. "How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge," Papers 2601.12343, arXiv.org.
    16. Alejandro Lopez-Lira, 2025. "Can Large Language Models Trade? Testing Financial Theories with LLM Agents in Market Simulations," Papers 2504.10789, arXiv.org.
    17. Jieshu Wang & Andrew Maynard, 2025. "Gender disparity in U.S. patenting," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 12(1), pages 1-26, December.
    18. Kevin He & Ran Shorrer & Mengjia Xia, 2025. "Human Misperception of Generative-AI Alignment: A Laboratory Experiment," Papers 2502.14708, arXiv.org, revised Apr 2026.
    19. Jian-Qiao Zhu & Haijiang Yan & Thomas L. Griffiths, 2024. "Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice," Papers 2405.19313, arXiv.org, revised May 2025.
    20. Felipe A. Csaszar & Harsh Ketkar & Hyunjin Kim, 2024. "Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors," Papers 2408.08811, arXiv.org.

    More about this item


    JEL classification:

    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
    • E37 - Macroeconomics and Monetary Economics - - Prices, Business Fluctuations, and Cycles - - - Forecasting and Simulation: Models and Applications



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:fip:fedgfe:2025-44. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Ryan Wolfslayer; Keisha Fournillier (email available below). General contact details of provider: https://edirc.repec.org/data/frbgvus.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.