IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2601.12343.html

How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

Author

Listed:
  • Wayne Gao
  • Sukjin Han
  • Annie Liang

Abstract

Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across settings.

Suggested Citation

  • Wayne Gao & Sukjin Han & Annie Liang, 2026. "How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge," Papers 2601.12343, arXiv.org.
  • Handle: RePEc:arx:papers:2601.12343
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.12343
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Drew Fudenberg & Jon Kleinberg & Annie Liang & Sendhil Mullainathan, 2022. "Measuring the Completeness of Economic Models," Journal of Political Economy, University of Chicago Press, vol. 130(4), pages 956-990.
    2. Sukjin Han, 2024. "Mining Causality: AI-Assisted Search for Instrumental Variables," Papers 2409.14202, arXiv.org, revised Jun 2025.
    3. Benjamin S. Manning & Kehang Zhu & John J. Horton, 2024. "Automated Social Science: Language Models as Scientist and Subjects," NBER Working Papers 32381, National Bureau of Economic Research, Inc.
    4. Ido Erev & Alvin Roth & Robert Slonim & Greg Barron, 2007. "Learning and equilibrium as useful approximations: Accuracy of prediction on randomly selected constant sum games," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 33(1), pages 29-51, October.
    5. Benjamin S. Manning & Kehang Zhu & John J. Horton, 2024. "Automated Social Science: Language Models as Scientist and Subjects," Papers 2404.11794, arXiv.org, revised Apr 2024.
    6. Isaiah Andrews & Drew Fudenberg & Lihua Lei & Annie Liang & Chaofeng Wu, 2022. "The Transfer Performance of Economic Models," Papers 2202.04796, arXiv.org, revised Mar 2025.
    7. Stephen Bates & Trevor Hastie & Robert Tibshirani, 2024. "Cross-Validation: What Does It Estimate and How Well Does It Do It?," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(546), pages 1434-1445, April.
    8. Bruno Fava, 2025. "Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators," Papers 2511.04957, arXiv.org, revised Nov 2025.
    9. Jens Ludwig & Sendhil Mullainathan & Ashesh Rambachan, 2024. "Large Language Models: An Applied Econometric Framework," Papers 2412.07031, arXiv.org, revised Dec 2025.
    10. Yuan Gao & Dokyun Lee & Gordon Burtch & Sina Fazelpour, 2024. "Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina," Papers 2410.19599, arXiv.org, revised Jan 2025.
    11. Argyle, Lisa P. & Busby, Ethan C. & Fulda, Nancy & Gubler, Joshua R. & Rytting, Christopher & Wingate, David, 2023. "Out of One, Many: Using Language Models to Simulate Human Samples," Political Analysis, Cambridge University Press, vol. 31(3), pages 337-351, July.
    12. John J. Horton & Apostolos Filippas & Benjamin S. Manning, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," NBER Working Papers 31122, National Bureau of Economic Research, Inc.
    13. John J. Horton & Apostolos Filippas & Benjamin S. Manning, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," Papers 2301.07543, arXiv.org, revised Feb 2026.
    14. J Aislinn Bohren & Peter Hull & Alex Imas, 2025. "Systemic Discrimination: Theory and Measurement," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 140(3), pages 1743-1799.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nikoleta Anesti & Edward Hill & Andreas Joseph, 2025. "Inflation Attitudes of Large Language Models," Papers 2512.14306, arXiv.org.
    2. Matthew O. Jackson & Qiaozhu Me & Stephanie W. Wang & Yutong Xie & Walter Yuan & Seth Benzell & Erik Brynjolfsson & Colin F. Camerer & James Evans & Brian Jabarian & Jon Kleinberg & Juanjuan Meng & Se, 2025. "AI Behavioral Science," Papers 2509.13323, arXiv.org.
    3. Didisheim, Antoine & Fraschini, Martina & Somoza, Luciano, 2025. "AI’s predictable memory in financial analysis," Economics Letters, Elsevier, vol. 256(C).
    4. Sugat Chaturvedi & Rochana Chaturvedi, 2025. "Who Gets the Callback? Generative AI and Gender Bias," Papers 2504.21400, arXiv.org.
    5. Giuseppe Matera, 2025. "Corporate Earnings Calls and Analyst Beliefs," Papers 2511.15214, arXiv.org, revised Nov 2025.
    6. Hongshen Sun & Juanjuan Zhang, 2025. "From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research," Papers 2512.23184, arXiv.org.
    7. Koji Takahashi & Joon Suk Park, 2025. "Generative AI for Surveys on Payment Apps: AIs' View on Privacy and Technology," IMES Discussion Paper Series 25-E-13, Institute for Monetary and Economic Studies, Bank of Japan.
    8. Hui Chen & Antoine Didisheim & Mohammad & Pourmohammadi & Luciano Somoza & Hanqing Tian, 2025. "A Financial Brain Scan of the LLM," Papers 2508.21285, arXiv.org, revised Feb 2026.
    9. Alexander Erlei, 2025. "From Digital Distrust to Codified Honesty: Experimental Evidence on Generative AI in Credence Goods Markets," Papers 2509.06069, arXiv.org.
    10. Alejandro Lopez-Lira & Yuehua Tang & Mingyin Zhu, 2025. "The Memorization Problem: Can We Trust LLMs' Economic Forecasts?," Papers 2504.14765, arXiv.org, revised Dec 2025.
    11. Alejandro Lopez-Lira, 2025. "Can Large Language Models Trade? Testing Financial Theories with LLM Agents in Market Simulations," Papers 2504.10789, arXiv.org.
    12. Kevin He & Ran Shorrer & Mengjia Xia, 2025. "Human Misperception of Generative-AI Alignment: A Laboratory Experiment," Papers 2502.14708, arXiv.org, revised Apr 2026.
    13. Felipe A. Csaszar & Harsh Ketkar & Hyunjin Kim, 2024. "Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors," Papers 2408.08811, arXiv.org.
    14. Benjamin S. Manning & John J. Horton, 2025. "General Social Agents," Papers 2508.17407, arXiv.org, revised Mar 2026.
    15. repec:osf:osfxxx:r3qng_v1 is not listed on IDEAS
    16. George Gui & Seungwoo Kim, 2025. "Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach," Papers 2509.25709, arXiv.org.
    17. Aliya Amirova & Theodora Fteropoulli & Nafiso Ahmed & Martin R Cowie & Joel Z Leibo, 2024. "Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity," PLOS ONE, Public Library of Science, vol. 19(3), pages 1-33, March.
    18. Anne Lundgaard Hansen & Seung Jung Lee, 2025. "Financial Stability Implications of Generative AI: Taming the Animal Spirits," Papers 2510.01451, arXiv.org.
    19. Alekseenko, Iuliia & Dagaev, Dmitry & Paklina, Sofiia & Parshakov, Petr, 2025. "Strategizing with AI: Insights from a beauty contest experiment," Journal of Economic Behavior & Organization, Elsevier, vol. 240(C).
    20. Hua Li & Qifang Wang & Ye Wu, 2025. "From Mobile Media to Generative AI: The Evolutionary Logic of Computational Social Science Across Data, Methods, and Theory," Mathematics, MDPI, vol. 13(19), pages 1-17, September.
    21. Ben Weidmann & Yixian Xu & David J. Deming, 2025. "Measuring Human Leadership Skills with Artificially Intelligent Agents," Papers 2508.02966, arXiv.org.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2601.12343. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.