How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

Author

Listed:

Wayne Gao
Sukjin Han
Annie Liang

Registered:

Sukjin Han

Abstract

Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across settings.
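The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator or inference procedure: the data are synthetic, ordinary least squares stands in for the "flexible machine learning models", a hand-written function (`stand_in_llm`, a hypothetical name) stands in for the fixed LLM predictor, and error is measured on a single held-out test set rather than by cross-validation. The equivalent sample size is taken here as the smallest training size on a grid at which the trained model's average test error falls to or below the fixed predictor's error.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(predict, xs, ys):
    """Mean squared prediction error of `predict` on (xs, ys)."""
    return statistics.fmean((predict(x) - y) ** 2 for x, y in zip(xs, ys))


def draw(n, rng):
    """Synthetic domain data (an assumption): y = 2x + standard normal noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def equivalent_sample_size(fixed_predict, grid, n_test=2000, n_reps=100, seed=0):
    """Compare a fixed predictor with models trained on n samples, n in grid.

    Returns (fixed_err, curve, ess): the fixed predictor's test error, the
    average test error of the trained model at each n, and the smallest n
    in the grid whose average error is <= fixed_err (None if there is none).
    """
    rng = random.Random(seed)
    test_x, test_y = draw(n_test, rng)
    fixed_err = mse(fixed_predict, test_x, test_y)
    curve = {}
    for n in grid:
        errs = []
        for _ in range(n_reps):
            a, b = fit_ols(*draw(n, rng))  # fresh training sample of size n
            errs.append(mse(lambda x, a=a, b=b: a + b * x, test_x, test_y))
        curve[n] = statistics.fmean(errs)
    ess = next((n for n in grid if curve[n] <= fixed_err), None)
    return fixed_err, curve, ess


def stand_in_llm(x):
    """Hypothetical fixed predictor: roughly right, but biased (slope 1.5, not 2)."""
    return 1.5 * x


grid = [5, 10, 20, 40, 80, 160]
fixed_err, curve, ess = equivalent_sample_size(stand_in_llm, grid)
print("fixed predictor error:", round(fixed_err, 4))
for n in grid:
    print(n, round(curve[n], 4))
print("equivalent sample size on this grid:", ess)
```

In this toy setting a more accurate fixed predictor pushes the crossing point to a larger training size, which is the sense in which the measure expresses pretrained knowledge in units of task-specific data.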

Suggested Citation

Wayne Gao & Sukjin Han & Annie Liang, 2026. "How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge," Papers 2601.12343, arXiv.org.

Handle: RePEc:arx:papers:2601.12343

Citations

Cited by:

Gonzalo Ballestero & Hadi Hosseini & Samarth Khanna & Ran I. Shorrer, 2026. "Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games," Papers 2604.09502, arXiv.org, revised Apr 2026.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Nikoleta Anesti & Edward Hill & Andreas Joseph, 2025. "Inflation Attitudes of Large Language Models," Papers 2512.14306, arXiv.org.
Matthew O. Jackson & Qiaozhu Me & Stephanie W. Wang & Yutong Xie & Walter Yuan & Seth Benzell & Erik Brynjolfsson & Colin F. Camerer & James Evans & Brian Jabarian & Jon Kleinberg & Juanjuan Meng & Se, 2025. "AI Behavioral Science," Papers 2509.13323, arXiv.org, revised May 2026.
Didisheim, Antoine & Fraschini, Martina & Somoza, Luciano, 2025. "AI’s predictable memory in financial analysis," Economics Letters, Elsevier, vol. 256(C).
Sugat Chaturvedi & Rochana Chaturvedi, 2025. "Who Gets the Callback? Generative AI and Gender Bias," Papers 2504.21400, arXiv.org.
Giuseppe Matera, 2025. "Corporate Earnings Calls and Analyst Beliefs," Papers 2511.15214, arXiv.org, revised Nov 2025.
Felipe A. Csaszar & Harsh Ketkar & Hyunjin Kim, 2024. "Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors," Strategy Science, INFORMS, vol. 9(4), pages 322-345, December.
Hongshen Sun & Juanjuan Zhang, 2025. "From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research," Papers 2512.23184, arXiv.org.
Koji Takahashi & Joon Suk Park, 2025. "Generative AI for Surveys on Payment Apps: AIs' View on Privacy and Technology," IMES Discussion Paper Series 25-E-13, Institute for Monetary and Economic Studies, Bank of Japan.
Hui Chen & Antoine Didisheim & Mohammad Pourmohammadi & Luciano Somoza & Hanqing Tian, 2025. "A Financial Brain Scan of the LLM," Papers 2508.21285, arXiv.org, revised Feb 2026.
Alexander Erlei, 2025. "From Digital Distrust to Codified Honesty: Experimental Evidence on Generative AI in Credence Goods Markets," Papers 2509.06069, arXiv.org.
Alejandro Lopez-Lira & Yuehua Tang & Mingyin Zhu, 2025. "The Memorization Problem: Can We Trust LLMs' Economic Forecasts?," Papers 2504.14765, arXiv.org, revised Dec 2025.
Alejandro Lopez-Lira, 2025. "Can Large Language Models Trade? Testing Financial Theories with LLM Agents in Market Simulations," Papers 2504.10789, arXiv.org.
Kevin He & Ran Shorrer & Mengjia Xia, 2025. "Human Misperception of Generative-AI Alignment: A Laboratory Experiment," Papers 2502.14708, arXiv.org, revised Apr 2026.
- Kevin He & Ran Shorrer & Mengjia Xia, 2025. "Human Misperception of Generative-AI Alignment: A Laboratory Experiment," PIER Working Paper Archive 25-019, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
Felipe A. Csaszar & Harsh Ketkar & Hyunjin Kim, 2024. "Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors," Papers 2408.08811, arXiv.org.
Benjamin S. Manning & John J. Horton, 2025. "General Social Agents," Papers 2508.17407, arXiv.org, revised Mar 2026.
George Gui & Seungwoo Kim, 2025. "Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach," Papers 2509.25709, arXiv.org.
Gonzalo Ballestero & Hadi Hosseini & Samarth Khanna & Ran I. Shorrer, 2026. "Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games," Papers 2604.09502, arXiv.org, revised Apr 2026.
Aliya Amirova & Theodora Fteropoulli & Nafiso Ahmed & Martin R Cowie & Joel Z Leibo, 2024. "Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity," PLOS ONE, Public Library of Science, vol. 19(3), pages 1-33, March.
Anne Lundgaard Hansen & Seung Jung Lee, 2025. "Financial Stability Implications of Generative AI: Taming the Animal Spirits," Papers 2510.01451, arXiv.org.
Alekseenko, Iuliia & Dagaev, Dmitry & Paklina, Sofiia & Parshakov, Petr, 2025. "Strategizing with AI: Insights from a beauty contest experiment," Journal of Economic Behavior & Organization, Elsevier, vol. 240(C).
- Iuliia Alekseenko & Dmitry Dagaev & Sofia Paklina & Petr Parshakov, 2025. "Strategizing with AI: Insights from a Beauty Contest Experiment," Papers 2502.03158, arXiv.org, revised Oct 2025.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2026-02-09 (Artificial Intelligence)
NEP-BIG-2026-02-09 (Big Data)
NEP-CMP-2026-02-09 (Computational Economics)
