Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

Author

Listed:

Qiang Chen
Tianyang Han
Jin Li
Ye Luo
Yuxiao Wu
Xiaowei Zhang
Tuo Zhou

Registered:

Ye Luo

Abstract

Can AI effectively perform complex econometric analysis traditionally requiring human expertise? This paper evaluates AI agents' capability to master econometrics, focusing on empirical analysis performance. We develop an "Econometrics AI Agent" built on the open-source MetaGPT framework. This agent exhibits outstanding performance in: (1) planning econometric tasks strategically, (2) generating and executing code, (3) employing error-based reflection for improved robustness, and (4) allowing iterative refinement through multi-round conversations. We construct two datasets from academic coursework materials and published research papers to evaluate performance against real-world challenges. Comparative testing shows our domain-specialized AI agent significantly outperforms both benchmark large language models (LLMs) and general-purpose AI agents. This work establishes a testbed for exploring AI's impact on social science research and enables cost-effective integration of domain expertise, making advanced econometric methods accessible to users with minimal coding skills. Furthermore, our AI agent enhances research reproducibility and offers promising pedagogical applications for econometrics teaching.
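The plan, execute, and reflect cycle named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use MetaGPT. The names `toy_writer`, `run_step`, and `CodeWriter` are hypothetical, and the "code generator" is a hand-written stub standing in for a language model: it returns buggy code on the first attempt and a corrected version once it is shown the error message.

```python
import traceback
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an LLM call: given a task step and the previous
# error message (if any), return Python source to run. A real agent would
# query a language model here.
CodeWriter = Callable[[str, Optional[str]], str]


def toy_writer(step: str, last_error: Optional[str]) -> str:
    """Return buggy code first, then a fix once an error has been reported."""
    if last_error is None:
        return "slope = cov_xy / var_x_typo"  # deliberate NameError
    return "slope = cov_xy / var_x"


def run_step(step: str, writer: CodeWriter, env: Dict[str, object],
             max_retries: int = 3) -> List[str]:
    """Generate code for one step, execute it, and retry on failure.

    Returns the list of error messages seen before the step succeeded.
    """
    errors: List[str] = []
    last_error: Optional[str] = None
    for _ in range(max_retries):
        source = writer(step, last_error)
        try:
            exec(source, env)  # run the generated code in a shared namespace
            return errors
        except Exception:
            # Error-based reflection: pass the failure back to the writer.
            last_error = traceback.format_exc(limit=1)
            errors.append(last_error)
    raise RuntimeError(f"step failed after {max_retries} attempts: {step}")


# Toy data: y = 2x exactly, so the OLS slope of y on x is 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
env = {
    "cov_xy": sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)),
    "var_x": sum((a - mean_x) ** 2 for a in x),
}

# The plan is fixed by hand here; an agent would generate it from the task.
plan = ["estimate the OLS slope of y on x"]
error_log = [run_step(step, toy_writer, env) for step in plan]
print(env["slope"], len(error_log[0]))
```

The shared `env` dictionary plays the role of a persistent execution session, so later steps can reuse earlier results. Multi-round refinement would amount to appending new steps to `plan` after inspecting the output.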

Suggested Citation

Qiang Chen & Tianyang Han & Jin Li & Ye Luo & Yuxiao Wu & Xiaowei Zhang & Tuo Zhou, 2025. "Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks," Papers 2506.00856, arXiv.org, revised Jun 2025.

Handle: RePEc:arx:papers:2506.00856

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Buchanan, Joy & Hickman, William, 2024. "Do people trust humans more than ChatGPT?," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 112(C).
Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(7), pages 1207-1225, November.
- Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly Robust Uniform Confidence Band for the Conditional Average Treatment Effect Function," Papers 1601.02801, arXiv.org, revised Oct 2016.
- Lee, Sokbae & Okui, Ryo & Whang, Yoon-Jae, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," LSE Research Online Documents on Economics 86852, London School of Economics and Political Science, LSE Library.
- Sokbae (Simon) Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly robust uniform confidence band for the conditional average treatment effect function," CeMMAP working papers 03/16, Institute for Fiscal Studies.
- Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly Robust Uniform Confidence Band For The Conditional Average Treatment Effect Function," KIER Working Papers 931, Kyoto University, Institute of Economic Research.
- Sokbae (Simon) Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly robust uniform confidence band for the conditional average treatment effect function," CeMMAP working papers CWP03/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Difang Huang & Jiti Gao & Tatsushi Oka, 2025. "Semiparametric single-index estimation for average treatment effects," Econometric Reviews, Taylor & Francis Journals, vol. 44(6), pages 843-885, July.
- Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Papers 2206.08503, arXiv.org, revised Jan 2025.
- Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Monash Econometrics and Business Statistics Working Papers 10/22, Monash University, Department of Econometrics and Business Statistics.
Chen, Le-Yu & Lee, Sokbae, 2023. "Sparse quantile regression," Journal of Econometrics, Elsevier, vol. 235(2), pages 2195-2217.
- Le-Yu Chen & Sokbae Lee, 2020. "Sparse Quantile Regression," Papers 2006.11201, arXiv.org, revised Mar 2023.
- Le-Yu Chen & Sokbae (Simon) Lee, 2020. "Sparse Quantile Regression," CeMMAP working papers CWP30/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Jianxuan Liu & Yanyuan Ma & Lan Wang, 2018. "An alternative robust estimator of average treatment effect in causal inference," Biometrics, The International Biometric Society, vol. 74(3), pages 910-923, September.
Xu, Wenfu & Tan, Zhiqiang, 2024. "High-dimensional model-assisted inference for treatment effects with multi-valued treatments," Journal of Econometrics, Elsevier, vol. 244(1).
Riccardo Di Francesco, 2024. "Aggregation Trees," Papers 2410.11408, arXiv.org, revised Oct 2025.
Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
- Pedro H. C. Sant'Anna & Jun B. Zhao, 2018. "Doubly Robust Difference-in-Differences Estimators," Papers 1812.01723, arXiv.org, revised May 2020.
Stojčić, Nebojša, 2021. "Social and private outcomes of green innovation incentives in European advancing economies," Technovation, Elsevier, vol. 104(C).
Ay, Jean-Sauveur & Le Gallo, Julie, 2021. "The Signaling Values of Nested Wine Names," Working Papers 321851, American Association of Wine Economists.
- Jean-Sauveur Ay & Julie Le Gallo, 2021. "The signaling value of nested wine names," Post-Print hal-03268014, HAL.
Katie Meara & Francesco Pastore & Allan Webster, 2020. "The gender pay gap in the USA: a matching study," Journal of Population Economics, Springer;European Society for Population Economics, vol. 33(1), pages 271-305, January.
- Meara, Katie & Pastore, Francesco & Webster, Allan, 2019. "The Gender Pay Gap in the US: A Matching Study," GLO Discussion Paper Series 363, Global Labor Organization (GLO).
Michael Situ & Ute I. Schwarz & Guangyong Zou & Eric McArthur & Richard B. Kim & Amit X. Garg & Sisira Sarma, 2024. "Does prescribing apixaban or rivaroxaban versus warfarin for patients diagnosed with atrial fibrillation save health system costs? A multivalued treatment effects analysis," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 25(3), pages 397-409, April.
Shaibu Baanni Azumah & Abraham Zakaria & Rosaine N. Yegbemey & Philips A. Apalogta & Vishal Dagar & Abass Mahama, 2022. "Climate Smart Production, Gross Income, and Downstream Risk Characterization of Rice Farmers in Ghana," Journal of Agricultural Studies, Macrothink Institute, vol. 10(2), pages 13-35, June.
Hong, Yan-Zhen & Chang, Hung-Hao & Dai, Yong-Wu, 2018. "Is deregulation of forest land use rights transactions associated with economic well-being and labor allocation of farm households? Empirical evidence in China," Land Use Policy, Elsevier, vol. 75(C), pages 694-701.
Charles Godfred Ackah & Robert Darko Osei & Nana Yaw Agyeman Owusu & Vera Acheampong, 2025. "Special Economic Zones and Household Welfare: New Evidence from Ghana," International Journal of the Economics of Business, Taylor & Francis Journals, vol. 32(3), pages 297-319, September.
Oyenubi, Adeola & Kollamparambil, Umakrishnan, 2023. "Does noncompliance with COVID-19 regulations impact the depressive symptoms of others?," Economic Modelling, Elsevier, vol. 120(C).
Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
- Knaus, Michael C., 2018. "A Double Machine Learning Approach to Estimate the Effects of Musical Practice on Student's Skills," IZA Discussion Papers 11547, Institute of Labor Economics (IZA).
- Michael C. Knaus, 2018. "A Double Machine Learning Approach to Estimate the Effects of Musical Practice on Student's Skills," Papers 1805.10300, arXiv.org, revised Jan 2019.
Nicolas Moreau, 2018. "A SAS macro to estimate Average Treatment Effects with Propensity Score Matching," Working Papers hal-01691528, HAL.
Wei Huang & Oliver Linton & Zheng Zhang, 2022. "A Unified Framework for Specification Tests of Continuous Treatment Effect Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(4), pages 1817-1830, October.
- Wei Huang & Oliver Linton & Zheng Zhang, 2021. "A Unified Framework for Specification Tests of Continuous Treatment Effect Models," Papers 2102.08063, arXiv.org, revised Sep 2021.
- Huang, W. & Linton, O. & Zhang, Z., 2021. "A Unified Framework for Specification Tests of Continuous Treatment Effect Models," Cambridge Working Papers in Economics 2113, Faculty of Economics, University of Cambridge.
Ma, Wanglin & Vatsa, Puneet & Zheng, Hongyun, 2022. "Cooking fuel choices and subjective well-being in rural China: Implications for a complete energy transition," Energy Policy, Elsevier, vol. 165(C).

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2025-06-09 (Artificial Intelligence)
NEP-BIG-2025-06-09 (Big Data)
NEP-CMP-2025-06-09 (Computational Economics)
NEP-ECM-2025-06-09 (Econometrics)
