Printed from https://ideas.repec.org/p/arx/papers/2506.00856.html

Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

Author

Listed:
  • Qiang Chen
  • Tianyang Han
  • Jin Li
  • Ye Luo
  • Yuxiao Wu
  • Xiaowei Zhang
  • Tuo Zhou

Abstract

Can AI effectively perform complex econometric analysis traditionally requiring human expertise? This paper evaluates AI agents' capability to master econometrics, focusing on empirical analysis performance. We develop an "Econometrics AI Agent" built on the open-source MetaGPT framework. This agent exhibits outstanding performance in: (1) planning econometric tasks strategically, (2) generating and executing code, (3) employing error-based reflection for improved robustness, and (4) allowing iterative refinement through multi-round conversations. We construct two datasets from academic coursework materials and published research papers to evaluate performance against real-world challenges. Comparative testing shows our domain-specialized AI agent significantly outperforms both benchmark large language models (LLMs) and general-purpose AI agents. This work establishes a testbed for exploring AI's impact on social science research and enables cost-effective integration of domain expertise, making advanced econometric methods accessible to users with minimal coding skills. Furthermore, our AI agent enhances research reproducibility and offers promising pedagogical applications for econometrics teaching.

Suggested Citation

  • Qiang Chen & Tianyang Han & Jin Li & Ye Luo & Yuxiao Wu & Xiaowei Zhang & Tuo Zhou, 2025. "Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks," Papers 2506.00856, arXiv.org, revised Jun 2025.
  • Handle: RePEc:arx:papers:2506.00856

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2506.00856
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Erik Brynjolfsson & Danielle Li & Lindsey Raymond, 2025. "Generative AI at Work," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 140(2), pages 889-942.
    2. Margarita Leib & Nils Köbis & Rainer Michael Rilke & Marloes Hagens & Bernd Irlenbusch, 2023. "Corrupted by Algorithms? How AI-generated and Human-written Advice Shape (Dis)honesty," ECONtribute Discussion Papers Series 251, University of Bonn and University of Cologne, Germany.
    3. Douglas Almond & Kenneth Y. Chay & David S. Lee, 2005. "The Costs of Low Birth Weight," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 120(3), pages 1031-1083.
    4. Gorodnichenko, Yuriy & Pham, Tho & Talavera, Oleksandr, 2025. "Central bank communication on social media: What, to whom, and how?," Journal of Econometrics, Elsevier, vol. 249(PC).
    5. Ahrens, Maximilian & Erdemlioglu, Deniz & McMahon, Michael & Neely, Christopher J. & Yang, Xiye, 2025. "Mind your language: Market responses to central bank speeches," Journal of Econometrics, Elsevier, vol. 249(PC).
    6. Margarita Leib & Nils Köbis & Rainer Michael Rilke & Marloes Hagens & Bernd Irlenbusch, 2023. "Corrupted by Algorithms? How AI-generated and Human-written Advice Shape (Dis)honesty," Papers 2301.01954, arXiv.org.
    7. Card, David & Sullivan, Daniel G, 1988. "Measuring the Effect of Subsidized Training Programs on Movements in and out of Employment," Econometrica, Econometric Society, vol. 56(3), pages 497-530, May.
    8. Leib, Margarita & Köbis, Nils & Rilke, Rainer Michael & Hagens, Marloes & Irlenbusch, Bernd, 2023. "Corrupted by Algorithms? How AI-Generated and Human-Written Advice Shape (Dis)Honesty," IZA Discussion Papers 16293, Institute of Labor Economics (IZA).
    9. Cattaneo, Matias D., 2010. "Efficient semiparametric estimation of multi-valued treatment effects under ignorability," Journal of Econometrics, Elsevier, vol. 155(2), pages 138-154, April.
    10. Anton Korinek, 2023. "Generative AI for Economic Research: Use Cases and Implications for Economists," Journal of Economic Literature, American Economic Association, vol. 61(4), pages 1281-1317, December.
    11. Noailly, Joëlle & Nowzohour, Laura & van den Heuvel, Matthias & Pla, Ireneu, 2024. "Heard the news? Environmental policy and clean investments," Journal of Public Economics, Elsevier, vol. 238(C).
    12. Niu, Yimeng & Wu, Jing & Jiang, Shenyang & Jiang, Zhibin, 2025. "The bullwhip effect in servitized manufacturers," Other publications TiSEM 638f661c-295a-4ada-8c33-a, Tilburg University, School of Economics and Management.
    13. Ali Goli & Amandeep Singh, 2024. "Frontiers: Can Large Language Models Capture Human Preferences?," Marketing Science, INFORMS, vol. 43(4), pages 709-722, July.
    14. Thomas Graeber & Christopher Roth & Florian Zimmermann, 2024. "Stories, Statistics, and Memory," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 139(4), pages 2181-2225.
    15. Peiyao Li & Noah Castelo & Zsolt Katona & Miklos Sarvary, 2024. "Frontiers: Determining the Validity of Large Language Models for Automated Perceptual Analysis," Marketing Science, INFORMS, vol. 43(2), pages 254-266, March.
    16. Anderson, Michael, 2008. "Safety for whom? The effects of light trucks on traffic fatalities," Journal of Health Economics, Elsevier, vol. 27(4), pages 973-989, July.
    17. Yang Chen & Samuel N. Kirshner & Anton Ovchinnikov & Meena Andiappan & Tracy Jenkin, 2025. "A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do?," Manufacturing & Service Operations Management, INFORMS, vol. 27(2), pages 354-368, March.
    18. repec:cdl:agrebk:qt1mb4m7qz is not listed on IDEAS
    19. Ishita Chakraborty & Khai Chiong & Howard Dover & K. Sudhir, 2025. "Can AI and AI-Hybrids Detect Persuasion Skills? Salesforce Hiring with Conversational Video Interviews," Marketing Science, INFORMS, vol. 44(1), pages 30-53, January.
    20. Gmyrek, Pawel & Lutz, Christoph & Newlands, Gemma, 2024. "A Technological Construction of Society comparing GPT-4 and human respondents for occupational evaluation in the UK," ILO Working Papers 995347793502676, International Labour Organization.
    21. Bryan K. Stroube & David M. Waguespack, 2024. "Status and consensus: Heterogeneity in audience evaluations of female‐ versus male‐lead films," Strategic Management Journal, Wiley Blackwell, vol. 45(5), pages 994-1024, May.
    22. Matias D. Cattaneo, 2010. "multi-valued treatment effects," The New Palgrave Dictionary of Economics, Palgrave Macmillan.
    23. Yimeng Niu & Jing Wu & Shenyang Jiang & Zhibin Jiang, 2025. "The Bullwhip Effect in Servitized Manufacturers," Management Science, INFORMS, vol. 71(1), pages 1-20, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Buchanan, Joy & Hickman, William, 2024. "Do people trust humans more than ChatGPT?," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 112(C).
    2. Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(7), pages 1207-1225, November.
    3. Difang Huang & Jiti Gao & Tatsushi Oka, 2025. "Semiparametric single-index estimation for average treatment effects," Econometric Reviews, Taylor & Francis Journals, vol. 44(6), pages 843-885, July.
    4. Chen, Le-Yu & Lee, Sokbae, 2023. "Sparse quantile regression," Journal of Econometrics, Elsevier, vol. 235(2), pages 2195-2217.
    5. Jianxuan Liu & Yanyuan Ma & Lan Wang, 2018. "An alternative robust estimator of average treatment effect in causal inference," Biometrics, The International Biometric Society, vol. 74(3), pages 910-923, September.
    6. Xu, Wenfu & Tan, Zhiqiang, 2024. "High-dimensional model-assisted inference for treatment effects with multi-valued treatments," Journal of Econometrics, Elsevier, vol. 244(1).
    7. Riccardo Di Francesco, 2024. "Aggregation Trees," Papers 2410.11408, arXiv.org, revised Oct 2025.
    8. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    9. Stojčić, Nebojša, 2021. "Social and private outcomes of green innovation incentives in European advancing economies," Technovation, Elsevier, vol. 104(C).
    10. Ay, Jean-Sauveur & Le Gallo, Julie, 2021. "The Signaling Values of Nested Wine Names," Working Papers 321851, American Association of Wine Economists.
    11. Katie Meara & Francesco Pastore & Allan Webster, 2020. "The gender pay gap in the USA: a matching study," Journal of Population Economics, Springer;European Society for Population Economics, vol. 33(1), pages 271-305, January.
    12. repec:cdl:econwp:qt565889qz is not listed on IDEAS
    13. Michael Situ & Ute I. Schwarz & Guangyong Zou & Eric McArthur & Richard B. Kim & Amit X. Garg & Sisira Sarma, 2024. "Does prescribing apixaban or rivaroxaban versus warfarin for patients diagnosed with atrial fibrillation save health system costs? A multivalued treatment effects analysis," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 25(3), pages 397-409, April.
    14. Shaibu Baanni Azumah & Abraham Zakaria & Rosaine N. Yegbemey & Philips A. Apalogta & Vishal Dagar & Abass Mahama, 2022. "Climate Smart Production, Gross Income, and Downstream Risk Characterization of Rice Farmers in Ghana," Journal of Agricultural Studies, Macrothink Institute, vol. 10(2), pages 13-35, June.
    15. Hong, Yan-Zhen & Chang, Hung-Hao & Dai, Yong-Wu, 2018. "Is deregulation of forest land use rights transactions associated with economic well-being and labor allocation of farm households? Empirical evidence in China," Land Use Policy, Elsevier, vol. 75(C), pages 694-701.
    16. Charles Godfred Ackah & Robert Darko Osei & Nana Yaw Agyeman Owusu & Vera Acheampong, 2025. "Special Economic Zones and Household Welfare: New Evidence from Ghana," International Journal of the Economics of Business, Taylor & Francis Journals, vol. 32(3), pages 297-319, September.
    17. Oyenubi, Adeola & Kollamparambil, Umakrishnan, 2023. "Does noncompliance with COVID-19 regulations impact the depressive symptoms of others?," Economic Modelling, Elsevier, vol. 120(C).
    18. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    19. Nicolas Moreau, 2018. "A SAS macro to estimate Average Treatment Effects with Propensity Score Matching," Working Papers hal-01691528, HAL.
    20. Wei Huang & Oliver Linton & Zheng Zhang, 2022. "A Unified Framework for Specification Tests of Continuous Treatment Effect Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(4), pages 1817-1830, October.
    21. Ma, Wanglin & Vatsa, Puneet & Zheng, Hongyun, 2022. "Cooking fuel choices and subjective well-being in rural China: Implications for a complete energy transition," Energy Policy, Elsevier, vol. 165(C).
