Printed from https://ideas.repec.org/p/uea/wcbess/24-01.html

Using Large Language Models for Text Classification in Experimental Economics

Author

Listed:
  • Can Celebi

    (University of Mannheim)

  • Stefan Penczynski

    (School of Economics and Centre for Behavioural and Experimental Social Science, University of East Anglia)

Abstract

In our study, we compare the classification capabilities of GPT-3.5 and GPT-4 with those of human annotators, using text data from economic experiments. We analysed four text corpora covering two domains: promises and strategic reasoning. Starting with prompts close to those given to human annotators, we then explored alternative prompts to investigate how varying the classification instructions and the degree of background information affects the models' classification performance. We also varied the number of examples in a prompt (few-shot vs zero-shot) and the use of the zero-shot "Chain of Thought" prompting technique. Our findings show that GPT-4's performance is comparable to that of human annotators: it achieves accuracy near or above 90% in three tasks, and in the most challenging task, classifying strategic thinking in asymmetric coordination games, it reaches accuracy above 70%.
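The three prompting variants compared in the abstract (zero-shot, few-shot, and zero-shot "Chain of Thought") can be sketched roughly as follows. The instruction text, labels, and example messages below are illustrative placeholders, not the authors' actual prompts or corpora.

```python
# Illustrative sketch of the three prompt modes varied in the paper:
# zero-shot, few-shot, and zero-shot chain-of-thought (CoT).
# All wording here is hypothetical, for a promise-classification task.

INSTRUCTION = (
    "Classify the following message from a trust-game experiment as "
    "'promise' or 'no promise'."
)

# Hypothetical labelled examples used only in the few-shot mode.
FEW_SHOT_EXAMPLES = [
    ("I will definitely send the money back to you.", "promise"),
    ("Good luck with your decision.", "no promise"),
]

def build_prompt(message: str, mode: str = "zero-shot") -> str:
    """Assemble a classification prompt in one of three modes."""
    parts = [INSTRUCTION]
    if mode == "few-shot":
        # Prepend labelled examples before the message to classify.
        for text, label in FEW_SHOT_EXAMPLES:
            parts.append(f"Message: {text}\nLabel: {label}")
    parts.append(f"Message: {message}")
    if mode == "zero-shot-cot":
        # Zero-shot CoT appends a reasoning cue instead of asking
        # for the label directly.
        parts.append("Let's think step by step, then give the label.")
    else:
        parts.append("Label:")
    return "\n\n".join(parts)
```

Each assembled prompt would then be sent to the model (e.g. GPT-3.5 or GPT-4) and the returned label compared against the human annotation to compute accuracy.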

Suggested Citation

  • Can Celebi & Stefan Penczynski, 2024. "Using Large Language Models for Text Classification in Experimental Economics," Working Paper series 24-01, University of East Anglia, Centre for Behavioural and Experimental Social Science (CBESS), School of Economics, University of East Anglia, Norwich, UK.
  • Handle: RePEc:uea:wcbess:24-01

    Download full text from publisher

    File URL: https://ueaeco.github.io/working-papers/papers/cbess/UEA-CBESS-24-01.pdf
    File Function: main text
    Download Restriction: no

    References listed on IDEAS

    1. Guarnaschelli, Serena & McKelvey, Richard D. & Palfrey, Thomas R., 2000. "An Experimental Study of Jury Decision Rules," American Political Science Review, Cambridge University Press, vol. 94(2), pages 407-423, June.
    2. Paul Glasserman & Caden Lin, 2023. "Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis," Papers 2309.17322, arXiv.org.
    3. David J. Cooper & John H. Kagel, 2005. "Are Two Heads Better Than One? Team versus Individual Play in Signaling Games," American Economic Review, American Economic Association, vol. 95(3), pages 477-509, June.
    4. Alonso-Robisco, Andres & Carbó, José Manuel, 2023. "Analysis of CBDC narrative by central banks using large language models," Finance Research Letters, Elsevier, vol. 58(PC).
    5. Huseyn Ismayilov & Jan Potters, 2016. "Why do promises affect trustworthiness, or do they?," Experimental Economics, Springer;Economic Science Association, vol. 19(2), pages 382-393, June.
    6. Smales, Lee A., 2023. "Classification of RBA monetary policy announcements using ChatGPT," Finance Research Letters, Elsevier, vol. 58(PC).
    7. Daniel Houser & Erte Xiao, 2011. "Classification of natural language messages using a coordination game," Experimental Economics, Springer;Economic Science Association, vol. 14(1), pages 1-14, March.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Simeon Schudy & Susanna Grundmann & Lisa Spantig, 2024. "Individual Preferences for Truth-Telling," CESifo Working Paper Series 11521, CESifo.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Julian Junyan Wang & Victor Xiaoqi Wang, 2025. "Assessing Consistency and Reproducibility in the Outputs of Large Language Models: Evidence Across Diverse Finance and Accounting Tasks," Papers 2503.16974, arXiv.org, revised Sep 2025.
    2. Dong, Mengming Michael & Stratopoulos, Theophanis C. & Wang, Victor Xiaoqi, 2024. "A scoping review of ChatGPT research in accounting and finance," International Journal of Accounting Information Systems, Elsevier, vol. 55(C).
    3. Zakaria Babutsidze & Nobuyuki Hanaki & Adam Zylbersztejn, 2019. "Digital Communication and Swift Trust," Post-Print halshs-02409314, HAL.
    4. Jiabin Wu, 2018. "Indirect higher order beliefs and cooperation," Experimental Economics, Springer;Economic Science Association, vol. 21(4), pages 858-876, December.
    5. Baethge, Caroline, 2016. "Performance in the beauty contest: How strategic discussion enhances team reasoning," Passauer Diskussionspapiere, Betriebswirtschaftliche Reihe B-17-16, University of Passau, Faculty of Business and Economics.
    6. Zakaria Babutsidze & Nobuyuki Hanaki & Adam Zylbersztejn, 2021. "Nonverbal content and trust: An experiment on digital communication," Economic Inquiry, Western Economic Association International, vol. 59(4), pages 1517-1532, October.
    7. Xiangdong Qin & Siyu Wang & Mike Zhiren Wu, 2024. "Is it what you say or how you say it?," Experimental Economics, Springer;Economic Science Association, vol. 27(4), pages 874-921, September.
    8. Tebbe, Eva & Wegener, Benjamin, 2022. "Is natural language processing the cheap charlie of analyzing cheap talk? A horse race between classifiers on experimental communication data," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 96(C).
    9. Ronald Bosman & Heike Hennig-Schmidt & Frans Winden, 2006. "Exploring group decision making in a power-to-take experiment," Experimental Economics, Springer;Economic Science Association, vol. 9(1), pages 35-51, April.
    10. Ito, Arata & Sato, Masahiro & Ota, Rui, 2025. "A novel content-based approach to measuring monetary policy uncertainty using fine-tuned LLMs," Finance Research Letters, Elsevier, vol. 75(C).
    11. van Dijk, Frans & Sonnemans, Joep & Bauw, Eddy, 2014. "Judicial error by groups and individuals," Journal of Economic Behavior & Organization, Elsevier, vol. 108(C), pages 224-235.
    12. Auerswald, Heike & Schmidt, Carsten & Thum, Marcel & Torsvik, Gaute, 2018. "Teams in a public goods experiment with punishment," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 72(C), pages 28-39.
    13. Benjamin Wegener, 2021. "How to Analyze Communication Data from Laboratory Experiments Without Being a Machine Learning Specialist," Journal of Economics and Behavioral Studies, AMH International, vol. 13(1), pages 32-56.
    14. Jacob K. Goeree & Leeat Yariv, 2009. "An experimental study of jury deliberation," IEW - Working Papers 438, Institute for Empirical Research in Economics - University of Zurich.
    15. Andres, Maximilian & Bruttel, Lisa & Friedrichsen, Jana, 2023. "How communication makes the difference between a cartel and tacit collusion: A machine learning approach," European Economic Review, Elsevier, vol. 152(C).
    16. Boris Ginzburg & José-Alberto Guerra, 2017. "When Ignorance is Bliss: Theory and Experiment on Collective Learning," Documentos CEDE 15377, Universidad de los Andes, Facultad de Economía, CEDE.
    17. Nick Feltovich & Yasuyo Hamaguchi, 2018. "The Effect of Whistle‐Blowing Incentives on Collusion: An Experimental Study of Leniency Programs," Southern Economic Journal, John Wiley & Sons, vol. 84(4), pages 1024-1049, April.
    18. Elten, Jonas van & Penczynski, Stefan P., 2020. "Coordination games with asymmetric payoffs: An experimental study with intra-group communication," Journal of Economic Behavior & Organization, Elsevier, vol. 169(C), pages 158-188.
    19. David J. Cooper & Ian Krajbich & Charles N. Noussair, 2019. "Choice-Process Data in Experimental Economics," Journal of the Economic Science Association, Springer;Economic Science Association, vol. 5(1), pages 1-13, August.
    20. Zakaria Babutsidze & Nobuyuki Hanaki & Adam Zylbersztejn, 2020. "Nonverbal content and swift trust: An experiment on digital communication," Working Papers 2008, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.

    More about this item




    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.