Printed from https://ideas.repec.org/a/sae/somere/v50y2021i1p202-237.html

The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods

Author

Listed:
  • Laura K. Nelson
  • Derek Burk
  • Marcel Knudsen
  • Leslie McCall

Abstract

Advances in computer science and computational linguistics have yielded new, and faster, computational approaches to structuring and analyzing textual data. These approaches perform well on tasks like information extraction, but their ability to identify complex, socially constructed, and unsettled theoretical concepts—a central goal of sociological content analysis—has not been tested. To fill this gap, we compare the results produced by three common computer-assisted approaches—dictionary, supervised machine learning (SML), and unsupervised machine learning—to those produced through a rigorous hand-coding analysis of inequality in the news (N = 1,253 articles). Although we find that SML methods perform best in replicating hand-coded results, we document and clarify the strengths and weaknesses of each approach, including how they can complement one another. We argue that content analysts in the social sciences would do well to keep all these approaches in their toolkit, deploying them purposefully according to the task at hand.

Suggested Citation

  • Laura K. Nelson & Derek Burk & Marcel Knudsen & Leslie McCall, 2021. "The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods," Sociological Methods & Research, vol. 50(1), pages 202-237, February.
  • Handle: RePEc:sae:somere:v:50:y:2021:i:1:p:202-237
    DOI: 10.1177/0049124118769114

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0049124118769114
    Download Restriction: no


    References listed on IDEAS

    1. Grimmer, Justin, 2010. "A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases," Political Analysis, Cambridge University Press, vol. 18(1), pages 1-35, January.
    2. Margaret Roberts & Brandon Stewart & Tingley, Dustin & Edoardo Airoldi, 2013. "The structural topic model and applied social science," Working Paper 132666, Harvard University OpenScholar.
    3. Daniel J. Hopkins & Gary King, 2010. "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 229-247, January.
    4. King, Gary & Pan, Jennifer & Roberts, Margaret E., 2013. "How Censorship in China Allows Government Criticism but Silences Collective Expression," American Political Science Review, Cambridge University Press, vol. 107(2), pages 326-343, May.
    5. Kenneth Benoit & Michael Laver & Slava Mikhaylov, 2009. "Treating Words as Data with Error: Uncertainty in Text Statements of Policy Positions," American Journal of Political Science, John Wiley & Sons, vol. 53(2), pages 495-513, April.
    6. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.

    Citations

    Cited by:

    1. Bart Bonikowski & Yuchen Luo & Oscar Stuhler, 2022. "Politics as Usual? Measuring Populism, Nationalism, and Authoritarianism in U.S. Presidential Campaigns (1952–2020) with Neural Language Models," Sociological Methods & Research, vol. 51(4), pages 1721-1787, November.
    2. Daniel Caballero-Julia & Philippe Campillo, 2021. "Epistemological Considerations of Text Mining: Implications for Systematic Literature Review," Mathematics, MDPI, vol. 9(16), pages 1-26, August.
    3. AJ Alvero & Jasmine Pal & Katelyn M. Moussavian, 2022. "Linguistic, cultural, and narrative capital: computational and human readings of transfer admissions essays," Journal of Computational Social Science, Springer, vol. 5(2), pages 1709-1734, November.
    4. Bernhardt, Lea & Dewenter, Ralf & Thomas, Tobias, 2023. "Measuring partisan media bias in US newscasts from 2001 to 2012," European Journal of Political Economy, Elsevier, vol. 78(C).
    5. Konstantin Gavras & Jan Karem Höhne & Annelies G. Blom & Harald Schoen, 2022. "Innovating the collection of open‐ended answers: The linguistic and content characteristics of written and oral answers to political attitude questions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(3), pages 872-890, July.
    6. Alex Luscombe & Kevin Dick & Kevin Walby, 2022. "Algorithmic thinking in the public interest: navigating technical, legal, and ethical hurdles to web scraping in the social sciences," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(3), pages 1023-1044, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    2. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    3. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    4. Rauh, Christian, 2018. "Validating a sentiment dictionary for German political language—a workbench note," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 15(4), pages 319-343.
    5. Martin Haselmayer & Marcelo Jenny, 2017. "Sentiment analysis of political communication: combining a dictionary approach with crowdcoding," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(6), pages 2623-2646, November.
    6. Arina Wischnewsky & David‐Jan Jansen & Matthias Neuenkirch, 2021. "Financial stability and the Fed: Evidence from congressional hearings," Economic Inquiry, Western Economic Association International, vol. 59(3), pages 1192-1214, July.
    7. Margaret E. Roberts & Brandon M. Stewart & Richard A. Nielsen, 2020. "Adjusting for Confounding with Text Matching," American Journal of Political Science, John Wiley & Sons, vol. 64(4), pages 887-903, October.
    8. Lehotský, Lukáš & Černoch, Filip & Osička, Jan & Ocelík, Petr, 2019. "When climate change is missing: Media discourse on coal mining in the Czech Republic," Energy Policy, Elsevier, vol. 129(C), pages 774-786.
    9. Greene, Zac & Ceron, Andrea & Schumacher, Gijs & Fazekas, Zoltan, 2016. "The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries," OSF Preprints ghxj8, Center for Open Science.
    10. Alexander Herzog & Slava Mikhaylov, 2010. "Estimating Government Discretion in Fiscal Policy Making," The Institute for International Integration Studies Discussion Paper Series iiisdp339, IIIS, revised Jul 2010.
    11. Laura K. Nelson, 2020. "Computational Grounded Theory: A Methodological Framework," Sociological Methods & Research, vol. 49(1), pages 3-42, February.
    12. Enwei Zhu & Jing Wu & Hongyu Liu & Keyang Li, 2023. "A Sentiment Index of the Housing Market in China: Text Mining of Narratives on Social Media," The Journal of Real Estate Finance and Economics, Springer, vol. 66(1), pages 77-118, January.
    13. Constantine Boussalis & Travis G. Coan & Mirya R. Holman, 2018. "Climate change communication from cities in the USA," Climatic Change, Springer, vol. 149(2), pages 173-187, July.
    14. Pádraig Carmody, 2011. "An Informationalised Economy in Africa? The Impact of New ICT on the Wood Products Industry in Durban, South Africa," The Institute for International Integration Studies Discussion Paper Series iiisdp383, IIIS.
    15. Andrea Ceron & Luigi Curini & Stefano M. Iacus, 2019. "ISIS at Its Apogee: The Arabic Discourse on Twitter and What We Can Learn From That About ISIS Support and Foreign Fighters," SAGE Open, vol. 9(1), pages 21582440187, March.
    16. Alexander Herzog & Slava Mikhaylov, 2010. "A new Database of Parliamentary Debates in Ireland, 1922--2008," The Institute for International Integration Studies Discussion Paper Series iiisdp338, IIIS, revised Jul 2010.
    17. Rodrigo Zamith & Seth C. Lewis, 2015. "Content Analysis and the Algorithmic Coder," The ANNALS of the American Academy of Political and Social Science, vol. 659(1), pages 307-318, May.
    18. Berk Wheelock, Lauren & Pachamanova, Dessislava A., 2022. "Acceptable set topic modeling," European Journal of Operational Research, Elsevier, vol. 299(2), pages 653-673.
    19. Leopoldo Fergusson & Carlos Molina, 2020. "Facebook Causes Protests," HiCN Working Papers 323, Households in Conflict Network.
    20. Gu, Chen & Kurov, Alexander & Wolfe, Marketa Halova, 2018. "Relief Rallies after FOMC Announcements as a Resolution of Uncertainty," Journal of Empirical Finance, Elsevier, vol. 49(C), pages 1-18.
