
A Method of Automated Nonparametric Content Analysis for Social Science

Author

Listed:
  • Daniel J. Hopkins
  • Gary King

Abstract

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
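The abstract's claim that a classifier with high individual accuracy can still give badly biased category proportions can be illustrated with a small calculation. What follows is a minimal sketch under stated assumptions, not the authors' estimator or software. It assumes only two categories, a hypothetical classifier with known sensitivity and specificity, and a hypothetical true category share of 5%; all function names and numbers are illustrative. It compares the naive "classify and count" estimate with a simple misclassification correction that targets the proportion directly.

```python
def classify_and_count(true_share, sensitivity, specificity):
    """Expected share of documents the classifier labels as the category."""
    false_positive_rate = 1.0 - specificity
    return true_share * sensitivity + (1.0 - true_share) * false_positive_rate


def accuracy(true_share, sensitivity, specificity):
    """Expected fraction of individual documents classified correctly."""
    return true_share * sensitivity + (1.0 - true_share) * specificity


def corrected_share(observed_share, sensitivity, specificity):
    """Invert the two-category misclassification relation to recover the share."""
    false_positive_rate = 1.0 - specificity
    return (observed_share - false_positive_rate) / (sensitivity - false_positive_rate)


# Hypothetical values chosen for illustration only.
true_share = 0.05   # assumed true proportion of documents in the category
sensitivity = 0.90  # assumed P(labeled in category | truly in category)
specificity = 0.90  # assumed P(labeled out of category | truly out of category)

acc = accuracy(true_share, sensitivity, specificity)
naive = classify_and_count(true_share, sensitivity, specificity)
fixed = corrected_share(naive, sensitivity, specificity)

print(f"individual accuracy:      {acc:.3f}")
print(f"classify-and-count share: {naive:.3f}")
print(f"corrected share:          {fixed:.3f}")
```

Under these assumed rates the classifier labels 90% of documents correctly, yet the expected classify-and-count share is 0.14 against a true share of 0.05, nearly three times too large. The correction recovers 0.05 because it uses the error rates rather than the individual labels. In practice those rates would have to be estimated, for example from a hand-coded subset.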

Suggested Citation

  • Daniel J. Hopkins & Gary King, 2010. "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 229-247, January.
  • Handle: RePEc:wly:amposc:v:54:y:2010:i:1:p:229-247
    DOI: 10.1111/j.1540-5907.2009.00428.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1540-5907.2009.00428.x
    Download Restriction: no


    References listed on IDEAS

    1. Simon, Adam F. & Xenos, Michael, 2004. "Dimensional Reduction of Word-Frequency Data as a Substitute for Intersubjective Content Analysis," Political Analysis, Cambridge University Press, vol. 12(1), pages 63-75, January.
    2. Laver, Michael & Benoit, Kenneth & Garry, John, 2003. "Extracting Policy Positions from Political Texts Using Words as Data," American Political Science Review, Cambridge University Press, vol. 97(2), pages 311-331, May.
    3. King, Gary & Lowe, Will, 2003. "An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design," International Organization, Cambridge University Press, vol. 57(3), pages 617-642, July.
    4. King, Gary & Zeng, Langche, 2006. "The Dangers of Extreme Counterfactuals," Political Analysis, Cambridge University Press, vol. 14(2), pages 131-159, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Scharkow, 2013. "Thematic content analysis using supervised machine learning: An empirical evaluation using German online news," Quality & Quantity: International Journal of Methodology, Springer, vol. 47(2), pages 761-773, February.
    2. Gianluca Vagnani & Michele Simoni, 2016. "Technological uncertainty, market orientation and firms' economic performance," MERCATI & COMPETITIVITÀ, FrancoAngeli Editore, vol. 2016(2), pages 143-167.
    3. Kevin M. Quinn & Burt L. Monroe & Michael Colaresi & Michael H. Crespin & Dragomir R. Radev, 2010. "How to Analyze Political Attention with Minimal Assumptions and Costs," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 209-228, January.
    4. Alan Brier & Bruno Hopp, 2011. "Computer assisted text analysis in the social sciences," Quality & Quantity: International Journal of Methodology, Springer, vol. 45(1), pages 103-128, January.
    5. Armèn Hakhverdian, 2009. "Capturing Government Policy on the Left–Right Scale: Evidence from the United Kingdom, 1956–2006," Political Studies, Political Studies Association, vol. 57(4), pages 720-745, December.
    6. Bono, Pierre-Henri & David, Quentin & Desbordes, Rodolphe & Py, Loriane, 2022. "Metro infrastructure and metropolitan attractiveness," Regional Science and Urban Economics, Elsevier, vol. 93(C).
    7. Simon Hug & Tobias Schulz, 2007. "Referendums in the EU’s constitution building process," The Review of International Organizations, Springer, vol. 2(2), pages 177-218, June.
    8. Sheng, Yu & Xu, Xinpeng, 2019. "The productivity impact of climate change: Evidence from Australia's Millennium drought," Economic Modelling, Elsevier, vol. 76(C), pages 182-191.
    9. Pongsak Luangaram & Yuthana Sethapramote, 2016. "Central Bank Communication and Monetary Policy Effectiveness: Evidence from Thailand," PIER Discussion Papers 20, Puey Ungphakorn Institute for Economic Research.
    10. Sam R. Bell & K. Chad Clay & Amanda Murdie, 2019. "Join the Chorus, Avoid the Spotlight: The Effect of Neighborhood and Social Dynamics on Human Rights Organization Shaming," Journal of Conflict Resolution, Peace Science Society (International), vol. 63(1), pages 167-193, January.
    11. Rybinski, Krzysztof, 2020. "The forecasting power of the multi-language narrative of sell-side research: A machine learning evaluation," Finance Research Letters, Elsevier, vol. 34(C).
    12. Dennis Shen & Peng Ding & Jasjeet Sekhon & Bin Yu, 2022. "Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data," Papers 2207.14481, arXiv.org, revised Oct 2022.
    13. Desbordes, Rodolphe & Vicard, Vincent, 2009. "Foreign direct investment and bilateral investment treaties: An international political perspective," Journal of Comparative Economics, Elsevier, vol. 37(3), pages 372-386, September.
    14. Mellace, Giovanni & Ventura, Marco, 2019. "Intended and unintended effects of public incentives for innovation. Quasi-experimental evidence from Italy," Discussion Papers on Economics 9/2019, University of Southern Denmark, Department of Economics.
    15. Hamza Bennani, 2012. "National influences inside the ECB: an assessment from central bankers' statements," Working Papers hal-00992646, HAL.
    16. Kenneth Benoit & Michael Laver & Christine Arnold & Paul Pennings & Madeleine O. Hosli, 2005. "Measuring National Delegate Positions at the Convention on the Future of Europe Using Computerized Word Scoring," European Union Politics, vol. 6(3), pages 291-313, September.
    17. Weiss, Max & Zoorob, Michael, 2021. "Political frames of public health crises: Discussing the opioid epidemic in the US Congress," Social Science & Medicine, Elsevier, vol. 281(C).
    18. Yang, Chao & Huang, Cui, 2022. "Quantitative mapping of the evolution of AI policy distribution, targets and focuses over three decades in China," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    19. Matthew Gentzkow & Jesse M. Shapiro & Matt Taddy, 2019. "Measuring Group Differences in High‐Dimensional Choices: Method and Application to Congressional Speech," Econometrica, Econometric Society, vol. 87(4), pages 1307-1340, July.
    20. William D. Berry & Jacqueline H. R. DeMeritt & Justin Esarey, 2010. "Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential?," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 248-266, January.
