IDEAS home Printed from https://ideas.repec.org/a/sae/anname/v659y2015i1p78-94.html

Data-Driven Content Analysis of Social Media

Author

Listed:
  • H. Andrew Schwartz
  • Lyle H. Ungar

Abstract

Researchers have long measured people’s thoughts, feelings, and personalities using carefully designed survey questions, which are often given to a relatively small number of volunteers. The proliferation of social media, such as Twitter and Facebook, offers alternative measurement approaches: automatic content coding at unprecedented scales and the statistical power to do open-vocabulary exploratory analysis. We describe a range of automatic and partially automatic content analysis techniques and illustrate how their use on social media generates insights into subjective well-being, health, gender differences, and personality.

Suggested Citation

  • H. Andrew Schwartz & Lyle H. Ungar, 2015. "Data-Driven Content Analysis of Social Media," The ANNALS of the American Academy of Political and Social Science, vol. 659(1), pages 78-94, May.
  • Handle: RePEc:sae:anname:v:659:y:2015:i:1:p:78-94
    DOI: 10.1177/0002716215569197

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0002716215569197
    Download Restriction: no


    References listed on IDEAS

    1. Grimmer, Justin & Stewart, Brandon M., 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts," Political Analysis, Cambridge University Press, vol. 21(3), pages 267-297, July.
    2. Monroe, Burt L. & Colaresi, Michael P. & Quinn, Kevin M., 2008. "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict," Political Analysis, Cambridge University Press, vol. 16(4), pages 372-403.
    3. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    4. Laver, Michael & Benoit, Kenneth & Garry, John, 2003. "Extracting Policy Positions from Political Texts Using Words as Data," American Political Science Review, Cambridge University Press, vol. 97(2), pages 311-331, May.
    5. Kramer, Gerald H., 1983. "The Ecological Fallacy Revisited: Aggregate- versus Individual-level Findings on Economics and Elections, and Sociotropic Voting," American Political Science Review, Cambridge University Press, vol. 77(1), pages 92-111, March.
    6. Berinsky, Adam J. & Huber, Gregory A. & Lenz, Gabriel S., 2012. "Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk," Political Analysis, Cambridge University Press, vol. 20(3), pages 351-368, July.
    7. H Andrew Schwartz & Johannes C Eichstaedt & Margaret L Kern & Lukasz Dziurzynski & Stephanie M Ramones & Megha Agrawal & Achal Shah & Michal Kosinski & David Stillwell & Martin E P Seligman & Lyle H U, 2013. "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martin Haselmayer & Marcelo Jenny, 2017. "Sentiment analysis of political communication: combining a dictionary approach with crowdcoding," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(6), pages 2623-2646, November.
    2. Gavin Abercrombie & Riza Batista-Navarro, 2020. "Sentiment and position-taking analysis of parliamentary debates: a systematic literature review," Journal of Computational Social Science, Springer, vol. 3(1), pages 245-270, April.
    3. Weiss, Max & Zoorob, Michael, 2021. "Political frames of public health crises: Discussing the opioid epidemic in the US Congress," Social Science & Medicine, Elsevier, vol. 281(C).
    4. Greene, Zac & Ceron, Andrea & Schumacher, Gijs & Fazekas, Zoltan, 2016. "The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries," OSF Preprints ghxj8, Center for Open Science.
    5. Salvatore Giorgi & David B. Yaden & Johannes C. Eichstaedt & Robert D. Ashford & Anneke E.K. Buffone & H. Andrew Schwartz & Lyle H. Ungar & Brenda Curtis, 2020. "Cultural Differences in Tweeting about Drinking Across the US," IJERPH, MDPI, vol. 17(4), pages 1-14, February.
    6. H Andrew Schwartz & Johannes C Eichstaedt & Margaret L Kern & Lukasz Dziurzynski & Stephanie M Ramones & Megha Agrawal & Achal Shah & Michal Kosinski & David Stillwell & Martin E P Seligman & Lyle H U, 2013. "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.
    7. Kostovicova Denisa & Kerr Rachel & Sokolić Ivor & Fairey Tiffany & Redwood Henry & Subotić Jelena, 2022. "The “Digital Turn” in Transitional Justice Research: Evaluating Image and Text as Data in the Western Balkans," Comparative Southeast European Studies, De Gruyter, vol. 70(1), pages 24-46, March.
    8. Pierre-Marc Daigneault & Dominic Duval & Louis M. Imbeau, 2018. "Supervised scaling of semi-structured interview transcripts to characterize the ideology of a social policy reform," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(5), pages 2151-2162, September.
    9. Seraphine F. Maerz & Carsten Q. Schneider, 2020. "Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(2), pages 517-545, April.
    10. Rebecca Cordell & Kristian Skrede Gleditsch & Florian G Kern & Laura Saavedra-Lux, 2020. "Measuring institutional variation across American Indian constitutions using automated content analysis," Journal of Peace Research, Peace Research Institute Oslo, vol. 57(6), pages 777-788, November.
    11. van Loon, Austin, 2022. "Three Families of Automated Text Analysis," SocArXiv htnej, Center for Open Science.
    12. Elliott Ash & Germain Gauthier & Philine Widmer, 2021. "RELATIO: Text Semantics Capture Political and Economic Narratives," Papers 2108.01720, arXiv.org, revised Apr 2022.
    13. Born, Andreas & Janssen, Aljoscha, 2020. "Does a District-Vote Matter for the Behavior of Politicians? A Textual Analysis of Parliamentary Speeches," Working Paper Series 1320, Research Institute of Industrial Economics.
    14. Sanders James & Lisi Giulio & Schonhardt-Bailey Cheryl, 2017. "Themes and Topics in Parliamentary Oversight Hearings: A New Direction in Textual Data Analysis," Statistics, Politics and Policy, De Gruyter, vol. 8(2), pages 153-194, December.
    15. Soojin Oh Park & Nail Hassairi, 2021. "What predicts legislative success of early care and education policies?: Applications of machine learning and Natural Language Processing in a cross-state early childhood policy analysis," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-36, February.
    16. Brenda Curtis & Salvatore Giorgi & Anneke E K Buffone & Lyle H Ungar & Robert D Ashford & Jessie Hemmons & Dan Summers & Casey Hamilton & H Andrew Schwartz, 2018. "Can Twitter be used to predict county excessive alcohol consumption rates?," PLOS ONE, Public Library of Science, vol. 13(4), pages 1-16, April.
    17. Alexander Herzog & Slava Mikhaylov, 2010. "A new Database of Parliamentary Debates in Ireland, 1922--2008," The Institute for International Integration Studies Discussion Paper Series iiisdp338, IIIS, revised Jul 2010.
    18. Stefano Pagliari & Meredith Wilf, 2021. "Regulatory novelty after financial crises: Evidence from international banking and securities standards, 1975–2016," Regulation & Governance, John Wiley & Sons, vol. 15(3), pages 933-951, July.
    19. Sanders, James & Lisi, Giulio & Schonhardt-Bailey, Cheryl, 2018. "Themes and topics in parliamentary oversight hearings: a new direction in textual data analysis," LSE Research Online Documents on Economics 87624, London School of Economics and Political Science, LSE Library.
    20. Rybinski, Krzysztof, 2020. "The forecasting power of the multi-language narrative of sell-side research: A machine learning evaluation," Finance Research Letters, Elsevier, vol. 34(C).
