Printed from https://ideas.repec.org/a/spr/soinre/v169y2023i1d10.1007_s11205-023-03147-0.html

A Natural Language Processing Analysis of Newspapers Coverage of Hong Kong Protests Between 1998 and 2020

Author

Listed:
  • Giovanna Maria Dora Dore

    (Johns Hopkins University)

Abstract

This article investigates how the SCMP, the China Daily, and Western-based newspapers cover protests in Hong Kong, in an effort to identify changes in journalistic practices between 1998 and 2020. It combines natural language processing (NLP) with a qualitative investigation of a novel corpus of newspaper articles spanning 22 years. It uses topic modeling to contrast the treatment of protests in Hong Kong diachronically and across news sources. Through a comparison of lexical frequency and lexical usage, it shows preferences and discrepancies in the use of protest-relevant keywords in the newspapers’ articles. Embedding neighborhood comparisons strengthen our understanding of how words are used differently by the SCMP, the China Daily, and Western-based newspapers, and of how the context of protest-related keywords may differ across news sources over time. Finally, computational sentiment analysis measures the tone and connotations of articles. The article fills a gap in the literature on Hong Kong media, and its methodology broadens the application of NLP techniques to the social sciences.
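
As a rough illustration of the lexical frequency comparison mentioned in the abstract, the sketch below counts how often protest-related keywords occur per 1,000 tokens in each news source. It is not the article's actual pipeline: the function name `relative_frequencies`, the source labels, the keyword list, and the toy sentences are all assumptions made up for this example.

```python
import re
from collections import Counter


def relative_frequencies(texts, keywords):
    """Return occurrences of each keyword per 1,000 tokens across `texts`."""
    # Naive tokenizer (assumption): lowercase alphabetic runs only.
    tokens = [t for text in texts for t in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(tokens)
    total = len(tokens)
    return {k: (1000 * counts[k] / total if total else 0.0) for k in keywords}


# Toy, invented sentences standing in for articles; not data from the study.
corpus = {
    "source_a": [
        "Protesters marched peacefully through the city.",
        "The protest drew a large crowd.",
    ],
    "source_b": [
        "Rioters clashed with police.",
        "The riot disrupted traffic.",
    ],
}
keywords = ["protest", "riot"]

for source, texts in corpus.items():
    print(source, relative_frequencies(texts, keywords))
```

Normalizing by token count keeps sources of different sizes comparable. A real analysis would likely also lemmatize, so that "protesters" and "protest" count together, which this exact-match sketch does not do.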

Suggested Citation

  • Giovanna Maria Dora Dore, 2023. "A Natural Language Processing Analysis of Newspapers Coverage of Hong Kong Protests Between 1998 and 2020," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 169(1), pages 143-166, September.
  • Handle: RePEc:spr:soinre:v:169:y:2023:i:1:d:10.1007_s11205-023-03147-0
    DOI: 10.1007/s11205-023-03147-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11205-023-03147-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. van Loon, Austin, 2022. "Three Families of Automated Text Analysis," SocArXiv htnej, Center for Open Science.
    2. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    3. Dehler-Holland, Joris & Schumacher, Kira & Fichtner, Wolf, 2021. "Topic Modeling Uncovers Shifts in Media Framing of the German Renewable Energy Act," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 2(1).
    4. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    5. Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
    6. Zhang, Han, 2021. "How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It," SocArXiv 453jk, Center for Open Science.
    7. Diego Kozlowski & Viktoriya Semeshenko & Andrea Molinari, 2021. "Latent Dirichlet allocation model for world trade analysis," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-18, February.
    8. Yang Bao & Anindya Datta, 2014. "Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures," Management Science, INFORMS, vol. 60(6), pages 1371-1391, June.
    9. Dehler-Holland, Joris & Okoh, Marvin & Keles, Dogan, 2022. "Assessing technology legitimacy with topic models and sentiment analysis – The case of wind power in Germany," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    10. Bastian Schaefermeier & Gerd Stumme & Tom Hanika, 2021. "Topic space trajectories," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5759-5795, July.
    11. Gadat, Sébastien & Villeneuve, Stéphane, 2023. "Parsimonious Wasserstein Text-mining," TSE Working Papers 23-1471, Toulouse School of Economics (TSE).
    12. Enna Hirata & Daisuke Watanabe & Athanasios Chalmoukis & Maria Lambrou, 2024. "A Topic Modeling Approach to Determine Supply Chain Management Priorities Enabled by Digital Twin Technology," Sustainability, MDPI, vol. 16(9), pages 1-15, April.
    13. Tobias Koopmann & Maximilian Stubbemann & Matthias Kapa & Michael Paris & Guido Buenstorf & Tom Hanika & Andreas Hotho & Robert Jäschke & Gerd Stumme, 2021. "Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9847-9868, December.
    14. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
    15. D. Thorleuchter & D. Van Den Poel, 2013. "Weak Signal Identification with Semantic Web Mining," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 13/860, Ghent University, Faculty of Economics and Business Administration.
    16. Lehotský, Lukáš & Černoch, Filip & Osička, Jan & Ocelík, Petr, 2019. "When climate change is missing: Media discourse on coal mining in the Czech Republic," Energy Policy, Elsevier, vol. 129(C), pages 774-786.
    17. Greene, Zac & Ceron, Andrea & Schumacher, Gijs & Fazekas, Zoltan, 2016. "The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries," OSF Preprints ghxj8, Center for Open Science.
    18. Ash, Elliott & Gauthier, Germain & Widmer, Philine, 2024. "Relatio: Text Semantics Capture Political and Economic Narratives," Political Analysis, Cambridge University Press, vol. 32(1), pages 115-132, January.
    19. Florence Ertel & Simon Donig & Markus Eckl & Sebastian Gassner & Daniel Göler & Malte Rehbein, 2024. "Using web archives for an explorative study of the web presence of German parties during the European election 2019," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 603-625, February.
    20. Born, Andreas & Janssen, Aljoscha, 2020. "Does a District-Vote Matter for the Behavior of Politicians? A Textual Analysis of Parliamentary Speeches," Working Paper Series 1320, Research Institute of Industrial Economics.
