Printed from https://ideas.repec.org/a/spr/stmapp/v29y2020i4d10.1007_s10260-019-00504-7.html

Monitoring rare categories in sentiment and opinion analysis: a Milan mega event on Twitter platform

Author

Listed:
  • Anna Calissano

    (Politecnico di Milano)

  • Simone Vantini

    (Politecnico di Milano)

  • Marika Arena

    (Politecnico di Milano)

Abstract

This paper proposes a new aggregated classification scheme aimed at supporting the implementation of semantic text analysis methods in contexts characterized by the presence of rare text categories. The proposed approach starts from the aggregate supervised text classifier developed by Hopkins and King and extends it by relying on rare event sampling methods. In detail, it enables the analyst to enlarge the number of estimated sentiment categories while preserving the estimation accuracy and reducing the working time otherwise needed to unconditionally increase the size of the training set. The approach is applied to study the daily evolution of the web reputation of one of the most recent mega-events held in Europe: Expo Milano. The corpus consists of more than one million tweets, in both Italian and English, discussing the event. The analysis portrays the evolution of the Expo stakeholders’ opinions over time and allows the identification of the main drivers of the Expo reputation. The algorithm will be implemented as a running option in the next release of the R package ReadMe.
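
As a rough illustration of the aggregate idea behind the Hopkins and King classifier mentioned in the abstract, the sketch below estimates category proportions in an unlabelled corpus directly, without classifying individual documents. It relies on the identity P(S) = P(S|D) P(D), where S is a text feature and D a category. This is not the paper's algorithm or the ReadMe implementation: the function name `estimate_category_proportions`, the use of single binary word features, the plain least-squares solve with clipping, and the toy data are all assumptions made for illustration. The toy training set is deliberately balanced across categories, loosely mimicking a hand-coded sample that over-represents a rare category.

```python
import numpy as np

def estimate_category_proportions(train_features, train_labels,
                                  target_features, n_categories):
    """Estimate P(D) in the target set from P(S) = P(S|D) P(D).

    Hypothetical helper: assumes binary features and at least one
    labelled document per category.
    """
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels)
    target_features = np.asarray(target_features, dtype=float)

    # Column j holds P(feature present | category j) from labelled documents.
    cond = np.column_stack([
        train_features[train_labels == j].mean(axis=0)
        for j in range(n_categories)
    ])
    # Marginal feature frequencies in the unlabelled target set.
    marginal = target_features.mean(axis=0)

    # Simplification: unconstrained least squares, then clip negatives
    # and renormalise so the estimates form a probability vector.
    props, *_ = np.linalg.lstsq(cond, marginal, rcond=None)
    props = np.clip(props, 0.0, None)
    return props / props.sum()

# Toy data (invented): each category always shows one fixed word pattern.
patterns = np.array([[1, 0, 0, 1],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])

# Balanced labelled set: 5 documents per category, including the rare one.
train_labels = np.repeat([0, 1, 2], 5)
train_features = patterns[train_labels]

# Unlabelled target set in which category 2 is rare (2 of 100 documents).
target_labels = np.repeat([0, 1, 2], [90, 8, 2])
target_features = patterns[target_labels]

estimate = estimate_category_proportions(
    train_features, train_labels, target_features, n_categories=3
)
print(estimate)
```

In this sketch the labelled set need not mirror the category mix of the target corpus, since only the conditional feature frequencies are taken from it. That is why deliberately sampling more documents from a rare category is compatible with an aggregate estimator of this kind.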

Suggested Citation

  • Anna Calissano & Simone Vantini & Marika Arena, 2020. "Monitoring rare categories in sentiment and opinion analysis: a Milan mega event on Twitter platform," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(4), pages 787-812, December.
  • Handle: RePEc:spr:stmapp:v:29:y:2020:i:4:d:10.1007_s10260-019-00504-7
    DOI: 10.1007/s10260-019-00504-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-019-00504-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Justin Grimmer & Brandon M. Stewart, 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts," Political Analysis, Cambridge University Press, vol. 21(3), pages 267-297, July.
    2. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    3. Margaret E. Roberts & Brandon M. Stewart & Edoardo M. Airoldi, 2016. "A Model of Text for Experimentation in the Social Sciences," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 988-1003, July.
    4. Lanny W. Martin & Georg Vanberg, 2008. "A Robust Transformation Procedure for Interpreting Political Text," Political Analysis, Cambridge University Press, vol. 16(1), pages 93-100, January.
    5. Jonathan B. Slapin & Sven‐Oliver Proksch, 2008. "A Scaling Model for Estimating Time‐Series Party Positions from Texts," American Journal of Political Science, John Wiley & Sons, vol. 52(3), pages 705-722, July.
    6. Gary King & Langche Zeng, 2001. "Logistic Regression in Rare Events Data," Political Analysis, Cambridge University Press, vol. 9(2), pages 137-163, January.
    7. Michael Laver & Kenneth Benoit & John Garry, 2003. "Extracting Policy Positions from Political Texts Using Words as Data," American Political Science Review, Cambridge University Press, vol. 97(2), pages 311-331, May.
    8. Daniel J. Hopkins & Gary King, 2010. "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 229-247, January.
    9. Will Lowe, 2008. "Understanding Wordscores," Political Analysis, Cambridge University Press, vol. 16(4), pages 356-371.
    10. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    11. Michael Salter-Townshend & Thomas Murphy, 2014. "Mixtures of biased sentiment analysers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 85-103, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pierre-Marc Daigneault & Dominic Duval & Louis M. Imbeau, 2018. "Supervised scaling of semi-structured interview transcripts to characterize the ideology of a social policy reform," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(5), pages 2151-2162, September.
    2. Martin Haselmayer & Marcelo Jenny, 2017. "Sentiment analysis of political communication: combining a dictionary approach with crowdcoding," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(6), pages 2623-2646, November.
    3. van Loon, Austin, 2022. "Three Families of Automated Text Analysis," SocArXiv htnej, Center for Open Science.
    4. Pongsak Luangaram & Yuthana Sethapramote, 2016. "Central Bank Communication and Monetary Policy Effectiveness: Evidence from Thailand," PIER Discussion Papers 20, Puey Ungphakorn Institute for Economic Research.
    5. Hanna Bäck & Marc Debus & Wolfgang C. Müller, 2016. "Intra-party diversity and ministerial selection in coalition governments," Public Choice, Springer, vol. 166(3), pages 355-378, March.
    6. Diaf, Sami & Döpke, Jörg & Fritsche, Ulrich & Rockenbach, Ida, 2022. "Sharks and minnows in a shoal of words: Measuring latent ideological positions based on text mining techniques," European Journal of Political Economy, Elsevier, vol. 75(C).
    7. Adriana Bunea & Raimondas Ibenskas, 2015. "Quantitative text analysis and the study of EU lobbying and interest groups," European Union Politics, vol. 16(3), pages 429-455, September.
    8. Rebecca Cordell & Kristian Skrede Gleditsch & Florian G Kern & Laura Saavedra-Lux, 2020. "Measuring institutional variation across American Indian constitutions using automated content analysis," Journal of Peace Research, Peace Research Institute Oslo, vol. 57(6), pages 777-788, November.
    9. Caroline Le Pennec, 2020. "Strategic Campaign Communication: Evidence from 30,000 Candidate Manifestos," SoDa Laboratories Working Paper Series 2020-05, Monash University, SoDa Laboratories.
    10. Auffenberg, Jennie & Marcinkiewicz, Kamil, 2013. "Wer gestaltet, wer verwaltet Reformen im öffentlichen Dienst? Ein Methodenvergleich zur Analyse von Arbeitsbeziehungen in Reformprozessen anhand der Polizei Brandenburg," TranState Working Papers 170, University of Bremen, Collaborative Research Center 597: Transformations of the State.
    11. Greene, Zac & Ceron, Andrea & Schumacher, Gijs & Fazekas, Zoltan, 2016. "The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries," OSF Preprints ghxj8, Center for Open Science.
    12. Born, Andreas & Janssen, Aljoscha, 2020. "Does a District-Vote Matter for the Behavior of Politicians? A Textual Analysis of Parliamentary Speeches," Working Paper Series 1320, Research Institute of Industrial Economics.
    13. Marc Debus, 2009. "Pre-electoral commitments and government formation," Public Choice, Springer, vol. 138(1), pages 45-64, January.
    14. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    15. Andres Algaba & David Ardia & Keven Bluteau & Samuel Borms & Kris Boudt, 2020. "Econometrics Meets Sentiment: An Overview Of Methodology And Applications," Journal of Economic Surveys, Wiley Blackwell, vol. 34(3), pages 512-547, July.
    16. Merz, Nicolas & Regel, Sven & Lewandowski, Jirka, 2016. "The Manifesto Corpus: A new resource for research on political parties and quantitative text analysis," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 3(2 (April-), pages 1-8.
    17. H Andrew Schwartz & Johannes C Eichstaedt & Margaret L Kern & Lukasz Dziurzynski & Stephanie M Ramones & Megha Agrawal & Achal Shah & Michal Kosinski & David Stillwell & Martin E P Seligman & Lyle H U, 2013. "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.
    18. Alexander Herzog & Slava Mikhaylov, 2010. "A new Database of Parliamentary Debates in Ireland, 1922--2008," The Institute for International Integration Studies Discussion Paper Series iiisdp338, IIIS, revised Jul 2010.
    19. Marcinkiewicz, Kamil & Auffenberg, Jennie & Kittel, Bernhard, 2012. "Politikpositionen im Reformprozess des öffentlichen Dienstes: Zur Übertragbarkeit der quantitativen Textanalyse," TranState Working Papers 162, University of Bremen, Collaborative Research Center 597: Transformations of the State.
