
Multilingual Twitter Sentiment Classification: The Role of Human Annotators

Authors

  • Igor Mozetič
  • Miha Grčar
  • Jasmina Smailović

Abstract

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
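The abstract does not name the specific agreement measures, so the following is only a minimal sketch of one common choice, not the authors' method: Cohen's kappa for two annotators, in a nominal form and in a linearly weighted form that treats negative < neutral < positive as ordered. The function name `cohen_kappa` and the label lists `ann_1` and `ann_2` are made-up illustrations, not data from the paper.

```python
from collections import Counter

# Assumed class order, used only by the weighted variant.
LABELS = ["negative", "neutral", "positive"]


def cohen_kappa(a, b, weighted=False):
    """Cohen's kappa for two annotators; linear weights if weighted=True."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    k = len(LABELS)
    idx = {lab: i for i, lab in enumerate(LABELS)}

    # Disagreement weight: 0 on the diagonal; otherwise 1 (nominal) or
    # proportional to the distance between the two classes (ordered).
    def w(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    pairs = Counter((idx[x], idx[y]) for x, y in zip(a, b))
    marg_a = Counter(idx[x] for x in a)
    marg_b = Counter(idx[y] for y in b)

    # Observed vs. chance-expected (from the marginals) weighted disagreement.
    observed = sum(w(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(
        w(i, j) * marg_a[i] * marg_b[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if expected == 0:
        return 1.0  # both annotators used one and the same label throughout
    return 1.0 - observed / expected


# Hypothetical labels from two annotators for the same eight tweets.
ann_1 = ["negative", "negative", "neutral", "neutral",
         "positive", "positive", "neutral", "negative"]
ann_2 = ["negative", "neutral", "neutral", "positive",
         "positive", "positive", "neutral", "neutral"]

nominal = cohen_kappa(ann_1, ann_2)
ordered = cohen_kappa(ann_1, ann_2, weighted=True)
print(f"nominal kappa: {nominal:.3f}, linearly weighted kappa: {ordered:.3f}")
```

In this toy data every disagreement is between adjacent classes (never negative versus positive), so the weighted score comes out higher than the nominal one. That is the kind of pattern one would expect if annotators perceive the classes as ordered.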

Suggested Citation

  • Igor Mozetič & Miha Grčar & Jasmina Smailović, 2016. "Multilingual Twitter Sentiment Classification: The Role of Human Annotators," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-26, May.
  • Handle: RePEc:plo:pone00:0155036
    DOI: 10.1371/journal.pone.0155036

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155036
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0155036&type=printable
    Download Restriction: no



    Citations


    Cited by:

    1. Vuk Batanović & Miloš Cvetanović & Boško Nikolić, 2020. "A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-30, November.
    2. Peter Gabrovšek & Darko Aleksovski & Igor Mozetič & Miha Grčar, 2017. "Twitter sentiment around the Earnings Announcement events," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-21, February.
    3. Paweł Matuszewski, 2023. "How to prepare data for the automatic classification of politically related beliefs expressed on Twitter? The consequences of researchers’ decisions on the number of coders, the algorithm learning pro," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 301-321, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yousaf, Imran & Youssef, Manel & Goodell, John W., 2022. "Quantile connectedness between sentiment and financial markets: Evidence from the S&P 500 twitter sentiment index," International Review of Financial Analysis, Elsevier, vol. 83(C).
    2. Matteo Iacopini & Carlo R.M.A. Santagiustina, 2021. "Filtering the intensity of public concern from social media count data with jumps," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1283-1302, October.
    3. Darko Cherepnalkoski & Andreas Karpf & Igor Mozetič & Miha Grčar, 2016. "Cohesion and Coalition Formation in the European Parliament: Roll-Call Votes and Twitter Activities," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-27, November.
    4. Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
    5. Marlene Amstad & Leonardo Gambacorta & Chao He & Dora Xia, 2021. "Trade sentiment and the stock market: new evidence based on big data textual analysis of Chinese media," BIS Working Papers 917, Bank for International Settlements.
    6. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    7. Agrrawal, Pankaj & Agarwal, Rajat, 2023. "A Longer-Term evaluation of Information releases by Influential market Agents and the Semi-strong market Efficiency," EconStor Preprints 273555, ZBW - Leibniz Information Centre for Economics.
    8. Gabriele Ranco & Darko Aleksovski & Guido Caldarelli & Miha Grčar & Igor Mozetič, 2015. "The Effects of Twitter Sentiment on Stock Price Returns," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-21, September.
    9. Ahelegbey, Daniel Felix & Cerchiello, Paola & Scaramozzino, Roberta, 2022. "Network based evidence of the financial impact of Covid-19 pandemic," International Review of Financial Analysis, Elsevier, vol. 81(C).
    10. Soudeep Deb, 2023. "Analyzing airlines stock price volatility during COVID‐19 pandemic through internet search data," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 28(2), pages 1497-1513, April.
    11. Arcuri, Maria Cristina & Gandolfi, Gino & Russo, Ivan, 2023. "Does fake news impact stock returns? Evidence from US and EU stock markets," Journal of Economics and Business, Elsevier, vol. 125.
    12. Jimei Shen & Zhehu Yuan & Yifan Jin, 2022. "AlphaMLDigger: A Novel Machine Learning Solution to Explore Excess Return on Investment," Papers 2206.11072, arXiv.org, revised Dec 2022.
    13. Frank Z. Xing & Erik Cambria & Lorenzo Malandri & Carlo Vercellis, 2018. "Discovering Bayesian Market Views for Intelligent Asset Allocation," Papers 1802.09911, arXiv.org, revised Jun 2018.
    14. Bessi, Alessandro, 2017. "On the statistical properties of viral misinformation in online social media," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 469(C), pages 459-470.
    15. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    16. Paul A. Griffin & Mohammedi Padaria, 2017. "Is Financial Analysis Doomed? The Birth of “Reactive Valuation” Analysis," Accounting and Finance Research, Sciedu Press, vol. 6(3), pages 1-39, August.
    17. Muhammad Kamran Khan & Jian-Zhou Teng & Muhammad Imran Khan, 2019. "Asymmetric impact of oil prices on stock returns in Shanghai stock exchange: Evidence from asymmetric ARDL model," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-14, June.
    18. Mahmoudi, Nader & Docherty, Paul & Melia, Adrian, 2022. "Firm-level investor sentiment and corporate announcement returns," Journal of Banking & Finance, Elsevier, vol. 144(C).
    19. Stefan Claus & Massimo Stella, 2022. "Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts," Future Internet, MDPI, vol. 14(10), pages 1-18, October.
    20. Klaus, Jürgen & Koser, Christoph, 2021. "Measuring Trump: The Volfefe Index and its impact on European financial markets," Finance Research Letters, Elsevier, vol. 38(C).
