Printed from https://ideas.repec.org/p/hal/cesptp/hal-03205149.html

Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages

Author

Listed:
  • Thomas Renault

    (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique)

Abstract

We use a large dataset of one million messages sent on the microblogging platform StockTwits to evaluate the performance of a wide range of preprocessing methods and machine learning algorithms for sentiment analysis in finance. We find that adding bigrams and emojis significantly improves sentiment classification performance. However, more complex and time-consuming machine learning methods, such as random forests or neural networks, do not improve the accuracy of the classification. We also provide empirical evidence that the preprocessing method and the size of the dataset have a strong impact on the correlation between investor sentiment and stock returns. While investor sentiment and stock returns are highly correlated, we do not find that investor sentiment derived from messages sent on social media helps in predicting large-capitalization stock returns at a daily frequency.
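To make the preprocessing choices named in the abstract concrete, here is a minimal sketch of a message classifier that keeps emojis as tokens and adds bigrams to the feature set. It is an illustration only and not the paper's pipeline: the regular expression, the emoji ranges, the names `tokenize`, `features` and `NaiveBayes`, and the six toy messages are all assumptions made for this example, and a simple multinomial Naive Bayes stands in for the algorithms the paper compares.

```python
import math
import re
from collections import Counter, defaultdict

# Words, cashtags such as $AAPL, and single emoji characters are kept as tokens.
# The emoji ranges are a rough illustrative subset, not a complete list.
TOKEN_RE = re.compile(r"\$?[a-z0-9']+|[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def tokenize(message):
    """Lowercase a message and split it into word, cashtag and emoji tokens."""
    return TOKEN_RE.findall(message.lower())


def features(message, use_bigrams=True):
    """Return unigram features, plus adjacent-token bigrams if requested."""
    tokens = tokenize(message)
    feats = list(tokens)
    if use_bigrams:
        feats += [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for message, label in zip(messages, labels):
            feats = features(message)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, message):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat in features(message):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy messages, for illustration only (not taken from the dataset).
train = [
    ("$AAPL breaking out, going long 🚀", "bullish"),
    ("strong earnings, buy the dip 📈", "bullish"),
    ("not selling, this is going higher", "bullish"),
    ("$AAPL looks weak, going short 📉", "bearish"),
    ("weak guidance, sell the rip 🐻", "bearish"),
    ("not buying, this is going lower", "bearish"),
]
model = NaiveBayes().fit([m for m, _ in train], [y for _, y in train])
print(model.predict("going long 🚀"))  # → bullish
```

In this toy setting the unigram "going" appears equally often in both classes, so the decision rests on the emoji and on bigrams such as `going_long`. That is the kind of signal the abstract attributes to these features, although a six-message example says nothing about the size of the effect on real data.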

Suggested Citation

  • Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03205149, HAL.
  • Handle: RePEc:hal:cesptp:hal-03205149
    DOI: 10.1007/s42521-019-00014-x

    References listed on IDEAS

    1. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    2. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    3. Thomas Renault, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03205113, HAL.
    4. Price, S. McKay & Doran, James S. & Peterson, David R. & Bliss, Barbara A., 2012. "Earnings conference calls and stock returns: The incremental informativeness of textual tone," Journal of Banking & Finance, Elsevier, vol. 36(4), pages 992-1011.
    5. Leung, Henry & Ton, Thai, 2015. "The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks," Journal of Banking & Finance, Elsevier, vol. 55(C), pages 37-55.
    6. Gabriele Ranco & Darko Aleksovski & Guido Caldarelli & Miha Grčar & Igor Mozetič, 2015. "The Effects of Twitter Sentiment on Stock Price Returns," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-21, September.
    7. Chen, Cathy Yi-Hsuan & Després, Roméo & Guo, Li & Renault, Thomas, 2019. "What makes cryptocurrencies special? Investor sentiment and return predictability during the bubble," IRTG 1792 Discussion Papers 2019-016, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    8. Timm O. Sprenger & Philipp G. Sandner & Andranik Tumasjan & Isabell M. Welpe, 2014. "News or Noise? Using Twitter to Identify and Understand Company-specific News Flow," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 41(7-8), pages 791-830, September.
    9. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    10. Paul C. Tetlock & Maytal Saar‐Tsechansky & Sofus Macskassy, 2008. "More Than Words: Quantifying Language to Measure Firms' Fundamentals," Journal of Finance, American Finance Association, vol. 63(3), pages 1437-1467, June.
    11. Feng Li, 2010. "The Information Content of Forward‐Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach," Journal of Accounting Research, Wiley Blackwell, vol. 48(5), pages 1049-1102, December.
    12. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.
    13. Diego García, 2013. "Sentiment during Recessions," Journal of Finance, American Finance Association, vol. 68(3), pages 1267-1300, June.
    14. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    15. repec:bla:jfinan:v:59:y:2004:i:3:p:1259-1294 is not listed on IDEAS

    Citations

    Cited by:

    1. Nikolaos A. Kyriazis, 2021. "Impacts of Stock Indices, Oil, and Twitter Sentiment on Major Cryptocurrencies during the COVID-19 First Wave," Bulletin of Applied Economics, Risk Market Journals, vol. 8(2), pages 133-146.
    2. Béatrice BOULU-RESHEF & Catherine BRUNEAU & Maxime NICOLAS & Thomas RENAULT, 2022. "An Experimental Analysis of Investor Sentiment," LEO Working Papers / DR LEO 2940, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    3. Nicolas, Maxime L.D., 2022. "Estimating a model of herding behavior on social networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 604(C).
    4. Audrino, Francesco & Offner, Eric A., 2024. "The impact of macroeconomic news sentiment on interest rates," International Review of Financial Analysis, Elsevier, vol. 94(C).
    5. Yuqi Nie & Yaxuan Kong & Xiaowen Dong & John M. Mulvey & H. Vincent Poor & Qingsong Wen & Stefan Zohren, 2024. "A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges," Papers 2406.11903, arXiv.org.
    6. Ahmed Bouteska & Taimur Sharif & Mohammad Zoynul Abedin, 2024. "Does investor sentiment create value for asset pricing? An empirical investigation of the KOSPI‐listed firms," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 29(3), pages 3487-3509, July.
    7. Andrew Todd & James Bowden & Yashar Moshfeghi, 2024. "Text‐based sentiment analysis in finance: Synthesising the existing literature and exploring future directions," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(1), March.
    8. Mazzotta, Stefano, 2022. "Immigration narrative sentiment from TV news and the stock market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 34(C).
    9. Bowden, James & Gemayel, Roland, 2022. "Sentiment and trading decisions in an ambiguous environment: A study on cryptocurrency traders," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 80(C).
    10. Yulius Hari & Maharani Kusuma Putri & Darmanto, 2024. "Analysis and Development of Information System for Cyberbullying Tendency on Twitter Social Media Using the Naïve Bayes Approach," International Journal of Research and Innovation in Social Science, International Journal of Research and Innovation in Social Science (IJRISS), vol. 8(6), pages 1551-1557, June.
    11. Qing Liu & Hosung Son, 2024. "Data selection and collection for constructing investor sentiment from social media," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-13, December.
    12. Liu, Keyan & Zhou, Jianan & Dong, Dayong, 2021. "Improving stock price prediction using the long short-term memory model combined with online social networks," Journal of Behavioral and Experimental Finance, Elsevier, vol. 30(C).
    13. Ben Hasselgren & Christos Chrysoulas & Nikolaos Pitropakis & William J. Buchanan, 2022. "Using Social Media & Sentiment Analysis to Make Investment Decisions," Future Internet, MDPI, vol. 15(1), pages 1-23, December.
    14. Md Shamim Hossain & Mst Farjana Rahman, 2023. "Customer Sentiment Analysis and Prediction of Insurance Products’ Reviews Using Machine Learning Approaches," FIIB Business Review, vol. 12(4), pages 386-402, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniele Ballinari & Simon Behrendt, 2021. "How to gauge investor behavior? A comparison of online investor sentiment measures," Digital Finance, Springer, vol. 3(2), pages 169-204, June.
    2. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    3. Miwa, Kotaro, 2022. "The informational role of analysts’ textual statements," Research in International Business and Finance, Elsevier, vol. 59(C).
    4. Christina Bannier & Thomas Pauls & Andreas Walter, 2019. "Content analysis of business communication: introducing a German dictionary," Journal of Business Economics, Springer, vol. 89(1), pages 79-123, February.
    5. Li, Xiao, 2020. "When financial literacy meets textual analysis: A conceptual review," Journal of Behavioral and Experimental Finance, Elsevier, vol. 28(C).
    6. Enwei Zhu & Jing Wu & Hongyu Liu & Keyang Li, 2023. "A Sentiment Index of the Housing Market in China: Text Mining of Narratives on Social Media," The Journal of Real Estate Finance and Economics, Springer, vol. 66(1), pages 77-118, January.
    7. Andrew Todd & James Bowden & Yashar Moshfeghi, 2024. "Text‐based sentiment analysis in finance: Synthesising the existing literature and exploring future directions," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(1), March.
    8. Yan Luo & Linying Zhou, 2020. "Textual tone in corporate financial disclosures: a survey of the literature," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 17(2), pages 101-110, September.
    9. Renato Camodeca & Alex Almici & Umberto Sagliaschi, 2018. "Sustainability Disclosure in Integrated Reporting: Does It Matter to Investors? A Cheap Talk Approach," Sustainability, MDPI, vol. 10(12), pages 1-34, November.
    10. Szymon Lis, 2022. "Investor Sentiment in Asset Pricing Models: A Review," Working Papers 2022-14, Faculty of Economic Sciences, University of Warsaw.
    11. Tom Marty & Bruce Vanstone & Tobias Hahn, 2020. "News media analytics in finance: a survey," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 60(2), pages 1385-1434, June.
    12. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, Wiley Blackwell, vol. 54(4), pages 1187-1230, September.
    13. Rui Fan & Oleksandr Talavera & Vu Tran, 2020. "Social media bots and stock markets," European Financial Management, European Financial Management Association, vol. 26(3), pages 753-777, June.
    14. Ardia, David & Bluteau, Keven & Boudt, Kris, 2022. "Media abnormal tone, earnings announcements, and the stock market," Journal of Financial Markets, Elsevier, vol. 61(C).
    15. Fang, Hao & Chung, Chien-Ping & Lu, Yang-Cheng & Lee, Yen-Hsien & Wang, Wen-Hao, 2021. "The impacts of investors' sentiments on stock returns using fintech approaches," International Review of Financial Analysis, Elsevier, vol. 77(C).
    16. Mohammad Alomari & Abdel Razzaq Al rababa’a & Ghaith El-Nader & Ahmad Alkhataybeh, 2021. "Who’s behind the wheel? The role of social and media news in driving the stock–bond correlation," Review of Quantitative Finance and Accounting, Springer, vol. 57(3), pages 959-1007, October.
    17. Bassyouny, Hesham & Abdelfattah, Tarek & Tao, Lei, 2022. "Narrative disclosure tone: A review and areas for future research," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 49(C).
    18. Alomari, Mohammad & Al Rababa’a, Abdel Razzaq & El-Nader, Ghaith & Alkhataybeh, Ahmad & Ur Rehman, Mobeen, 2021. "Examining the effects of news and media sentiments on volatility and correlation: Evidence from the UK," The Quarterly Review of Economics and Finance, Elsevier, vol. 82(C), pages 280-297.
    19. Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
    20. Eierle, Brigitte & Klamer, Sebastian & Muck, Matthias, 2022. "Does it really pay off for investors to consider information from social media?," International Review of Financial Analysis, Elsevier, vol. 81(C).

    More about this item

    Keywords

    Social media; StockTwits; Sentiment analysis; Machine learning; Asset pricing;

    JEL classification:

    • G10 - Financial Economics - General Financial Markets - General (includes Measurement and Data)
    • G12 - Financial Economics - General Financial Markets - Asset Pricing; Trading Volume; Bond Interest Rates
    • G14 - Financial Economics - General Financial Markets - Information and Market Efficiency; Event Studies; Insider Trading
