Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages

Author

Listed:
  • Thomas Renault

    (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique)

Abstract

We use a large dataset of one million messages sent on the microblogging platform StockTwits to evaluate the performance of a wide range of preprocessing methods and machine learning algorithms for sentiment analysis in finance. We find that adding bigrams and emojis significantly improves sentiment classification performance. However, more complex and time-consuming machine learning methods, such as random forests or neural networks, do not improve the accuracy of the classification. We also provide empirical evidence that the preprocessing method and the size of the dataset have a strong impact on the correlation between investor sentiment and stock returns. While investor sentiment and stock returns are highly correlated, we do not find that investor sentiment derived from messages sent on social media helps predict large-capitalization stock returns at a daily frequency.
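
As an illustration of what "adding bigrams and emojis" can mean at the feature level, the following is a minimal sketch, not the paper's actual pipeline. The function name `extract_features`, the token pattern, and the emoji code-point ranges are all assumptions chosen for this example; the ranges are a rough approximation and do not cover every emoji.

```python
import re

# Hypothetical tokenizer: words, cashtags such as $aapl, and single emoji characters.
# The emoji ranges are an approximation chosen for this sketch, not a full definition.
TOKEN_PATTERN = re.compile(
    r"\$?[a-z0-9']+|[\U0001F300-\U0001FAFF\u2600-\u27BF]"
)

def extract_features(message):
    """Return unigram and bigram features, keeping emojis as separate tokens."""
    tokens = TOKEN_PATTERN.findall(message.lower())
    # A bigram is each pair of adjacent tokens, joined by a space.
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

features = extract_features("Buying $AAPL \U0001F680\U0001F680")
print(features)
```

Features of this kind could then be counted and passed to any text classifier; which classifier and which preprocessing steps work best is the question the paper evaluates.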

Suggested Citation

  • Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Post-Print hal-03205149, HAL.
  • Handle: RePEc:hal:journl:hal-03205149
    DOI: 10.1007/s42521-019-00014-x

    Citations

    Cited by:

    1. Nikolaos A. Kyriazis, 2021. "Impacts of Stock Indices, Oil, and Twitter Sentiment on Major Cryptocurrencies during the COVID-19 First Wave," Bulletin of Applied Economics, Risk Market Journals, vol. 8(2), pages 133-146.
    2. Mazzotta, Stefano, 2022. "Immigration narrative sentiment from TV news and the stock market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 34(C).
    3. Yulius Hari & Maharani Kusuma Putri & Darmanto, 2024. "Analysis and Development of Information System for Cyberbullying Tendency on Twitter Social Media Using the Naïve Bayes Approach," International Journal of Research and Innovation in Social Science, International Journal of Research and Innovation in Social Science (IJRISS), vol. 8(6), pages 1551-1557, June.
    4. Aziz Ullah & He Biao & Assad Ullah, 2024. "Unveiling the Nexus Between Crises, Investor Sentiment, and Volatility of Tourism-Related Stocks: Empirical Findings From Pakistan," SAGE Open, vol. 14(3), pages 21582440241, August.
    5. Todd, Andrew & Bowden, James & Cummins, Mark & Su, Yang, 2025. "A multimodal sentiment classifier for financial decision making," International Review of Financial Analysis, Elsevier, vol. 105(C).
    6. Nicolas, Maxime L.D., 2022. "Estimating a model of herding behavior on social networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 604(C).
    7. Yuqi Nie & Yaxuan Kong & Xiaowen Dong & John M. Mulvey & H. Vincent Poor & Qingsong Wen & Stefan Zohren, 2024. "A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges," Papers 2406.11903, arXiv.org.
    8. Li, Jing, 2025. "Corporate governance, fraud learning cycles, and financial fraud detection: Evidence from Chinese listed firms," Research in International Business and Finance, Elsevier, vol. 76(C).
    9. Béatrice Boulu-Reshef & Catherine Bruneau & Maxime Nicolas & Thomas Renault, 2022. "An Experimental Analysis of Investor Sentiment," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-04222561, HAL.
    10. Peng, Yaohao & de Moraes Souza, João Gabriel, 2024. "Chaos, overfitting and equilibrium: To what extent can machine learning beat the financial market?," International Review of Financial Analysis, Elsevier, vol. 95(PB).
    11. Md Shamim Hossain & Mst Farjana Rahman, 2023. "Customer Sentiment Analysis and Prediction of Insurance Products’ Reviews Using Machine Learning Approaches," FIIB Business Review, vol. 12(4), pages 386-402, December.
    12. Bredice, Marilena & Formisano, Anna Vittoria & Kullafi, Sara & Palma, Pasquale, 2025. "Access to credit and fintech: A lexicon-based sentiment analysis application on Twitter data," Research in International Business and Finance, Elsevier, vol. 77(PA).
    13. Andrew Todd & James Bowden & Yashar Moshfeghi, 2024. "Text‐based sentiment analysis in finance: Synthesising the existing literature and exploring future directions," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(1), March.
    14. Qing Liu & Hosung Son, 2024. "Data selection and collection for constructing investor sentiment from social media," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 11(1), pages 1-13, December.
    15. Ben Hasselgren & Christos Chrysoulas & Nikolaos Pitropakis & William J. Buchanan, 2022. "Using Social Media & Sentiment Analysis to Make Investment Decisions," Future Internet, MDPI, vol. 15(1), pages 1-23, December.
    16. Audrino, Francesco & Offner, Eric A., 2024. "The impact of macroeconomic news sentiment on interest rates," International Review of Financial Analysis, Elsevier, vol. 94(C).
    17. Ahmed Bouteska & Taimur Sharif & Mohammad Zoynul Abedin, 2024. "Does investor sentiment create value for asset pricing? An empirical investigation of the KOSPI‐listed firms," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 29(3), pages 3487-3509, July.
    18. Bowden, James & Gemayel, Roland, 2022. "Sentiment and trading decisions in an ambiguous environment: A study on cryptocurrency traders," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 80(C).
    19. Liu, Keyan & Zhou, Jianan & Dong, Dayong, 2021. "Improving stock price prediction using the long short-term memory model combined with online social networks," Journal of Behavioral and Experimental Finance, Elsevier, vol. 30(C).
    20. Marc-Aurèle Divernois & Damir Filipović, 2024. "StockTwits classified sentiment and stock returns," Digital Finance, Springer, vol. 6(2), pages 249-281, June.

    More about this item

    JEL classification:

    • G10 - Financial Economics - - General Financial Markets - - - General (includes Measurement and Data)
    • G12 - Financial Economics - - General Financial Markets - - - Asset Pricing; Trading Volume; Bond Interest Rates
    • G14 - Financial Economics - - General Financial Markets - - - Information and Market Efficiency; Event Studies; Insider Trading
