Printed from https://ideas.repec.org/a/spr/digfin/v2y2020i1d10.1007_s42521-019-00014-x.html

Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages

Author

  • Thomas Renault

    (Université Paris 1 Panthéon-Sorbonne, CES & LabEx RéFi, Maison des Sciences Économiques)

Abstract

We use a large dataset of one million messages sent on the microblogging platform StockTwits to evaluate the performance of a wide range of preprocessing methods and machine learning algorithms for sentiment analysis in finance. We find that adding bigrams and emojis significantly improves sentiment classification performance. However, more complex and time-consuming machine learning methods, such as random forests or neural networks, do not improve classification accuracy. We also provide empirical evidence that the preprocessing method and the size of the dataset have a strong impact on the correlation between investor sentiment and stock returns. While investor sentiment and stock returns are highly correlated, we do not find that investor sentiment derived from messages sent on social media helps predict large-capitalization stock returns at a daily frequency.
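The abstract names the ingredients (bigrams, emojis, a classifier, and a daily sentiment measure correlated with returns) but not how they fit together. The Python below is a minimal sketch of such a pipeline under stated assumptions, not the paper's code. It uses multinomial Naive Bayes only as an illustrative classifier and assumes each training message carries a bullish/bearish label of the kind StockTwits users can attach. The emoji code-point ranges, the `cashtag` placeholder, the toy messages, and all function names (`tokenize`, `NaiveBayes`, `daily_sentiment`, `pearson`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoji detector: covers only the main emoji/symbol code-point blocks.
EMOJI = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
# Each emoji is one token; otherwise a run of letters, optionally a $cashtag.
TOKEN_RE = re.compile(EMOJI + r"|\$?[a-z]+")

# Toy labelled messages (hypothetical, for illustration only).
TRAIN = [
    ("$AAPL to the moon 🚀 buying more", "bullish"),
    ("strong earnings, going long 🚀", "bullish"),
    ("breakout incoming, very bullish", "bullish"),
    ("selling everything, this will crash 📉", "bearish"),
    ("short it now, going down 📉", "bearish"),
    ("weak guidance, very bearish", "bearish"),
]


def tokenize(text, use_bigrams=True):
    """Lowercase, keep emojis as tokens, map every cashtag to one placeholder."""
    unigrams = ["cashtag" if t.startswith("$") else t
                for t in TOKEN_RE.findall(text.lower())]
    if not use_bigrams:
        return unigrams
    # Bigrams join adjacent unigrams, e.g. "going_long".
    bigrams = [a + "_" + b for a, b in zip(unigrams, unigrams[1:])]
    return unigrams + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, use_bigrams=True):
        self.use_bigrams = use_bigrams

    def fit(self, texts, labels):
        self.priors = Counter(labels)  # label -> number of messages
        self.counts = defaultdict(Counter)  # label -> token frequencies
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text, self.use_bigrams))
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_score(self, tokens, label):
        """Log prior plus smoothed log likelihood of the known tokens."""
        score = math.log(self.priors[label] / sum(self.priors.values()))
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are skipped
                num = self.counts[label][t] + 1
                score += math.log(num / (self.totals[label] + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text, self.use_bigrams)
        return max(self.priors, key=lambda lab: self.log_score(tokens, lab))


def daily_sentiment(messages, model):
    """Mean of +1 (bullish) / -1 (bearish) predictions per day."""
    by_day = defaultdict(list)
    for day, text in messages:
        by_day[day].append(1 if model.predict(text) == "bullish" else -1)
    return {day: sum(s) / len(s) for day, s in by_day.items()}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Passing `use_bigrams=False` gives a unigram-only baseline, so the contribution of bigrams can be compared on held-out messages. `pearson` would then be applied to the daily index and a return series aligned on the same dates (no return data is included here).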

Suggested Citation

  • Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
  • Handle: RePEc:spr:digfin:v:2:y:2020:i:1:d:10.1007_s42521-019-00014-x
    DOI: 10.1007/s42521-019-00014-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s42521-019-00014-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Leung, Henry & Ton, Thai, 2015. "The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks," Journal of Banking & Finance, Elsevier, vol. 55(C), pages 37-55.
    2. Paul C. Tetlock & Maytal Saar‐Tsechansky & Sofus Macskassy, 2008. "More Than Words: Quantifying Language to Measure Firms' Fundamentals," Journal of Finance, American Finance Association, vol. 63(3), pages 1437-1467, June.
    3. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    4. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    5. Timm O. Sprenger & Philipp G. Sandner & Andranik Tumasjan & Isabell M. Welpe, 2014. "News or Noise? Using Twitter to Identify and Understand Company-specific News Flow," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 41(7-8), pages 791-830, September.
    6. Werner Antweiler & Murray Z. Frank, 2004. "Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards," Journal of Finance, American Finance Association, vol. 59(3), pages 1259-1294, June.
    7. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    8. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.
    9. Diego García, 2013. "Sentiment during Recessions," Journal of Finance, American Finance Association, vol. 68(3), pages 1267-1300, June.
    10. Thomas Renault, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03205113, HAL.
    11. Tim Loughran & Bill Mcdonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    12. Gabriele Ranco & Darko Aleksovski & Guido Caldarelli & Miha Grčar & Igor Mozetič, 2015. "The Effects of Twitter Sentiment on Stock Price Returns," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-21, September.
    13. Price, S. McKay & Doran, James S. & Peterson, David R. & Bliss, Barbara A., 2012. "Earnings conference calls and stock returns: The incremental informativeness of textual tone," Journal of Banking & Finance, Elsevier, vol. 36(4), pages 992-1011.
    14. Feng Li, 2010. "The Information Content of Forward‐Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach," Journal of Accounting Research, Wiley Blackwell, vol. 48(5), pages 1049-1102, December.
    15. Chen, Cathy Yi-Hsuan & Després, Roméo & Guo, Li & Renault, Thomas, 2019. "What makes cryptocurrencies special? Investor sentiment and return predictability during the bubble," IRTG 1792 Discussion Papers 2019-016, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Nikolaos A. Kyriazis, 2021. "Impacts of Stock Indices, Oil, and Twitter Sentiment on Major Cryptocurrencies during the COVID-19 First Wave," Bulletin of Applied Economics, Risk Market Journals, vol. 8(2), pages 133-146.
    2. Béatrice BOULU-RESHEF & Catherine BRUNEAU & Maxime NICOLAS & Thomas RENAULT, 2022. "An Experimental Analysis of Investor Sentiment," LEO Working Papers / DR LEO 2940, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    3. Nicolas, Maxime L.D., 2022. "Estimating a model of herding behavior on social networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 604(C).
    4. Mazzotta, Stefano, 2022. "Immigration narrative sentiment from TV news and the stock market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 34(C).
    5. Bowden, James & Gemayel, Roland, 2022. "Sentiment and trading decisions in an ambiguous environment: A study on cryptocurrency traders," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 80(C).
    6. Liu, Keyan & Zhou, Jianan & Dong, Dayong, 2021. "Improving stock price prediction using the long short-term memory model combined with online social networks," Journal of Behavioral and Experimental Finance, Elsevier, vol. 30(C).
    7. Ben Hasselgren & Christos Chrysoulas & Nikolaos Pitropakis & William J. Buchanan, 2022. "Using Social Media & Sentiment Analysis to Make Investment Decisions," Future Internet, MDPI, vol. 15(1), pages 1-23, December.
    8. Md Shamim Hossain & Mst Farjana Rahman, 2023. "Customer Sentiment Analysis and Prediction of Insurance Products’ Reviews Using Machine Learning Approaches," FIIB Business Review, vol. 12(4), pages 386-402, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniele Ballinari & Simon Behrendt, 2021. "How to gauge investor behavior? A comparison of online investor sentiment measures," Digital Finance, Springer, vol. 3(2), pages 169-204, June.
    2. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    3. Miwa, Kotaro, 2022. "The informational role of analysts’ textual statements," Research in International Business and Finance, Elsevier, vol. 59(C).
    4. Christina Bannier & Thomas Pauls & Andreas Walter, 2019. "Content analysis of business communication: introducing a German dictionary," Journal of Business Economics, Springer, vol. 89(1), pages 79-123, February.
    5. Li, Xiao, 2020. "When financial literacy meets textual analysis: A conceptual review," Journal of Behavioral and Experimental Finance, Elsevier, vol. 28(C).
    6. Enwei Zhu & Jing Wu & Hongyu Liu & Keyang Li, 2023. "A Sentiment Index of the Housing Market in China: Text Mining of Narratives on Social Media," The Journal of Real Estate Finance and Economics, Springer, vol. 66(1), pages 77-118, January.
    7. Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
    8. Eierle, Brigitte & Klamer, Sebastian & Muck, Matthias, 2022. "Does it really pay off for investors to consider information from social media?," International Review of Financial Analysis, Elsevier, vol. 81(C).
    9. Szymon Lis, 2022. "Investor Sentiment in Asset Pricing Models: A Review," Working Papers 2022-14, Faculty of Economic Sciences, University of Warsaw.
    10. Tom Marty & Bruce Vanstone & Tobias Hahn, 2020. "News media analytics in finance: a survey," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 60(2), pages 1385-1434, June.
    11. Tim Loughran & Bill Mcdonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, Wiley Blackwell, vol. 54(4), pages 1187-1230, September.
    12. Chen, Cathy Yi-Hsuan & Després, Roméo & Guo, Li & Renault, Thomas, 2019. "What makes cryptocurrencies special? Investor sentiment and return predictability during the bubble," IRTG 1792 Discussion Papers 2019-016, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    13. Rui Fan & Oleksandr Talavera & Vu Tran, 2020. "Social media bots and stock markets," European Financial Management, European Financial Management Association, vol. 26(3), pages 753-777, June.
    14. Fang, Hao & Chung, Chien-Ping & Lu, Yang-Cheng & Lee, Yen-Hsien & Wang, Wen-Hao, 2021. "The impacts of investors' sentiments on stock returns using fintech approaches," International Review of Financial Analysis, Elsevier, vol. 77(C).
    15. Chouliaras, Andreas, 2015. "The Pessimism Factor: SEC EDGAR Form 10-K Textual Analysis and Stock Returns," MPRA Paper 65585, University Library of Munich, Germany.
    16. Yan Luo & Linying Zhou, 2020. "Textual tone in corporate financial disclosures: a survey of the literature," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 17(2), pages 101-110, September.
    17. Ahmed, Yousry & Elshandidy, Tamer, 2016. "The effect of bidder conservatism on M&A decisions: Text-based evidence from US 10-K filings," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 176-190.
    18. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.
    19. Steven Heston & Nitish R. Sinha, 2016. "News versus Sentiment : Predicting Stock Returns from News Stories," Finance and Economics Discussion Series 2016-048, Board of Governors of the Federal Reserve System (U.S.).
    20. Renato Camodeca & Alex Almici & Umberto Sagliaschi, 2018. "Sustainability Disclosure in Integrated Reporting: Does It Matter to Investors? A Cheap Talk Approach," Sustainability, MDPI, vol. 10(12), pages 1-34, November.

    More about this item

    Keywords

    Social media; StockTwits; Sentiment analysis; Machine learning; Asset pricing;

    JEL classification:

    • G10 - Financial Economics - - General Financial Markets - - - General (includes Measurement and Data)
    • G12 - Financial Economics - - General Financial Markets - - - Asset Pricing; Trading Volume; Bond Interest Rates
    • G14 - Financial Economics - - General Financial Markets - - - Information and Market Efficiency; Event Studies; Insider Trading

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.