IDEAS home Printed from https://ideas.repec.org/p/zbw/iwqwdp/142018.html
   My bibliography  Save this paper

Separating the signal from the noise - financial machine learning for Twitter

Author

Listed:
  • Schnaubelt, Matthias
  • Fischer, Thomas G.
  • Krauss, Christopher

Abstract

Most statistical arbitrage strategies in the academic literature soley rely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the price-predictive value of more than nine million tweets on intraday returns of the S&P 500 constituents. For this purpose, we design a machine learning pipeline addressing specific challenges inherent to this task. At first, we engineer domain-specific features along three categories, i.e., directional indicators, relevance indicators and meta features. Next, we leverage a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous eventbased backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil sources of profitability: First, results are both driven and limited by the temporal clustering of tweets, i.e., the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of included features follows an economic rationale, e.g., tweets with positive sentiment tend to yield positive returns and vice versa. Third, we find that stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance.

Suggested Citation

  • Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2018. "Separating the signal from the noise - financial machine learning for Twitter," FAU Discussion Papers in Economics 14/2018, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
  • Handle: RePEc:zbw:iwqwdp:142018
    as

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/191256/1/1045719498.pdf
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
    2. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    3. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    4. Evan Gatev & William N. Goetzmann & K. Geert Rouwenhorst, 2006. "Pairs Trading: Performance of a Relative-Value Arbitrage Rule," The Review of Financial Studies, Society for Financial Studies, vol. 19(3), pages 797-827.
    5. Christopher Krauss & Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01768895, HAL.
    6. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.
    7. Paul C. Tetlock & Maytal Saar‐Tsechansky & Sofus Macskassy, 2008. "More Than Words: Quantifying Language to Measure Firms' Fundamentals," Journal of Finance, American Finance Association, vol. 63(3), pages 1437-1467, June.
    8. Leung, Mark T. & Daouk, Hazem & Chen, An-Sing, 2000. "Forecasting stock indices: a comparison of classification and level estimation models," International Journal of Forecasting, Elsevier, vol. 16(2), pages 173-190.
    9. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    10. Timm O. Sprenger & Philipp G. Sandner & Andranik Tumasjan & Isabell M. Welpe, 2014. "News or Noise? Using Twitter to Identify and Understand Company-specific News Flow," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 41(7-8), pages 791-830, September.
    11. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    12. Fama, Eugene F, 1970. "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, American Finance Association, vol. 25(2), pages 383-417, May.
    13. Marco Avellaneda & Jeong-Hyun Lee, 2010. "Statistical arbitrage in the US equities market," Quantitative Finance, Taylor & Francis Journals, vol. 10(7), pages 761-782.
    14. Tim Loughran & Bill Mcdonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2020. "Separating the signal from the noise – Financial machine learning for Twitter," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).
    2. Flori, Andrea & Regoli, Daniele, 2021. "Revealing Pairs-trading opportunities with long short-term memory networks," European Journal of Operational Research, Elsevier, vol. 295(2), pages 772-791.
    3. Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    4. Thomas Günter Fischer & Christopher Krauss & Alexander Deinert, 2019. "Statistical Arbitrage in Cryptocurrency Markets," JRFM, MDPI, vol. 12(1), pages 1-15, February.
    5. Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
    6. Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
    7. Fischer, Thomas & Krauss, Christopher, 2017. "Deep learning with long short-term memory networks for financial market predictions," FAU Discussion Papers in Economics 11/2017, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    8. Goodell, John W. & Kumar, Satish & Lim, Weng Marc & Pattnaik, Debidutta, 2021. "Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis," Journal of Behavioral and Experimental Finance, Elsevier, vol. 32(C).
    9. Mao, Huina & Counts, Scott & Bollen, Johan, 2015. "Quantifying the effects of online bullishness on international financial markets," Statistics Paper Series 09, European Central Bank.
    10. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    11. Frank, Murray Z. & Sanati, Ali, 2018. "How does the stock market absorb shocks?," Journal of Financial Economics, Elsevier, vol. 129(1), pages 136-153.
    12. Mao, Huina & Counts, Scott & Bollen, Johan, 2015. "Quantifying the effects of online bullishness on international financial markets," Statistics Paper Series 9, European Central Bank.
    13. Ariel Neufeld & Julian Sester & Daiying Yin, 2022. "Detecting data-driven robust statistical arbitrage strategies with deep neural networks," Papers 2203.03179, arXiv.org, revised Feb 2024.
    14. Daniele Ballinari & Simon Behrendt, 2021. "How to gauge investor behavior? A comparison of online investor sentiment measures," Digital Finance, Springer, vol. 3(2), pages 169-204, June.
    15. Huck, Nicolas, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," European Journal of Operational Research, Elsevier, vol. 278(1), pages 330-342.
    16. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.
    17. Han, Chulwoo & He, Zhaodong & Toh, Alenson Jun Wei, 2023. "Pairs trading via unsupervised learning," European Journal of Operational Research, Elsevier, vol. 307(2), pages 929-947.
    18. Kentaro Imajo & Kentaro Minami & Katsuya Ito & Kei Nakagawa, 2020. "Deep Portfolio Optimization via Distributional Prediction of Residual Factors," Papers 2012.07245, arXiv.org.
    19. Béatrice BOULU-RESHEF & Catherine BRUNEAU & Maxime NICOLAS & Thomas RENAULT, 2022. "An Experimental Analysis of Investor Sentiment," LEO Working Papers / DR LEO 2940, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    20. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.

    More about this item

    Keywords

    finance; statistical arbitrage; machine learning; random forests; trading strategy backtesting; social media;
    All these keywords.

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:zbw:iwqwdp:142018. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ZBW - Leibniz Information Centre for Economics (email available below). General contact details of provider: https://edirc.repec.org/data/vierlde.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.