Separating the signal from the noise – Financial machine learning for Twitter

Authors

Schnaubelt, Matthias
Fischer, Thomas G.
Krauss, Christopher

Abstract

Most statistical arbitrage strategies in the academic literature rely solely on price time series. By contrast, alternative data sources are of growing importance to professional investors. We help bridge this gap by assessing the predictive value of millions of tweets for the intraday returns of S&P 500 constituents in 2014 and 2015. To this end, we design a machine learning system that addresses the specific challenges of this task. Building on the literature on financial dictionaries, we first engineer domain-specific features in three categories: directional indicators, relevance indicators, and meta features. Next, we use a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtest across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we open up the machine learning black box to reveal the sources of profitability. First, results are both driven and limited by the temporal clustering of tweets: the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of the included features follows an economic rationale; for example, tweets with positive sentiment tend to yield positive returns, and vice versa. Third, stocks with medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance.
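To make the idea of a dictionary-based "directional indicator" more concrete, here is a minimal sketch of one common way to score a tweet's tone: count positive and negative dictionary words and compute net tone as (pos - neg) / (pos + neg). This is an illustration under stated assumptions, not the authors' feature set. The word lists below are hypothetical toy lists (a real system would draw on a finance-specific dictionary such as Loughran and McDonald's), and the function name `directional_score` is likewise hypothetical.

```python
import re
from typing import Set

# Hypothetical toy word lists for illustration only; a real system would use a
# finance-specific dictionary (e.g. Loughran-McDonald) instead of these sets.
POSITIVE: Set[str] = {"beat", "bullish", "gain", "strong", "upgrade"}
NEGATIVE: Set[str] = {"bearish", "loss", "miss", "weak", "downgrade"}

# Lowercase alphabetic tokens; a cashtag like "$AAPL" becomes the token "aapl".
TOKEN_RE = re.compile(r"[a-z]+")


def directional_score(tweet: str) -> float:
    """Return net tone in [-1, 1], or 0.0 if the tweet has no dictionary words."""
    tokens = TOKEN_RE.findall(tweet.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos + neg == 0:
        return 0.0  # no directional information in this tweet
    return (pos - neg) / (pos + neg)


if __name__ == "__main__":
    print(directional_score("$AAPL earnings beat, strong guidance"))            # 1.0
    print(directional_score("Analyst downgrade after revenue miss, but strong buyback"))
    print(directional_score("$MSFT trading flat today"))                        # 0.0
```

In a setup like the one the abstract describes, a score of this kind would be one input feature among many, fed to a learner such as a random forest together with relevance and meta features, rather than traded on directly.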

Suggested Citation

Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2020. "Separating the signal from the noise – Financial machine learning for Twitter," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).

Handle: RePEc:eee:dyncon:v:114:y:2020:i:c:s0165188920300634
DOI: 10.1016/j.jedc.2020.103895

References listed on IDEAS

Teresa L. Ju & Yao Chin Lin & Nhu-Hang Ha, 2014. "Proactive Assessment for Collaboration Success," SAGE Open, vol. 4(3), pages 21582440145, July.
Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
- Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2016. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," FAU Discussion Papers in Economics 03/2016, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
- Christopher Krauss & Xuan Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01515120, HAL.
Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
Xu, Wei & Chen, Yuehuan & Coleman, Conrad & Coleman, Thomas F., 2018. "Moment matching machine learning methods for risk management of large variable annuity portfolios," Journal of Economic Dynamics and Control, Elsevier, vol. 87(C), pages 1-20.
Thomas Günter Fischer & Christopher Krauss & Alexander Deinert, 2019. "Statistical Arbitrage in Cryptocurrency Markets," JRFM, MDPI, vol. 12(1), pages 1-15, February.
Bekiros, Stelios D., 2010. "Heterogeneous trading strategies with adaptive fuzzy Actor-Critic reinforcement learning: A behavioral approach," Journal of Economic Dynamics and Control, Elsevier, vol. 34(6), pages 1153-1170, June.
Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
Evan Gatev & William N. Goetzmann & K. Geert Rouwenhorst, 2006. "Pairs Trading: Performance of a Relative-Value Arbitrage Rule," Review of Financial Studies, Society for Financial Studies, vol. 19(3), pages 797-827.
- Evan Gatev & William N. Goetzmann & K. Geert Rouwenhorst, 1998. "Pairs Trading: Performance of a Relative Value Arbitrage Rule," Yale School of Management Working Papers ysm26, Yale School of Management.
- William Goetzmann & Evan G. Gatev & K. Geert Rouwenhorst, 1998. "Pairs Trading: Performance of a Relative Value Arbitrage Rule," Yale School of Management Working Papers ysm3, Yale School of Management.
- William N. Goetzmann & Evan Geov Gatev & K. Geert Rouwenhorst, 1998. "Pairs Trading: Performance of a Relative Value Arbitrage Rule," Yale School of Management Working Papers ysm109, Yale School of Management.
- Evan G. Gatev & William N. Goetzmann & K. Geert Rouwenhorst, 1999. "Pairs Trading: Performance of a Relative Value Arbitrage Rule," NBER Working Papers 7032, National Bureau of Economic Research, Inc.
Christopher Krauss & Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01768895, HAL.
Jegadeesh, Narasimhan & Livnat, Joshua, 2006. "Revenue surprises and stock returns," Journal of Accounting and Economics, Elsevier, vol. 41(1-2), pages 147-171, April.
Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.
Paul C. Tetlock & Maytal Saar‐Tsechansky & Sofus Macskassy, 2008. "More Than Words: Quantifying Language to Measure Firms' Fundamentals," Journal of Finance, American Finance Association, vol. 63(3), pages 1437-1467, June.
Matthew Gentzkow & Bryan Kelly & Matt Taddy, 2019. "Text as Data," Journal of Economic Literature, American Economic Association, vol. 57(3), pages 535-574, September.
Leung, Mark T. & Daouk, Hazem & Chen, An-Sing, 2000. "Forecasting stock indices: a comparison of classification and level estimation models," International Journal of Forecasting, Elsevier, vol. 16(2), pages 173-190.
Huck, Nicolas, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," European Journal of Operational Research, Elsevier, vol. 278(1), pages 330-342.
Zheng Tracy Ke & Bryan T. Kelly & Dacheng Xiu, 2019. "Predicting Returns With Text Data," NBER Working Papers 26186, National Bureau of Economic Research, Inc.
Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
Timm O. Sprenger & Philipp G. Sandner & Andranik Tumasjan & Isabell M. Welpe, 2014. "News or Noise? Using Twitter to Identify and Understand Company-specific News Flow," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 41(7-8), pages 791-830, September.
Clifford S. Asness & Tobias J. Moskowitz & Lasse Heje Pedersen, 2013. "Value and Momentum Everywhere," Journal of Finance, American Finance Association, vol. 68(3), pages 929-985, June.
Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
Nicolas Huck, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," Post-Print hal-02143971, HAL.
Fama, Eugene F, 1970. "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, American Finance Association, vol. 25(2), pages 383-417, May.
Marco Avellaneda & Jeong-Hyun Lee, 2010. "Statistical arbitrage in the US equities market," Quantitative Finance, Taylor & Francis Journals, vol. 10(7), pages 761-782.
Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
Allen H. Huang & Reuven Lehavy & Amy Y. Zang & Rong Zheng, 2018. "Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach," Management Science, INFORMS, vol. 64(6), pages 2833-2855, June.

Citations

Citations are extracted by the CitEc Project.

Cited by:

Frank, Johannes, 2023. "Forecasting realized volatility in turbulent times using temporal fusion transformers," FAU Discussion Papers in Economics 03/2023, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
Schnaubelt, Matthias, 2022. "Deep reinforcement learning for the optimal placement of cryptocurrency limit orders," European Journal of Operational Research, Elsevier, vol. 296(3), pages 993-1006.
Thomas Dierckx & Jesse Davis & Wim Schoutens, 2022. "Nowcasting Stock Implied Volatility with Twitter," Papers 2301.00248, arXiv.org.
Xiaohong Shen & Gaoshan Wang & Yue Wang & Alfred Peris, 2021. "The Influence of Research Reports on Stock Returns: The Mediating Effect of Machine-Learning-Based Investor Sentiment," Discrete Dynamics in Nature and Society, Hindawi, vol. 2021, pages 1-14, December.
Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
Herrera, Gabriel Paes & Constantino, Michel & Su, Jen-Je & Naranpanawa, Athula, 2022. "Renewable energy stocks forecast using Twitter investor sentiment and deep learning," Energy Economics, Elsevier, vol. 114(C).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2018. "Separating the signal from the noise - financial machine learning for Twitter," FAU Discussion Papers in Economics 14/2018, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
Flori, Andrea & Regoli, Daniele, 2021. "Revealing Pairs-trading opportunities with long short-term memory networks," European Journal of Operational Research, Elsevier, vol. 295(2), pages 772-791.
Fabian Waldow & Matthias Schnaubelt & Christopher Krauss & Thomas Günter Fischer, 2021. "Machine Learning in Futures Markets," JRFM, MDPI, vol. 14(3), pages 1-14, March.
Han, Chulwoo & He, Zhaodong & Toh, Alenson Jun Wei, 2023. "Pairs trading via unsupervised learning," European Journal of Operational Research, Elsevier, vol. 307(2), pages 929-947.
Alexander Jakob Dautel & Wolfgang Karl Härdle & Stefan Lessmann & Hsin-Vonn Seow, 2020. "Forex exchange rate forecasting using deep recurrent neural networks," Digital Finance, Springer, vol. 2(1), pages 69-96, September.
- Dautel, Alexander J. & Härdle, Wolfgang Karl & Lessmann, Stefan & Seow, Hsin-Vonn, 2019. "Forex Exchange Rate Forecasting Using Deep Recurrent Neural Networks," IRTG 1792 Discussion Papers 2019-008, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
- Dautel, Alexander Jakob & Härdle, Wolfgang Karl & Lessmann, Stefan & Seow, Hsin-Vonn, 2020. "Forex exchange rate forecasting using deep recurrent neural networks," IRTG 1792 Discussion Papers 2020-006, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
Erdinc Akyildirim & Ahmet Goncu & Alper Hekimoglu & Duc Khuong Nguyen & Ahmet Sensoy, 2023. "Statistical arbitrage: factor investing approach," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(4), pages 1295-1331, December.
- Akyildirim, Erdinc & Goncu, Ahmet & Hekimoglu, Alper & Nguyen, Duc Khuong & Sensoy, Ahmet, 2021. "Statistical arbitrage: Factor investing approach," MPRA Paper 105766, University Library of Munich, Germany.
- Erdinc Akyildirim & Ahmet Goncu & Alper Hekimoglu & Duc Khuong Nguyen & Ahmet Sensoy, 2021. "Statistical Arbitrage: Factor Investing Approach," Working Papers 2021-003, Department of Research, Ipag Business School.
Charles W. Calomiris & Nida Çakır Melek & Harry Mamaysky, 2021. "Predicting the Oil Market," NBER Working Papers 29379, National Bureau of Economic Research, Inc.
Thomas Günter Fischer & Christopher Krauss & Alexander Deinert, 2019. "Statistical Arbitrage in Cryptocurrency Markets," JRFM, MDPI, vol. 12(1), pages 1-15, February.
Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
- Alexandre Rubesam, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Post-Print hal-03707365, HAL.
Kasper Johansson & Thomas Schmelzer & Stephen Boyd, 2024. "Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization," Papers 2402.08108, arXiv.org.
Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
- Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03205149, HAL.
- Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Post-Print hal-03205149, HAL.
Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
- Thomas Renault, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Post-Print hal-03205113, HAL.
Rama Cont & Mihai Cucuringu & Chao Zhang, 2021. "Cross-Impact of Order Flow Imbalance in Equity Markets," Papers 2112.13213, arXiv.org, revised Jun 2023.
Simon Fritzsch & Philipp Scharner & Gregor Weiß, 2021. "Estimating the relation between digitalization and the market value of insurers," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(3), pages 529-567, September.
Kolesnikova, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2019. "Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting," IRTG 1792 Discussion Papers 2019-023, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
Mao, Huina & Counts, Scott & Bollen, Johan, 2015. "Quantifying the effects of online bullishness on international financial markets," Statistics Paper Series 9, European Central Bank.
Ardia, David & Bluteau, Keven & Boudt, Kris, 2022. "Media abnormal tone, earnings announcements, and the stock market," Journal of Financial Markets, Elsevier, vol. 61(C).
- David Ardia & Keven Bluteau & Kris Boudt, 2021. "Media abnormal tone, earnings announcements, and the stock market," Papers 2110.10800, arXiv.org.

More about this item

Keywords

Finance; Statistical arbitrage; Machine learning; Natural language processing

JEL classification:

C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
G11 - Financial Economics - - General Financial Markets - - - Portfolio Choice; Investment Decisions
G14 - Financial Economics - - General Financial Markets - - - Information and Market Efficiency; Event Studies; Insider Trading
G17 - Financial Economics - - General Financial Markets - - - Financial Forecasting and Simulation
