Printed from https://ideas.repec.org/p/arx/papers/2106.12985.html

Stock Market Analysis with Text Data: A Review

Author

Listed:
  • Kamaladdin Fataliyev
  • Aneesh Chivukula
  • Mukesh Prasad
  • Wei Liu

Abstract

Stock market movements are influenced by public and private information shared through news articles, company reports, and social media discussions. Analyzing these vast sources of data can give market participants an edge in making a profit. However, most studies in the literature rely on traditional approaches that fall short in analyzing vast, unstructured textual data. In this study, we review the extensive literature on text-based stock market analysis. We present the input data types and cover the main textual data sources and their variations. We then present feature representation techniques, cover the analysis techniques, and create a taxonomy of the main stock market forecast models. Importantly, we discuss representative work in each category of the taxonomy and analyze its contributions. Finally, we report findings on unaddressed open problems and give suggestions for future work. The aim of this study is to survey the main stock market analysis models and the text representation techniques used for financial market prediction, to identify shortcomings of existing techniques, and to propose promising directions for future research.

Suggested Citation

  • Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
  • Handle: RePEc:arx:papers:2106.12985

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2106.12985
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
    2. Paul C. Tetlock & Maytal Saar‐Tsechansky & Sofus Macskassy, 2008. "More Than Words: Quantifying Language to Measure Firms' Fundamentals," Journal of Finance, American Finance Association, vol. 63(3), pages 1437-1467, June.
    3. Sang Il Lee & Seong Joon Yoo, 2017. "Threshold-Based Portfolio: The Role of the Threshold and Its Applications," Papers 1709.09822, arXiv.org, revised Aug 2018.
    4. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    5. LeBaron, Blake & Arthur, W. Brian & Palmer, Richard, 1999. "Time series properties of an artificial stock market," Journal of Economic Dynamics and Control, Elsevier, vol. 23(9-10), pages 1487-1516, September.
    6. Deepak Gupta & Mahardhika Pratama & Zhenyuan Ma & Jun Li & Mukesh Prasad, 2019. "Financial time series forecasting using twin support vector regression," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-27, March.
    7. Pai, Ping-Feng & Lin, Chih-Sheng, 2005. "A hybrid ARIMA and support vector machines model in stock price forecasting," Omega, Elsevier, vol. 33(6), pages 497-505, December.
    8. De Long, J Bradford & Andrei Shleifer & Lawrence H. Summers & Robert J. Waldmann, 1990. "Noise Trader Risk in Financial Markets," Journal of Political Economy, University of Chicago Press, vol. 98(4), pages 703-738, August.
    9. Zhang, Ningning & Lin, Aijing & Shang, Pengjian, 2017. "Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 477(C), pages 161-173.
    10. Fuli Feng & Huimin Chen & Xiangnan He & Ji Ding & Maosong Sun & Tat-Seng Chua, 2018. "Enhancing Stock Movement Prediction with Adversarial Training," Papers 1810.09936, arXiv.org, revised Jun 2019.
    11. Xi Zhang & Yunjia Zhang & Senzhang Wang & Yuntao Yao & Binxing Fang & Philip S. Yu, 2018. "Improving Stock Market Prediction via Heterogeneous Information Fusion," Papers 1801.00588, arXiv.org.
    12. Christopher Krauss & Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01768895, HAL.
    13. Oztekin, Asil & Kizilaslan, Recep & Freund, Steven & Iseri, Ali, 2016. "A data analytic approach to forecasting daily stock returns in an emerging market," European Journal of Operational Research, Elsevier, vol. 253(3), pages 697-710.
    14. Zhiqiang Guo & Huaiqing Wang & Quan Liu & Jie Yang, 2014. "A Feature Fusion Based Forecasting Model for Financial Time Series," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.
    15. Franses, Philip Hans & Ghijsels, Hendrik, 1999. "Additive outliers, GARCH and forecasting volatility," International Journal of Forecasting, Elsevier, vol. 15(1), pages 1-9, February.
    16. Feng Li, 2010. "The Information Content of Forward‐Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach," Journal of Accounting Research, Wiley Blackwell, vol. 48(5), pages 1049-1102, December.
    17. Herwartz, Helmut, 2017. "Stock return prediction under GARCH — An empirical assessment," International Journal of Forecasting, Elsevier, vol. 33(3), pages 569-580.
    18. Sarantis, Nicholas, 2001. "Nonlinearities, cyclical behaviour and predictability in stock markets: international evidence," International Journal of Forecasting, Elsevier, vol. 17(3), pages 459-482.
    19. Rounaghi, Mohammad Mahdi & Nassir Zadeh, Farzaneh, 2016. "Investigation of market efficiency and Financial Stability between S&P 500 and London Stock Exchange: Monthly and yearly Forecasting of Time Series Stock Returns using ARMA model," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 456(C), pages 10-21.
    20. Dev Shah & Haruna Isah & Farhana Zulkernine, 2019. "Stock Market Analysis: A Review and Taxonomy of Prediction Techniques," IJFS, MDPI, vol. 7(2), pages 1-22, May.
    21. Kristin M. Tolle & Hsinchun Chen, 2000. "Comparing noun phrasing techniques for use with medical digital library tools," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(4), pages 352-370.
    22. Balakrishnan, Ramji & Qiu, Xin Ying & Srinivasan, Padmini, 2010. "On the predictive ability of narrative disclosures in annual reports," European Journal of Operational Research, Elsevier, vol. 202(3), pages 789-801, May.
    23. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Omer Berat Sezer & Mehmet Ugur Gudelek & Ahmet Murat Ozbayoglu, 2019. "Financial Time Series Forecasting with Deep Learning : A Systematic Literature Review: 2005-2019," Papers 1911.13288, arXiv.org.
    2. Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2018. "Separating the signal from the noise - financial machine learning for Twitter," FAU Discussion Papers in Economics 14/2018, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    3. Ahmet Murat Ozbayoglu & Mehmet Ugur Gudelek & Omer Berat Sezer, 2020. "Deep Learning for Financial Applications : A Survey," Papers 2002.05786, arXiv.org.
    4. Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2020. "Separating the signal from the noise – Financial machine learning for Twitter," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).
    5. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    6. David F. Larcker & Anastasia A. Zakolyukina, 2012. "Detecting Deceptive Discussions in Conference Calls," Journal of Accounting Research, Wiley Blackwell, vol. 50(2), pages 495-540, May.
    7. Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
    8. Kim, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2020. "Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting," European Journal of Operational Research, Elsevier, vol. 283(1), pages 217-234.
    9. Uddin, Ajim & Yu, Dantong, 2020. "Latent factor model for asset pricing," Journal of Behavioral and Experimental Finance, Elsevier, vol. 27(C).
    10. Liu, Jun & Wu, Kai & Zhou, Ming, 2023. "News tone, investor sentiment, and liquidity premium," International Review of Economics & Finance, Elsevier, vol. 84(C), pages 167-181.
    11. Lukas Ryll & Sebastian Seidens, 2019. "Evaluating the Performance of Machine Learning Algorithms in Financial Market Forecasting: A Comprehensive Survey," Papers 1906.07786, arXiv.org, revised Jul 2019.
    12. Kolesnikova, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2019. "Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting," IRTG 1792 Discussion Papers 2019-023, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    13. Richard Frankel & Jared Jennings & Joshua Lee, 2022. "Disclosure Sentiment: Machine Learning vs. Dictionary Methods," Management Science, INFORMS, vol. 68(7), pages 5514-5532, July.
    14. Manogna R L & Aswini Kumar Mishra, 2021. "Forecasting spot prices of agricultural commodities in India: Application of deep‐learning models," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 28(1), pages 72-83, January.
    15. Kraus, Mathias & Feuerriegel, Stefan & Oztekin, Asil, 2020. "Deep learning in business analytics and operations research: Models, applications and managerial implications," European Journal of Operational Research, Elsevier, vol. 281(3), pages 628-641.
    16. Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    17. Yan Luo & Linying Zhou, 2020. "Textual tone in corporate financial disclosures: a survey of the literature," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 17(2), pages 101-110, September.
    18. Caporale, Guglielmo Maria & Spagnolo, Fabio & Spagnolo, Nicola, 2016. "Macro news and stock returns in the Euro area: A VAR-GARCH-in-mean analysis," International Review of Financial Analysis, Elsevier, vol. 45(C), pages 180-188.
    19. Yang-Cheng Lu & Yu-Chen Wei, 2013. "The Chinese News Sentiment around Earnings Announcements," Journal for Economic Forecasting, Institute for Economic Forecasting, vol. 0(3), pages 44-58, October.
    20. Ahmed, Yousry & Elshandidy, Tamer, 2016. "The effect of bidder conservatism on M&A decisions: Text-based evidence from US 10-K filings," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 176-190.
