
Analysis of the Association between Topics in Online Documents and Stock Price Movements

Author

Listed:
  • František Dařena

    (Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 61300 Brno, Czech Republic)

  • Jan Přichystal

    (Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 61300 Brno, Czech Republic)

Abstract

This paper aims to discover the topics hidden in newspaper articles that affect the stock price movements of the corresponding companies. Document topics are characterized by combinations of specific words in documents and are shared across a document collection. We describe how the topics are discovered, how they are mapped to stock price movements, and how the results are quantified and evaluated. To find and quantify the association, we use machine learning-based classification. We achieved an accuracy of stock price movement prediction higher than 70%. A feature selection procedure was applied to the features characterizing the topics to make it easier for a human expert to assign a label to each topic.
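
The mapping and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the topic model, the classifier, or the price window, so everything here is an assumption made for illustration. The `Article` record, its field names, the 1% `min_change` threshold, the toy numbers, and the nearest-centroid classifier are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple


@dataclass
class Article:
    # Hypothetical record: per-topic weights for one article plus the
    # company's stock price before and after publication.
    topic_weights: Tuple[float, ...]
    price_before: float
    price_after: float


def movement_label(article: Article, min_change: float = 0.01) -> Optional[str]:
    """Map an article to 'up' or 'down'; None if the move is below the threshold."""
    change = (article.price_after - article.price_before) / article.price_before
    if abs(change) < min_change:
        return None  # assumed rule: drop articles with negligible price moves
    return "up" if change > 0 else "down"


def fit_centroids(articles: List[Article], labels: List[str]) -> Dict[str, Tuple[float, ...]]:
    """Average the topic weights of the training articles in each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for article, label in zip(articles, labels):
        acc = sums.setdefault(label, [0.0] * len(article.topic_weights))
        for i, weight in enumerate(article.topic_weights):
            acc[i] += weight
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def predict(model: Dict[str, Tuple[float, ...]], article: Article) -> str:
    """Assign the class whose centroid is closest in topic space."""
    return min(model, key=lambda label: dist(model[label], article.topic_weights))


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of predictions that match the observed movement."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data, invented for illustration only.
raw_train = [
    Article((0.8, 0.2), 100.0, 105.0),
    Article((0.7, 0.3), 50.0, 52.0),
    Article((0.2, 0.8), 100.0, 95.0),
    Article((0.1, 0.9), 80.0, 76.0),
    Article((0.5, 0.5), 100.0, 100.2),  # move below threshold, dropped
]
train = [(a, movement_label(a)) for a in raw_train]
train = [(a, lab) for a, lab in train if lab is not None]
model = fit_centroids([a for a, _ in train], [lab for _, lab in train])

test_set = [Article((0.9, 0.1), 10.0, 11.0), Article((0.3, 0.7), 10.0, 9.0)]
predictions = [predict(model, a) for a in test_set]
observed = [movement_label(a) for a in test_set]
print(accuracy(predictions, observed))
```

Any classifier that accepts a numeric feature vector could replace the nearest-centroid rule; it is used here only because it fits in a few lines without external libraries.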

Suggested Citation

  • František Dařena & Jan Přichystal, 2018. "Analysis of the Association between Topics in Online Documents and Stock Price Movements," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 66(6), pages 1431-1439.
  • Handle: RePEc:mup:actaun:actaun_2018066061431
    DOI: 10.11118/actaun201866061431

    Download full text from publisher

    File URL: http://acta.mendelu.cz/doi/10.11118/actaun201866061431.html
    Download Restriction: free of charge

    File URL: http://acta.mendelu.cz/doi/10.11118/actaun201866061431.pdf
    Download Restriction: free of charge

    File URL: https://libkey.io/10.11118/actaun201866061431?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations

    Cited by:

    1. Pavel Netolický & Jonáš Petrovský & František Dařena, 2018. "Text-Mining in Streams of Textual Data Using Time Series Applied to Stock Market," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 66(6), pages 1573-1580.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frantisek Darena & Jonas Petrovsky & Jan Zizka & Jan Prichystal, 2016. "Analyzing the correlation between online texts and stock price movements at micro-level using machine learning," MENDELU Working Papers in Business and Economics 2016-67, Mendel University in Brno, Faculty of Business and Economics.
    2. Patrick Houlihan & Germán G. Creamer, 2021. "Leveraging Social Media to Predict Continuation and Reversal in Asset Prices," Computational Economics, Springer;Society for Computational Economics, vol. 57(2), pages 433-453, February.
    3. Yan Luo & Linying Zhou, 2020. "Textual tone in corporate financial disclosures: a survey of the literature," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 17(2), pages 101-110, September.
    4. Bennani, Hamza, 2018. "Media coverage and ECB policy-making: Evidence from an augmented Taylor rule," Journal of Macroeconomics, Elsevier, vol. 57(C), pages 26-38.
    5. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    6. Ahmed, Yousry & Elshandidy, Tamer, 2016. "The effect of bidder conservatism on M&A decisions: Text-based evidence from US 10-K filings," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 176-190.
    7. Muhammad Farhan Malik & Yuan George Shan & Jamie Yixing Tong, 2022. "Do auditors price litigious tone?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 62(S1), pages 1715-1760, April.
    8. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.
    9. Diego F. Téllez & Jesús M. Godoy, 2017. "Mission Power and Firm Financial Performance," Documentos de Trabajo CIEF 15655, Universidad EAFIT.
    10. Al-Nasseri, Alya & Menla Ali, Faek & Tucker, Allan, 2021. "Investor sentiment and the dispersion of stock returns: Evidence based on the social network of investors," International Review of Financial Analysis, Elsevier, vol. 78(C).
    11. Yuting Chen & Don Bredin & Valerio Potì & Roman Matkovskyy, 2022. "COVID risk narratives: a computational linguistic approach to the econometric identification of narrative risk during a pandemic," Digital Finance, Springer, vol. 4(1), pages 17-61, March.
    12. Sun, Andrew & Lachanski, Michael & Fabozzi, Frank J., 2016. "Trade the tweet: Social media text mining and sparse matrix factorization for stock market prediction," International Review of Financial Analysis, Elsevier, vol. 48(C), pages 272-281.
    13. Kumar, Rahul & Deb, Soumya Guha & Mukherjee, Shubhadeep, 2020. "Do words reveal the latent truth? Identifying communication patterns of corporate losers," Journal of Behavioral and Experimental Finance, Elsevier, vol. 26(C).
    14. Picault, Matthieu & Pinter, Julien & Renault, Thomas, 2022. "Media sentiment on monetary policy: Determinants and relevance for inflation expectations," Journal of International Money and Finance, Elsevier, vol. 124(C).
    15. Renato Camodeca & Alex Almici & Umberto Sagliaschi, 2018. "Sustainability Disclosure in Integrated Reporting: Does It Matter to Investors? A Cheap Talk Approach," Sustainability, MDPI, vol. 10(12), pages 1-34, November.
    16. Vegard Høghaug Larsen & Leif Anders Thorsrud, 2022. "Asset returns, news topics, and media effects," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 838-868, July.
    17. Shuangyan Li & Guangrui Wang & Yongli Luo, 2022. "Tone of language, financial disclosure, and earnings management: a textual analysis of form 20-F," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-24, December.
    18. Massimo Ferrari Minesso & Frederik Kurcz & Maria Sole Pagliari, 2022. "Do words hurt more than actions? The impact of trade tensions on financial markets," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1138-1159, September.
    19. Picault, Matthieu & Renault, Thomas, 2017. "Words are not all created equal: A new measure of ECB communication," Journal of International Money and Finance, Elsevier, vol. 79(C), pages 136-156.
    20. Ricardo Correa & Keshav Garud & Juan M Londono & Nathan Mislang, 2021. "Sentiment in Central Banks’ Financial Stability Reports," Review of Finance, European Finance Association, vol. 25(1), pages 85-120.
