
NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting

Authors

  • Linyi Yang
  • Jiazheng Li
  • Ruihai Dong
  • Yue Zhang
  • Barry Smyth

Abstract

Financial forecasting has been an important and active area of machine learning research because of the challenges it presents and the potential rewards that even minor improvements in prediction or forecasting accuracy may entail. Traditionally, financial forecasting has relied heavily on quantitative indicators and metrics derived from structured financial statements. Earnings conference call data, including text and audio, is an important source of unstructured data that has been used for various prediction tasks with deep learning and related approaches. However, current deep learning-based methods are limited in the way they deal with numeric data: numbers are typically treated as plain-text tokens without taking advantage of their underlying numeric structure. This paper describes a numeric-oriented hierarchical transformer model that predicts stock returns and financial risk from multi-modal aligned earnings call data by taking advantage of the different categories of numbers (monetary, temporal, percentages, etc.) and their magnitude. We present the results of a comprehensive evaluation of NumHTML against several state-of-the-art baselines using a real-world, publicly available dataset. The results indicate that NumHTML significantly outperforms the current state of the art across a variety of evaluation metrics and that it has the potential to offer significant financial gains in a practical trading context.
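The abstract does not say how numbers are grouped into categories or how their magnitude is represented. The following is a minimal, hypothetical sketch of the general idea: it tags a numeric token with a coarse category and its order of magnitude. The regular expressions, the category labels, and the functions `categorize_number` and `magnitude` are illustrative assumptions, not the NumHTML implementation.

```python
import math
import re

# Hypothetical surface patterns; NumHTML's actual categorization is not described here.
_PATTERNS = [
    ("percentage", re.compile(r"^-?\d+(?:\.\d+)?%$")),       # e.g. "12.5%"
    ("monetary", re.compile(r"^\$-?\d[\d,]*(?:\.\d+)?$")),   # e.g. "$1,200"
    ("temporal", re.compile(r"^(?:19|20)\d{2}$")),           # e.g. a year such as "2021"
]


def categorize_number(token: str) -> str:
    """Return the first matching category label, or "other"."""
    for label, pattern in _PATTERNS:
        if pattern.match(token):
            return label
    return "other"


def magnitude(token: str) -> int:
    """Return floor(log10(|value|)) after stripping symbols such as $ , %."""
    digits = re.sub(r"[^\d.\-]", "", token)
    value = abs(float(digits))
    return 0 if value == 0 else math.floor(math.log10(value))


if __name__ == "__main__":
    for tok in ["12.5%", "$1,200", "2021", "42"]:
        print(tok, categorize_number(tok), magnitude(tok))
```

Tags of this kind could be fed to a transformer as extra input features alongside the ordinary token embeddings; this only illustrates the general notion of category- and magnitude-aware number handling.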

Suggested Citation

  • Linyi Yang & Jiazheng Li & Ruihai Dong & Yue Zhang & Barry Smyth, 2022. "NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting," Papers 2201.01770, arXiv.org.
  • Handle: RePEc:arx:papers:2201.01770

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2201.01770
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. David F. Larcker & Anastasia A. Zakolyukina, 2012. "Detecting Deceptive Discussions in Conference Calls," Journal of Accounting Research, Wiley Blackwell, vol. 50(2), pages 495-540, May.
    2. Tim Bollerslev & Andrew J. Patton & Rogier Quaedvlieg, 2016. "Exploiting the errors: A simple approach for improved volatility forecasting," Journal of Econometrics, Elsevier, vol. 192(1), pages 1-18.
    3. Aleksi Pitkäjärvi & Matti Suominen & Lauri Vaittinen, 2020. "Cross-asset signals and time series momentum," Journal of Financial Economics, Elsevier, vol. 136(1), pages 63-85.
    4. Zhi Da & Joseph Engelberg & Pengjie Gao, 2015. "The Sum of All FEARS Investor Sentiment and Asset Prices," The Review of Financial Studies, Society for Financial Studies, vol. 28(1), pages 1-32.
    5. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    6. Tobias J. Moskowitz & Yao Hua Ooi & Lasse Heje Pedersen, 2012. "Time series momentum," Journal of Financial Economics, Elsevier, vol. 104(2), pages 228-250.
    7. Asaf Manela & Alan Moreira, 2017. "News implied volatility and disaster concerns," Journal of Financial Economics, Elsevier, vol. 123(1), pages 137-162.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Kelvin J. L. Koa & Yunshan Ma & Ritchie Ng & Tat-Seng Chua, 2023. "Diffusion Variational Autoencoder for Tackling Stochasticity in Multi-Step Regression Stock Price Prediction," Papers 2309.00073, arXiv.org, revised Oct 2023.
    2. Linyi Yang & Yingpeng Ma & Yue Zhang, 2023. "Measuring Consistency in Text-based Financial Forecasting Models," Papers 2305.08524, arXiv.org, revised Jun 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Obaid, Khaled & Pukthuanthong, Kuntara, 2022. "A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news," Journal of Financial Economics, Elsevier, vol. 144(1), pages 273-297.
    2. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    3. Su, Zhi & Lu, Man & Yin, Libo, 2018. "Oil prices and news-based uncertainty: Novel evidence," Energy Economics, Elsevier, vol. 72(C), pages 331-340.
    4. Guofu Zhou, 2018. "Measuring Investor Sentiment," Annual Review of Financial Economics, Annual Reviews, vol. 10(1), pages 239-259, November.
    5. Matthew Bamber & Santhosh Abraham, 2020. "On the “Realities” of Investor‐Manager Interactivity: Baudrillard, Hyperreality, and Management Q&A Sessions," Contemporary Accounting Research, John Wiley & Sons, vol. 37(2), pages 1290-1325, June.
    6. Chen, Yuan & Han, Dongmei & Zhou, Xiaofeng, 2023. "Mining the emotional information in the audio of earnings conference calls: A deep learning approach for sentiment analysis of securities analysts' follow-up behavior," International Review of Financial Analysis, Elsevier, vol. 88(C).
    7. Yang Liu & Liyan Han & Libo Yin, 2018. "Does news uncertainty matter for commodity futures markets? Heterogeneity in energy and non‐energy sectors," Journal of Futures Markets, John Wiley & Sons, Ltd., vol. 38(10), pages 1246-1261, October.
    8. Müller, Karsten, 2020. "German forecasters' narratives: How informative are German business cycle forecast reports?," Working Papers 23, German Research Foundation's Priority Programme 1859 "Experience and Expectation. Historical Foundations of Economic Behaviour", Humboldt University Berlin.
    9. Eghbal Rahimikia & Stefan Zohren & Ser-Huang Poon, 2021. "Realised Volatility Forecasting: Machine Learning via Financial Word Embedding," Papers 2108.00480, arXiv.org, revised Mar 2023.
    10. Himounet, Nicolas, 2022. "Searching the nature of uncertainty: Macroeconomic and financial risks VS geopolitical and pandemic risks," International Economics, Elsevier, vol. 170(C), pages 1-31.
    11. Dim, Chukwuma & Koerner, Kevin & Wolski, Marcin & Zwart, Sanne, 2022. "Hot off the press: News-implied sovereign default risk," EIB Working Papers 2022/06, European Investment Bank (EIB).
    12. Liebmann, Michael & Orlov, Alexei G. & Neumann, Dirk, 2016. "The tone of financial news and the perceptions of stock and CDS traders," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 159-175.
    13. Sihvonen, Markus, 2021. "Yield curve momentum," Research Discussion Papers 15/2021, Bank of Finland.
    14. Al-Nasseri, Alya & Menla Ali, Faek & Tucker, Allan, 2021. "Investor sentiment and the dispersion of stock returns: Evidence based on the social network of investors," International Review of Financial Analysis, Elsevier, vol. 78(C).
    15. Xi Fu & Xiaoxi Wu & Zhifang Zhang, 2021. "The Information Role of Earnings Conference Call Tone: Evidence from Stock Price Crash Risk," Journal of Business Ethics, Springer, vol. 173(3), pages 643-660, October.
    16. Bannier, Christina E. & Pauls, Thomas & Walter, Andreas, 2017. "CEO-speeches and stock returns," VfS Annual Conference 2017 (Vienna): Alternative Structures for Money and Banking 168192, Verein für Socialpolitik / German Economic Association.
    17. James P. Ryans, 2021. "Textual classification of SEC comment letters," Review of Accounting Studies, Springer, vol. 26(1), pages 37-80, March.
    18. Liu, Yuanyuan & Niu, Zibo & Suleman, Muhammad Tahir & Yin, Libo & Zhang, Hongwei, 2022. "Forecasting the volatility of crude oil futures: The role of oil investor attention and its regime switching characteristics under a high-frequency framework," Energy, Elsevier, vol. 238(PA).
    19. Fraiberger, Samuel P. & Lee, Do & Puy, Damien & Ranciere, Romain, 2021. "Media sentiment and international asset prices," Journal of International Economics, Elsevier, vol. 133(C).
    20. Alejandro Bernales & Marcela Valenzuela & Ilknur Zer, 2023. "Effects of Information Overload on Financial Markets: How Much Is Too Much?," International Finance Discussion Papers 1372, Board of Governors of the Federal Reserve System (U.S.).


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2201.01770. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by reporting the missing link.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.