Printed from https://ideas.repec.org/a/bla/jinfst/v73y2022i9p1314-1335.html

SEntFiN 1.0: Entity‐aware sentiment analysis for financial news

Author

Listed:
  • Ankur Sinha
  • Satishwar Kedas
  • Rishu Kumar
  • Pekka Malo

Abstract

Fine‐grained financial sentiment analysis on news headlines is a challenging task requiring human‐annotated datasets to achieve high performance. Limited studies have tried to address the sentiment extraction task in a setting where multiple entities are present in a news headline. In an effort to further research in this area, we make publicly available SEntFiN 1.0, a human‐annotated dataset of 10,753 news headlines with entity‐sentiment annotations, of which 2,847 headlines contain multiple entities, often with conflicting sentiments. We augment our dataset with a database of over 1,000 financial entities and their various representations in news media amounting to over 5,000 phrases. We propose a framework that enables the extraction of entity‐relevant sentiments using a feature‐based approach rather than an expression‐based approach. For sentiment extraction, we utilize 12 different learning schemes utilizing lexicon‐based and pretrained sentence representations and five classification approaches. Our experiments indicate that lexicon‐based N‐gram ensembles are above par with pretrained word embedding schemes such as GloVe. Overall, RoBERTa and finBERT (domain‐specific BERT) achieve the highest average accuracy of 94.29% and F1‐score of 93.27%. Further, using over 210,000 entity‐sentiment predictions, we validate the economic effect of sentiments on aggregate market movements over a long duration.
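The abstract reports accuracy and F1-score over entity-level predictions, where one headline can carry several entities with conflicting labels. The following is a minimal sketch of how such records might be laid out and scored. It is not the authors' code: the headlines, entity names, labels, and helper functions are invented for illustration, and macro-averaging of F1 is an assumption, since the abstract does not say which averaging was used.

```python
from collections import Counter

# Hypothetical records (not taken from SEntFiN): each headline maps every
# entity it mentions to a gold label and to a model prediction.
records = [
    {
        "headline": "Bank A gains while Steel B slips on weak demand",
        "gold": {"Bank A": "positive", "Steel B": "negative"},
        "pred": {"Bank A": "positive", "Steel B": "neutral"},
    },
    {
        "headline": "Motor C holds steady ahead of results",
        "gold": {"Motor C": "neutral"},
        "pred": {"Motor C": "neutral"},
    },
]


def flatten(records):
    """Turn headline-level records into (gold, predicted) pairs, one per entity."""
    pairs = []
    for record in records:
        for entity, gold_label in record["gold"].items():
            pairs.append((gold_label, record["pred"][entity]))
    return pairs


def accuracy(pairs):
    """Share of entity-level predictions that match the gold label."""
    return sum(gold == pred for gold, pred in pairs) / len(pairs)


def macro_f1(pairs):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    labels = sorted({g for g, _ in pairs} | {p for _, p in pairs})
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in pairs:
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label gets a false positive
            fn[gold] += 1  # gold label gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


pairs = flatten(records)
print(f"entity-level accuracy: {accuracy(pairs):.4f}")
print(f"entity-level macro F1: {macro_f1(pairs):.4f}")
```

Scoring per entity rather than per headline is what lets a headline with conflicting sentiments count as partly right: in the first toy record, one entity is classified correctly and the other is not.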

Suggested Citation

  • Ankur Sinha & Satishwar Kedas & Rishu Kumar & Pekka Malo, 2022. "SEntFiN 1.0: Entity‐aware sentiment analysis for financial news," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(9), pages 1314-1335, September.
  • Handle: RePEc:bla:jinfst:v:73:y:2022:i:9:p:1314-1335
    DOI: 10.1002/asi.24634

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24634
    Download Restriction: no


    References listed on IDEAS

    1. Warner, Jerold B. & Watts, Ross L. & Wruck, Karen H., 1988. "Stock prices and top management changes," Journal of Financial Economics, Elsevier, vol. 20(1-2), pages 461-492, January.
    2. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    3. repec:bla:jfinan:v:59:y:2004:i:3:p:1259-1294 is not listed on IDEAS
    4. Kearney, Colm & Liu, Sha, 2014. "Textual sentiment in finance: A survey of methods and models," International Review of Financial Analysis, Elsevier, vol. 33(C), pages 171-185.
    5. Chambers, Ae & Penman, Sh, 1984. "Timeliness Of Reporting And The Stock-Price Reaction To Earnings Announcements," Journal of Accounting Research, Wiley Blackwell, vol. 22(1), pages 21-47.
    6. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    7. Gabriele Ranco & Darko Aleksovski & Guido Caldarelli & Miha Grčar & Igor Mozetič, 2015. "The Effects of Twitter Sentiment on Stock Price Returns," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-21, September.
    8. Duo Qin, 2011. "Rise Of Var Modelling Approach," Journal of Economic Surveys, Wiley Blackwell, vol. 25(1), pages 156-174, February.
    9. Pekka Malo & Ankur Sinha & Pekka Korhonen & Jyrki Wallenius & Pyry Takala, 2014. "Good debt or bad debt: Detecting semantic orientations in economic texts," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(4), pages 782-796, April.

    Citations



    Cited by:

    1. Costola, Michele & Hinz, Oliver & Nofer, Michael & Pelizzon, Loriana, 2023. "Machine learning sentiment analysis, COVID-19 news and stock market reactions," Research in International Business and Finance, Elsevier, vol. 64(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sinha, Ankur & Kedas, Satishwar & Kumar, Rishu & Malo, Pekka, 2019. "Buy, Sell or Hold: Entity-Aware Classification of Business News," IIMA Working Papers WP 2019-04-02, Indian Institute of Management Ahmedabad, Research and Publication Department.
    2. Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
    3. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    4. Andres Algaba & David Ardia & Keven Bluteau & Samuel Borms & Kris Boudt, 2020. "Econometrics Meets Sentiment: An Overview Of Methodology And Applications," Journal of Economic Surveys, Wiley Blackwell, vol. 34(3), pages 512-547, July.
    5. Wehrheim, Lino, 2021. "The sound of silence: On the (in)visibility of economists in the media," Working Papers 30, German Research Foundation's Priority Programme 1859 "Experience and Expectation. Historical Foundations of Economic Behaviour", Humboldt University Berlin.
    6. Yan Luo & Linying Zhou, 2020. "Textual tone in corporate financial disclosures: a survey of the literature," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 17(2), pages 101-110, September.
    7. Bennani, Hamza, 2018. "Media coverage and ECB policy-making: Evidence from an augmented Taylor rule," Journal of Macroeconomics, Elsevier, vol. 57(C), pages 26-38.
    8. Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," Finance Research Letters, Elsevier, vol. 62(PB).
    9. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    10. Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2022. "Media-expressed tone, option characteristics, and stock return predictability," Journal of Economic Dynamics and Control, Elsevier, vol. 134(C).
    11. Ahmed, Yousry & Elshandidy, Tamer, 2016. "The effect of bidder conservatism on M&A decisions: Text-based evidence from US 10-K filings," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 176-190.
    12. Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
    13. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.
    14. Yuting Chen & Don Bredin & Valerio Potì & Roman Matkovskyy, 2022. "COVID risk narratives: a computational linguistic approach to the econometric identification of narrative risk during a pandemic," Digital Finance, Springer, vol. 4(1), pages 17-61, March.
    15. Picault, Matthieu & Pinter, Julien & Renault, Thomas, 2022. "Media sentiment on monetary policy: Determinants and relevance for inflation expectations," Journal of International Money and Finance, Elsevier, vol. 124(C).
    16. Renato Camodeca & Alex Almici & Umberto Sagliaschi, 2018. "Sustainability Disclosure in Integrated Reporting: Does It Matter to Investors? A Cheap Talk Approach," Sustainability, MDPI, vol. 10(12), pages 1-34, November.
    17. Vegard Høghaug Larsen & Leif Anders Thorsrud, 2022. "Asset returns, news topics, and media effects," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 838-868, July.
    18. David M. Goldberg & Nohel Zaman & Arin Brahma & Mariano Aloiso, 2022. "Are mortgage loan closing delay risks predictable? A predictive analysis using text mining on discussion threads," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(3), pages 419-437, March.
    19. Picault, Matthieu & Renault, Thomas, 2017. "Words are not all created equal: A new measure of ECB communication," Journal of International Money and Finance, Elsevier, vol. 79(C), pages 136-156.
    20. Maciej Wujec, 2021. "Analysis of the Financial Information Contained in the Texts of Current Reports: A Deep Learning Approach," JRFM, MDPI, vol. 14(12), pages 1-17, December.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.