Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading

Author

Listed:
  • Yueyi Wang
  • Qiyao Wei

Abstract

In this study, we wish to showcase the unique utility of large language models (LLMs) in financial semantic annotation and alpha signal discovery. Leveraging a corpus of company-related tweets, we use an LLM to automatically assign multi-label event categories to high-sentiment-intensity tweets. We align these labeled sentiment signals with forward returns over 1-to-7-day horizons to evaluate their statistical efficacy and market tradability. Our experiments reveal that certain event labels consistently yield negative alpha, with Sharpe ratios as low as -0.38 and information coefficients exceeding 0.05, all statistically significant at the 95% confidence level. This study establishes the feasibility of transforming unstructured social media text into structured, multi-label event variables. A key contribution of this work is its commitment to transparency and reproducibility; all code and methodologies are made publicly available. Our results provide compelling evidence that social media sentiment is a valuable, albeit noisy, signal in financial forecasting and underscore the potential of open-source frameworks to democratize algorithmic trading research.
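
The evaluation the abstract describes (aligning a per-day signal with forward returns over 1-to-7-day horizons, then summarizing it with an information coefficient and a Sharpe ratio) can be illustrated with a minimal sketch. This is not the authors' released code. The function names, the use of Spearman rank correlation for the information coefficient, the 252-day annualization factor, and the toy numbers are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def forward_returns(prices, horizon):
    """Simple return from day t to day t + horizon (the last `horizon` days have none)."""
    return [prices[t + horizon] / prices[t] - 1.0 for t in range(len(prices) - horizon)]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def information_coefficient(signal, fwd):
    """Assumed definition: Spearman rank correlation of signal[t] with fwd[t]."""
    n = min(len(signal), len(fwd))  # drop days without a full forward window
    return pearson(ranks(signal[:n]), ranks(fwd[:n]))


def sharpe_ratio(returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by an assumed annualization factor."""
    return mean(returns) / stdev(returns) * sqrt(periods_per_year)


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from the paper.
    prices = [100.0, 101.0, 99.5, 102.0, 103.5, 101.0, 104.0, 105.5, 103.0, 106.0]
    signal = [0.2, -0.5, 0.8, 0.1, -0.3, 0.6, 0.4, -0.1, 0.3, 0.0]
    for horizon in range(1, 8):
        fwd = forward_returns(prices, horizon)
        print(horizon, round(information_coefficient(signal, fwd), 3))
```

Truncating the signal to the length of the forward-return series keeps each signal value paired with the return that starts on the same day, which is the alignment step the abstract refers to. With overlapping multi-day windows the returns are serially dependent, so significance tests on such series would need to account for that.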

Suggested Citation

  • Yueyi Wang & Qiyao Wei, 2025. "Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading," Papers 2508.07408, arXiv.org.
  • Handle: RePEc:arx:papers:2508.07408

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2508.07408
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eghbal Rahimikia & Stefan Zohren & Ser-Huang Poon, 2021. "Realised Volatility Forecasting: Machine Learning via Financial Word Embedding," Papers 2108.00480, arXiv.org, revised Nov 2024.
    2. Shunyao Wang & Ming Cheng & Christina Dan Wang, 2025. "NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks," Papers 2505.06864, arXiv.org.
    3. Thanos Konstantinidis & Giorgos Iacovides & Mingxue Xu & Tony G. Constantinides & Danilo Mandic, 2024. "FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications," Papers 2403.12285, arXiv.org.
    4. Xiao-Yang Liu & Guoxuan Wang & Hongyang Yang & Daochen Zha, 2023. "FinGPT: Democratizing Internet-scale Data for Financial Large Language Models," Papers 2307.10485, arXiv.org, revised Nov 2023.
    5. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    6. Mardoqueo Arteaga, 2024. "Credit market expectations and the business cycle: evidence from a textual analysis approach," Economics Bulletin, AccessEcon, vol. 44(3), pages 1242-1253.
    7. Massimo Ferrari Minesso & Laura Lebastard & Helena Mezo, 2023. "Text-Based Recession Probabilities," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 71(2), pages 415-438, June.
    8. Ge, S., 2020. "Text-Based Linkages and Local Risk Spillovers in the Equity Market," Cambridge Working Papers in Economics 20115, Faculty of Economics, University of Cambridge.
    9. Massimo Ferrari Minesso & Frederik Kurcz & Maria Sole Pagliari, 2022. "Do words hurt more than actions? The impact of trade tensions on financial markets," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1138-1159, September.
    10. Ge, Shuyi & Li, Shaoran & Linton, Oliver, 2023. "News-implied linkages and local dependency in the equity market," Journal of Econometrics, Elsevier, vol. 235(2), pages 779-815.
    11. Qinkai Chen, 2021. "Stock Movement Prediction with Financial News using Contextualized Embedding from BERT," Papers 2107.08721, arXiv.org.
    12. Aysan, Ahmet Faruk & Caporin, Massimiliano & Cepni, Oguzhan, 2024. "Not all words are equal: Sentiment and jumps in the cryptocurrency market," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 91(C).
    13. Ge, Shuyi & Li, Shaoran & Zheng, Hanyu, 2025. "Diamond cuts diamond: News co-mention momentum spillover prevails in China," Journal of Banking & Finance, Elsevier, vol. 171(C).
    14. Marie Bessec & Julien Fouquau, 2024. "A Green Wave in Media: A Change of Tack in Stock Markets," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 86(5), pages 1026-1057, October.
    15. Luiz Renato Lima & Lucas Lúcio Godeiro, 2023. "Equity‐premium prediction: Attention is all you need," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(1), pages 105-122, January.
    16. Stéphane Goutte & Viet Hoang Le & Fei Liu & Hans-Jörg Von Mettenheim, 2023. "ESG Investing: A Sentiment Analysis Approach," Working Papers halshs-03917335, HAL.
    17. Brière, Marie & Huynh, Karen & Laudy, Olav & Pouget, Sébastien, 2023. "Stock market reaction to news: Do tense and horizon matter?," Finance Research Letters, Elsevier, vol. 58(PD).
    18. Hansen, Stephen & Davis, Steven & Seminario-Amez, Cristhian, 2020. "Firm-level Risk Exposures and Stock Returns in the Wake of COVID-19," CEPR Discussion Papers 15314, C.E.P.R. Discussion Papers.
    19. Paul M. Anglin & Yanmin Gao, 2023. "Value of Communication and Social Media: An Equilibrium Theory of Messaging," The Journal of Real Estate Finance and Economics, Springer, vol. 66(4), pages 861-903, May.
    20. Mengda Li & Charles-Albert Lehalle, 2021. "Do Word Embeddings Really Understand Loughran-McDonald's Polarities?," Papers 2103.09813, arXiv.org.
