Printed from https://ideas.repec.org/a/eee/intfor/v42y2026i3p752-773.html

Beyond news headlines and TF-IDF: Enhancing text-based forecasting models with validated collocations and improved attention

Author

Listed:
  • Abeyie, Gabriel Appau

Abstract

This paper proposes a method to improve text-based forecasting models, specifically for crude oil prices. Utilizing advanced techniques, including pattern validation and attention mechanisms, the study demonstrates notable improvements in predictive power over traditional approaches. A key finding is that considering the full text of news articles, rather than limiting the analysis to headlines, yields significant gains in forecasting accuracy. Furthermore, the model featuring verb-noun and noun-verb collocation pattern validation consistently outperforms benchmarks and models based solely on news headlines across various forecasting horizons. The results suggest that the presence of collocations such as ‘price fell’, ‘prices tumbled’, and ‘price dropped’ in crude-oil-related news articles is associated with lower oil price returns. Additionally, integrating macroeconomic data with text-based features enhances predictive performance, demonstrating that combining structured economic indicators with textual features improves forecasting accuracy.
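As a rough illustration of why full article text can carry signal that headlines miss, the sketch below counts the downward-movement collocations quoted in the abstract in a headline and in an article body. It is not the paper's validation procedure: the regular expression, the verb list, the function name `count_collocations`, and the sample texts are all assumptions made up for this example.

```python
import re
from collections import Counter

# Assumed verb list, taken only from the three collocations quoted in the abstract.
DOWN_VERBS = ("fell", "tumbled", "dropped")

# Noun-verb pattern: "price" or "prices" immediately followed by one of the verbs.
PATTERN = re.compile(
    r"\b(prices?)\s+(" + "|".join(DOWN_VERBS) + r")\b",
    re.IGNORECASE,
)

def count_collocations(text):
    """Return a Counter of lower-cased noun-verb collocations found in text."""
    return Counter(
        f"{noun.lower()} {verb.lower()}" for noun, verb in PATTERN.findall(text)
    )

# Invented sample texts, for illustration only.
headline = "Oil markets react to OPEC meeting"
body = (
    "Crude prices tumbled on Monday. The Brent price fell again "
    "after the price dropped last week."
)

headline_counts = count_collocations(headline)
body_counts = count_collocations(body)

print("headline:", dict(headline_counts))
print("body:", dict(body_counts))
```

In this toy case the headline contains none of the collocations while the body contains all three, so a feature built from headlines alone would be empty. A real pipeline would presumably rely on part-of-speech tagging and a statistical validation step rather than a fixed regular expression.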

Suggested Citation

  • Abeyie, Gabriel Appau, 2026. "Beyond news headlines and TF-IDF: Enhancing text-based forecasting models with validated collocations and improved attention," International Journal of Forecasting, Elsevier, vol. 42(3), pages 752-773.
  • Handle: RePEc:eee:intfor:v:42:y:2026:i:3:p:752-773
    DOI: 10.1016/j.ijforecast.2025.09.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016920702500086X
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ijforecast.2025.09.002?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item




    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.