https://ideas.repec.org/p/boc/bocoec/1101.html

fev-bench: A Realistic Benchmark for Time Series Forecasting

Author

Listed:
  • Oleksandr Shchur

    (AWS)

  • Abdul Fatir Ansari

    (AWS)

  • Caner Turkmen

    (AWS)

  • Lorenzo Stella

    (AWS)

  • Nick Erickson

    (AWS)

  • Pablo Guerron-Quintana

    (Boston College)

  • Michael Bohlke-Schneider

    (AWS)

  • Yuyang Wang

    (AWS)

Abstract

Benchmark quality is critical for meaningful evaluation and sustained progress in time series forecasting, particularly given the recent rise of pre-trained models. Existing benchmarks often have narrow domain coverage or overlook important real-world settings, such as tasks with covariates. Additionally, their aggregation procedures often lack statistical rigor, making it unclear whether observed performance differences reflect true improvements or random variation. Many benchmarks also fail to provide infrastructure for consistent evaluation or are too rigid to integrate into existing pipelines. To address these gaps, we propose fev-bench, a benchmark comprising 100 forecasting tasks across seven domains, including 46 tasks with covariates. Supporting the benchmark, we introduce fev, a lightweight Python library for benchmarking forecasting models that emphasizes reproducibility and seamless integration with existing workflows. Using fev, fev-bench employs principled aggregation methods with bootstrapped confidence intervals to report model performance along two complementary dimensions: win rates and skill scores. We report results on fev-bench for various pre-trained, statistical and baseline models, and identify promising directions for future research.
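The aggregation idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the two metrics, so the definitions below are assumptions made for illustration: the win rate is taken as the fraction of pairwise task-level comparisons a model wins (ties counted as half), the skill score as one minus the geometric mean of the model's error relative to a baseline, and the confidence interval as a percentile bootstrap over tasks. The function names and the toy error matrix are hypothetical and are not part of the fev library.

```python
import numpy as np


def win_rate(errors: np.ndarray, model: int) -> float:
    """Fraction of (task, competitor) pairs where `model` has the lower error.

    Assumed definition: ties count as half a win.
    `errors` has shape (n_tasks, n_models); lower is better.
    """
    own = errors[:, [model]]                    # shape (n_tasks, 1)
    others = np.delete(errors, model, axis=1)   # shape (n_tasks, n_models - 1)
    wins = (own < others).astype(float) + 0.5 * (own == others)
    return float(wins.mean())


def skill_score(errors: np.ndarray, model: int, baseline: int) -> float:
    """One minus the geometric mean of error ratios against a baseline.

    Assumed definition: positive values mean the model beats the baseline.
    """
    ratios = errors[:, model] / errors[:, baseline]
    return float(1.0 - np.exp(np.mean(np.log(ratios))))


def bootstrap_ci(errors, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling tasks (rows) with replacement."""
    rng = np.random.default_rng(seed)
    n_tasks = errors.shape[0]
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_tasks, size=n_tasks)
        samples[b] = statistic(errors[idx])
    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Toy error matrix (made-up numbers): rows are tasks,
# columns are model A, model B, and a baseline.
errors = np.array([
    [1.0, 2.0, 4.0],
    [2.0, 1.0, 4.0],
    [1.0, 1.0, 2.0],
    [3.0, 6.0, 6.0],
])

wr = win_rate(errors, model=0)
ss = skill_score(errors, model=0, baseline=2)
wr_ci = bootstrap_ci(errors, lambda e: win_rate(e, model=0))
ss_ci = bootstrap_ci(errors, lambda e: skill_score(e, model=0, baseline=2))
print(f"win rate {wr:.3f} CI {wr_ci}, skill score {ss:.3f} CI {ss_ci}")
```

Resampling whole tasks rather than individual forecasts keeps each task's errors for all models together, so the interval reflects how sensitive the ranking is to the choice of tasks. If the intervals of two models overlap heavily, the observed difference may be random variation rather than a true improvement.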

Suggested Citation

  • Oleksandr Shchur & Abdul Fatir Ansari & Caner Turkmen & Lorenzo Stella & Nick Erickson & Pablo Guerron-Quintana & Michael Bohlke-Schneider & Yuyang Wang, 2025. "fev-bench: A Realistic Benchmark for Time Series Forecasting," Boston College Working Papers in Economics 1101, Boston College Department of Economics.
  • Handle: RePEc:boc:bocoec:1101

    Download full text from publisher

    File URL: http://fmwww.bc.edu/EC-P/wp1101.pdf
    File Function: main text
    Download Restriction: no

    References listed on IDEAS

    1. Michael W. McCracken & Serena Ng, 2016. "FRED-MD: A Monthly Database for Macroeconomic Research," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 574-589, October.
    2. Christiano, Lawrence J. & Eichenbaum, Martin & Evans, Charles L., 1999. "Monetary policy shocks: What have we learned and to what end?," Handbook of Macroeconomics, in: J. B. Taylor & M. Woodford (ed.), Handbook of Macroeconomics, edition 1, volume 1, chapter 2, pages 65-148, Elsevier.
    3. Petropoulos, Fotios & Svetunkov, Ivan, 2020. "A simple combination of univariate models," International Journal of Forecasting, Elsevier, vol. 36(1), pages 110-115.
    4. Iain Staffell & Stefan Pfenninger & Nathan Johnson, 2023. "A global model of hourly space heating and cooling demand at multiple spatial scales," Nature Energy, Nature, vol. 8(12), pages 1328-1344, December.
    5. Athanasopoulos, George & Ahmed, Roman A. & Hyndman, Rob J., 2009. "Hierarchical forecasts for Australian domestic tourism," International Journal of Forecasting, Elsevier, vol. 25(1), pages 146-166.
    6. Bojer, Casper Solheim & Meldgaard, Jens Peder, 2021. "Kaggle forecasting competitions: An overlooked learning opportunity," International Journal of Forecasting, Elsevier, vol. 37(2), pages 587-603.
    7. Fildes, Robert & Kolassa, Stephan & Ma, Shaohui, 2022. "Post-script—Retail forecasting: Research and practice," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1319-1324.
    8. Salinas, David & Flunkert, Valentin & Gasthaus, Jan & Januschowski, Tim, 2020. "DeepAR: Probabilistic forecasting with autoregressive recurrent networks," International Journal of Forecasting, Elsevier, vol. 36(3), pages 1181-1191.
    9. Fildes, Robert & Ma, Shaohui & Kolassa, Stephan, 2022. "Retail forecasting: Research and practice," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1283-1318.
    10. Lago, Jesus & Marcjasz, Grzegorz & De Schutter, Bart & Weron, Rafał, 2021. "Forecasting day-ahead electricity prices: A review of state-of-the-art algorithms, best practices and an open-access benchmark," Applied Energy, Elsevier, vol. 293(C).
    11. Hyndman, Rob J. & Koehler, Anne B., 2006. "Another look at measures of forecast accuracy," International Journal of Forecasting, Elsevier, vol. 22(4), pages 679-688.
    12. Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilios, 2020. "The M4 Competition: 100,000 time series and 61 forecasting methods," International Journal of Forecasting, Elsevier, vol. 36(1), pages 54-74.
    13. Christoph Bergmeir, 2023. "Common Pitfalls and Better Practices in Forecast Evaluation for Data Scientists," Foresight: The International Journal of Applied Forecasting, International Institute of Forecasters, issue 70, pages 5-12, Q3.
    14. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    15. Oleksandr Shchur & Abdul Fatir Ansari & Caner Turkmen & Lorenzo Stella & Nick Erickson & Pablo Guerron-Quintana & Michael Bohlke-Schneider & Yuyang Wang, 2025. "fev-bench: A Realistic Benchmark for Time Series Forecasting," Boston College Working Papers in Economics 1101, Boston College Department of Economics.

    Citations


    Cited by:

    1. Oleksandr Shchur & Abdul Fatir Ansari & Caner Turkmen & Lorenzo Stella & Nick Erickson & Pablo Guerron-Quintana & Michael Bohlke-Schneider & Yuyang Wang, 2025. "fev-bench: A Realistic Benchmark for Time Series Forecasting," Boston College Working Papers in Economics 1101, Boston College Department of Economics.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilios & Chen, Zhi & Gaba, Anil & Tsetlin, Ilia & Winkler, Robert L., 2022. "The M5 uncertainty competition: Results, findings and conclusions," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1365-1385.
    2. Long, Xueying & Bui, Quang & Oktavian, Grady & Schmidt, Daniel F. & Bergmeir, Christoph & Godahewa, Rakshitha & Lee, Seong Per & Zhao, Kaifeng & Condylis, Paul, 2025. "Scalable probabilistic forecasting in retail with gradient boosted trees: A practitioner’s approach," International Journal of Production Economics, Elsevier, vol. 279(C).
    3. Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilios, 2022. "M5 accuracy competition: Results, findings, and conclusions," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1346-1364.
    4. Spiliotis, Evangelos & Petropoulos, Fotios, 2024. "On the update frequency of univariate forecasting models," European Journal of Operational Research, Elsevier, vol. 314(1), pages 111-121.
    5. Marco Zanotti, 2025. "Do global forecasting models require frequent retraining?," Working Papers 551, University of Milano-Bicocca, Department of Economics.
    6. Marco Zanotti, 2025. "The cost of ensembling: is it always worth combining?," Working Papers 554, University of Milano-Bicocca, Department of Economics.
    7. Wellens, Arnoud P. & Boute, Robert N. & Udenio, Maximiliano, 2024. "Simplifying tree-based methods for retail sales forecasting with explanatory variables," European Journal of Operational Research, Elsevier, vol. 314(2), pages 523-539.
    8. Theodorou, Evangelos & Wang, Shengjie & Kang, Yanfei & Spiliotis, Evangelos & Makridakis, Spyros & Assimakopoulos, Vassilios, 2022. "Exploring the representativeness of the M5 competition data," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1500-1506.
    9. Kang, Yanfei & Cao, Wei & Petropoulos, Fotios & Li, Feng, 2022. "Forecast with forecasts: Diversity matters," European Journal of Operational Research, Elsevier, vol. 301(1), pages 180-190.
    10. Marco Zanotti, 2025. "On the stability of global forecasting models," Working Papers 553, University of Milano-Bicocca, Department of Economics.
    11. Olivares, Kin G. & Challu, Cristian & Marcjasz, Grzegorz & Weron, Rafał & Dubrawski, Artur, 2023. "Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx," International Journal of Forecasting, Elsevier, vol. 39(2), pages 884-900.
    12. Petropoulos, Fotios & Spiliotis, Evangelos, 2025. "Judgmental selection of parameters for simple forecasting models," European Journal of Operational Research, Elsevier, vol. 323(3), pages 780-794.
    13. Katarzyna Maciejowska & Bartosz Uniejewski & Rafał Weron, 2022. "Forecasting Electricity Prices," Papers 2204.11735, arXiv.org.
    14. Athanasopoulos, George & Kourentzes, Nikolaos, 2023. "On the evaluation of hierarchical forecasts," International Journal of Forecasting, Elsevier, vol. 39(4), pages 1502-1511.
    15. Bojer, Casper Solheim & Meldgaard, Jens Peder, 2021. "Kaggle forecasting competitions: An overlooked learning opportunity," International Journal of Forecasting, Elsevier, vol. 37(2), pages 587-603.
    16. Philippe Goulet Coulombe & Mikael Frenette & Karin Klieber, 2023. "From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks," Papers 2311.16333, arXiv.org, revised Apr 2024.
    17. Serafin, Tomasz & Weron, Rafał, 2025. "Loss functions in regression models: Impact on profits and risk in day-ahead electricity trading," Energy Economics, Elsevier, vol. 148(C).
    18. Bojer, Casper Solheim, 2022. "Understanding machine learning-based forecasting methods: A decomposition framework and research opportunities," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1555-1561.
    19. Wang, Xiaoqian & Hyndman, Rob J. & Li, Feng & Kang, Yanfei, 2023. "Forecast combinations: An over 50-year review," International Journal of Forecasting, Elsevier, vol. 39(4), pages 1518-1547.
    20. Makridakis, Spyros & Hyndman, Rob J. & Petropoulos, Fotios, 2020. "Forecasting in social settings: The state of the art," International Journal of Forecasting, Elsevier, vol. 36(1), pages 15-28.

    More about this item

    JEL classification:

    • F34 - International Economics - - International Finance - - - International Lending and Debt Problems
    • C78 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - Bargaining Theory; Matching Theory
    • E32 - Macroeconomics and Monetary Economics - - Prices, Business Fluctuations, and Cycles - - - Business Fluctuations; Cycles
