
Uncertainty Quantification in Forecast Comparisons

Author

Listed:
  • Marc-Oliver Pohle
  • Tanja Zahn
  • Sebastian Lerch

Abstract

Skill scores, which measure the relative improvement of a forecasting method over a benchmark via consistent scoring functions and proper scoring rules, are a standard tool in forecast evaluation, yet their sampling uncertainty is rarely rigorously quantified. With modern forecasting applications being increasingly multivariate and involving evaluations across multiple horizons, variables, spatial locations, and forecasting methods, standard tools like the pairwise Diebold-Mariano forecast accuracy test or pointwise confidence intervals fail to account for the multiple comparison problem, leading to inflated Type I error rates and invalid joint inference. To address the lack of a coherent, statistically rigorous framework for quantifying uncertainty across these multi-dimensional evaluation problems, we introduce simultaneous confidence bands for expected scores and skill scores. Our framework provides a versatile tool for joint inference that is applicable to any forecast type from mean and quantile to full distributional forecasts. We develop a bootstrap implementation and show that our bands are valid under multivariate extensions of the classical Diebold-Mariano assumptions. We demonstrate the practical utility of the approach in two case studies by quantifying the benefits of time-varying parameter models for macroeconomic forecasting, and by comparing data-driven and physics-based models in probabilistic weather forecasting.
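The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`skill_scores`, `moving_block_indices`, `sup_t_band`) is hypothetical. It assumes negatively oriented scores with optimum zero, so that the skill score is 1 minus the ratio of mean scores. It uses a moving block bootstrap for serially dependent score series and a sup-t band in the spirit of Montiel Olea and Plagborg-Møller (2019). The paper's actual construction and assumptions may differ.

```python
import math
import random
import statistics


def skill_scores(method, benchmark):
    """Skill score per column: 1 - mean(method score) / mean(benchmark score).

    Rows are evaluation periods; columns are comparisons (e.g. horizons).
    Assumes negatively oriented scores whose optimal value is zero.
    """
    out = []
    for j in range(len(method[0])):
        mean_m = statistics.fmean([row[j] for row in method])
        mean_b = statistics.fmean([row[j] for row in benchmark])
        out.append(1.0 - mean_m / mean_b)
    return out


def moving_block_indices(n, block_len, rng):
    """Draw n time indices as concatenated random blocks (requires block_len <= n)."""
    idx = []
    while len(idx) < n:
        start = rng.randrange(n - block_len + 1)
        idx.extend(range(start, start + block_len))
    return idx[:n]


def sup_t_band(method, benchmark, level=0.9, n_boot=499, block_len=8, seed=0):
    """Sup-t simultaneous band for all skill scores jointly (illustrative only)."""
    rng = random.Random(seed)
    n, k = len(method), len(method[0])
    est = skill_scores(method, benchmark)
    boot = []
    for _ in range(n_boot):
        # Resample whole rows so dependence across columns is preserved.
        idx = moving_block_indices(n, block_len, rng)
        boot.append(skill_scores([method[i] for i in idx],
                                 [benchmark[i] for i in idx]))
    # Bootstrap standard error of each skill score.
    se = [statistics.stdev([draw[j] for draw in boot]) for j in range(k)]
    # Largest absolute studentized deviation across columns, per bootstrap draw.
    max_t = sorted(max(abs(draw[j] - est[j]) / se[j] for j in range(k))
                   for draw in boot)
    # One critical value, shared by all columns, gives joint coverage.
    crit = max_t[min(n_boot - 1, math.ceil(level * n_boot) - 1)]
    lower = [e - crit * s for e, s in zip(est, se)]
    upper = [e + crit * s for e, s in zip(est, se)]
    return est, lower, upper, crit


if __name__ == "__main__":
    # Synthetic squared errors for three horizons (made-up data, not from the paper).
    data_rng = random.Random(42)
    bench = [[data_rng.gauss(0, 1.0) ** 2 for _ in range(3)] for _ in range(200)]
    meth = [[data_rng.gauss(0, 0.9) ** 2 for _ in range(3)] for _ in range(200)]
    est, lower, upper, crit = sup_t_band(meth, bench)
    for h, (e, lo, hi) in enumerate(zip(est, lower, upper), start=1):
        print(f"horizon {h}: skill {e:.3f}, band [{lo:.3f}, {hi:.3f}]")
```

The single critical value `crit` is what separates this from pointwise intervals: it is the bootstrap quantile of the maximum studentized deviation over all comparisons, so it is at least as large as any per-comparison quantile and the band is correspondingly wider.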

Suggested Citation

  • Marc-Oliver Pohle & Tanja Zahn & Sebastian Lerch, 2026. "Uncertainty Quantification in Forecast Comparisons," Papers 2605.03997, arXiv.org.
  • Handle: RePEc:arx:papers:2605.03997

    Download full text from publisher

    File URL: https://arxiv.org/pdf/2605.03997
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Marco Del Negro & Giorgio E. Primiceri, 2015. "Time Varying Structural Vector Autoregressions and Monetary Policy: A Corrigendum," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 82(4), pages 1342-1345.
    2. Gneiting, Tilmann, 2011. "Making and Evaluating Point Forecasts," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 746-762.
    3. Kaifeng Bi & Lingxi Xie & Hengheng Zhang & Xin Chen & Xiaotao Gu & Qi Tian, 2023. "Author Correction: Accurate medium-range global weather forecasting with 3D neural networks," Nature, Nature, vol. 621(7980), pages 45-45, September.
    4. Antonello D'Agostino & Luca Gambetti & Domenico Giannone, 2013. "Macroeconomic forecasting and structural change," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 28(1), pages 82-101, January.
    5. Alexander Dawid & Monica Musio, 2014. "Theory and applications of proper scoring rules," METRON, Springer;Sapienza Università di Roma, vol. 72(2), pages 169-183, August.
    6. Rogier Quaedvlieg, 2021. "Multi-Horizon Forecast Comparison," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 39(1), pages 40-53, January.
    7. Todd E. Clark, 2011. "Real-Time Density Forecasts From Bayesian Vector Autoregressions With Stochastic Volatility," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(3), pages 327-341, July.
    8. Fissler, Tobias & Pesenti, Silvana M., 2023. "Sensitivity measures based on scoring functions," European Journal of Operational Research, Elsevier, vol. 307(3), pages 1408-1423.
    9. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    10. Eben Lazarus & Daniel J. Lewis & James H. Stock & Mark W. Watson, 2018. "HAR Inference: Recommendations for Practice Rejoinder," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(4), pages 574-575, October.
    11. Uwe Hassler & Marc-Oliver Pohle & Tanja Zahn, 2025. "Simultaneous Inference Bands for Autocorrelations," Papers 2503.18560, arXiv.org, revised Aug 2025.
    12. Marc-Oliver Pohle, 2020. "The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation," Papers 2005.01835, arXiv.org.
    13. Todd E. Clark & Francesco Ravazzolo, 2015. "Macroeconomic Forecasting Performance under Alternative Specifications of Time‐Varying Volatility," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(4), pages 551-575, June.
    14. José Luis Montiel Olea & Mikkel Plagborg‐Møller, 2019. "Simultaneous confidence bands: Theory, implementation, and an application to SVARs," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(1), pages 1-17, January.
    15. Francis X. Diebold, 2015. "Comparing Predictive Accuracy, Twenty Years Later: A Personal Perspective on the Use and Abuse of Diebold-Mariano Tests," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(1), pages 1-1, January.
    16. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    17. Kaifeng Bi & Lingxi Xie & Hengheng Zhang & Xin Chen & Xiaotao Gu & Qi Tian, 2023. "Accurate medium-range global weather forecasting with 3D neural networks," Nature, Nature, vol. 619(7970), pages 533-538, July.
    18. Peter R. Hansen & Asger Lunde & James M. Nason, 2011. "The Model Confidence Set," Econometrica, Econometric Society, vol. 79(2), pages 453-497, March.
    19. Eben Lazarus & Daniel J. Lewis & James H. Stock & Mark W. Watson, 2018. "HAR Inference: Recommendations for Practice," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(4), pages 541-559, October.
    20. Alexander Henzi & Johanna F. Ziegel & Tilmann Gneiting, 2021. "Isotonic distributional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 963-993, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David T. Frazier & Donald S. Poskitt, 2025. "Sequential Scoring Rule Evaluation for Forecast Method Selection," Papers 2505.09090, arXiv.org.
    2. Dimitrios P. Louzis, 2019. "Steady‐state modeling and macroeconomic forecasting quality," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(2), pages 285-314, March.
    3. Maximilian Boeck & Massimiliano Marcellino & Michael Pfarrhofer & Tommaso Tornese, 2024. "Predicting Tail-Risks for the Italian Economy," Journal of Business Cycle Research, Springer;Centre for International Research on Economic Tendency Surveys (CIRET), vol. 20(3), pages 339-366, November.
    4. Tallman, Ellis W. & Zaman, Saeed, 2020. "Combining survey long-run forecasts and nowcasts with BVAR forecasts using relative entropy," International Journal of Forecasting, Elsevier, vol. 36(2), pages 373-398.
    5. Alexander Henzi & Johanna F Ziegel, 2022. "Valid sequential inference on probability forecast performance [A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]," Biometrika, Biometrika Trust, vol. 109(3), pages 647-663.
    6. Todd E. Clark & Michael W. McCracken & Elmar Mertens, 2020. "Modeling Time-Varying Uncertainty of Multiple-Horizon Forecast Errors," The Review of Economics and Statistics, MIT Press, vol. 102(1), pages 17-33, March.
    7. Magnus Reif, 2020. "Macroeconomics, Nonlinearities, and the Business Cycle," ifo Beiträge zur Wirtschaftsforschung, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, number 87, April.
    8. Todd E. Clark & Florian Huber & Gary Koop & Massimiliano Marcellino & Michael Pfarrhofer, 2023. "Tail Forecasting With Multivariate Bayesian Additive Regression Trees," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 64(3), pages 979-1022, August.
    9. Karlsson, Sune & Mazur, Stepan & Nguyen, Hoang, 2023. "Vector autoregression models with skewness and heavy tails," Journal of Economic Dynamics and Control, Elsevier, vol. 146(C).
    10. Berg, Tim O. & Henzel, Steffen R., 2015. "Point and density forecasts for the euro area using Bayesian VARs," International Journal of Forecasting, Elsevier, vol. 31(4), pages 1067-1095.
    11. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    12. Marta Banbura & Andries van Vlodrop, 2018. "Forecasting with Bayesian Vector Autoregressions with Time Variation in the Mean," Tinbergen Institute Discussion Papers 18-025/IV, Tinbergen Institute.
    13. Barbara Rossi, 2021. "Forecasting in the Presence of Instabilities: How We Know Whether Models Predict Well and How to Improve Them," Journal of Economic Literature, American Economic Association, vol. 59(4), pages 1135-1190, December.
    14. Ravazzolo Francesco & Rothman Philip, 2016. "Oil-price density forecasts of US GDP," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 20(4), pages 441-453, September.
    15. Yu Bai & Andrea Carriero & Todd E. Clark & Massimiliano Marcellino, 2022. "Macroeconomic forecasting in a multi‐country context," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1230-1255, September.
    16. Malte Knuppel & Fabian Kruger & Marc-Oliver Pohle, 2022. "Score-based calibration testing for multivariate forecast distributions," Papers 2211.16362, arXiv.org, revised Dec 2023.
    17. Arias, Jonas E. & Rubio-Ramírez, Juan F. & Shin, Minchul, 2023. "Macroeconomic forecasting and variable ordering in multivariate stochastic volatility models," Journal of Econometrics, Elsevier, vol. 235(2), pages 1054-1086.
    18. Martin, Gael M. & Loaiza-Maya, Rubén & Maneesoonthorn, Worapree & Frazier, David T. & Ramírez-Hassan, Andrés, 2022. "Optimal probabilistic forecasts: When do they work?," International Journal of Forecasting, Elsevier, vol. 38(1), pages 384-406.
    19. Markus Heinrich & Magnus Reif, 2020. "Real-Time Forecasting Using Mixed-Frequency VARS with Time-Varying Parameters," CESifo Working Paper Series 8054, CESifo.
    20. Pablo Guerrón‐Quintana & Molin Zhong, 2023. "Macroeconomic forecasting in times of crises," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(3), pages 295-320, April.
