Valid sequential inference on probability forecast performance<BR>[A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]

Valid sequential inference on probability forecast performance
[A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]

Author

Listed:

Alexander Henzi
Johanna F Ziegel

Abstract

SummaryProbability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts numerical scores such that a correct forecast achieves a minimal expected score. In this paper, we construct e-values for testing the statistical significance of score differences of competing forecasts in sequential settings. E-values have been proposed as an alternative to -values for hypothesis testing, and they can easily be transformed into conservative -values by taking the multiplicative inverse. The e-values proposed in this article are valid in finite samples without any assumptions on the data-generating processes. They also allow optional stopping, so a forecast user may decide to interrupt evaluation, taking into account the available data at any time, and still draw statistically valid inference, which is generally not true for classical -value-based tests. In a case study on post-processing of precipitation forecasts, state-of-the-art forecast dominance tests and e-values lead to the same conclusions.

Suggested Citation

Alexander Henzi & Johanna F Ziegel, 2022. "Valid sequential inference on probability forecast performance [A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]," Biometrika, Biometrika Trust, vol. 109(3), pages 647-663.

Handle: RePEc:oup:biomet:v:109:y:2022:i:3:p:647-663.

Download full text from publisher

As the access to this document is restricted, you may want to

for a different version of it.

References listed on IDEAS

Gneiting, Tilmann, 2011. "Making and Evaluating Point Forecasts," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 746-762.
Werner Ehm & Tilmann Gneiting & Alexander Jordan & Fabian Krüger, 2016. "Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(3), pages 505-562, June.
Alexander Henzi & Johanna F. Ziegel & Tilmann Gneiting, 2021. "Isotonic distributional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 963-993, November.
Andrew J. Patton, 2020. "Comparing Possibly Misspecified Forecasts," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 38(4), pages 796-809, October.
Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
- Diebold, Francis X & Mariano, Roberto S, 1995. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 13(3), pages 253-263, July.
- Francis X. Diebold & Roberto S. Mariano, 1994. "Comparing Predictive Accuracy," NBER Technical Working Papers 0169, National Bureau of Economic Research, Inc.
- Tom Doan, 2025. "DMARIANO: RATS procedure to compute Diebold-Mariano Forecast Comparison Test," Statistical Software Components RTS00055, Boston College Department of Economics.
Raffaella Giacomini & Halbert White, 2006. "Tests of Conditional Predictive Ability," Econometrica, Econometric Society, vol. 74(6), pages 1545-1578, November.
- Giacomini, Raffaella & White, Halbert, 2003. "Tests of Conditional Predictive Ability," University of California at San Diego, Economics Working Paper Series qt5jk0j5jh, Department of Economics, UC San Diego.
- Raffaella Giacomini & Halbert White, 2003. "Tests of Conditional Predictive Ability," Econometrics 0308001, University Library of Munich, Germany.
- Raffaella Giacomini & Halbert White, 2003. "Tests of conditional predictive ability," Boston College Working Papers in Economics 572, Boston College Department of Economics.
Eben Lazarus & Daniel J. Lewis & James H. Stock & Mark W. Watson, 2018. "HAR Inference: Recommendations for Practice," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(4), pages 541-559, October.
Roopesh Ranjan & Tilmann Gneiting, 2010. "Combining probability forecasts," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(1), pages 71-91, January.
Tilmann Gneiting & Fadoua Balabdaoui & Adrian E. Raftery, 2007. "Probabilistic forecasts, calibration and sharpness," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(2), pages 243-268, April.
R. Winkler & Javier Muñoz & José Cervera & José Bernardo & Gail Blattenberger & Joseph Kadane & Dennis Lindley & Allan Murphy & Robert Oliver & David Ríos-Insua, 1996. "Scoring rules and the evaluation of probabilities," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 5(1), pages 1-60, June.
Eben Lazarus & Daniel J. Lewis & James H. Stock & Mark W. Watson, 2018. "HAR Inference: Recommendations for Practice Rejoinder," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(4), pages 574-575, October.
Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
Yen, Yu-Min & Yen, Tso-Jung, 2021. "Testing forecast accuracy of expectiles and quantiles with the extremal consistent loss functions," International Journal of Forecasting, Elsevier, vol. 37(2), pages 733-758.
Glenn Shafer, 2021. "Testing by betting: A strategy for statistical and scientific communication," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(2), pages 407-431, April.

Full references (including those not matched with items on IDEAS)

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Di Leonforte, Davide Carmelo & Deliu, Nina, 2025. "Dynamic testing of volatility models’ calibration using E-values," Statistics & Probability Letters, Elsevier, vol. 226(C).
Yunda Hao & Peter Grünwald & Tyron Lardy & Long Long & Reuben Adams, 2024. "E-values for k-Sample Tests with Exponential Families," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(1), pages 596-636, February.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

David T. Frazier & Donald S. Poskitt, 2025. "Sequential Scoring Rule Evaluation for Forecast Method Selection," Papers 2505.09090, arXiv.org.
Fissler Tobias & Ziegel Johanna F., 2021. "On the elicitability of range value at risk," Statistics & Risk Modeling, De Gruyter, vol. 38(1-2), pages 25-46, January.
Martin, Gael M. & Loaiza-Maya, Rubén & Maneesoonthorn, Worapree & Frazier, David T. & Ramírez-Hassan, Andrés, 2022. "Optimal probabilistic forecasts: When do they work?," International Journal of Forecasting, Elsevier, vol. 38(1), pages 384-406.
- Ruben Loaiza-Maya & Gael M. Martin & David T. Frazier & Worapree Maneesoonthorn & Andres Ramirez Hassan, 2020. "Optimal probabilistic forecasts: When do they work?," Monash Econometrics and Business Statistics Working Papers 33/20, Monash University, Department of Econometrics and Business Statistics.
- Gael M. Martin & Rub'en Loaiza-Maya & David T. Frazier & Worapree Maneesoonthorn & Andr'es Ram'irez Hassan, 2020. "Optimal probabilistic forecasts: When do they work?," Papers 2009.09592, arXiv.org.
Chen, Yi-Ting & Liu, Chu-An & Su, Jiun-Hua, 2025. "Bregman model averaging for forecast combination," Journal of Econometrics, Elsevier, vol. 251(C).
Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
- Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
Tobias Fissler & Yannick Hoga, 2024. "How to Compare Copula Forecasts?," Papers 2410.04165, arXiv.org.
Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
- Kajal Lahiri & Liu Yang, 2012. "Forecasting Binary Outcomes," Discussion Papers 12-09, University at Albany, SUNY, Department of Economics.
Hajo Holzmann & Matthias Eulert, 2014. "The role of the information set for forecasting - with applications to risk management," Papers 1404.7653, arXiv.org.
Tobias Fissler & Jana Hlavinová & Birgit Rudloff, 2021. "Elicitability and identifiability of set-valued measures of systemic risk," Finance and Stochastics, Springer, vol. 25(1), pages 133-165, January.
Yen, Yu-Min & Yen, Tso-Jung, 2021. "Testing forecast accuracy of expectiles and quantiles with the extremal consistent loss functions," International Journal of Forecasting, Elsevier, vol. 37(2), pages 733-758.
Fritzsch, Simon & Timphus, Maike & Weiß, Gregor, 2024. "Marginals versus copulas: Which account for more model risk in multivariate risk forecasting?," Journal of Banking & Finance, Elsevier, vol. 158(C).
Marc-Oliver Pohle, 2020. "The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation," Papers 2005.01835, arXiv.org.
Onno Kleen, 2024. "Scaling and measurement error sensitivity of scoring rules for distribution forecasts," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(5), pages 833-849, August.
Coroneo, Laura & Iacone, Fabrizio & Profumo, Fabio, 2024. "Survey density forecast comparison in small samples," International Journal of Forecasting, Elsevier, vol. 40(4), pages 1486-1504.
Tobias Fissler & Yannick Hoga, 2021. "Backtesting Systemic Risk Forecasts using Multi-Objective Elicitability," Papers 2104.10673, arXiv.org, revised Feb 2022.
Knüppel, Malte & Schultefrankenfeld, Guido, 2019. "Assessing the uncertainty in central banks’ inflation outlooks," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1748-1769.
- Knüppel, Malte & Schultefrankenfeld, Guido, 2018. "Assessing the uncertainty in central banks' inflation outlooks," Discussion Papers 56/2018, Deutsche Bundesbank.
Oguzhan Akgun & Alain Pirotte & Giovanni Urga & Zhenlin Yang, 2025. "Testing Clustered Equal Predictive Ability with Unknown Clusters," Papers 2507.14621, arXiv.org, revised Jul 2025.
Yang, Dazhi & van der Meer, Dennis, 2021. "Post-processing in solar forecasting: Ten overarching thinking tools," Renewable and Sustainable Energy Reviews, Elsevier, vol. 140(C).
Granziera, Eleonora & Sekhposyan, Tatevik, 2019. "Predicting relative forecasting performance: An empirical investigation," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1636-1657.
- Granziera, Eleonora & Sekhposyan, Tatevik, 2018. "Predicting relative forecasting performance: An empirical investigation," Bank of Finland Research Discussion Papers 23/2018, Bank of Finland.
Luisa Bisaglia & Matteo Grigoletto, 2021. "A new time-varying model for forecasting long-memory series," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 139-155, March.

More about this item

Keywords

; ; ; ; ; ; ;

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:oup:biomet:v:109:y:2022:i:3:p:647-663.. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Oxford University Press (email available below). General contact details of provider: https://academic.oup.com/biomet .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

Valid sequential inference on probability forecast performance
[A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data

Browse Econ Literature

More features

Valid sequential inference on probability forecast performance[A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data

Valid sequential inference on probability forecast performance
[A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems]