Printed from https://ideas.repec.org/p/zbw/iwqwdp/112019.html

A comparison of machine learning model validation schemes for non-stationary time series data

Author

  • Schnaubelt, Matthias

Abstract

Machine learning is increasingly applied to time series data, as it constitutes an attractive alternative to forecasts based on traditional time series models. For independent and identically distributed observations, cross-validation is the prevalent scheme for estimating out-of-sample performance in both model selection and assessment. For time series data, however, it is unclear whether forward-validation schemes, i.e., schemes that keep the temporal order of observations, should be preferred. In this paper, we perform a comprehensive empirical study of eight common validation schemes. We introduce a study design that perturbs global stationarity by introducing a slow evolution of the underlying data-generating process. Our results demonstrate that, even for relatively small perturbations, commonly used cross-validation schemes often yield estimates with the largest bias and variance, and forward-validation schemes yield better estimates of the out-of-sample error. We provide an interpretation of these results in terms of an additional evolution-induced bias and the sample-size dependent estimation error. Using a large-scale financial data set, we demonstrate the practical significance in a replication study of a statistical arbitrage problem. We conclude with some general guidelines on the selection of suitable validation schemes for time series data.
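To make the distinction in the abstract concrete, the following Python sketch contrasts unshuffled k-fold cross-validation, where a fold's training data can lie after its validation block in time, with an expanding-window forward-validation scheme that preserves temporal order. It also includes a toy AR(1) process whose coefficient drifts slowly, as one hypothetical way to perturb global stationarity. This is an illustration under stated assumptions, not the author's code: the function names (`kfold_splits`, `forward_splits`, `uses_future`, `drifting_ar1`) and the linear-drift design are hypothetical, and the paper's eight schemes are not reproduced here.

```python
import random
from typing import Iterator, List, Tuple

# A split is a pair (training indices, validation indices) over 0..n-1.
Split = Tuple[List[int], List[int]]


def kfold_splits(n: int, k: int) -> Iterator[Split]:
    """Unshuffled k-fold cross-validation over contiguous blocks.

    Every observation is validated once; training data for a fold may
    lie after that fold in time.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # Spread the remainder over the first n % k folds.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def forward_splits(n: int, k: int, min_train: int) -> Iterator[Split]:
    """Forward validation with an expanding training window.

    Each validation block starts right after the last training index,
    so the temporal order of observations is preserved.
    """
    block = (n - min_train) // k
    if min_train < 1 or block < 1:
        raise ValueError("not enough observations for k forward blocks")
    for i in range(k):
        train_end = min_train + i * block
        # The last block absorbs any leftover observations.
        val_end = n if i == k - 1 else train_end + block
        yield list(range(train_end)), list(range(train_end, val_end))


def uses_future(train: List[int], val: List[int]) -> bool:
    """True if any training observation occurs after the validation start."""
    return max(train) > min(val)


def drifting_ar1(n: int, phi_start: float, phi_end: float, seed: int = 0) -> List[float]:
    """Toy AR(1) series whose coefficient drifts linearly over time.

    Hypothetical stand-in for a slowly evolving data-generating process.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for t in range(1, n):
        phi = phi_start + (phi_end - phi_start) * t / (n - 1)
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    series = drifting_ar1(200, phi_start=0.2, phi_end=0.8, seed=1)
    schemes = [("k-fold", kfold_splits(len(series), 5)),
               ("forward", forward_splits(len(series), 5, min_train=50))]
    for name, splits in schemes:
        flags = [uses_future(tr, va) for tr, va in splits]
        print(name, "folds training on future data:", sum(flags))
```

With 200 observations and five folds, four of the five k-fold splits train on observations that come after their validation block, while none of the forward splits do. Under a drifting process, that look-ahead is one plausible source of the evolution-induced bias the abstract describes; the sketch only exposes the index structure and does not estimate any error.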

Suggested Citation

  • Schnaubelt, Matthias, 2019. "A comparison of machine learning model validation schemes for non-stationary time series data," FAU Discussion Papers in Economics 11/2019, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
  • Handle: RePEc:zbw:iwqwdp:112019

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/209136/1/1684440068.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Bergmeir, Christoph & Hyndman, Rob J. & Koo, Bonsoo, 2018. "A note on the validity of cross-validation for evaluating autoregressive time series prediction," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 70-83.
    2. Andreou, Panayiotis C. & Charalambous, Chris & Martzoukos, Spiros H., 2008. "Pricing and trading European options by combining artificial neural networks and parametric models with implied parameters," European Journal of Operational Research, Elsevier, vol. 185(3), pages 1415-1433, March.
    3. Dominique Guégan, 2007. "Global and local stationary modelling in finance: theory and empirical evidence," Documents de travail du Centre d'Economie de la Sorbonne b07053, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    4. J. Scott Armstrong & Michael C. Grohman, 1972. "A Comparative Study of Methods for Long-Range Market Forecasting," Management Science, INFORMS, vol. 19(2), pages 211-221, October.
    5. Chandler, Gabriel, 2010. "Order selection for heteroscedastic autoregression: A study on concentration," Statistics & Probability Letters, Elsevier, vol. 80(23-24), pages 1904-1910, December.
    6. Callen, Jeffrey L. & Kwan, Clarence C. Y. & Yip, Patrick C. Y. & Yuan, Yufei, 1996. "Neural network forecasting of quarterly accounting earnings," International Journal of Forecasting, Elsevier, vol. 12(4), pages 475-482, December.
    7. Fischer, Thomas & Krauss, Christopher & Treichel, Alex, 2018. "Machine learning for time series forecasting - a simulation study," FAU Discussion Papers in Economics 02/2018, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    8. Rainer Dahlhaus & Liudas Giraitis, 1998. "On the Optimal Segment Length for Parameter Estimates for Locally Stationary Time Series," Journal of Time Series Analysis, Wiley Blackwell, vol. 19(6), pages 629-655, November.
    9. Bergmeir, Christoph & Costantini, Mauro & Benítez, José M., 2014. "On the usefulness of cross-validation for directional forecast evaluation," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 132-143.
    10. Dominique Guegan, 2007. "Global and local stationary modelling in finance: theory and empirical evidence," Post-Print halshs-00187875, HAL.

    Citations

    Citations are extracted by the CitEc Project. You can subscribe to its RSS feed for this item.


    Cited by:

    1. Filip Stanek, 2021. "Optimal Out-of-Sample Forecast Evaluation under Stationarity," CERGE-EI Working Papers wp712, The Center for Economic Research and Graduate Education - Economics Institute, Prague.
    2. Chatum Sankalpa & Somsak Kittipiyakul & Seksan Laitrakun, 2022. "Forecasting Short-Term Electricity Load Using Validated Ensemble Learning," Energies, MDPI, vol. 15(22), pages 1-30, November.
    3. Sabyasachi Kar & Amaani Bashir & Mayank Jain, 2021. "New Approaches to Forecasting Growth and Inflation: Big Data and Machine Learning," IEG Working Papers 446, Institute of Economic Growth.
    4. Schnaubelt, Matthias, 2020. "Deep reinforcement learning for the optimal placement of cryptocurrency limit orders," FAU Discussion Papers in Economics 05/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    5. Fabian Waldow & Matthias Schnaubelt & Christopher Krauss & Thomas Günter Fischer, 2021. "Machine Learning in Futures Markets," JRFM, MDPI, vol. 14(3), pages 1-14, March.
    6. Jian Guo & Saizhuo Wang & Lionel M. Ni & Heung-Yeung Shum, 2022. "Quant 4.0: Engineering Quantitative Investment with Automated, Explainable and Knowledge-driven Artificial Intelligence," Papers 2301.04020, arXiv.org.
    7. Rahman, Md Mamunur & Nguyen, Ruby & Lu, Liang, 2022. "Multi-level impacts of climate change and supply disruption events on a potato supply chain: An agent-based modeling approach," Agricultural Systems, Elsevier, vol. 201(C).
    8. Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fischer, Thomas & Krauss, Christopher & Treichel, Alex, 2018. "Machine learning for time series forecasting - a simulation study," FAU Discussion Papers in Economics 02/2018, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    2. Zachary F. Fisher & Younghoon Kim & Barbara L. Fredrickson & Vladas Pipiras, 2022. "Penalized Estimation and Forecasting of Multiple Subject Intensive Longitudinal Data," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 1-29, June.
    3. Michael D. Hunter & Haya Fatimah & Marina A. Bornovalova, 2022. "Two Filtering Methods of Forecasting Linear and Nonlinear Dynamics of Intensive Longitudinal Data," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 477-505, June.
    4. Filip Stanek, 2021. "Optimal Out-of-Sample Forecast Evaluation under Stationarity," CERGE-EI Working Papers wp712, The Center for Economic Research and Graduate Education - Economics Institute, Prague.
    5. Filip Staněk, 2023. "Optimal out‐of‐sample forecast evaluation under stationarity," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(8), pages 2249-2279, December.
    6. Tashman, Leonard J., 2000. "Out-of-sample tests of forecasting accuracy: an analysis and review," International Journal of Forecasting, Elsevier, vol. 16(4), pages 437-450.
    7. Qi Guo & Bruno Remillard & Anatoliy Swishchuk, 2020. "Multivariate General Compound Point Processes in Limit Order Books," Risks, MDPI, vol. 8(3), pages 1-20, September.
    8. Mariana Oliveira & Luís Torgo & Vítor Santos Costa, 2021. "Evaluation Procedures for Forecasting with Spatiotemporal Data," Mathematics, MDPI, vol. 9(6), pages 1-27, March.
    9. Alessandro Casini & Pierre Perron, 2021. "Change-Point Analysis of Time Series with Evolutionary Spectra," Papers 2106.02031, arXiv.org, revised Jun 2021.
    10. Philippe Goulet Coulombe & Maxime Leroux & Dalibor Stevanovic & Stéphane Surprenant, 2022. "How is machine learning useful for macroeconomic forecasting?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(5), pages 920-964, August.
    11. Olson, Dennis & Mossman, Charles, 2003. "Neural network forecasts of Canadian stock returns using accounting ratios," International Journal of Forecasting, Elsevier, vol. 19(3), pages 453-465.
    12. Hewamalage, Hansika & Bergmeir, Christoph & Bandara, Kasun, 2021. "Recurrent Neural Networks for Time Series Forecasting: Current status and future directions," International Journal of Forecasting, Elsevier, vol. 37(1), pages 388-427.
    13. Gary S. Anderson & Alena Audzeyeva, 2019. "A Coherent Framework for Predicting Emerging Market Credit Spreads with Support Vector Regression," Finance and Economics Discussion Series 2019-074, Board of Governors of the Federal Reserve System (U.S.).
    14. Thomas Despois & Catherine Doz, 2022. "Identifying and interpreting the factors in factor models via sparsity : Different approaches," Working Papers halshs-03626503, HAL.
    15. Ioannis Kyriakou & Parastoo Mousavi & Jens Perch Nielsen & Michael Scholz, 2021. "Short-Term Exuberance and Long-Term Stability: A Simultaneous Optimization of Stock Return Predictions for Short and Long Horizons," Mathematics, MDPI, vol. 9(6), pages 1-19, March.
    16. Richard Schnorrenberger & Aishameriane Schmidt & Guilherme Valle Moura, 2024. "Harnessing Machine Learning for Real-Time Inflation Nowcasting," Working Papers 806, DNB.
    17. Armstrong, J Scott, 1978. "Forecasting with Econometric Methods: Folklore versus Fact," The Journal of Business, University of Chicago Press, vol. 51(4), pages 549-564, October.
    18. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    19. Huber, Jakob & Stuckenschmidt, Heiner, 2020. "Daily retail demand forecasting using machine learning with emphasis on calendric special days," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1420-1438.
    20. Georgia Koppe & Hazem Toutounji & Peter Kirsch & Stefanie Lis & Daniel Durstewitz, 2019. "Identifying nonlinear dynamical systems via generative recurrent neural networks with applications to fMRI," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-35, August.

    More about this item

    Keywords

    machine learning; model selection; model validation; time series; cross-validation;

    NEP fields

    This paper has been announced in the following NEP Reports:


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:zbw:iwqwdp:112019. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ZBW - Leibniz Information Centre for Economics. General contact details of provider: https://edirc.repec.org/data/vierlde.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.