Printed from https://ideas.repec.org/p/oxf/wpaper/905.html

Modelling Non-stationary 'Big Data'

Authors

  • Jennifer Castle
  • Jurgen Doornik
  • David Hendry

Abstract

Seeking substantive relationships among vast numbers of spurious connections when modelling Big Data requires an appropriate approach. Big Data are useful if they can increase the probability that the data generation process is nested in the postulated model, increase the power of specification and mis-specification tests, and yet do not raise the chances of adventitious significance. Simply choosing the best-fitting equation, or trying hundreds of empirical fits and selecting a preferred one (perhaps contradicted by others that go unreported), is not going to lead to a useful outcome. Wide-sense non-stationarity (including both distributional shifts and integrated data) must be taken into account. The paper discusses the use of principal components analysis to identify cointegrating relations as a route to handling the integrated aspect of non-stationary Big Data, along with saturation to handle distributional shifts. It then models the monthly UK unemployment rate using both macroeconomic and Google Trends data, searching over 3000 explanatory variables and yet identifying a parsimonious, well-specified and theoretically interpretable model specification.
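The idea of using principal components to identify cointegrating relations can be illustrated with a minimal sketch. It assumes, in the spirit of Harris (1997) in the references listed below, that when n integrated series share n - r common stochastic trends, the eigenvectors of the covariance matrix of the levels with the r smallest eigenvalues estimate the cointegrating vectors, while the largest eigenvalues reflect the common trends. The function name pca_cointegrating_vectors and the simulated two-series data are hypothetical; this is not the authors' procedure, which also involves model selection over many candidates and saturation for distributional shifts.

```python
import numpy as np


def pca_cointegrating_vectors(levels: np.ndarray, r: int):
    """Hypothetical helper: return all eigenvalues of the sample covariance
    of the levels (ascending) and the r eigenvectors with the smallest
    eigenvalues, used here as candidate cointegrating vectors."""
    x = levels - levels.mean(axis=0)        # demean each series
    cov = x.T @ x / x.shape[0]              # sample covariance of the I(1) levels
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvals, eigvecs[:, :r]


# Assumed simulated data: two I(1) series sharing one random-walk trend w,
# y1 = w + u1 and y2 = 2*w + u2, so y2 - 2*y1 = u2 - 2*u1 is stationary.
rng = np.random.default_rng(0)
T = 2000
w = np.cumsum(rng.standard_normal(T))       # common stochastic trend
y1 = w + rng.standard_normal(T)
y2 = 2.0 * w + rng.standard_normal(T)
levels = np.column_stack([y1, y2])

eigvals, vecs = pca_cointegrating_vectors(levels, r=1)
beta = vecs[:, 0] / vecs[1, 0]              # normalize on y2; true vector is (-2, 1)
print("eigenvalues:", eigvals)
print("estimated cointegrating vector:", beta)
```

With one shared random walk, the smallest eigenvalue should stay near the variance of the stationary combination while the largest grows with the sample size, so the normalized estimate should lie close to (-2, 1).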

Suggested Citation

  • Jennifer Castle & Jurgen Doornik & David Hendry, 2020. "Modelling Non-stationary 'Big Data'," Economics Series Working Papers 905, University of Oxford, Department of Economics.
  • Handle: RePEc:oxf:wpaper:905

    Download full text from publisher

    File URL: https://www.economics.ox.ac.uk/materials/working_papers/5299/905.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Banerjee, Anindya & Marcellino, Massimiliano & Masten, Igor, 2014. "Forecasting with factor-augmented error correction models," International Journal of Forecasting, Elsevier, vol. 30(3), pages 589-612.
    2. Dijkstra, Theo, 1983. "Some comments on maximum likelihood and partial least squares methods," Journal of Econometrics, Elsevier, vol. 22(1-2), pages 67-90.
    3. Johansen, Soren, 1992. "Determination of Cointegration Rank in the Presence of a Linear Trend," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 54(3), pages 383-397, August.
    4. Takamitsu Kurita & Bent Nielsen, 2019. "Partial Cointegrated Vector Autoregressive Models with Structural Breaks in Deterministic Terms," Econometrics, MDPI, Open Access Journal, vol. 7(4), pages 1-35, October.
    5. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry & Felix Pretis, 2015. "Detecting Location Shifts during Model Selection by Step-Indicator Saturation," Econometrics, MDPI, Open Access Journal, vol. 3(2), pages 1-25, April.
    6. Norman R. Swanson & Weiqi Xiong, 2018. "Big data analytics in economics: What have we learned so far, and where should we go from here?," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 51(3), pages 695-746, August.
    7. Godfrey, Leslie G, 1978. "Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables," Econometrica, Econometric Society, vol. 46(6), pages 1303-1310, November.
    8. Harbo, Ingrid & Johansen, Soren & Nielsen, Bent & Rahbek, Anders, 1998. "Asymptotic Inference on Cointegrating Rank in Partial Systems," Journal of Business & Economic Statistics, American Statistical Association, vol. 16(4), pages 388-399, October.
    9. Søren Johansen & Rocco Mosconi & Bent Nielsen, 2000. "Cointegration analysis in the presence of structural breaks in the deterministic trend," Econometrics Journal, Royal Economic Society, vol. 3(2), pages 216-249.
    10. David F. Hendry, 2004. "The Nobel Memorial Prize for Clive W. J. Granger," Scandinavian Journal of Economics, Wiley Blackwell, vol. 106(2), pages 187-213, June.
    11. Johansen, Soren, 1995. "Likelihood-Based Inference in Cointegrated Vector Autoregressive Models," OUP Catalogue, Oxford University Press, number 9780198774501.
    12. Castle, Jennifer L. & Clements, Michael P. & Hendry, David F., 2013. "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, Elsevier, vol. 177(2), pages 305-319.
    13. Ben S. Bernanke & Jean Boivin & Piotr Eliasz, 2005. "Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach," The Quarterly Journal of Economics, Oxford University Press, vol. 120(1), pages 387-422.
    14. Susan Athey, 2018. "The Impact of Machine Learning on Economics," NBER Chapters, in: The Economics of Artificial Intelligence: An Agenda, pages 507-547, National Bureau of Economic Research, Inc.
    15. David F. Hendry, 2001. "Modelling UK inflation, 1875-1991," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(3), pages 255-275.
    16. Mario Forni & Marc Hallin & Marco Lippi & Lucrezia Reichlin, 2000. "The Generalized Dynamic-Factor Model: Identification And Estimation," The Review of Economics and Statistics, MIT Press, vol. 82(4), pages 540-554, November.
    17. Sims, Christopher A & Stock, James H & Watson, Mark W, 1990. "Inference in Linear Time Series Models with Some Unit Roots," Econometrica, Econometric Society, vol. 58(1), pages 113-144, January.
    18. Harris, David, 1997. "Principal Components Analysis of Cointegrated Time Series," Econometric Theory, Cambridge University Press, vol. 13(4), pages 529-557, February.
    19. Jurgen A. Doornik & David F. Hendry & Steve Cook, 2015. "Statistical model selection with “Big Data”," Cogent Economics & Finance, Taylor & Francis Journals, vol. 3(1), article 1045216, December.
    20. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    21. Bai, Jushan, 2004. "Estimating cross-section common stochastic trends in nonstationary panel data," Journal of Econometrics, Elsevier, vol. 122(1), pages 137-183, September.
    22. Jurgen A. Doornik & Henrik Hansen, 2008. "An Omnibus Test for Univariate and Multivariate Normality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 927-939, December.

    More about this item

    Keywords

    Cointegration; Big Data; Model Selection; Outliers; Indicator Saturation; Autometrics

    JEL classification:

    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • Q54 - Agricultural and Natural Resource Economics; Environmental and Ecological Economics - - Environmental Economics - - - Climate; Natural Disasters and their Management; Global Warming


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.