IDEAS home Printed from https://ideas.repec.org/p/oxf/wpaper/905.html

Modelling Non-stationary 'Big Data'

Author

Listed:
  • Jennifer Castle
  • Jurgen Doornik
  • David Hendry

Abstract

Seeking substantive relationships among vast numbers of spurious connections when modelling Big Data requires an appropriate approach. Big Data are useful if they can increase the probability that the data generation process is nested in the postulated model, increase the power of specification and mis-specification tests, and yet do not raise the chances of adventitious significance. Simply choosing the best-fitting equation, or trying hundreds of empirical fits and selecting a preferred one (perhaps contradicted by others that go unreported), is not going to lead to a useful outcome. Wide-sense non-stationarity (including both distributional shifts and integrated data) must be taken into account. The paper discusses the use of principal components analysis to identify cointegrating relations as a route to handling that aspect of non-stationary big data, along with saturation to handle distributional shifts, and models the monthly UK unemployment rate, using both macroeconomic and Google Trends data, searching over 3000 explanatory variables and yet identifying a parsimonious, well-specified and theoretically interpretable model specification.
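
As a rough illustration of the principal-components idea mentioned in the abstract, the sketch below simulates two integrated series that share one stochastic trend and reads a candidate cointegrating combination off the smallest-variance principal component of the levels. This is not the authors' procedure or code: the simulated data, the coefficient 2.0, the noise scale, and the helper name pca_components are all assumptions made for the example, and it leaves out saturation and model selection entirely.

```python
import numpy as np


def pca_components(levels):
    """Eigenvalues (descending) and matching eigenvectors of the
    sample covariance matrix of the columns of `levels`."""
    centered = levels - levels.mean(axis=0)
    cov = centered.T @ centered / (len(levels) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Hypothetical data: two I(1) series driven by one common random walk.
rng = np.random.default_rng(0)
n_obs = 500
trend = np.cumsum(rng.normal(size=n_obs))
y1 = trend + rng.normal(scale=0.5, size=n_obs)
y2 = 2.0 * trend + rng.normal(scale=0.5, size=n_obs)
levels = np.column_stack([y1, y2])

eigvals, eigvecs = pca_components(levels)

# The largest component tracks the shared trend; the smallest one is the
# direction in which the trend (approximately) cancels, so it is a
# candidate cointegrating combination.
candidate = eigvecs[:, -1]
candidate = candidate / candidate[0]  # normalise coefficient on y1 to 1
residual = levels @ candidate         # should look stationary

print("candidate combination (y1, y2):", candidate)
print("eigenvalues:", eigvals)
```

By construction y1 - 0.5 * y2 is stationary here, so the normalised coefficient on y2 should come out close to -0.5. In applied work the stationarity of such a combination would still need to be tested formally rather than assumed.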

Suggested Citation

  • Jennifer Castle & Jurgen Doornik & David Hendry, 2020. "Modelling Non-stationary 'Big Data'," Economics Series Working Papers 905, University of Oxford, Department of Economics.
  • Handle: RePEc:oxf:wpaper:905

    Download full text from publisher

    File URL: https://ora.ox.ac.uk/objects/uuid:fdad75eb-06e4-4a2c-8401-0b8236cec292
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
    2. Jennifer Castle & Takamitsu Kurita, 2019. "Modelling and forecasting the dollar-pound exchange rate in the presence of structural breaks," Economics Series Working Papers 866, University of Oxford, Department of Economics.
    3. Castle, Jennifer L. & Kurita, Takamitsu, 2021. "A dynamic econometric analysis of the dollar-pound exchange rate in an era of structural breaks and policy regime shifts," Journal of Economic Dynamics and Control, Elsevier, vol. 128(C).
    4. Takamitsu Kurita & B. Nielsen, 2018. "Partial cointegrated vector autoregressive models with structural breaks in deterministic terms," Economics Papers 2018-W03, Economics Group, Nuffield College, University of Oxford.
    5. Banerjee, Anindya & Marcellino, Massimiliano & Masten, Igor, 2014. "Forecasting with factor-augmented error correction models," International Journal of Forecasting, Elsevier, vol. 30(3), pages 589-612.
    6. Stillwagon, Josh R., 2016. "Non-linear exchange rate relationships: An automated model selection approach with indicator saturation," The North American Journal of Economics and Finance, Elsevier, vol. 37(C), pages 84-109.
    7. Takamitsu Kurita & Bent Nielsen, 2019. "Partial Cointegrated Vector Autoregressive Models with Structural Breaks in Deterministic Terms," Econometrics, MDPI, Open Access Journal, vol. 7(4), pages 1-35, October.
    8. Carlomagno, Guillermo & Espasa, Antoni, 2014. "The pairwise approach to model a large set of disaggregates with common trends," DES - Working Papers. Statistics and Econometrics. WS ws141309, Universidad Carlos III de Madrid. Departamento de Estadística.
    9. Jurgen A. Doornik & David F. Hendry & Steve Cook, 2015. "Statistical model selection with “Big Data”," Cogent Economics & Finance, Taylor & Francis Journals, vol. 3(1), pages 1045216-104, December.
    10. Barigozzi, Matteo & Lippi, Marco & Luciani, Matteo, 2021. "Large-dimensional Dynamic Factor Models: Estimation of Impulse–Response Functions with I(1) cointegrated factors," Journal of Econometrics, Elsevier, vol. 221(2), pages 455-482.
    11. David H. Bernstein & Bent Nielsen, 2019. "Asymptotic Theory for Cointegration Analysis When the Cointegration Rank Is Deficient," Econometrics, MDPI, Open Access Journal, vol. 7(1), pages 1-24, January.
    12. Ericsson, Neil R., 2016. "Eliciting GDP forecasts from the FOMC’s minutes around the financial crisis," International Journal of Forecasting, Elsevier, vol. 32(2), pages 571-583.
    13. Rangan Gupta & Alain Kabundi & Stephen Miller & Josine Uwilingiye, 2014. "Using large data sets to forecast sectoral employment," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 23(2), pages 229-264, June.
    14. Ericsson, Neil R & Hendry, David F & Mizon, Grayham E, 1998. "Exogeneity, Cointegration, and Economic Policy Analysis," Journal of Business & Economic Statistics, American Statistical Association, vol. 16(4), pages 370-387, October.
    15. Andrea Carriero & Francesco Corsello & Massimiliano Marcellino, 2018. "The global component of inflation volatility," Temi di discussione (Economic working papers) 1170, Bank of Italy, Economic Research and International Relations Area.
    16. Ådne Cappelen & Torbjørn Eika, 2017. "Immigration and the Dutch disease. A counterfactual analysis of the Norwegian resource boom 2004-2013," Discussion Papers 860, Statistics Norway, Research Department.
    17. Charles Rahal, 2015. "Housing Market Forecasting with Factor Combinations," Discussion Papers 15-05, Department of Economics, University of Birmingham.
    18. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2020. "Robust Discovery of Regression Models," Economics Papers 2020-W04, Economics Group, Nuffield College, University of Oxford.
    19. Kalaitzi, Athanasia S. & Chamberlain, Trevor W., 2020. "Merchandise exports and economic growth: multivariate time series analysis for the United Arab Emirates," LSE Research Online Documents on Economics 103781, London School of Economics and Political Science, LSE Library.
    20. Charles Rahal, 2015. "Housing Market Forecasting with Factor Combinations," Discussion Papers 15-05r, Department of Economics, University of Birmingham.

    More about this item

    Keywords

    Cointegration; Big Data; Model Selection; Outliers; Indicator Saturation; Autometrics;

    JEL classification:

    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • Q54 - Agricultural and Natural Resource Economics; Environmental and Ecological Economics - - Environmental Economics - - - Climate; Natural Disasters and their Management; Global Warming
