IDEAS home Printed from https://ideas.repec.org/a/taf/oaefxx/v3y2015i1p1045216.html

Statistical model selection with “Big Data”

Authors

  • Jurgen A. Doornik
  • David F. Hendry
  • Steve Cook

Abstract

Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.

Suggested Citation

  • Jurgen A. Doornik & David F. Hendry & Steve Cook, 2015. "Statistical model selection with “Big Data”," Cogent Economics & Finance, Taylor & Francis Journals, vol. 3(1), pages 1045216-104, December.
  • Handle: RePEc:taf:oaefxx:v:3:y:2015:i:1:p:1045216
    DOI: 10.1080/23322039.2015.1045216

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/23322039.2015.1045216
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2011. "Evaluating Automatic Model Selection," Journal of Time Series Econometrics, De Gruyter, vol. 3(1), pages 1-33, February.
    2. Hendry, David F. & Massmann, Michael, 2007. "Co-Breaking: Recent Advances and a Synopsis of the Literature," Journal of Business & Economic Statistics, American Statistical Association, vol. 25, pages 33-51, January.
    3. David Hendry & Jurgen A. Doornik & Felix Pretis, 2013. "Step-indicator Saturation," Economics Series Working Papers 658, University of Oxford, Department of Economics.
    4. Hendry, David F. & Johansen, Søren, 2015. "Model Discovery and Trygve Haavelmo’s Legacy," Econometric Theory, Cambridge University Press, vol. 31(1), pages 93-114, February.
    5. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    6. David F. Hendry & Hans-Martin Krolzig, 2005. "The Properties of Automatic "GETS" Modelling," Economic Journal, Royal Economic Society, vol. 115(502), pages 32-61, March.
    7. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2013. "Model Selection in Equations with Many ‘Small’ Effects," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 75(1), pages 6-22, February.
    8. David F. Hendry & Bent Nielsen, 2007. "Preface to Econometric Modeling: A Likelihood Approach," Introductory Chapters, in: Econometric Modeling: A Likelihood Approach, Princeton University Press.
    9. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2012. "Model selection when there are multiple breaks," Journal of Econometrics, Elsevier, vol. 169(2), pages 239-246.
    10. Carlos Santos & David Hendry & Soren Johansen, 2008. "Automatic selection of indicators in a fully saturated regression," Computational Statistics, Springer, vol. 23(2), pages 317-335, April.
    11. Godfrey, Leslie G., 1978. "Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables," Econometrica, Econometric Society, vol. 46(6), pages 1303-1310, November.
    12. Jurgen A. Doornik & Henrik Hansen, 2008. "An Omnibus Test for Univariate and Multivariate Normality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 927-939, December.
    13. David F. Hendry & Carlos Santos, 2005. "Regression Models with Data‐based Indicator Variables," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 67(5), pages 571-595, October.
    14. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    15. Govaerts, Bernadette & Hendry, David F. & Richard, Jean-Francois, 1994. "Encompassing in stationary linear dynamic models," Journal of Econometrics, Elsevier, vol. 63(1), pages 245-270, July.
    16. David Hendry & Carlos Santos, 2010. "An Automatic Test of Super Exogeneity," Economics Series Working Papers 476, University of Oxford, Department of Economics.
    17. Mizon, Grayham E. & Richard, Jean-Francois, 1986. "The Encompassing Principle and Its Application to Testing Non-nested Hypotheses," Econometrica, Econometric Society, vol. 54(3), pages 657-678, May.
    18. Castle, Jennifer L. & Hendry, David F., 2010. "A low-dimension portmanteau test for non-linearity," Journal of Econometrics, Elsevier, vol. 158(2), pages 231-245, October.
    19. Salkever, David S., 1976. "The use of dummy variables to compute predictions, prediction errors, and confidence intervals," Journal of Econometrics, Elsevier, vol. 4(4), pages 393-397, November.
    20. Jurgen A. Doornik, 2008. "Encompassing and Automatic Model Selection," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 915-925, December.
    21. David F. Hendry & Bent Nielsen, 2007. "The Bernoulli model, from Econometric Modeling: A Likelihood Approach," Introductory Chapters, in: Econometric Modeling: A Likelihood Approach, Princeton University Press.
    22. Engle, Robert F., 1982. "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, Econometric Society, vol. 50(4), pages 987-1007, July.

    Citations


    Cited by:

    1. Bennedsen, Mikkel & Hillebrand, Eric & Koopman, Siem Jan, 2021. "Modeling, forecasting, and nowcasting U.S. CO2 emissions using many macroeconomic predictors," Energy Economics, Elsevier, vol. 96(C).
    2. Castañeda, Gonzalo & Chávez-Juárez, Florian & Guerrero, Omar A., 2018. "How do governments determine policy priorities? Studying development strategies through spillover networks," Journal of Economic Behavior & Organization, Elsevier, vol. 154(C), pages 335-361.
    3. Iregui, Ana María & Núñez, Héctor M. & Otero, Jesús, 2021. "Testing the efficiency of inflation and exchange rate forecast revisions in a changing economic environment," Journal of Economic Behavior & Organization, Elsevier, vol. 187(C), pages 290-314.
    4. David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
    5. Xiaoyi Han & Lung-Fei Lee, 2016. "Bayesian Analysis of Spatial Panel Autoregressive Models With Time-Varying Endogenous Spatial Weight Matrices, Common Factors, and Random Coefficients," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 642-660, October.
    6. Schintler, Laurie A. & Fischer, Manfred M., 2018. "Big Data and Regional Science: Opportunities, Challenges, and Directions for Future Research," Working Papers in Regional Science 2018/02, WU Vienna University of Economics and Business.
    7. Jennifer Castle & Jurgen Doornik & David Hendry, 2020. "Modelling Non-stationary 'Big Data'," Economics Series Working Papers 905, University of Oxford, Department of Economics.
    8. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2020. "Robust Discovery of Regression Models," Economics Papers 2020-W04, Economics Group, Nuffield College, University of Oxford.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
    2. David F. Hendry & Felix Pretis, 2013. "Anthropogenic influences on atmospheric CO2," Chapters, in: Roger Fouquet (ed.), Handbook on Energy and Climate Change, chapter 12, pages 287-326, Edward Elgar Publishing.
    3. Jennifer Castle & David Hendry, 2010. "Automatic Selection for Non-linear Models," Economics Series Working Papers 473, University of Oxford, Department of Economics.
    4. Stillwagon, Josh R., 2016. "Non-linear exchange rate relationships: An automated model selection approach with indicator saturation," The North American Journal of Economics and Finance, Elsevier, vol. 37(C), pages 84-109.
    5. Jennifer Castle & David Hendry, 2013. "Semi-automatic Non-linear Model selection," Economics Series Working Papers 654, University of Oxford, Department of Economics.
    6. Josh R. Stillwagon, 2015. "TIPS and the VIX: Non-linear Spillovers from Financial Panic to Breakeven Inflation," Working Papers 1502, Trinity College, Department of Economics.
    7. Hendry, David F., 2018. "Deciding between alternative approaches in macroeconomics," International Journal of Forecasting, Elsevier, vol. 34(1), pages 119-135.
    8. Hendry, David F. & Mizon, Grayham E., 2011. "Econometric Modelling of Time Series with Outlying Observations," Journal of Time Series Econometrics, De Gruyter, vol. 3(1), pages 1-26, February.
    9. Ericsson, Neil R., 2016. "Testing for and estimating structural breaks and other nonlinearities in a dynamic monetary sector," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 20(4), pages 377-398, September.
    10. Neil R. Ericsson, 2021. "Dynamic Econometrics in Action: A Biography of David F. Hendry," International Finance Discussion Papers 1311, Board of Governors of the Federal Reserve System (U.S.).
    11. Hendry, David F. & Mizon, Grayham E., 2014. "Unpredictability in economic analysis, econometric modeling and forecasting," Journal of Econometrics, Elsevier, vol. 182(1), pages 186-195.
    12. Andrew B. Martinez, 2020. "Forecast Accuracy Matters for Hurricane Damage," Econometrics, MDPI, Open Access Journal, vol. 8(2), pages 1-24, May.
    13. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2020. "Robust Discovery of Regression Models," Economics Papers 2020-W04, Economics Group, Nuffield College, University of Oxford.
    14. Castle, Jennifer L. & Hendry, David F., 2014. "Model selection in under-specified equations facing breaks," Journal of Econometrics, Elsevier, vol. 178(P2), pages 286-293.
    15. David Hendry & Carlos Santos, 2010. "An Automatic Test of Super Exogeneity," Economics Series Working Papers 476, University of Oxford, Department of Economics.
    16. Castle, Jennifer L. & Hendry, David F., 2009. "The long-run determinants of UK wages, 1860-2004," Journal of Macroeconomics, Elsevier, vol. 31(1), pages 5-28, March.
    17. Ericsson, Neil R., 2016. "Eliciting GDP forecasts from the FOMC’s minutes around the financial crisis," International Journal of Forecasting, Elsevier, vol. 32(2), pages 571-583.
    18. Søren Johansen & Lukasz Gatarek, 2014. "Optimal hedging with the cointegrated vector autoregressive model," CREATES Research Papers 2014-40, Department of Economics and Business Economics, Aarhus University.
    19. Søren Johansen & Bent Nielsen, 2014. "Outlier detection algorithms for least squares time series regression," Economics Papers 2014-W04, Economics Group, Nuffield College, University of Oxford.
    20. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2012. "Model selection when there are multiple breaks," Journal of Econometrics, Elsevier, vol. 169(2), pages 239-246.

    More about this item

    JEL classification:

    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • C22 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Time-Series Models; Dynamic Quantile Regressions; Dynamic Treatment Effect Models; Diffusion Processes


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.