Printed from https://ideas.repec.org/p/oxf/wpaper/735.html

Statistical Model Selection with 'Big Data'

Authors

  • David Hendry
  • Jurgen A. Doornik

Abstract

Big Data offer potential benefits for statistical modelling, but confront problems like an excess of false positives, mistaking correlations for causes, ignoring sampling biases, and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
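
The abstract's concern about an excess of false positives, and its call for tight significance levels, can be illustrated with a small sketch. This is not the authors' procedure and is not Autometrics. It rests on a simplifying assumption: each irrelevant candidate variable has a t-statistic that behaves like an independent standard normal draw. Under that assumption, about alpha times N irrelevant variables are retained by chance when N candidates are tested at significance level alpha. The function names and the figure of 10,000 candidates are hypothetical, chosen only for illustration.

```python
import random
from statistics import NormalDist


def expected_false_positives(n_candidates: int, alpha: float) -> float:
    """Expected number of irrelevant variables retained by chance."""
    return n_candidates * alpha


def simulate_false_positives(n_candidates: int, alpha: float, seed: int = 0) -> int:
    """Count null t-statistics that exceed the two-sided critical value.

    Assumption: each irrelevant candidate's t-statistic is an independent
    standard normal draw (large sample, orthogonal regressors).
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return sum(abs(rng.gauss(0.0, 1.0)) > critical for _ in range(n_candidates))


if __name__ == "__main__":
    n = 10_000  # hypothetical number of irrelevant candidate variables
    for alpha in (0.05, 0.01, 0.001):
        expected = expected_false_positives(n, alpha)
        simulated = simulate_false_positives(n, alpha)
        print(f"alpha={alpha}: expected {expected:.0f}, simulated {simulated}")
```

With many candidates, a conventional 5% level would be expected to retain hundreds of irrelevant variables in this setup, whereas a level near 1/N keeps the expected count close to one, which is one way to read the abstract's emphasis on tight significance levels.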

Suggested Citation

  • David Hendry & Jurgen A. Doornik, 2014. "Statistical Model Selection with 'Big Data'," Economics Series Working Papers 735, University of Oxford, Department of Economics.
  • Handle: RePEc:oxf:wpaper:735

    Download full text from publisher

    File URL: https://ora.ox.ac.uk/objects/uuid:9ccd177f-9a53-4b4f-bc27-2387c69e9ae4
    Download Restriction: no

    Citations

    Cited by:

    1. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2021. "Modelling non-stationary ‘Big Data’," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1556-1575.
    2. Rahul Verma & Rajesh Mohnot, 2023. "Relative Impact of the U.S. Energy Market Sentiments on Stocks and ESG Index Returns: Evidence from GCC Countries," International Journal of Energy Economics and Policy, Econjournals, vol. 13(2), pages 290-300, March.
    3. Sara Muhammadullah & Amena Urooj & Faridoon Khan, 2021. "A revisit of the unemployment rate, interest rate, GDP growth and Inflation of Pakistan: Whether Structural break or unit root?," iRASD Journal of Economics, International Research Alliance for Sustainable Development (iRASD), vol. 3(2), pages 80-92, September.
    4. Bantis, Evripidis & Clements, Michael P. & Urquhart, Andrew, 2023. "Forecasting GDP growth rates in the United States and Brazil using Google Trends," International Journal of Forecasting, Elsevier, vol. 39(4), pages 1909-1924.
    5. Verhagen, Mark D., 2021. "Identifying and Improving Functional Form Complexity: A Machine Learning Framework," SocArXiv bka76, Center for Open Science.
    6. Iregui, Ana María & Núñez, Héctor M. & Otero, Jesús, 2021. "Testing the efficiency of inflation and exchange rate forecast revisions in a changing economic environment," Journal of Economic Behavior & Organization, Elsevier, vol. 187(C), pages 290-314.
    7. David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
    8. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    9. Xiaoyi Han & Lung-Fei Lee, 2016. "Bayesian Analysis of Spatial Panel Autoregressive Models With Time-Varying Endogenous Spatial Weight Matrices, Common Factors, and Random Coefficients," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 642-660, October.
    10. Bennedsen, Mikkel & Hillebrand, Eric & Koopman, Siem Jan, 2021. "Modeling, forecasting, and nowcasting U.S. CO2 emissions using many macroeconomic predictors," Energy Economics, Elsevier, vol. 96(C).
    11. Sabyasachi Kar & Amaani Bashir & Mayank Jain, 2021. "New Approaches to Forecasting Growth and Inflation: Big Data and Machine Learning," IEG Working Papers 446, Institute of Economic Growth.
    12. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2023. "Robust Discovery of Regression Models," Econometrics and Statistics, Elsevier, vol. 26(C), pages 31-51.
    13. Cai, Zhengzheng & Zhu, Yanli & Han, Xiaoyi, 2022. "Bayesian analysis of spatial dynamic panel data model with convex combinations of different spatial weight matrices: A reparameterized approach," Economics Letters, Elsevier, vol. 217(C).
    14. Castañeda, Gonzalo & Chávez-Juárez, Florian & Guerrero, Omar A., 2018. "How do governments determine policy priorities? Studying development strategies through spillover networks," Journal of Economic Behavior & Organization, Elsevier, vol. 154(C), pages 335-361.
    15. Schintler, Laurie A. & Fischer, Manfred M., 2018. "Big Data and Regional Science: Opportunities, Challenges, and Directions for Future Research," Working Papers in Regional Science 2018/02, WU Vienna University of Economics and Business.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
    2. David F. Hendry & Felix Pretis, 2013. "Anthropogenic influences on atmospheric CO2," Chapters, in: Roger Fouquet (ed.), Handbook on Energy and Climate Change, chapter 12, pages 287-326, Edward Elgar Publishing.
    3. Jennifer Castle & David Hendry, 2013. "Semi-automatic Non-linear Model selection," Economics Series Working Papers 654, University of Oxford, Department of Economics.
    4. Jennifer Castle & David Hendry, 2010. "Automatic Selection for Non-linear Models," Economics Series Working Papers 473, University of Oxford, Department of Economics.
    5. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2023. "Robust Discovery of Regression Models," Econometrics and Statistics, Elsevier, vol. 26(C), pages 31-51.
    6. Ericsson Neil R., 2016. "Testing for and estimating structural breaks and other nonlinearities in a dynamic monetary sector," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 20(4), pages 377-398, September.
    7. Andrew B. Martinez, 2020. "Forecast Accuracy Matters for Hurricane Damage," Econometrics, MDPI, vol. 8(2), pages 1-24, May.
    8. Stillwagon, Josh R., 2016. "Non-linear exchange rate relationships: An automated model selection approach with indicator saturation," The North American Journal of Economics and Finance, Elsevier, vol. 37(C), pages 84-109.
    9. Castle, Jennifer L. & Hendry, David F., 2014. "Model selection in under-specified equations facing breaks," Journal of Econometrics, Elsevier, vol. 178(P2), pages 286-293.
    10. Castle, Jennifer L. & Hendry, David F., 2009. "The long-run determinants of UK wages, 1860-2004," Journal of Macroeconomics, Elsevier, vol. 31(1), pages 5-28, March.
    11. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry & Ragnar Nymoen, 2014. "Misspecification Testing: Non-Invariance of Expectations Models of Inflation," Econometric Reviews, Taylor & Francis Journals, vol. 33(5-6), pages 553-574, August.
    12. Josh R. Stillwagon, 2015. "TIPS and the VIX: Non-linear Spillovers from Financial Panic to Breakeven Inflation," Working Papers 1502, Trinity College, Department of Economics.
    13. Hendry, David F., 2018. "Deciding between alternative approaches in macroeconomics," International Journal of Forecasting, Elsevier, vol. 34(1), pages 119-135.
    14. Jennifer L. Castle & David F. Hendry & Andrew B. Martinez, 2022. "The Historical Role of Energy in UK Inflation and Productivity and Implications for Price Inflation in 2022," Working Papers 2022-001, The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting.
    15. Hendry David F & Mizon Grayham E, 2011. "Econometric Modelling of Time Series with Outlying Observations," Journal of Time Series Econometrics, De Gruyter, vol. 3(1), pages 1-26, February.
    16. David Hendry & Carlos Santos, 2010. "An Automatic Test of Super Exogeneity," Economics Series Working Papers 476, University of Oxford, Department of Economics.
    17. Neil R. Ericsson, 2021. "Dynamic Econometrics in Action: A Biography of David F. Hendry," International Finance Discussion Papers 1311, Board of Governors of the Federal Reserve System (U.S.).
    18. Hendry, David F. & Mizon, Grayham E., 2014. "Unpredictability in economic analysis, econometric modeling and forecasting," Journal of Econometrics, Elsevier, vol. 182(1), pages 186-195.
    19. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2021. "Modelling non-stationary ‘Big Data’," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1556-1575.
    20. Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.

    More about this item

    Keywords

    Big Data; Model Selection; Location Shifts; Autometrics;

    JEL classification:

    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • C22 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Time-Series Models; Dynamic Quantile Regressions; Dynamic Treatment Effect Models; Diffusion Processes
