Printed from https://ideas.repec.org/p/oxf/wpaper/735.html

Statistical Model Selection with 'Big Data'

Authors
  • David Hendry
  • Jurgen A. Doornik

Abstract

Big Data offer potential benefits for statistical modelling, but confront problems like an excess of false positives, mistaking correlations for causes, ignoring sampling biases, and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
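Two of the problems the abstract names, an excess of false positives and the immense number of possible models, can be made concrete with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names, the candidate counts, and the significance levels are all assumptions chosen for the example. It assumes each irrelevant candidate variable is tested at a nominal level alpha with correct size, so the expected number retained by chance is alpha times the number of irrelevant candidates, and it counts the 2^N subsets of N candidate variables.

```python
def expected_false_positives(n_irrelevant: int, alpha: float) -> float:
    """Expected number of irrelevant variables retained by chance.

    Assumes (hypothetically) that each irrelevant variable is tested at
    nominal significance level alpha and that the tests have correct size.
    By linearity of expectation, this does not require independent tests.
    """
    return n_irrelevant * alpha


def n_candidate_models(n_variables: int) -> int:
    """Number of distinct subsets (models) of n_variables candidate regressors."""
    return 2 ** n_variables


if __name__ == "__main__":
    n = 1000  # illustrative number of irrelevant candidate variables
    for alpha in (0.05, 0.01, 0.001):
        chance_retained = expected_false_positives(n, alpha)
        print(f"alpha={alpha}: about {chance_retained:g} retained by chance")

    # The subset count grows too fast for exhaustive search.
    digits = len(str(n_candidate_models(n)))
    print(f"{n} candidates give a model count with {digits} digits")
```

Under these assumptions, tightening alpha as the number of candidates grows keeps the expected count of spurious retentions small, which is one reading of the abstract's call for tight significance levels. The subset count shows why a selection procedure cannot simply enumerate every model.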

Suggested Citation

  • David Hendry & Jurgen A. Doornik, 2014. "Statistical Model Selection with 'Big Data'," Economics Series Working Papers 735, University of Oxford, Department of Economics.
  • Handle: RePEc:oxf:wpaper:735

    Download full text from publisher

    File URL: http://www.economics.ox.ac.uk/materials/papers/13640/paper735.pdf
    Download Restriction: no

    References listed on IDEAS

    1. David F. Hendry & Hans-Martin Krolzig, 2005. "The Properties of Automatic "GETS" Modelling," Economic Journal, Royal Economic Society, vol. 115(502), pages 32-61, March.
    2. David F. Hendry & Bent Nielsen, 2007. "Preface to Econometric Modeling: A Likelihood Approach," Introductory Chapters, in: Econometric Modeling: A Likelihood Approach, Princeton University Press.
    3. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2012. "Model selection when there are multiple breaks," Journal of Econometrics, Elsevier, vol. 169(2), pages 239-246.
    4. Godfrey, Leslie G, 1978. "Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables," Econometrica, Econometric Society, vol. 46(6), pages 1303-1310, November.
    5. David F. Hendry & Carlos Santos, 2005. "Regression Models with Data‐based Indicator Variables," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 67(5), pages 571-595, October.
    6. David Hendry & Carlos Santos, 2010. "An Automatic Test of Super Exogeneity," Economics Series Working Papers 476, University of Oxford, Department of Economics.
    7. Mizon, Grayham E & Richard, Jean-Francois, 1986. "The Encompassing Principle and Its Application to Testing Non-nested Hypotheses," Econometrica, Econometric Society, vol. 54(3), pages 657-678, May.
    8. Castle, Jennifer L. & Hendry, David F., 2010. "A low-dimension portmanteau test for non-linearity," Journal of Econometrics, Elsevier, vol. 158(2), pages 231-245, October.
    9. Salkever, David S., 1976. "The use of dummy variables to compute predictions, prediction errors, and confidence intervals," Journal of Econometrics, Elsevier, vol. 4(4), pages 393-397, November.
    10. Jurgen A. Doornik, 2008. "Encompassing and Automatic Model Selection," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 915-925, December.
    11. David F. Hendry & Bent Nielsen, 2007. "The Bernoulli model, from Econometric Modeling: A Likelihood Approach," Introductory Chapters, in: Econometric Modeling: A Likelihood Approach, Princeton University Press.
    12. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2011. "Evaluating Automatic Model Selection," Journal of Time Series Econometrics, De Gruyter, vol. 3(1), pages 1-33, February.
    13. Hendry, David F. & Massmann, Michael, 2007. "Co-Breaking: Recent Advances and a Synopsis of the Literature," Journal of Business & Economic Statistics, American Statistical Association, vol. 25, pages 33-51, January.
    14. David Hendry & Jurgen A. Doornik & Felix Pretis, 2013. "Step-indicator Saturation," Economics Series Working Papers 658, University of Oxford, Department of Economics.
    15. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    16. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2013. "Model Selection in Equations with Many ‘Small’ Effects," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 75(1), pages 6-22, February.
    17. Carlos Santos & David Hendry & Soren Johansen, 2008. "Automatic selection of indicators in a fully saturated regression," Computational Statistics, Springer, vol. 23(2), pages 317-335, April.
    18. Jurgen A. Doornik & Henrik Hansen, 2008. "An Omnibus Test for Univariate and Multivariate Normality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 927-939, December.
    19. Hendry, David F. & Johansen, Søren, 2015. "Model Discovery and Trygve Haavelmo’s Legacy," Econometric Theory, Cambridge University Press, vol. 31(1), pages 93-114, February.
    20. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    21. Govaerts, Bernadette & Hendry, David F. & Richard, Jean-Francois, 1994. "Encompassing in stationary linear dynamic models," Journal of Econometrics, Elsevier, vol. 63(1), pages 245-270, July.
    22. Engle, Robert F, 1982. "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, Econometric Society, vol. 50(4), pages 987-1007, July.

    Citations

    Cited by:

    1. David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
    2. Xiaoyi Han & Lung-Fei Lee, 2016. "Bayesian Analysis of Spatial Panel Autoregressive Models With Time-Varying Endogenous Spatial Weight Matrices, Common Factors, and Random Coefficients," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 642-660, October.
    3. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2020. "Robust Discovery of Regression Models," Economics Papers 2020-W04, Economics Group, Nuffield College, University of Oxford.
    4. Castañeda, Gonzalo & Chávez-Juárez, Florian & Guerrero, Omar A., 2018. "How do governments determine policy priorities? Studying development strategies through spillover networks," Journal of Economic Behavior & Organization, Elsevier, vol. 154(C), pages 335-361.
    5. Schintler, Laurie A. & Fischer, Manfred M., 2018. "Big Data and Regional Science: Opportunities, Challenges, and Directions for Future Research," Working Papers in Regional Science 2018/02, WU Vienna University of Economics and Business.
    6. Jennifer Castle & Jurgen Doornik & David Hendry, 2020. "Modelling Non-stationary 'Big Data'," Economics Series Working Papers 905, University of Oxford, Department of Economics.

    More about this item

    Keywords

    Big Data; Model Selection; Location Shifts; Autometrics

    JEL classification:

    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • C22 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Time-Series Models; Dynamic Quantile Regressions; Dynamic Treatment Effect Models; Diffusion Processes
